Recently, I came across a few master data management implementations and the database performance problems related to them.
What is master data management (MDM)?
It is a set of processes to centralize the management of “master data,”
i.e., the reference data for a given business. For an insurance company,
it typically consists of Customers, Agents, and Products. Typically,
an MDM instance focuses on one data domain: when the focus is the Customer
domain, the implementation is called Customer Data Integration; when
the domain is Products, it is called Product Information Management.
In the insurance industry, even though “customer centricity”
has been the latest trend, it has traditionally been an “agent”-driven
business. So one key subject area for MDM is “Agent Data”, the agents
being the ones who actually sell “Products” to “Customers”. For generalization we can call
this “Channel Master Data.” So we have three subject areas to deal with in a multi-domain MDM implementation.
The processes include:
1. Data acquisition (Input)
2. Data standardization, duplicate removal, handling missing attributes (Quality)
3. Data mining, analytics and reporting (Output)
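As a rough illustration of step 2, a minimal standardization-and-deduplication pass might look like the sketch below. The field names and matching rules are my own assumptions for the example, not those of any particular MDM product:

```python
# Hypothetical sketch of the "Quality" step: standardize records,
# remove duplicates, and flag missing attributes.
# Field names (name, email, phone) are illustrative only.

def standardize(record):
    """Normalize casing and whitespace so duplicates compare equal."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
    }

def deduplicate(records, key_fields=("email",)):
    """Keep the first record seen for each natural key."""
    seen, unique = set(), []
    for rec in map(standardize, records):
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def missing_attributes(record, required=("name", "email", "phone")):
    """Report which mandatory attributes are absent or blank."""
    return [f for f in required if not record.get(f)]
```

Real MDM tools apply far richer matching (phonetic, probabilistic), but the shape of the step is the same: normalize first, then compare on a chosen natural key.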
Industry tools ship standard data and process models for customers and products
to achieve the required MDM functionality. We can adapt a
customer model to model the agent domain, or keep both under a common “party”
domain.
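One common way to realize the “party” idea, sketched here with hypothetical class and field names (no vendor model implied), is a shared Party entity to which domain-specific roles attach:

```python
from dataclasses import dataclass, field

# Illustrative "party" model: one Party entity carries the shared
# attributes, and domain-specific roles (customer, agent) attach to it.
# All names here are assumptions for the sketch.

@dataclass
class Party:
    party_id: str
    name: str
    roles: list = field(default_factory=list)  # e.g. ["CUSTOMER", "AGENT"]

@dataclass
class CustomerRole:
    party_id: str
    segment: str        # e.g. "retail"

@dataclass
class AgentRole:
    party_id: str
    license_no: str     # agents carry channel-specific attributes
```

A single party can then play both roles, which does happen in insurance when an agent also holds a policy as a customer.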
The problem with multi-domain MDM:
Here comes the problem. In the agent domain, the customer
requirement reads “get me the agent and his top n customers acquired”
or “get me the agent and his top n products sold”.
In a renowned MDM product, an SQL query needs to join 20+ tables (involving some outer joins) to meet this requirement, and even after caching all the tables in memory, the processing for each agent ID takes 20+ seconds on an Oracle 11.2 database on decent production-sized hardware.
a) Dynamically querying for such data and reshaping it into the target structure adds load and latency to the screen, so the SLA is not met.
b) Pre-populating the data in a materialized view with a periodic refresh does not keep the data current in real time.
Since a traditional design can’t meet both requirements, returning the data within 2 seconds and keeping it current, we need to design a hybrid approach.
Any ideas or experience in this area are welcome. Has anyone implemented an MDM solution on a NoSQL database yet?

3 comments:
Prasad sir, nice article. How would the hybrid approach work?
+nilesh jawale
An event-driven update of summarized data to either a NoSQL or an in-memory store is the hybrid approach, in my thinking.
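To make that reply concrete, here is a minimal sketch of the event-driven idea, with plain Python dicts standing in for the NoSQL/in-memory store and the event field names assumed for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical event-driven summary: each sale event increments counters
# keyed by agent, so the screen reads one pre-aggregated entry instead of
# joining 20+ tables. A real deployment would back this with a NoSQL or
# in-memory store; the dicts here are only stand-ins.

sales_by_agent = defaultdict(Counter)   # agent_id -> Counter of product_id

def on_sale_event(event):
    """Apply one sale event to the summary as it arrives."""
    sales_by_agent[event["agent_id"]][event["product_id"]] += event.get("qty", 1)

def top_products(agent_id, n=5):
    """Cheap read path for the screen: no joins, and always current."""
    return sales_by_agent[agent_id].most_common(n)
```

Because the summary is updated per event rather than on a refresh schedule, the read stays both fast and current, which is the pair of requirements the relational design could not meet at once.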
Here’s a clear analysis of the article “Data agility in Master Data Management” from the Techno Functional Consulting blog (Feb 22, 2013):
---
What the Article Is About
At the core, the author discusses a practical performance problem encountered in a Master Data Management (MDM) implementation. It’s not a high-level marketing piece but a hands-on observation from working on real systems.
---
Key Concepts Explained
What is Master Data Management (MDM)?
The author starts with a straightforward definition:
MDM is a set of processes and technologies to centralize and manage ‘master data’—the core reference data used across an enterprise (like customers, products, agents).
In insurance, this might include customer records, agent data, product portfolios, etc.
This aligns with general definitions of MDM you’ll find in industry practice, where the aim is consistent, accurate, unified master data across systems for better decision-making and operational coherence.
---
The Practical Problem: Performance + Agility
What the User Requirement Was
The author highlights a requirement like:
“Get me an agent and their top N customers acquired”
“Get me an agent and their top products sold”
These are common business queries that involve joining several related entities (agent → customer → product).
Instead of a simple lookup, these queries had to traverse 20+ database tables using multiple JOINs. On a reasonably powerful Oracle database, each query took 20+ seconds—too slow for interactive applications.
---
Why This Matters
The article highlights two conflicting goals:
1. Low Latency (e.g., sub-2 second response)
The business wants interactive speed for UI screens and dashboards.
2. Real-time Data Currency
Data must be up-to-date, reflecting changes immediately.
Traditional MDM designs often choose one at the expense of the other:
Pre-compute / materialize views → faster but stale.
Dynamic JOINs → current data but slow.
---
Proposed (High-Level) Insight: Hybrid Approach
The author suggests that you cannot have both with the conventional relational design alone. So you need a hybrid design where:
Summarized or frequently accessed data is stored in a performance-optimized form (e.g., materialized views, denormalized stores, or caches).
The data remains fresh (updated via real-time or near real-time streams/events rather than periodic batch).
This is a valid architecture pattern for performance-critical systems.
---
Alternate Technologies Mentioned
In the closing comments, a reader mentions NoSQL or in-memory stores as part of a hybrid approach—moving summary data into faster key–value or document stores that support real-time access without heavy joins.
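For illustration only, such a store might keep one denormalized document per agent, maintained by events, so the read path becomes a single key lookup. The keys and fields below are invented for the sketch, not a specific product's schema:

```python
# Hypothetical denormalized agent document in a key-value/document store:
# everything the screen needs is pre-joined into one entry, so the read
# is a single key lookup instead of a 20+ table join.

store = {
    "agent:1001": {
        "name": "J. Doe",
        "top_customers": ["C042", "C007", "C113"],   # maintained by events
        "top_products": ["TERM_LIFE", "AUTO"],
    }
}

def get_agent_summary(agent_id):
    """Fetch the pre-aggregated document, or None if unknown."""
    return store.get(f"agent:{agent_id}")
```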
This reflects a real trend in modern MDM and analytical systems where relational databases are complemented with specialized storage for performance.
---
High-Level Takeaways
✔ The blog isn’t theoretical; it’s focused on a real performance issue in MDM implementations.
✔ Master Data Management is about centralizing critical reference data (customers, products, etc.) so applications and analytics can use consistent information.
✔ Traditional relational designs struggle with real-time, low-latency aggregated queries across many joins.
✔ To balance agility (speed) and freshness, a hybrid architecture—combining OLTP, caching, denormalization, or NoSQL—becomes necessary.
---
Final Interpretation
The article is valuable as a practical viewpoint from an implementer inside the MDM space. It captures the tension between:
Data centralization and consistency (the goal of MDM), and
Agile, high-performance query requirements demanded by real business use cases.
Rather than offering a polished product solution, the author invites the community to think about design patterns that can handle both performance and data freshness.
---
Post a Comment