Blog Moved

Future posts related to technology are directly published to LinkedIn
https://www.linkedin.com/today/author/prasadchitta

Friday, February 22, 2013

Data agility in Master Data Management

Recently, I came across a few master data management implementations and the database performance problems related to them.
What is master data management (MDM)? It is a set of processes to centralize the management of “master data”, i.e., the reference data for a given business. For an insurance company this typically consists of Customers, Agents, and Products. Typically an MDM instance focuses on one data domain: an implementation focused on the Customer domain is called Customer Data Integration (CDI), and when the domain is Products it is called Product Information Management (PIM).
When it comes to the insurance industry, even though “customer centricity” has been the latest trend, it has traditionally been an “agent”-driven business. So one key subject area for MDM is “Agent Data”: it is the agents who actually sell “Products” to “Customers”. As a generalization, we can call this “Channel Master Data.” So we have three subject areas to deal with in a multi-domain MDM implementation.
The processes include:
1. Data acquisition (Input)
2. Data standardization, duplicate removal, handling missing attributes (Quality)
3. Data mining, analytics, and reporting (Output)
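The quality step above (standardize, deduplicate, handle missing attributes) can be sketched as follows. This is a minimal illustration, not code from any MDM product; the field names and cleanup rules are hypothetical.

```python
# A minimal sketch of the Quality step: standardize records, drop
# duplicates, and fill missing attributes with a default.
# Field names and rules are hypothetical, not from any MDM product.

def standardize(record):
    """Normalize case and whitespace so equivalent values compare equal."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "city": record.get("city", "").strip().title() or "UNKNOWN",
    }

def deduplicate(records):
    """Keep the first occurrence of each (name, city) pair."""
    seen, unique = set(), []
    for rec in map(standardize, records):
        key = (rec["name"], rec["city"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

raw = [
    {"name": "john  SMITH", "city": "chennai"},
    {"name": "John Smith", "city": "Chennai"},   # duplicate after cleanup
    {"name": "Asha Rao"},                        # missing city attribute
]
print(deduplicate(raw))
```

Real MDM tools do this with probabilistic matching and survivorship rules; the sketch only shows the shape of the pipeline.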
The industry tools provide standard data and process models for customers and products to achieve the required MDM functionality. We can adapt a customer model to cover the agent domain, or keep both under a common “party” domain.
The problem with multi-domain MDM:
Here comes the problem. When it comes to the agent domain, the customer requirement states: “get me the agent and his top n customers acquired” or “get me the agent and his top n products sold”.
In a renowned MDM product, the SQL query needs to join 20+ tables (involving some outer joins) to meet this requirement, and even after caching all the tables in memory, the processing for each agent ID takes 20+ seconds on an Oracle 11.2 database on decently sized production hardware.
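Logically, the requirement is a grouped top-N aggregation. A minimal sketch in Python shows what the expensive query computes once the joins are done; the sales tuples here are hypothetical, standing in for the 20+ joined MDM tables.

```python
# Grouped top-N: for each agent, rank the customers tied to that agent.
# The data is hypothetical, standing in for the joined MDM tables.
from collections import defaultdict
import heapq

sales = [  # (agent_id, customer_id, premium)
    ("A1", "C1", 500), ("A1", "C2", 1200), ("A1", "C3", 900),
    ("A2", "C4", 300), ("A2", "C1", 700),
]

def top_n_customers(sales, n=2):
    by_agent = defaultdict(list)
    for agent, customer, premium in sales:
        by_agent[agent].append((premium, customer))
    # heapq.nlargest keeps only the top n rows per agent
    return {a: heapq.nlargest(n, rows) for a, rows in by_agent.items()}

print(top_n_customers(sales))
# {'A1': [(1200, 'C2'), (900, 'C3')], 'A2': [(700, 'C1'), (300, 'C4')]}
```

The aggregation itself is cheap; the 20+ seconds reported above are spent assembling the input rows through the joins, which is why precomputing or caching the joined form is the natural lever.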
a) Dynamically querying for such data and reshaping it into the target structure adds load and latency to the screen; the SLA is not met.
b) Pre-populating the data in a materialized view with a periodic refresh does not keep the data current in real time.
Since a traditional design cannot satisfy both requirements (returning the data within 2 seconds and keeping it current in real time), we need a hybrid approach.
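One possible hybrid design, sketched below under assumptions not spelled out in the post: business events update a pre-summarized, denormalized view held in a fast store (a plain dict here, standing in for a NoSQL or in-memory product), so reads become O(1) lookups and the data stays current without a batch refresh.

```python
# Hypothetical sketch of a hybrid design: every event is applied to a
# denormalized summary as it arrives, so screen queries need no joins.
import heapq

summary = {}  # agent_id -> {customer_id: total_premium}

def on_sale_event(agent_id, customer_id, premium):
    """Apply one business event to the summary (no periodic refresh)."""
    agent = summary.setdefault(agent_id, {})
    agent[customer_id] = agent.get(customer_id, 0) + premium

def top_customers(agent_id, n=3):
    """Serve the screen query from the summary: no joins at read time."""
    totals = summary.get(agent_id, {})
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

on_sale_event("A1", "C1", 500)
on_sale_event("A1", "C2", 1200)
on_sale_event("A1", "C1", 400)   # C1 now totals 900
print(top_customers("A1", n=2))  # [('C2', 1200), ('C1', 900)]
```

The trade-off is eventual consistency between the relational system of record and the summary store, plus the extra moving part of an event feed; both seem acceptable when the alternative is a 20-second join.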
Any ideas or experience in this area are welcome. Has anyone implemented an MDM solution on a NoSQL database yet?

3 comments:

Unknown said...

Prasad Sir, nice article. How would the hybrid approach work?

Prasad Chitta said...

+nilesh jawale

An event-driven update of summarized data into either a NoSQL or an in-memory store is the hybrid approach, in my thinking.

Prasad Chitta said...

Here’s a clear analysis of the article “Data agility in Master Data Management” from the Techno Functional Consulting blog (Feb 22, 2013):


---

What the Article Is About

At the core, the author discusses a practical performance problem encountered in a Master Data Management (MDM) implementation. It’s not a high-level marketing piece but a hands-on observation from working on real systems.


---

Key Concepts Explained

๐Ÿ“ What is Master Data Management (MDM)

The author starts with a straightforward definition:

MDM is a set of processes and technologies to centralize and manage ‘master data’—the core reference data used across an enterprise (like customers, products, agents).

In insurance, this might include customer records, agent data, product portfolios, etc.


This aligns with general definitions of MDM you’ll find in industry practice, where the aim is consistent, accurate, unified master data across systems for better decision-making and operational coherence.


---

The Practical Problem: Performance + Agility

What the User Requirement Was

The author highlights a requirement like:

“Get me an agent and their top N customers acquired”

“Get me an agent and their top products sold”
These are common business queries that involve joining several related entities (agent → customer → product).


Instead of a simple lookup, these queries had to traverse 20+ database tables using multiple JOINs. On a reasonably powerful Oracle database, each query took 20+ seconds—too slow for interactive applications.


---

๐Ÿ• Why This Matters

The article highlights two conflicting goals:

1. Low Latency (e.g., sub-2 second response)

The business wants interactive speed for UI screens and dashboards.

2. Real-time Data Currency

Data must be up-to-date, reflecting changes immediately.

Traditional MDM designs often choose one at the expense of the other:

Pre-compute / materialize views → faster but stale.

Dynamic JOINs → current data but slow.



---

๐Ÿ” Proposed (High-Level) Insight: Hybrid Approach

The author suggests that you cannot have both with the conventional relational design alone. So you need a hybrid design where:

Summarized or frequently accessed data is stored in a performance-optimized form (e.g., materialized views, denormalized stores, or caches).

The data remains fresh (updated via real-time or near real-time streams/events rather than periodic batch).


This is a valid architecture pattern for performance-critical systems.


---

Alternate Technologies Mentioned

In the comment thread, the author suggests NoSQL or in-memory stores as part of a hybrid approach: moving summary data into faster key–value or document stores that support real-time access without heavy joins.

This reflects a real trend in modern MDM and analytical systems where relational databases are complemented with specialized storage for performance.


---

High-Level Takeaways

✔ The blog isn’t theoretical; it’s focused on a real performance issue in MDM implementations.
✔ Master Data Management is about centralizing critical reference data (customers, products, etc.) so applications and analytics can use consistent information.
✔ Traditional relational designs struggle with real-time, low-latency aggregated queries across many joins.
✔ To balance agility (speed) and freshness, a hybrid architecture—combining OLTP, caching, denormalization, or NoSQL—becomes necessary.


---

๐Ÿ“ Final Interpretation

The article is valuable as a practical viewpoint from an implementer inside the MDM space. It captures the tension between:

Data centralization and consistency (the goal of MDM), and

Agile, high-performance query requirements demanded by real business use cases.


Rather than offering a polished product solution, the author invites the community to think about design patterns that can handle both performance and data freshness.


---
