Friday, February 22, 2013
Data agility in Master Data Management
Recently, I came across few master data management implementations and database performance problems related to them.
What is master data management (MDM): It is a set of processes to centralize the management of “master data” i.e, the reference data for a given business. It typically consists of Customers, Agents, and Products for an insurance company. Typically an MDM instance focuses on one data domain. If we focus on Customer domain the implementation is called as Customer Data Integration or when the domain is Products it is called as Product Information Management.
When it comes to Insurance industry, even though the “customer centricity” has been the latest trend, traditionally it has been a “agent” driven business. So, one key subject area for MDM is “Agent Data” which actually sells “Products” to “Customers”. For generalization we can call this as “Channel Master Data.” So we have three subject areas to deal with in a multi-domain-MDM implementation.
The processes include:
1. Data acquisition (Input)
2. Data standardization, duplicate removal, handling missing attributes (Quality)
3. Data mining, analytics and reporting (Output)
The industry tools have standard data and process models for customers and products to achieve the required functionality of MDM. We can adopt some customer model to model the agent domain or keep both in a “party” domain.
The problem with multi-domain MDM:
Here comes the problem. When it comes to agent domain, the customer requirement states “get me the agent and his top n customers acquired” or “get me the agent and his top n products sold”
In a renowned MDM product an SQL query need to join 20+ tables (involving some outer joins) to achieve the requirement and even after caching all the tables in-memory the processing for each agent id is taking 20+ seconds on an Oracle 11.2 database on a decent production sized hardware.
a) Dynamically querying for such data and building the structure of data into target structure puts more load/ adds latency to the screen. SLA is not met.
b) Pre-populating the data with a periodic refresh in a materialized view does not make the data current to the real-time.
As we can’t have both requirements of getting the data in 2sec and it is to be current in traditional design, we need to design a hybrid approach.Any ideas or experience in this are welcome. Have anyone implemented a MDM solution on a NoSQL database yet?