In a BI project, it is important to design the system so that it reliably visualizes the indicators that matter to the business. Even senior BI professionals sometimes confuse the terms measure, metric and KPI.
This post gives simple definitions for the three levels of that hierarchy:
Measure: A measure is something that is directly measured, such as the value of an order, the number of items on the order, the lines of code in a program, or the number of defects identified during unit testing. Measures are generally captured during the regular operations of the business. They sit at the lowest granularity of the fact table in a star schema.
Metric: A metric is generally derived from one or more measures. For example, defect density is the number of defects per 1000 lines of code in a COBOL program; it is derived from the lines-of-code and number-of-defects measures. A metric can also be the maximum, minimum or average of a single measure. In general, metrics are computed from the underlying facts and measures.
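As a minimal sketch of how a metric is derived from measures, the snippet below computes defect density from two measures. The fact rows, program names and the function name defect_density are made up for illustration; they are not taken from any real system.

```python
def defect_density(defects, lines_of_code):
    """Metric: number of defects per 1000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects * 1000 / lines_of_code


# Hypothetical fact rows holding the two underlying measures per program.
facts = [
    {"program": "PAYROLL", "lines_of_code": 12000, "defects": 30},
    {"program": "BILLING", "lines_of_code": 8000, "defects": 12},
]

# Derive the metric for every program from its measures.
densities = {
    row["program"]: defect_density(row["defects"], row["lines_of_code"])
    for row in facts
}

print(densities)  # {'PAYROLL': 2.5, 'BILLING': 1.5}
```

The measures stay untouched in the fact rows; the metric is computed on top of them and can be recomputed at any time.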
Performance Indicator: A performance indicator brings business context to a metric. For example, the reduction in defect density after introducing a more rigorous review process (review effectiveness) could be a performance indicator in a software development context. A lower defect density reduces the rework effort of fixing unit test bugs and re-testing, thereby improving the performance of the software development organization.
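The review effectiveness example can be sketched as the percentage reduction in defect density between two periods, compared against a business target. The 20 percent target, the sample densities and the function names are assumptions chosen only to illustrate the idea.

```python
def review_effectiveness(density_before, density_after):
    """Percentage reduction in defect density after a process change."""
    if density_before <= 0:
        raise ValueError("density_before must be positive")
    return (density_before - density_after) / density_before * 100


def meets_target(reduction_pct, target_pct=20.0):
    """Business context: compare the reduction with an assumed target."""
    return reduction_pct >= target_pct


# Hypothetical defect densities (defects per 1000 lines of code).
before_review = 4.0
after_review = 3.0

reduction = review_effectiveness(before_review, after_review)
print(reduction)                # 25.0
print(meets_target(reduction))  # True
```

The metric (defect density) is the same in both periods; what turns it into a performance indicator is the comparison over time and the target that the business cares about.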
Carefully selecting KPIs and presenting them at a suitable granularity to the right level of management is what makes a BI project successful.
Bringing the data (measures) from multiple sources into a star schema is typically the ETL cycle of the warehouse. Once the data is refreshed, the next stage is to incrementally summarize it and compute all the metrics. Finally, visualizing the KPIs is the art of dashboarding.
I have seen several BI projects run into trouble by mixing up these three stages: cleaning data during loading, summarizing data into metrics during ETL, or summarizing during the reporting phase, all of which cause performance problems. A good design should keep the stages independent of each other and handle issues such as missing and duplicate refreshes from the feeding source systems. It should also consider parallelizing tasks to take advantage of multiple processor cores and large clusters of computing resources.
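One possible way to guard against missing and duplicate refreshes is to track which refresh dates have already been loaded and to plan each load against that record. The sketch below assumes one refresh per calendar day, identified by its date; the function name plan_refresh and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def plan_refresh(loaded, incoming, start, end):
    """Split incoming refresh dates into new loads and duplicates,
    and list the expected dates that are still missing."""
    seen = set(loaded)
    to_load, duplicates = [], []
    for refresh_date in incoming:
        if refresh_date in seen:
            duplicates.append(refresh_date)  # already loaded: skip it
        else:
            seen.add(refresh_date)
            to_load.append(refresh_date)
    # Every day between start and end is expected to have one refresh.
    days = (end - start).days + 1
    expected = [start + timedelta(days=i) for i in range(days)]
    missing = [d for d in expected if d not in seen]
    return to_load, duplicates, missing


already_loaded = {date(2024, 1, 1), date(2024, 1, 2)}
incoming = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 3), date(2024, 1, 5)]

to_load, duplicates, missing = plan_refresh(
    already_loaded, incoming, date(2024, 1, 1), date(2024, 1, 5)
)
print(to_load)     # 3 Jan and 5 Jan
print(duplicates)  # 2 Jan and the second copy of 3 Jan
print(missing)     # 4 Jan
```

Because the check runs before loading, the summarizing and dashboarding stages never see a refresh twice, and a missing refresh is reported instead of silently producing a gap in the metrics.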