Friday, February 1, 2013
A few thoughts on Data Preparation for #Analytics
It is Friday and it is time for a blog post.
A typical analysis project spends 70% to 80% of its time preparing the data. Getting the data to the right quality and into the right format is a primary success factor for an analytics project.
What makes this task so knowledge-intensive, and why does it require multifaceted skills?
I will give a quick, simple example of why “functional knowledge”, in addition to technical knowledge, is important when preparing data. There is a functional distinction between missing data and non-existing data.
For example, consider a customer data set. If the customer is married and the age of the spouse is not available, this is missing data. If the customer is single, the age of the spouse is non-existing data. In the data mart, these two scenarios need to be represented differently so that the analytic model behaves properly.
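One way to keep the two scenarios apart is to store an explicit status next to the value, rather than letting a single empty field stand for both. The sketch below is only an illustration: the `Customer` record, the `SpouseAgeStatus` labels and the `spouse_age_status` helper are all hypothetical names, not part of any particular data mart or tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpouseAgeStatus(Enum):
    """Hypothetical labels for the state of the spouse-age field."""
    KNOWN = "known"
    MISSING = "missing"                # married, but the age was not captured
    NOT_APPLICABLE = "not_applicable"  # single, so the value cannot exist


@dataclass
class Customer:
    """Hypothetical customer record; spouse_age is None when no value is stored."""
    customer_id: int
    married: bool
    spouse_age: Optional[int] = None


def spouse_age_status(customer: Customer) -> SpouseAgeStatus:
    """Classify an empty spouse_age using the functional rule from the text."""
    if not customer.married:
        return SpouseAgeStatus.NOT_APPLICABLE
    if customer.spouse_age is None:
        return SpouseAgeStatus.MISSING
    return SpouseAgeStatus.KNOWN


customers = [
    Customer(1, married=True, spouse_age=41),
    Customer(2, married=True),   # missing data
    Customer(3, married=False),  # non-existing data
]

# Map each customer id to the status of its spouse-age field.
statuses = {c.customer_id: spouse_age_status(c) for c in customers}
```

Customers 2 and 3 both have an empty `spouse_age`, but they end up with different statuses, so a downstream model can treat them differently.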
How missing data is handled while preparing the data set (data imputation techniques) affects the results of the analytical models.
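As a rough illustration, here is a minimal sketch of mean imputation, which is just one simple technique among many and not necessarily the right choice for a given model. The rows are made-up `(married, spouse_age)` pairs and `impute_spouse_age` is a hypothetical helper. The point is that only genuinely missing values are filled, while non-existing values are left alone.

```python
from statistics import mean

# Made-up rows: (married, spouse_age); None means no value was recorded.
rows = [(True, 40), (True, 50), (True, None), (False, None)]


def impute_spouse_age(rows):
    """Fill missing spouse ages of married customers with the mean known age.

    Rows for single customers are left untouched, because there the
    value does not exist rather than being missing.
    """
    known = [age for married, age in rows if married and age is not None]
    fill = mean(known)
    return [
        (married, fill if married and age is None else age)
        for married, age in rows
    ]


imputed = impute_spouse_age(rows)
```

Filling the single customer's row as well would invent a spouse age that cannot exist, and the model would then learn from a value that has no functional meaning.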
Dr. Gerhard Svolba of SAS has written extensively on Data Preparation as well as Data Quality (for Analytics) and this presentation gives more details on the subject.
I wrote an earlier blog post on these challenges in the “Big data” world: http://technofunctionalconsulting.blogspot.in/2012/10/data-munging-in-big-data-world.html