It is Friday and it is time for a blog post.
A typical analytics project spends 70% to 80% of its time preparing the data. Achieving the right data quality and the right data format is a primary factor in the success of an analytics project.
What makes this task so knowledge-intensive, and why does it require a multifaceted skill set?
I will give a quick, simple example of how “Functional knowledge”, in addition to technical knowledge, is important in preparing the data. There is a functional distinction between missing data and non-existing data.
For example, consider a customer data set. If the customer is married and the age of the spouse is not available, this is missing data. If the customer is single, the age of the spouse is non-existing. In the data mart, these two scenarios need to be represented differently so that the analytic model behaves properly.
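To make the distinction concrete, here is a minimal Python sketch of one way a data mart could record why a value is absent. The field names (`marital_status`, `spouse_age`) and the three-way `ValueStatus` coding are hypothetical illustrations, not a standard scheme; the point is that functional knowledge (the marital status) decides whether an empty spouse age is missing or non-existing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ValueStatus(Enum):
    """Why a value is or is not present (hypothetical coding scheme)."""
    PRESENT = "present"
    MISSING = "missing"                # exists in reality but was not captured
    NOT_APPLICABLE = "not_applicable"  # cannot exist for this record


@dataclass
class Customer:
    customer_id: int
    marital_status: str          # "married" or "single" in this sketch
    spouse_age: Optional[int]    # None whenever the age is not known


def spouse_age_status(c: Customer) -> ValueStatus:
    """Classify spouse_age using functional knowledge about marital status."""
    if c.marital_status != "married":
        # Only married customers have a spouse, so the value cannot exist.
        return ValueStatus.NOT_APPLICABLE
    if c.spouse_age is None:
        # Married, but the age was not captured: genuinely missing.
        return ValueStatus.MISSING
    return ValueStatus.PRESENT


customers = [
    Customer(1, "married", 42),
    Customer(2, "married", None),  # married, spouse age not captured
    Customer(3, "single", None),   # single, spouse age cannot exist
]

for c in customers:
    print(c.customer_id, spouse_age_status(c).value)
```

An explicit status column like this lets later steps impute only the genuinely missing values, while the model can treat “not applicable” as information in its own right.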
How missing data is handled within the data set during preparation (for example, with data imputation techniques) affects the results of the analytical models.
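The sketch below contrasts two ways of filling empty spouse ages. It again uses hypothetical field names, and simple mean imputation is chosen only for illustration. The naive version fills every empty value; the functional version imputes only for married customers and leaves the single customer's value empty.

```python
from statistics import mean

# Hypothetical customer rows: spouse_age is None when it is not known.
customers = [
    {"id": 1, "marital_status": "married", "spouse_age": 40},
    {"id": 2, "marital_status": "married", "spouse_age": 50},
    {"id": 3, "marital_status": "married", "spouse_age": None},  # missing
    {"id": 4, "marital_status": "single", "spouse_age": None},   # non-existing
]


def naive_impute(rows):
    """Fill every None with the mean of the known ages, ignoring functional meaning."""
    fill = mean(r["spouse_age"] for r in rows if r["spouse_age"] is not None)
    return [
        {**r, "spouse_age": fill if r["spouse_age"] is None else r["spouse_age"]}
        for r in rows
    ]


def functional_impute(rows):
    """Impute only married customers' missing ages; other rows keep None."""
    fill = mean(
        r["spouse_age"]
        for r in rows
        if r["marital_status"] == "married" and r["spouse_age"] is not None
    )
    out = []
    for r in rows:
        age = r["spouse_age"]
        if age is None and r["marital_status"] == "married":
            age = fill  # genuinely missing: impute
        out.append({**r, "spouse_age": age})  # non-existing stays None
    return out


print([r["spouse_age"] for r in naive_impute(customers)])
print([r["spouse_age"] for r in functional_impute(customers)])
```

With the naive version, the single customer ends up with an invented spouse age, which a model would read as real information. The functional version leaves that value empty.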
Dr. Gerhard Svolba of SAS has written extensively on Data Preparation as well as Data Quality (for Analytics), and this presentation gives more details on the subject.
I wrote an earlier blog post on these challenges in the “Big data” world: http://technofunctionalconsulting.blogspot.in/2012/10/data-munging-in-big-data-world.html