For a data scientist, good fundamentals in “data mining” or “machine learning” are, among the other skill sets, the icing on the cake. These algorithms are also used in predictive analytics. It is immaterial whether the data is “big data” or not!
We usually end up with several variables, each holding numeric or nominal attributes that describe an entity. Numeric variables can be continuous or discrete (ordinals, for example). Nominal variables take their values from a known list, which may be binary (two values, e.g., yes or no; male or female) or have more possible values.
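To make the variable types concrete, here is a minimal Python sketch of one record and a simple numeric encoding of it. The field names, the level lists and the encoding choices (rank for the ordinal, 0/1 for the binary, one-hot for the multi-valued nominal) are my own illustrative assumptions, not something prescribed by any particular library.

```python
# Hypothetical customer record; field names and values are illustrative only.
record = {
    "income": 52340.75,        # continuous numeric
    "satisfaction": "medium",  # ordinal: discrete, ordered levels
    "is_subscriber": "yes",    # binary nominal
    "region": "south",         # nominal with more than two values
}

# Assumed known lists of values for the ordinal and nominal variables.
SATISFACTION_LEVELS = ["low", "medium", "high"]
REGIONS = ["north", "south", "east", "west"]


def encode(rec):
    """Turn one record into a flat list of numbers."""
    features = [float(rec["income"])]
    # Ordinal: keep the order by using the position in the level list.
    features.append(float(SATISFACTION_LEVELS.index(rec["satisfaction"])))
    # Binary nominal: map to 0/1.
    features.append(1.0 if rec["is_subscriber"] == "yes" else 0.0)
    # Multi-valued nominal: one-hot encode, since the values have no order.
    features.extend(1.0 if rec["region"] == r else 0.0 for r in REGIONS)
    return features


encoded = encode(record)
print(encoded)  # [52340.75, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

One-hot encoding is used for the region because assigning it a single number would suggest an ordering that a nominal variable does not have.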
Given a data set consisting of millions of records, each containing hundreds of variables, an analyst’s job is to derive insights that solve known or unknown business problems! This is where the application of machine learning algorithms comes into play.
Broadly, machine learning can be put into two groups.
1. Predicting a target variable for a given data record. We have a set of records with known values for the target variable, with which we can develop and train a model, test it and put it into production. This is called supervised learning.
a. If the target variable is nominal, the algorithms are called classification.
b. If the target variable is a continuous numeric variable, we need to apply regression.
2. There is no target variable; we need to group the data records into distinct groups based on multiple variables within the dataset. This is called unsupervised learning. Clustering and association analysis algorithms are used to achieve this.
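The two groups can be sketched in a few lines of plain Python. The algorithms below (k-nearest neighbours for classification, a least-squares line for regression, k-means for clustering) are simply common textbook choices I picked for illustration, and the tiny data sets are made up. Treat this as a rough sketch, not a production implementation.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Classification: predict a nominal target by majority vote of the k nearest records."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept for a continuous target."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def kmeans(points, centroids, iterations=10):
    """Clustering: no target variable; assign each record to its nearest centroid."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up records: two numeric variables each, plus a known nominal target.
train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
train_y = ["no", "no", "yes", "yes"]

label = knn_classify(train_x, train_y, (4.9, 5.0), k=3)
print(label)  # yes

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0

# Same records without the target variable; starting centroids are arbitrary guesses.
centroids, clusters = kmeans(train_x, [(0.0, 0.0), (6.0, 6.0)])
print(centroids, clusters)
```

The supervised functions need the known target values (`train_y`, `ys`) to learn from, while `kmeans` only ever sees the records themselves.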
Formulating the problem, preparing the data, visualizing the data, training the model, testing the model, interpreting the results to generate insights and then applying the derived knowledge to business operations requires multidisciplinary skills in the business domain, operations management and technology.
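The train-then-test part of that workflow can be sketched as a simple hold-out split. Everything here is an assumption for illustration: the records are made up, the 25% test fraction is arbitrary, and the "model" is only a majority-class baseline standing in for a real algorithm.

```python
import random
from collections import Counter


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_majority_model(train):
    """'Training' here just memorises the most common label (a baseline, not a real model)."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted_label, test):
    """Fraction of held-out records whose known label matches the prediction."""
    return sum(1 for _, label in test if label == predicted_label) / len(test)


# Made-up labelled records: (features, known target value).
records = [
    ((1.0, 1.0), "no"), ((1.2, 0.8), "no"), ((0.9, 1.1), "no"),
    ((1.1, 1.3), "no"), ((0.8, 0.7), "no"), ((5.0, 5.1), "yes"),
    ((4.8, 5.3), "yes"), ((5.2, 4.9), "yes"),
]

train, test = train_test_split(records)
model = train_majority_model(train)
score = accuracy(model, test)
print(len(train), len(test), model, score)
```

Keeping the test records out of training is the point of the split: the score then reflects how the model behaves on records it has not seen.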
A real business analytics solution combines multiple techniques involving machine learning to achieve customer segmentation, cross-selling, customer behavior analysis, customer retention, marketing analytics and campaign management, fraud detection, profit optimization, etc.
A recent quick read through the book Machine Learning in Action by Peter Harrington prompted me to write this tech capsule on this Friday…