Friday, January 25, 2013
Machine Learning Algorithms – Classification, Clustering and Regression
For a data scientist, good fundamentals in “data mining” or “machine learning” are, among the other skill sets, the icing on the cake. These algorithms are also used in predictive analytics. It is immaterial whether the data is “big data” or not!
We will end up having several variables, containing either numeric or nominal attributes, that describe an entity. The numeric variables could be continuous or discrete, like ordinals. Nominal variables take their values from a known list, which may be binary (two values, e.g., yes or no; male or female) or have more possible values.
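To make the variable types concrete, here is a minimal sketch of how one record might be turned into a list of numbers that an algorithm can consume. The field names (income, satisfaction, is_member, region), the list of regions and the one_hot and encode helpers are all made up for illustration; they do not come from any particular library or data set.

```python
# Hypothetical customer record; field names and values are illustrative only.
record = {
    "income": 52000.0,     # continuous numeric
    "satisfaction": 4,     # ordinal, assumed to be on a 1-5 scale
    "is_member": "yes",    # binary nominal
    "region": "north",     # nominal with more than two possible values
}

# Assumed known list of values for the nominal "region" variable.
REGIONS = ["north", "south", "east", "west"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode(rec):
    """Turn one record into a flat list of floats."""
    numeric = [rec["income"], float(rec["satisfaction"])]
    binary = [1.0 if rec["is_member"] == "yes" else 0.0]
    nominal = [float(bit) for bit in one_hot(rec["region"], REGIONS)]
    return numeric + binary + nominal


features = encode(record)
print(features)
```

One-hot encoding is only one possible choice here; it avoids implying an order among nominal values, which a plain 0, 1, 2, 3 coding would do.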
Given a data set consisting of millions of records, each record containing hundreds of variables, an analyst’s job is to derive insights that solve known or unknown business problems! This is where the application of machine learning algorithms comes into play.
Broadly, machine learning can be put into two groups.
1. Predicting a target variable for a given data record. We have a set of records with known values for the target variable, with which we can develop and train a model, test it and put it into production. This is called supervised learning.
a. If the target variable is nominal, the algorithms are called classification algorithms.
b. If the target variable is a continuous numeric variable, we need to apply regression.
2. There is no target variable; we need to group the data records into distinct groups based on multiple variables within the dataset. This is called unsupervised learning. Clustering and association analysis algorithms are used to achieve this.
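The three kinds of task above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: a 1-nearest-neighbour rule stands in for classification, ordinary least squares on one variable stands in for regression, and a basic k-means loop stands in for clustering. The function names and the toy data are invented for this post.

```python
import math


def nearest_neighbor_label(train, query):
    """Classification: return the label of the closest training point."""
    closest = min(train, key=lambda row: math.dist(row[0], query))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_means(points, centers, iterations=10):
    """Clustering: basic k-means starting from the given centers."""
    centers = [tuple(c) for c in centers]
    groups = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # Move each center to the mean of its group (keep it if empty).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Toy data, made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
labeled = list(zip(points, ["low", "low", "high", "high"]))

predicted = nearest_neighbor_label(labeled, (4.8, 5.1))      # nominal target
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])      # numeric target
centers, groups = k_means(points, [(0.0, 0.0), (6.0, 6.0)])  # no target
print(predicted, slope, intercept, centers)
```

Note that the first two functions need known target values (labels or y values), while k_means only ever sees the points themselves.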
Formulating the problem, preparing the data, visualizing the data, training the model, testing the model, interpreting the results to generate insights and then applying the derived knowledge to business operations require multidisciplinary skills in the business domain, operations management and technology.
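The train-then-test part of that workflow can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic, the "model" is just a threshold on a single variable placed midway between the two class means (and it assumes class 1 has the larger values), and the helper names are my own.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(train):
    """'Train' a one-variable classifier: midpoint between the class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, test):
    """Fraction of held-out records whose predicted label is correct."""
    hits = sum(1 for x, label in test if (1 if x > threshold else 0) == label)
    return hits / len(test)


# Synthetic, well-separated data: class 0 near 0-9, class 1 near 20-29.
data = [(float(x), 0) for x in range(0, 10)]
data += [(float(x), 1) for x in range(20, 30)]

train, test = train_test_split(data)
threshold = train_threshold(train)
score = accuracy(threshold, test)
print(len(train), len(test), threshold, score)
```

The point of holding out the test records is that the score is measured on data the model never saw during training; real data is rarely as cleanly separated as this toy set.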
A real business analytics solution uses multiple techniques involving machine learning to achieve customer segmentation, cross-selling, customer behavior analysis, customer retention, marketing analytics and campaign management, fraud detection, optimization of profits, etc.
A recent quick read through the book Machine Learning in Action by Peter Harrington prompted me to write this tech capsule on this Friday…