Friday, March 15, 2013
“White noise” and “Big Data”
Anyone familiar with physics or communications will have come across the term “white noise”. In simple terms, it is the noise produced by combining all frequencies together at roughly equal intensity.
So, what is the relationship between white noise and big data?
At present, there is a lot of “noise” about big data, at both positive and negative frequencies. Some feel it is about high data volumes, some about unstructured data; some relate it to analytics, some to real-time processing, some to machine learning, some to very large databases, some to in-memory computing, others to regression, still others to pattern recognition, and so on.
People started out defining “big data” with four Vs (Volume, Velocity, Variety, and Variability) and have gone on to add many more. I have seen somewhere a list of 21 Vs defining big data.
So, in simple terms, big data is all about unstructured, mostly machine-generated data that arrives in quick succession and in high volumes, and that must be handled where traditional computing models fail. One scientific example is the Large Hadron Collider, which generates huge amounts of data from each of its experiments.
Most of this high-volume data is also “white noise”: signals of all frequencies produced simultaneously on social feeds such as Twitter. Spain’s fourth goal in a Euro 2012 match resulted in 15K tweets per second! That proves only that a great many people were watching and excited about the event; as a piece of information it adds minimal “business value”.
How to derive “Value” then?
The real business value of big data can only be realized when the right data sources are identified and the right data is channeled through the processing engine, so that the right technique can be applied to separate the right signal from the white noise. That is precisely the job of a “Data Scientist”, in my honest opinion.
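To make the metaphor concrete, here is a minimal sketch of what separating a signal from white noise looks like in the literal, signal-processing sense. Everything in it is an assumption made for illustration and not a real big data pipeline: the hypothetical helpers `make_noisy_signal`, `power_spectrum`, and `strongest_bin`, the choice of a single sine wave as the “right signal”, and the sample count, amplitude, and noise level. The noise is deliberately stronger than the sine wave, so the signal is hard to see in the raw samples, yet it stands out once the data is viewed per frequency.

```python
import cmath
import math
import random


def make_noisy_signal(n, signal_bin, amplitude, noise_std, seed):
    """Hypothetical test data: one sine wave buried in Gaussian white noise."""
    rng = random.Random(seed)
    return [
        amplitude * math.sin(2 * math.pi * signal_bin * t / n)
        + rng.gauss(0.0, noise_std)
        for t in range(n)
    ]


def power_spectrum(samples):
    """Naive discrete Fourier transform; returns power for bins 0..n/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def strongest_bin(spectrum):
    """Index of the frequency bin with the most power, skipping the DC bin 0."""
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])


# Assumed parameters: noise standard deviation is twice the sine amplitude.
samples = make_noisy_signal(n=512, signal_bin=20, amplitude=1.0,
                            noise_std=2.0, seed=42)
spectrum = power_spectrum(samples)
peak = strongest_bin(spectrum)
print("strongest frequency bin:", peak)
```

White noise spreads its power roughly evenly over all frequency bins, while the sine wave concentrates its power in one bin, which is why the peak is expected to land on the planted frequency even though the noise dominates sample by sample. Real data rarely offers such a clean separation, which is where choosing the right sources and the right technique comes in.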
I have not yet found a really good general use case for big data in the insurance industry, other than stray cases related to vehicle telematics in the auto sector and some weather, flood, and tsunami hazard modeling in corporate specialty.
But I am tuned in to the white noise anyway, looking for clues that point to real use cases in insurance and, more broadly, in financial services (other than the “machine trading” algorithms that are already well developed in that field!).