Blog Moved

Future posts related to technology are directly published to LinkedIn

Friday, March 29, 2013

Can we save capitalism from itself?

Thoughts from reading
The Trouble With Markets: Saving Capitalism from Itself, Second Edition
by  Roger Bootle

This book has three sections, and in the first three chapters the economist author explains how we ended up here.

Section 1: The Great Implosion
The 1930s had seen the Great Depression and the 1970s the Great Inflation. The 1990s had seen the Great Moderation. This was the Great Implosion.
In the next four chapters he deals with the trouble with markets.

Section 2: The Trouble with the Markets
As Robert Heilbroner put it: “The profit motive, we are constantly being told, is as old as man himself. But it is not. The profit motive as we know it is only as old as modern man.”
Then comes the final section of three chapters.

Section 3: From implosion to Recovery
Keynes was right in three major respects:

  • Economic activity is permeated by fundamental uncertainty.
  • As a result, many of the major factors that affect the economy are psychological and depend critically on the state of confidence, which is not readily analyzable or predictable.
  • Consequently, the modern economy is inherently unstable and fragile.
The conclusion starts with this quote:
All happy families are alike; each unhappy family is unhappy in its own way.
--Leo Tolstoy, 1873

Overall, this book is a good read, but I am still left unsure of one thing:
Is it really possible to save capitalism from itself?

Friday, March 22, 2013

De-normalizing with join materialized views: fast refresh on commit

Two weeks back, I wrote a post on the result_cache feature of the Oracle 11g database to solve a specific performance scenario in an MDM implementation. Working on the same set of performance issues, we encountered another situation: a normalized structure that forces queries to use OUTER JOINs to achieve the required aggregation.

The structure contains a set of tables for PERSON and another set of tables to represent ORGANIZATION, since a CUSTOMER can be either a PERSON or an ORGANIZATION.
The requirement is to get a consolidated view of all persons and organizations together with certain attributes. We need a UNION ALL query joining a total of 8 tables, which yields something like 10 million records. We will not be able to result_cache a result of that size in memory.

Inevitably, we need to create a persistent version of the result of the UNION ALL query in a materialized view. But the customer needs real-time data and can’t afford any latency. So, we need a view that gets updated whenever the underlying tables change. That is where “REFRESH FAST ON COMMIT” comes into the picture.
To be able to do a fast refresh, a MATERIALIZED VIEW LOG must be created on each of the underlying tables; we selected ROWID-based logs. All 8 underlying tables need their MV logs in place before the MV is created as follows (only the rowid columns required for fast refresh are shown; the business attribute columns are omitted here):

CREATE MATERIALIZED VIEW customer_mv  -- MV name illustrative
REFRESH FAST ON COMMIT
AS
SELECT 1 AS marker,  -- UNION ALL marker column, required for fast refresh
  p.rowid   AS prowid,
  xp.rowid  AS xprowid,
  xpn.rowid AS xpnrowid,
  pn.rowid  AS pnrowid
FROM person p,
  xperson xp,
  xpersonname xpn,
  personname pn
WHERE p.PID = xp.XPid
AND pn.CId  = p.CId
AND xpn.XPN_id = pn.PNid  -- join restored; column names assumed by analogy with the org branch
AND xpn.preferred_ind = 'Y'
UNION ALL
SELECT 2 AS marker,
  o.rowid    AS orowid,
  xo.rowid   AS xorowid,
  xon.rowid  AS xonrowid,
  orgn.rowid AS orgnrowid
FROM org o,
  xorg xo,
  xorgname xon,
  orgname orgn
WHERE o.cid = xo.xoid
AND xon.xON_id = orgn.ONid
AND orgn.cId   = o.Cid
AND xon.preferred_ind = 'Y';
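For reference, each base table needs a ROWID-based MV log created before the MV exists. A sketch for one table follows; the same statement would be repeated for xperson, xpersonname, personname, org, xorg, xorgname and orgname:

```sql
-- One ROWID-based log per base table, created before the MV
CREATE MATERIALIZED VIEW LOG ON person
  WITH ROWID;
```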

This MV now holds de-normalized data that higher-level queries can use to look up the required data without costly joins. We can also create indexes on the MV to speed up lookups.
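For example, the MV can be indexed like any ordinary table. This is a sketch only; the MV name, column and bind variable are hypothetical, assuming the MV carries a CId lookup key among its attribute columns:

```sql
-- Index the MV like a regular table (names hypothetical)
CREATE INDEX customer_mv_cid_ix ON customer_mv (cid);

-- Higher-level queries then hit the MV directly instead of the 8-way join
SELECT * FROM customer_mv WHERE cid = :customer_id;
```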

Any experiences? (both good and bad are welcome for discussion)

Friday, March 15, 2013

“White noise” and “Big Data”

Those familiar with physics and communications will have heard the term “White Noise”. In simple terms, it is the noise produced by combining all the different frequencies together.
So, what is the relationship between the white noise and big data?
At present, there is a lot of “noise” about big data in both positive and negative frequencies. Some feel it is data in high volume, some unstructured data, some relate it with analytics, some with real-time processing, some with machine learning, some with very large databases, some with in memory computing, some others with regression, still others with pattern recognition and so on….
People have started defining “big data” with 4 Vs (Volume, Velocity, Variety, and Variability) and have gone on to add multiple other Vs to it. I have seen somewhere a list of 21 Vs defining big data.
So, in simple terms, big data is mostly machine-generated unstructured data arriving in quick succession and in high volumes (one scientific example is the Large Hadron Collider, which generates huge amounts of data from each of its experiments), which must be handled where traditional computing models fail.
Much of this high-volume data is also “white noise”, combining signals of all frequencies produced simultaneously on social feeds like Twitter (the 4th goal by Spain in the Euro 2012 match resulted in 15K tweets per second!). Such information only proves that many people were watching and excited about the event, and adds minimal “business value”.
How to derive “Value” then?
The real business value of big data is realized only when the right data sources are identified and the right data is channeled through the processing engine, applying the right technique to separate the right signal from the white noise. That is precisely the job of a “Data Scientist”, in my honest opinion.
I have not found a really good general use-case in the insurance industry for big data yet! (other than the stray cases related to vehicle telematics in auto sector and some weather/flood/tsunami hazard modeling cases in corporate specialty)
But I stay tuned to the white noise anyway, looking for clues that identify some real use cases in insurance and, more broadly, in financial services… (the “machine trading” algorithms are already well developed in that field!)
Comments? Views?

Friday, March 8, 2013

SQL result_cache in Oracle 11g


Continuing from the problem mentioned in my past blog post: there are times when SQL queries are expensive and need a lot of processing to generate a result set. These queries are executed from multiple sessions, so it would help if we could keep the prepared result in memory.

SQL Result Cache:

This feature of Oracle Database 11g is enabled with the initialization parameter result_cache_mode. Its possible values are FORCE (caches all results; not recommended) and MANUAL. With MANUAL, one can selectively cache the results of individual SQL statements by adding the hint /*+ RESULT_CACHE */ just after the SELECT.
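In MANUAL mode, caching is opt-in per query; a minimal sketch (the table and column names are illustrative):

```sql
ALTER SESSION SET result_cache_mode = MANUAL;

-- The first execution computes and caches the result;
-- subsequent executions (from any session) read it from the result cache
SELECT /*+ RESULT_CACHE */ customer_type, COUNT(*)
FROM customer
GROUP BY customer_type;
```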

RESULT_CACHE_MAX_SIZE and RESULT_CACHE_MAX_RESULT are the other parameters that shape how the result cache functions: they define the maximum amount of memory used by the cache and the maximum share of it that a single result set can occupy.
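Both are system-level parameters; for example (the values are illustrative, and RESULT_CACHE_MAX_RESULT is a percentage of the cache):

```sql
ALTER SYSTEM SET result_cache_max_size = 64M SCOPE = BOTH;    -- total memory for the result cache
ALTER SYSTEM SET result_cache_max_result = 10 SCOPE = BOTH;   -- max % one result may occupy
```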

More Information:

Please use the following links to get a better understanding of this feature.

Friday, March 1, 2013

Graph Theory & Pregel River

Euler's original publication in Latin on Graph Theory (The Seven Bridges of Königsberg) - 

All the latest hype around Facebook's Graph Search, social graphs, knowledge graphs, interest graphs, etc., rests on modern implementations of the above publication, which formulated graph theory in 1735.

I was fascinated by graph theory during my college days, and I feel it is one of the most natural structures in which information is organized. A graph used as a thinking tool, the “mind map”, is very effective as well.

The beauty of a graph is its simplicity: it is easy to build and use as a data structure, to traverse naturally, and to split into sub-graphs. It is one of the structures best suited to massively parallel algorithms.

As the seven bridges of the original paper spanned the River Pregel, Google named its research project for large-scale graph processing after it. Let programmers start thinking in the realm of vertices and edges….

What is offered by Pregel:
1. A large-scale, distributed, parallel graph processing API in C++
2. Fault tolerance on distributed commodity clusters with a worker and master implementation
3. Persistent data storage on Google’s GFS or Bigtable
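To make the vertex-centric idea concrete, here is a toy Python sketch of a Pregel-style computation (this is my own illustration, not Google's C++ API; the function name and data layout are assumptions): in each superstep, vertices exchange messages along edges and update their values until nothing changes, here propagating the maximum vertex value through the graph.

```python
def pregel_max(edges, values):
    """Pregel-style maximum-value propagation.

    edges:  dict vertex -> list of out-neighbors
    values: dict vertex -> initial value
    Returns the final vertex values after all supersteps.
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbors.
    inbox = {v: [] for v in values}
    for v in values:
        for n in edges.get(v, []):
            inbox[n].append(values[v])
    changed = True
    while changed:  # one iteration = one superstep
        changed = False
        next_inbox = {v: [] for v in values}
        for v in values:
            if inbox[v]:
                m = max(inbox[v])
                if m > values[v]:
                    # Adopt the larger value and forward it along out-edges.
                    values[v] = m
                    changed = True
                    for n in edges.get(v, []):
                        next_inbox[n].append(m)
        inbox = next_inbox  # vertices with empty inboxes have voted to halt
    return values
```

For example, on a directed 3-cycle with initial values 5, 1 and 9, every vertex converges to 9 after a few supersteps.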

More in Google's Pregel research paper -


I expect that, in the future, quantum computing will build virtualized, software-defined, dynamic cliques of order 'n' as needed to solve computing problems using graph algorithms, with the highest fault tolerance and highly parallel, "just right" performance! Let me name it "Goldilocks Computing". It is not far in the future...