Blog Moved

Future posts related to technology are published directly on LinkedIn:
https://www.linkedin.com/today/author/prasadchitta

Thursday, December 27, 2012

Where Art Thou Information? - Lost in Technology!

As another year of technology hype draws to a close, I wonder where the focus on Information has gone. Information is the subject matter of “IT”; the technology merely enables proper capture, processing and presentation of that Information. Yet most of the focus has shifted to technology-led transformation of businesses...

Appification – a word coined for packaging a generic task as an “App” (short for application!) that can be downloaded to a mobile device.

Gamification – again, a word coined for creating a simulated / augmented reality that “engages” the user with the subject under consideration in a generic manner.

Cloudification – Deploying an elastic, self-provisioned, pay-per-use model of computing performing generic functions.

“Appliance”ification – pre-configured hardware + software bundles pushed by vendors to provide “optimized” and “efficient” solutions to problems following a generic pattern.

“Package”fication – deploying a vendor-provided generic package that implements the vendor’s predefined information flows.

With all these “ifications” we have ended up with several suites of “products” branded as ERP, EIM, ECM, BPM and so on; we also have “anything” as a service (SaaS, PaaS, IaaS, DBaaS, what-not-aaS!) delivering everything on the move, i.e., on mobile.

So, what is the point I am trying to make? With all the possibilities of customization, the promises of near-zero time to market, and all the solutions around, we have lost focus on the “problem” we are trying to solve!

If all these are solutions then what is the problem?

For the past year I have been working on information management for insurance companies across the globe, interacting with employees at various levels representing the IT and business functions of those enterprises. The problems seem to lie in the fundamental definitions of information flows. Legacy policy administration systems and multiple systems holding data about quotations, pricing, products, policies and claims create an “entropy” in which the different operational units of the business do not have the correct information at the right time to make decisions; that, to me, is the key problem.

As long as the real problem is not properly understood with respect to the information flows between organizational units and within the operating environment, implementing newer technology solutions will not solve it.

I hope the new year 2013 gets the focus right on the problem of information and transforms businesses based on facts, i.e., what I call “information-led transformation” rather than the current trend of “technology-led transformation”.

Happy New Year 2013!!

Tuesday, December 18, 2012

Uncommon Sense, Common Nonsense - A review

Being a technologist, I rarely read management books. Even when I start one, I hardly ever finish it, as it makes little sense to me. Recently, I came across the book “Uncommon Sense, Common Nonsense: Why Some Organisations Consistently Outperform Others” (ISBN: 9781846686009) and read it to the end.
 
This is a book from which I could make some sense of business strategy and of providing a leadership vision to an organisation.
The book has four key parts dealing with:
1. Winners and losers
2. Strategy and tactics
3. Organisation and management
4. Biases and remedies
The fifth part gives applications and examples from the author’s work.

In essence, there is a lot of “Common Nonsense” (in the form of big data in today’s world) which is visible to us as well as to our competitors (the same holds for “Common Sense”!). There is also an amount of “uncommon nonsense”: things about our own organisation known to us alone, and things about the competition known only to them. But the key differentiator is the “Uncommon Sense” that makes organisations consistently outperform. Strategy is about deliberately devising means of constantly deriving and applying that “uncommon sense”.

So, “Without changing our patterns of thought, we will not be able to solve the problems we created with our current patterns of thought.” Albert Einstein’s wisdom comes to the rescue.


Overall, this is a good read for all strategists and leaders, in my honest opinion.

Saturday, December 8, 2012

Aphorisms on Information Technology & Systems

Today is, coincidentally, 8888 days from 8/8/88 (the day I started studying Information Technology) and 1000 weeks from 11-Oct-1993 (the day I started working!).



Some theoretical fundamentals on Information Systems & Technology as I see them...

This is the 100th post on this blog...

Monday, November 19, 2012

A generation of “infonauts” in the world of “augmented reality”

With the news around Ingress, the new augmented reality (AR) game from Google, mobile augmented reality seems to be going mainstream. It is all about “engaging” the person with the device and taking the person into the device. Once the person is inside the device, collect their behaviour and finally “sell” them something by making them feel happy to have that “thing”, or by making them feel a loss without it!
  
Google, with its new massively multiplayer map-based mobile alternate reality game (MMMMARG, as abbreviated by a Forbes article!) called Ingress, guides the “players” to accomplish “objectives” such as “checking in” to “portals” that may be designer shops and the like.
 
So the current generation is not happy with just the “reality” of the world. They need to augment reality with information: location-aware, mood-aware, temporally aware information to keep themselves busy (I mean really busy!) and active (I mean only mentally active!). The word that rightly fits them is “infonauts”, people passing through information. What is this information? It is all about things, likes and dislikes, deals and offers. All of it strives to get a “consumer” to a place where the producer makes money!
 
Each move within the “augmented reality” is being recorded, modelled and used by businesses to make profits. Just be aware.
 
But it also generates a lot of “big data” for me to handle, and hopefully I will continue to design solutions that manage those heaps of information and derive some meaningful (business-sense-making!) insights for businesses to make profits!

Thursday, October 25, 2012

Data Munging in the Big Data world


Recently, NASA announced a competition on a data munging problem over large US government data sets. It is called the NITRD Big Data Challenge series: http://community.topcoder.com/coeci/nitrd/

The first of the challenges was primarily about “how to create a homogeneous big-data dataset from the large, siloed data sets available with multiple government departments, such that meaningful societal decisions can be derived from the knowledge generated by big data analytics”.

So “data munging”, a term coined almost a decade back, has come back as a key skill in today’s “Data Science” discipline.

What is data munging?
In the simplest possible terms, it is converting data generated on heterogeneous platforms and in heterogeneous formats into a common, processable format for further munging or analytics!

How is it different from ETL/data integration?
Data integration and ETL are fully automatic and programmed, whereas munging is semi-automatic, based on human-assisted machine learning algorithms.

Why is it important now?
With the massively parallel processing paradigm based on MapReduce and other so-called “big data” technologies, the key question now is how the existing vast amounts of “data” can be made available for that kind of processing, so that knowledge can be derived by means of analytics and machine learning algorithms.

Start-ups building platforms and tools for data munging are now emerging in the market. In my opinion, this is going to be a key “skill” in the future big-data-based “Data Science” discipline.

So, if you have good skills in data manipulation and in algorithms based on assisted machine learning, go for it!

Wednesday, October 3, 2012

Oracle database 12c

I have watched Oracle Database evolve from 6, 7, 8i, 9i, 10g and 11g, and now it is going to be called 12c, c as in Cloud. OOW12 revealed the new architecture. (There was no suffix for 6 and 7; i stood for Internet, g for Grid.)

The new release brings a fundamental architectural change to the data dictionary, introducing the concept of a "Pluggable Database" (PDB) hosted inside the standard Oracle system, now a "Container Database" (CDB).

In the current architecture, the obj$ table of the Oracle data dictionary contains the information about all objects. Going forward it is split between the CDB and the PDBs to ease "multi-tenant" private cloud databases.

This fundamental separation of the data dictionary provides:
a. Better security isolation between multiple "databases" on the same instance
b. Separation of the CDB from the PDBs, which allows easier upgrades
c. Shared use of one instance by all PDBs, so overall management of the consolidated database should be much simpler
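
For illustration, here is a minimal sketch of what the multi-tenant syntax looks like, based on what was previewed at OOW12; the PDB name, admin user and file paths are my own placeholders, and the exact syntax may change before the production release.

```sql
-- Create a new pluggable database inside the container database (CDB).
-- All names and paths below are illustrative.
CREATE PLUGGABLE DATABASE sales_pdb
  ADMIN USER sales_adm IDENTIFIED BY a_password
  FILE_NAME_CONVERT = ('/u01/oradata/cdb1/pdbseed/', '/u01/oradata/cdb1/sales_pdb/');

-- A PDB is opened and closed much like a service of the container instance.
ALTER PLUGGABLE DATABASE sales_pdb OPEN;
ALTER PLUGGABLE DATABASE sales_pdb CLOSE IMMEDIATE;

-- Unplug it (metadata goes to an XML manifest) so it can be plugged
-- into another CDB, which is what should make upgrades easier.
ALTER PLUGGABLE DATABASE sales_pdb UNPLUG INTO '/tmp/sales_pdb.xml';
```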

Excited about the cloud enabled, multi-tenant, pluggable database from Oracle! 

So, let us wait and see when a stable 12.2 comes out before rolling the new database into production...


Thursday, September 20, 2012

Quality vs. Efficiency, i.e., the efficiency trap

Efficiency improvements everywhere! People want to do more in less time. To achieve that, people want technology. Technologists are looking for more efficiency in the technology. In the pursuit of efficiency, are we losing the quality of our products and services?

Recently there was news about ISRO planning to achieve 58 missions in the next 5 years. The statistics state that ISRO completed its first 50 missions in 20 years and the next 50 in 10 years; now the plan is to achieve the next 58 in 5 years. That means roughly one space mission every month (approximating 58 to 60!). So there is a need to execute projects more efficiently...

Generally, all businesses are in the race for quarter-on-quarter results. Each quarter the business houses want to show better results, and driving efficiency up is one of the ways to achieve that.

Even with innovative stuff like the iPhone 5, it is 16% thinner and 20% lighter than its predecessor. And Apple says they have 2 million pre-orders for this phone!

Is there anything wrong with this approach? Do we need more and more efficiency, and more and more projects completed in less time? And what does it matter if a phone has a slightly larger display and is slightly thinner and lighter than it was before?

Seriously, the focus on QUALITY research into what people really need is clearly diminishing because of the efficiency trap.

Out of ISRO's 58 missions, half would be adding more transponders and more DTH channels! With all those DTH services and channels, my children are still bored!

So, focus on the quality of products and services first. Pause, think and generate some ideas that will add value to people, then get back to efficiency; that is what I think is needed at the moment.

Conclusion: achieving a healthy balance between quality and efficiency is the key to generating value for client businesses.

Anyone listening?

Wednesday, August 8, 2012

Multi-Tenancy and Resource Management

As my association with computer software turns 24 years today, I note that I have spent most of the past year on Oracle Exadata, an appliance (a hardware + software bundle in a specified configuration: 1/4 rack, 1/2 rack or a full rack).

In the pre-appliance world, the underlying deployment architecture of server, network and storage would be built as per the application's requirements and quality attributes such as "portability", "scalability" and so on.

An application would be sized for the required CPU, memory, I/O and storage capacity, along with its local fail-over and disaster recovery requirements, and the underlying infrastructure was built using either physical or virtual server and storage components. The number of nodes in the cluster and the size of each node would be carefully planned.

But with Exadata, a full rack comes pre-configured by Oracle with 8 compute nodes and 14 storage cells. Each compute node has 24 CPU cores and 96 GB of memory.

Now this Exadata appliance needs to be shared by multiple applications. The complexity of multi-tenancy starts here. How do we ensure Quality of Service? The main levers are:

1. Server pools and Instance Caging
2. Service design
3. Database Resource Manager
4. I/O Resource Manager

I think it is always good to have a database per application. Hosting multiple applications on a single database instance can be tricky with respect to CPU allocation; server pools and instance caging help here.
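
As a minimal sketch of instance caging (my own parameter values, not a recommendation): each instance on a node gets a CPU budget, and the cage is only enforced while a resource manager plan is active.

```sql
-- Cage this instance to 16 of the node's 24 cores; another instance on the
-- same node could be caged to the remaining cores (values are illustrative).
ALTER SYSTEM SET cpu_count = 16 SCOPE = BOTH;

-- Instance caging is enforced only when a resource manager plan is enabled.
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN' SCOPE = BOTH;
```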

The next most challenging task is allocating memory across multiple applications. This is one thing that is still done manually: automatic SGA and PGA management and memory tuning within an instance keep improving, but the memory target for each database still has to be set by hand.

Classifying the workload within a database into "resource consumer groups", using attributes such as user name, service, client program name, client user name, module and action, and assigning those consumer groups to a resource manager plan, is achieved with DBRM. Every user session should be mapped to the right consumer group dynamically, based on multiple attributes, rather than always putting USER1 in MEDIUM_CONSUMER_GROUP; if USER1 is performing an important action, that session should be prioritized higher.
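
A minimal DBRM sketch of attribute-based mapping, assuming the consumer groups and the plan already exist; the service, module/action and group names are illustrative.

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;

  -- Sessions connecting through the reporting service go to a medium group.
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.SERVICE_NAME,
    value          => 'REPORTING_SVC',
    consumer_group => 'MEDIUM_GROUP');

  -- The same user running a critical module/action is promoted.
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.MODULE_NAME_ACTION,
    value          => 'MONTH_END.CLOSE',
    consumer_group => 'HIGH_GROUP');

  -- SET_CONSUMER_GROUP_MAPPING_PRI decides which rule wins when several match.
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/
```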

Finally, controlling I/O operations on the cells, both across databases and across multiple workloads within a database, is also very important for maintaining the right Quality of Service (QoS). An I/O Resource Manager (IORM) database plan, and a category plan for prioritizing workloads within a database, should be configured.
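
To give a flavour of the category side, here is a sketch with illustrative names of defining a DBRM category that IORM can then prioritize. Note that the inter-database IORM plan itself is configured on each storage cell (via CellCLI), not inside the database, and the exact cell syntax depends on the Exadata storage software version.

```sql
-- Categories group consumer groups for I/O prioritization on the cells.
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_CATEGORY(
    category => 'INTERACTIVE_CAT',
    comment  => 'User-facing work that should win I/O on the cells');
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/
-- On the cells, the inter-database plan is then set with something like
-- CellCLI> ALTER IORMPLAN dbplan=((name=PRODDB, level=1, allocation=70), ...)
```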

In my opinion, performance management in the appliance world has become more complicated because of the complexity of QoS and resource management. We now have to develop applications that are best suited to using the platform features of an appliance like Exadata.

Questions to think about:
Does this run counter to "portability"? How easy is it to port applications from one appliance to another?

Thursday, July 12, 2012

Software Architecture simplified

I was asked to present some of my experiences, along with some key points of "Software Architecture", to the faculty of an engineering college.
Hope this gives a good overview!

Tuesday, June 19, 2012

Exadata performance features

Recently I reviewed an Exadata implementation (a data warehouse of about 66 TB, with multiple marts running as different services on a single database on a full-rack Exadata V2) for performance improvements. This post summarizes the key points that application developers, designers and DBAs should be aware of while deploying applications onto Oracle Exadata V2.

1. "Smart Scan" is a Cell Offloading feature that the selection / projection of an SQL is offloaded to the storage cell instead of doing that operation on the compute node after reading all the required blocks to the buffer cache. This works with Full Table Scans (FTS) and Full Index Scans when using the direct path reads. This can dramatically improve the performance of a FTS but that does not mean all the processing need to happen over FTS and all the indexes to be removed! When the single row look up need to happen or very small amount of records are read from a large table, still the index based look up is much faster than the FTS even with smart scan.

Smart Scan is more of a run-time decision than an optimizer-time decision. It depends on the number of sessions requesting the data, the number of dirty blocks, and the size of the table (_small_table_threshold; by default Oracle treats 2% of the buffer cache as the small-table threshold, which is not always appropriate. This parameter can be tweaked at session level if needed).

In the explain plan one can see the STORAGE keyword in operations such as TABLE ACCESS STORAGE FULL, and the relevant statistic in the V$ views is "cell physical IO bytes eligible for predicate offload".

To force direct reads for serial operations at the session level, "_serial_direct_read" can be set to TRUE.
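
A quick, illustrative way to see whether a statement was eligible for offload in your own session (the table name is a placeholder, and the underscore parameter is unsupported, so use it only for testing):

```sql
-- Force serial direct path reads for this session (testing only).
ALTER SESSION SET "_serial_direct_read" = TRUE;

-- Run the candidate full scan.
SELECT /*+ FULL(s) */ COUNT(*)
FROM   sales s
WHERE  amount_sold > 1000;

-- Check the offload-related statistics for this session.
SELECT n.name, m.value
FROM   v$mystat m
JOIN   v$statname n ON n.statistic# = m.statistic#
WHERE  n.name IN ('cell physical IO bytes eligible for predicate offload',
                  'cell physical IO interconnect bytes returned by smart scan');
```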

2. "Storage Index" is another feature that each cell builds a dynamic negative index of what data is surely not there on the cell for each column value by making a min and max value ranges that are stored on the cell for a given column. This structure is a in-memory index dynamically built after seeing multiple queries that are offloaded to the storage. This feature gives performance improvement similar to "partition pruning" on partitioned tables. To take best advantage of this feature, an ordered load of data into the table based on the most used where clause predicate columns is recommended.  The ETL processes should use a serial loading of data using "APPEND" hint into the table such that the best advantage of storage index can be achieved on SELECT statements.

3. In a data-warehouse-type environment, where most of the time all rows are accessed but only a subset of columns is read each time, Hybrid Columnar Compression (HCC) improves performance. Using the COMPRESS FOR QUERY HIGH mode of HCC, queries that use only a few columns of the table read just the required column blocks and perform better.
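
For illustration (table names assumed), HCC is declared per table, partition or tablespace, for example:

```sql
-- Create a history table compressed for warehouse-style queries.
CREATE TABLE sales_hist
  COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales WHERE sale_date < DATE '2011-01-01';

-- Or rebuild an existing segment into HCC format.
ALTER TABLE sales_hist MOVE COMPRESS FOR QUERY HIGH;
```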

It is important to consider these features during the design of an application; building the application to take advantage of them will tremendously reduce resource consumption on the platform while giving the best throughput.

But, 

It is still important to have the correct indexing and partitioning strategies in place, even with these features, to make absolutely sure that performance is predictable. Just letting tables grow large with all their history, without the right partitioning strategy, will let performance degrade over time even with Smart Scan, storage indexes and HCC!
  

Monday, April 23, 2012

Identifying Bottlenecks

Recently, I was asked to review the performance of an Informatica ETL workflow provided by Oracle as part of an OBIEE upgrade: the OLAP schema upgrade, which has several transformations for upgrading Siebel Analytics 7.7 to OBIA 7.9.6.2.

The problem:
An innocent-looking "insert" load, which reads a source table, applies a set of lookup transformations, generates a sequence number as a unique key and inserts into a target table in bulk mode, was only able to give a throughput of about 65 records per second on a 4-core, 8 GB RAM server. The source table has over 20 million records.

Other workflows run at a throughput of more than 2,000 records per second, so I started investigating this "performance problem".

1. Looked at the AWR reports from the source and target databases as well as the Informatica repository database. There was no indication of any bottleneck on the database end.

2. Looked at the session log of Informatica PowerCenter. The session log showed the writer thread 99% busy.

3. Took a subset of 100K records using the source qualifier and started running multiple scenarios to identify the bottleneck.

  • First step: remove the target database. The target was converted to a local flat file, so the workflow was now just reading the source, applying all the transformations and lookups, and writing to the local flat file. The throughput did not improve.
  • Next step: remove the large table lookups. One of the lookups was on a table of more than 20 million records. All the lookups in the workflow mapping were removed. Still the throughput was only 65 records per second.
  • As the third step, the sequence number generation was removed; now the workflow was just reading from the source table and writing to a flat file after applying a few compute transformations. The throughput reached 2,400 records per second.
  • By now we had identified the sequence generator as the probable bottleneck; to confirm it, the workflow was re-run with all the lookup transformations in place and only the sequence generator disabled. The throughput was 2,400 records per second.

4. On inspection, the sequence generator was reusable and set to cache only one value. This caused a round trip from the Informatica PowerCenter server to the Informatica repository for every sequence number generated, and at a maximum of about 65 such round trips per second this was the bottleneck in the workflow.

By setting appropriate caching on the sequence generator, we finally achieved a throughput of 2,400 records per second and completed the load in less than 2.5 hours: an improvement in throughput of around 37 times!
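
The same trade-off exists with database sequences themselves. Purely as an analogy to the Informatica setting (names and numbers are illustrative), a generously cached Oracle sequence hands out values from memory instead of updating the dictionary on every call:

```sql
-- NOCACHE forces a data dictionary update for nearly every NEXTVAL;
-- a large cache serves values from memory, at the cost of possible gaps
-- after a restart, which is usually acceptable for surrogate keys.
CREATE SEQUENCE w_order_s CACHE 1000;

SELECT w_order_s.NEXTVAL FROM dual;
```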


Spending a few hours identifying the bottleneck and removing it was well worth the effort...

Thursday, April 12, 2012

Data Replication

The need for data replication is evergreen. The use case may vary from "High Availability" to "Operational Reporting (BI)" to "Real-time Analytics"; whatever the case, the need to replicate data exists in information systems and solutions.

In this post I try to summarize the evolution of data replication technology from an Oracle Database standpoint.

  1. Looking back at Oracle-based data replication technologies, the first one is the "database link". Starting with Oracle 5 or 6, one could create a database link and pull or push data directly to a remote database. This is the very first method of replication, where the application has to push or pull the data from the remote database and apply the necessary logic to identify what has changed and what to do with those changes (see the sketch after this list).
  2. The next improvement came around Oracle 8: trigger-based replication. Whenever a transaction changes the data in a table, a trigger can fire a routine that handles the replication without changing the application. So the database started providing a way to replicate data using triggers (also sketched below).
  3. The next improvement came around 9.2 with Streams and log-based replication: the capability to mine the redo logs and move committed transactions to the target systems (the DBMS_CDC_PUBLISH and DBMS_CDC_SUBSCRIBE packages were introduced).
  4. Oracle Advanced Queuing enhanced Streams with a robust publish-and-subscribe replication model based on enqueue/dequeue communication. I was involved in a very large project that set up a custom Changed Data Capture to migrate data from a legacy system into SAP ERP, involving large tables with 100 million records...
  5. Later, the log-mining technology was used for physical and logical standby databases and evolved into the Active Data Guard technology...
  6. With GoldenGate, Oracle's heterogeneous log-based data replication solution is complete, with capabilities to extract, replicate and synchronize data, including bi-directional movement.
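
Here is a minimal sketch of the first two generations from the list above, with illustrative object names, just to show the shape of each approach:

```sql
-- 1) Database link: the application pulls changed rows itself.
CREATE DATABASE LINK rpt_link
  CONNECT TO rpt_user IDENTIFIED BY a_password
  USING 'RPTDB';

INSERT INTO orders_copy
SELECT *
FROM   orders@rpt_link r
WHERE  r.last_updated > (SELECT NVL(MAX(last_updated), DATE '1900-01-01')
                         FROM orders_copy);

-- 2) Trigger-based: the source database records changes as they happen,
--    without touching the application.
CREATE OR REPLACE TRIGGER orders_capture
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW
DECLARE
  l_op VARCHAR2(1);
BEGIN
  IF INSERTING THEN l_op := 'I'; ELSE l_op := 'U'; END IF;
  INSERT INTO orders_changes (order_id, changed_at, op)
  VALUES (:NEW.order_id, SYSDATE, l_op);
END;
/
```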

Depending on the need, one should choose the right technology to achieve the required data replication...

Tuesday, February 14, 2012

In-memory computing

Approximately two years back I wrote a post on Enterprise Data Fabric technology. The aim of a data grid or "in-memory data store" is to remove the movement of data in and out of slower disk storage for processing. Instead, the data is kept in "pooled main memory" during processing.

To get past the physical limits on the amount of main memory in a single machine, data grid technologies create a pooled memory cluster, with data distributed over multiple nodes connected by a high-bandwidth network.

With SAP bringing HANA, an in-memory database that can store data in both a traditional row store and a column store, and Oracle bringing the Exalytics appliance, in-memory computing is getting more attention.

So the claims are that in-memory technology will boost performance by orders of magnitude. But the truth is that it can only remove the time taken to move data from disk into main memory. If a query processes the data using the wrong access method, the processing will still take just as long to produce the answer, even when all the data sits in a memory store!

In-memory computing needs applications to be redesigned to use the technology for better information processing. OLTP workloads will surely see improved performance due to memory caching, but data consistency needs to be managed by the application as it moves to an event-based architecture.

OLAP and analytical workloads will also perform better with memory-based column stores, provided the underlying data structures are designed to suit the processing requirements.

Overall, in-memory computing is promising at the moment, but without the right design to exploit the new technology, old systems will not get a performance boost just by moving the data store into main memory.

Let us wait and see how the technology shapes up in the future...