Blog Moved

Future posts related to technology are directly published to LinkedIn

Friday, December 17, 2010

Integration - Centralized or Distributed?

Let us imagine a big greenfield IT program that implements multiple packaged products and custom-developed applications for an enterprise.

Multiple software vendors, package implementation partners and custom development teams are involved in the complete life cycle: requirements analysis, architecture/design, code and test, integration, and deployment to production.

In such a scenario, how should interface development and integration be handled?

Option 1: Centralized Integration Development
1. Agree on the interfaces as step 1 of the program.
2. Let an independent group of people (let us call it a center of excellence, or COE) do the integration/interface work.
3. As the different products and applications are deployed, the integration team integrates them.

Option 2: Distributed Integration Development
1. Only the integration back end (say, an ESB) is controlled by a central team.
2. Each team working on a project within the program has its own interface development team.
3. As the different products and applications are deployed, the respective team integrates them with the central ESB.

Which of the above options is better?

In my personal opinion, all integration-related activities are better done from a central COE.

A central COE can share best practices, identify core patterns and leverage skills across the program.

The challenges in the COE model are acquiring and building the right skills and right-sizing the COE for the program.

Any thoughts or ideas?

Monday, December 6, 2010

Historize or Roll-Up

Information Technology is all about acquiring, processing, storing and presenting data to the right people at the right time, so that they can derive meaningful information and, in some cases, useful intelligence or insight from that data.

In a typical business system, a data point is captured only when a transaction changes the data. Each change is captured, validated and stored. Such systems are called OLTP (On-Line Transaction Processing) systems.

But when data is acquired at regular intervals, as in typical process control systems, not every data point needs to be stored. For example:
a. The utilization of a system processor, sampled every 5 seconds
b. The temperature of a steam turbine, scanned every 400 ms

In a typical process control system there will be several thousand such data points, scanned at high frequency, typically every second.

What to do with all this data?
Storing all this data in raw format in a standard relational database is simply impractical. So, there are two methods to make some sense of such "time series" data.

1. Historize the data using a process historian. A process historian stores a data point only when it deviates beyond a set limit, using algorithms such as straight line interpolation (SLIM1, 2, 3) or swinging door compression, to achieve a high degree of compression in storing the time series data.

2. Roll up the data periodically (e.g., hourly data kept for a few weeks), storing the max, min, average, standard deviation and similar values in a single record per data point. A next-level roll-up can cover a longer interval (e.g., daily data kept for several years). This multi-level roll-up data can be used for historical trending.
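
To make the two methods concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `deadband_compress` is a simple deadband filter (keep a sample only when it moves more than a limit away from the last kept value), not the SLIM or swinging door algorithms a real historian would use, and the function names, the sample values and the one-hour bucket size are all made up for the example.

```python
from statistics import mean, pstdev


def deadband_compress(samples, limit):
    """Keep a (timestamp, value) sample only when its value differs from
    the last kept value by more than `limit` (a simple deadband filter,
    used here as a stand-in for real historian algorithms)."""
    kept = []
    for ts, value in samples:
        if not kept or abs(value - kept[-1][1]) > limit:
            kept.append((ts, value))
    return kept


def roll_up(samples, bucket_seconds):
    """Group samples into fixed-width time buckets and store one summary
    record (min, max, average, standard deviation, count) per bucket."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % bucket_seconds  # start time of the bucket
        buckets.setdefault(start, []).append(value)
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "avg": mean(vals),
            "stdev": pstdev(vals),  # population standard deviation
            "count": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Hypothetical CPU utilization (%) sampled every 5 seconds:
# (seconds since start, value)
samples = [(0, 40.0), (5, 40.5), (10, 41.0), (15, 55.0), (20, 55.2), (25, 39.0)]

compressed = deadband_compress(samples, limit=2.0)  # keeps 3 of 6 samples
hourly = roll_up(samples, bucket_seconds=3600)      # one record for the hour
```

Keeping the sample count in each roll-up record is a deliberate choice in this sketch: a next-level (say, daily) average can then be computed as a count-weighted average of the hourly records instead of going back to the raw data.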

Both methods have advantages. I recently came across a patent on dynamic compression of systems management data, which interestingly applies historization with multiple compression algorithms to the storage of systems management data.