Approximately two years back I made a post on Enterprise Data Fabric technology. The aim of the data grid or "in-memory data store" is to remove the movement of data in and out of slower disk storage for processing. Instead, the data is kept in "pooled main memory" during the processing.
To get around the physical limits on the amount of main memory in a single machine, data grid technologies create a pooled memory cluster, with the data distributed over multiple nodes connected by a high-bandwidth network.
With SAP bringing HANA, an in-memory database that has the option to store data in a traditional row store and a column store (read storing data in rows and columns), and Oracle bringing the Exalytics appliance, in-memory computing is getting more attention.
So, the claim is that in-memory technology will boost performance many times over. But the truth is that it can only remove the time taken to move the data from disk into main memory. If a query processes the data using the wrong access method, the processing itself will still take just as long to provide the answer, even when all the data has been moved into a memory store!
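To make the access-method point concrete, here is a minimal Python sketch. The table, its size and the function names are all hypothetical. Both lookups run entirely in memory, yet the scan still has to touch every row, while the binary search over sorted keys touches only a handful.

```python
import bisect

# Hypothetical in-memory "table": (order_id, amount) rows, sorted by order_id.
rows = [(order_id, order_id * 1.5) for order_id in range(100_000)]
keys = [order_id for order_id, _ in rows]


def lookup_by_scan(order_id):
    """Wrong access method: examine rows one by one until a match is found."""
    for row in rows:
        if row[0] == order_id:
            return row
    return None


def lookup_by_index(order_id):
    """Better access method: binary search over the sorted key list."""
    pos = bisect.bisect_left(keys, order_id)
    if pos < len(keys) and keys[pos] == order_id:
        return rows[pos]
    return None
```

Both functions return the same row. Keeping the data in memory helps each of them equally, so the gap between them remains a matter of design.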
In-memory computing requires redesigning applications so that they use the technology for better information processing. OLTP workloads will surely perform better thanks to memory caching, but the consistency of the data then needs to be managed by the application, by moving to an event-based architecture.
OLAP and analytical workloads would also perform better using memory-based column stores, given a good design of the underlying data structures that suits the processing requirements.
Overall, in-memory computing is promising at the moment, but without the right design to use the new technology, old systems will not get a performance boost just by moving the data store into main memory.
Let us wait and see how the technology shapes up in future.....
Tuesday, February 14, 2012
Friday, December 30, 2011
Pensions Regulation in UK
As the year 2011 draws to a close, we are entering into a new year that is very important for UK workplace pensions. The new regulation around workplace pensions is coming into force in 2012.
Having been purely technical for a few years, I just want to test my skills at understanding regulatory documents and extracting the "Functional Information Needs" for a business.
A good review published in October 2010 can be accessed here: http://www.dwp.gov.uk/docs/cp-oct10-full-document.pdf
Basically, all employers in the UK have to automatically enroll all eligible workers falling within an AGE range and an EARNINGS range into a suitable pension scheme. They also need to "certify" that the selected pension scheme meets the required quality criteria. (Refer to section 6.5 of the review document above.) The requirement is that at least 8% of earnings is paid towards a pension fund.
The regulation defines "Qualifying Earnings" as gross earnings that include commissions, bonuses, overtime etc., but most employers use "Basic Pay", i.e. a pensionable pay, as the pension contribution basis. So, an employer primarily has the following options to "certify" the pension scheme and comply with the regulation.
Pseudo logic in plain English
1. IF pension contribution basis IS qualifying earnings (within the band) THEN pay the contributions of 8% and no certification is required.
2. IF pension contribution basis IS pensionable pay THEN
2a. CASE pensionable pay is 100% of gross pay THEN pay contributions of 7%
2b. CASE pensionable pay is at least 85% of gross pay THEN pay contributions of 8%
2c. CASE pensionable pay is less than 85% of gross pay THEN pay contributions of 9%
AND self certify pension scheme for all the employees participating in the pension scheme.
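The pseudo logic above can be sketched in Python as follows. This is only an illustration of the rules as summarized in this post, not a statement of the regulation itself. The function name, the basis labels and the parameters are hypothetical.

```python
def required_contribution_rate(basis, pensionable_pay=0.0, gross_pay=0.0):
    """Return (minimum contribution rate, self-certification needed?).

    The rates follow the pseudo logic in this post; the names are hypothetical.
    """
    if basis == "qualifying_earnings":
        # Option 1: contributions on qualifying earnings, no certification.
        return 0.08, False
    if basis != "pensionable_pay":
        raise ValueError(f"unknown contribution basis: {basis!r}")
    if gross_pay <= 0:
        raise ValueError("gross_pay must be positive")

    share = pensionable_pay / gross_pay  # pensionable pay as a share of gross pay
    if share >= 1.0:
        rate = 0.07   # 2a: all of gross pay is pensionable
    elif share >= 0.85:
        rate = 0.08   # 2b: at least 85% of gross pay is pensionable
    else:
        rate = 0.09   # 2c: less than 85% of gross pay is pensionable
    return rate, True  # option 2 always requires self-certification
```

For example, `required_contribution_rate("pensionable_pay", 27000, 30000)` falls into case 2b, because 90% of gross pay is pensionable.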
So, the core information needed to implement this regulation is employee payroll data that covers the age and all components of qualifying earnings of every employee. A bit of intelligence is needed to "model" the best possible grouping of employees and assign them to suitable pension scheme(s) with one (or more) of the pension providers in the market.
The overall goal for a techno-functional consultant like me is to optimize the value of the new pension regulations for employees, employers, pension providers and IT consulting companies by optimizing the information flows across the various stakeholders!!
Wishing everyone a great new year 2012. May peace, security and prosperity be with one and all.
Sunday, December 4, 2011
Storing Rows and Columns
A fundamental requirement of a database is to store and retrieve data. In Relational Database Management Systems (RDBMS) the data is organized into tables that contain rows and columns. Traditionally the data is stored in blocks of rows. For example, a "sales transaction row" may have 30 data items representing 30 columns. Assuming a record occupies 256 bytes, a block of 8KB can hold 32 such records. Assuming again a million such transactions per day, they need 31,250 blocks per day. All this works well as long as we need the data as ROWS! If we want to access one row or a group of rows at a time to process the data, this organization has no issues.
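The block arithmetic can be checked with a few lines of Python. The record size, block size and daily volume are the assumed figures from the paragraph above, and the constant names are only for illustration.

```python
BLOCK_SIZE_BYTES = 8 * 1024      # one 8KB block
RECORD_SIZE_BYTES = 256          # assumed size of one sales transaction row
RECORDS_PER_DAY = 1_000_000      # assumed daily transaction volume

# Whole records that fit into one block.
records_per_block = BLOCK_SIZE_BYTES // RECORD_SIZE_BYTES

# Blocks needed per day, rounded up so a partly filled block still counts.
blocks_per_day = -(-RECORDS_PER_DAY // records_per_block)

print(records_per_block, blocks_per_day)
```

With these figures a block holds 32 records and a day of transactions fills 31,250 blocks.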
Let us consider that we want a summary of the total value of type x items sold in the past seven days. This query needs to retrieve 7 million records containing 30 columns each just to total the value of the type x items. All we need to process this is two columns, item type and amount. This type of analytical requirement leads us to store the data in columns. We group the values of each column together and store them in blocks. This speeds up retrieving just the required columns from the overall table for the purpose of analyzing the data.
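A small Python sketch may help to picture the two layouts. The sample rows and the extra column names (`store`, `cashier`) are made up for illustration. A real column store also adds compression and block management, which this sketch ignores.

```python
# Row layout: each record keeps all of its columns together.
row_store = [
    {"item_type": "x", "amount": 10.0, "store": "A", "cashier": "c1"},
    {"item_type": "y", "amount": 7.5, "store": "A", "cashier": "c2"},
    {"item_type": "x", "amount": 5.0, "store": "B", "cashier": "c3"},
]

# Column layout: one list per column, with values in the same row order.
column_store = {name: [row[name] for row in row_store] for name in row_store[0]}


def total_from_rows(rows, wanted_type):
    """Touches whole records even though only two fields are used."""
    return sum(r["amount"] for r in rows if r["item_type"] == wanted_type)


def total_from_columns(columns, wanted_type):
    """Reads only the two columns the query actually needs."""
    pairs = zip(columns["item_type"], columns["amount"])
    return sum(amount for item_type, amount in pairs if item_type == wanted_type)
```

Both functions give the same total. The difference is in how much data has to be read to get there, which is what matters once the table has 30 columns and millions of rows.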
But column storage has its limitations when it comes to writes and updates.
With a high volume of social data, where a high volume of writes is needed (messages, status updates, likes, comments etc.), highly distributed NoSQL-based column stores are emerging into the mainstream. Apache Cassandra, initially developed by Facebook, is one of this new breed of NoSQL column stores.
So, we have a variety of databases / data stores available now: a standard RDBMS engine with SQL support for OLTP applications, column-based engines for OLAP processing, NoSQL-based key-value pair stores for in-memory processing, highly clustered Hadoop-style stores with the map/reduce framework for big data processing, and NoSQL-based column stores for high-volume social write and read efficiency.
Making the right choice of data store for the problem in hand is becoming tough with so many solution options. But that is the job of an architect; is it not?
Friday, November 11, 2011
numbers and counters
There is some amount of hype around 11-11-11, i.e., 11-November-2011... I see numbers as just counters; in themselves they do not make much sense unless they are identified with some meaningful thing.
It is 14581 days since I was born, 8495 days since I became associated with software/computers, 6240 days since I started working, etc. In all these counters the "I" remains constant while the numbers move on...
Several other numbers, top 10s, Fortune 500s etc., also create some hype from time to time, but they are continuously replaced in the flow of numbers.
Especially in the current era, which gives very high importance to numbers and counters, the true importance of "Identity" and "Intelligence" seems to have been lost...
Hope the 11:11 AM IST of 11-11-11 brings some common sense to the world in general and the Information Technology world in particular.....
All the best, folks!
I would like to add one memorable item from the year 2001 on this occasion. We completed our first "Consolidation" project a decade back, and for it we got a piece of 500-million-year-old natural slate printed with a small message as a memento.
Friday, October 14, 2011
Most efficient multi-set Cartesian join in C
At the beginning of my career, with the Indian Space Research Organization, I was posed a challenge that required implementing a multi-set Cartesian product with the absolute minimum memory usage to solve an optimization problem. (See my old post for a description of the problem: Simple looking Complex problem)
As a tribute to Dennis M Ritchie (also known as dmr), the creator of the C language, who passed away yesterday, I am posting my implementation of this algorithm in C.
I consider the above seven "highlighted" lines of C code as one of the earliest and most notable achievements of my career!
If there is any better implementation to solve the stated problem please let me know by posting a comment.....
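For readers who want something to experiment with, here is one well-known low-memory approach, the "odometer" technique, sketched in Python rather than C. It is not the seven lines of C referred to in this post, and the function name is hypothetical. Apart from the inputs, it holds only one index per set and produces the combinations one at a time instead of building the whole product in memory.

```python
def cartesian_product(sets):
    """Yield every combination, holding only one index per input set."""
    if any(len(s) == 0 for s in sets):
        return  # an empty set makes the whole product empty
    idx = [0] * len(sets)
    while True:
        yield tuple(s[i] for s, i in zip(sets, idx))
        # Advance the indices like an odometer, rightmost position first.
        pos = len(sets) - 1
        while pos >= 0:
            idx[pos] += 1
            if idx[pos] < len(sets[pos]):
                break
            idx[pos] = 0  # this position rolled over, so carry to the left
            pos -= 1
        if pos < 0:
            return  # every position rolled over: all combinations produced
```

For example, `list(cartesian_product([[1, 2], ["a", "b"]]))` walks through the four pairs in order, starting with `(1, "a")` and ending with `(2, "b")`.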
Sunday, October 9, 2011
ACID and BASE of data
I am completing 18 years of working in the field of Information Technology.
All these days, an enterprise data store has generally provided the four qualities of Atomicity, Consistency, Isolation and Durability (ACID) to transactions. Oracle has emerged as a leader in providing enterprise-class ACID transactional capabilities to applications.
Recently, at Oracle OpenWorld 2011, Oracle announced a NoSQL database, a type of data store that is typically characterized by the BASE acronym: Basically Available, Soft state, Eventually consistent (BASE).
Of late I see a lot of debate on SQL vs NoSQL, ACID vs BASE and Shared Everything vs Shared Nothing architectures of data stores; and with Oracle getting on to the NoSQL bandwagon, this debate has just picked up additional momentum.
Oracle has posted this paper, which nicely explains their NoSQL database: http://www.oracle.com/technetwork/database/nosqldb/learnmore/nosql-database-498041.pdf
In my opinion, the choice between SQL and NoSQL is straightforward to make:
The big question: are we storing data or BIG-DATA? (Read my old post on transactional data vs machine-generated big data - http://technofunctionalconsulting.blogspot.com/2011/02/analytics.html)
With the new trends in 'BIG DATA', almost all the data becomes key-value pairs with read and insert only operations and minimal or no updates to the data records. NoSQL/BASE is best suited to handle this type of data. Traditional transactional databases of an OLTP nature still need ACID-compliant transactions.
So, when designing big data solutions, an architect should surely look at the NoSQL dataBASE. Is it not?
Publishing this post on 09/10/11 (dd/mm/yy) and this is my 85th post to this blog.
Wednesday, September 28, 2011
User Experience - HTML5
Approximately two years back, I made a post on "Rich Internet Applications", where the development of user experience focused on browser plug-ins or run-time environments like Adobe AIR for a rich experience. (See Desktop Widgets post.... )
As the technology has progressed over the last two years, the new/emerging HTML5 seems to be taking web user experience design to a standards-based, plug-in-independent mode.
Some examples can be found at: http://www.apple.com/html5/
Another dimension today is "mobile devices", alongside the browser on the desktop/laptop.
When it comes to mobile devices and integration with specific device capabilities, one should develop "native applications" to take full advantage of the native hardware of the device. Standards are good, but native applications can do better. On the other hand, using the standards we can develop once and deploy on multiple devices, whereas native application development requires "effort/time" on each platform...
So, there is no silver bullet for the Rich User Experience needs of an ever-changing world!
Saturday, September 24, 2011
Ancient advice applicable for projects & tasks
I have been writing this techno-functional consulting blog for the past 4 years, and I would like to bring an ancient touch to modern Project Management:
अफलानि दुरन्तानि समव्ययफलानि च
अशक्यानि च कार्याणि नारभेत विचक्षणः
aphalAni = those without fruit, durantAni = those with a bad ending (i.e., that end in failure), sama-vyaya-phalaani ca = and those that are equal in effort and result (i.e., that end in neither profit nor loss!), aSakyAni ca = and those that are beyond one's capability (i.e., impossible ones!), kAryANi = activities, projects, tasks, na+ArabhEta = should not be started or initiated by, vicakshaNaH = the wise man.
In my view there is only a 50% success rate in Information Technology projects. So, it is wise to start only the projects that are sure to be successful. So, the ancient scholar of the above verse is saying:
Don't take up a task/project if it is known to be:
a. meaningless or fruitless,
b. sure to land in a bad end,
c. of no gain and no loss,
d. impossible or beyond one's own capability
We will improve the "success rate" if we follow this basic advice before taking up projects & tasks!!
Given a chance I will put this in the beginning of PMP and PRINCE 2 certification material.
Saturday, September 10, 2011
Tiered data storage
Hierarchical Storage Management
Not too long ago (about 10 years back), I prepared a strategy on "Data Archival" options for a system that had lots of data which needed to be preserved for 25 years for legal reasons (a sort of Records Management requirement). The requirement was to keep the data in the system fully queryable and fine-grained. The key challenge was keeping all the data in on-line storage with the technology available at that time. So, we needed a clear "Archival Strategy" to move the data off disk to tape and to preserve the "Tapes" in a way that they could be retrieved (by methods of proper labeling etc.) on demand within the given service levels. This technology was later named Hierarchical Storage Management. The overall strategy included manual tiering of data between disks and tapes, sometimes using mechanical robotic hands and the associated software around them.
Information Life-cycle Management
As the technology advanced, disk storage evolved into multiple bands of cost/functionality. Database software like Oracle came up with options like table partitioning and advanced compression. Combining these advances in database management systems and storage, a new strategy emerged as Information Life-cycle Management. Logically partitioning the tables and placing the partitions on different types of storage, like Enterprise Flash Disks (EFD), Fiber Channel (FC) and SATA disks, using automated storage tiering is the trend of the day.
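As a rough illustration of the idea, the following Python sketch maps a date-based table partition to a storage tier by its age. The age thresholds and the function name are hypothetical. In practice the storage array or the database makes this decision from observed access patterns, not from a fixed age rule.

```python
from datetime import date

# Hypothetical age thresholds in days, fastest tier first.
TIERS = [(90, "EFD"), (365, "FC")]
ARCHIVE_TIER = "SATA"


def tier_for_partition(partition_end, today):
    """Pick a storage tier for a date-based partition from its age in days."""
    age_days = (today - partition_end).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    return ARCHIVE_TIER  # anything older lands on the cheapest tier
```

With these thresholds, a partition that closed ten days ago stays on flash, last January's partition sits on Fiber Channel, and anything older than a year moves to SATA.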
Thin provisioning technologies like EMC's Fully Automated Storage Tiering for Virtual Pools (FAST VP) and Hitachi's Dynamic Tiering, when used with Oracle's ASM and the partitioning & advanced compression options, give the best flexibility, performance and value for money. There is a good whitepaper from EMC, published a few months back, that can be found here.
Conclusion:
Most storage vendors now have tiered storage technology embedded in the disk controller software layer, which can automate data migration or intelligently cache and tier the data across multiple types of storage. Using the available technology with the right mix of logical database features and storage virtualization leads to better data availability at the optimal cost. Still, the "right solution" is the job of a knowledgeable Architect! (who can understand the Business and the Technology well!!)
Monday, August 8, 2011
web age of WWW
As the WWW turned 20 years old over the weekend (Link to the first webpage), my association with computers turns 23 years today. The WWW is estimated to have approx. 20 billion pages as of today.
The information-hungry world has started making "Assets" out of information. Information is classified as confidential, sensitive, internal, limited circulation, public etc., and some companies today live purely on "Informational Assets"...
Protecting these information assets in the current-day scenario of hacking (Operation Shady RAT, and reports stating that the claims about Shady RAT are themselves shady!!) is truly a challenge. The storage of information and its regulated flow to different end points need to be fully governed and secured.
My past blog posts related to the Information Security:
1. Data Security Technologies
2. Maximum Security Architecture
3. Identity and Access Management
With all this technology, there is still a lot of "insecurity" among technologists. Why?
Originally, information was published by its owner, who would secure it with the necessary proven authentication. Overall, the information flow was between two known entities. (e-mail etc.)
OR
Public information was broadcast to reach the maximum number of recipients. (spam mails etc.)
As the WWW advanced to "Social" media, information is now published by individuals for consumption by like-minded individuals who may be directly known or unknown to the original publisher. This mode of information flow makes the whole process of information security very complex.
Technology can surely live up to the challenges posed by the trends in the information management area. The only thing needed now is clever brains to tackle the threats... It is all in the proper implementation of the available technology...
On this 8400th day of my association with computers and software, I am working on securing information in the financial industry... Let us all hope we will have another 20 years of a flourishing, safe and secure WWW....