Friday, April 5, 2013

Accelerating Analytics using “Blink” aka “BLU acceleration”

This Friday marks the completion of two years of my second innings with TCS's Technology Excellence Group, so it is time for a technical blog post.
This week IBM announced DB2 10.5 with the new “BLU acceleration”, which claims a 10 to 20 times performance improvement out of the box. (Ref: http://ibmdatamag.com/2013/04/super-analytics-super-easy/ )
This post gives a brief summary of the Blink project, which brought this acceleration to analytic queries.
The Blink technology has two primary components that achieve this acceleration of analytic processing:
1. Compression at load time
2. Query processing
Compression & Storage:
At load time each column is compressed using “frequency partitioning”, an order-preserving, fixed-length dictionary encoding. Each partition of the column has its own dictionary, which allows it to use shorter column codes. Because the encoding preserves order, comparison operators and predicates can be applied directly to the encoded values without decompressing them.
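To make the encoding idea concrete, here is a minimal sketch in Python. It is not Blink's actual algorithm: it assumes just two partitions (a "hot" one for the most frequent values and a "cold" one for the rest), and the function names and the `hot_count` parameter are hypothetical. It shows that each partition's sorted dictionary gives short fixed-length codes, and that a `<=` predicate can be evaluated on codes alone.

```python
from bisect import bisect_right
from collections import Counter


def build_partitions(values, hot_count):
    """Split distinct values into a 'hot' and a 'cold' partition by frequency.

    Each partition gets its own sorted dictionary, so codes within a
    partition preserve the order of the original values.
    """
    freq = Counter(values)
    hot = {v for v, _ in freq.most_common(hot_count)}
    cold = set(freq) - hot
    return [sorted(hot), sorted(cold)]


def code_width(dictionary):
    """Bits needed for a fixed-length code in this dictionary (at least 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


def encode(values, partitions):
    """Encode each value as a (partition_id, code) pair."""
    lookup = {}
    for pid, dictionary in enumerate(partitions):
        for code, value in enumerate(dictionary):
            lookup[value] = (pid, code)
    return [lookup[v] for v in values]


def encoded_le(encoded, partitions, bound):
    """Evaluate `value <= bound` on codes only, without decoding any row."""
    # Translate the literal once per partition into a code threshold.
    thresholds = [bisect_right(d, bound) - 1 for d in partitions]
    return [code <= thresholds[pid] for pid, code in encoded]


if __name__ == "__main__":
    column = ["US"] * 5 + ["IN"] * 4 + ["DE", "FR", "JP", "BR"]
    parts = build_partitions(column, hot_count=2)
    print([code_width(p) for p in parts])  # hot values need fewer bits
    print(encoded_le(encode(column, parts), parts, "FR"))
```

The frequent values end up in a tiny dictionary and so need fewer bits per code, which is where the compression gain comes from in this simplified picture.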
Rows are packed, using the bit-aligned column codes, into byte-aligned banks of 8, 16, 32 or 64 bits for efficient ALU operations. The banks are stored in bank-major order and combined into blocks, which are then loaded into memory (or storage). This bank-major layout exploits the SIMD (Single Instruction, Multiple Data) capability of IBM's modern POWER processors.
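The packing step can be sketched as follows. This is an illustration only, assuming one bank per row and hypothetical helper names; the real layout in Blink is more involved. It shows how bit-aligned codes of different widths are concatenated and rounded up to the smallest bank of 8, 16, 32 or 64 bits.

```python
BANK_SIZES = (8, 16, 32, 64)


def bank_size(widths):
    """Smallest byte-aligned bank that holds all the column codes of a row."""
    total = sum(widths)
    for size in BANK_SIZES:
        if total <= size:
            return size
    raise ValueError(f"{total} bits do not fit in a single 64-bit bank")


def pack_row(codes, widths):
    """Concatenate fixed-width codes into one integer, first column highest."""
    word = 0
    for code, width in zip(codes, widths):
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def unpack_row(word, widths):
    """Inverse of pack_row: split the integer back into column codes."""
    codes = []
    for width in reversed(widths):
        codes.append(word & ((1 << width) - 1))
        word >>= width
    return codes[::-1]


if __name__ == "__main__":
    widths = [1, 2, 5]            # three columns with 1-, 2- and 5-bit codes
    word = pack_row([1, 2, 17], widths)
    print(bank_size(widths), bin(word), unpack_row(word, widths))
```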
Query Processing:
In Blink there are no indexes, no materialized views and no run-time query optimizer, so it is simple. However, each query must be compiled to handle the different encoded column lengths in each horizontal partition of the data.
Each SQL query is split into a series of single-table queries (STQs), each of which scans one table and applies filters. All joins are hash joins. On a typical snowflake schema these scans proceed outside-in, creating intermediate hybrid STQs.
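A rough sketch of the outside-in idea, with a hypothetical three-table snowflake (`regions` -> `stores` -> `sales`) and plain Python dictionaries standing in for hash tables. The table and column names are assumptions made for illustration, and real STQs operate on compressed blocks rather than on dictionaries of rows.

```python
def run_query(regions, stores, sales, region_name):
    """Total sales amount for one region, as three single-table scans."""
    # STQ 1: scan the outermost dimension, filter, build a hash table.
    region_ht = {
        r["region_id"]: r["name"]
        for r in regions
        if r["name"] == region_name
    }

    # STQ 2 (hybrid): scan the inner dimension, probing the hash table
    # from STQ 1, and build a new hash table keyed by the next join column.
    store_ht = {
        s["store_id"]: region_ht[s["region_id"]]
        for s in stores
        if s["region_id"] in region_ht
    }

    # STQ 3: scan the fact table once, probing the hash table from STQ 2.
    total = 0
    for row in sales:
        if row["store_id"] in store_ht:
            total += row["amount"]
    return total


if __name__ == "__main__":
    regions = [{"region_id": 1, "name": "EMEA"}, {"region_id": 2, "name": "APAC"}]
    stores = [{"store_id": 10, "region_id": 1}, {"store_id": 11, "region_id": 2}]
    sales = [{"store_id": 10, "amount": 5}, {"store_id": 11, "amount": 7}]
    print(run_query(regions, stores, sales, "EMEA"))
```

Each step touches exactly one table, which is why no join-order optimizer is needed in this simplified picture: the snowflake shape fixes the order of the scans.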
Blink executes these STQs by assigning blocks to threads, each running on a processor core. As most modern processors can operate on 128-bit registers, the operations are bit operations that exploit SIMD, which makes the processing fast.
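The following sketch imitates that style of processing in plain Python. It is a software stand-in, not Blink's kernels: it assumes 7-bit codes stored in 8-bit fields (one spare high bit each), eight fields per 64-bit word, and hypothetical function names. A single subtraction evaluates `code <= bound` for all eight fields at once, and blocks of words are handed to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

FIELD_BITS = 8        # 7-bit code plus one spare high bit per field (assumption)
FIELDS_PER_WORD = 8   # eight fields fill one 64-bit bank
HIGH = sum(0x80 << (FIELD_BITS * i) for i in range(FIELDS_PER_WORD))


def pack(codes):
    """Pack exactly eight codes (each below 128) into one 64-bit word."""
    if len(codes) != FIELDS_PER_WORD:
        raise ValueError("expected exactly eight codes per word")
    word = 0
    for i, code in enumerate(codes):
        if not 0 <= code < 0x80:
            raise ValueError(f"code {code} does not fit in 7 bits")
        word |= code << (FIELD_BITS * i)
    return word


def le_mask(word, bound):
    """Set the spare bit of every field whose code is <= bound."""
    # Per field: (bound + 128) - code keeps bit 7 set exactly when
    # bound >= code, and the result is always positive, so no borrow
    # leaks into the neighbouring field.
    broadcast = pack([bound] * FIELDS_PER_WORD)
    return ((broadcast | HIGH) - word) & HIGH


def count_matches(block, bound):
    """Count qualifying codes in one block (a list of packed words)."""
    return sum(bin(le_mask(word, bound)).count("1") for word in block)


def parallel_count(blocks, bound, workers=4):
    """Hand each block to a worker thread and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda block: count_matches(block, bound), blocks))


if __name__ == "__main__":
    blocks = [[pack(list(range(0, 8)))], [pack(list(range(8, 16)))]]
    print(parallel_count(blocks, bound=9))
```

In CPython the threads mainly illustrate the block-to-thread assignment; real speed-ups come from native SIMD instructions and one thread per core.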
For more technical details of the Blink project, refer to - http://sites.computer.org/debull/A12mar/blink.pdf
I hope this gives “Analytics” a boost and brings some competition to Oracle’s Exa- appliances. Views, comments?