Blog Moved

Future posts related to technology are directly published to LinkedIn
https://www.linkedin.com/today/author/prasadchitta

Saturday, August 8, 2015

My continued association with Computer Science

It has been 27 years of my association with "Computer Science" today!

Recently I heard a student remark, while selecting an undergraduate course: "What is there in computer science? I can learn Java on my own."

My clarification is as follows: "Computer Science" is not just a programming language or the skill of writing a program. It calls for a deep understanding of operating systems, memory management, compilers, data structures and algorithms, data storage, compression, encryption and security, parallel processing, analytics, and so on.

Along with that understanding, the ability to apply it, implementing algorithms on the available computing resources to solve problems, makes up the study of "Computer Science".

Of late, I have started learning the statistical language "R" and have been testing my ability to apply machine learning on some #kaggle challenges. This is my first submission to a competition (I currently stand at 1110th position):

https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction/leaderboard 

Having moved my regular technology blogging to LinkedIn, my last year's post:

https://www.linkedin.com/pulse/20140807010347-15133503-social-capital-gamification 

Earlier on this blog: http://technofunctionalconsulting.blogspot.in/2013/08/science-research-consulting-and.html 

All my posts from last year can be found on LinkedIn: https://www.linkedin.com/today/author/15133503

So, like any other subject, "Computer Science" has a lot of depth and breadth, if one wants to explore it!!


Tuesday, July 29, 2014

Salt, Soldiers and Salary

What do salt, salary and soldiers have in common? All these words are said to share the same Latin root, 'sal'. This root in turn has a probable link to the Sanskrit word salila (सलिल - that which flows, water).

Probably the most ancient job that paid a salary was that of the soldier. Ancient Romans are said to have paid soldiers with bars of salt (sal dare, to give salt = soldier). The salt bars could also be exchanged for other commodities, so salt worked just like money (coins) because it was scarce in the past.

Later, salaried employment with a monthly pay period became a well-established career option. I took up such employment with the first pay period falling in October 1993, and payroll has been processed for me for 250 months (July 2014 is the 250th) in one company's system or another. A majority of those monthly salaries came from TCS.

TCS 164
Oracle 50
ISTRAC 27
IBM 9
Total 250

Ideally, having savoured the TATA salt provided by a salary from TCS, one should always be worth one's salt.

More than half a million employees work in the industrial empire of the TATA group, built under the leadership of great visionaries like JRD Tata, the founder of TCS, which alone employs 3 lakh people globally.

A quote from JRD Tata - “Strive for perfection and you will reach excellence”

Remembering a great visionary on his 110th birth anniversary.

Saturday, May 31, 2014

Data and Analytics - some thoughts on consulting

I have not been very regular on this technical blog nowadays, primarily due to my other writing engagements on artha SAstra and on Medium.

Yesterday, I was asked to address a group of analytics enthusiasts in an interactive session arranged by NMIMS at Bangalore on the theme "Visualize to Strategise". Over around three hours, several topics on Analytics, and specifically on Visual Analytics, were discussed.

In this post I thought of writing about two aspects of Analytics that I have seen in the past few months, to give a little food for thought to those who consult on Analytics.

Let "data" speak.
A few weeks back, one of my customers had a complaint about a database. The customer said they had allocated a large amount of storage to the database and all the space was consumed within one month. According to the customer's IT department, at most 30K business transactions had been performed, by a user group of 50, on the application supported by this database. So they concluded that something was wrong with the database, and the issue was escalated to me.

I suspected an interfacing schema that could be storing CLOB/BLOB type data with a missing cleanup, and asked my DBA for a tablespace growth trend. The growth turned out to be in the transaction schema, spread across multiple transaction tables. With this observation, I ruled out abnormal allocation on a single object.

We decided to run a simple analysis on the transaction data, looking at the user who created each transaction, to verify whether someone had run a migration script that loaded a huge amount of data into the transaction tables, or whether some other human error had occurred.

To our surprise, we saw 1100 active users who had created 600,000+ transactions in the database. The transactions were spread over time and mostly followed a regular working-day, working-hour pattern. No nightly batch or migration user had created the data. We went ahead with a detailed analysis of the data, which mapped all the users across the geography of the country of operation.
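The kind of check described above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the column layout (creator, timestamp) and the 9-to-18 working window are assumptions made for the example, not the customer's actual schema or data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (created_by, created_at). Not real customer data.
transactions = [
    ("user_a", "2014-05-05 10:15:00"),
    ("user_b", "2014-05-05 11:40:00"),
    ("user_a", "2014-05-06 14:05:00"),
    ("batch_job", "2014-05-07 02:30:00"),
]


def summarize(rows, work_start=9, work_end=18):
    """Count transactions per creator and split them by working hours."""
    per_user = Counter()
    in_hours = 0
    off_hours = 0
    for user, stamp in rows:
        created = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        per_user[user] += 1
        # weekday() is 0 for Monday, so values below 5 are working days.
        if created.weekday() < 5 and work_start <= created.hour < work_end:
            in_hours += 1
        else:
            off_hours += 1
    return per_user, in_hours, off_hours


per_user, in_hours, off_hours = summarize(transactions)
print(len(per_user), "distinct creators")
print(per_user.most_common(3))
print("working hours:", in_hours, "off hours:", off_hours)
```

A large number of distinct creators with mostly working-hour activity points to genuine usage, while a single creator with night-time activity would point to a batch or migration script.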

We created a simple drill-down visualization of the data and submitted it to the business and IT groups at the customer, with the conclusion that the data was indeed valid and created by their users, and that there was no problem with the system.

So, the data spoke for itself. The customer's business team told the IT team that they had started using the system across the country during the last month, and that all the users were updating transactions on it. The IT team was not aware of this fact; they still thought the system was running in pilot mode with one location and 50 users.

Let the data speak. Let it show itself to those who need it for decision making. Democratize the data.

The second point which came up evidently yesterday was

"If you torture your data long enough, it will confess anything"
Do not try to prove the known hypothesis with the help of data. It is not the purpose of analytics. With data and statistics you can possibly infer anything. Any bias towards a specific result will defeat the purpose of analytics.

So, let the data with its modern visualization ability be an unbiased representative which shows the recorded history of the business with all its deficiencies, with all its recording errors and all possible quality problems; in the process of decision making and strategising..... 

Hope I made clear my two points while consulting on Analytics....

Friday, April 4, 2014

Data streams, lakes, oceans and “Finding Nemo” in them….

This weekend, I complete 3 years of my second innings at TCS. For most of these three years I have been working with large insurance providers, trying to figure out ways to add value to their operations and strategy with technology.

The same period has been one of re-imagination. Companies and individuals (consumers / employees) are slowly moving towards reimagining themselves in the wake of converging digital forces like cloud computing, analytics & big data, social networking and mobile computing.

Focus of my career in Information Technology has always been “Information” and not technology. I am a firm believer in “Information” led transformation rather than “technology” led transformation. The basis for information is data and the ability to process and interpret the data, making it applicable and relevant for the operational or strategic issues being addressed by the corporate business leaders.

Technologists are busy making claims that their own technology is best suited for the current data processing needs. Storage vendors are finding business in providing the storage in cloud. Mobility providers are betting big on wearable devices making computing more and more pervasive. The big industrial manufacturers are busy fusing sensors everywhere and connecting them on the internet following the trend set by the human social networking sites. A new breed of scientists calling themselves data scientists are inventing algorithms to quickly derive insights from the data that is being collected. Each one of them is pushing themselves to the front taking support of the others to market themselves.

In the rush, there is a distinct trend in the business houses. It is common for the CTO to project technology as a business growth driver and take a dominant role. Data flows are being plumbed across the IT landscape and across various technologies, causing a lot of hurried and fast-changing plumbing issues.

In my view the data flow should be natural, just like streams of water. Information should flow naturally in the landscape, and technology should be used to make the flow gentle, avoiding floods and tsunamis. Creating data pools in cloud storage and connecting the pools to form a knowledge ecosystem that grows the right insights, relevant to the business context, remains the big challenge today.

The information architecture in the big data and analytics arena is just like dealing with big rivers and having right reservoirs and connecting them to get best benefit in the landscape. And a CIO is still needed and responsible for this in the corporate.

If data becomes an ocean and insights become an effort like “Finding Nemo”, the overall objective may be lost. Cautiously avoiding the data ocean, let us keep the (big) data in its pools and lakes as usable information while reimagining data in the current world of re-imagination. This applies to both corporate business houses and individuals.

Hoping Innovative reimagination in the digital world helps improve the life in the ecosystems of the real world….

Friday, January 31, 2014

Cloud Architecture Security & Reliability

Yesterday, I was doing a presentation at SSIT, Tumkur on Cloud Architecture Security & Reliability to the faculty members of SSIT and SIT Tumkur.

With the advent of the Cloud Computing paradigm, at least five categories of "Actors" have emerged:
1. Cloud Consumers, 2. Cloud Providers, 3. Cloud Brokers, 4. Cloud Auditors, 5. Cloud Carriers. The NIST conceptual reference model gives a nice overview of these. ( http://www.nist.gov/itl/cloud/upload/NIST_SP-500-291_Version-2_2013_June18_FINAL.pdf )



Security, or more specifically "Information Security", is a cross-cutting concern across all these actors. The CSA regularly publishes its list of top threats. The top threats for 2013 are:
  1. Data Breaches
  2. Data Loss
  3. Account Hijacking
  4. Insecure APIs
  5. Denial of Service
  6. Malicious Insiders
  7. Abuse of Cloud Services
  8. Insufficient Due Diligence
  9. Shared Technology Issues

All these threats translate to protecting four major areas of Cloud Architecture...

  1. Application Access - Authentication and Authorization
  2. Separation of Concerns - Privileged user access to sensitive data
  3. Key Management - Management of encryption keys
  4. Data at Rest - Secure management of copies of data

Interestingly, the ENISA threat landscape also points to similar emerging threats related to Cloud Computing.


Is there any shortcut to achieving security for any of the actors in the Cloud? I do not think so. The perspective on cloud security presented by Booz & Co includes a nice ICT Resilience life cycle, which was discussed.

Finally, there was a good discussion on Reliability and Redundancy. The key question was how to achieve better reliability in a complex IT system consisting of multiple components across multiple layers (i.e., web, application, database): making the best use of the non-failing components to share the load, while isolating the failed component, decoupling it from the cluster and seamlessly re-balancing the workload across the remaining working components.
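The idea of isolating a failed component and re-balancing the load can be illustrated with a toy round-robin dispatcher. This is a minimal sketch, not a production load balancer; the class name `Cluster` and the node names are hypothetical, and real systems would add health checks, timeouts and state replication.

```python
class Cluster:
    """Round-robin dispatcher that skips nodes marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def mark_failed(self, node):
        # Decouple the failed node; it stays listed but receives no work.
        self.healthy.discard(node)

    def mark_recovered(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the next healthy node, or raise if none is left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


cluster = Cluster(["app1", "app2", "app3"])
cluster.mark_failed("app2")
print([cluster.route() for _ in range(4)])
```

With "app2" marked as failed, requests alternate between the two remaining nodes, and the same pattern can be repeated at each layer (web, application, database).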

Overall it was a good session to interact with academia!

The slide deck that was used:

Friday, January 10, 2014

Social Analytics for Online Communities

This is the first post of 2014. Happy new year to one and all...........

A recent discussion on knome (TCS' internal social platform) about managing online communities, controlling spam and making the best of an enterprise social platform on the scale of ~200K members made me study the application of Social Analytics to achieve these objectives.

While researching on the internet, I came across this paper - http://vmwebsrv01.deri.ie/sites/default/files/publ... titled "Scalable Social Analytics for Online Communities" by Marcel Karnstedt, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway. Email: marcel.karnstedt@deri.org

This post is to summarize the contents of the paper and some of my thoughts around it.

The success of a social platform depends on the strength of the analytics used to understand and drive the dynamics of the network built by the platform.

To achieve these goals we need a set of tools that can perform multidimensional analysis covering structural, behavioural, content/semantic and cross-community analysis.

Structural Analysis: Analyse all the communities, memberships, sub-communities based on strong relations between the members, influencers/leaders and followers.

Behavioural Analysis: Analyse the interactions to identify the helpful experts (or sub-groups) who provide information, and the newbies who seek information and benefit from the interactions. Both a micro-level (individual-level) and a macro-level analysis are needed.

Content / Semantic analysis: Use text mining to detect, track and quantitatively measure current interest and shift in interest in topic and sentiment within the community.

Cross community dynamics: Understand how the community structure and sub-structures influence each other, to detect redundancies and complementarities and to merge or link communities together.

There is a need to sufficiently combine all the analysis from all four dimensions in a scalable real-time model to achieve best understanding, control and utility of socially generated data. (rather knowledge!)
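As a small illustration of the structural dimension only, the sketch below ranks members by how many distinct followers they have. The member names and the "follows" edge list are hypothetical, and follower count is just one simple stand-in for influence; it is not the method used in the paper.

```python
from collections import defaultdict

# Hypothetical "follows" edges: (follower, followed).
follows = [
    ("asha", "ravi"), ("meena", "ravi"), ("kiran", "ravi"),
    ("ravi", "asha"), ("kiran", "asha"),
]


def follower_counts(edges):
    """Map each followed member to the number of distinct followers."""
    followers = defaultdict(set)
    for follower, followed in edges:
        followers[followed].add(follower)
    return {member: len(fans) for member, fans in followers.items()}


def top_influencers(edges, k=2):
    """Return the k members with the most followers (ties broken by name)."""
    counts = follower_counts(edges)
    return sorted(counts, key=lambda m: (-counts[m], m))[:k]


print(follower_counts(follows))
print(top_influencers(follows))
```

The behavioural, semantic and cross-community dimensions would each need their own measures, which is exactly why combining them in one scalable model is hard.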

New solutions for new problems! Have a nice weekend...........

Tuesday, November 19, 2013

Context analytics for better decisions - Analytics 3.0

In today's #BigData world, #analytics has taken on additional complexity beyond pure statistics, pattern recognition using clustering and segmentation, or predictive analytics using logistic regression methods.

One of the great challenges for big data's unstructured analytics is 'context'. In traditional data processing, we removed the context and recorded just the content. All that we try to do with sentiment analysis is to derive the words, phrases & entities, combine them into 'concepts', score them by matching known patterns of pre-discovered knowledge, and assign a sentiment to the content.

The success rate of this method is fairly low. (This is my own personal observation!) One of the thoughts for improving the quality of the result is to add the context back to the content. The technology that enables this is again a 'Big Data' solution. That means we start with a big data problem and find the solution in the big data space. Interesting, isn't it?

Take the content data at rest, analyze it, enrich it with context information such as spatial and temporal information, and derive knowledge from it. Visualize the data by putting similar concepts together and by merging identical concepts into a single entity.
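A toy version of this idea: score each piece of content against a word list, then put the context back by aggregating the scores per location. The lexicon, the sample posts and the `city` field are all made up for illustration; real sentiment analysis and context enrichment are far richer than this.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon: word -> score.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

# Hypothetical posts with a spatial context attached.
posts = [
    {"text": "Great service, good staff", "city": "Bangalore"},
    {"text": "Terrible delays", "city": "Chennai"},
    {"text": "Bad food but great view", "city": "Chennai"},
]


def score(text):
    """Sum the lexicon scores of the words in the text."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return sum(LEXICON.get(w, 0) for w in words)


def score_by_context(items, key):
    """Aggregate content scores by a context attribute such as city."""
    totals = defaultdict(int)
    for item in items:
        totals[item[key]] += score(item["text"])
    return dict(totals)


print(score_by_context(posts, "city"))
```

The same aggregation could be keyed on a time bucket instead of a city to bring in the temporal context.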

Big Blue is doing this after realizing the fact. A few months back they published a 'red paper' on it.

Finally putting the discovered learning into action in real time gives all the needed business impact and takes it to the world of Analytics 3.0. (Refer to http://iianalytics.com/a3/)

Exciting world of opportunities....

Thursday, November 14, 2013

Analytical Processing of Data

A short presentation on Analytical Processing of Data; very high level overview.....

Friday, October 11, 2013

Transfer orbits, low-energy transfers and 20 years of career!

On this day 20 years back (11th October 1993), a young graduate with a bag full of science books and a few pairs of clothes landed here in Bangalore to pursue a career. Born in Andhra Pradesh and educated in Tamil Nadu, I came to Karnataka, my third south Indian state, to join the Indian Space Research Organization as ‘Scientific Assistant – B’.

It was a long selection process to get the job. I had to qualify in a written test, an interview with a big panel of ISRO and IISc scientists, and a police verification to join the central government of India. With the Mangalyaan launch planned for 28th October, I would like to share some of the science behind travelling beyond Earth’s orbit.

If we want to go to the Moon or Mars, we can’t just aim a rocket at the target and fire it. The distance between Earth and the Moon is about 4 lakh kilometers, and Mars is about 55 million km away at its closest. So we need a slightly more intelligent way of going there. One way is the Hohmann Transfer Orbit. In simple terms, the spacecraft is placed in a highly elliptical orbit around Earth and, using a delta-v at the right point and at a suitable time, is transferred into another elliptical orbit that takes it to the target orbit.
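To get a feel for the numbers, here is a rough sketch of an idealized Hohmann transfer between the orbits of Earth and Mars around the Sun. It assumes circular, coplanar orbits with rounded mean radii and ignores the gravity of both planets, so it is a textbook approximation and not the actual Mangalyaan trajectory.

```python
import math

MU_SUN = 1.32712440018e20   # Sun's gravitational parameter, m^3/s^2
R_EARTH_ORBIT = 1.496e11    # assumed mean orbital radius of Earth, m
R_MARS_ORBIT = 2.279e11     # assumed mean orbital radius of Mars, m


def hohmann(r1, r2, mu=MU_SUN):
    """Return (dv1, dv2, transfer_time) for circular orbits r1 -> r2."""
    a = (r1 + r2) / 2.0  # semi-major axis of the transfer ellipse
    # Burn to leave the inner circular orbit and enter the ellipse.
    dv1 = math.sqrt(mu / r1) * (math.sqrt(2.0 * r2 / (r1 + r2)) - 1.0)
    # Burn to circularize at the outer orbit.
    dv2 = math.sqrt(mu / r2) * (1.0 - math.sqrt(2.0 * r1 / (r1 + r2)))
    # Half the period of the transfer ellipse.
    transfer_time = math.pi * math.sqrt(a ** 3 / mu)
    return dv1, dv2, transfer_time


dv1, dv2, seconds = hohmann(R_EARTH_ORBIT, R_MARS_ORBIT)
print(f"first burn:  {dv1 / 1000:.2f} km/s")
print(f"second burn: {dv2 / 1000:.2f} km/s")
print(f"transit:     {seconds / 86400:.0f} days")
```

Under these assumptions the transit comes to about 259 days, which is why the launch has to happen in a specific window when Mars will be at the right place on arrival.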

There is another approach, a low-energy transfer using Lagrange points, which will probably take a longer transit time. The Interplanetary Transport Network (ITN) is formed of such paths, for deep space missions that travel purely using solar energy or very little fuel to fire the thrusters.

Mangalyaan is taking the first method, a Hohmann Transfer Orbit between Earth and Mars, starting its journey in this opportunity window to reach Mars by November 2014. I wish all the best to my first employer in this mission, which is going to prove the technology and ISRO’s ability to apply the science to take the satellite into orbit around Mars. There are challenges in handling the launch, the orbit maneuvers and the deep space communication network for both payload data and machine control (Earth and Mars are about 20 light-minutes apart at the maximum, so two-way communication takes about 40 minutes, which makes the communications complex to manage!). It is only unfortunate to see critics criticizing the cost of this mission, which is around 450 crore Indian Rupees, whereas ONE Fodder scam caused a loss of 950 crores to the nation; not to mention the other scams of the recent past in India.


How is this related to my career? I too ended up using high-energy transfer orbits and low-energy transfers in my journey across different companies, with quantum leaps to different roles: working on orbits to orbitals, providing management solutions for energy grids and computing grids, optimizing satellite operations and smart metering operations, handling data movements in and out of commercial ERP systems and geospatial databases, and moving from deriving forecasts of satellite orbits to deriving insights from big data using analytics, all in these 20 years.

Apart from the science, math & technology, these 20 years took me physically around this little globe, from India to Singapore, to various European countries (Belgium, Germany, France, Netherlands, Luxembourg), to the UK and then to the USA, Japan and Korea. They gave me an opportunity to work with large enterprises from Australia, China, South Africa etc., and to meet exceptional personalities from varied cultures and walks of life, interacting with and learning about the most colorful part of God’s creation.

A majority of the orbiting has been with TCS (my second employer, for 10 years between 1996 and 2006), followed by Oracle Corporation and IBM India Pvt. Ltd., before coming back to TCS in 2011.

I take this opportunity to thank one and all who helped me through this journey, who have challenged me and those who have been neutral towards me; each of those gestures gave me immense experience and made the journey very colorful and interesting.

With entry and re-entry tested, I hope to go along on the orbit for a few more years, continuing my learning until the mission retires….

Thursday, September 12, 2013

Bayesian Belief Networks for inference and learning

I attended a daylong seminar at IIM Bangalore on 10th September on the subject of Bayesian Belief Networks. Dr. Lionel Jouffe presented 4 case studies during the one-day technical session.

The session began with an introduction to Bayesian Belief Networks (BBNs) and the multiple modes of building the network: a network can be built a. by mining past data, b. purely from captured expert knowledge, or by a combination of both methods. Once the conditional probabilities for each node exist and the associations between the nodes are built, both assisted and non-assisted learning can be used.
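The two steps, deriving conditional probabilities from past data and then using them for inference, can be shown on the smallest possible network with two nodes (Rain -> WetGrass). This is a hand-rolled sketch with made-up observations; it does not use or represent the tool demonstrated at the seminar.

```python
from collections import Counter

# Hypothetical past observations: (rain, wet_grass).
observations = (
    [(True, True)] * 18 + [(True, False)] * 2
    + [(False, True)] * 8 + [(False, False)] * 72
)


def learn_parameters(rows):
    """Estimate P(rain) and P(wet | rain) from observed counts."""
    rain_counts = Counter(rain for rain, _ in rows)
    pair_counts = Counter(rows)
    p_rain = rain_counts[True] / len(rows)
    p_wet_given = {
        rain: pair_counts[(rain, True)] / rain_counts[rain]
        for rain in (True, False)
    }
    return p_rain, p_wet_given


def posterior_rain_given_wet(p_rain, p_wet_given):
    """Apply Bayes' rule to get P(rain | wet grass observed)."""
    joint_rain = p_rain * p_wet_given[True]
    joint_dry = (1.0 - p_rain) * p_wet_given[False]
    return joint_rain / (joint_rain + joint_dry)


p_rain, p_wet_given = learn_parameters(observations)
print("P(rain) =", p_rain)
print("P(wet | rain) =", p_wet_given[True])
print("P(rain | wet) =", round(posterior_rain_given_wet(p_rain, p_wet_given), 3))
```

In the expert-knowledge mode, the same `p_rain` and `p_wet_given` values would simply be entered by hand instead of being counted from data.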

The first case study involved knowledge discovery in the stock market: publicly available stock market data was loaded, a BBN was built, and an automatic clustering algorithm was run on the 'discretized' continuous variables to find similar tickers. ( http://www.bayesia.us/index.php/knowledgediscovery )

The second case study was on segmentation using BBNs. The input contained the market share of 58 stores selling juices. Three groups of juices (local brands, national brands and premium brands), covering 11 brands in total, are sold in one state of the US. Using this data, a BBN was built and automatic segmentation into 5 segments was performed, with a good statistical description of the segmentation.

The third case study involved a marketing mix analysis to describe and predict the efficiency of multiple channel campaigns (like TV, radio, online) on product sales. ( http://www.bayesia.us/index.php/mktgmix )

The fourth case study covered a vehicle safety scenario, using publicly available accident data to discover the two key factors that can reduce the fatality of injuries, based on parameters of the vehicle, driver etc. ( http://www.bayesia.us/index.php/vehiclesafety )


The conclusion presented was that any analytic problem can be converted into a BBN and solved. I have seen a few advantages of this approach:
1. A BBN can be built in a no-data scenario, completely hand-crafted from expert knowledge. It can also be built in a big data scenario, deriving the conditional probabilities by mining the data.
2. One strong theoretical framework solves all the problems, making it easier to learn. There is no need to learn multiple theories.

As a technique, it has some promising features. The whitepapers presented are useful in understanding the technique in different scenarios. Views? Comments?