Monday, December 3, 2012


So, what is it?
It is data that exceeds the processing capacity of conventional database systems. It is too big (terabytes or petabytes), moves too fast, or doesn’t fit the strictures of your database architectures. Data, both structured and unstructured, is growing at an exponential pace, and technology is advancing in line with Moore's law, which helps companies store and process such huge amounts of data.

What is its use?
“Big Data” is drawing a lot of attention, but for decades companies have been making business decisions based on transactional data stored in relational databases. So, why the hype now? Beyond that critical transactional data lies a potential treasure trove of unstructured data (weblogs, social media, email, sensor readings, and photographs) that can be mined for useful information. Understanding this kind of data needs more than storage and computing capacity; it requires advances in interpreting the data, because it is subjective and context dependent. For example, when a person posts “unbelievable” on a website about a product, we don't know whether that person is satisfied or dissatisfied with it. First, big data can unlock significant value by making information usable at a much higher frequency, leading to better decision making.

Second, as organizations create and store more data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore boost performance. Leading companies are using data for forecasting, so that they can adjust their business to demand.

Third, big data allows more accurate segmentation of customers and hence better delivery of services. For example, retailers usually know who buys their products. Social media and the web log files from their e-commerce sites can help them understand who didn't buy and why they chose not to. This enables much more effective micro-segmentation of customers and targeted marketing campaigns, and can also improve supply chain efficiency.

Finally, big data can be used to improve the development of the next generation of products and services. For instance, insurance companies can identify in advance which customers are likely to leave them, and which aspects of their service they should work on to stop that attrition.

Newer organizations, such as the social media sites Facebook and LinkedIn, simply wouldn't exist without big data. Their business model requires a personalized experience on the web, which can only be delivered by capturing and using all the available data about a user or member.

Characteristics of Big Data: the 4 V's of Big Data

Volume: It is the most attractive factor, and the reason companies want to analyze big data in the first place: more data leads to more accurate predictions. High volumes of data require scalable storage and a distributed querying approach, along with high-performance machines and statistical sampling techniques.
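To make the idea of statistical sampling concrete, here is a minimal sketch of one well-known technique, reservoir sampling, which keeps a fixed-size random sample from a stream too large to hold in memory. The function name `reservoir_sample` and the example stream are assumptions made for illustration only; they are not taken from any particular big data product.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    The stream is read once and only k items are held in memory,
    so its total length does not need to be known in advance.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a randomly chosen earlier one.
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: estimate the mean of a million values from 1,000 of them.
sample = reservoir_sample(range(1_000_000), k=1000, seed=42)
estimate = sum(sample) / len(sample)
```

The estimate will only be close to the true mean, not equal to it; that is the trade-off sampling makes in exchange for touching far less data.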

Variety: It is the most significant factor, and the ability to process a variety of information will determine the future of big data and analytics. The majority of the available data is unstructured: social network activity, text, images, audio and video. According to analysts, 80% of the data is not numeric, which makes it difficult to feed directly into applications, yet it is vital for decision making.

Velocity: It is the rate at which data is produced and must be processed to meet demand. A torrent of data is produced by clicks on the internet from computers, tablets and smartphones.

Variability: The flow of data is highly inconsistent and can be highly periodic. For example, sales of firecrackers are higher during Diwali and New Year than during the rest of the year. This periodicity is difficult to manage, especially with social media involved.
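One simple way to see this periodicity in numbers is a seasonal index: the average for each month divided by the overall average. The sketch below is only an illustration. The function name `seasonal_index` and the sales figures are made up, with month 11 standing in for the Diwali season and month 1 for New Year.

```python
from collections import defaultdict
from statistics import mean


def seasonal_index(monthly_sales):
    """Map each month to (average sales in that month) / (overall average).

    monthly_sales is a list of (month, units) pairs, with month in 1..12,
    and may span several years. A value above 1 marks a peak month.
    """
    by_month = defaultdict(list)
    for month, units in monthly_sales:
        by_month[month].append(units)
    overall = mean(units for _, units in monthly_sales)
    return {m: mean(v) / overall for m, v in sorted(by_month.items())}


# Made-up figures: a flat baseline of 100 units with two seasonal peaks.
one_year = [(m, 100) for m in range(1, 13)]
one_year[10] = (11, 400)  # assumed Diwali-season peak
one_year[0] = (1, 250)    # assumed New Year peak
index = seasonal_index(one_year * 2)  # two identical years
```

Months whose index is well above 1 are the ones where storage and processing capacity have to be planned for the peak rather than the average.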

Big Data Solutions

The complexity introduced by unstructured data calls for sophisticated algorithms and logic. It takes a combination of business, IT, math and behavioral sciences to define and systematically capture the insights in the data.

The technologies used to manage data and extract information from it include SAS, SQL, Hadoop, parallel processing, clustering, large grid environments and cloud computing.

Challenges & Future Outlook

As the amount of data being generated increases at an exponential pace, it is getting ever more difficult to extract insightful information from it. The questions data scientists must answer are: Which data to store? What to analyze? Which data points are relevant? And how to make the best use of them?

Another major challenge looming over Big Data is the shortage of talent. By 2018, the US alone could face a shortage of 140,000 to 190,000 people with analytical skills, and of 1.5 million managers and analysts able to make effective decisions from the analysis of data.

Other issues facing data-collecting organizations are capturing complete and accurate data, and setting policies on privacy and security.

If we can find a way to deal with all these challenges, then the future of Big Data and analytics will be phenomenal, especially in industries like computer and electronic products, information, and finance and insurance.


-Abhishek Arora
MDI 2012-2014
