So, what is it?
Big data is data that exceeds the processing capacity of conventional database
systems. It is too big (terabytes or petabytes), moves too fast, or doesn't fit
the strictures of your database architectures. The volume of data, both
structured and unstructured, is growing at an exponential pace, while technology
advancing in line with Moore's law is helping companies store and process such
huge amounts of data.
What is its use?
“Big Data” is drawing a lot of attention, but for decades companies have been
making business decisions based on transactional data stored in relational
databases. So why the hype now? Beyond that critical transactional data lies a
potential treasure trove of unstructured data: weblogs, social media, email,
sensors, and photographs that can be mined for useful information. Understanding
this kind of data takes more than storage and computing capacity; it requires
advances in how we interpret data, because this data is subjective and context
dependent. For example, when a person posts “unbelievable” on a website about a
product, we don't know whether that person is satisfied or dissatisfied with
the product.
First, big data can unlock significant value by making information usable at
much higher frequency, leading to better decision making.
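The “unbelievable” example above can be made concrete with a small sketch. It is only an illustration: the word lists and the `classify` function are hypothetical, and a real system would rely on far richer context than word counts. The point is that a post with no surrounding context gives a simple scorer nothing to work with.

```python
# Hypothetical word lists for illustration only; a real system would use
# a trained model or a much larger lexicon.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "broken", "refund"}


def classify(post: str) -> str:
    """Label a post as positive, negative, or ambiguous using word counts."""
    # Lowercase each word and strip common punctuation from its ends.
    words = [w.strip(".,!?\"'").lower() for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # No clear signal either way: the surrounding context is missing.
    return "ambiguous"


print(classify("Unbelievable"))
print(classify("Unbelievable, I love this phone!"))
print(classify("Unbelievable. It arrived broken."))
```

The first post stays ambiguous, while the other two are resolved only because the surrounding words supply the context.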
Second, as organizations create and store more data in digital form, they can
collect more accurate and detailed performance information on everything from
product inventories to sick days, and can therefore boost performance. Leading
companies are using data for forecasting so that they can adjust their business
to demand.
Third, big data allows more accurate segmentation of customers and hence better
delivery of services. For example, retailers usually know who buys their
products. Social media and web log files from their e-commerce sites can help
them understand who didn't buy and why they chose not to. This can enable much
more effective micro-segmentation of customers and targeted marketing
campaigns, as well as improve supply chain efficiencies.
Finally, big data can be used to improve the development of the next generation
of products and services. For instance, insurance companies can predict which
customers are likely to leave them and which aspects of their service they
should improve to stop that attrition.
Newer organizations, such as the social media sites Facebook and LinkedIn, simply wouldn't exist without big data. Their business model requires a personalized
experience on the web, which can only be delivered by capturing and using all
the available data about a user or member.
Characteristics of Big Data: The 4 V's of Big Data
Volume:
This is the most attractive factor and the main reason companies want to
analyze data: more data leads to more accurate predictions. A high volume of
data requires scalable storage and a distributed approach to querying, along
with high-performance machines and statistical sampling techniques.
Variety:
This is the most significant factor; the ability to process a variety of
information will determine the future of big data and analytics. The majority
of available data is unstructured: social network content, text, images, audio
and video. According to analysts, 80% of data is not numeric, which makes it
difficult to feed directly into applications, yet it is vital for decision
making.
Velocity:
This is the rate at which data is produced and processed to meet demand. A
torrent of data is produced through clicks on the internet from computers,
tablets and smartphones.
Variability:
The flow of data is highly inconsistent and can be highly periodic. For
example, sales of firecrackers are higher during Diwali and New Year than
during the rest of the year. This periodicity is difficult to manage,
especially with social media involved.
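The statistical sampling mentioned under Volume can be sketched as follows. This is a minimal illustration using reservoir sampling, one common technique for drawing a uniform sample from a stream too large to hold in memory; the article does not say which sampling method companies use, and the `reservoir_sample` function and the `click-N` event names are assumptions made for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical click events, generated lazily so they never sit in memory at once.
clicks = (f"click-{n}" for n in range(100_000))
sample = reservoir_sample(clicks, k=5)
print(sample)
```

Only `k` items are held at any time, so the memory needed does not grow with the size of the stream.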
Big Data Solutions
The complexity that unstructured data introduces requires sophisticated
algorithms and logic. It takes a combination of business, IT, math and
behavioral sciences to define and systematically capture insights from the
data.
The technologies used to manage data and help extract information from it
include SAS, SQL, Hadoop, parallel processing, clustering, large grid
environments and cloud computing.
Challenges & Future Outlook
As the amount of data being generated increases at an exponential pace, it is
getting ever more difficult to generate insightful information from it. The
questions data scientists must answer include: Which data should be stored?
What should be analyzed? Which data points are relevant? And how can the data
be put to best use?
Another major challenge looming over big data is the shortage of talent. By
2018, the US alone could face a shortage of 140,000 to 190,000 people with
analytical skills, and of 1.5 million managers and analysts able to make
effective decisions from the analysis of data.
Other issues facing organizations that collect data are capturing complete and
accurate data, and setting policies on privacy and security.
If we can find a way to deal with all these challenges, then the future of big
data and analytics will be phenomenal, especially in industries such as
computer and electronic products, information, and finance and insurance.
-Abhishek Arora
MDI 2012-2014