Topics: Business intelligence, Data mining, Hadoop Pages: 79 (21186 words) Published: September 29, 2013

Big Data



In 2004, Wal-Mart claimed to have the largest data warehouse, with 500 terabytes of storage (equivalent to 50 printed collections of the US Library of Congress). In 2009, eBay's storage amounted to eight petabytes (roughly 104 years of HD-TV video). Two years later, the Yahoo warehouse totalled 170 petabytes¹ (8.5 times the capacity of all hard disk drives produced in 1995).² Since the rise of digitisation, enterprises across various industry verticals have amassed rapidly growing amounts of digital data, capturing trillions of bytes of information about their customers, suppliers and operations. Data volume is also growing exponentially, driven by the explosion of machine-generated data (data records, web-log files, sensor data) and by growing human engagement on social networks.

This growth shows no sign of stopping. According to the 2011 IDC Digital Universe Study, 130 exabytes of data were created and stored in 2005. The amount grew to 1,227 exabytes in 2010 and is projected to reach 7,910 exabytes in 2015, a compound annual growth rate of 45.2%.³ This growth constitutes the “Big Data” phenomenon: a technological phenomenon brought about by the rapid rate of data growth and by parallel advances in technology, which have given rise to an ecosystem of software and hardware products that enable users to analyse this data and gain new, more granular levels of insight.
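The 45.2% figure in the IDC projection is consistent with a compound annual growth rate (CAGR) between the 2010 and 2015 volumes. As a quick arithmetic check, the minimal Python sketch below recomputes it from the two IDC figures, assuming the rate is compounded once per year over the five years; the function name `cagr` and the constant names are illustrative, not taken from the IDC study.

```python
# IDC Digital Universe figures quoted above, in exabytes (EB).
EB_2010 = 1_227   # data created and stored in 2010
EB_2015 = 7_910   # IDC projection for 2015

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate r such that
    start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1

rate = cagr(EB_2010, EB_2015, 2015 - 2010)
print(f"Implied CAGR 2010-2015: {rate:.1%}")  # prints "Implied CAGR 2010-2015: 45.2%"

# Round trip: compounding the 2010 volume at that rate for five years
# recovers the 2015 projection (up to floating-point rounding).
print(f"{EB_2010 * (1 + rate) ** 5:,.0f} EB")  # prints "7,910 EB"
```

A CAGR smooths the whole period into one constant yearly rate; it says nothing about growth in any individual year.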

Figure 1: A decade of Digital Universe Growth: Storage in Exabytes (source: note ³)

¹ Ovum. What is Big Data: The End Game. [Online] Available from: [Accessed 9th July 2012].
² IBM. Data growth and standards. [Online] Available from: [Accessed 9th July 2012].
³ IDC. The 2011 Digital Universe Study: Extracting Value from Chaos. [Online] Available from: [Accessed 9th July 2012].


4.1.1 What is Big Data?
According to McKinsey,⁴ Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyse. There is no explicit threshold for how big a dataset must be in order to be considered Big Data. Managing the Big Data phenomenon requires new technology. IDC defines Big Data technologies as a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and analysis. According to O’Reilly, “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or does not fit the structures of existing database architectures. To gain value from these data, there must be an alternative way to process it.”⁵

4.1.2 Characteristics of Big Data
Big Data is not only about the size of data (volume) but also about its variety and velocity. Together, these three attributes form the three Vs of Big Data.

Figure 2: The 3 Vs of Big Data

Volume is synonymous with the “big” in the term “Big Data”. Volume is also a relative term: smaller organisations may hold mere gigabytes or terabytes of data, as opposed to the petabytes or exabytes held by big global enterprises. Data volume will continue to grow regardless of an organisation’s size. Companies naturally tend to store data of all sorts: financial data, medical data, environmental data and so on. Many of these companies’ datasets are in the terabyte range today, but they could soon reach petabytes or even exabytes.
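To make the relative scale concrete, the small Python sketch below restates the warehouse sizes from the introduction in a single unit. It assumes decimal (SI) prefixes, where each step from gigabytes to terabytes to petabytes to exabytes is a factor of 1,000; binary units (GiB, TiB, ...) would step by 1,024 instead. The helper names `to_bytes` and `convert` are illustrative.

```python
# Decimal (SI) byte units: each step up the ladder is a factor of 1,000.
# Assumption: the figures in this report use decimal units; binary units
# (GiB, TiB, PiB, ...) would step by 1,024 instead.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def to_bytes(value: int, unit: str) -> int:
    """Convert a whole-number quantity in a decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

def convert(value: int, from_unit: str, to_unit: str) -> float:
    """Re-express a quantity from one decimal unit in another."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)

# Warehouse sizes quoted in the introduction, restated in terabytes.
warehouses = {
    "Wal-Mart (2004)": (500, "TB"),
    "eBay (2009)": (8, "PB"),
    "Yahoo (2011)": (170, "PB"),
}
for name, (value, unit) in warehouses.items():
    print(f"{name}: {convert(value, unit, 'TB'):,.0f} TB")
# Wal-Mart (2004): 500 TB
# eBay (2009): 8,000 TB
# Yahoo (2011): 170,000 TB

print(f"1 EB = {convert(1, 'EB', 'TB'):,.0f} TB")  # prints "1 EB = 1,000,000 TB"
```

Working in integer bytes before dividing keeps these conversions exact for the whole-number figures used here.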

Data can come from a variety of sources (typically both internal and external to an organisation) and in a variety of types. With the explosion of sensors, smart devices as well as social networking, data



James Manyika, et al. Big...