# Big Data

Topics: Data analysis, Google, Data management Pages: 18 (6190 words) Published: April 14, 2013
CS4103 Distributed Systems Coursework Part 1: Big Data
Student ID: 080010830 March 16, 2012
Word Count: 3887
Abstract

Big data is one of the most active topics across many industries. This paper covers examples of big data in practice as well as current research in the field. The discussion is based on real applications that deal with big data on a daily basis, with a clear focus on their achievements and challenges. The results strongly suggest that big data is a critical subject that will continue to receive further study.

1 Introduction

Big data, in information technology, refers to extremely large volumes of data that need to be captured [1], stored [2], searched [3, 1], shared [4, 1], analysed [2] and visualised [5, 1]. The exponential growth of these datasets can result in exabytes¹ or even zettabytes² of information. For example, the capacity of telecommunications networks to exchange information grew from 281 petabytes in 1986 to 471 petabytes in 1993, 2.2 exabytes in 2000 and 65 exabytes in 2007, and it is predicted to reach 667 exabytes annually by 2013 [6]. To put these numbers into perspective, 5 exabytes of information is equal to “all words ever spoken by human beings” [7, 8, 9], and the combined capacity of all the computer hard drives available in the world in 2006 was approximately 160 exabytes [10]. However, storage capacity is increasing at an astonishing rate: Seagate reported that during the 2011 fiscal year alone it sold hard drives with a combined capacity of 330 exabytes [11]. These statistics, together with the fact that more people than ever before interact directly with data [6], make the analysis of big data very relevant, if not crucial.

¹ 1 EB = 10^18 bytes = 1 000 000 000 gigabytes = 1 000 000 terabytes
² 1 ZB = 10^21 bytes = 1 000 000 000 000 gigabytes = 1 000 000 000 terabytes
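The unit relationships in the footnotes and the growth figures quoted from [6] can be checked with a few lines of arithmetic. The following is a minimal sketch only: the names `UNITS`, `convert` and `annual_growth_rate` are illustrative and do not come from any cited source, and the growth rate is a simple compound average derived from the two endpoint figures, not a number reported in [6].

```python
# Decimal (SI) byte units, matching the footnote definitions.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a quantity between decimal byte units (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def annual_growth_rate(start_bytes, end_bytes, years):
    """Average compound annual growth rate between two measurements."""
    return (end_bytes / start_bytes) ** (1 / years) - 1


# Footnote checks: 1 EB in gigabytes, 1 ZB in terabytes.
eb_in_gb = convert(1, "EB", "GB")
zb_in_tb = convert(1, "ZB", "TB")

# Telecommunications capacity quoted in the text: 281 PB (1986) to 65 EB (2007).
rate = annual_growth_rate(281 * UNITS["PB"], 65 * UNITS["EB"], 2007 - 1986)

print(f"1 EB = {eb_in_gb:,.0f} GB")
print(f"1 ZB = {zb_in_tb:,.0f} TB")
print(f"Average annual growth 1986-2007: {rate:.1%}")
```

Under these assumptions the average growth works out to a little under 30% per year, which is one way of reading the "exponential growth" described above.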


2 Examples

Data went from scarce to abundant in the last few decades, bringing extensive benefits on the one hand but a number of difficulties on the other. Furthermore, data is being gathered at an ever-increasing rate due to the ubiquity of “information-sensing mobile devices, aerial sensory technologies also known as remote sensing, software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks” [12]. We expand on the benefits and difficulties of big data in a few key scientific and industrial applications that currently face them.

2.1 Scientific Applications

The main scientific applications in which scientists regularly work with tremendous amounts of information include meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research [12]. It is important to analyse each of these sciences in detail because each one faces unique benefits and difficulties.

2.1.1 Meteorology

Meteorology is “the science dealing with the atmosphere and its phenomena, including weather and climate” [13]. Although it might seem trivial, meteorology can have life-shattering consequences, particularly in the case of hurricanes and tornadoes. Thus, it is vital that data is examined and understood thoroughly. Hibbard [14] observed that all the most important weather modeling centers had charts and printed maps on the walls, were trying to build 3D Plexiglas, and were generally not indifferent to the great amount of data they had to deal with. It therefore became obvious that they needed a tool to integrate all these different kinds of data into a unified 3D picture. Furthermore, this new tool would allow scientists to look at their large data sets in a much more accessible and interactive way, making them less difficult to comprehend. The program used to combine all this data into a unified...