Hadoop Distributed File System Case Study

Introduction
In 2004, Wal-Mart claimed to have the largest data warehouse, with 500 terabytes of storage. In 2009, eBay's storage amounted to eight petabytes, equivalent to 104 years of HD-TV video. Two years later, Yahoo's warehouse totaled 170 petabytes,¹ roughly 8.5 times the capacity of all hard disk drives manufactured in 1995.
Since the rise of digitization, enterprises across many industries have amassed burgeoning amounts of digital data, capturing trillions of bytes of information about their customers, suppliers, and operations. Data volume is also growing exponentially, driven by the explosion of machine-generated data (data records, web-log files, and sensor data) and by growing human engagement on social networks and platforms such as Facebook, Google, and Twitter.
Hadoop clusters are built from inexpensive commodity computers. If one computer, or node, fails, the cluster continues to operate without losing data or interrupting work, because the work is simply redistributed to the remaining machines in the cluster. HDFS manages storage on the cluster by splitting files into fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x) and storing replicated copies of each block, three by default, across the pool of nodes. The figure below illustrates how a data set is typically stored across a cluster of five nodes. In this example, the entire data set will still be available even if two of the servers have …
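The fault-tolerance claim above can be checked with a small simulation. This is a minimal sketch, assuming a replication factor of three and a simple round-robin placement across a hypothetical five-node cluster (`node1` to `node5`). Real HDFS places replicas with a rack-aware policy decided by the NameNode, so `place_blocks` and `survives` are illustrative helpers, not Hadoop APIs.

```python
from itertools import combinations


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified, hypothetical placement for illustration; real HDFS uses a
    rack-aware policy chosen by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = {nodes[(start + i) % len(nodes)] for i in range(replication)}
    return placement


def survives(placement, failed):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed)
    return all(replicas - failed for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical five-node cluster
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# Every possible pair of failed nodes leaves the data set intact.
print(all(survives(placement, pair) for pair in combinations(nodes, 2)))  # True
# Losing three nodes can destroy all replicas of some block (block 0 here).
print(survives(placement, ["node1", "node2", "node3"]))  # False
```

Because each block lives on three distinct nodes, any two failures leave at least one replica; a third simultaneous failure can remove every copy of some block, so the replication factor bounds how many concurrent node losses the cluster can absorb.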
The goal of an enterprise data hub is to give an organization a centralized, unified data source that can quickly supply diverse business users with the information they need to do their jobs.
Enterprise data hubs differ from traditional data management models in that the data stays in place. In the traditional extract, transform, and load (ETL) model, data is extracted from one system, transformed into the required format, and then loaded into another system for analysis or other business purposes. In an enterprise data hub model, however, data is first loaded into the Hadoop platform, and analytics and data mining tools are then applied to the data where it resides in the …
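The contrast between the two models can be illustrated with a toy example in plain Python. This is a sketch under stated assumptions, not Hadoop code: the web-log sample `RAW_LOG` and the functions `etl_load`, `hub_load`, `hub_query_avg_latency`, and `hub_query_pages` are hypothetical, and a Python string stands in for data stored on the Hadoop platform. ETL fixes the schema before loading, so columns not chosen up front are lost; the hub keeps the raw records and parses them only when a question is asked.

```python
import csv
import io

# Hypothetical sample web-log data; a string stands in for files on the Hadoop platform.
RAW_LOG = """timestamp,user,page,ms
2014-03-01T10:00:00,alice,/home,120
2014-03-01T10:00:05,bob,/cart,340
2014-03-01T10:00:09,alice,/cart,95
"""


def etl_load(raw_text):
    """ETL (hypothetical helper): transform into a fixed schema *before* loading."""
    rows = csv.DictReader(io.StringIO(raw_text))
    # Only the (user, ms) columns chosen up front reach the target system;
    # timestamp and page are discarded during the transform step.
    return [(r["user"], int(r["ms"])) for r in rows]


def hub_load(raw_text):
    """Data hub (hypothetical helper): land the raw records unchanged."""
    return raw_text


def hub_query_avg_latency(raw_store, user):
    """Analyze the data where it resides: parse the raw records at read time."""
    rows = [r for r in csv.DictReader(io.StringIO(raw_store)) if r["user"] == user]
    return sum(int(r["ms"]) for r in rows) / len(rows)


def hub_query_pages(raw_store, user):
    """A question nobody planned for still works, because the raw data was kept."""
    return [r["page"] for r in csv.DictReader(io.StringIO(raw_store)) if r["user"] == user]


warehouse = etl_load(RAW_LOG)
hub = hub_load(RAW_LOG)
print(warehouse)                            # [('alice', 120), ('bob', 340), ('alice', 95)]
print(hub_query_avg_latency(hub, "alice"))  # 107.5
print(hub_query_pages(hub, "alice"))        # ['/home', '/cart']
```

The trade-off is that the hub pays the parsing cost on every query while ETL pays it once at load time; keeping the raw data is what lets new questions, such as which pages a user visited, be answered without rebuilding the pipeline.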
