In 2004, Wal-Mart claimed to have the largest data warehouse, with 500 terabytes of storage. In 2009, eBay's storage amounted to eight petabytes, the equivalent of 104 years of HD-TV video. Two years later, the Yahoo warehouse totaled 170 petabytes, 8.5 times the capacity of all hard disk drives manufactured in 1995.
Since the rise of digitization, enterprises across many verticals have amassed burgeoning amounts of digital data, capturing trillions of bytes of information about their customers, suppliers, and operations. Data volume is also growing exponentially due to the explosion of machine-generated data (data records, web-log files, and sensor data) and growing human engagement on social networks such as Facebook, Google, and Twitter.
Hadoop clusters are built with inexpensive computers. If one computer or node fails, the cluster can continue to operate without losing data or interrupting work by simply redistributing the work to the remaining machines in the cluster. HDFS manages storage on the cluster by breaking files into small blocks and storing duplicated copies of them across the pool of nodes. The figure below illustrates how a data set is typically stored across a cluster of five nodes. In this example, the entire data set remains available even if two of the five servers fail.
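The replication scheme above can be sketched in a few lines. This is a simplified illustration, not HDFS's actual placement logic: it assumes a replication factor of 3 (HDFS's default) and a hypothetical round-robin placement across five nodes, then checks that every block survives any two simultaneous node failures.

```python
import itertools

# Illustrative sketch of HDFS-style replication: a file is split into
# blocks, and each block is copied to 3 of the 5 nodes in the cluster.
REPLICATION = 3
NODES = ["node1", "node2", "node3", "node4", "node5"]

def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (round-robin here;
    real HDFS placement is rack-aware)."""
    return {
        b: {nodes[(b + i) % len(nodes)] for i in range(replication)}
        for b in range(num_blocks)
    }

def data_set_available(placement, failed):
    """The data set is readable if every block keeps at least one live replica."""
    return all(replicas - failed for replicas in placement.values())

placement = place_blocks(10, NODES, REPLICATION)

# With 3 replicas per block, any two node failures still leave
# every block readable somewhere in the cluster.
for failed in itertools.combinations(NODES, 2):
    assert data_set_available(placement, set(failed))
```

Losing a third node, by contrast, can wipe out all three replicas of some block, which is why the replication factor bounds the number of failures the cluster tolerates.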
The goal of an enterprise data hub is to provide an organization with a centralized, unified data source that can quickly provide diverse business users with the information they need to do their jobs.
Enterprise data hubs differ from traditional data management models because the data remains in place. In the traditional extract, transform and load (ETL) model, data is extracted from one system, transformed into the required format, and then loaded into another system for analysis or other business purposes. In an enterprise data hub model, however, data is first loaded into the Hadoop platform, and then analytics and data mining tools are applied to the data where it resides in the cluster.
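The contrast between the two models can be sketched as follows. The function names and the toy "transform" step are illustrative, not part of any real ETL tool: the point is only that ETL copies transformed data into a second system, while the data hub loads raw data once and brings the processing to it.

```python
# Traditional ETL: data is moved into a separate analysis system.
def etl(source_records):
    extracted = list(source_records)                      # extract from the source system
    transformed = [r.strip().lower() for r in extracted]  # transform into the required format
    warehouse = []                                        # a second, separate system
    warehouse.extend(transformed)                         # load the transformed copy
    return warehouse

# Enterprise data hub: raw data is loaded once, and analytics run in place.
def data_hub(source_records):
    hub = list(source_records)  # load raw data into the hub (e.g. HDFS)
    # analytics and mining are applied where the data resides; no second copy
    return [r.strip().lower() for r in hub]

records = ["  Alice ", "BOB"]
assert etl(records) == data_hub(records) == ["alice", "bob"]
```

Both paths yield the same analytical result; the difference is where the transformation runs and how many copies of the data exist along the way.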