In 2004, Wal-Mart claimed to have the largest data warehouse, with 500 terabytes of storage. In 2009, eBay's storage amounted to eight petabytes, the equivalent of 104 years of HD-TV video. Two years later, the Yahoo warehouse totaled 170 petabytes, 8.5 times the capacity of all hard disk drives manufactured in 1995.
Since the rise of digitization, enterprises across many verticals have amassed burgeoning amounts of digital data, capturing trillions of bytes of information about their customers, suppliers, and operations. Data volume is also growing exponentially due to the explosion of machine-generated data (data records, web-log files, and sensor data) and growing human engagement on social networks such as Facebook, Google, and Twitter.
Hadoop clusters are built with inexpensive computers. If one computer or node fails, the cluster can continue to operate without losing data or interrupting work by simply redistributing the work to the remaining machines in the cluster. HDFS manages storage on the cluster by breaking files into small blocks and storing duplicated copies of them across the pool of nodes. The figure below illustrates how a data set is typically stored across a cluster of five nodes. In this example, the entire data set remains available even if two of the five servers fail.
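The replication scheme above can be sketched in a few lines. This is a simplified illustration, not HDFS's actual placement logic: it assumes a replication factor of 3 (HDFS's default) and a hypothetical round-robin placement across five nodes, then checks that every block survives any two simultaneous node failures.

```python
import itertools

# Illustrative sketch of HDFS-style replication: a file is split into
# blocks, and each block is copied to 3 of the 5 nodes in the cluster.
REPLICATION = 3
NODES = ["node1", "node2", "node3", "node4", "node5"]

def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (round-robin here;
    real HDFS placement is rack-aware)."""
    return {
        b: {nodes[(b + i) % len(nodes)] for i in range(replication)}
        for b in range(num_blocks)
    }

def data_set_available(placement, failed):
    """The data set is readable if every block keeps at least one live replica."""
    return all(replicas - failed for replicas in placement.values())

placement = place_blocks(10, NODES, REPLICATION)

# With 3 replicas per block, any two node failures still leave
# every block readable somewhere in the cluster.
for failed in itertools.combinations(NODES, 2):
    assert data_set_available(placement, set(failed))
```

Losing a third node, by contrast, can wipe out all three replicas of some block, which is why the replication factor bounds the number of failures the cluster tolerates.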
The goal of an enterprise data hub is to provide an organization with a centralized, unified data source that can quickly provide diverse business users with the information they need to do their jobs.
Enterprise data hubs differ from traditional data management models because the data remains in place. In the traditional extract, transform and load (ETL) model, data is extracted from one system, transformed into the required format, and then loaded into another system for analysis or other business purposes. In an enterprise data hub model, however, data is first loaded into the Hadoop platform, and then analytics and data mining tools are applied to the data where it resides in the cluster.
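The contrast between the two models can be sketched as follows. The function names and the toy "transform" step are illustrative, not part of any real ETL tool: the point is only that ETL copies transformed data into a second system, while the data hub loads raw data once and brings the processing to it.

```python
# Traditional ETL: data is moved into a separate analysis system.
def etl(source_records):
    extracted = list(source_records)                      # extract from the source system
    transformed = [r.strip().lower() for r in extracted]  # transform into the required format
    warehouse = []                                        # a second, separate system
    warehouse.extend(transformed)                         # load the transformed copy
    return warehouse

# Enterprise data hub: raw data is loaded once, and analytics run in place.
def data_hub(source_records):
    hub = list(source_records)  # load raw data into the hub (e.g. HDFS)
    # analytics and mining are applied where the data resides; no second copy
    return [r.strip().lower() for r in hub]

records = ["  Alice ", "BOB"]
assert etl(records) == data_hub(records) == ["alice", "bob"]
```

Both paths yield the same analytical result; the difference is where the transformation runs and how many copies of the data exist along the way.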