Bigdata

  • Topic: Data management, Master Data Management, Hadoop
  • Published : January 11, 2013
Addressing the Challenge of Big Data & MDM in the Large Enterprise

Presented by: 

Manish Sood, Founder & CEO, Reltio, Inc.
manish@reltio.com
October 2012

Image: "Data Deluge," Brett Ryder, The Economist, Feb. 2010

Agenda
1. What is Big Data?
2. What is NoSQL vs. Relational DBs?
3. What is Hadoop (HDFS and MapReduce)?
4. MDM and Big Data – a Case Study

Confidential and Proprietary – please do not distribute without prior permission  


Trend – Growing data sets

Chart: data volume (terabyte, petabyte, exabyte, zettabyte) plotted against time across the mainframe, PC, Internet, mobile, and machine eras. Labelled data types: transactions, interactions, machine to machine. Callout: 1.4 zettabytes in enterprise data, 2011.

Zettabyte = 1,000,000,000,000,000,000,000 bytes
Graph based on IDC and UC Berkeley data growth estimates. Source: IDC & CosmoBC.com: http://techblog.cosmobc.com/2011/08/26/data-storage-infographic/


Trend – Information Connectivity

Chart: information connectivity plotted against time (1990, 2000, 2010, 2020), moving through Web 1.0, Web 2.0, and Web 3.0 toward the Internet of Things. Labels: Semantic Web, tagging, social networks, text files, RDBMS, hypertext, blogs, RDF, folksonomies, user-generated content.


Trend – Data Complexity

Chart: performance plotted against data complexity. Labels: text files and lists, majority of webpages, relational databases, social networks, Internet of Things, custom work.

Characteristics of Big Data
  • Velocity: 10s of billions of daily records
  • Volume: from terabytes to petabytes
  • Variety: multi-structured
  • Value ($): business insights

Big data is where the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches, or requires the use of significant horizontal scaling for efficient processing.
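
The phrase "significant horizontal scaling" can be made concrete with a small sketch. The Python below is an illustration only and is not taken from the presentation: it assumes records are spread over a hypothetical set of nodes by hashing a key, so that each node can process its own share independently. The names `partition_for`, `scatter`, and `count_per_node`, and the node count of 4, are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def scatter(records, num_nodes):
    """Group (key, value) records by the node that would own them."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[partition_for(key, num_nodes)].append((key, value))
    return nodes

def count_per_node(nodes):
    """Each node counts its own records; the partial counts are then summed."""
    partial = {node: len(rows) for node, rows in nodes.items()}
    return partial, sum(partial.values())

# Hypothetical workload: 1000 customer records spread over 4 nodes.
records = [(f"customer-{i}", i) for i in range(1000)]
nodes = scatter(records, num_nodes=4)
partial, total = count_per_node(nodes)
```

Adding nodes shrinks each node's share of the work, which is the sense in which processing scales horizontally rather than by buying a larger single machine.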

Diagram labels: Big Data; Big Data Science; Big Data Framework; Big Data Infrastructure.



From SQL to NoSQL


NoSQL databases
  • The misleading term “NoSQL” is short for “Not Only SQL”
  • Common features:
      • non-relational
      • schema-free: usually do not require a fixed table schema
      • horizontally scalable, distributed, with easy replication support
      • mostly open source

  • More characteristics:
      • relax one or more of the ACID properties (see CAP theorem)
      • replication support
      • simple API (if SQL, then only a very restricted variant)
  • Do not fully support relational features:
      • no join operations (except within partitions)
      • no referential integrity constraints across partitions
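
Two of the points above, schema-free records and joins that work only within a partition, can be sketched in a few lines. This is a toy model under stated assumptions, not the API of any real NoSQL product: the class `PartitionedStore`, its methods, and the sample documents are all hypothetical.

```python
from collections import defaultdict

class PartitionedStore:
    """Toy schema-free store: each partition maps a document id to a dict."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition(self, partition_key):
        # Sum of bytes gives a placement that is stable across runs.
        return sum(partition_key.encode("utf-8")) % len(self.partitions)

    def put(self, partition_key, doc_id, document):
        # No fixed schema: any dict is accepted as a document.
        self.partitions[self._partition(partition_key)][doc_id] = dict(document)

    def local_join(self, partition_key, left_type, right_type, field):
        """Join two document types on a field, inside one partition only."""
        docs = list(self.partitions[self._partition(partition_key)].values())
        right_by_field = defaultdict(list)
        for doc in docs:
            if doc.get("type") == right_type and field in doc:
                right_by_field[doc[field]].append(doc)
        pairs = []
        for doc in docs:
            if doc.get("type") == left_type and field in doc:
                for match in right_by_field.get(doc[field], []):
                    pairs.append((doc, match))
        return pairs

store = PartitionedStore(num_partitions=3)
store.put("acme", "c1", {"type": "customer", "customer_id": "c1", "name": "Acme"})
store.put("acme", "o1", {"type": "order", "customer_id": "c1", "total": 40})
# An extra field needs no schema change.
store.put("acme", "o2", {"type": "order", "customer_id": "c1", "total": 15, "coupon": "X"})
pairs = store.local_join("acme", "customer", "order", "customer_id")
```

Because `local_join` only ever looks at one partition, data that must be joined has to share a partition key; nothing in the store checks that an order's `customer_id` refers to an existing customer, which mirrors the missing referential integrity noted above.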


CAP Theorem with ACID and BASE Visualized

ACID with eventual availability
  • Atomicity: transaction treated as an all-or-nothing operation
  • Consistency: database values correct before and after
  • Isolation: events within a transaction hidden from others
  • Durability: results will survive subsequent malfunction
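
The atomicity property described above can be demonstrated with Python's standard `sqlite3` module. This is a minimal sketch, not part of the original slides: the `account` table, the `transfer` function, and the balances are assumptions chosen for illustration. A `CHECK` constraint makes the second update fail, and the connection's context manager then rolls back the first update as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds in one transaction; undo both updates if either fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, bob's balance does not keep the credit from the first `UPDATE`: the transaction is applied entirely or not at all.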

BASE with eventual consistency
  • Basically available: allowance for parts of a system to fail
  • Soft state: an object may have multiple simultaneous values
  • Eventually consistent: consistency achieved over time

Diagram labels: Consistency, Availability, Partition Tolerance.

Small data sets can be both consistent and available.
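
The BASE ideas of soft state and eventual consistency can be sketched with two in-memory replicas. The code below is a simplified illustration under stated assumptions, not how any particular system works: it resolves conflicts by keeping the highest version number (a last-write-wins rule), whereas real systems may use other mechanisms such as vector clocks. The names `Replica` and `anti_entropy` are hypothetical.

```python
class Replica:
    """One copy of a key-value store; each value carries a version number."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the incoming value only if it is newer than what is stored.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(replicas):
    """Propagate the newest version of every key to every replica."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, value, version)

a, b = Replica("a"), Replica("b")
a.write("email", "old@example.com", version=1)
b.write("email", "old@example.com", version=1)
a.write("email", "new@example.com", version=2)  # update reaches replica a only

before = (a.read("email"), b.read("email"))  # soft state: the replicas disagree
anti_entropy([a, b])
after = (a.read("email"), b.read("email"))   # the replicas have converged
```

Between the write and the synchronization pass, both replicas stay available for reads but may return different values; consistency is reached only once the update has propagated.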