Google File System

Published: June 2, 2011
The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google)

Niek Linnenbank
Faculty of Science, Vrije Universiteit
nlk800@few.vu.nl

March 17, 2010

The Google File System Outline

1 Introduction
2 Architecture
3 Measurements
4 Latest Work
5 Conclusion

The Google File System Introduction

Size of the Internet

6,767,805,208 people on earth
1,733,993,741 people on the internet
5,000,000 terabytes of data (Eric Schmidt, 2005)

Top 10 Search Provider in US, January 2010
RANK  PROVIDER              SEARCHES (000)  SHARE (%)
-     ALL SEARCH                10,272,099      100.0
1     GOOGLE SEARCH              6,805,424       66.3
2     YAHOO SEARCH               1,488,476       14.5
3     MSN SEARCH                 1,116,546       10.9
4     AOL SEARCH                   251,762        2.5
5     ASK.COM SEARCH               194,161        1.9
6     MY WEB SEARCH SEARCH         112,356        1.1
7     COMCAST SEARCH                59,608        0.6
8     YELLOW PAGES SEARCH           35,101        0.3
9     NEXTAG SEARCH                 34,736        0.3
10    BIZRATE SEARCH                20,123        0.2

The Google Way

Google does web indexing (and more)
Cheap commodity hardware
Patented PageRank(tm) technology

Google Filesystem

Scalable distributed filesystem
Designed for cheap clusters
Capable of storing hundreds of terabytes

The Google File System Architecture

Assumptions

Component failures are the norm
Inexpensive commodity hardware
Large files
Files mutated with appends
Workload typically large streaming reads and appends

Design

One master process keeps the file metadata.
Files are split into chunks.
Multiple chunk servers store the chunks.
Multiple clients may access files concurrently.
POSIX-like API (create, read, write, append, delete).
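As a rough illustration of how a byte offset in a file maps onto chunks, the sketch below assumes a fixed 64 MB chunk size (the size these slides give for a chunk file). The function name `locate` is hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed fixed chunk size: 64 MB


def locate(offset: int) -> tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


# Byte 200 MB of a file falls in chunk 3, 8 MB past the start of that chunk.
print(locate(200 * 1024 * 1024))
```

With a fixed chunk size, a client can work out the chunk index on its own and only needs the master to translate that index into chunk locations.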

[Figure: Design. A client, the master, and eight chunk servers; arrows labeled "chunk locations" and "chunk data".]
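The split between metadata and data in this design can be sketched as a toy read path: the client asks the master where a chunk lives, then fetches the bytes from a chunk server. Everything here (class names, `lookup`, `client_read`, the 4-byte chunk size, the handles `h0` and `h1`) is a hypothetical in-memory model for illustration, not the real GFS interface.

```python
CHUNK_SIZE = 4  # tiny chunk size so the example is easy to follow


class Master:
    """Holds metadata only: which chunk servers store each chunk of a file."""

    def __init__(self):
        self.locations = {}  # (filename, chunk index) -> (handle, [server names])

    def lookup(self, filename, chunk_index):
        return self.locations[(filename, chunk_index)]


class ChunkServer:
    """Holds chunk contents keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


def client_read(master, servers, filename, offset, length):
    """Read bytes that lie within a single chunk."""
    chunk_index, start = divmod(offset, CHUNK_SIZE)
    handle, replicas = master.lookup(filename, chunk_index)  # metadata from the master
    return servers[replicas[0]].read(handle, start, length)  # data from a chunk server


master = Master()
servers = {"cs1": ChunkServer(), "cs2": ChunkServer()}
servers["cs1"].chunks["h0"] = b"abcd"
servers["cs2"].chunks["h1"] = b"efgh"
master.locations[("/demo", 0)] = ("h0", ["cs1"])
master.locations[("/demo", 1)] = ("h1", ["cs2"])

print(client_read(master, servers, "/demo", 5, 2))
```

Note that the file contents never pass through the master in this model; it only answers the location query.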

Chunk Replicas

[Figure: a client, the master, and chunk servers. Labels: "chunk locations", "chunk data", "instructions", "Regular 64MB Linux file".]

Replica Leases

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Labels: "chunk locations of current lease", "chunk data", "acknowledge".]
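A minimal sketch of the lease idea, assuming the primary picks one serial order for mutations and every secondary applies them in that order before acknowledging. The classes `Replica` and `Primary` and their methods are hypothetical names for illustration; lease expiry and failure handling are left out.

```python
class Replica:
    """A chunk server holding one replica of a chunk."""

    def __init__(self, name):
        self.name = name
        self.applied = []  # mutations in the order they were applied

    def apply(self, serial, data):
        self.applied.append((serial, data))
        return True  # acknowledge


class Primary(Replica):
    """The replica that currently holds the lease and chooses the mutation order."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data):
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data)  # apply locally first
        acks = [s.apply(serial, data) for s in self.secondaries]
        return all(acks)  # success only if every secondary acknowledged


secondaries = [Replica("s1"), Replica("s2")]
primary = Primary("p", secondaries)
results = [primary.write(b"a"), primary.write(b"b")]
print(results, secondaries[0].applied)
```

Because a single replica assigns the serial numbers, all replicas end up with the same mutation order even when several clients write concurrently.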

Chunk Versions

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Labels: "chunk locations of current lease", "chunk data", "acknowledge"; version number 3 shown three times.]

Stale Replica

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Version numbers shown: 4, 4, and 3. Callout: "Wrong version!"]
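The stale-replica check can be pictured as a comparison between the version the master has recorded for a chunk and the version each chunk server reports. The sketch below reuses the numbers from the slide (4 and 3); the function `find_stale` and the server names are hypothetical.

```python
def find_stale(master_version: int, reported: dict[str, int]) -> list[str]:
    """Return the servers whose replica is older than the master's recorded version."""
    return sorted(name for name, version in reported.items() if version < master_version)


reports = {"cs1": 4, "cs2": 4, "cs3": 3}  # assumed reports from three chunk servers
print(find_stale(4, reports))
```

A replica that missed a mutation while its server was down keeps the old number, which is what makes it detectable.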

Chunk Orphans

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Version numbers shown: 4, 4, and 3; one chunk marked "?". Callout: "Unknown chunk!"]

Chunk Corruption

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Version numbers shown: 4, 4, and 3; one chunk marked "?". Callout: "Corruption! (invalid checksum)"]
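Checksum-based corruption detection might look roughly like the following: the chunk is divided into blocks, a checksum is stored per block, and a read recomputes and compares them. The 64 KB block size, the use of CRC-32 via `zlib.crc32`, and the function names are assumptions made for this sketch.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed checksum block size


def checksums(data: bytes) -> list[int]:
    """Compute one CRC-32 per block of the chunk."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def first_corrupt_block(data: bytes, expected: list[int]):
    """Return the index of the first block whose checksum mismatches, or None."""
    for index, (actual, want) in enumerate(zip(checksums(data), expected)):
        if actual != want:
            return index
    return None


chunk = bytes(200_000)            # toy chunk contents
stored = checksums(chunk)         # checksums recorded when the data was written
damaged = bytearray(chunk)
damaged[70_000] ^= 0xFF           # flip one byte inside the second block
print(first_corrupt_block(bytes(damaged), stored))
```

Checking per block rather than per chunk means a read only has to verify the blocks it actually touches.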

Garbage Collection

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Version numbers shown: 4, 4, and 3; one chunk marked "?".]
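One way to picture how unknown (orphaned) chunks could be found for garbage collection: a chunk server reports the chunk handles it holds, and the master answers with those that do not appear in its metadata. The function `orphaned` and the handle names are hypothetical.

```python
def orphaned(reported_handles: set[str], known_handles: set[str]) -> set[str]:
    """Return the handles a chunk server holds that the master no longer tracks."""
    return reported_handles - known_handles


master_metadata = {"h1", "h2", "h3"}          # handles the master knows about
server_report = {"h2", "h3", "h9"}            # handles one chunk server reports
print(orphaned(server_report, master_metadata))
```

The chunk server is then free to delete whatever comes back in the reply.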

Garbage Collection (continued)

[Figure: a client, the master, and chunk servers acting as one primary and two secondaries. Only the primary shows a version number (4). Callout: "Only one replica left!"]
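Once stale, orphaned, and corrupt replicas have been discarded, a chunk may be left with too few copies. A simple policy, sketched here under the assumption of a replication goal of three, is to re-replicate the chunks with the fewest live replicas first. The names `TARGET_REPLICAS` and `rereplication_order` are hypothetical.

```python
TARGET_REPLICAS = 3  # assumed replication goal


def rereplication_order(live_replicas: dict[str, int]) -> list[str]:
    """Order under-replicated chunks so the most endangered come first."""
    needy = [h for h, count in live_replicas.items() if count < TARGET_REPLICAS]
    return sorted(needy, key=lambda h: (live_replicas[h], h))


counts = {"h1": 3, "h2": 1, "h3": 2}  # live replicas per chunk handle
print(rereplication_order(counts))
```

A chunk with a single remaining replica is one failure away from data loss, so it is placed ahead of a chunk that still has two.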
