Mark Kerzner <mark@hadoopilluminated.com>
Sujee Maniyam <sujee@hadoopilluminated.com>
Hadoop Illuminated by Mark Kerzner and Sujee Maniyam
Dedication
To the open source community
This book on GitHub [https://github.com/hadoop-illuminated/hadoop-book]
Companion project on GitHub [https://github.com/hadoop-illuminated/HI-labs]
Acknowledgements
From Mark
I would like to express gratitude to my editors, co-authors, colleagues, and bosses who shared the thorny path to working clusters, in the hope of making it less thorny for those who follow. Seriously, folks,
Hadoop is hard, and Big Data is tough, and there are many related products and skills that you need to master. Therefore, have fun, provide your feedback [http://groups.google.com/group/hadoop-illuminated], …
She is seventeen and studies at Beren Academy in Houston, TX. She has attended the Glassell School of Art for many summers.
Her collection of artwork can be seen here [https://picasaweb.google.com/110574221375288281850/RebeccaSArt].
Rebecca started working on Hadoop illustrations two years ago, when she was fifteen.
It is interesting to follow her artistic progress. For example, the first version of the
"Hadoop Zoo" can be found here [https://plus.google.com/photos/110574221375288281850/albums/5850230050814801649/5852442815905345938?banner=pwa]. Note that the programmer's kids who
"came to see the Hadoop Zoo" are all pretty young. But as the artist grew, so did her heroes, and in the current version of the same sketch [https://plus.google.com/photos/110574221375288281850/albums/5850230050814801649/5852442453552955282?banner=pwa] you can see the same characters, …
This means we can crunch a large volume of data in parallel.
The compute framework of Hadoop is called MapReduce. MapReduce has been proven at the scale of petabytes.
Chapter 9, Introduction To MapReduce [31]
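To make the idea of crunching data in parallel concrete, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration only. Each chunk is mapped independently, which is the property that lets a real cluster spread the work across many machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is mapped on its own; nothing here depends on another chunk,
    # so on a cluster these calls could run on different machines.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical input, already split into two chunks.
    chunks = [["big data is big"], ["data is everywhere"]]
    print(word_count(chunks))
```

This sketch runs the chunks one after another in a single process. The point is only the shape of the computation: independent map calls, a grouping step, and a per-key reduce.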
3.5. Hadoop provides rich analytics
Native MapReduce supports Java as its primary programming language. Other languages such as Ruby, Python, and R can be used as well.
Of course, writing custom MapReduce code is not the only way to analyze data in Hadoop. Higher-level
abstractions over MapReduce are available. For example, a tool named Pig takes scripts written in an English-like data flow language and translates them into MapReduce jobs. Another tool, Hive, takes SQL queries and runs them using MapReduce.
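As a rough illustration of what "runs them using MapReduce" can mean, consider a query such as SELECT region, SUM(amount) FROM sales GROUP BY region. The sketch below shows one way such an aggregation could be expressed as a map step and a reduce step in plain Python. The sales table, its columns, and the function names are hypothetical, and this is not a description of how Hive actually plans or executes queries.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of a "sales" table: (region, amount).
SALES = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 3)]


def map_row(row):
    """Map step: emit the GROUP BY column as the key and the summed column as the value."""
    region, amount = row
    return (region, amount)


def total_by_region(rows):
    """Reduce step: add up the values for each key."""
    # Sorting by key stands in for the shuffle that brings equal keys together.
    pairs = sorted(map(map_row, rows), key=itemgetter(0))
    return {
        region: sum(amount for _, amount in group)
        for region, group in groupby(pairs, key=itemgetter(0))
    }


if __name__ == "__main__":
    print(total_by_region(SALES))
```

The GROUP BY column becomes the key and the aggregate function becomes the reducer, which is why this kind of query fits the MapReduce model naturally.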
Business Intelligence (BI) tools can provide an even higher level of analysis. Quite a few BI tools can work with Hadoop and analyze data stored in it. For a list of BI tools that support Hadoop, see Chapter 13, Business Intelligence Tools For Hadoop and Big Data [52]
Chapter 4. Big Data
4.1. What is Big Data?
You have probably heard the term Big Data; it is one of the most hyped terms now. But what exactly is Big Data?
Big Data is a very large, loosely structured data set that defies traditional