Scalability in Distributed Systems (10%)
09488987 David Kearns
-Problems YouTube was facing
A big problem YouTube faced during its rise in popularity was that the number of developers went up by two while the number of videos being watched went up, as Mike tells us in the YouTube video, by ‘9 orders of magnitude’. In the video Mike also describes a fatal mistake he made when coding the interface between the servlet and template layer: he used a dictionary, which was a bad idea. Like a lot of programming errors, this was much harder to fix later in the project than it would have been early in the life of the program. Another problem Mike describes is splitting the entire project into components; you want to be able to swap components in and out. No individual programmer is going to know the whole bulk of the code inside out, and having so many people working on a particular project can become difficult to organize. This is all about good software design, but it is important that components work with each other. The majority of YouTube was initially coded in Python and, as Mike points out, when we watch a video it still executes primarily through Python. ‘It’s about what not to do in python’, he states later in the video, suggesting that over the years there has been a lot of trial and error in solving some problems. Another problem YouTube may have faced was maintaining its million lines of Python code across its different components.
Database problems, early years:
* MySQL was used to store ‘information data’ (metadata).
* YouTube went through a common evolution in servers, i.e. a single server evolved into a master with multiple read-slaves, and later a ‘sharding approach’.
* Updates caused slow replication and cache misses.
* The solution to this was to split data traffic into two clusters.
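The master with multiple read-slaves setup described above can be illustrated with a minimal sketch. This is not YouTube's code: the `ReplicatedStore` class and its method names are hypothetical, and plain dictionaries stand in for MySQL servers. It only shows why a read can miss data when replication lags behind writes.

```python
import itertools


class ReplicatedStore:
    """Toy master/read-replica router (hypothetical; dicts stand in for MySQL servers)."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the single master.
        self.master[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication: copy master state to every replica.
        for replica in self.replicas:
            replica.update(self.master)

    def read(self, key):
        # Reads are spread round-robin over the replicas.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("video:1", {"title": "demo"})
stale = store.read("video:1")  # replication has not run yet, so the replica has no data
store.replicate()
fresh = store.read("video:1")  # the replica now holds the written value
```

In this sketch the first read returns nothing because the write has only reached the master, which is the kind of replication delay the bullet points describe.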
Database: Later years:
- YouTube evolved by switching to database partitioning, which is basically building smaller databases.
- The data was split into shards, with users assigned to different shards.
- Writes and reads were moved to different servers, etc.
- Caching was done more locally.
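Assigning users to shards can be sketched as follows. This is only an illustration under stated assumptions: the shard count, the hash-based assignment rule and the function names (`shard_for_user`, `save_user`, `load_user`) are all hypothetical, and the source does not say how YouTube actually mapped users to shards.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of shards

# Each dict stands in for one smaller database.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for_user(user_id, shard_count=SHARD_COUNT):
    """Map a user id to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def save_user(user_id, profile):
    # A user's data always lands on the same shard.
    shards[shard_for_user(user_id)][user_id] = profile


def load_user(user_id):
    # Only one shard has to be consulted for a given user.
    return shards[shard_for_user(user_id)].get(user_id)


save_user("alice", {"uploads": 3})
save_user("bob", {"uploads": 7})
```

A stable hash is used here rather than Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same user to different shards between runs.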
Due to its massive growth, it was important to look at cost factors: hardware, bandwidth and power consumption. Video serving became faster because more machines served each video. Less-viewed videos, i.e. those with under 30 views a day, are served from YouTube servers in different, less costly sites. ‘Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product ’
Interesting facts:
Six years' worth of video is uploaded every day.
‘YouTube Reaches One Billion Views Per Day. That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour. ’
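The per-second, per-minute and per-hour figures in the quote follow directly from dividing one billion views by the number of seconds, minutes and hours in a day. A short check (the variable names are only for illustration):

```python
VIEWS_PER_DAY = 1_000_000_000

per_hour = VIEWS_PER_DAY / 24
per_minute = VIEWS_PER_DAY / (24 * 60)
per_second = VIEWS_PER_DAY / (24 * 60 * 60)

print(f"{per_second:,.0f} views per second")  # 11,574 views per second
print(f"{per_minute:,.0f} views per minute")  # 694,444 views per minute
print(f"{per_hour:,.0f} views per hour")      # 41,666,667 views per hour
```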
-Scalability Techniques used:
Mike makes an important point at the beginning of this section, the ‘Tao of YouTube’: find the simplest solution that is practical, without making things too complicated. Flexibility is needed to solve problems; if more complicated methods are used, problems become more difficult to solve down the line. ‘Your problem becomes automatically more complex when you try and make all those guarantees’. You leave yourself no way out. Basically he is saying we should stick to our original plan and not overcomplicate the solving of problems. Throughout the life cycle of our program we will chop and change different things, and of course things will not finish the way we expected, but that’s the beauty of it; we just need to keep our original plan in mind and work our way through it step by step. Throughout this section Mike explains a few different ideas to make sure systems stay scalable.
* Divide & Conquer:
This is the principle of partitioning: divide the work into sections and decide how to execute each one. From YouTube’s point...