# Markov Chains & Google's Page Rank

Topics: World Wide Web, Markov chain, PageRank Pages: 5 (1772 words) Published: June 27, 2009
Introduction

Before the inception of Google in the late 1990s, the typical search engine left users to sift through large numbers of irrelevant web pages that just happened to match the search text. Google’s search listings, by contrast, tended to deliver more pertinent results up front. The genius behind the world’s most dominant search engine is its PageRank algorithm, which assigns each webpage a numerical measure of its relative importance. This allows Google to rank the pages and consequently present the most relevant and useful ones first.

Page and Brin describe PageRank as a model of user behavior. They begin with the assumption that a random web surfer who is given a random web page begins clicking on links, never hitting the “back” button. Whenever he gets bored, he jumps to another random page. The probability that this random surfer visits a specific page is its PageRank. At each page, the model also accounts for the probability that the random surfer will get bored and jump to another random page. The random process that describes the surfer’s behavior is known as a Markov chain. The convergence properties of such chains imply that, in the long run, the probability that our random web surfer lands on a specific page is the same regardless of the starting point.

Calculating PageRank

The searchable web has an immense number of nodes (pages) and edges (links). Pages can have both forward links and backlinks. Google takes advantage of this link structure to produce a global ranking of each webpage’s importance. As a first approximation, a page with a large number of backlinks can be inferred to be important. However, PageRank implements a more sophisticated method for counting and weighting links. To elaborate, we will consider the following simplified web with only four pages:

If we apply this formula (each page’s rank is the sum, over the pages that link to it, of the linking page’s rank divided by its number of outgoing links) to the four-page web depicted in the picture above, we can calculate the rank of page 1 as follows: x1 = x3/1 + x4/2, because pages 3 and 4 both link to page 1. Page 3 has only one outgoing link, so it passes on its full vote, while page 4 has two, which splits its vote in half. Following this scheme, x2 = x1/3, x3 = x1/3 + x2/2 + x4/2, and x4 = x1/3 + x2/2. These linear...
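
The four equations above can also be explored numerically. The following is a minimal Python sketch, not the essay's own method: it assumes the link structure implied by those equations (page 1 links to pages 2, 3 and 4; page 2 to pages 3 and 4; page 3 to page 1; page 4 to pages 1 and 3), and the names `LINKS`, `step` and `iterate` are hypothetical, chosen only for illustration. It repeatedly applies the vote-splitting rule from two different starting distributions, with no "bored surfer" jump included, to illustrate that the result does not depend on the starting point.

```python
# Link structure read off the four equations in the text: page -> pages it links to.
# This is an assumed reconstruction, since the original figure is not shown.
LINKS = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def step(rank):
    """Apply the rule once: each page splits its rank evenly among its outgoing links."""
    new_rank = {page: 0.0 for page in LINKS}
    for page, targets in LINKS.items():
        share = rank[page] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


def iterate(start, n_steps=200):
    """Apply the rule n_steps times, starting from the distribution `start`."""
    rank = dict(start)
    for _ in range(n_steps):
        rank = step(rank)
    return rank


# Two different starting points: a uniform distribution, and a surfer who begins on page 2.
uniform = iterate({1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25})
from_page_2 = iterate({1: 0.0, 2: 1.0, 3: 0.0, 4: 0.0})

for page in sorted(LINKS):
    print(page, round(uniform[page], 4), round(from_page_2[page], 4))
```

Under this reconstruction, both runs settle on the same values, x1 = 12/31, x2 = 4/31, x3 = 9/31 and x4 = 6/31, which satisfy all four equations and sum to 1. Page 1 therefore ranks highest, even though page 3 has more backlinks.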