# PageRank Algorithm

December 9, 2012

Abstract

This paper discusses the PageRank algorithm. We go carefully through each step of the algorithm and explain each procedure. We also explain the mathematical setup of the algorithm, including the computations that PageRank uses. The topics we touch on include, but are not limited to, linear algebra, node analysis, matrix theory, and numerical methods. Primarily, however, this paper is concerned with the linear algebra involved in computing the Google matrix, which yields the PageRank: a measure of how important a page is. Throughout, we emphasize intuition for the related mathematical topics and clarity of understanding.

1 Introduction

In the late 1990s, two young computer science doctoral students, Sergey Brin and Larry Page, were developing a ranking system for a search engine. The algorithm they developed, PageRank, is behind the search engine Google, whose name has since become a verb. It made Brin and Page billionaires, and Google became the premier search engine, which it remains to this day. To rank webpages, Brin and Page took advantage of a special characteristic of the World Wide Web: its hyperlink structure. A hyperlink is a word or object on a webpage that, when clicked, takes you to another webpage. For instance, while reading a webpage you might click a word to open another page with more information about what you are reading; that word is a hyperlink, and the Internet is filled with them.

In this paper, we set up a fictional five-page World Wide Web and build the algorithm from the ground up. Once that is done, we compute the ranking of our webpages using the Power method. Of course, not everything goes as planned: in the real world, the hyperlink structure of the Internet does not always allow the Power method to converge to the PageRank vector. We explain how Brin and Page adjust the hyperlink matrix to obtain a new matrix, the Google matrix, on which the Power method is guaranteed to converge. Finally, we conclude with several examples of MATLAB code that briefly demonstrate the Power method in action.

2 Hyperlink Matrix

Figure 1 shows our directed graph, a simple fictional World Wide Web. Each arrow represents a hyperlink from one page to another; for example, page A links to pages C, D, and E. We now set up a table that describes this directed graph; Table 1 shows how we obtain our hyperlink matrix. The columns represent the starting page and the rows represent the ending page. A 1 is placed in the table wherever the page heading a column has a hyperlink to the page heading a row, and a 0 otherwise. For example, page A has a hyperlink to page C, so there is a 1 in row C, column A.
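Since Figure 1 is not reproduced here, the sketch below assumes a hypothetical link structure: only page A's links (to C, D, and E) come from the text, and every other edge is invented for illustration. It builds the 0/1 matrix L using the same convention as Table 1 (column = starting page, row = ending page). The helper name `link_matrix` and the `LINKS` dictionary are hypothetical, not taken from the paper.

```python
# Hypothetical five-page web. Only page A's links (to C, D and E) come
# from the text; the other edges are assumptions standing in for Figure 1.
PAGES = ["A", "B", "C", "D", "E"]
LINKS = {
    "A": ["C", "D", "E"],  # from the text
    "B": ["A", "C"],       # assumed
    "C": ["B"],            # assumed
    "D": ["B", "E"],       # assumed
    "E": ["A"],            # assumed
}


def link_matrix(pages, links):
    """Build L with L[i][j] = 1 if page j links to page i, else 0.

    Columns index the starting page and rows the ending page, as in Table 1.
    """
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    L = [[0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            # row = ending page, column = starting page
            L[index[dst]][index[src]] = 1
    return L


L = link_matrix(PAGES, LINKS)
for page, row in zip(PAGES, L):
    print(page, row)
```

Row C comes out as `[1, 1, 0, 0, 0]`: the 1 in column A records that A links to C, and the 1 in column B comes from the assumed link B to C.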

Now we “extract” our hyperlink matrix from this table and denote it by L (Figure 2). We then need to do one additional thing: adjust this matrix so that the entries in each column sum to 1 (the motivation behind this will be discussed later). Because each column lists the outgoing links of one starting page, we do this by computing the sum of each column, which is that page's number of outgoing links, and dividing each entry in that column by that sum. Our revised hyperlink matrix is shown in Figure 3, and we denote it by H.
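The normalization step can be written as H_ij = L_ij / (sum over k of L_kj): each entry is divided by the sum of its column, i.e. by the out-degree of the starting page. Below is a minimal sketch, assuming the same hypothetical five-page graph as before (only column A is taken from the text). It uses exact fractions so that column sums are exactly 1; the function name `hyperlink_matrix` is hypothetical. A page with no outgoing links would give a zero column and cannot be normalized this way; the sketch simply leaves such a column as zeros, since the paper defers that case to its later adjustments.

```python
from fractions import Fraction


def hyperlink_matrix(L):
    """Divide each column of L by its column sum (the page's out-degree).

    Column j of the result then sums to 1, so page j's rank is split
    evenly among the pages it links to. A zero column (a page with no
    outgoing links) is left as zeros in this sketch.
    """
    n = len(L)
    H = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        out_degree = sum(L[i][j] for i in range(n))
        if out_degree == 0:
            continue  # dangling page: not handled here
        for i in range(n):
            H[i][j] = Fraction(L[i][j], out_degree)
    return H


# Link matrix of a hypothetical five-page web (pages A..E, columns = starting
# page). Only column A (links to C, D, E) is taken from the text.
L = [
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
]
H = hyperlink_matrix(L)
print([str(H[i][0]) for i in range(5)])  # column A: ['0', '0', '1/3', '1/3', '1/3']
```

Column A becomes 1/3 in rows C, D, and E, matching the idea that page A passes an equal share of its rank to each page it links to.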

The process above is a technical description of how to obtain the matrix H. We now present a more intuitive way to understand what must be done to calculate the PageRank of each page. The previous approach is useful (it will help when we develop the code), but it does not show what a PageRank calculation actually involves. Brin and Page's idea [3, 13] was that each page transfers a proportion of its PageRank to each page it has a hyperlink to. So in the example above, since page A has hyperlinks to pages C, D, and E, page...
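This transfer rule is commonly written as r(P_i) = sum over P_j in B_i of r(P_j) / |P_j|, where B_i is the set of pages linking to P_i and |P_j| is the number of outgoing links of P_j; in matrix form, one round of transfers is x_{k+1} = H x_k, and repeating it is the Power method mentioned in the introduction. The sketch below is a plain Python illustration, not the paper's MATLAB code. It assumes the hypothetical H from the same invented five-page graph (only column A comes from the text), and it applies no Google-matrix adjustment: it converges here only because the assumed graph has no pages without outgoing links and is strongly connected with cycles of lengths 2 and 3. The names `power_method`, `tol`, and `max_iter` are hypothetical.

```python
def power_method(H, tol=1e-10, max_iter=1000):
    """Iterate x <- H x from the uniform vector until successive
    iterates differ by less than tol in the 1-norm.

    Each step lets every page pass its current rank to the pages it
    links to, in the proportions stored in its column of H.
    """
    n = len(H)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        x_new = [sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x  # may not have converged within max_iter


# Column-normalized hyperlink matrix of the hypothetical graph (pages A..E).
H = [
    [0,   1/2, 0, 0,   1],
    [0,   0,   1, 1/2, 0],
    [1/3, 1/2, 0, 0,   0],
    [1/3, 0,   0, 0,   0],
    [1/3, 0,   0, 1/2, 0],
]
x = power_method(H)
print([round(v, 4) for v in x])  # [0.2727, 0.2727, 0.2273, 0.0909, 0.1364]
```

Solving H r = r by hand for this assumed graph gives the exact ranks 3/11, 3/11, 5/22, 1/11, and 3/22 (summing to 1), which the iteration reproduces.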

References

... Princeton University Press, 2006.

[3] Sergey Brin, Lawrence Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 30: 107-117, 1998.

[10] Ron Larson, Elementary Linear Algebra, 7th Edition, Brooks Cole, 2012, pp. 550-556.


[14] Masaaki Kijima, Markov Processes for Stochastic Modeling, CRC Press, 1997, pp. 295-297.
