Parallelization of PageRank and HITS Algorithm on CUDA Architecture
Kumar Ishan, Mohit Gupta, Naresh Kumar, Ankush Mittal

Department of Electronics & Computer Engineering, Indian Institute of Technology, Roorkee, India. {kicomuec, mickyuec, naresuec, ankumfec}@iitr.ernet.in

Abstract

The efficiency of a search engine depends largely on how efficiently and precisely it can determine the importance and popularity of a web document. The PageRank and HITS algorithms are widely known approaches for determining the importance and popularity of web pages. Because of the large number of documents available on the World Wide Web, ranking web pages requires a huge amount of computation and is very time-consuming. To overcome this issue, researchers have devoted much attention to parallelizing PageRank on PC clusters, grids, and multi-core processors such as the Cell Broadband Engine, but with little or no success. In this paper, we discuss the issues in porting these algorithms to the Compute Unified Device Architecture (CUDA) and introduce an efficient parallel implementation of them on CUDA that exploits the block structure of the web. This implementation not only cuts down the computation time but also significantly reduces the cost of the required hardware (only a few thousand).
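
The core computation behind PageRank is a power iteration over the link graph. The sketch below is a minimal, sequential Python illustration, not the authors' CUDA implementation. It assumes a pull formulation in which each page's in-links are stored in compressed sparse row (CSR) arrays, a damping factor of 0.85, and uniform redistribution of the rank held by dangling pages (pages with no out-links). The function name `pagerank_pull` and its parameters are hypothetical, and the block-structure optimization described in the abstract is not shown. Because each page's new rank depends only on the previous iteration's vector, the body of the loop over `v` is the kind of independent work a CUDA implementation could assign to one thread per page.

```python
from typing import List


def pagerank_pull(in_ptr: List[int], in_idx: List[int], out_deg: List[int],
                  d: float = 0.85, tol: float = 1e-10,
                  max_iter: int = 200) -> List[float]:
    """Hypothetical power-iteration PageRank over a CSR array of in-links.

    The in-neighbours of page v are in_idx[in_ptr[v]:in_ptr[v + 1]].
    out_deg[u] is the number of out-links of page u (0 = dangling page).
    d is the damping factor (0.85 is an assumed, commonly used value).
    """
    n = len(out_deg)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling pages is spread uniformly over all pages,
        # so the total rank stays equal to 1 on every iteration.
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        base = (1.0 - d) / n + d * dangling / n
        new_rank = [0.0] * n
        # Each page v reads only the previous ranks, so these iterations are
        # independent; on a GPU this body would map to one thread per page.
        for v in range(n):
            acc = 0.0
            for k in range(in_ptr[v], in_ptr[v + 1]):
                u = in_idx[k]
                acc += rank[u] / out_deg[u]
            new_rank[v] = base + d * acc
        # L1 change between iterations is used as the convergence test.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For example, for the graph 0→1, 0→2, 1→2 (page 2 is dangling), the in-link arrays are `in_ptr = [0, 0, 1, 3]` and `in_idx = [0, 0, 1]` with `out_deg = [2, 1, 0]`, and page 2 receives the highest rank. The pull layout is chosen here because each page writes only its own entry, which avoids write conflicts between parallel threads.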

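HITS, the other algorithm the abstract targets, alternates two updates: a page's authority score is the sum of the hub scores of the pages linking to it, and a page's hub score is the sum of the authority scores of the pages it links to, with both vectors normalized after each step. The following sequential Python sketch illustrates that iteration over an edge list. The function name `hits`, the fixed iteration count, and the use of L2 normalization are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List, Tuple


def hits(n: int, edges: List[Tuple[int, int]],
         iters: int = 50) -> Tuple[List[float], List[float]]:
    """Hypothetical HITS iteration; edges are (u, v) pairs meaning u links to v."""
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority update: a(v) = sum of h(u) over all links u -> v.
        new_auth = [0.0] * n
        for u, v in edges:
            new_auth[v] += hub[u]
        # Hub update: h(u) = sum of a(v) over all links u -> v,
        # using the authority scores just computed.
        new_hub = [0.0] * n
        for u, v in edges:
            new_hub[u] += new_auth[v]
        # L2-normalize both vectors so the scores stay bounded
        # (the "or 1.0" guards against dividing by zero on an empty graph).
        na = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        nh = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        auth = [x / na for x in new_auth]
        hub = [x / nh for x in new_hub]
    return hub, auth
```

The per-edge accumulation into `new_auth[v]` is a scatter: a naive one-thread-per-edge CUDA port would need atomic additions, whereas storing each page's in-links (and, for the hub step, out-links) lets one thread accumulate its page's score without atomics.
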
1. Introduction

Today, the unceasing growth of the World Wide Web has led to extensive research on the page ranking algorithms that search engines use to return the most relevant results for a given query. The dynamic and diverse nature of the web graph further exacerbates the challenge of achieving optimal results. Web link analysis orders web pages by studying the link structure of the web graph. PageRank and HITS (Hyperlink-Induced Topic Search) are the two most popular such algorithms, and current search engines use them, in their original or a modified form, to rank documents based on their link structure. PageRank, originally



