Web Structure Mining: A Comparative Analysis of HITS Algorithm
Mrs. Charmy Patel#1, Mrs. Kinjan Chauhan#2 and Mrs. Priti Patel#3
#Shree Ramkrishna Institute of Computer Education and Applied Sciences,
M.T.B College Campus, Athwalines,
Surat, Gujarat, India.
1charmyspatel@gmail.com
2Kinjanchauhan99@gmail.com
3priti_patel22@hotmail.com

Abstract: The amount of data available online is growing rapidly, and the World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery. Web mining technologies are well suited to knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering, and Web-based data warehousing. In this paper, we introduce Web mining and review its categories, focusing on one of them: Web structure mining.
Two page ranking algorithms, HITS and PageRank, are commonly used in Web structure mining. Both algorithms treat all links equally when distributing rank scores. A comparative analysis of popular methods applied in Web structure mining shows that HITS performs better than PageRank in terms of returning a larger number of relevant pages for a given query.

Keywords: Web mining, Web structure mining, PageRank, HITS.
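
To make concrete the statement that HITS treats all links equally, the following is a minimal illustrative sketch of the standard hub and authority iteration, not the implementation evaluated in this paper. The function names `hits` and `normalize`, the dictionary-based graph representation, and the fixed iteration count are assumptions chosen for illustration. Every link passes on the full, unweighted score of the page at its other end.

```python
import math


def normalize(scores):
    """Scale scores so that the sum of their squares equals 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(graph, iterations=50):
    """Run the HITS iteration on a small link graph.

    graph: dict mapping each page to the list of pages it links to
           (a hypothetical input format chosen for this sketch).
    Returns two dicts: hub scores and authority scores.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of all pages linking in.
        # Every link contributes the same way; no link is weighted.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in set(targets):  # duplicate links count once
                new_auth[q] += hub[p]

        # Hub update: add up the authority scores of all pages linked to.
        new_hub = {
            p: sum(new_auth[q] for q in set(graph.get(p, ())))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical four-page graph: A, B and D all link to C; A also links to B.
example_graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
hub_scores, auth_scores = hits(example_graph)
```

In this example graph, page C receives the most in-links and ends up as the strongest authority, while page A, which links to both B and C, ends up as the strongest hub.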

I. INTRODUCTION

The World Wide Web is today's largest warehouse of knowledge. It is a huge, widely distributed, global source of information services, hyperlink information, access and usage information, and Web site contents and organization. With the transformation of the Web into a ubiquitous tool for e-activities such as e-commerce, e-learning, e-government, and e-science, its use has pervaded the realms of day-to-day work, information retrieval, and business management.

Due to the increasing amount of data available online, the World Wide Web has become one of the most



References:
[1] M. Kobayashi and K. Takeda, "Information Retrieval on the Web," ACM Computing Surveys, Vol. 32, No. 2, June 2000.
[2] R. Kosala and H. Blockeel, "Web Mining Research: A Survey," SIGKDD Explorations, Vol. 2, Issue 1, July 2000, pp. 1-15.
[3] http://www.cse.iitb.ac.in/internal/techreports/reports/TR-CSE-2010-31.pdf
[4] http://horicky.blogspot.com/2010/03/
[5] Arun K. Pujari, Data Mining Techniques.
