Surat, Gujarat, India.
Abstract: The amount of data available online is growing rapidly, and the World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery. Web mining technologies are well suited to knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering, and Web-based data warehousing. In this paper, we provide an introduction to Web mining and a review of the Web mining categories, with a focus on one category, Web structure mining. Two page ranking algorithms, HITS and PageRank, are commonly used in Web structure mining. Both algorithms treat all links equally when distributing rank scores. A comparative analysis of popular methods applied in Web structure mining shows that HITS performs better than PageRank in terms of returning a larger number of relevant pages for a given query.
Keywords: Web Mining, Web Structure Mining, PageRank, HITS.
The World Wide Web is today's largest warehouse of knowledge. It is a huge, widely distributed, global source of information services, hyperlink information, access and usage information, and website content and organization. With the transformation of the Web into a ubiquitous tool for e-activities such as e-commerce, e-learning, e-government, and e-science, its use has pervaded the realms of day-to-day work, information retrieval, and business management.
Due to the increasing amount of data available online, the World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery. Web mining technologies are well suited to knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering, and Web-based data warehousing.
II. WEB MINING
Web data mining is a technique used to crawl through various Web resources to collect required information, which enables an individual or a company to promote business, understand marketing dynamics, track new promotions circulating on the Internet, and so on. There is a growing trend among companies, organizations, and individuals alike to gather information through Web data mining and to use that information in their best interest. Web mining is used to discover the content of the Web, the users’ past behavior, and the webpages that users are likely to view in the future. Web mining consists of Web Content Mining, Web Structure Mining, and Web Usage Mining. Web Content Mining deals with the discovery of useful information from Web content. Web Usage Mining ascertains user profiles and the users’ behavior as recorded in the Web log file. Web Structure Mining categorizes webpages and generates related patterns, such as the similarity and the relationships between different websites. Technically, Web Content Mining focuses mainly on the structure within a document (the inner-document level), while Web Structure Mining tries to discover the link structure of the hyperlinks between documents (the inter-document level). The numbers of inlinks (links to a page) and outlinks (links from a page) are valuable information in Web mining. This is because a popular webpage is often referred to by other pages, and an “important” webpage contains a high number of outlinks. Therefore, Web Structure Mining is seen as an important approach to Web mining.
Figure 1: Classification of Web Mining
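As a small illustration of the inlink and outlink counts described in this section, the following sketch tallies both for each page of a tiny link graph. The function name link_counts and the four-page edge list are hypothetical and chosen only for illustration; they are not taken from any particular Web mining tool.

```python
from collections import defaultdict


def link_counts(edges):
    """Count inlinks and outlinks per page.

    `edges` is a list of (source, target) pairs, one per hyperlink
    from page `source` to page `target`.
    """
    inlinks = defaultdict(int)
    outlinks = defaultdict(int)
    for source, target in edges:
        outlinks[source] += 1  # a link leaving `source`
        inlinks[target] += 1   # a link arriving at `target`
    return dict(inlinks), dict(outlinks)


# Hypothetical four-page web graph (assumption, for illustration only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
inlinks, outlinks = link_counts(edges)

for page in sorted({p for edge in edges for p in edge}):
    print(page, "inlinks:", inlinks.get(page, 0), "outlinks:", outlinks.get(page, 0))
```

In this toy graph, page C is referred to by three other pages, while page A has the most outlinks. Raw counts of this kind are only a starting signal for the link analysis performed in Web structure mining.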
A. Web structure mining