The Actions and Future of Web Mining
Ms. Preety Khatri (Pursuing PhD, MCA, M.Phil), Lecturer (IT), Coordinator PGDM
Mr. Sanjay Pachauri (Pursuing PhD, M.Tech, MBA(IT), MCSE, M.Sc.), Associate Professor, HOD-IT
Mr. Ritesh Singhal (Pursuing PhD, M.Phil, M.Sc., MIT), Associate Professor, HOD-QT/OR
BLS Institute of Management, Mohan Nagar, Ghaziabad (U.P.)
Abstract:
From its very beginning, the potential of extracting valuable knowledge from the Web has been quite evident. Web mining, i.e. the application of data mining techniques to extract knowledge from Web content, structure, and usage, is the collection of technologies that fulfills this potential. More precisely, we take Web mining to be the application of data mining techniques to extract knowledge from Web data, where at least one of structure (hyperlink) or usage (Web log) data is used in the mining process (with or without other types of Web data). Interest in Web mining has grown rapidly in its short existence, in both the research and practitioner communities. This paper provides a brief overview of the accomplishments of the field, in terms of both technologies and applications, and outlines key future research directions.
Keywords: Web mining, Data mining, Web, Process mining, temporal
Web mining is the application of data mining techniques to extract knowledge from Web data, including Web documents, hyperlinks between documents, and usage logs of Web sites. Two different approaches were initially taken in defining Web mining. The first was a ‘process-centric view’, which defined Web mining as a sequence of tasks. The second was a ‘data-centric view’, which defined Web mining in terms of the types of Web data used in the mining process. In this paper we follow the data-centric view and refine the definition as follows: Web mining is the application of data mining techniques to extract knowledge from Web data, where at least one of structure (hyperlink) or usage (Web log) data is used in the mining process (with or without other types of Web data).
The extra clause about structure and usage data is deliberate: mining Web content by itself is no different from general data mining, since it makes no difference whether the content was obtained from the Web, a database, a file system, or any other source. As shown in Figure 2, Web content can be varied, containing text and hypertext, images, audio, video, records, etc. Mining each of these media types is by itself a sub-field of data mining.
The attention paid to Web mining in research, the software industry, and Web-based organizations has led to the accumulation of considerable experience. In this paper we attempt to capture that experience in a systematic manner and to identify directions for future research. One way to think about work in Web mining is shown in Figure 1.
Figure 1. Web mining research & applications.
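The data-centric definition given above can be read as a simple membership test on the kinds of data a mining task uses. The following is a minimal sketch of that reading only; the names `WebData` and `is_web_mining` are hypothetical labels introduced for illustration and do not come from any library or from the original work.

```python
from enum import Enum, auto


class WebData(Enum):
    """Kinds of Web data named in the text (labels are illustrative)."""
    CONTENT = auto()    # text, images, audio, video, records
    STRUCTURE = auto()  # hyperlinks between documents
    USAGE = auto()      # Web logs of site visits


def is_web_mining(data_used):
    """Return True if a mining task meets the data-centric definition:
    at least one of structure or usage data takes part in the mining."""
    required = {WebData.STRUCTURE, WebData.USAGE}
    return bool(required & set(data_used))


# Content alone counts as general data mining under this definition.
content_only = is_web_mining({WebData.CONTENT})
# Content combined with hyperlink data qualifies as Web mining.
content_and_links = is_web_mining({WebData.CONTENT, WebData.STRUCTURE})
# Usage logs on their own also qualify.
logs_only = is_web_mining({WebData.USAGE})
```

Under this reading, a task that mines only page content is treated as ordinary data mining, while adding hyperlink or log data brings it within Web mining.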
WEB MINING TAXONOMY:
Web Mining can be broadly divided into three distinct categories, according to the kinds of data to be mined:
1. Web Content Mining: Web Content Mining is the process of extracting useful information from the contents of Web documents. Content data corresponds to the collection of facts a Web page was designed to convey to the users. It may consist of text, images, audio, video, or structured records such as lists and tables. Text mining and its application to Web content has been the most...