Lavalee Singh1 Arun Singh2
1 M.Tech (C.S.) Student IIMT Engineering College Meerut (U.P.) India firstname.lastname@example.org
2Associate Professor IIMT Engineering College Meerut (U.P.) India
The World Wide Web contains a large amount of information. Anyone can store information on the web and retrieve it, but finding the relevant piece of information is difficult. Extracting important information from the web is called web mining. Web mining technologies are well suited to web information extraction and information retrieval. Web mining is one of the mining technologies that applies data mining techniques to large amounts of web data in order to improve web services. This paper gives a brief description of web mining and its categories, namely web content mining, web structure mining and web usage mining. It also reports on web data mining and its applications.
Keywords: Web Mining, Information Extraction, Information Retrieval, Web content mining, Web structure mining, Web usage mining and Web crawling
The World Wide Web is a popular and interactive medium for disseminating information today. With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools in order to find, extract, filter, and evaluate the desired information and resources. The World Wide Web provides a vast source of information of almost all types, ranging from DNA databases to resumes to lists of popular multiplexes. The web holds a large amount of data, and it is not an easy task to find the content or information of interest. Web mining is one of the techniques for solving this kind of problem. It is not the only technique; a number of others exist, such as machine learning and natural language processing. Because of the large amount of data available on the World Wide Web, it has become very important for users to use automated tools to find the desired information resources. Information retrieval is the automatic retrieval of all relevant documents while at the same time retrieving as few of the non-relevant ones as possible. Information extraction aims to extract relevant facts from the documents, while information retrieval aims to select relevant documents.
As shown in Figure (1), Yahoo, Google and MSN are search engines used to extract information from the web. The extracted information may be relevant, but it may also include less relevant and sometimes irrelevant information.
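The information retrieval goal described above, returning all relevant documents while returning as few non-relevant ones as possible, is commonly quantified with recall and precision. The following Python sketch is illustrative only: the function name `precision_recall` and the document identifiers are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = fraction of retrieved documents that are relevant
    recall    = fraction of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, chosen only for illustration.
retrieved_docs = ["d1", "d2", "d3", "d4"]  # what a search engine returned
relevant_docs = ["d1", "d3", "d5"]         # what the user actually wanted

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```

In this toy query, two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67). A result list padded with less relevant pages lowers precision even when recall stays the same.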
2.0 WEB MINING
Web mining is the application of data mining techniques to extract useful information and knowledge from web data, including web documents, hyperlinks between documents, and usage logs of web sites, in order to improve web services. It refers to the overall process of discovering potentially useful and previously unknown information or knowledge from web data. The natural combination of data mining and the World Wide Web may therefore be referred to as web mining: a data mining technique that automatically discovers or extracts information from web documents. It consists of the following tasks:
* Resource finding: It involves the task of retrieving the intended web documents. It is the process by which we extract data from either online or offline text resources available on the web. It includes information retrieval and extraction from web pages.
* Information selection and pre-processing: It involves the automatic selection and pre-processing of...