A New Approach of Frequent Pattern Mining in Web Usage Mining

  • Published : March 20, 2013
Mrs. R. Kousalya
PhD Scholar, Manonmaniam Sundaranar University; HOD/Asst. Professor, Dr. N.G.P. Arts and Science College, Coimbatore-641 048, India
Mob no. +91 9894656526
Kousalyacbe@gmail.com

Ms. S. Pradeepa
M.Phil. Scholar, Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore-641 048, India
Mob no. +91 9489551185
Prathy.it@gmail.com

Ms. K. Suguna
M.Phil. Scholar, Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore-641 048, India
Mob no. +91 9787331723
Sugunakr29@gmail.com

ABSTRACT
Web usage mining is a branch of web mining. It consists of three phases: data preprocessing, pattern discovery and pattern analysis. Web access produces a very large volume of collected data. Neighbouring data is grouped using a divisive clustering method. Divisive analysis is a hierarchical clustering method; it is used to separate single clusters from a group of clustered datasets. In this paper, we propose a new algorithm, DFP, to mine the most frequently accessed web pages from web log files.

General Terms
Data Mining and Web Mining.

Keywords
World Wide Web, Web Usage Mining, Clustering, Data Preprocessing, Pattern Discovery, Pattern Analysis, Apriori, FP and DFP Algorithm.

1. INTRODUCTION
Data mining is about discovering new information in a body of data. Web mining is a form of data mining that is used to mine data from the web. Web usage mining is a type of web mining that extracts user behavior from web log files. Hierarchical clustering is a type of cluster analysis and is divided into two types: agglomerative and divisive clustering. Divisive clustering is a "top down" approach: all objects start combined in one cluster, which is then split into many nodes. The Apriori algorithm mines frequently occurring data, and the FP Growth algorithm mines the most frequent datasets. In the DFP algorithm, the divisive analysis method of hierarchical clustering is applied within the FP Growth algorithm to mine the clustered datasets. In the proposed algorithm, we cluster the dataset using divisive analysis and split the clustered datasets into single clusters. The DFP algorithm is then used to mine the most frequent clustered datasets.

2. HIERARCHICAL CLUSTERING
Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
2.1 AGGLOMERATIVE: This is a "bottom up" approach. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
2.2 DIVISIVE ANALYSIS: This is a "top down" approach. Here the datasets are clustered using divisive analysis, and the clustered datasets are split into single clusters.

[Figure 1 node labels: H, C, D, F, At, Ct, Al (all objects); H, C, D; F, At, Ct, Al; C, D; F, At; Ct, Al; H; C; D; F; At; Ct; Al]

Figure 1: Divisive Analysis
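The "top down" splitting described for divisive analysis can be illustrated with a minimal Python sketch. It follows the commonly described DIANA procedure (start a splinter group from the object with the highest average dissimilarity, then move over objects that are closer to the splinter group than to the rest). This is an assumption about the intended method, not the authors' DFP implementation; the object names A to D, their positions and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

Dissim = Dict[FrozenSet[str], float]
Tree = Union[str, Tuple["Tree", "Tree"]]


def dist(dissim: Dissim, a: str, b: str) -> float:
    """Look up the symmetric dissimilarity between two objects."""
    return 0.0 if a == b else dissim[frozenset((a, b))]


def avg_dist(dissim: Dissim, obj: str, group: List[str]) -> float:
    """Average dissimilarity from obj to the other members of group."""
    others = [o for o in group if o != obj]
    if not others:
        return 0.0
    return sum(dist(dissim, obj, o) for o in others) / len(others)


def split_cluster(dissim: Dissim, cluster: List[str]) -> Tuple[List[str], List[str]]:
    """Split one cluster (two or more objects) into a splinter group and the rest."""
    remaining = list(cluster)
    # Seed the splinter group with the object that is, on average,
    # most dissimilar to everything else in the cluster.
    seed = max(remaining, key=lambda o: avg_dist(dissim, o, remaining))
    remaining.remove(seed)
    splinter = [seed]

    def gain(o: str) -> float:
        # Positive when o is closer to the splinter group than to the rest.
        return avg_dist(dissim, o, remaining) - avg_dist(dissim, o, splinter)

    while len(remaining) > 1:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return splinter, remaining


def divisive_tree(dissim: Dissim, cluster: List[str]) -> Tree:
    """Recursively split until every object sits in its own cluster."""
    if len(cluster) == 1:
        return cluster[0]
    left, right = split_cluster(dissim, cluster)
    return (divisive_tree(dissim, left), divisive_tree(dissim, right))


# Hypothetical example: four objects placed on a line.
positions = {"A": 0.0, "B": 1.0, "C": 10.0, "D": 12.0}
dissim = {
    frozenset((a, b)): abs(positions[a] - positions[b])
    for a, b in combinations(positions, 2)
}
tree = divisive_tree(dissim, list(positions))
print(tree)
```

The nested tuples mirror the hierarchy: the outermost pair is the first split, and each inner pair is a later split, down to single objects.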

3. WEB USAGE MINING
Web mining is the application of data mining techniques to discover patterns from the Web. It can be divided into three types: Web usage mining, Web content mining and Web structure mining. The process of Web usage mining consists of three main steps: data preprocessing, pattern discovery and pattern analysis.

[Flowchart: WEB SERVER; WEB LOG FILES; WEB USAGE MINING; DATA PREPROCESSING (DATA CLEANING, DIVISIVE ANALYSIS); PATTERN DISCOVERY (APRIORI); PATTERN ANALYSIS (FP GROWTH); RESULT]
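To make the pattern discovery step more concrete, here is a minimal Apriori-style sketch in Python that finds sets of pages visited together in at least a given number of sessions. It illustrates the general level-wise technique only, not the DFP algorithm proposed in this paper; the session data, page URLs and the function name frequent_page_sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_page_sets(
    sessions: List[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every page set that occurs in at least min_support sessions."""

    def support(candidate: FrozenSet[str]) -> int:
        # Number of sessions that contain all pages of the candidate.
        return sum(1 for s in sessions if candidate <= s)

    # Level 1: single pages that are frequent on their own.
    pages = {p for s in sessions for p in s}
    current = {frozenset([p]): support(frozenset([p])) for p in pages}
    current = {c: n for c, n in current.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        counted = {c: support(c) for c in candidates}
        current = {c: n for c, n in counted.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical sessions: the set of pages each visitor requested.
sessions = [
    {"/home", "/courses", "/contact"},
    {"/home", "/courses"},
    {"/home", "/contact"},
    {"/courses", "/fees"},
]
patterns = frequent_page_sets(sessions, min_support=2)
for page_set, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(page_set), count)
```

The prune step is what keeps the search level-wise: a page set is only counted if all of its smaller subsets were already found to be frequent.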

The data is cleaned in the data preprocessing phase, and the divisive analysis method is applied to split the clustered nodes so that useful patterns can be mined from the web. The clustered datasets are then passed to the pattern discovery phase for further processing.

Function divisive
Input: Web log file
Output: Grouped datasets
A. Divisive Analysis
1. Find the object, which has the highest average...
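The data cleaning step of the preprocessing phase could look roughly like the Python sketch below. The paper does not state the log format or the cleaning rules, so everything here is an assumption: the lines are taken to be in Common Log Format, and the rules (keep successful GET requests, drop image, style sheet and script files) are ones commonly used in web usage mining. The sample log lines are invented.

```python
import re
from typing import Iterable, List, Tuple

# Assumed Common Log Format: host ident user [time] "METHOD url protocol" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed list of resource types that do not count as page views.
SKIPPED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (host, page) pairs for requests that look like real page views."""
    kept = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # malformed line
        if m["method"] != "GET":
            continue  # keep page requests only
        if not m["status"].startswith("2"):
            continue  # drop failed or redirected requests
        url = m["url"].split("?", 1)[0]  # ignore the query string
        if url.lower().endswith(SKIPPED_SUFFIXES):
            continue  # drop embedded resources
        kept.append((m["host"], url))
    return kept


# Invented sample lines for illustration.
sample = [
    '10.0.0.1 - - [20/Mar/2013:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [20/Mar/2013:10:00:01 +0530] "GET /logo.gif HTTP/1.1" 200 512',
    '10.0.0.2 - - [20/Mar/2013:10:00:05 +0530] "GET /missing.html HTTP/1.1" 404 208',
    '10.0.0.2 - - [20/Mar/2013:10:00:09 +0530] "POST /login HTTP/1.1" 200 77',
    '10.0.0.2 - - [20/Mar/2013:10:00:12 +0530] "GET /courses.html?id=3 HTTP/1.1" 200 900',
]
cleaned = clean_log(sample)
print(cleaned)
```

The cleaned (host, page) pairs are the kind of input that a later grouping or clustering step could work on.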