Data Mining Report

Topics: Cluster analysis, Machine learning, Data mining Pages: 14 (2227 words) Published: October 18, 2014
A Comparison of K-Means and DBSCAN Algorithms
Data Mining with the Iris Data Set Using the K-Means Cluster Method within the Weka Data Mining Toolkit

Team Task
1.0 Introduction
2.0 Related Works
2.1 Clustering analysis
2.2 K-Means algorithm
3.0 Dataset & Preprocess
3.1 Dataset
3.2 Preprocess
3.2.1 Data set type convert
3.2.2 Data convert (z-score)
4.0 Accuracy Measure
5.0 Implementation
5.1 Weka & K-Means
5.2 Weka & DBSCAN
6.0 Conclusion

Abstract
We chose the Iris Data Set, one of the most frequently accessed data sets in the UCI Machine Learning Repository. First, we preprocess the Iris Data Set and repair missing values. We then implement the K-Means algorithm in Java and obtain a high accuracy rate and high efficiency. Next, we use the Weka toolkit to compare K-Means with Naive Bayes and Decision Tree. Finally, we compare our results with other results reported for this data set, and the comparison indicates that our algorithm has very high accuracy and effectiveness.

1.0 Introduction
In this report, we mainly implement K-Means for data clustering. We build a model with the K-Means algorithm and then train and validate it on the online Iris data set. When we analyze the final clustering result, we find that its accuracy is very high. This result shows that the K-Means algorithm achieves high accuracy and efficiency on certain data sets. To show the clustering performance more clearly, we found a report that studies algorithm performance on the same data set. After comparing the two sets of results, we find that the accuracy of our K-Means model is not lower than that of the other optimized classification algorithms. K-Means also runs very quickly; it is the fastest of these optimized clustering algorithms. Therefore, for a real system, K-Means is a valuable first-choice classification algorithm when weighing accuracy against efficiency.
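To make the K-Means procedure discussed above concrete, the following is a minimal Java sketch of the standard assign-and-update loop. It is not the implementation used in this report: the class name KMeansSketch, the method names, the toy two-dimensional points, and the choice to pass the starting centroids explicitly (so that the run is deterministic) are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: names and data are hypothetical, not taken from the report.
class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Runs K-Means from the given starting centroids and returns the
     * cluster index assigned to each point. The number of clusters k
     * is the number of starting centroids.
     */
    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length;
        int dim = points[0].length;

        // Copy the starting centroids so the caller's arrays are not modified.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }

        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[p] != best) {
                    labels[p] = best;
                    changed = true;
                }
            }
            // Stop once no point changes cluster.
            if (!changed) {
                break;
            }

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data: two well-separated groups (illustrative values, not Iris measurements).
        double[][] points = {
            {1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1},
            {5.0, 5.0}, {5.2, 4.8}, {4.9, 5.1}
        };
        double[][] start = {points[0], points[3]};
        System.out.println(Arrays.toString(cluster(points, start, 100))); // prints [0, 0, 0, 1, 1, 1]
    }
}
```

In this sketch the result depends on the starting centroids; a practical implementation would usually choose them at random (possibly over several restarts) rather than take them from the caller.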

2.0 Related Works
2.1 Clustering analysis
Clustering divides the samples of a given data set into different categories according to a certain standard (often the cosine similarity), which makes the class attribute or individual...
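As an illustration of the similarity standard mentioned above, the following Java sketch computes the cosine similarity of two feature vectors and assigns a sample to the most similar category representative. The class name CosineSimilaritySketch, the method names, and the sample vectors are hypothetical and are not taken from this report.

```java
// Illustrative sketch only: names and data are hypothetical.
class CosineSimilaritySketch {

    /** Cosine of the angle between two vectors of equal dimension. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the representative vector that is most similar to the sample. */
    static int mostSimilar(double[] sample, double[][] representatives) {
        int best = 0;
        for (int c = 1; c < representatives.length; c++) {
            if (cosineSimilarity(sample, representatives[c])
                    > cosineSimilarity(sample, representatives[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] representatives = {{0.0, 1.0}, {1.0, 0.0}};
        double[] sample = {1.0, 0.1};
        // The sample points almost along the second representative.
        System.out.println(mostSimilar(sample, representatives)); // prints 1
    }
}
```

Cosine similarity compares only the direction of two vectors and ignores their length, so under this standard samples are grouped by the proportions of their features rather than by their absolute magnitudes.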