# Data Mining Report

**Topics:** Cluster analysis, Machine learning, Data mining

**Pages:** 14 (2227 words)

**Published:** October 18, 2014

A Comparison of K-means and DBSCAN Algorithms

Data Mining with the Iris Data Set Using the K-Means Clustering Method within the Weka Data Mining Toolkit

- Team Task
- 1.0 Introduction
- 2.0 Related Works
  - 2.1 Clustering analysis
  - 2.2 K-Means algorithm
- 3.0 Dataset & Preprocess
  - 3.1 Dataset
  - 3.2 Preprocess
    - 3.2.1 Data set type convert
    - 3.2.2 Data convert (z-score)
- 4.0 Accuracy Measure
- 5.0 Implementation
  - 5.1 Weka & K-Means
  - 5.2 Weka & DBSCAN
- 6.0 Conclusion


Abstract

We chose the Iris Data Set, one of the most frequently accessed data sets in the UCI Machine Learning Repository. First, we preprocess the Iris Data Set and repair missing data. We then implement the K-Means algorithm in Java and obtain a high accuracy rate and high efficiency. Next, we use the Weka toolkit to compare K-Means with Naive Bayes and a decision tree. Finally, we compare our results with other published results, and the comparison shows that our algorithm achieves high accuracy and effectiveness.
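The abstract does not say how the missing data were repaired. As one possible reading, the following Java sketch fills each missing value (encoded here as `NaN`, an assumption) with the mean of its column and then applies the z-score conversion named in Section 3.2.2, z = (x - mean) / std. Mean imputation, the use of the population standard deviation, and the class name `Preprocess` are illustrative choices, not the authors' documented method.

```java
import java.util.Arrays;

/** Hypothetical preprocessing sketch: column-mean imputation followed by z-score normalization. */
final class Preprocess {

    /** Replaces NaN entries (assumed encoding of missing values) with the mean of the column's non-missing values. */
    static void imputeColumnMeans(double[][] data) {
        int cols = data[0].length;
        for (int j = 0; j < cols; j++) {
            double sum = 0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[j])) {
                    sum += row[j];
                    count++;
                }
            }
            double mean = count > 0 ? sum / count : 0.0;
            for (double[] row : data) {
                if (Double.isNaN(row[j])) {
                    row[j] = mean;
                }
            }
        }
    }

    /** Rescales each column in place to zero mean and unit population standard deviation. */
    static void zScore(double[][] data) {
        int n = data.length;
        int cols = data[0].length;
        for (int j = 0; j < cols; j++) {
            double mean = 0;
            for (double[] row : data) {
                mean += row[j];
            }
            mean /= n;
            double var = 0;
            for (double[] row : data) {
                var += (row[j] - mean) * (row[j] - mean);
            }
            double std = Math.sqrt(var / n);
            for (double[] row : data) {
                // A constant column has std == 0; map it to 0 instead of dividing by zero.
                row[j] = std > 0 ? (row[j] - mean) / std : 0.0;
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = {
            {5.1, 3.5},
            {Double.NaN, 3.0},
            {6.3, Double.NaN}
        };
        imputeColumnMeans(data); // must run first: zScore does not skip NaN
        zScore(data);
        System.out.println(Arrays.deepToString(data));
    }
}
```

Imputation runs before normalization so that the column statistics used by the z-score are computed over complete columns.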

1.0 Introduction

In this report, we mainly implement K-Means for data clustering. We build a model with the K-Means algorithm and then validate it on the publicly available Iris data set. Analyzing the final clustering result, we find that its accuracy is very high. This suggests that the K-Means algorithm can achieve high accuracy and efficiency on certain data sets. To put the clustering performance in context, we also consulted a report that studies algorithm performance on the same data set. Comparing the two, we find that the K-Means model is no less accurate than the other optimized classification algorithms. K-Means also runs very quickly; it is the fastest of the optimized clustering algorithms compared. Therefore, for a real system that must weigh accuracy against efficiency, K-Means is a valuable first choice of algorithm to consider.
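The report states that K-Means was implemented in Java, but this excerpt does not include that code. The following is a minimal sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members), together with one common way to score a clustering against known class labels: map each cluster to its majority class and count the points that match. Euclidean distance, seeding with the first k points, and the names `KMeansSketch`, `cluster`, and `majorityAccuracy` are assumptions for illustration; the report's own accuracy measure may differ.

```java
import java.util.Arrays;

/** Minimal Lloyd-style K-Means sketch plus a majority-label accuracy score (illustrative only). */
final class KMeansSketch {

    /** Squared Euclidean distance between two points of the same dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }

    /** Returns each point's cluster index. Assumes k <= points.length; the first k points seed the centroids. */
    static int[] cluster(double[][] points, int k, int maxIter) {
        int n = points.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assign = new int[n];
        Arrays.fill(assign, -1); // no point is assigned yet
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point joins its nearest centroid (ties go to the lower index).
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assign[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assign;
    }

    /** Fraction of points whose true label equals the majority label of their cluster. */
    static double majorityAccuracy(int[] assign, int[] labels, int k, int numLabels) {
        int[][] counts = new int[k][numLabels];
        for (int i = 0; i < assign.length; i++) {
            counts[assign[i]][labels[i]]++;
        }
        int correct = 0;
        for (int c = 0; c < k; c++) {
            int max = 0;
            for (int l = 0; l < numLabels; l++) {
                max = Math.max(max, counts[c][l]);
            }
            correct += max;
        }
        return (double) correct / assign.length;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10}};
        int[] labels = {0, 1, 0, 0, 1, 1};
        int[] assign = cluster(points, 2, 100);
        // prints [0, 1, 0, 0, 1, 1] accuracy=1.0
        System.out.println(Arrays.toString(assign) + " accuracy=" + majorityAccuracy(assign, labels, 2, 2));
    }
}
```

Seeding with the first k points keeps the sketch deterministic; in practice random or k-means++ seeding is usually preferred, because Lloyd's iteration only converges to a local optimum that depends on the starting centroids.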

2.0 Related Works

2.1 Clustering analysis

Clustering divides the samples of a specific data set into different categories according to a certain criterion (often cosine similarity), so that the class attribute or individual...
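The similarity criterion mentioned above can be made concrete. This Java sketch computes cosine similarity, cos(a, b) = (a · b) / (|a| |b|), which is 1 for vectors pointing the same way, 0 for orthogonal vectors, and -1 for opposite ones. The class name `CosineSimilarity` is hypothetical. Note that standard K-Means minimizes squared Euclidean distance rather than cosine similarity; a clustering method built on cosine similarity can use 1 - cos(a, b) as its distance.

```java
/** Hypothetical helper computing cosine similarity between two feature vectors. */
final class CosineSimilarity {

    /** Returns (a . b) / (|a| |b|); rejects mismatched lengths and zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("a zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {1, 0};
        double[] y = {0, 1};
        double[] z = {2, 0};
        System.out.println(cosine(x, y)); // prints 0.0 (orthogonal)
        System.out.println(cosine(x, z)); // prints 1.0 (same direction, different length)
    }
}
```

Because cosine similarity ignores vector length, `x` and `z` count as identical here even though their Euclidean distance is 1.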
