DATA CLUSTERING

K-MEANS CLUSTERING IN THE CONTEXT OF REAL-WORLD DATA CLUSTERING
ADEWALE O. MAKO

DATA MINING
INTRODUCTION:
Data mining is the analysis step of knowledge discovery in databases, a field at the intersection of computer science and statistics. It can also be defined as the analysis of large observational datasets to find unsuspected relationships. This definition refers to observational data as opposed to experimental data.
Data mining typically deals with data that has already been collected for some purpose other than the data mining analysis itself. For this reason it is often referred to as ‘secondary data analysis’.
The overall goal of the data mining process is to extract information from a dataset and transform it into an understandable structure for further use.

SCORE FUNCTIONS IN DATA MINING
A score function is a measure of one’s performance when making decisions under uncertainty. In data mining, its purpose is to rank models according to how useful they are to the data miner. A chosen score function should reflect the overall goals of the data mining task as far as possible. Different score functions have different properties and are useful in different situations, so one should avoid choosing a score function merely because it is convenient: it will most likely be inappropriate for the task at hand.
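To make the idea of ranking models concrete, here is a minimal sketch in Python. It assumes the within-cluster sum of squared errors (SSE) as the score function. That choice is an assumption made for illustration, since the text does not name a particular score function. The names `sse_score`, `model_a` and `model_b` and the toy points are likewise hypothetical.

```python
from typing import Sequence

Point = Sequence[float]

def centroid(points: Sequence[Point]) -> list[float]:
    """Mean of a non-empty group of points, coordinate by coordinate."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def sse_score(clusters: Sequence[Sequence[Point]]) -> float:
    """Sum of squared distances from each point to its cluster centroid.

    Lower is better under this (assumed) score function.
    """
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        for p in cluster:
            total += sum((x - m) ** 2 for x, m in zip(p, c))
    return total

# Two candidate models (groupings) of the same four toy points.
model_a = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)]]
model_b = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 2.0), (9.0, 8.0)]]

scores = {"model_a": sse_score(model_a), "model_b": sse_score(model_b)}
# Rank the models from best (lowest SSE) to worst.
ranking = sorted(scores, key=scores.get)
print(scores, ranking)
```

SSE also illustrates the warning about convenient score functions: it tends to fall as more clusters are allowed, so it is only a fair ranking between models with the same number of clusters.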

CLUSTERING
Clustering is one of the most important unsupervised learning techniques. Like every other problem of this kind, it deals with finding structure in a collection of unlabelled data.
Clustering is in the eye of the beholder. In other words, there is no single correct clustering algorithm.
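As one concrete illustration of finding structure in unlabelled data, the sketch below runs k-means, the algorithm named in the title, on a handful of toy points. It is a minimal sketch and not a definitive implementation. It assumes Euclidean distance and takes the starting centroids as an argument. The function name `kmeans`, its parameters and the sample data are hypothetical.

```python
from typing import Sequence

Point = tuple[float, ...]

def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points: Sequence[Point],
           initial_centroids: Sequence[Point],
           max_iter: int = 100):
    """Return (centroids, labels) after alternating assignment and update steps."""
    centroids = list(initial_centroids)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels

# Unlabelled toy data with two visually separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, initial_centroids=[data[0], data[3]])
print(labels)
```

The result depends on the starting centroids and on the number of clusters requested, which is one practical sense in which no single clustering is the correct one.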
We can describe clustering as the process of organizing objects into groups whose members are similar in some particular way. In other words, a cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. An advantage of clustering is

References:
Allan, T. (2012). Evaluating Clusters. CS3002 Artificial Intelligence, Brunel University.
Chen, C. (1999). Information Visualisation and Virtual Environments. Kent: Gray Publishing. p61-82.
Hand et al. (2001). Score Functions for Data Mining Algorithms. In: Principles of Data Mining. 4th ed. Massachusetts: MIT Press.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory. 28 (2), p129–137.
Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann Publishers.
