Non-Hierarchical Cluster Analysis

  • November 9, 2011

Non-hierarchical cluster analysis (often called the k-means clustering method) partitions a set of units into a predetermined number of groups, using an iterative algorithm that optimizes a chosen criterion. Starting from an initial classification, units are transferred from one group to another, or swapped with units from other groups, until no further improvement can be made to the criterion value. There is no guarantee that the resulting solution is globally optimal: starting from a different initial classification sometimes yields a better classification. However, starting from a good initial classification greatly increases the chances of reaching an optimal or near-optimal solution.

(source: http://www.asreml.com/products/genstat/mva/NonHierarchicalClusterAnalysis.htm)
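The transfer procedure described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: the criterion is taken to be the within-group sum of squared distances to the group means, only single-unit transfers are tried (swaps are omitted for brevity), and the function names `within_ss` and `improve_by_transfers` are hypothetical.

```python
def within_ss(points, labels, k):
    """Within-group sum of squared distances to the group means (assumed criterion)."""
    total = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        if not members:
            continue  # an empty group contributes nothing
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total


def improve_by_transfers(points, labels, k):
    """Transfer one unit at a time while the criterion strictly decreases."""
    labels = list(labels)
    best = within_ss(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            current = labels[i]
            for g in range(k):
                if g == current:
                    continue
                labels[i] = g  # tentatively move unit i to group g
                value = within_ss(points, labels, k)
                if value < best - 1e-12:
                    best, current, improved = value, g, True
                labels[i] = current  # keep the move only if it helped
    return labels, best


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
# Deliberately poor initial classification: the two obvious groups are mixed.
labels, best = improve_by_transfers(points, [0, 1, 0, 1, 0, 1], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this small data set the poor starting classification still reaches the obvious two-group split. On harder data the loop can stop at a local optimum, which is why trying several initial classifications and keeping the one with the lowest criterion value can help.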

The algorithm is called k-means, where k is the number of clusters you want, because each case is assigned to the cluster whose mean is closest to it. The action in the algorithm centers on finding the k means. You start out with an initial set of means and classify cases based on their distances to these centers. Next, you compute the cluster means again, using the cases that are assigned to each cluster; then, you reclassify all cases based on the new set of means. You keep repeating this step until the cluster means don’t change much between successive steps. Finally, you calculate the means of the clusters once again and assign the cases to their permanent clusters.

(source: http://www.norusis.com/pdf/SPC_v13.pdf)
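The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by any particular package: distances are Euclidean, "don't change much" is read as the largest shift of any mean falling to or below a tolerance `tol`, an empty cluster keeps its previous mean, and the names `assign`, `update` and `k_means` are hypothetical.

```python
import math


def assign(points, means):
    """Label each point with the index of its nearest mean (Euclidean distance)."""
    return [min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            for p in points]


def update(points, labels, means):
    """Recompute each cluster mean from the cases currently assigned to it."""
    new_means = []
    for j, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_means.append(old)  # assumption: an empty cluster keeps its old mean
    return new_means


def k_means(points, initial_means, tol=1e-9, max_iter=100):
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iter):
        labels = assign(points, means)             # classify cases
        new_means = update(points, labels, means)  # recompute cluster means
        shift = max(math.dist(a, b) for a, b in zip(means, new_means))
        means = new_means
        if shift <= tol:                           # means no longer change
            break
    return means, assign(points, means)            # final, permanent assignment


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
means, labels = k_means(points, initial_means=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the initial means are passed in by the caller, so the result depends on that choice, just as the text notes for the initial classification.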

Steps in Non-Hierarchical Cluster Analysis

In this method, the desired number of clusters is specified in advance and the ’best’ solution is chosen. The steps in such a method are as follows:

1. Choose initial cluster centers (essentially this is a set of observations that are far apart — each subject forms a cluster of one and its center is the value of the...