At the outset I would like to express my gratitude to Dr. Subir Kumar Sarkar, H.O.D., Electronics and Communication Engineering, Jadavpur University, Kolkata, for providing all the resources and amenities required throughout the life of the project. I would also like to thank Prof. Ananda Sankar Chowdhary, whose expert supervision, professional guidance and valuable insights on the subject helped me greatly in collecting the information. I am equally thankful to all my friends who helped me collect the required data and plan the report. Throughout the writing of this report my parents stood by me, and I am extremely grateful to them.
Hierarchical clustering algorithms follow a different philosophy from the other algorithms. Specifically, instead of producing a single clustering, they produce a hierarchy of clusterings. Algorithms of this kind are commonly found in the social sciences and in biological taxonomy. They have also been used in many other fields, including modern biology, medicine, and archaeology. In this section, the hierarchical clustering algorithm is applied to a large data set.
Clustering is one of the most primitive mental activities of humans, used to handle the huge amount of information they receive every day. Processing every piece of information as a single entity would be impossible. Thus humans tend to categorize entities into clusters, and each cluster is then characterized by the common attributes of the entities it contains.
TYPES OF CLUSTERING ALGORITHMS :-
1. Sequential Algorithm.
2. Hierarchical Clustering Algorithm.
3. Clustering Algorithm Based On Cost Function Optimization.
1. Sequential algorithms :-
These algorithms produce a single clustering. They are quite straightforward and fast methods. In most of them, all the feature vectors are presented to the algorithm once or a few times.
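To make the idea concrete, below is a minimal Python sketch of one well-known sequential scheme, the basic sequential algorithmic scheme (BSAS). The text above does not name a specific sequential algorithm. The choice of BSAS, the Euclidean distance, the mean vector used as cluster representative, and the hypothetical names `bsas`, `theta` (dissimilarity threshold) and `q` (maximum number of clusters) are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bsas(vectors, theta, q):
    """Basic sequential algorithmic scheme (illustrative sketch).

    Each vector is presented exactly once, in order. It starts a new cluster
    if it is farther than `theta` from every current representative and fewer
    than `q` clusters exist; otherwise it joins the nearest cluster.
    Returns one cluster label per input vector.
    """
    means = []   # current cluster representatives (running mean vectors)
    counts = []  # number of vectors assigned to each cluster
    labels = []
    for x in vectors:
        if not means:
            # The first vector always founds the first cluster.
            means.append(list(x))
            counts.append(1)
            labels.append(0)
            continue
        dists = [euclidean(x, m) for m in means]
        k = min(range(len(means)), key=lambda i: dists[i])
        if dists[k] > theta and len(means) < q:
            # Too far from every cluster and room for another: open a new one.
            means.append(list(x))
            counts.append(1)
            labels.append(len(means) - 1)
        else:
            # Join the nearest cluster and update its mean incrementally.
            counts[k] += 1
            means[k] = [m + (xi - m) / counts[k] for m, xi in zip(means[k], x)]
            labels.append(k)
    return labels
```

For example, `bsas([(0, 0), (0.5, 0), (5, 5), (5.2, 4.9)], theta=2.0, q=10)` returns `[0, 0, 1, 1]`. Because each vector is seen only once, the result can change if the vectors are presented in a different order. That sensitivity is the price paid for the speed of this family.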
2. Hierarchical Clustering Algorithm :-
Hierarchical clustering organizes the data into large groups, which contain smaller groups, and so on. This hierarchy can be represented as a tree, or dendrogram.
►Agglomerative algorithms:- These algorithms produce a sequence of clusterings in which the number of clusters, m, decreases at each step. The clustering produced at each step results from the previous one by merging two clusters into one. The main representatives of the agglomerative algorithms are the single link and complete link algorithms.
►Divisive algorithms:- These algorithms act in the opposite direction; that is, they produce a sequence of clusterings in which m increases at each step. The clustering produced at each step results from the previous one by splitting a single cluster into two.
Figure 1: Agglomerative and divisive hierarchical clustering.
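The agglomerative procedure can be sketched in a few lines of Python. This is a naive illustration that assumes Euclidean distance between points and uses a hypothetical function `agglomerative`. Its cost grows roughly with the cube of the number of points, so it is not meant for the large data sets mentioned earlier. Single link measures the distance between two clusters by their closest pair of points, and complete link by their farthest pair.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering (illustrative sketch).

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until a single cluster remains. Returns the list of
    clusterings, one per level of the dendrogram, each a list of frozensets
    of point indices.
    """
    if linkage == "single":
        pick = min  # single link: distance of the closest pair of points
    elif linkage == "complete":
        pick = max  # complete link: distance of the farthest pair of points
    else:
        raise ValueError("linkage must be 'single' or 'complete'")

    def cluster_dist(pair):
        a, b = pair
        return pick(euclidean(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    history = [list(clusters)]
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=cluster_dist)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append(list(clusters))
    return history
```

With `pts = [(0.1,), (1.0,), (2.0,), (3.5,)]`, the two-cluster level (`history[2]`) is {0, 1, 2} and {3} under single link, but {0, 1} and {2, 3} under complete link. This shows how single link tends to chain nearby points together while complete link favours compact clusters.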
3. Clustering algorithms based on cost function optimization:- This category contains algorithms in which the “sensibility” of a clustering is quantified by a cost function, J, in terms of which the clustering is evaluated. Usually, the number of clusters m is kept fixed. Most of these algorithms use concepts from differential calculus and produce successive clusterings while trying to optimize J. They terminate when a local optimum of J is reached. Algorithms of this category are also called iterative function optimization schemes.
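As a worked example of how differential calculus enters, consider one common choice of J for hard clustering: the sum of squared Euclidean distances between each vector and the representative of its cluster. The notation is assumed for illustration and is not fixed by the text. Here x_i (i = 1, …, N) are the feature vectors, θ_j is the representative of cluster C_j, and u_ij is 1 if x_i belongs to C_j and 0 otherwise.

```latex
$$
J(\Theta, U) = \sum_{i=1}^{N} \sum_{j=1}^{m} u_{ij}\, \lVert \mathbf{x}_i - \boldsymbol{\theta}_j \rVert^2,
\qquad u_{ij} \in \{0, 1\}, \quad \sum_{j=1}^{m} u_{ij} = 1 .
$$

% For fixed memberships U, set the gradient with respect to each representative to zero:
$$
\frac{\partial J}{\partial \boldsymbol{\theta}_j}
= -2 \sum_{i=1}^{N} u_{ij} \left( \mathbf{x}_i - \boldsymbol{\theta}_j \right) = \mathbf{0}
\;\Longrightarrow\;
\boldsymbol{\theta}_j = \frac{\sum_{i=1}^{N} u_{ij}\, \mathbf{x}_i}{\sum_{i=1}^{N} u_{ij}} .
$$
```

So each representative becomes the mean of its cluster's vectors. Alternating this update with reassigning every vector to its nearest representative never increases J, which is why such schemes settle at a local optimum rather than a guaranteed global one.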
► Hard or crisp clustering algorithms:- In these algorithms each vector belongs exclusively to a single cluster. The assignment of the vectors to individual clusters is carried out optimally, according to the adopted optimality criterion. The most famous algorithm of this category is the Isodata, or Lloyd, algorithm.
► Probabilistic clustering algorithms:- Probabilistic clustering algorithms are a special type of hard clustering algorithm that follow Bayesian classification arguments: each vector x is assigned to the cluster Ci for which the a posteriori probability P(Ci|x) is maximum....
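A Lloyd-style iteration for the squared-error cost can be sketched in Python as follows. This sketch assumes Euclidean distance, a fixed number of clusters equal to the number of initial representatives, and initial representatives chosen by the caller. The names `lloyd`, `init_reps` and `max_iter` are hypothetical. This is the basic k-means form; some variants that go by the Isodata name also split and merge clusters, which is omitted here.

```python
import math


def lloyd(vectors, init_reps, max_iter=100):
    """Lloyd-style hard clustering (illustrative sketch).

    Alternates two steps until the assignments stop changing:
    1. assign every vector exclusively to its nearest representative;
    2. move each representative to the mean of the vectors assigned to it.
    Neither step increases the squared-error cost J, so the loop stops at a
    local optimum of J (or after max_iter rounds).
    Returns (labels, representatives).
    """
    reps = [list(r) for r in init_reps]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest representative by Euclidean distance.
        new_labels = [
            min(range(len(reps)), key=lambda j: math.dist(x, reps[j]))
            for x in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
        # Update step: each representative becomes the mean of its members.
        for j in range(len(reps)):
            members = [x for x, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous representative
                reps[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, reps
```

For example, `lloyd([(0, 0), (0, 1), (10, 10), (10, 11)], init_reps=[(0, 0), (10, 10)])` returns `([0, 0, 1, 1], [[0.0, 0.5], [10.0, 10.5]])`. Because only a local optimum is guaranteed, different initial representatives can lead to different final clusterings.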