What is Hierarchical Clustering?
Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. It is one of the oldest clustering techniques and comes in two forms: agglomerative (bottom-up) and divisive (top-down). This tutorial covers the agglomerative form, in which every point starts as its own cluster and, at each step, the two closest clusters are merged until only one cluster remains.
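The merge loop described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation. It assumes Euclidean distance between points and takes the minimum pairwise distance as the distance between two clusters. The function names and the sample points are hypothetical.

```python
import math
from itertools import combinations

def cluster_distance(c1, c2):
    """Assumed choice: smallest Euclidean distance between a point in c1 and a point in c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points):
    """Repeatedly merge the two closest clusters and return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Pick the pair of cluster indices with the smallest cluster distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = cluster_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical sample data
merges = agglomerate(points)
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.2f}")
```

With n points the loop always performs n - 1 merges. The merge history is the information a dendrogram is drawn from.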
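Linkage Methods in Hierarchical Clustering
Which two clusters are merged at each step depends on how the distance between clusters is defined. Common choices are: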
- Single Linkage Clustering: The distance between two clusters is the minimum distance between a point in one cluster and a point in the other.
- Complete Linkage Clustering: The distance between two clusters is the maximum distance between a point in one cluster and a point in the other.
- Average Linkage Clustering: The distance between two clusters is the average distance over all pairs consisting of one point from each cluster.
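The three definitions above differ only in how they summarize the pairwise distances between the two clusters. The sketch below makes that explicit, assuming Euclidean distance between points. The function names and the two sample clusters are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

def pairwise_distances(c1, c2):
    """Euclidean distance for every pair made of one point from c1 and one from c2."""
    return [math.dist(p, q) for p, q in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pairwise_distances(c1, c2))

def complete_linkage(c1, c2):
    return max(pairwise_distances(c1, c2))

def average_linkage(c1, c2):
    distances = pairwise_distances(c1, c2)
    return sum(distances) / len(distances)

# Hypothetical clusters: two points on the line x = 0 and two on the line x = 3.
left = [(0, 0), (0, 2)]
right = [(3, 0), (3, 2)]

print("single  :", single_linkage(left, right))
print("complete:", complete_linkage(left, right))
print("average :", average_linkage(left, right))
```

For any pair of clusters, the single linkage distance is never larger than the average, and the average is never larger than the complete linkage distance.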
Applications of Hierarchical Clustering
Hierarchical clustering has a wide range of applications, including:
- Customer Segmentation: To identify groups of customers who purchase similar items.
- Image Segmentation: To segment images into different regions.
- Document Clustering: To group similar documents together.
Further Reading
For more information on hierarchical clustering, you can visit our Clustering Algorithms tutorial.
Example
Let's consider a simple example of hierarchical clustering with two points:
- Point A: (2, 3)
- Point B: (5, 7)
Each point starts as its own cluster, so there is only one pair of points to compare. The Euclidean distance between them is:
d(A, B) = sqrt((2 - 5)^2 + (3 - 7)^2) = sqrt(9 + 16) = sqrt(25) = 5
Using single linkage clustering, the distance between the two clusters is the minimum over this single pair, which is 5.
Using complete linkage clustering, it is the maximum over this single pair, which is also 5.
Using average linkage clustering, it is the average over this single pair, which is again 5.
In this example, all three methods produce the same distance of 5, because each cluster contains a single point. The linkage methods differ only once at least one of the clusters contains more than one point, since there are then several pairwise distances to take the minimum, maximum, or average of.
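As a quick check of the linkage definitions on points A and B, the sketch below computes each linkage distance directly from its definition, assuming Euclidean distance. With one point per cluster there is only one pairwise distance, so the minimum, maximum, and mean coincide. To show when the methods diverge, the sketch also adds a third point C to B's cluster. C is hypothetical and not part of the example above, as are the helper names.

```python
import math
from statistics import mean

# Each linkage method is a different summary of the same pairwise distances.
LINKAGES = {"single": min, "complete": max, "average": mean}

def linkage_distance(c1, c2, method):
    """Distance between clusters c1 and c2 under the named linkage method."""
    distances = [math.dist(p, q) for p in c1 for q in c2]
    return LINKAGES[method](distances)

A, B = (2, 3), (5, 7)
C = (6, 7)  # hypothetical extra point, added only to make the methods differ

singleton = {m: linkage_distance([A], [B], m) for m in LINKAGES}
with_c = {m: linkage_distance([A], [B, C], m) for m in LINKAGES}

print("A vs {B}:   ", singleton)
print("A vs {B, C}:", with_c)
```

In the second comparison the single linkage distance stays at d(A, B), the complete linkage distance becomes d(A, C), and the average falls between the two.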