What is Hierarchical Clustering?

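Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. It is one of the oldest clustering techniques and comes in two forms: agglomerative (bottom-up) and divisive (top-down). Agglomerative clustering starts with each point in its own cluster and, at each step, merges the two closest clusters until only one cluster remains. Divisive clustering works in the opposite direction: it starts with all points in a single cluster and recursively splits it. This tutorial focuses on the agglomerative form.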
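
The agglomerative procedure can be sketched in a few lines of Python. This is a minimal, naive sketch that assumes points are coordinate tuples, uses Euclidean distance, and uses single linkage (defined in the next section) as the merge criterion. The function names `single_linkage` and `agglomerate` are hypothetical, chosen for illustration. The search over all cluster pairs at every step makes it much slower than library implementations.

```python
import math
from itertools import combinations


def single_linkage(c1, c2):
    """Minimum Euclidean distance between a point in c1 and a point in c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerate(points, linkage=single_linkage):
    """Naive agglomerative clustering (hypothetical helper).

    Starts with every point in its own cluster and repeatedly merges the
    two closest clusters until one cluster remains. Returns the merges as
    (cluster_a, cluster_b, distance) tuples, in the order they happened.
    """
    clusters = [[p] for p in points]  # each point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # combinations() yields i < j, so delete j first to keep i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges


for a, b, d in agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)]):
    print(a, "+", b, "at distance", round(d, 3))
```

With four points, the loop performs three merges: the two nearby pairs merge first (at distances 1 and 2), and the two resulting clusters merge last.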

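Linkage Criteria in Hierarchical Clustering

At each step, agglomerative clustering merges the two clusters that are closest to each other. The linkage criterion defines how the distance between two clusters is computed from the distances between their points:
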
  1. Single Linkage Clustering: The distance between two clusters is the minimum distance between a point in one cluster and a point in the other.
  2. Complete Linkage Clustering: The distance between two clusters is the maximum distance between a point in one cluster and a point in the other.
  3. Average Linkage Clustering: The distance between two clusters is the average distance over all pairs of points, with one point taken from each cluster.
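
The three criteria can be sketched in Python as follows, assuming points are coordinate tuples and Euclidean distance (`math.dist`) as the point-to-point metric. The helper names are hypothetical. For a two-point cluster {(0, 0), (0, 4)} and a one-point cluster {(3, 0)}, the pairwise distances are 3 and 5, so the three criteria give three different values.

```python
import math
from statistics import mean


def pairwise_distances(c1, c2):
    """Euclidean distances between every point in c1 and every point in c2."""
    return [math.dist(p, q) for p in c1 for q in c2]


def single_linkage(c1, c2):
    return min(pairwise_distances(c1, c2))   # closest pair of points


def complete_linkage(c1, c2):
    return max(pairwise_distances(c1, c2))   # farthest pair of points


def average_linkage(c1, c2):
    return mean(pairwise_distances(c1, c2))  # mean over all pairs


a = [(0, 0), (0, 4)]
b = [(3, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
# 3.0 5.0 4.0
```

Any of these functions could be passed as the merge criterion of an agglomerative loop; only the rule for comparing clusters changes.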

Applications of Hierarchical Clustering

Hierarchical clustering has a wide range of applications, including:

  • Market Basket Analysis: To identify groups of customers who purchase similar items.
  • Image Segmentation: To segment images into different regions.
  • Document Clustering: To group similar documents together.

Example

Let's consider a simple example of hierarchical clustering with two points:

  • Point A: (2, 3)
  • Point B: (5, 7)

Each point starts as its own cluster. Using Euclidean distance, the distance between the two points is:

d(A, B) = sqrt((2 - 5)^2 + (3 - 7)^2) = sqrt(9 + 16) = sqrt(25) = 5

Because each cluster contains only one point, there is exactly one pair of points to compare. The minimum (single linkage), the maximum (complete linkage), and the average (average linkage) of that single distance are therefore all the same value:

single(A, B) = complete(A, B) = average(A, B) = d(A, B) = 5

All three linkage criteria agree in this example. They only produce different values once at least one of the clusters contains more than one point, and at that stage the choice of linkage can change which clusters are merged first.
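
The example can be checked with a short Python sketch, assuming Euclidean distance. The helper `linkages` and the third point C = (5, 3) are hypothetical additions for illustration: with A and B as separate clusters, all three criteria return 5, but once A and B form one cluster and it is compared against C, the three criteria give different values.

```python
import math
from statistics import mean

A = (2, 3)
B = (5, 7)
C = (5, 3)  # hypothetical third point, not part of the original example


def linkages(c1, c2):
    """Return (single, complete, average) linkage between two clusters."""
    d = [math.dist(p, q) for p in c1 for q in c2]
    return min(d), max(d), mean(d)


# Singleton clusters: only one pair of points, so min, max and mean coincide.
print(linkages([A], [B]))     # (5.0, 5.0, 5.0)

# Cluster {A, B} vs {C}: d(A, C) = 3 and d(B, C) = 4, so the criteria differ.
print(linkages([A, B], [C]))  # (3.0, 4.0, 3.5)
```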