
Hierarchical Clustering

Overview

Direct Answer

Hierarchical clustering is an unsupervised learning method that organises data points into a nested tree structure, called a dendrogram, either by iteratively merging the most similar clusters (the agglomerative approach) or by recursively splitting heterogeneous clusters (the divisive approach). Unlike partitioning methods such as K-means, it does not require the number of clusters to be specified in advance; that number can be chosen afterwards from the tree.

How It Works

Agglomerative hierarchical clustering begins with each data point as a singleton cluster, then repeatedly merges the two closest clusters until a single encompassing cluster remains. Closeness between clusters is defined by a linkage criterion: single linkage uses the minimum distance between any pair of points drawn from the two clusters, complete linkage uses the maximum such distance, and average linkage uses the mean. The sequence of merges forms a dendrogram that visualises cluster relationships at every granularity, and cutting the tree at a chosen height yields a flat clustering with the desired number of clusters.
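The merge loop described above can be sketched in a few lines of pure Python. This is a minimal, deliberately naive illustration rather than a production implementation: the function name `agglomerate`, the `LINKAGES` table, and the sample `points` are assumptions introduced for the example, and Euclidean distance is assumed as the point-to-point metric. Stopping the loop at `n_clusters` is equivalent to cutting the dendrogram at the level where that many clusters remain.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Linkage criteria: each reduces the list of pairwise point distances
# between two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(points, n_clusters=1, linkage="average"):
    """Naive agglomerative clustering (hypothetical helper for illustration).

    Returns (clusters, merges). Clusters are lists of point indices;
    merges lists (cluster_a, cluster_b, height) in the order performed,
    which is the information a dendrogram draws.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Examine every pair of current clusters and keep the closest one.
        for a, b in combinations(range(len(clusters)), 2):
            pairwise = [dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b]]
            height = combine(pairwise)
            if best is None or height < best[0]:
                best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        clusters[a] = clusters[a] + clusters[b]  # merged cluster replaces a
        del clusters[b]                          # b > a, so index a is unaffected
    return clusters, merges


# Illustrative data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

clusters, merges = agglomerate(points, n_clusters=2, linkage="average")
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 3))
```

Calling `agglomerate(points, n_clusters=1)` runs the process to completion and records all n - 1 merges, which is the full tree. The sketch recomputes every pairwise distance on each pass, so it is only suitable for small inputs.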

Why It Matters

Organisations value hierarchical clustering for exploratory data analysis because it reveals cluster structure without fixing the number of clusters in advance (although a distance measure and a linkage criterion must still be chosen), supports dendrogram-based decision-making, and applies across domains from genomics to customer segmentation. The interpretability of the dendrogram helps stakeholders validate cluster quality and understand relationships between groups.

Common Applications

Applications include biological taxonomy and gene expression analysis in bioinformatics, document organisation and text mining in information retrieval, customer segmentation in retail and finance, and ecological species classification. Dendrograms are widely used in phylogenetic analysis and hierarchical taxonomy construction.

Key Considerations

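Computational cost grows at least quadratically with dataset size: standard agglomerative algorithms store an O(n^2) pairwise distance matrix and take between O(n^2) and O(n^3) time, depending on the linkage and implementation, which makes the method impractical for very large datasets. Linkage choice significantly influences results, and greedy merging decisions are irreversible, so an early poor merge cannot be undone and may leave the algorithm with a suboptimal clustering.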
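As a small illustration of how much linkage choice can matter, the sketch below clusters the same one-dimensional values with single and complete linkage and obtains different two-cluster solutions. The helper `cluster_1d` and the `values` list are hypothetical, chosen so that the gaps between neighbours grow steadily (10, 11, 12, 13, 14). Single linkage chains neighbours together and isolates only the last point, while complete linkage keeps cluster diameters small and produces a more balanced split.

```python
from itertools import combinations


def cluster_1d(values, n_clusters, linkage):
    """Naive agglomerative clustering of numbers (illustrative helper)."""
    combine = {"single": min, "complete": max}[linkage]
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: combine(
                abs(x - y) for x in clusters[p[0]] for y in clusters[p[1]]
            ),
        )
        merged = clusters.pop(b)    # b > a, so index a stays valid
        clusters[a].extend(merged)  # the merge is never revisited
    return sorted(sorted(c) for c in clusters)


values = [0, 10, 21, 33, 46, 60]

single = cluster_1d(values, 2, "single")
complete = cluster_1d(values, 2, "complete")

print(single)    # [[0, 10, 21, 33, 46], [60]]
print(complete)  # [[0, 10, 21, 33], [46, 60]]
```

Neither result is wrong; each is the greedy optimum under its own definition of cluster distance, which is why the linkage should be chosen to match the cluster shapes expected in the data.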
