Overview
Direct Answer
Hierarchical clustering is an unsupervised learning method that organises data points into a nested tree structure (dendrogram) by iteratively merging similar clusters (agglomerative approach) or splitting heterogeneous clusters (divisive approach). Unlike partitioning methods such as K-means, it does not require pre-specifying the number of clusters.
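As a minimal sketch of this contrast with K-means (assuming scikit-learn is installed; the data points are hypothetical), the number of clusters can be left unspecified and recovered from the tree by choosing a distance threshold instead:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two tight pairs of points plus one distant outlier (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.2]])

# n_clusters=None with a distance_threshold stops merging once the closest
# pair of clusters is farther apart than the threshold -- no k required.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=2.0)
labels = model.fit_predict(X)
print(model.n_clusters_)  # number of clusters discovered from the tree
print(labels)             # the two tight pairs share labels; the outlier does not
```

Here the threshold, not a pre-specified k, determines how many clusters emerge: the two tight pairs merge well below 2.0, while the outlier never does.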
How It Works
Agglomerative hierarchical clustering begins with each data point as a singleton cluster, then sequentially merges the two closest clusters using a linkage criterion—such as single linkage (minimum distance), complete linkage (maximum distance), or average linkage (mean distance)—until a single encompassing cluster remains. The process generates a dendrogram that visualises cluster relationships at all granularities, allowing analysts to cut the tree at any height to obtain a desired number of clusters.
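The merge-then-cut procedure above can be sketched with SciPy (assumed installed; the data and cut height are illustrative choices, not prescriptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.2]])

# Each row of Z records one merge: the two cluster indices joined, the
# linkage distance at which they merged, and the new cluster's size.
Z = linkage(X, method="average")  # average linkage: mean pairwise distance

# "Cutting the tree at a height": every cluster whose internal merges all
# happened below distance 2.0 becomes one flat cluster.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would render the full tree.
```

Swapping `method="average"` for `"single"` or `"complete"` applies the other linkage criteria described above without changing the rest of the workflow.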
Why It Matters
Organisations value hierarchical clustering for exploratory data analysis because it reveals underlying cluster structure without requiring the number of clusters in advance, lets analysts choose a clustering granularity directly from the dendrogram, and applies naturally across domains from genomics to customer segmentation. The interpretability of the dendrogram helps stakeholders validate cluster quality and understand relationships between groups.
Common Applications
Applications include biological taxonomy and gene expression analysis in bioinformatics, document organisation and text mining in information retrieval, customer segmentation in retail and finance, and ecological species classification. Dendrograms are widely used in phylogenetic analysis and hierarchical taxonomy construction.
Key Considerations
Computational cost grows at least quadratically with dataset size (O(n²) memory for the pairwise-distance matrix, and typically O(n² log n) to O(n³) time depending on the linkage), making the method impractical for very large datasets. Linkage choice significantly influences results; for example, single linkage is prone to "chaining" elongated clusters together, while complete linkage favours compact clusters. Greedy merging decisions are also irreversible, potentially trapping the algorithm in suboptimal configurations.
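The sensitivity to linkage choice can be seen on a small hypothetical example (assuming SciPy is installed): on a "chain" of nearby 1-D points, single linkage links the whole chain into one cluster at a cut height that complete linkage splits into several.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# A chain of 1-D points with successive gaps of 1.0, 1.1, 1.2, 1.3 --
# every neighbour is "close", but the endpoints are far apart.
X = np.array([[0.0], [1.0], [2.1], [3.3], [4.6]])

single = fcluster(linkage(X, method="single"), t=1.5, criterion="distance")
complete = fcluster(linkage(X, method="complete"), t=1.5, criterion="distance")

print(len(set(single)))    # 1: single linkage chains the whole line together
print(len(set(complete)))  # 3: complete linkage breaks the chain apart
```

Neither result is "wrong"; they answer different questions about closeness, which is why the linkage criterion should be chosen to match the cluster shapes expected in the data.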