
Clustering

Overview

Direct Answer

Clustering is an unsupervised learning technique that partitions datasets into groups of similar data points without requiring predefined class labels. It identifies inherent patterns and structures within data by measuring similarity or distance between observations.

How It Works

Clustering algorithms compute similarity metrics (such as Euclidean distance or cosine similarity) between data points and iteratively assign observations to groups that minimise within-group variance or maximise between-group separation. Common approaches include centroid-based methods like K-means, density-based methods like DBSCAN, and hierarchical approaches that build dendrograms of nested partitions.
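The assign-then-update loop described above can be sketched as a minimal Lloyd's-algorithm K-means in NumPy (an illustrative implementation, not a production one; the function name and parameters are our own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids
```

Note how the loop directly encodes the sensitivity mentioned later: a different random initialisation can converge to a different local optimum, which is why practical implementations restart several times and keep the lowest-variance result.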

Why It Matters

Organisations use clustering to discover hidden customer segments, summarise large datasets into representative groups for downstream analysis, and identify anomalies without manual labelling costs. It enables data-driven decision-making in scenarios where ground truth is unavailable or expensive to obtain.

Common Applications

Applications include customer segmentation in retail and marketing, genomic sequence grouping in bioinformatics, document organisation in information retrieval, and anomaly detection in cybersecurity. It supports image segmentation in computer vision and helps identify disease subtypes in medical research.

Key Considerations

Practitioners must select appropriate distance metrics and algorithm families based on data geometry, as results are sensitive to initialisation and feature scaling. Determining the optimal number of clusters remains a fundamental challenge requiring domain expertise and validation metrics like silhouette scores.
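The silhouette score mentioned above compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), scoring (b − a) / max(a, b). A small NumPy sketch (assumes every cluster has at least two members; function name is our own):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient over all points: (b - a) / max(a, b)."""
    n = len(X)
    # Pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        # a: mean distance to the other points in the same cluster.
        a = D[i, same & (np.arange(n) != i)].mean()
        # b: smallest mean distance to the points of any other cluster.
        b = min(D[i, labels == lj].mean() for lj in set(labels.tolist())
                if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Scores near +1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping clusters or a poor choice of k. Running a candidate algorithm for several values of k and keeping the k with the highest mean silhouette is one common, if imperfect, heuristic.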

Cross-References

Machine Learning
