
K-Nearest Neighbours

Overview

Direct Answer

K-Nearest Neighbours (KNN) is a non-parametric, instance-based learning algorithm that classifies a data point by identifying the k closest training examples in feature space and assigning the majority class label among those neighbours. Unlike parametric models, it makes no assumptions about the underlying data distribution.

How It Works

The algorithm calculates distances (typically Euclidean or Manhattan) between a query point and all training samples, then selects the k nearest instances. Classification is determined by majority voting among these k neighbours; regression variants average their target values. The choice of distance metric and of k directly influences model behaviour and accuracy.
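The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function and variable names are invented for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Distance from the query to every training sample, sorted ascending.
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # "A"
```

Swapping `euclidean` for a Manhattan distance, or averaging numeric labels instead of voting, gives the regression variant described above.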

Why It Matters

KNN remains valuable for rapid prototyping and for problems with non-linear decision boundaries where linear assumptions fail. Its interpretability—decisions trace directly to the nearest examples—supports explainability requirements in regulated sectors. Performance depends heavily on feature scaling and neighbourhood size, but its simplicity makes it a standard baseline for comparison.
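The dependence on feature scaling can be made concrete with a small sketch. The (age, income) features below are illustrative values chosen so that the raw income scale swamps age in the Euclidean distance until both features are rescaled.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(points):
    # Rescale each feature column to [0, 1]; a common KNN preprocessing step.
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / s for v, lo, s in zip(p, lows, spans))
        for p in points
    ]

# (age, income): raw[1] shares the query's age band, raw[2] does not.
raw = [(30, 50_000), (32, 50_100), (60, 50_050)]
# Unscaled, the 60-year-old looks "closer" because income differences dominate.
print(euclidean(raw[0], raw[1]) > euclidean(raw[0], raw[2]))  # True

scaled = min_max_scale(raw)
# After scaling, the similar-aged point is the nearer neighbour.
print(euclidean(scaled[0], scaled[1]) < euclidean(scaled[0], scaled[2]))  # True
```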

Common Applications

The method is widely deployed in recommendation systems, medical diagnosis support (identifying similar patient cases), credit scoring, and image recognition. Collaborative filtering systems use distance-based neighbour selection to suggest content, whilst spatial analysis applications leverage its natural handling of geometric relationships.

Key Considerations

Computational cost scales linearly with training set size, since all distances must be calculated at prediction time, making the method impractical for massive datasets without optimisation techniques such as KD-trees or ball trees. The curse of dimensionality severely degrades performance in high-dimensional spaces, where distance metrics become less meaningful.
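The linear prediction cost is easy to see in code: even an efficient brute-force neighbour search must inspect every training point per query. The sketch below uses only the standard library; tree structures such as KD-trees and ball trees (provided by libraries like SciPy and scikit-learn) avoid this full scan in low-dimensional spaces.

```python
import heapq
import math

def k_nearest(train_X, query, k):
    # heapq.nsmallest still examines every point: O(n log k) per query.
    return heapq.nsmallest(
        k,
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )

# Synthetic 2-D training set of 1,000 points.
train_X = [(float(i), float(i % 7)) for i in range(1000)]
print(k_nearest(train_X, (500.0, 3.0), k=3))  # indices of the 3 nearest points
```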
