
Active Learning

Overview

Direct Answer

Active learning is a machine learning paradigm in which an algorithm selectively queries an oracle (typically a human annotator) to label the most informative unlabelled data points, rather than passively consuming a pre-labelled dataset. The aim is to reduce annotation effort whilst maintaining, or even improving, model performance.

How It Works

The algorithm trains on a small initial labelled set, then iteratively identifies which unlabelled samples would most reduce model uncertainty or error if annotated. Common selection strategies include uncertainty sampling (querying the samples whose predicted class distribution has the highest entropy), query-by-committee (querying the samples on which the members of an ensemble disagree most), and expected model change (querying the samples whose labels would most alter the current model). The newly labelled samples are added to the training set, the model is retrained, and the process repeats until a stopping criterion is met, such as an exhausted labelling budget or a plateau in validation performance.
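The loop described above can be sketched in plain Python. This is a minimal, hypothetical example rather than a standard API: the one-dimensional data, the toy nearest-centroid "model" (`fit_centroids`, `predict_proba`), the simulated `oracle`, and the use of a fixed labelling `budget` as the stopping criterion are all illustrative assumptions. Selection uses entropy-based uncertainty sampling.

```python
import math

# Hypothetical pool-based active learning sketch: 1-D toy data, a toy
# nearest-centroid classifier, and a simulated oracle. Not production code.

def entropy(probs):
    """Shannon entropy of a predictive distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fit_centroids(labelled):
    """Train the toy model: mean feature value for each class."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_proba(centroids, x):
    """Softmax over negative distances from x to each class centroid."""
    classes = sorted(centroids)
    scores = [-abs(x - centroids[c]) for c in classes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def active_learning_loop(labelled, pool, oracle, budget, batch_size=1):
    """Train, query the most uncertain pool points, add their labels, repeat.

    Stops when the labelling budget is spent or the pool is empty.
    """
    labelled, pool = list(labelled), list(pool)  # do not mutate caller's data
    queried = []
    while len(queried) < budget and pool:
        model = fit_centroids(labelled)
        # Uncertainty sampling: rank unlabelled points by predictive entropy.
        pool.sort(key=lambda x: entropy(predict_proba(model, x)), reverse=True)
        k = min(batch_size, budget - len(queried), len(pool))
        batch, pool = pool[:k], pool[k:]
        for x in batch:
            labelled.append((x, oracle(x)))  # ask the (simulated) oracle
            queried.append(x)
    return fit_centroids(labelled), queried

if __name__ == "__main__":
    seed = [(0.0, 0), (1.0, 1)]           # small initial labelled set
    pool = [i / 20 for i in range(1, 20)]  # unlabelled candidates
    model, queried = active_learning_loop(
        seed, pool, oracle=lambda x: int(x > 0.5), budget=3
    )
    print("queried:", queried)
    print("centroids:", model)
```

The first query here is the point midway between the two seed examples, where the toy model is least certain. Replacing the sort key with a committee disagreement score or an estimate of expected model change would give the other strategies named above; in practice the toy model would be replaced by a real classifier and queries would usually be made in batches.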

Why It Matters

Organisations face significant costs when acquiring expert labels, particularly in fields that require specialist knowledge, such as medical imaging, compliance review, or scientific research. Active learning can reduce labelling costs by 50–80 per cent relative to random sampling whilst achieving equivalent model accuracy, although the actual saving depends heavily on the task, the data, and the selection strategy. Needing fewer labels can shorten deployment timelines and reduce expenses for resource-constrained teams.
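One way to make the comparison with random sampling concrete, assuming every label costs the same, is to compare the number of labels each approach needs to reach the same target accuracy: n_AL for active learning and n_rand for random sampling. The figures in the example line are hypothetical.

```latex
\text{saving} = 1 - \frac{n_{\text{AL}}}{n_{\text{rand}}},
\qquad \text{e.g. } n_{\text{AL}} = 3\,000,\; n_{\text{rand}} = 10\,000
\;\Rightarrow\; \text{saving} = 1 - 0.3 = 0.7 \;(70\%)
```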

Common Applications

Applications include medical diagnosis systems where radiologist annotations are expensive, sentiment analysis in low-resource languages, anomaly detection in security systems, and biological sequence classification. Legal technology firms employ active learning to optimise document review workflows by prioritising uncertain cases for human review.

Key Considerations

The effectiveness of active learning depends heavily on the quality of the selection strategy and the availability of reliable oracles; poor query design can waste annotations. Active learning also complicates model validation: because the queried samples are deliberately non-random, performance measured on them is biased, so evaluation requires a separately drawn random sample. It may also underperform on highly imbalanced datasets or when the initial labelled sample is unrepresentative of the data.
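A common safeguard for the validation problem is to set aside a uniformly random evaluation set before any querying begins, so that it is never influenced by the selection strategy. The sketch below is a minimal, hypothetical helper (`split_holdout` and its parameters are illustrative names, not a library API); the held-out points would still need to be labelled by the oracle.

```python
import random

def split_holdout(pool, holdout_fraction=0.2, seed=0):
    """Split off a uniformly random evaluation set before active learning starts.

    Returns (remaining_pool, holdout). Points chosen later by the query
    strategy are non-random, so accuracy measured on them is biased; this
    up-front random sample gives an unbiased performance estimate instead.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be in (0, 1)")
    items = list(pool)              # copy so the caller's pool is untouched
    rng = random.Random(seed)       # seeded for reproducible splits
    rng.shuffle(items)
    n_holdout = max(1, round(len(items) * holdout_fraction))
    return items[n_holdout:], items[:n_holdout]

if __name__ == "__main__":
    remaining, holdout = split_holdout(range(100), holdout_fraction=0.2, seed=1)
    print(len(remaining), len(holdout))
```

Only `remaining` is then offered to the query strategy; `holdout` is labelled once and used purely for evaluation.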
