Overview
Direct Answer
Quantisation is the process of reducing the numerical precision of neural network parameters and activations from high-precision floating-point (typically 32-bit) to lower-bit integer or fixed-point representations (commonly 8-bit or lower). This compression technique directly decreases model size and computational requirements whilst maintaining acceptable inference accuracy.
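The mapping from floating-point to integer values can be sketched with a minimal symmetric int8 scheme (a common choice, shown here as an illustrative sketch rather than any particular library's implementation):

```python
import numpy as np

def quantise_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric linear quantisation of float32 weights to int8."""
    # The scale maps the largest absolute weight onto the int8 range [-127, 127].
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.52, -1.30, 0.07, 0.91], dtype=np.float32)
q, scale = quantise_int8(w)
w_hat = dequantise(q, scale)  # within half a quantisation step of w
```

Each stored value shrinks from 4 bytes to 1, at the cost of a rounding error bounded by half the scale.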
How It Works
The process maps continuous weight and activation values to a discrete set of representative values through scaling and rounding operations. Post-training quantisation applies this transformation after model training completes, whilst quantisation-aware training incorporates bit-width constraints during training itself. Calibration techniques determine optimal scaling factors by analysing the distribution of values across a representative calibration dataset, ensuring minimal information loss from the precision reduction.
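Calibration can be sketched as follows: scan representative batches to find the observed value range, then derive an affine (scale and zero-point) mapping onto the uint8 range. This is an illustrative min/max calibration sketch; real toolchains also offer percentile- and entropy-based calibrators.

```python
import numpy as np

def calibrate_minmax(batches):
    """Derive affine quantisation parameters from calibration batches.

    Finds the observed value range [lo, hi] and maps it onto [0, 255].
    """
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))  # integer code representing 0.0
    return scale, zero_point

def quantise_affine(x, scale, zero_point):
    return np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)

# Hypothetical calibration data standing in for real activations.
rng = np.random.default_rng(0)
batches = [rng.normal(0.0, 1.0, size=128).astype(np.float32) for _ in range(10)]
scale, zp = calibrate_minmax(batches)
q = quantise_affine(batches[0], scale, zp)
```

The asymmetric mapping matters for activations such as ReLU outputs, whose distributions are not centred at zero; a symmetric scheme would waste half the integer range on values that never occur.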
Why It Matters
Quantised models require significantly less memory, enabling deployment on resource-constrained devices such as mobile phones, embedded systems, and edge servers. The reduced computational complexity accelerates inference and lowers power consumption; both are critical factors for real-time applications and large-scale distributed inference, where bandwidth and latency directly impact operational costs.
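The memory savings follow directly from the per-parameter storage cost. As a rough illustration, for a hypothetical 7-billion-parameter model:

```python
# Storage needed for model weights at different precisions (illustrative figures).
params = 7_000_000_000
fp32_gb = params * 4 / 1e9    # 32-bit floats: 4 bytes per parameter
int8_gb = params * 1 / 1e9    # 8-bit integers: 1 byte per parameter
int4_gb = params * 0.5 / 1e9  # 4-bit integers: half a byte per parameter
print(fp32_gb, int8_gb, int4_gb)  # 28.0 7.0 3.5
```

An 8-bit model fits in a quarter of the memory of its 32-bit counterpart, which is often the difference between fitting on a single accelerator or edge device and not.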
Common Applications
Mobile computer vision applications rely heavily on quantised models for efficient object detection and image classification. Edge devices in IoT networks employ quantisation to run language models and recommendation systems locally. Automotive and robotics systems utilise quantised neural networks for real-time perception tasks within power budgets.
Key Considerations
Aggressive quantisation can degrade model accuracy, particularly for complex tasks requiring high precision. The relationship between bit-width reduction and performance degradation is non-linear and task-dependent, requiring empirical validation for each specific application.
More in Artificial Intelligence
AI Chip
Infrastructure & Operations: A semiconductor designed specifically for AI and machine learning computations, optimised for parallel processing and matrix operations.
Tool Use in AI
Prompting & Interaction: The capability of AI agents to invoke external tools, APIs, databases, and software applications to accomplish tasks beyond the model's intrinsic knowledge and abilities.
Sparse Attention
Models & Architecture: An attention mechanism that selectively computes relationships between a subset of input tokens rather than all pairs, reducing quadratic complexity in transformer models.
AI Inference
Training & Inference: The process of using a trained AI model to make predictions or decisions on new, unseen data.
AI Watermarking
Safety & Governance: Techniques for embedding imperceptible statistical patterns in AI-generated content to enable reliable detection and provenance tracking of synthetic outputs.
Inference Engine
Infrastructure & Operations: The component of an AI system that applies logical rules to a knowledge base to derive new information or make decisions.
Model Collapse
Models & Architecture: A degradation phenomenon where AI models trained on AI-generated data progressively lose diversity and accuracy, converging toward a narrow distribution of outputs.
Artificial Intelligence
Foundations & Theory: The simulation of human intelligence processes by computer systems, including learning, reasoning, and self-correction.