
Quantisation

Overview

Direct Answer

Quantisation is the process of reducing the numerical precision of neural network parameters and activations from high-precision floating-point (typically 32-bit) to lower-bit integer or fixed-point representations (commonly 8-bit or lower). This compression technique directly decreases model size and computational requirements whilst maintaining acceptable inference accuracy.

How It Works

The process maps continuous weight and activation values to a discrete set of representative values through scaling and rounding operations. Post-training quantisation applies this transformation after model training completes, whilst quantisation-aware training incorporates bit-width constraints during training itself. Calibration techniques determine optimal scaling factors by analysing the distribution of values over a representative calibration dataset, ensuring minimal information loss from the precision reduction.
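The scaling-and-rounding mapping described above can be sketched as follows. This is a minimal illustration using NumPy, not any particular library's API: the function names are hypothetical, and the min/max calibration is the simplest stand-in for the calibration step, assuming 8-bit asymmetric (affine) quantisation.

```python
import numpy as np

def quantise(x, num_bits=8):
    """Affine quantisation: map float values to unsigned integers.

    The scale and zero-point are calibrated from the observed min/max
    of x, a simple stand-in for the calibration step described above.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantise(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)
q, scale, zp = quantise(weights)
recovered = dequantise(q, scale, zp)
max_err = np.abs(weights - recovered).max()  # bounded by roughly one quantisation step
```

The asymmetric form shown handles skewed value ranges such as post-ReLU activations; a common simplification for weights is symmetric quantisation, which fixes the zero-point at 0.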

Why It Matters

Quantised models require significantly less memory, enabling deployment on resource-constrained devices such as mobile phones, embedded systems, and edge servers. The reduced computational complexity accelerates inference speed and decreases power consumption, critical factors for real-time applications and large-scale distributed inference where bandwidth and latency directly impact operational costs.

Common Applications

Mobile computer vision applications rely heavily on quantised models for efficient object detection and image classification. Edge devices in IoT networks employ quantisation to run language models and recommendation systems locally. Automotive and robotics systems utilise quantised neural networks for real-time perception tasks within power budgets.

Key Considerations

Aggressive quantisation can degrade model accuracy, particularly for complex tasks requiring high precision. The relationship between bit-width reduction and performance degradation is non-linear and task-dependent, requiring empirical validation for each specific application.
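The kind of empirical validation mentioned above can be as simple as sweeping bit-widths and measuring reconstruction error. The sketch below (assumed symmetric uniform quantisation on synthetic Gaussian weights, purely illustrative) shows the error growing sharply, not linearly, as bits are removed.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)

def quant_error(x, num_bits):
    """Mean absolute error after symmetric uniform quantisation at a given bit-width."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return float(np.abs(x - q * scale).mean())

# Error roughly doubles with every bit removed: the degradation is non-linear.
errors = {bits: quant_error(weights, bits) for bits in (8, 6, 4, 2)}
```

Real validation must of course measure task accuracy on held-out data rather than raw weight error, since different layers and tasks tolerate precision loss very differently.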

Cross-References

Deep Learning
Artificial Intelligence
