Overview
Direct Answer
A mini-batch is a small, fixed-size subset of training data used to compute a single gradient update during iterative optimisation. It represents a practical compromise between processing individual samples (stochastic gradient descent) and the entire dataset (batch gradient descent).
How It Works
During each training iteration, a mini-batch (typically 32 to 512 samples) is selected from the training dataset. The model computes predictions for all samples in the subset, calculates the loss across those samples, and backpropagates to produce a single gradient estimate. This aggregated gradient is used to update the model weights before the next mini-batch is processed.
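The loop described above can be sketched in a few lines. This is a minimal illustration using least-squares regression; the names (batch_size, lr) and all numbers are illustrative assumptions, not prescribed values.

```python
import numpy as np

# Synthetic regression problem: recover true_w from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # training inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
batch_size, lr = 32, 0.1                  # illustrative hyperparameters
for epoch in range(20):
    idx = rng.permutation(len(X))         # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        # one aggregated gradient estimate from the whole mini-batch
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(b)
        w -= lr * grad                    # single weight update per batch

print(np.round(w, 2))                     # close to true_w
```

Each inner iteration touches only batch_size samples, yet the estimated weights converge to the true ones because the mini-batch gradients are unbiased estimates of the full-dataset gradient.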
Why It Matters
Mini-batches enable efficient hardware utilisation by vectorising computations across multiple samples simultaneously, reducing training time substantially on GPUs and TPUs. They also provide more stable gradient estimates than single-sample updates, improving convergence behaviour and final model accuracy whilst maintaining computational feasibility for large datasets.
Common Applications
Mini-batch training is standard in deep learning frameworks across computer vision (image classification), natural language processing (transformer model training), and recommender systems. It is the default training regime for neural networks in both research and production machine learning pipelines.
Key Considerations
The choice of batch size introduces a hyperparameter tuning requirement; larger batches reduce noise but may converge to sharper minima, whilst smaller batches provide regularisation effects but increase training iterations. Memory constraints and hardware availability often dictate practical batch size limits.
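The practical trade-off between update count and memory can be made concrete with back-of-envelope arithmetic. The dataset size and per-sample activation cost below are made-up numbers for illustration only.

```python
import math

n_samples = 50_000                   # assumed dataset size
bytes_per_sample = 4 * 1024          # assumed activation memory per sample

for batch_size in (32, 128, 512):
    # smaller batches -> more weight updates per epoch
    iters_per_epoch = math.ceil(n_samples / batch_size)
    # larger batches -> more activation memory held at once
    batch_memory_mib = batch_size * bytes_per_sample / 2**20
    print(f"batch={batch_size:4d}: {iters_per_epoch:5d} updates/epoch, "
          f"~{batch_memory_mib:.1f} MiB of activations per batch")
```

Quadrupling the batch size cuts the number of updates per epoch by four while quadrupling per-batch memory, which is why hardware limits often set the upper bound in practice.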