Machine Learning · Training Techniques

Mini-Batch

Overview

Direct Answer

A mini-batch is a small, fixed-size subset of training data used to compute a single gradient update during iterative optimisation. It represents a practical compromise between processing individual samples (stochastic gradient descent) and the entire dataset (batch gradient descent).

How It Works

During each training iteration, a mini-batch of typically 32 to 512 samples is selected from the training dataset. The model computes predictions for all samples in the subset, calculates the loss across those samples, and backpropagates to produce a single gradient estimate. This aggregated gradient is used to update model weights before the next mini-batch is processed.
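The loop described above can be sketched in a few lines. This is a minimal NumPy illustration, not a framework implementation: the synthetic linear-regression data, batch size of 32, and learning rate are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + noise.
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=1000)

w = np.zeros(1)      # single model weight
batch_size = 32      # typical mini-batch size
lr = 0.1             # learning rate (illustrative)

for epoch in range(20):
    perm = rng.permutation(len(X))                    # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]          # select one mini-batch
        Xb, yb = X[idx], y[idx]
        pred = Xb @ w                                 # predictions for all samples in the batch
        grad = 2 * Xb.T @ (pred - yb) / len(idx)      # single aggregated gradient estimate
        w -= lr * grad                                # one weight update per mini-batch
```

Note that the weights are updated once per mini-batch, not once per sample or once per epoch; that is the defining feature of the technique.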

Why It Matters

Mini-batches enable efficient hardware utilisation by vectorising computations across multiple samples simultaneously, reducing training time substantially on GPUs and TPUs. They also provide more stable gradient estimates than single-sample updates, improving convergence behaviour and final model accuracy whilst maintaining computational feasibility for large datasets.
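The vectorisation point can be made concrete: the aggregated mini-batch gradient is mathematically identical to averaging per-sample gradients, but it is computed in one matrix operation rather than a Python loop. A small sketch, using hypothetical random data and a squared-error loss for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 10))   # one mini-batch of 64 samples, 10 features
y = rng.normal(size=64)
w = rng.normal(size=10)

# Per-sample gradients of squared error, averaged one at a time (slow path).
per_sample = np.mean(
    [2 * (xi @ w - yi) * xi for xi, yi in zip(X, y)], axis=0
)

# Same quantity as a single vectorised matrix computation (fast path).
batched = 2 * X.T @ (X @ w - y) / len(X)

assert np.allclose(per_sample, batched)
```

On GPUs and TPUs the fast path maps onto dense matrix hardware, which is where the training-time reduction comes from.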

Common Applications

Mini-batch training is standard in deep learning frameworks across computer vision (image classification), natural language processing (transformer model training), and recommender systems. It is universally employed in production machine learning pipelines for neural networks, whether in research institutions or enterprise deployments.

Key Considerations

The choice of batch size introduces a hyperparameter tuning requirement; larger batches reduce noise but may converge to sharper minima, whilst smaller batches provide regularisation effects but increase training iterations. Memory constraints and hardware availability often dictate practical batch size limits.
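The noise-versus-batch-size trade-off can be observed directly: the standard deviation of the mini-batch gradient estimate shrinks as the batch grows. A sketch using a hypothetical linear model and squared-error loss (the dataset, batch sizes, and trial count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10000, 5))
y = X @ np.ones(5) + rng.normal(scale=1.0, size=10000)
w = np.zeros(5)  # evaluate gradient noise at a fixed point in weight space

def grad_std(batch_size, trials=200):
    """Std of the first gradient component across random mini-batches."""
    grads = []
    for _ in range(trials):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        g = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        grads.append(g[0])
    return np.std(grads)

# Larger batches yield lower-variance (less noisy) gradient estimates.
assert grad_std(256) < grad_std(16)
```

In theory the gradient variance scales roughly as 1/batch_size, which is why quadrupling the batch only halves the noise; memory and hardware limits usually cap the batch size well before the noise reduction stops paying off.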
