
Stochastic Gradient Descent

Overview

Direct Answer

Stochastic Gradient Descent (SGD) is an optimisation algorithm that updates model parameters using the gradient computed from a single training example or small batch, rather than the entire dataset. This stochastic approach to parameter adjustment trades some convergence certainty for computational efficiency and faster iteration cycles.
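The update can be written compactly; here, as a sketch of standard notation not given in the entry itself, $w_t$ denotes the parameters at step $t$, $\eta$ the learning rate, and $\ell$ the loss on the randomly drawn example (or mini-batch) with index $i_t$:

```latex
w_{t+1} = w_t - \eta \, \nabla_w \, \ell\bigl(w_t;\, x_{i_t}, y_{i_t}\bigr)
```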

How It Works

At each iteration, SGD samples a single instance or mini-batch randomly from the training set, computes the loss gradient with respect to that sample, and adjusts parameters in the direction opposite to the gradient by a step size called the learning rate. The stochastic nature—randomness in sample selection—introduces noise into the parameter trajectory, which can help escape local minima and reduce memory requirements compared to full-batch methods.
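The loop described above can be sketched in a few lines. This is a minimal illustration using least-squares linear regression as the loss; the function and variable names are chosen for this example, not taken from any particular framework:

```python
import numpy as np

def sgd_step(w, x_batch, y_batch, lr):
    """One SGD step for least-squares linear regression.

    Computes the mean-squared-error gradient over the mini-batch and
    moves the parameters opposite to it, scaled by the learning rate.
    """
    preds = x_batch @ w
    grad = 2 * x_batch.T @ (preds - y_batch) / len(y_batch)
    return w - lr * grad

# Synthetic data for the sketch: 1000 points, 3 features, known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
for epoch in range(50):
    idx = rng.permutation(len(X))        # random sample order each epoch
    for start in range(0, len(X), 32):   # mini-batches of 32
        batch = idx[start:start + 32]
        w = sgd_step(w, X[batch], y[batch], lr=0.05)
```

Note that each step touches only 32 examples, which is what keeps the memory footprint small regardless of dataset size.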

Why It Matters

SGD enables training on datasets too large to fit in memory and reduces wall-clock time per iteration significantly, making it essential for modern deep learning at scale. The noise-induced exploration properties often lead to better generalisation on unseen data, whilst the reduced computational footprint per step allows practitioners to iterate on model design rapidly.

Common Applications

SGD is the foundation for training neural networks across computer vision, natural language processing, and recommendation systems. It underpins backpropagation in deep learning frameworks and remains standard in federated learning environments where data partitioning across devices necessitates sample-wise or batch-wise updates.

Key Considerations

The learning rate becomes critical: with noisy gradients, a fixed rate that is too large can diverge, while one small enough to converge tends to leave the parameters oscillating around the minimum rather than settling into it. Adaptive variants like Adam and RMSprop address this by adjusting step sizes per parameter. Convergence guarantees weaken compared to batch gradient descent, and practitioners must balance batch size, learning rate scheduling, and epoch count empirically.
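One common response to the fixed-step problem mentioned above is a decay schedule. A minimal sketch of step-wise exponential decay (the function name and default values here are illustrative, not from any specific library):

```python
def decayed_lr(base_lr, step, decay_rate=0.5, decay_steps=1000):
    """Step-wise exponential decay: multiply the learning rate by
    decay_rate once every decay_steps optimisation steps."""
    return base_lr * decay_rate ** (step // decay_steps)

# With base_lr=0.1, the rate halves every 1000 steps:
# steps 0-999 use 0.1, steps 1000-1999 use 0.05, and so on.
```

Large early steps make fast progress; smaller late steps damp the oscillation caused by gradient noise.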
