Deep Learning Architectures

Exploding Gradient

Overview

Direct Answer

Exploding gradient is a numerical instability during backpropagation in which gradients accumulate multiplicatively across layers, reaching excessively large values that cause weight updates to overshoot optimal parameters and destabilise training. This phenomenon is distinct from vanishing gradients and occurs most frequently in recurrent neural networks and very deep feedforward architectures.

How It Works

During backpropagation, gradients are computed via the chain rule by multiplying partial derivatives across layers. When the layer Jacobians (the products of weight matrices and activation function derivatives) have singular values greater than one, successive multiplications produce exponentially growing gradient magnitudes. In recurrent networks, unrolling across many timesteps repeats the same multiplication, which amplifies the effect and can drive weight updates to NaN or Inf values, rendering the model untrainable within a few iterations.
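The multiplicative growth described above can be simulated with a small numerical sketch. All values here (layer count, width, the 1.5 scale factor) are illustrative assumptions, not from the entry: backpropagating through a deep linear stack whose weights are initialised slightly above the stable scale makes the gradient norm grow exponentially.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 50, 64

# Weights drawn with standard deviation 1.5 / sqrt(width): each layer then
# scales a typical vector by roughly 1.5, just above the stable regime.
weights = [rng.normal(0.0, 1.5 / np.sqrt(width), size=(width, width))
           for _ in range(n_layers)]

grad = np.ones(width)          # gradient arriving at the output layer
norms = []
for W in reversed(weights):
    grad = W.T @ grad          # chain rule: multiply by each layer's Jacobian
    norms.append(np.linalg.norm(grad))

print(f"gradient norm after 1 layer:   {norms[0]:.3e}")
print(f"gradient norm after {n_layers} layers: {norms[-1]:.3e}")
```

Shrinking the initialisation scale below 1 / sqrt(width) flips the same loop into the vanishing-gradient regime, which is why the entry stresses that severity depends on the weight initialisation strategy.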

Why It Matters

Training instability directly increases computational cost through failed training runs and necessitates careful hyperparameter selection. In production pipelines, unstable training reduces model reliability and increases time-to-deployment for sequential architectures used in natural language processing and time-series forecasting, where recurrence is fundamental.

Common Applications

The problem is prevalent in long short-term memory networks, gated recurrent units, and multi-layer perceptrons exceeding 10–20 layers. Applications include machine translation, speech recognition, and financial forecasting where temporal dependencies require deep or recurrent architectures.

Key Considerations

Gradient clipping and normalisation techniques mitigate the issue but introduce hyperparameter tuning overhead. The severity depends on weight initialisation strategy and activation function choice, requiring practitioners to balance architectural expressiveness against training stability.
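Clipping by global norm, the most common form of the gradient clipping mentioned above, can be sketched as follows. The helper name and the max_norm threshold are illustrative; frameworks such as PyTorch (`torch.nn.utils.clip_grad_norm_`) and TensorFlow (`tf.clip_by_global_norm`) provide production implementations.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so that their combined L2 norm
    does not exceed max_norm; gradients below the threshold pass through
    unchanged, so the update direction is always preserved."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

# An exploded gradient: values this large would wreck the weight update.
grads = [np.full((3, 3), 1e6), np.full(3, -1e6)]
clipped, pre_norm = clip_by_global_norm(grads, max_norm=5.0)
post_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(f"norm before clipping: {pre_norm:.3e}, after: {post_norm:.3e}")
```

The max_norm threshold is exactly the hyperparameter tuning overhead the entry refers to: set too low, it slows learning by shrinking healthy gradients; set too high, it fails to catch the explosions.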

Cross-References (1)

Machine Learning

Referenced By

1 term mentions Exploding Gradient.

Other entries in the wiki whose definition references Exploding Gradient — useful for understanding how this concept connects across Deep Learning and adjacent domains.
