ReLU

Overview

Rectified Linear Unit (ReLU) is an activation function that applies the transformation f(x) = max(0, x), allowing positive inputs to pass through whilst suppressing all negative values to zero. Its simplicity and computational efficiency make it the dominant activation function in modern deep neural networks.
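The definition above is one line of code. A minimal sketch (the function name is illustrative):

```python
def relu(x: float) -> float:
    # f(x) = max(0, x): positive inputs pass through, negatives become zero
    return max(0.0, x)

print(relu(3.5))   # 3.5
print(relu(-2.0))  # 0.0
```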

How It Works

ReLU operates element-wise on the output of each neuron, introducing non-linearity by creating a piecewise linear function with a hard threshold at zero. During backpropagation, gradients flow unattenuated through positive regions (gradient = 1), whilst negative regions contribute no gradient signal (gradient = 0), facilitating faster training compared to sigmoid or tanh functions.
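The element-wise forward pass and the zero-or-one gradient described above can be sketched with NumPy (function names and the mask-caching layout are illustrative, not from any particular framework):

```python
import numpy as np

def relu_forward(x):
    # Element-wise max(0, x); cache the positive mask for backprop
    mask = x > 0
    return np.where(mask, x, 0.0), mask

def relu_backward(grad_out, mask):
    # Gradient is 1 where the input was positive, 0 elsewhere, so
    # upstream gradients pass through unattenuated on the positive side
    return grad_out * mask

x = np.array([-1.0, 0.5, 2.0])
out, mask = relu_forward(x)
grad = relu_backward(np.ones_like(x), mask)
print(out)   # [0.  0.5 2. ]
print(grad)  # [0. 1. 1.]
```

Note that the mask computed in the forward pass is all the backward pass needs, which is part of why ReLU is so cheap compared with sigmoid or tanh.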

Why It Matters

The function's efficiency reduces computational overhead in large-scale neural networks, enabling faster training and inference across GPU and CPU architectures. Its empirical success in achieving state-of-the-art accuracy on image classification, natural language processing, and reinforcement learning tasks has made it the standard choice for practitioners optimising model performance and training speed.

Common Applications

ReLU is ubiquitous in convolutional neural networks for computer vision, recurrent architectures for sequence modelling, and transformer-based language models. It serves as the default activation in frameworks handling image recognition, autonomous vehicle perception systems, and large language model implementations.

Key Considerations

The 'dying ReLU' problem occurs when neurons become permanently inactive, outputting zero for all inputs and receiving no gradient signal to recover, which degrades effective network capacity. Variants such as Leaky ReLU, which permits a small negative slope, and smooth alternatives such as GELU mitigate this limitation whilst largely preserving ReLU's computational benefits.
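Leaky ReLU's mitigation is a one-line change: a small slope on the negative side keeps the gradient nonzero, so a neuron can recover rather than die. A sketch (the 0.01 default slope is a common convention, not a mandated value):

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Unlike plain ReLU, negative inputs retain a small slope alpha,
    # so the gradient there is alpha rather than exactly zero
    return x if x > 0 else alpha * x

print(leaky_relu(2.0))   # 2.0
print(leaky_relu(-1.0))  # -0.01
```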
