Activation Function

Overview

Direct Answer

An activation function is a mathematical operation applied to the weighted sum of inputs at each neuron, introducing non-linearity to enable neural networks to learn complex, non-linear relationships in data. Without it, stacked layers would collapse into a single linear transformation, severely limiting representational capacity.
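The collapse of stacked linear layers can be checked directly: composing two weight matrices without an activation in between is identical to applying their product as a single layer. A minimal pure-Python sketch with arbitrary illustrative weights:

```python
# Two stacked linear layers with no activation collapse into one
# linear map: W2 @ (W1 @ x) == (W2 @ W1) @ x. Toy 2x2 example.

def matvec(W, x):
    # multiply matrix W by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # multiply matrix A by matrix B
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[1.0, 2.0], [3.0, 4.0]]   # illustrative weights
W2 = [[0.5, -1.0], [2.0, 0.0]]
x = [1.0, -2.0]

stacked = matvec(W2, matvec(W1, x))    # layer-by-layer, no activation
collapsed = matvec(matmul(W2, W1), x)  # single equivalent linear layer
assert stacked == collapsed
```

Inserting any non-linear function between the two layers breaks this equivalence, which is exactly what gives depth its representational power.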

How It Works

During forward propagation, each neuron computes a weighted sum of its inputs plus a bias term, then passes this value through the chosen function (such as ReLU, sigmoid, or tanh) before outputting to the next layer. This non-linear transformation allows the network to approximate arbitrary functions. During backpropagation, the derivative of the function is used to compute gradients for weight updates.
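The forward computation and the derivative used in backpropagation can be sketched for a single neuron. The weights, bias, and inputs below are hypothetical placeholders; sigmoid is used because its derivative has a convenient closed form:

```python
# One neuron's forward pass and the activation derivative used in
# backpropagation. Weights/inputs are illustrative, not from a model.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

weights = [0.4, -0.6]
bias = 0.1
inputs = [1.0, 2.0]

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum + bias
a = sigmoid(z)            # activation output passed to the next layer
grad = sigmoid_prime(z)   # local gradient used in the chain rule
```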

Why It Matters

Selection of the appropriate function directly impacts training speed, convergence behaviour, and final model accuracy. Poor choices can cause vanishing or exploding gradients, slowing training significantly or preventing learning altogether. Efficient functions like ReLU reduce computational overhead, lowering inference costs in production systems.
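The vanishing-gradient risk is easy to see numerically: a saturating function like sigmoid has a derivative that shrinks toward zero for large |z|, while ReLU's gradient stays at 1 for any positive input. A small sketch comparing the two:

```python
# Why saturating activations can vanish gradients: sigmoid'(z) -> 0
# as |z| grows, while ReLU's gradient is 1 for any positive input.
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_prime(z):
    return 1.0 if z > 0 else 0.0

for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_prime(z), relu_prime(z))
```

At z = 10 the sigmoid derivative is on the order of 1e-5; multiplied across ten such layers during backpropagation, the gradient reaching early layers is effectively zero.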

Common Applications

ReLU is standard in convolutional neural networks for image recognition tasks. Sigmoid and tanh remain prevalent in recurrent networks for time-series forecasting. Softmax is essential in multi-class classification layers across natural language processing and computer vision applications.
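For the multi-class case, softmax converts a vector of raw scores (logits) into a probability distribution. A minimal sketch, using the standard max-subtraction trick for numerical stability:

```python
# Numerically stable softmax, as used in multi-class output layers.
import math

def softmax(logits):
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # illustrative logits
# probs sum to 1, and the largest logit gets the largest probability
```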

Key Considerations

ReLU units can suffer from the 'dying ReLU' problem, where a neuron's pre-activation stays negative, its gradient is zero everywhere, and it permanently stops learning. The activation must also align with the output layer's requirements: sigmoid for binary classification, softmax for multi-class classification, and a linear (identity) output for regression tasks.
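One common mitigation for dying ReLU is a leaky variant that keeps a small slope for negative inputs, so the gradient never goes fully to zero. A sketch, where the slope 0.01 is a typical assumed default rather than a fixed standard:

```python
# Leaky ReLU keeps a small gradient for negative inputs, a common
# mitigation for the dying-ReLU problem. alpha = 0.01 is an assumed
# typical default, not a universal constant.
def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z

# A neuron with persistently negative pre-activations outputs zero
# (and receives zero gradient) under ReLU, but retains a small
# nonzero output and gradient under leaky ReLU.
```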
