Overview
Direct Answer
A Recurrent Neural Network (RNN) is a deep learning architecture with feedback loops that enable hidden states to persist across sequential inputs, allowing the model to retain and leverage information from previous timesteps. This internal memory mechanism makes RNNs particularly suited to tasks where temporal dependencies and context are critical.
How It Works
RNNs process input sequences one element at a time, passing a hidden state forward alongside each computation. At each timestep, the network combines the current input with the previous hidden state through matrix multiplication and activation functions, creating a chain of memory. This feedback structure allows gradients to propagate backward through time, though vanishing or exploding gradients can complicate training on long sequences.
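The per-timestep update described above can be sketched in a few lines. This is a minimal illustration with hypothetical layer sizes and randomly initialised weights, not a production implementation; the names `W_xh`, `W_hh`, and `rnn_step` are assumptions for the example:

```python
import numpy as np

# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One timestep: combine the current input with the previous
    hidden state, h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unroll over a sequence: the same weights are reused at every step,
# and the hidden state carries context forward through the chain.
sequence = rng.normal(size=(5, input_size))  # 5 timesteps of input
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)
```

Note that the final `h` depends on every element of the sequence, which is the "chain of memory" the paragraph describes.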
Why It Matters
Organisations rely on RNNs for sequence modelling where temporal patterns directly impact accuracy and business outcomes. Applications requiring context-awareness—such as language understanding, time-series forecasting, and speech recognition—benefit from the architecture's inherent ability to model dependencies without explicit feature engineering, reducing development cycle time and improving predictive performance.
Common Applications
RNNs power natural language processing tasks including machine translation, sentiment analysis, and named entity recognition. Time-series forecasting in finance and operations, speech-to-text systems, and video frame prediction represent key industrial applications where sequential patterns must be learned and extrapolated.
Key Considerations
Training RNNs on very long sequences faces the vanishing gradient problem, which limits effective memory depth; variants such as LSTMs and GRUs address this through gating mechanisms. Because each timestep depends on the previous hidden state, computation cannot be parallelised across the sequence, so training cost grows with sequence length and RNNs are often slower to train than transformer-based alternatives for certain tasks.
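The vanishing gradient problem can be made concrete with a small numerical sketch. Backpropagation through time repeatedly multiplies the gradient by the recurrent Jacobian; the toy function below (a hypothetical name, using `W_hh.T` alone to isolate the repeated-multiplication effect and ignoring the tanh derivative, which only shrinks gradients further) shows the gradient norm collapsing when the recurrent weights are small:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size = 3

def grad_norm_after(T, scale):
    """Norm of a unit gradient after backpropagating through T
    timesteps of a random recurrent weight matrix."""
    W_hh = rng.normal(scale=scale, size=(hidden_size, hidden_size))
    grad = np.ones(hidden_size)
    for _ in range(T):
        # Each backward step multiplies by the recurrent Jacobian
        # (here approximated by W_hh.T, since tanh' <= 1).
        grad = W_hh.T @ grad
    return np.linalg.norm(grad)

short = grad_norm_after(5, 0.3)    # few timesteps: gradient survives
long = grad_norm_after(100, 0.3)   # many timesteps: gradient vanishes
```

With a spectral radius below 1 the signal decays geometrically with sequence length, and above 1 it explodes; LSTM and GRU gates mitigate this by providing additive paths for the gradient.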
Cross-References (1)
Referenced By: 1 term mentions Recurrent Neural Network.
Other entries in the wiki whose definition references Recurrent Neural Network — useful for understanding how this concept connects across Deep Learning and adjacent domains.
More in Deep Learning
Flash Attention
Architectures: An IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
Attention Head
Training & Optimisation: An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
Weight Initialisation
Architectures: The strategy for setting initial parameter values in a neural network before training begins.
Residual Connection
Training & Optimisation: A skip connection that adds a layer's input directly to its output, enabling gradient flow through deep networks and allowing training of architectures with hundreds of layers.
LoRA
Language Models: Low-Rank Adaptation — a parameter-efficient fine-tuning technique that adds trainable low-rank matrices to frozen pretrained weights.
Pretraining
Architectures: Training a model on a large general dataset before fine-tuning it on a specific downstream task.
Gradient Checkpointing
Architectures: A memory optimisation that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass.
Mixture of Experts
Architectures: An architecture where different specialised sub-networks (experts) are selectively activated based on the input.