Overview
Direct Answer
A deep neural network architecture that employs skip connections (residual connections) to allow input signals to bypass one or more layers, enabling the training of networks with 100+ layers by mitigating the vanishing gradient problem.
How It Works
Skip connections add a layer's input directly to its output, so the network learns a residual mapping (the difference between the desired output and the input) rather than the full transformation. This identity path preserves gradient magnitude during backpropagation, letting error signals flow through very deep networks without exponential decay.
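The idea can be sketched in a few lines. Below is a minimal residual block in NumPy (not any particular framework's API; the two-layer transform F and the dimensions are illustrative assumptions): the block computes F(x) and adds the unchanged input x back before the final activation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x), where F is a two-layer transform.

    The skip connection adds the block's input x to the learned
    residual F(x), so the block only needs to model the difference
    between its input and the desired output. If the optimal mapping
    is close to identity, F can simply learn weights near zero.
    """
    residual = W2 @ relu(W1 @ x)   # F(x): the learned residual mapping
    return relu(residual + x)      # identity shortcut: F(x) + x

# Hypothetical dimensions and random weights for illustration.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d)) * 0.1
W2 = rng.standard_normal((d, d)) * 0.1
y = residual_block(x, W1, W2)
```

Note that if the residual weights are all zero, the block reduces to the identity path (up to the activation), which is why stacking many such blocks does not degrade the signal the way plain stacked layers can.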
Why It Matters
Residual networks dramatically improved accuracy in large-scale image recognition tasks and became foundational for modern computer vision systems. The ability to train substantially deeper models with better convergence properties reduced training time and improved performance on complex visual and sequential tasks, driving adoption across industries requiring high-accuracy perception systems.
Common Applications
Medical image analysis for diagnostic detection, object recognition in autonomous vehicle systems, and large-scale image classification in e-commerce platforms rely on residual architectures. Natural language processing models and speech recognition systems also employ residual connections to process sequential data more effectively.
Key Considerations
Deeper networks do not automatically produce better results; residual connections mitigate training difficulties but require careful hyperparameter tuning and computational resources. Practitioners must balance network depth against overfitting risk and deployment constraints.