Overview
Direct Answer
An encoder-decoder architecture is a neural network framework in which an encoder network compresses variable-length input into a context representation (a fixed-size vector in classic designs, or a sequence of hidden states in attention-based variants), and a decoder network reconstructs or generates output from that representation. This design enables processing of sequential data where input and output lengths differ.
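As a concrete, drastically simplified illustration, the sketch below mean-pools made-up token embeddings into a fixed-size context vector and greedily decodes from it. The vocabulary, embedding values, and similarity rule are all invented for the example; no learning is involved.

```python
# Toy sketch, not a trained model: the encoder mean-pools embeddings
# into ONE fixed-size context vector regardless of input length; the
# decoder greedily emits tokens conditioned on that vector.
# The vocabulary and embedding values below are invented.

EMB = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}

def encode(tokens):
    """Mean-pool token embeddings into a fixed-size context vector."""
    ctx = [0.0, 0.0]
    for t in tokens:
        ctx = [c + e for c, e in zip(ctx, EMB[t])]
    return [c / len(tokens) for c in ctx]

def decode(context, steps=2):
    """Greedy decoding: pick the token most similar to the current state,
    then shift the state so successive outputs can differ."""
    state, out = list(context), []
    for _ in range(steps):
        tok = max(EMB, key=lambda t: sum(a * b for a, b in zip(EMB[t], state)))
        out.append(tok)
        state = [s - e for s, e in zip(state, EMB[tok])]
    return out
```

Note that `encode` returns a vector of the same length for a one-token and a three-token input; that fixed size is exactly the bottleneck discussed under Key Considerations below.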
How It Works
The encoder processes input tokens through recurrent or transformer layers, encoding their meaning into a dense vector or a sequence of hidden states. The decoder then conditions on this context representation, generating output tokens one at a time from conditional probability distributions. Attention mechanisms often bridge encoder and decoder, letting the decoder focus selectively on the most relevant input positions at each generation step.
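The attention step described above can be sketched as scaled dot-product attention in plain Python; the shapes and vectors here are illustrative, not taken from any particular model.

```python
# Scaled dot-product attention over encoder hidden states, written in
# plain Python for readability (a real model would use tensor libraries).
import math

def attention(query, keys, values):
    """Return a context vector: encoder values weighted by how well
    each key matches the decoder's query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                                # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

A query aligned with one encoder state pulls the context toward that state's value, which is the "focus selectively" behaviour described above.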
Why It Matters
This architecture fundamentally enables sequence-to-sequence tasks in which input and output have mismatched structures, improving accuracy in machine translation, summarisation, and dialogue systems. Organisations benefit from a unified way of handling variable-length problems without task-specific feature engineering, reducing development time and operational complexity.
Common Applications
Applications include machine translation (translating between languages), automatic speech recognition (audio to text), image captioning (visual input to textual description), and abstractive summarisation. Medical transcription, customer support automation, and code generation systems rely on this approach.
Key Considerations
The fixed-size bottleneck in traditional designs can lose information from long sequences, a limitation mitigated by attention mechanisms and hierarchical encoders. Computational cost grows with sequence length (quadratically for standard self-attention), and inference speed may constrain real-time applications.
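A back-of-envelope sketch of the scaling point: standard self-attention scores every query position against every key position, so the score matrix has n × n entries and doubling the sequence length quadruples the work.

```python
# Illustrative cost model only: standard self-attention compares every
# query position with every key position, so the score matrix is n x n.

def score_matrix_entries(n):
    """Pairwise query-key scores computed for a length-n sequence."""
    return n * n
```

This quadratic growth is one reason doubling the context length roughly quadruples attention compute.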