Overview
Direct Answer
A Variational Autoencoder (VAE) is a generative deep learning model that encodes input data into a probabilistic latent space and reconstructs it through a decoder, enabling both data compression and synthesis of novel samples. Unlike standard autoencoders, VAEs impose a prior distribution on the latent representation, making the learned space suitable for generative tasks.
How It Works
The encoder network maps input data to the parameters of a probability distribution (typically Gaussian) in latent space rather than to fixed point values. A latent sample is drawn from this distribution (via the reparameterisation trick) and passed to the decoder, which reconstructs the input, whilst the model optimises a loss function combining reconstruction error with a Kullback–Leibler divergence term that regularises the latent distribution towards the prior. This dual objective keeps the latent space continuous and well-structured for interpolation and sampling.
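The two loss terms and the reparameterisation trick can be sketched as follows. This is a minimal NumPy illustration, not a full model: the squared-error reconstruction term and the function names are assumptions, and the KL term uses the closed form for a diagonal Gaussian against a standard normal prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterise(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping the
    sampling step differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Per-example VAE loss: reconstruction error plus the KL divergence
    between the encoder's Gaussian q(z|x) and a standard normal prior."""
    # Reconstruction term: squared error (a common choice for real-valued data).
    recon = np.sum((x - x_recon) ** 2)
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl
```

When the encoder outputs mu = 0 and log_var = 0, the posterior matches the prior and the KL term vanishes, which is why the regulariser pulls the latent distribution towards the prior.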
Why It Matters
VAEs provide a principled framework for learning interpretable, continuous latent representations whilst maintaining tractable inference and generation. This capability reduces data annotation burden, enables anomaly detection through reconstruction likelihood, and supports downstream machine learning tasks through dimensionality reduction without sacrificing generative capability.
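The anomaly-detection idea above can be illustrated with a hedged sketch: a trained VAE reconstructs in-distribution data well, so a high reconstruction error suggests an anomalous sample. The error function, threshold, and data here are illustrative assumptions.

```python
import numpy as np

def flag_anomalies(inputs, reconstructions, threshold):
    """Flag samples whose mean squared reconstruction error
    exceeds a chosen threshold (threshold is an assumption)."""
    errors = np.mean((inputs - reconstructions) ** 2, axis=1)
    return errors > threshold

# Illustrative data: the second sample is reconstructed poorly.
x = np.array([[1.0, 1.0], [1.0, 1.0]])
x_hat = np.array([[1.0, 1.0], [5.0, 5.0]])
print(flag_anomalies(x, x_hat, threshold=0.5))  # → [False  True]
```

In practice the threshold is typically calibrated on held-out normal data, and a likelihood-based score can replace the plain squared error.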
Common Applications
Applications include image generation and manipulation in computer vision, anomaly detection in manufacturing and healthcare diagnostics, and feature learning for semi-supervised classification. VAEs are also employed in drug discovery for molecular generation and in recommendation systems for learning latent user preferences.
Key Considerations
VAEs typically produce blurrier reconstructions than deterministic autoencoders due to the stochastic sampling process. The model's performance depends heavily on appropriate weighting between reconstruction and regularisation terms, and the choice of prior distribution significantly influences the learned latent structure and generative quality.
More in Deep Learning
Data Parallelism (Architectures)
A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
Self-Attention (Training & Optimisation)
An attention mechanism where each element in a sequence attends to all other elements to compute its representation.
Diffusion Model (Generative Models)
A generative model that learns to reverse a gradual noising process, generating high-quality samples from random noise.
Attention Head (Training & Optimisation)
An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
Gradient Checkpointing (Architectures)
A memory optimisation that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass.
Knowledge Distillation (Architectures)
A model compression technique where a smaller student model learns to mimic the behaviour of a larger teacher model.
Tensor Parallelism (Architectures)
A distributed computing strategy that splits individual layer computations across multiple devices by partitioning weight matrices along specific dimensions.
Pre-Training (Language Models)
The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.