Overview
Direct Answer
Dimensionality reduction comprises mathematical techniques that compress high-dimensional datasets into lower-dimensional representations whilst preserving the most informative aspects of the original data. This process removes redundant or noisy features, reducing computational complexity without sacrificing essential patterns or predictive power.
How It Works
These methods operate through either feature selection (identifying and retaining the most relevant original variables) or feature extraction (mathematically combining variables into new, uncorrelated dimensions). Principal Component Analysis identifies orthogonal axes of maximum variance; manifold learning techniques like t-SNE preserve local neighbourhood structure; autoencoders use neural networks to learn compressed latent representations through reconstruction objectives.
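The variance-maximising projection described above can be sketched in a few lines of numpy. This is a minimal illustration of PCA via SVD of the mean-centred data matrix; the toy dataset and the choice of two components are assumptions made for the example, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples in 5 dimensions whose variance lies mostly
# along 2 underlying directions, plus a little noise.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) \
    + 0.05 * rng.normal(size=(200, 5))

# PCA via SVD of the mean-centred data matrix: the rows of Vt are the
# orthogonal axes of maximum variance (the principal components).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Z = Xc @ Vt[:k].T                 # project onto the top-k principal axes
explained = (S ** 2) / (S ** 2).sum()
print(Z.shape)                    # compressed representation: (200, 2)
print(explained[:k].sum())        # fraction of total variance retained
```

Because the toy data is essentially rank two, the first two components retain almost all of the variance, which is exactly the situation in which PCA compresses with little loss.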
Why It Matters
High-dimensional data drives up computational cost, memory usage, and model training time, and the curse of dimensionality means the amount of data needed to cover the feature space grows exponentially with each added dimension. Reducing dimensionality accelerates algorithms, improves model interpretability, mitigates overfitting risk, and enables visualisation of complex datasets. This directly lowers infrastructure costs and improves inference latency in production systems.

Common Applications
Applications include image compression and feature extraction in computer vision pipelines, gene expression analysis in genomics, customer segmentation in marketing analytics, and noise reduction in signal processing. Text data undergoes dimensionality reduction through techniques like Latent Semantic Analysis before classification or clustering tasks.
Key Considerations
Information loss is inevitable; practitioners must balance compression gains against the cost of discarding potentially relevant information. The choice of technique depends critically on data structure, interpretability requirements, and whether preserving global or local patterns matters more for the downstream task.
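One common way to make the compression-versus-loss trade-off concrete is to measure reconstruction error as a function of the number of retained components. A small numpy sketch, assuming a full-rank toy dataset, shows the error shrinking as more components are kept:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))  # full-rank 10-D data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Relative reconstruction error quantifies the information discarded
# when only the top-k components are kept.
for k in (2, 5, 10):
    Xk = (Xc @ Vt[:k].T) @ Vt[:k]          # project down, then back up
    err = np.linalg.norm(Xc - Xk) / np.linalg.norm(Xc)
    print(k, round(err, 3))
```

Plotting this curve (or the cumulative explained variance) is a standard way for practitioners to choose how many dimensions to keep for a given downstream task.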