Overview
Direct Answer
Feature selection is the process of identifying and selecting a subset of input variables that are most predictive or relevant for a machine learning model, while eliminating redundant, irrelevant, or noisy attributes. This differs from dimensionality reduction in that it retains interpretable original variables rather than transforming them.
How It Works
Selection methods operate through three primary approaches: filter methods evaluate variable importance independently of any model, using statistical measures such as correlation or mutual information; wrapper methods assess candidate subsets by repeatedly training and evaluating a model; and embedded methods select features during model training itself, as regularisation-based approaches such as the lasso do by driving uninformative coefficients to zero. Each method ranks or scores variables by their contribution to predictive performance or their association with the target output.
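As a concrete illustration of the filter approach, the sketch below scores each column of a toy dataset by its absolute Pearson correlation with the target and keeps the top-k columns. It is a minimal, dependency-free example rather than a production implementation; libraries such as scikit-learn offer equivalents (e.g. `SelectKBest`) with a wider choice of scoring statistics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    """Filter method: rank each column of X by |correlation| with y
    and return the indices of the top-k columns."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Toy data: feature 0 tracks y, feature 1 is constant noise,
# and feature 2 is perfectly anti-correlated with y.
X = [[1, 5, 9], [2, 5, 7], [3, 5, 5], [4, 5, 3]]
y = [1, 2, 3, 4]
print(filter_select(X, y, 2))  # → [0, 2]  (the zero-variance feature is dropped)
```

Note that this per-feature scoring is exactly what makes filter methods cheap: each column is assessed once, with no model training in the loop.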
Why It Matters
Reducing input dimensionality decreases computational cost, training time, and model complexity whilst often improving generalisation and interpretability. In regulated industries, fewer variables simplify compliance documentation and explainability requirements. Smaller feature sets also mitigate the curse of dimensionality and reduce storage requirements in resource-constrained deployments.
Common Applications
Healthcare applications use feature selection to identify clinically relevant biomarkers from high-dimensional genomic or imaging datasets. Financial institutions apply it to credit risk models where only the most predictive variables are retained for regulatory reporting. Text classification and natural language processing tasks benefit significantly by selecting informative words or embeddings from vocabularies of millions of potential features.
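For the text-classification case, a toy sketch of vocabulary pruning is shown below: each word is scored by the absolute difference in its document frequency between two classes, and only the top-k words are kept. The scoring statistic here is a deliberately simple stand-in for the chi-squared or mutual-information scores typically used in practice, and the example documents are purely illustrative.

```python
from collections import Counter

def top_discriminative_words(pos_docs, neg_docs, k):
    """Keep the k words whose document frequency differs most
    between the positive and negative classes."""
    def doc_freq(docs):
        counts = Counter()
        for doc in docs:
            counts.update(set(doc.lower().split()))  # count each word once per document
        return {w: c / len(docs) for w, c in counts.items()}

    pos_df, neg_df = doc_freq(pos_docs), doc_freq(neg_docs)
    vocab = set(pos_df) | set(neg_df)
    scores = {w: abs(pos_df.get(w, 0) - neg_df.get(w, 0)) for w in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:k]

pos = ["great film", "great acting", "loved the film"]
neg = ["boring film", "boring plot", "hated the plot"]
# The three class-exclusive words ("great", "boring", "plot") score highest.
print(top_discriminative_words(pos, neg, 3))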
Key Considerations
The optimal feature subset is often task-specific and dataset-dependent; techniques that perform well on one problem may not transfer directly to another. Over-aggressive feature removal risks discarding subtle but collectively important signals, whilst retaining too many variables undermines the efficiency and interpretability benefits.
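One practical way to navigate this trade-off is a wrapper-style search that adds features only while a chosen evaluation score keeps improving, stopping before marginal variables are admitted. The sketch below is a generic greedy forward selection; `score_fn` stands in for whatever model-evaluation routine a project actually uses (e.g. cross-validated accuracy), and the toy scorer at the bottom exists only to make the example runnable.

```python
def forward_select(X, y, score_fn, max_features):
    """Wrapper-style greedy forward selection: starting from an empty
    subset, repeatedly add the feature that most improves score_fn,
    stopping when no candidate helps or max_features is reached."""
    selected, best = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        cand_scores = {j: score_fn(X, y, selected + [j]) for j in remaining}
        j_best = max(cand_scores, key=cand_scores.get)
        if cand_scores[j_best] <= best:
            break  # no candidate improves on the current subset
        selected.append(j_best)
        remaining.remove(j_best)
        best = cand_scores[j_best]
    return selected

def toy_score(X, y, subset):
    """Illustrative scorer: pretend feature 0 is worth 0.5, feature 2
    is worth 0.3, and feature 1 contributes nothing."""
    values = {0: 0.5, 1: 0.0, 2: 0.3}
    return sum(values[j] for j in subset)

X = [[1, 5, 9], [2, 5, 7], [3, 5, 5]]
y = [1, 2, 3]
print(forward_select(X, y, toy_score, 3))  # → [0, 2]
```

Because every candidate subset requires a full evaluation, wrapper methods are far more expensive than filters, which is why they are usually reserved for modest feature counts.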