Overview
Direct Answer
Feature engineering is the process of selecting, transforming, and creating input variables from raw data to maximise the predictive power and generalisation capability of machine learning models. It bridges domain expertise and algorithmic capability by deliberately constructing representations that algorithms can learn from effectively.
How It Works
Practitioners analyse raw data to identify which variables carry predictive signal, then apply transformations such as normalisation, polynomial expansion, binning, or interaction terms to expose non-linear relationships. Domain knowledge informs decisions about variable selection and derivation—for instance, converting timestamps into cyclical features or combining multiple weak signals into composite indicators—which the learning algorithm then leverages during training.
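The cyclical-timestamp transformation mentioned above can be sketched in a few lines. This is an illustrative helper (the name `cyclical_hour_features` is ours, not a library function): it maps hour-of-day onto a circle via sine and cosine so that 23:00 and 01:00 land near each other in feature space, instead of 22 units apart as raw integers.

```python
import math

def cyclical_hour_features(hour: int) -> tuple[float, float]:
    """Encode hour-of-day (0-23) as (sin, cos) on the unit circle,
    so hours just before and after midnight are close together."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

# Raw values 23 and 1 differ by 22, but their encodings are neighbours:
sin23, cos23 = cyclical_hour_features(23)
sin1, cos1 = cyclical_hour_features(1)
distance = math.hypot(sin23 - sin1, cos23 - cos1)
```

The same idea applies to day-of-week, month, or wind direction: any variable whose extremes are actually adjacent benefits from this encoding, and linear models in particular can then learn smooth periodic effects.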
Why It Matters
Well-engineered features substantially reduce model training time, improve prediction accuracy, and decrease the amount of data required to achieve target performance. This directly lowers computational costs and lets organisations deploy models with higher confidence in lower-data regimes, which is particularly important in regulated industries where data scarcity is common.
Common Applications
Financial services use feature construction to detect fraud patterns from transaction metadata; healthcare organisations engineer temporal and demographic features for disease prediction; e-commerce platforms derive behavioural indicators from clickstream data for recommendation systems.
Key Considerations
Over-engineering features increases model complexity and overfitting risk without corresponding gains in generalisation; conversely, insufficient attention to feature quality wastes model capacity. The effort remains labour-intensive and domain-dependent, making it difficult to automate and transfer across problem contexts.
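The over-engineering risk can be made concrete with a back-of-envelope count. Expanding n input variables into all polynomial terms up to degree d yields C(n+d, d) - 1 features (excluding the constant term, which matches what common polynomial-expansion utilities generate). A short sketch, using only the standard library:

```python
from math import comb

def n_poly_features(n_inputs: int, degree: int) -> int:
    """Number of polynomial terms up to `degree` over n_inputs
    variables, excluding the constant: C(n + d, d) - 1."""
    return comb(n_inputs + degree, degree) - 1

# With just 20 raw inputs, the feature count grows combinatorially:
counts = {d: n_poly_features(20, d) for d in (1, 2, 3, 4)}
# degree 1 -> 20 features, degree 4 -> over ten thousand
```

Each added dimension must earn its keep with genuine signal; otherwise it only gives the model more room to fit noise, which is exactly the overfitting trade-off described above.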
Cited Across coldai.org
Two pages on coldai.org (industry pages, services, technologies, capabilities, case studies and insights) reference Feature Engineering, providing applied context for how the concept is used in client engagements.
Referenced By
One other wiki entry's definition references Feature Engineering, useful for understanding how this concept connects across Machine Learning and adjacent domains.
More in Machine Learning
Stochastic Gradient Descent (Training Techniques)
A variant of gradient descent that updates parameters using a randomly selected subset of training data each iteration.
Reinforcement Learning (MLOps & Production)
A machine learning paradigm where agents learn optimal behaviour through trial and error, receiving rewards or penalties.
Supervised Learning (MLOps & Production)
A machine learning paradigm where models are trained on labelled data, learning to map inputs to known outputs.
Mini-Batch (Training Techniques)
A subset of the training data used to compute a gradient update during stochastic gradient descent.
Tabular Deep Learning (Supervised Learning)
The application of deep neural networks to structured tabular datasets, competing with traditional methods like gradient boosting through specialised architectures and regularisation.
Unsupervised Learning (MLOps & Production)
A machine learning approach where models discover patterns and structures in data without labelled examples.
Self-Supervised Learning (Advanced Methods)
A learning paradigm where models generate their own supervisory signals from unlabelled data through pretext tasks.
Semi-Supervised Learning (Advanced Methods)
A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.