Overview
Direct Answer
Underfitting occurs when a machine learning model lacks sufficient complexity to learn the underlying patterns and relationships in the training data, resulting in consistently poor predictive performance on both training and test datasets. This typically indicates the model architecture or feature set is inadequate rather than a problem with data quality.
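As a minimal sketch of this definition, the toy example below (hypothetical data, no ML library) fits the simplest possible model, a constant equal to the training-set mean, to data generated by y = x², and shows that its error is high on the training data as well as on the held-out data:

```python
# Hypothetical sketch: a constant-mean "model" fit to quadratic data
# underfits -- its error is high on the data it trained on, not just on new data.
train_x = [0, 1, 2, 3, 4]
train_y = [x * x for x in train_x]        # true relationship: y = x^2
test_x = [5, 6, 7]
test_y = [x * x for x in test_x]

# "Training" the simplest possible model: predict the training mean everywhere.
mean_prediction = sum(train_y) / len(train_y)

def mse(ys):
    return sum((mean_prediction - y) ** 2 for y in ys) / len(ys)

train_mse = mse(train_y)
test_mse = mse(test_y)
print(train_mse, test_mse)  # both errors are large: the signature of underfitting
```

This contrasts with overfitting, where training error would be low while test error stays high; here both are high because the model family cannot express the pattern at all.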
How It Works
An overly simplistic model, such as linear regression applied to non-linear data or a shallow neural network on a complex classification task, cannot represent the decision boundaries or functional relationships present in the data. Because the hypothesis class itself excludes the true target function, the model exhibits high bias: training loss and test loss are both high, and additional training iterations cannot help, since no setting of the parameters fits the data well.
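The bias described above can be made concrete. In this pure-Python sketch (hypothetical data), an ordinary least-squares line fit to y = x² retains high training error no matter how it is trained, whereas adding an x² feature makes the pattern representable and drives the error to zero:

```python
# Minimal sketch (pure Python, hypothetical data): a straight line fit by
# least squares cannot capture y = x^2, so training error stays high; adding
# an x^2 feature lets the same data be fit exactly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]                  # true target: y = x^2

# Ordinary least squares for y = a*x + b (closed form, single feature).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

line_mse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# With a quadratic feature the pattern is representable: y = 1*x^2 + 0 fits exactly.
quad_mse = sum((x * x - y) ** 2 for x, y in zip(xs, ys)) / n
print(line_mse, quad_mse)  # high training error vs. zero: bias from underfitting
```

On this symmetric data the best-fitting line is flat (slope 0), so the linear model's training error cannot be reduced further; the deficit comes from the hypothesis class, not the optimiser.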
Why It Matters
Organisations investing in machine learning initiatives require models that generalise effectively to new data. Underfitting wastes computational resources and delays deployment timelines, whilst producing unreliable predictions that undermine business decisions in domains such as credit risk assessment, demand forecasting, and medical diagnostics.
Common Applications
Underfitting is frequently observed when simple baseline models are applied to complex datasets in finance, healthcare analytics, and natural language processing. Examples include using a linear model for image classification, fitting a degree-1 (straight-line) polynomial to non-linear physical phenomena, or deploying shallow decision trees for high-dimensional fraud detection.
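The classic XOR problem illustrates why a linear model underfits certain classification tasks. The brute-force sketch below (illustrative only, hypothetical search grid) tries many candidate linear decision boundaries and never exceeds 75% accuracy on the four XOR points, because no straight line separates the two classes:

```python
# Hypothetical illustration: XOR is not linearly separable, so any linear
# classifier underfits -- a brute-force search over many candidate lines
# never exceeds 3/4 accuracy even on the training points themselves.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    correct = 0
    for (x, y), label in points:
        pred = 1 if w1 * x + w2 * y + b > 0 else 0
        correct += pred == label
    return correct / len(points)

grid = [i / 4 for i in range(-8, 9)]      # candidate weights in [-2, 2]
best = max(accuracy(w1, w2, b) for w1 in grid for w2 in grid for b in grid)
print(best)  # 0.75: even the best line misclassifies one point
```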
Key Considerations
Distinguishing underfitting from other performance issues requires comparing results across model complexity levels and cross-validation strategies: underfitting produces high error on both training and validation data, whereas overfitting shows low training error alongside high validation error. Practitioners must balance model sophistication against interpretability requirements and computational constraints, avoiding the assumption that increased complexity always resolves performance deficits.
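One way to sketch this comparative analysis (with hypothetical data and hand-specified candidate models, not a full cross-validation) is to score models of increasing capacity on a held-out split and observe that extra capacity offers no gain once the model family can already express the true pattern:

```python
# Sketch of model selection by held-out validation error (hypothetical data).
# The true pattern is linear, so the constant model underfits while the
# higher-capacity cubic model gains nothing over the linear one.
train = [(x, 3 * x + 1) for x in range(8)]
valid = [(x, 3 * x + 1) for x in range(8, 12)]

# Hand-specified candidates for illustration (a real workflow would fit each).
candidates = {
    "constant": lambda x: 11.5,                  # mean of the training targets
    "linear": lambda x: 3 * x + 1,               # matches the true pattern
    "cubic": lambda x: 3 * x + 1 + 0.0 * x**3,   # extra capacity, no gain
}

def val_mse(model):
    return sum((model(x) - y) ** 2 for x, y in valid) / len(valid)

scores = {name: val_mse(m) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The constant model's high validation error flags underfitting, while the tie between the linear and cubic models shows that added sophistication past the needed capacity buys nothing, echoing the caution above.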