
Data Drift

Overview

Direct Answer

Data drift is a shift in the statistical distribution of a model's input features (and, in broader usage, its target variables) between training and production. It arises when the real-world data-generating process diverges from the training data, violating the assumption that training and production distributions remain the same, and it typically degrades machine learning model performance after deployment.

How It Works

Models learn patterns from historical training data and optimise their weights for those distributions. When production data exhibits different feature correlations, class proportions, or value ranges, the model's learned decision boundaries no longer match the patterns actually present. Without monitoring to detect distributional change and retraining to correct it, this misalignment grows and predictions become increasingly inaccurate.
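One common way to quantify a change in value ranges for a single numeric feature is the Population Stability Index (PSI), which compares binned proportions in a reference sample against a production sample. The sketch below is a minimal standard-library version, assuming equal-width bins taken from the reference range. The function name `psi`, the bin count, the `eps` floor, and the synthetic Gaussian samples are illustrative choices, not part of any particular monitoring tool.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width and derived from the reference sample only,
    so the reference defines what "normal" looks like.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data (assumption): one feature, standard normal at training time.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # no drift
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean moved by one std dev

psi_same = psi(train, same)
psi_shifted = psi(train, shifted)
print(f"PSI, same distribution:    {psi_same:.4f}")
print(f"PSI, shifted distribution: {psi_shifted:.4f}")
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. These cut-offs are conventions rather than fixed standards and are usually tuned per feature.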

Why It Matters

Model degradation directly affects business outcomes through reduced prediction accuracy, flawed decision-making, and, in regulated industries, potential compliance violations. Organisations that fail to detect and remediate drift risk financial losses, customer dissatisfaction, and reputational damage. Continuous monitoring and retraining are essential to maintain model reliability and ROI.

Common Applications

Fraud detection systems experience drift as fraudster behaviour evolves; credit risk models drift when economic conditions shift; recommendation engines drift as user preferences change seasonally; medical diagnostic models drift as patient demographics or equipment calibration varies.

Key Considerations

Data drift must be distinguished from concept drift, in which the relationship between the inputs and the target changes even if the input distribution does not, because the two call for different remediation strategies. Drift detection also introduces operational overhead and latency that must be balanced against the cost of model degradation.
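One way to tell the two kinds of drift apart in practice is to check the inputs and the error rate separately. The sketch below assumes a single numeric feature, a toy threshold classifier, and labelled production windows. The helper names (`standardized_mean_shift`, `error_rate`, `diagnose`) and the tolerances are hypothetical. A mean-shift check stands in here for a fuller distribution test and will miss changes in variance or shape.

```python
import random
import statistics


def standardized_mean_shift(reference, current):
    """Shift in the feature mean, in units of the reference standard deviation."""
    gap = abs(statistics.fmean(current) - statistics.fmean(reference))
    return gap / statistics.pstdev(reference)


def error_rate(model, xs, ys):
    """Fraction of examples the model labels incorrectly."""
    return sum(model(x) != y for x, y in zip(xs, ys)) / len(xs)


def diagnose(model, ref, cur, shift_tol=0.5, error_tol=0.05):
    """Report input drift and error increase as two separate signals."""
    (ref_x, ref_y), (cur_x, cur_y) = ref, cur
    rise = error_rate(model, cur_x, cur_y) - error_rate(model, ref_x, ref_y)
    return {
        "input_drift": standardized_mean_shift(ref_x, cur_x) > shift_tol,
        "error_increase": rise > error_tol,
    }


def model(x):
    # Toy classifier (assumption): learned the rule "positive if x > 0".
    return x > 0


rng = random.Random(1)

# Reference window: inputs ~ N(0, 1), labels follow the rule the model learned.
ref_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ref = (ref_x, [x > 0 for x in ref_x])

# Data drift: inputs move, but the labelling rule is unchanged.
dd_x = [rng.gauss(1.5, 1.0) for _ in range(2000)]
data_drift = (dd_x, [x > 0 for x in dd_x])

# Concept drift: inputs look the same, but the labelling rule has changed.
cd_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
concept_drift = (cd_x, [x > 1 for x in cd_x])

data_drift_report = diagnose(model, ref, data_drift)
concept_drift_report = diagnose(model, ref, concept_drift)
print("data drift window:   ", data_drift_report)
print("concept drift window:", concept_drift_report)
```

The error-rate check needs ground-truth labels, which often arrive late in production. That delay is part of the latency trade-off: input checks can run immediately, but only labelled outcomes show whether the learned relationship still holds.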

Cross-References

Machine Learning
