Overview
Direct Answer
Data drift is a shift in the statistical distribution of input features or target variables after deployment, which typically degrades machine learning model performance. It occurs when the real-world data-generating process diverges from the one that produced the training data, violating the assumption that training and production distributions are the same.
How It Works
Models learn patterns from historical training data and optimise their weights for the distributions found there. When production data exhibits different feature correlations, class proportions, or value ranges, the model's learned decision boundaries no longer match the actual patterns. Unless monitoring detects the distributional change and the model is retrained, the mismatch persists and predictions can become increasingly inaccurate.
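One common way to quantify a distributional change in a single numeric feature is the Population Stability Index (PSI), which compares binned proportions in a reference sample against a production sample. The sketch below is illustrative only: the function name `psi`, the equal-width binning over the reference range, and the `bins` and `eps` parameters are assumptions made for this example, not part of any particular library.

```python
import math
import random


def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index for one numeric feature (illustrative sketch)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if all values are equal.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    # Each term is non-negative, so a larger sum means a larger shift.
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


# Synthetic data: one production sample from the training distribution, one shifted.
rng = random.Random(42)
reference = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]

psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
```

Here `psi_same` should stay close to zero while `psi_shifted` is much larger. A frequently quoted rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, but suitable thresholds depend on the feature and the binning and should be validated per use case.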
Why It Matters
Model degradation affects business outcomes through reduced prediction accuracy, flawed decision-making, and, in regulated industries, potential compliance violations. Organisations that fail to detect and remediate drift risk financial losses, customer dissatisfaction, and reputational damage. Continuous monitoring and retraining are essential to maintain model reliability and ROI.
Common Applications
Fraud detection systems experience drift as fraudster behaviour evolves; credit risk models drift when economic conditions shift; recommendation engines drift as user preferences change seasonally; medical diagnostic models drift as patient demographics or equipment calibration varies.
Key Considerations
Data drift (a change in the distribution of the data itself) and concept drift (a change in the relationship between inputs and the target) call for different remediation strategies, so the two should be distinguished before acting. Drift detection also introduces operational overhead and latency, which must be balanced against the cost of model degradation.
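The difference between the two kinds of drift can be illustrated on synthetic data. In the sketch below, a two-sample Kolmogorov-Smirnov statistic checks whether the inputs have moved, and accuracy on freshly labelled data checks whether the learned rule still holds. All names (`ks_statistic`, `accuracy`, `model`) and the threshold rules are hypothetical. The stand-in model matches the original labelling rule exactly, so its accuracy is unaffected by the input shift in scenario A; a real model fitted on limited data would typically lose accuracy in regions it rarely saw during training.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def accuracy(predict, xs, ys):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def model(x):
    # Stand-in for a trained classifier: the rule that held at training time.
    return int(x > 0.0)


rng = random.Random(0)
train_x = [rng.gauss(0, 1) for _ in range(2000)]

# Scenario A (data drift): inputs shift, the input-to-label rule is unchanged.
drift_x = [rng.gauss(1.5, 1) for _ in range(2000)]
drift_y = [int(x > 0.0) for x in drift_x]

# Scenario B (concept drift): inputs look the same, the rule itself has moved.
concept_x = [rng.gauss(0, 1) for _ in range(2000)]
concept_y = [int(x > 1.0) for x in concept_x]

# Each entry: (input-distribution gap, accuracy on newly labelled data).
report = {
    "data_drift": (ks_statistic(train_x, drift_x), accuracy(model, drift_x, drift_y)),
    "concept_drift": (ks_statistic(train_x, concept_x), accuracy(model, concept_x, concept_y)),
}
```

In scenario A the input check reports a large gap even without any labels, whereas in scenario B the inputs look unchanged and only the labelled accuracy reveals the problem. This is one reason concept drift is usually more expensive to detect: it requires fresh ground-truth labels rather than input statistics alone.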
More in Data Science & Analytics
Time Series Analysis
Statistics & Methods: Statistical techniques for analysing time-ordered data points to identify trends, cycles, and forecasting patterns.
Network Analysis
Statistics & Methods: The study of graphs representing relationships between discrete objects to understand network structure and dynamics.
Concept Drift
Statistics & Methods: Changes in the underlying patterns that a model was trained to capture, requiring model adaptation.
Data Democratisation
Statistics & Methods: Making data accessible to all members of an organisation regardless of their technical expertise.
Cohort Analysis
Applied Analytics: A behavioural analytics technique that groups users with shared characteristics to track metrics over time.
Monte Carlo Simulation
Statistics & Methods: A computational technique using repeated random sampling to obtain numerical results for problems with many coupled variables.
Graph Analytics
Applied Analytics: Analysing relationships and connections between entities represented as nodes and edges in a graph structure.
Business Analytics
Statistics & Methods: The practice of iterative exploration of organisational data to drive business planning and decision-making.