Overview
Direct Answer
Logistic regression is a statistical classification algorithm that estimates the probability of a binary outcome by fitting a sigmoid (logistic) curve to the training data. Unlike linear regression, which can produce any real-valued output, it constrains predictions to the interval between 0 and 1, making it well suited for classification tasks.
How It Works
The algorithm applies a logistic function (sigmoid) to a linear combination of input features, transforming any real-valued output into a probability. Coefficients are estimated using maximum likelihood optimisation, which iteratively adjusts weights to maximise the likelihood of observed class labels. A decision boundary is then established at a probability threshold (typically 0.5) to assign class predictions.
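The mechanics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the data, learning rate, and iteration count are invented, and the maximum-likelihood fit is approximated by simple gradient ascent on the mean log-likelihood.

```python
import numpy as np

def sigmoid(z):
    # The logistic function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    # X: (n_samples, n_features); y: (n_samples,) array of 0/1 labels.
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)                         # predicted probabilities
        w += lr * (X.T @ (y - p)) / len(y)         # gradient of mean log-likelihood
    return w

def predict_proba(X, w):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(X @ w)

# Toy one-feature dataset: class 1 when the feature is positive.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fit_logistic(X, y)
probs = predict_proba(X, w)
preds = (probs >= 0.5).astype(int)                 # decision threshold at 0.5
```

The update rule follows directly from the log-likelihood gradient; raising the threshold above 0.5 would trade recall for precision without retraining the model.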
Why It Matters
This method provides interpretable probability estimates alongside classifications, enabling organisations to calibrate decision-making thresholds based on business costs. Its computational efficiency and relatively low data requirements make it practical for production systems, whilst probabilistic outputs support risk assessment and compliance reporting in regulated industries.
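Threshold calibration against business costs can be made concrete with a small sketch. The probabilities, labels, and cost figures below are entirely invented for illustration; the point is only that when a false negative is costlier than a false positive, the cost-minimising threshold typically differs from the default 0.5.

```python
import numpy as np

probs = np.array([0.1, 0.3, 0.45, 0.6, 0.8, 0.95])  # hypothetical model outputs
y     = np.array([0,   0,   1,    0,   1,   1])      # hypothetical true labels
COST_FN, COST_FP = 10.0, 1.0  # assume a missed positive costs 10x a false alarm

def expected_cost(threshold):
    preds = probs >= threshold
    fn = np.sum((~preds) & (y == 1))   # missed positives
    fp = np.sum(preds & (y == 0))      # false alarms
    return COST_FN * fn + COST_FP * fp

# Scan candidate thresholds and keep the cheapest one.
thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=expected_cost)
```

Because the probabilities are interpretable, the same fitted model serves different operating points simply by moving the threshold.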
Common Applications
Medical diagnosis (disease presence prediction), credit risk assessment, email spam detection, customer churn prediction, and loan approval decisions routinely employ this approach. It serves as a baseline model in healthcare, finance, and marketing analytics workflows.
Key Considerations
The algorithm assumes a linear relationship between the features and the log-odds of the outcome, which limits its effectiveness on non-linear problems unless the features are transformed. Imbalanced datasets and multicollinearity among features can also degrade performance, so careful feature engineering and class weighting are often needed.
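One common remedy for imbalance is to weight each training example inversely to its class frequency. The snippet below sketches that computation with invented labels; the weighting formula shown is the "balanced" heuristic (n_samples / (n_classes * count_c)), which scikit-learn's `class_weight='balanced'` option also uses.

```python
import numpy as np

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])   # hypothetical 80/20 imbalance
n = len(y)

classes, counts = np.unique(y, return_counts=True)
# Weight for class c: n_samples / (n_classes * count_c), so rare classes
# contribute more to the loss gradient per example.
weights = {c: n / (len(classes) * cnt) for c, cnt in zip(classes, counts)}
sample_w = np.array([weights[c] for c in y])
```

These per-sample weights multiply each example's contribution to the log-likelihood gradient, so the minority class is no longer drowned out during fitting.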
More in Machine Learning
Overfitting (Training Techniques)
When a model learns the training data too well, including noise, resulting in poor performance on unseen data.

Semi-Supervised Learning (Advanced Methods)
A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.

Clustering (Unsupervised Learning)
Unsupervised learning technique that groups similar data points together based on inherent patterns without predefined labels.

Regularisation (Training Techniques)
Techniques that add constraints or penalties to a model to prevent overfitting and improve generalisation to new data.

Unsupervised Learning (MLOps & Production)
A machine learning approach where models discover patterns and structures in data without labelled examples.

Model Registry (MLOps & Production)
A versioned catalogue of trained machine learning models with metadata, lineage, and approval workflows, enabling reproducible deployment and governance at enterprise scale.

Hierarchical Clustering (Unsupervised Learning)
A clustering method that builds a tree-like hierarchy of clusters through successive merging or splitting of groups.

Gradient Descent (Training Techniques)
An optimisation algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.