Rolling Update

Overview

Direct Answer

A rolling update is a deployment strategy that incrementally replaces running instances of an application with a new version whilst maintaining service availability. It avoids complete service downtime by spreading the upgrade across a sequence of controlled replacements.

How It Works

The mechanism takes a small batch of running instances out of the load balancer's pool, upgrades them to the new version, confirms they are healthy, and returns them to the pool before proceeding to the next batch. The cycle repeats until all instances run the new version. The process is governed by parameters such as the number of instances replaced per cycle and the health-check interval between cycles.
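The cycle described above can be sketched as a small in-memory simulation. This is a minimal illustration, not any particular orchestrator's implementation: the `Instance` class, the `rolling_update` function, and the `is_healthy` callback are hypothetical names introduced here, and "upgrading" is reduced to changing a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """Hypothetical stand-in for one running copy of the application."""
    name: str
    version: str
    in_pool: bool = True  # True while the load balancer routes traffic to it


def rolling_update(
    instances: List[Instance],
    new_version: str,
    batch_size: int,
    is_healthy: Callable[[Instance], bool],
) -> bool:
    """Upgrade instances batch by batch; halt at the first unhealthy batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for inst in batch:
            inst.in_pool = False        # remove from the load balancer pool
            inst.version = new_version  # upgrade (simulated)
        if not all(is_healthy(inst) for inst in batch):
            # Halt: the failed batch stays out of the pool and the
            # remaining instances keep serving the old version.
            return False
        for inst in batch:
            inst.in_pool = True         # return the batch to the pool
    return True


# Example: five instances, replaced two at a time (assumed values).
fleet = [Instance(f"web-{i}", "v1") for i in range(5)]
completed = rolling_update(fleet, "v2", batch_size=2, is_healthy=lambda inst: True)
```

In this sketch a failed health check stops the rollout rather than reverting it; a real system would typically also decide how to restore the failed batch.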

Why It Matters

Organisations adopt this approach to minimise user-facing disruption during deployments whilst maintaining predictable capacity and response times. The strategy reduces operational risk because a faulty release can be halted after it has reached only part of the fleet and then rolled back quickly, and it allows teams to validate new releases against live traffic patterns incrementally.
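The capacity trade-off can be made concrete with a little arithmetic. The sketch below assumes instances are replaced in place with no extra (surge) instances added, so capacity dips by one batch during each cycle; the function name `rollout_plan` and the example numbers are illustrative only.

```python
import math


def rollout_plan(total_instances: int, batch_size: int) -> dict:
    """Estimate cycles and worst-case capacity for an in-place rolling update."""
    if not 1 <= batch_size <= total_instances:
        raise ValueError("batch_size must be between 1 and total_instances")
    cycles = math.ceil(total_instances / batch_size)
    min_in_service = total_instances - batch_size  # while one full batch is out
    return {
        "cycles": cycles,
        "min_in_service": min_in_service,
        "min_capacity_fraction": min_in_service / total_instances,
    }


plan = rollout_plan(total_instances=10, batch_size=2)
# plan == {"cycles": 5, "min_in_service": 8, "min_capacity_fraction": 0.8}
```

A smaller batch keeps more capacity in service but needs more cycles, so the rollout takes longer.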

Common Applications

Rolling updates are standard in containerised environments managed by orchestration platforms, in microservices architectures, and in cloud-native applications that scale horizontally. Common scenarios include updating web service fleets, database connection pools, and load-balanced API gateways.

Key Considerations

The strategy requires backward compatibility between versions during the transition window, because old and new instances serve traffic side by side, and it requires careful management of database schema changes so that both versions can work against the same schema. Performance validation and health checks must be sufficiently robust to detect failures before all instances are replaced.
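One way to make a health check more robust is to require several consecutive successful probes before a batch counts as healthy, so a single lucky response does not let a faulty release proceed. The following is a minimal sketch under that assumption; `passes_health_gate`, its parameters, and their default values are hypothetical, and `probe` stands for whatever check a team actually runs.

```python
import time
from typing import Callable


def passes_health_gate(
    probe: Callable[[], bool],
    required_successes: int = 3,
    max_attempts: int = 10,
    interval_seconds: float = 0.0,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row."""
    streak = 0
    for _ in range(max_attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # any failure resets the run of successes
        time.sleep(interval_seconds)  # wait before the next probe
    return False


# Example with a scripted sequence of probe results (assumed values).
results = iter([True, False, True, True, True])
gate_ok = passes_health_gate(lambda: next(results), required_successes=3, max_attempts=5)
```

Raising `required_successes` lowers the chance of promoting a flapping instance, at the cost of a slower rollout.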

Cross-References

Business & Strategy

Strategy

Related in CI/CD

DevOps

A set of practices combining software development and IT operations to shorten the development lifecycle and deliver continuous value.

CI/CD Pipeline

An automated workflow that builds, tests, and deploys software changes from development to production.

Build Automation

The process of automating the compilation, testing, and packaging of software applications.

Artifact Repository

A centralised storage system for managing binary artifacts produced during the software build process.

ChatOps

A collaboration model connecting tools, processes, and automation with team chat platforms for operations management.

Post-Mortem Analysis

A structured review conducted after an incident to identify root causes and prevent recurrence.

Blameless Culture

An organisational approach where incident reviews focus on systemic improvements rather than individual blame.

Mean Time to Recovery

The average time it takes to restore a system to normal operation after a failure or incident.

Mean Time Between Failures

The average time between system failures, measuring reliability and availability.

Service Level Objective

A target value for a service level indicator that defines acceptable service performance.

Service Level Indicator

A quantitative measure of some aspect of the level of service being provided.

Playbook

A comprehensive guide containing strategies, procedures, and best practices for managing specific operational scenarios.

More in DevOps & Infrastructure

Prometheus

Observability

An open-source monitoring and alerting toolkit designed for reliability and scalability in cloud-native environments.

Vertical Scaling

CI/CD

Increasing the resources (CPU, RAM, storage) of an existing machine to handle more load.

Helm

Containers & Orchestration

A package manager for Kubernetes that simplifies the deployment and management of applications using charts.

Alerting

Observability

Automated notifications triggered when system metrics or conditions exceed predefined thresholds.

Puppet

Infrastructure as Code

A configuration management tool that automates the provisioning and management of infrastructure.

Graceful Degradation

CI/CD

A design approach where a system continues to operate with reduced functionality when components fail.

Site Reliability Engineering

Site Reliability

A discipline applying software engineering principles to infrastructure and operations to create scalable, reliable systems.

Observability

The ability to understand a system's internal state from its external outputs, encompassing metrics, logs, and traces.
