Overview
Direct Answer
A Rolling Update is a deployment strategy that incrementally replaces running instances of an application with a new version whilst maintaining service availability. It eliminates the need for complete service downtime by distributing the upgrade across a sequence of controlled replacements.
How It Works
The mechanism removes a small subset of running instances from load balancers, upgrades them to the new version, and returns them to the pool before proceeding to the next batch. This continues iteratively until all instances run the new version. The process is governed by parameters such as the number of instances replaced per cycle and the health-check interval between cycles.
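The batching loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular orchestrator implements it: `rolling_update`, `RolloutResult`, `batch_size`, and the `upgrade` and `is_healthy` callbacks are hypothetical names, and the load balancer pool is modelled as a plain set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class RolloutResult:
    """Outcome of a rollout (hypothetical structure for this sketch)."""
    upgraded: List[str] = field(default_factory=list)   # batches returned to the pool
    in_service: Set[str] = field(default_factory=set)   # instances still serving traffic
    completed: bool = False                             # True if every batch passed


def rolling_update(
    instances: List[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> RolloutResult:
    """Replace instances batch by batch, halting on the first unhealthy batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    result = RolloutResult(in_service=set(instances))

    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]

        # 1. Remove the batch from the load balancer pool.
        result.in_service.difference_update(batch)

        # 2. Upgrade each removed instance to the new version.
        for name in batch:
            upgrade(name)

        # 3. Health-check the batch before returning it to the pool.
        if not all(is_healthy(name) for name in batch):
            # Halt: the failed batch stays out of the pool and the
            # remaining instances keep running the old version.
            return result

        result.in_service.update(batch)
        result.upgraded.extend(batch)

    result.completed = True
    return result
```

With five instances and `batch_size=2`, at least three instances stay in the pool at every step, which is the capacity trade-off the batch-size parameter controls.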
Why It Matters
Organisations adopt this approach to minimise user-facing disruption during deployments whilst maintaining predictable capacity and response times. The strategy reduces operational risk because a faulty release affects only the instances already replaced, so the rollout can be halted and rolled back as soon as issues are detected. It also allows teams to validate new releases incrementally against live traffic patterns.
Common Applications
Rolling updates are standard in containerised environments managed by orchestration platforms, in microservices architectures, and in cloud-native applications that scale horizontally. Common scenarios include updating web service fleets, database connection pools, and load-balanced API gateways.
Key Considerations
The strategy requires backward compatibility between versions during the transition window and careful management of database schema changes. Performance validation and health checks must be sufficiently robust to detect failures before all instances are replaced.
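One way to make a health check more robust is to require several consecutive successful probes before a replaced instance is treated as healthy, so that a single lucky response does not let a flapping instance through. The sketch below assumes this approach. `wait_until_healthy`, `probe`, `required_successes`, `max_attempts`, and `interval_seconds` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    required_successes: int = 3,
    max_attempts: int = 10,
    interval_seconds: float = 1.0,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row.

    Returns False if that streak is not reached within `max_attempts` probes.
    """
    consecutive = 0
    for _ in range(max_attempts):
        if probe():
            consecutive += 1
            if consecutive >= required_successes:
                return True
        else:
            consecutive = 0  # any failure resets the streak
        if interval_seconds > 0:
            time.sleep(interval_seconds)  # wait before the next probe
    return False
```

A rollout controller could call a gate like this on each replaced instance and stop the rollout when it returns False, leaving the remaining instances on the old version.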
Cross-References
More in DevOps & Infrastructure
Site Reliability Engineering
Site Reliability: A discipline applying software engineering principles to infrastructure and operations to create scalable, reliable systems.
Runbook
Site Reliability: A documented set of procedures for handling routine operations and troubleshooting common issues.
Distributed Tracing
Observability: A method of tracking requests as they flow through distributed systems to diagnose latency and failure points.
Service Discovery
CI/CD: The automatic detection of devices and services on a network, enabling dynamic service-to-service communication.
Chef
Infrastructure as Code: A configuration management tool using Ruby-based scripts to automate infrastructure setup and maintenance.
Ansible
Infrastructure as Code: An open-source automation tool for configuration management, application deployment, and task automation.
Observability
Observability: The ability to understand a system's internal state from its external outputs, encompassing metrics, logs, and traces.
Incident Management
Site Reliability: The processes and tools for detecting, responding to, resolving, and learning from service disruptions.