Blue-Green Infrastructure — Technology Wiki

Overview

Direct Answer

Blue-green infrastructure is a deployment strategy that maintains two identical production environments, designated blue and green, so that traffic can be cut over quickly from the active environment to the standby one. This minimises downtime and risk during version releases because the routing change can be reversed just as quickly.

How It Works

One environment (blue) serves live traffic while the new application version is deployed to the inactive environment (green). Once validation tests pass on green, a load balancer or routing layer redirects traffic to it, typically within seconds. Blue remains available for immediate rollback if issues occur, and the two environments swap roles for the next release cycle.
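
The deploy, validate, cut over, and roll back cycle described above can be sketched in a few lines of Python. This is an illustrative model only: the Environment and BlueGreenRouter classes, their method names, and the version strings are hypothetical, and a real routing layer would be a load balancer or DNS change rather than an in-memory object.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Environment:
    name: str      # "blue" or "green"
    version: str   # application version currently deployed there


@dataclass
class BlueGreenRouter:
    """Hypothetical routing layer that tracks which environment is live."""
    environments: Dict[str, Environment]
    active: str = "blue"

    @property
    def idle(self) -> str:
        # The idle environment is whichever one is not serving traffic.
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str) -> None:
        # New versions only ever go to the idle environment.
        self.environments[self.idle].version = version

    def cut_over(self, validate: Callable[[Environment], bool]) -> bool:
        # Switch traffic only if validation passes on the idle environment.
        candidate = self.environments[self.idle]
        if not validate(candidate):
            return False
        self.active = candidate.name
        return True

    def roll_back(self) -> None:
        # The previous environment was left untouched, so rollback is
        # simply another swap of the active pointer.
        self.active = self.idle

    def route(self) -> str:
        # Return the version that live traffic currently reaches.
        return self.environments[self.active].version
```

In this sketch, deploying version 1.1 while blue is active changes only green; a successful cut_over makes green live, and roll_back returns traffic to blue, which still runs the earlier version. Because the environments swap roles, the next deploy call would target blue.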

Why It Matters

This strategy reduces deployment risk and downtime in mission-critical systems: releases can approach zero downtime, and rollback is a routing change rather than a redeployment. It also allows the new version to be validated against a full production-grade stack before it takes live traffic, reducing the defects that reach customers and minimising business disruption.

Common Applications

Blue-green deployments are widely used in e-commerce platforms, financial services, and SaaS applications where continuous availability is essential. Cloud-native architectures using Kubernetes frequently implement this pattern through traffic management controllers, and organisations deploying microservices adopt it to coordinate multiple service updates safely.
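
As a rough illustration of the Kubernetes case, one of the simplest variants (switching a Service's label selector, rather than using a dedicated traffic management controller) amounts to changing a single label value. The sketch below only builds the patch document. The label keys app and version, and the application name shop, are assumptions chosen for the example, not names taken from any particular cluster.

```python
from typing import Dict


def selector_patch(target_colour: str, app: str = "shop") -> Dict:
    """Build a patch body that points a Service at one colour's pods.

    Assumes pods are labelled with hypothetical keys `app` and `version`,
    where `version` holds either "blue" or "green".
    """
    if target_colour not in ("blue", "green"):
        raise ValueError(f"unknown colour: {target_colour}")
    return {"spec": {"selector": {"app": app, "version": target_colour}}}


# Cut over to green; rolling back is the same call with "blue".
patch = selector_patch("green")
```

Serialised as JSON, a patch like this could be applied with a tool such as kubectl patch. Both Deployments keep running throughout, so only the Service's view of which pods are live changes.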

Key Considerations

The approach roughly doubles infrastructure costs, since two production-sized environments must run side by side. It also adds complexity in keeping the environments synchronised and raises database consistency challenges when schema changes are involved. Organisations must also manage stateful connections and session persistence carefully to avoid customer disruption during the switch.
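
One common way to handle stateful connections during the switch is to let the old environment drain: new requests go to the new environment while in-flight connections on the old one are given time to finish. The following is a minimal sketch of that waiting step, assuming a hypothetical active_connections callable that reports the old environment's open connection count. The timeout and polling values are placeholders.

```python
import time
from typing import Callable


def drain(active_connections: Callable[[], int],
          timeout_s: float = 30.0,
          poll_s: float = 0.5,
          sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.monotonic) -> bool:
    """Wait for in-flight connections on the old environment to finish.

    Returns True if the count reached zero before the timeout, and
    False if connections were still open when time ran out.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if active_connections() == 0:
            return True
        sleep(poll_s)
    # Final check after the deadline so a late finish is still reported.
    return active_connections() == 0
```

A False result leaves the decision to the operator or pipeline: extend the wait, or close the remaining connections and accept that those users must reconnect. Draining does not address session data itself, which typically needs to live in a store that both environments can reach.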

Related in CI/CD

DevOps

A set of practices combining software development and IT operations to shorten the development lifecycle and deliver continuous value.

CI/CD Pipeline

An automated workflow that builds, tests, and deploys software changes from development to production.

Build Automation

The process of automating the compilation, testing, and packaging of software applications.

Artifact Repository

A centralised storage system for managing binary artifacts produced during the software build process.

ChatOps

A collaboration model connecting tools, processes, and automation with team chat platforms for operations management.

Post-Mortem Analysis

A structured review conducted after an incident to identify root causes and prevent recurrence.

Blameless Culture

An organisational approach where incident reviews focus on systemic improvements rather than individual blame.

Mean Time to Recovery

The average time it takes to restore a system to normal operation after a failure or incident.

Mean Time Between Failures

The average operating time between system failures; a measure of reliability that, together with Mean Time to Recovery, determines availability.

Service Level Objective

A target value for a service level indicator that defines acceptable service performance.

Service Level Indicator

A quantitative measure of some aspect of the level of service being provided.

Playbook

A comprehensive guide containing strategies, procedures, and best practices for managing specific operational scenarios.

More in DevOps & Infrastructure

Helm

Containers & Orchestration

A package manager for Kubernetes that simplifies the deployment and management of applications using charts.

Capacity Planning

Site Reliability

The process of determining the infrastructure capacity (compute, storage, and network) needed to meet changing demand for an organisation's services.

Elasticity

CI/CD

The ability of a system to automatically scale resources up or down based on current demand.

Monitoring

Observability

The continuous observation of system performance, availability, and health using automated tools and dashboards.

Vertical Scaling

CI/CD

Increasing the resources (CPU, RAM, storage) of an existing machine to handle more load.

Immutable Infrastructure

Infrastructure as Code

An approach where infrastructure components are never modified after deployment but replaced entirely with updated versions.

Secret Management

CI/CD

The practice of securely storing, accessing, and managing sensitive credentials, API keys, and certificates.

Runbook

Site Reliability

A documented set of procedures for handling routine operations and troubleshooting common issues.