55 min read

Python in Production: The Complete DevOps & SRE Architecture Guide

Code is only half the battle. This engineering roadmap covers the operational excellence required to run Python at scale—from virtual environments and WSGI/ASGI servers to multi-region Kubernetes clusters, observability stacks, and automated CI/CD pipelines.

Python Application Deployment

Virtual Environments in Production

Virtual environments isolate your application's dependencies from system Python and other projects, ensuring reproducibility and avoiding "works on my machine" issues. In production, always use a dedicated venv with pinned dependencies (pip freeze > requirements.txt) and never install packages globally.

# Production deployment pattern
python -m venv /opt/myapp/venv
source /opt/myapp/venv/bin/activate
pip install -r requirements.txt --no-cache-dir

Python Version Management (pyenv)

Pyenv allows you to install and switch between multiple Python versions per-user or per-project without touching system Python—critical when different apps require different Python versions on the same server.

# Install and set Python version
pyenv install 3.11.4
pyenv local 3.11.4    # Creates .python-version file
pyenv global 3.11.4   # Sets default version

# Project structure
myproject/
├── .python-version   # Contains: 3.11.4
├── requirements.txt
└── app.py

WSGI Servers

WSGI (Web Server Gateway Interface) is the standard interface between Python web applications and web servers, handling synchronous requests. Gunicorn and uWSGI are production-grade WSGI servers that spawn multiple worker processes to handle concurrent requests.

# Gunicorn with the recommended worker count ((2 × CPU cores) + 1)
gunicorn --workers 4 --bind 0.0.0.0:8000 myapp:app

# With Unix socket (faster for reverse proxy)
gunicorn --workers 4 --bind unix:/run/myapp.sock myapp:app
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Nginx     │────▶│  Gunicorn   │────▶│  Flask/     │
│  (Reverse   │     │  (WSGI)     │     │  Django     │
│   Proxy)    │     │  Workers    │     │  App        │
└─────────────┘     └─────────────┘     └─────────────┘

ASGI Servers

ASGI (Asynchronous Server Gateway Interface) extends WSGI to support async/await, WebSockets, and HTTP/2—essential for modern real-time applications. Uvicorn and Hypercorn are the primary ASGI servers, typically paired with FastAPI or Starlette.

# Uvicorn with multiple workers
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

# Production: Gunicorn managing Uvicorn workers
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Reverse Proxy Configuration

A reverse proxy (Nginx, HAProxy) sits in front of your Python app to handle SSL termination, static files, request buffering, and load distribution—never expose Gunicorn/Uvicorn directly to the internet.

# /etc/nginx/sites-available/myapp
upstream python_app {
    server unix:/run/myapp.sock fail_timeout=0;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://python_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /static/ {
        alias /opt/myapp/static/;
        expires 30d;
    }
}

SSL/TLS Certificates

SSL/TLS certificates encrypt traffic between clients and servers, establishing trust through certificate authorities. In production, always enforce HTTPS, use TLS 1.2+, configure strong cipher suites, and implement HSTS headers.

server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers on;

    add_header Strict-Transport-Security "max-age=31536000" always;
}

Let's Encrypt

Let's Encrypt provides free, automated SSL certificates with 90-day validity, using the ACME protocol via Certbot for automatic renewal—there's no excuse for running HTTP in production.

# Install and obtain certificate
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d example.com -d www.example.com

# Auto-renewal (added automatically to cron/systemd)
sudo certbot renew --dry-run

# Certificate location
/etc/letsencrypt/live/example.com/
├── fullchain.pem   # Certificate + intermediates
├── privkey.pem     # Private key
└── cert.pem        # Certificate only

Load Balancing

Load balancing distributes incoming traffic across multiple application instances for scalability and fault tolerance. Common algorithms include round-robin, least connections, and IP hash for session affinity.

                         ┌──────────────┐
                         │  App Server 1│
┌────────┐   ┌────────┐  ├──────────────┤
│ Client │──▶│  Load  │──│  App Server 2│
└────────┘   │Balancer│  ├──────────────┤
             └────────┘  │  App Server 3│
                         └──────────────┘
upstream myapp {
    least_conn;                      # Algorithm
    server 10.0.0.1:8000 weight=3;
    server 10.0.0.2:8000;
    server 10.0.0.3:8000 backup;
}

Database Connection Pooling

Connection pooling maintains a cache of reusable database connections, eliminating the overhead of establishing new connections per request. SQLAlchemy, psycopg2-pool, or PgBouncer significantly reduce database load and latency.

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=10,        # Maintained connections
    max_overflow=20,     # Additional connections allowed
    pool_timeout=30,     # Wait time for connection
    pool_recycle=1800    # Recycle connections after 30min
)
┌─────────────┐     ┌─────────────────┐     ┌──────────┐
│  App        │────▶│  Connection     │────▶│ Database │
│  Instances  │◀────│  Pool (10-30)   │◀────│          │
└─────────────┘     └─────────────────┘     └──────────┘

Database Backups

Regular, tested backups are non-negotiable—implement automated daily backups with point-in-time recovery capability, store them off-site (S3, GCS), and regularly test restoration procedures.

#!/bin/bash
# backup_postgres.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups"
DB_NAME="production"

# Full backup with compression
pg_dump -Fc $DB_NAME > $BACKUP_DIR/db_$DATE.dump

# Upload to cloud storage
gsutil cp $BACKUP_DIR/db_$DATE.dump gs://my-backups/postgres/

# Retain last 30 days locally
find $BACKUP_DIR -mtime +30 -delete

# Cron: 0 2 * * * /scripts/backup_postgres.sh
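
A backup you have never restored is a hope, not a backup. Below is a minimal sketch of an automated restore drill: restore the newest dump into a scratch database and run a sanity query. The restore_test database name and the orders table are placeholders for your own schema.

# Restore drill: prove the latest dump actually restores (sketch)
import glob
import os
import subprocess

def latest_dump(backup_dir="/backups"):
    # Timestamped names (db_YYYYMMDD_HHMMSS.dump) sort chronologically
    dumps = sorted(glob.glob(os.path.join(backup_dir, "db_*.dump")))
    if not dumps:
        raise SystemExit("No backups found!")
    return dumps[-1]

def restore_drill():
    dump = latest_dump()
    subprocess.run(["createdb", "restore_test"], check=True)
    try:
        subprocess.run(["pg_restore", "-d", "restore_test", dump], check=True)
        # Sanity query: at least one row in a core table (placeholder)
        out = subprocess.run(
            ["psql", "-d", "restore_test", "-tAc", "SELECT count(*) FROM orders"],
            capture_output=True, text=True, check=True
        )
        assert int(out.stdout.strip()) > 0, "Restored database looks empty"
    finally:
        subprocess.run(["dropdb", "restore_test"], check=True)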

Database Replication

Replication creates copies of your database across multiple servers for high availability and read scalability—use synchronous replication for zero data loss, or asynchronous replication for better performance where slight replica lag is acceptable.

┌─────────────────────────────────────────────┐
│              Replication Topology           │
├─────────────────────────────────────────────┤
│                                             │
│    ┌──────────┐      WAL Stream             │
│    │  Primary │─────────────────┐           │
│    │  (R/W)   │                 │           │
│    └──────────┘                 ▼           │
│         │               ┌──────────┐        │
│         │               │ Replica 1│        │
│         │               │  (Read)  │        │
│         │               └──────────┘        │
│         │                                   │
│         └──────────────▶┌──────────┐        │
│                         │ Replica 2│        │
│                         │  (Read)  │        │
│                         └──────────┘        │
└─────────────────────────────────────────────┘
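
At the application layer, replication usually means routing reads to replicas and writes to the primary. A minimal sketch with SQLAlchemy—the primary and replica1 hostnames are placeholders, and reads from an asynchronous replica may lag slightly behind the primary:

# Route reads to a replica, writes to the primary (sketch)
from sqlalchemy import create_engine, text

primary = create_engine("postgresql://app:pass@primary:5432/prod")    # R/W
replica = create_engine("postgresql://app:pass@replica1:5432/prod")   # Read-only

def fetch(sql, **params):
    # Reads tolerate slight replication lag
    with replica.connect() as conn:
        return conn.execute(text(sql), params).fetchall()

def execute(sql, **params):
    # Writes must always go to the primary
    with primary.begin() as conn:
        conn.execute(text(sql), params)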

Redis Deployment

Redis serves as an in-memory cache, session store, or message broker—in production, deploy with persistence (RDB/AOF), configure maxmemory with eviction policies, and use Redis Sentinel or Cluster for high availability.

import redis
from redis.sentinel import Sentinel

# Single instance
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

# High availability with Sentinel
sentinel = Sentinel([
    ('sentinel1', 26379),
    ('sentinel2', 26379),
    ('sentinel3', 26379)
], socket_timeout=0.1)

master = sentinel.master_for('mymaster', socket_timeout=0.1)
slave = sentinel.slave_for('mymaster', socket_timeout=0.1)

master.set('key', 'value')
value = slave.get('key')

Message Queue Systems (RabbitMQ, Kafka)

Message queues decouple producers from consumers, enabling async processing, load leveling, and fault tolerance—use RabbitMQ for traditional task queues and Kafka for high-throughput event streaming and log aggregation.

# RabbitMQ with Celery
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def process_order(order_id):
    # Long-running task
    return f"Processed {order_id}"

# Kafka producer
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka1:9092', 'kafka2:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
producer.send('events', {'event': 'user_signup', 'user_id': 123})

Process Managers (systemd, supervisor)

Process managers ensure your Python application starts on boot, restarts on failure, and logs output properly—systemd is the modern Linux standard, while Supervisor offers simpler configuration and multi-process management.

# /etc/systemd/system/myapp.service
[Unit]
Description=My Python Application
After=network.target

[Service]
Type=notify
User=www-data
Group=www-data
WorkingDirectory=/opt/myapp
Environment="PATH=/opt/myapp/venv/bin"
ExecStart=/opt/myapp/venv/bin/gunicorn -w 4 -b unix:/run/myapp.sock main:app
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
sudo systemctl enable myapp
sudo systemctl start myapp
sudo systemctl status myapp

Environment Variable Management

Environment variables separate configuration from code, enabling the same codebase to run across development, staging, and production—use python-dotenv for local development and native env vars or secret managers in production.

# config.py
import os
from dotenv import load_dotenv

load_dotenv()  # Load .env file in development

class Config:
    DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
    DATABASE_URL = os.environ['DATABASE_URL']  # Required
    REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')
    SECRET_KEY = os.environ['SECRET_KEY']
# .env (never commit this!)
DEBUG=false
DATABASE_URL=postgresql://user:pass@db:5432/prod
SECRET_KEY=your-super-secret-key-here

Secrets Management

Never store secrets in code or plain config files—use dedicated secret managers (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) that provide encryption, access control, audit logging, and automatic rotation.

# GCP Secret Manager
from google.cloud import secretmanager

def get_secret(secret_id: str, version: str = "latest") -> str:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/my-project/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

DATABASE_PASSWORD = get_secret("db-password")
API_KEY = get_secret("api-key")

Application Monitoring (New Relic, DataDog)

APM tools provide real-time visibility into application performance, transaction traces, and error rates—they automatically instrument your Python code to track response times, throughput, and identify bottlenecks.

# DataDog setup
from ddtrace import patch_all, tracer

patch_all()  # Auto-instrument popular libraries
tracer.configure(hostname='datadog-agent', port=8126)

# New Relic - just configure via env vars
# NEW_RELIC_LICENSE_KEY=xxx
# NEW_RELIC_APP_NAME=my-python-app
# newrelic-admin run-program gunicorn myapp:app
┌─────────────────────────────────────────────────────┐
│                   Dashboard                         │
├─────────────────────────────────────────────────────┤
│  Response Time: 145ms (p99)  │  Throughput: 1.2k/s │
│  Error Rate: 0.02%           │  Apdex: 0.97        │
│                                                     │
│  ████████████░░░ CPU: 65%                          │
│  ██████░░░░░░░░░ Memory: 40%                       │
└─────────────────────────────────────────────────────┘

Log Aggregation (ELK Stack)

The ELK stack (Elasticsearch, Logstash, Kibana) centralizes logs from all application instances, enabling search, analysis, and visualization—use structured JSON logging for better queryability.

import logging
import json_log_formatter

formatter = json_log_formatter.JSONFormatter()
handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger('myapp')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info('Order processed', extra={
    'order_id': '12345',
    'customer_id': 'C001',
    'amount': 99.99
})
# Output: {"message": "Order processed", "order_id": "12345", ...}
┌─────────┐    ┌──────────┐    ┌───────────────┐    ┌────────┐
│  Apps   │───▶│ Logstash │───▶│ Elasticsearch │◀───│ Kibana │
│ (JSON)  │    │ /Fluentd │    │    Cluster    │    │  (UI)  │
└─────────┘    └──────────┘    └───────────────┘    └────────┘

Error Tracking and Monitoring

Dedicated error tracking tools (Sentry, Rollbar) capture exceptions with full stack traces, context, and user information—they group similar errors, track frequency, and integrate with alerting systems.

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn="https://xxx@sentry.io/123",
    integrations=[FlaskIntegration()],
    traces_sample_rate=0.1,  # 10% of transactions
    environment="production",
    release="myapp@1.2.3"
)

# Errors automatically captured, or manually:
try:
    process_payment(order)
except PaymentError as e:
    sentry_sdk.capture_exception(e)
    sentry_sdk.set_context("order", {"id": order.id, "amount": order.total})
    raise

Performance Monitoring

Performance monitoring tracks response times, database queries, external API calls, and resource usage—identify slow endpoints, N+1 queries, and memory leaks before they impact users.

# Custom timing decorator
import time
import functools
from prometheus_client import Histogram

REQUEST_LATENCY = Histogram(
    'request_latency_seconds',
    'Request latency',
    ['endpoint', 'method']
)

def track_performance(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            duration = time.perf_counter() - start
            REQUEST_LATENCY.labels(
                endpoint=func.__name__,
                method='GET'
            ).observe(duration)
    return wrapper

APM Tools

Application Performance Monitoring tools combine tracing, metrics, and logs to provide end-to-end visibility across distributed systems—they correlate frontend performance with backend transactions and infrastructure metrics.

# OpenTelemetry (vendor-agnostic APM)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

otlp_exporter = OTLPSpanExporter(endpoint="collector:4317")
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", order_id)
    result = process_order(order_id)

Infrastructure as Code (Terraform)

Terraform enables declarative infrastructure provisioning across cloud providers—version control your infrastructure, review changes before applying, and maintain consistency across environments.

# main.tf - GCP Cloud Run deployment
provider "google" {
  project = "my-project"
  region  = "us-central1"
}

resource "google_cloud_run_service" "app" {
  name     = "python-app"
  location = "us-central1"

  template {
    spec {
      containers {
        image = "gcr.io/my-project/app:v1.0.0"

        resources {
          limits = {
            cpu    = "1000m"
            memory = "512Mi"
          }
        }

        env {
          name = "DATABASE_URL"
          value_from {
            secret_key_ref {
              name = "db-url"
              key  = "latest"
            }
          }
        }
      }
    }
  }
}

Configuration Management (Ansible)

Ansible automates server configuration, application deployment, and orchestration using YAML playbooks—idempotent tasks ensure servers reach desired state regardless of starting point.

# deploy.yml
---
- name: Deploy Python Application
  hosts: webservers
  become: yes

  tasks:
    - name: Create virtual environment
      pip:
        requirements: /opt/myapp/requirements.txt
        virtualenv: /opt/myapp/venv
        virtualenv_python: python3.11

    - name: Copy systemd service
      template:
        src: myapp.service.j2
        dest: /etc/systemd/system/myapp.service
      notify: Restart myapp

    - name: Ensure app is running
      systemd:
        name: myapp
        state: started
        enabled: yes

  handlers:
    - name: Restart myapp
      systemd:
        name: myapp
        state: restarted
        daemon_reload: yes

Container Orchestration

Container orchestration (Kubernetes, Docker Swarm) manages deployment, scaling, and operations of containerized applications across clusters—handles service discovery, load balancing, rolling updates, and self-healing.

┌──────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                    │
├──────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐                │
│  │     Node 1      │  │     Node 2      │                │
│  │  ┌───┐ ┌───┐   │  │  ┌───┐ ┌───┐   │                │
│  │  │Pod│ │Pod│   │  │  │Pod│ │Pod│   │                │
│  │  └───┘ └───┘   │  │  └───┘ └───┘   │                │
│  └─────────────────┘  └─────────────────┘                │
│                                                          │
│  ┌─────────┐  ┌─────────────┐  ┌───────────────────┐    │
│  │ Service │  │   Ingress   │  │ ConfigMaps/Secrets│    │
│  └─────────┘  └─────────────┘  └───────────────────┘    │
└──────────────────────────────────────────────────────────┘
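
The orchestrator is scriptable, too. As a sketch, the official kubernetes Python client can perform the same inspection and scaling operations kubectl does—the python-app deployment name and default namespace here are assumptions:

# Inspect and scale workloads via the Kubernetes API (sketch)
from kubernetes import client, config

config.load_kube_config()          # Or load_incluster_config() inside a pod
core = client.CoreV1Api()
apps = client.AppsV1Api()

# Service-discovery view: pods behind the app label
pods = core.list_namespaced_pod('default', label_selector='app=python-app')
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)

# Scale the deployment to 5 replicas
apps.patch_namespaced_deployment_scale(
    name='python-app', namespace='default',
    body={'spec': {'replicas': 5}}
)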

CI/CD Pipelines

CI/CD automates testing, building, and deploying code changes—continuous integration catches bugs early, continuous delivery ensures code is always deployable, and continuous deployment automates releases to production.

┌────────────────────────────────────────────────────────────┐
│                    CI/CD Pipeline                          │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────┐   ┌──────┐   ┌──────┐   ┌──────┐   ┌──────────┐ │
│  │ Code │──▶│ Test │──▶│Build │──▶│ Push │──▶│ Deploy   │ │
│  │Commit│   │      │   │Image │   │ Reg  │   │(staging) │ │
│  └──────┘   └──────┘   └──────┘   └──────┘   └──────────┘ │
│                                                   │        │
│                                         ┌─────────▼──────┐ │
│                                         │ Deploy (prod)  │ │
│                                         │ (manual gate)  │ │
│                                         └────────────────┘ │
└────────────────────────────────────────────────────────────┘

GitLab CI/CD

GitLab CI/CD uses .gitlab-ci.yml in your repository to define pipelines—it provides built-in container registry, environments, and deployment tracking with powerful caching and artifact management.

# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

test:
  stage: test
  image: python:3.11
  cache:
    paths:
      - .pip-cache/
  script:
    - pip install -r requirements.txt
    - pytest --cov=app tests/
  coverage: '/TOTAL.+ ([0-9]{1,3}%)/'

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

deploy_production:
  stage: deploy
  script:
    - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  environment:
    name: production
  only:
    - main
  when: manual

GitHub Actions

GitHub Actions provides workflow automation directly in GitHub with extensive marketplace actions—workflows run on GitHub-hosted or self-hosted runners, triggered by events like pushes, PRs, or schedules.

# .github/workflows/deploy.yml
name: Deploy Python App

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest --cov

  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: google-github-actions/deploy-cloudrun@v2
        with:
          service: python-app
          image: gcr.io/${{ secrets.GCP_PROJECT }}/app:${{ github.sha }}

Jenkins

Jenkins is the veteran CI/CD server with an extensive plugin ecosystem—use declarative Jenkinsfiles for pipeline-as-code, shared libraries for reuse, and agents for distributed builds.

// Jenkinsfile
pipeline {
    agent { docker { image 'python:3.11' } }

    environment {
        REGISTRY = 'gcr.io/my-project'
    }

    stages {
        stage('Test') {
            steps {
                sh 'pip install -r requirements.txt'
                sh 'pytest --junitxml=reports/junit.xml'
            }
            post {
                always { junit 'reports/junit.xml' }
            }
        }
        stage('Build & Push') {
            steps {
                script {
                    // docker.build returns an image object; push it
                    def image = docker.build("${REGISTRY}/app:${BUILD_NUMBER}")
                    image.push()
                }
            }
        }
        stage('Deploy') {
            when { branch 'main' }
            steps {
                sh "kubectl set image deployment/app app=${REGISTRY}/app:${BUILD_NUMBER}"
            }
        }
    }
}

Blue-Green Deployment

Blue-green deployment maintains two identical production environments—deploy to the inactive one, verify it works, then switch traffic instantly via load balancer or DNS, enabling instant rollback if issues arise.

                    ┌─────────────────────────────────┐
                    │         Load Balancer           │
                    └───────────────┬─────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │                               │
            ┌───────▼───────┐             ┌────────▼───────┐
            │    BLUE       │             │     GREEN      │
            │   (v1.0)      │             │    (v1.1)      │
            │   ACTIVE      │             │   INACTIVE     │
            └───────────────┘             └────────────────┘
                    │                               │
                    │         After switch:         │
                    │                               │
            ┌───────────────┐             ┌────────────────┐
            │    BLUE       │             │     GREEN      │
            │   (v1.0)      │             │    (v1.1)      │
            │   INACTIVE    │◀── switch ─▶│    ACTIVE      │
            └───────────────┘             └────────────────┘
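
On Kubernetes, the switch can be a single API call: repoint the Service's label selector from the blue pods to the green ones. A sketch, assuming both Deployments sit behind one myapp Service and carry a color label:

# Flip traffic from blue to green by patching the Service selector (sketch)
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def switch_traffic(color: str):
    # Instant cutover; rollback is the same call with the old color
    core.patch_namespaced_service(
        name='myapp', namespace='default',
        body={'spec': {'selector': {'app': 'myapp', 'color': color}}}
    )

switch_traffic('green')   # switch_traffic('blue') rolls back instantly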

Canary Deployment

Canary deployment gradually shifts traffic to a new version (1% → 10% → 50% → 100%), monitoring error rates and latency at each step—this limits blast radius and enables data-driven rollout decisions.

# Istio VirtualService for canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 90
        - destination:
            host: myapp
            subset: v2
          weight: 10   # Canary receives 10% traffic
Traffic: 100%
        │
        ├──90%──▶ Version 1.0 (stable)
        │
        └──10%──▶ Version 1.1 (canary) ← Monitor closely

Rolling Updates

Rolling updates gradually replace old instances with new ones, maintaining availability throughout—this is the default Kubernetes strategy, with maxUnavailable and maxSurge parameters controlling update speed.

# Kubernetes deployment with rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # At most 1 pod unavailable
      maxSurge: 1         # At most 1 extra pod
  template:
    spec:
      containers:
        - name: app
          image: myapp:v2
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
Time →
Pod 1: [v1]──────────────[v2]
Pod 2: [v1]────────[v2]
Pod 3: [v1]──[v2]
Pod 4: [v1]────────────────[v2]

Zero-Downtime Deployment

Zero-downtime deployment combines health checks, connection draining, and graceful shutdown to ensure continuous availability—never terminate pods until they've finished processing in-flight requests.

# Graceful shutdown handling
import signal
from flask import Flask

app = Flask(__name__)
shutting_down = False

def graceful_shutdown(signum, frame):
    # Flip the flag so /health starts failing and the load balancer
    # stops routing new traffic; the WSGI server drains in-flight
    # requests before the process actually exits
    global shutting_down
    shutting_down = True
    print("Shutting down gracefully...")

signal.signal(signal.SIGTERM, graceful_shutdown)

@app.route('/health')
def health():
    if shutting_down:
        return 'Shutting down', 503
    return 'OK', 200
# Kubernetes graceful shutdown config
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # Allow LB to drain

Auto-Scaling

Auto-scaling automatically adjusts instance count based on metrics (CPU, memory, request rate, custom metrics)—Kubernetes HPA scales pods, while cloud auto-scalers manage VM instances.

# Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: python-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

Cloud Platforms (GCP, AWS, Azure)

Major cloud platforms offer managed services that simplify deployment—use managed databases (Cloud SQL, RDS), serverless compute (Cloud Run, Lambda), and platform-specific tooling while avoiding vendor lock-in where possible.

┌─────────────────────────────────────────────────────────────┐
│           Cloud Platform Comparison                         │
├─────────────────────────────────────────────────────────────┤
│ Service          │   GCP          │   AWS         │  Azure │
├──────────────────┼────────────────┼───────────────┼────────┤
│ Containers       │ Cloud Run      │ App Runner    │ ACA    │
│ Kubernetes       │ GKE            │ EKS           │ AKS    │
│ Serverless       │ Cloud Functions│ Lambda        │ Funcs  │
│ Database         │ Cloud SQL      │ RDS           │ Azure  │
│                  │                │               │ SQL    │
│ Object Storage   │ GCS            │ S3            │ Blob   │
│ Message Queue    │ Pub/Sub        │ SQS/SNS       │ SB     │
└─────────────────────────────────────────────────────────────┘

Serverless Deployment

Serverless computing runs code without managing servers—ideal for event-driven workloads, APIs with variable traffic, and cost optimization (pay only for execution time), with cold start latency as the main tradeoff.

# Serverless Python function structure
import json

def handler(event, context):
    """AWS Lambda / GCP Cloud Functions handler"""
    # Parse incoming request
    data = event.get('body') or event

    # Business logic
    result = process_data(data)

    # Return response
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(result)
    }

Cloud Run

Cloud Run is GCP's fully managed container platform that scales to zero—deploy any containerized application with automatic HTTPS, autoscaling, and pay-per-request pricing without Kubernetes complexity.

# Dockerfile for Cloud Run
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Cloud Run sets PORT env var
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app
# Deploy to Cloud Run
gcloud run deploy myapp \
  --image gcr.io/project/myapp:v1 \
  --platform managed \
  --region us-central1 \
  --memory 512Mi \
  --min-instances 0 \
  --max-instances 100 \
  --allow-unauthenticated

Cloud Functions

Cloud Functions are GCP's FaaS offering for event-driven code—trigger from HTTP, Pub/Sub, Cloud Storage, Firestore, or schedules, with automatic scaling and sub-second billing.

# main.py - HTTP Cloud Function
import functions_framework
from flask import jsonify

@functions_framework.http
def process_request(request):
    """HTTP Cloud Function."""
    data = request.get_json(silent=True) or {}
    result = {
        'message': f"Hello, {data.get('name', 'World')}!",
        'processed': True
    }
    return jsonify(result)

# Pub/Sub triggered function
@functions_framework.cloud_event
def process_pubsub(cloud_event):
    """Background Cloud Function triggered by Pub/Sub."""
    import base64
    data = base64.b64decode(cloud_event.data["message"]["data"])
    print(f"Received: {data}")
gcloud functions deploy process_request \
  --runtime python311 \
  --trigger-http \
  --entry-point process_request

Lambda Functions

AWS Lambda executes Python code in response to events—supports API Gateway, S3, DynamoDB, SQS triggers with up to 15-minute execution time and integration with AWS services via IAM roles.

# lambda_function.py
import json
import boto3

def lambda_handler(event, context):
    """AWS Lambda handler"""
    # API Gateway event
    body = json.loads(event.get('body', '{}'))

    # Access AWS services
    s3 = boto3.client('s3')

    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        },
        'body': json.dumps({
            'message': 'Success',
            'input': body
        })
    }
# serverless.yml (Serverless Framework)
service: my-python-api

provider:
  name: aws
  runtime: python3.11
  region: us-east-1

functions:
  api:
    handler: handler.lambda_handler
    events:
      - httpApi:
          path: /process
          method: post

Kubernetes Manifests

Kubernetes manifests are YAML/JSON files declaring desired state—Deployments, Services, ConfigMaps, and Secrets define how your Python app runs, scales, and connects to other services.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app
  labels:
    app: python-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: python-app
  template:
    metadata:
      labels:
        app: python-app
    spec:
      containers:
        - name: app
          image: gcr.io/project/app:v1.0.0
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: python-app
spec:
  selector:
    app: python-app
  ports:
    - port: 80
      targetPort: 8000
  type: ClusterIP

Helm Charts

Helm is the Kubernetes package manager—charts template manifests with values, enabling reusable, versioned, and configurable deployments across environments.

# Chart.yaml
apiVersion: v2
name: python-app
version: 1.0.0
appVersion: "1.0.0"

# values.yaml
replicaCount: 3
image:
  repository: gcr.io/project/app
  tag: "v1.0.0"
  pullPolicy: IfNotPresent
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
# Deploy with Helm
helm install myapp ./python-app-chart -f values-prod.yaml
helm upgrade myapp ./python-app-chart --set image.tag=v1.1.0

Service Mesh (Istio)

Service mesh adds observability, security, and traffic management to microservices without code changes—Istio injects sidecar proxies for mTLS, circuit breaking, retries, and advanced routing.

# Istio traffic management
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: python-app
spec:
  hosts:
    - python-app
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: python-app
            subset: v2
    - route:
        - destination:
            host: python-app
            subset: v1
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 10s
┌──────────────────────────────────────────────────────────┐
│                    Istio Service Mesh                    │
├──────────────────────────────────────────────────────────┤
│  ┌─────────────────┐         ┌─────────────────┐         │
│  │   Pod           │         │   Pod           │         │
│  │ ┌─────┐ ┌─────┐│  mTLS   │ ┌─────┐ ┌─────┐ │         │
│  │ │ App │ │Envoy││◄───────▶│ │Envoy│ │ App │ │         │
│  │ └─────┘ └─────┘│         │ └─────┘ └─────┘ │         │
│  └─────────────────┘         └─────────────────┘         │
└──────────────────────────────────────────────────────────┘

Observability (Prometheus, Grafana)

Prometheus collects metrics via pull model, storing time-series data for alerting and analysis; Grafana visualizes metrics in customizable dashboards—together they form the standard open-source observability stack.

# Python app with Prometheus metrics
import time
from flask import Flask, Response, request
from prometheus_client import Counter, Histogram, generate_latest

app = Flask(__name__)

REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)
REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['method', 'endpoint']
)

@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype='text/plain')

@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    latency = time.time() - request.start_time
    REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
    REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
    return response

Distributed Tracing (Jaeger, Zipkin)

Distributed tracing tracks requests across microservices, revealing latency bottlenecks and failure points—each request gets a trace ID propagated through all service calls.

import requests
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Setup tracing
trace.set_tracer_provider(TracerProvider())
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)

# Auto-instrument HTTP clients
RequestsInstrumentor().instrument()

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", order_id)
    # Trace automatically propagated to downstream services
    response = requests.get(f"http://inventory-service/check/{order_id}")
Trace View:
├─ API Gateway (5ms)
│  └─ Order Service (45ms)
│     ├─ Inventory Service (15ms)
│     ├─ Payment Service (25ms) ◄── Bottleneck identified
│     └─ Notification Service (3ms)

Security Scanning

Security scanning identifies vulnerabilities in dependencies, container images, and code—integrate into CI/CD to catch issues before deployment using tools like Snyk, Trivy, or Bandit.

# Dependency scanning
pip install safety
safety check -r requirements.txt

# Python code security analysis
pip install bandit
bandit -r src/ -f json -o bandit-report.json

# Container image scanning
trivy image myapp:latest --severity HIGH,CRITICAL
# GitHub Actions security scanning
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload Trivy scan results
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: 'trivy-results.sarif'

Vulnerability Assessment

Vulnerability assessment systematically identifies security weaknesses across infrastructure, applications, and configurations—regular scans, penetration testing, and CVE monitoring protect against known threats.

# Check for known vulnerabilities in requirements
import subprocess
import json

def scan_dependencies():
    result = subprocess.run(
        ['pip-audit', '--format', 'json'],
        capture_output=True, text=True
    )
    report = json.loads(result.stdout)
    # NOTE: the JSON layout varies across pip-audit versions; recent
    # releases nest findings under each dependency's "vulns" list
    vulnerable = [d for d in report.get('dependencies', []) if d.get('vulns')]
    if vulnerable:
        raise SystemExit(f"Found {len(vulnerable)} vulnerable packages!")
    return report

# Run in CI pipeline
# pip install pip-audit
# pip-audit --strict --vulnerability-service osv

Compliance and Auditing

Compliance ensures adherence to security standards (SOC2, HIPAA, PCI-DSS, GDPR)—implement audit logging, access controls, data encryption, and regular assessments with automated policy checks.

# Audit logging for compliance
import json
import logging
from datetime import datetime
from flask import request

class AuditLogger:
    def __init__(self):
        self.logger = logging.getLogger('audit')
        handler = logging.FileHandler('/var/log/audit.json')
        self.logger.addHandler(handler)

    def log_event(self, action, user, resource, status, details=None):
        event = {
            'timestamp': datetime.utcnow().isoformat(),
            'action': action,
            'user': user,
            'resource': resource,
            'status': status,
            'ip_address': request.remote_addr,
            'details': details
        }
        self.logger.info(json.dumps(event))

# Usage
audit = AuditLogger()
audit.log_event(
    action='DATA_ACCESS',
    user='user@example.com',
    resource='customer_records',
    status='SUCCESS',
    details={'record_count': 150}
)

Disaster Recovery

Disaster recovery (DR) ensures business continuity after catastrophic failures—define RTO (Recovery Time Objective) and RPO (Recovery Point Objective), then implement backup/restore, replication, and failover procedures.

┌──────────────────────────────────────────────────────────────┐
│                  Disaster Recovery Tiers                     │
├──────────────────────────────────────────────────────────────┤
│  Tier │ Strategy          │ RTO      │ RPO      │ Cost      │
├───────┼───────────────────┼──────────┼──────────┼───────────┤
│   1   │ Backup & Restore  │ Hours    │ Hours    │ Low       │
│   2   │ Pilot Light       │ Minutes  │ Minutes  │ Medium    │
│   3   │ Warm Standby      │ Minutes  │ Seconds  │ High      │
│   4   │ Active-Active     │ Zero     │ Zero     │ Very High │
└──────────────────────────────────────────────────────────────┘

DR Workflow:
┌─────────┐    Replicate    ┌─────────┐
│ Primary │───────────────▶ │ Standby │
│  (GCP)  │                 │  (AWS)  │
└────┬────┘                 └────┬────┘
     │         Failover          │
     └───────────────────────────┘
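
Failover is frequently just a DNS change. The sketch below uses boto3 and Route 53—the zone ID and hostnames are placeholders—to probe the primary's health endpoint and repoint the application record at the standby; managed DNS health checks can automate the same cutover:

# Probe the primary and fail over DNS to the standby (sketch)
import boto3
import requests

def primary_healthy() -> bool:
    try:
        return requests.get("https://primary.example.com/health", timeout=3).ok
    except requests.RequestException:
        return False

def failover_to_standby():
    route53 = boto3.client('route53')
    route53.change_resource_record_sets(
        HostedZoneId='Z123EXAMPLE',      # Placeholder zone ID
        ChangeBatch={'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'app.example.com',
                'Type': 'CNAME',
                'TTL': 60,               # Low TTL keeps failover fast
                'ResourceRecords': [{'Value': 'standby.example.com'}],
            },
        }]}
    )

if not primary_healthy():
    failover_to_standby()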

High Availability Architecture

High availability (HA) eliminates single points of failure through redundancy at every layer—multiple instances, load balancing, database replication, and multi-AZ deployments achieve 99.9%+ uptime.

┌──────────────────────────────────────────────────────────────┐
│                 High Availability Architecture               │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│         ┌──────────────────────────────┐                    │
│         │      Global Load Balancer    │                    │
│         └──────────────┬───────────────┘                    │
│                        │                                     │
│         ┌──────────────┴───────────────┐                    │
│         ▼                              ▼                     │
│  ┌─────────────┐                ┌─────────────┐             │
│  │   Zone A    │                │   Zone B    │             │
│  │  ┌───────┐  │                │  ┌───────┐  │             │
│  │  │App x3 │  │                │  │App x3 │  │             │
│  │  └───────┘  │                │  └───────┘  │             │
│  │  ┌───────┐  │   Replicate    │  ┌───────┐  │             │
│  │  │Primary│──┼────────────────┼─▶│Replica│  │             │
│  │  │  DB   │  │                │  │  DB   │  │             │
│  │  └───────┘  │                │  └───────┘  │             │
│  └─────────────┘                └─────────────┘             │
└──────────────────────────────────────────────────────────────┘
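
Redundancy extends to clients of your own services as well. A small sketch of zone-aware failover in Python (the endpoint URLs are placeholders): try the preferred zone first and fall back on error:

# Client-side failover across zones (sketch)
import requests

# Hypothetical zone endpoints, in order of preference
ENDPOINTS = [
    "https://zone-a.example.com",
    "https://zone-b.example.com",
]

def get_with_failover(path: str, timeout: float = 2.0) -> requests.Response:
    last_error = None
    for base in ENDPOINTS:
        try:
            response = requests.get(base + path, timeout=timeout)
            if response.status_code < 500:
                return response        # Healthy zone answered
        except requests.RequestException as exc:
            last_error = exc           # Zone unreachable; try the next one
    raise RuntimeError(f"All zones failed: {last_error}")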

Multi-Region Deployment

Multi-region deployment distributes applications across geographic regions for lower latency, regulatory compliance, and disaster resilience—requires data synchronization strategies and careful consistency trade-offs.

# Terraform multi-region setup
variable "regions" {
  default = ["us-central1", "europe-west1", "asia-east1"]
}

resource "google_cloud_run_service" "app" {
  for_each = toset(var.regions)
  name     = "python-app"
  location = each.value

  template {
    spec {
      containers {
        image = "gcr.io/project/app:v1"
      }
    }
  }
}

resource "google_compute_global_address" "default" {
  name = "global-app-ip"
}

# Global load balancer routes to nearest region
                    ┌─────────────┐
                    │   Global    │
                    │     LB      │
                    └──────┬──────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ US-WEST  │    │  EU-WEST │    │ASIA-EAST │
    │  Region  │    │  Region  │    │  Region  │
    └──────────┘    └──────────┘    └──────────┘

CDN Strategies

CDNs cache static content at edge locations worldwide, reducing latency and origin server load—configure cache headers, purge strategies, and edge functions for dynamic content optimization.

# Flask response with CDN-friendly caching headers
from flask import Flask, send_from_directory, make_response

app = Flask(__name__)

@app.route('/static/<path:filename>')
def serve_static(filename):
    response = make_response(send_from_directory('static', filename))
    # Cache at CDN for 1 year (immutable assets)
    response.headers['Cache-Control'] = 'public, max-age=31536000, immutable'
    response.headers['CDN-Cache-Control'] = 'max-age=31536000'
    return response

@app.route('/api/data')
def api_data():
    response = make_response(get_data())
    # Short cache for dynamic content
    response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300'
    response.headers['Vary'] = 'Accept-Encoding, Authorization'
    return response
User Request → CDN Edge (Cache HIT) → Response
                    ↓ (Cache MISS)
               Origin Server

DDoS Protection

DDoS protection defends against volumetric, protocol, and application-layer attacks—use cloud provider protection (Cloud Armor, AWS Shield), rate limiting, and geographic filtering to maintain availability under attack.

# GCP Cloud Armor security policy
resource "google_compute_security_policy" "policy" {
  name = "ddos-protection"

  # Rate limiting
  rule {
    action   = "rate_based_ban"
    priority = "1000"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    rate_limit_options {
      conform_action = "allow"
      exceed_action  = "deny(429)"
      rate_limit_threshold {
        count        = 1000
        interval_sec = 60
      }
      ban_duration_sec = 600
    }
  }

  # Block known bad IPs
  rule {
    action   = "deny(403)"
    priority = "100"
    match {
      expr {
        expression = "evaluateThreatIntelligence('iplist-known-malicious-ips')"
      }
    }
  }
}
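
Edge protection pairs well with application-level rate limiting as defense in depth. Below is a deliberately naive in-memory sliding-window sketch—per-process only, so real deployments typically back the counters with Redis to share limits across instances:

# Naive per-IP sliding-window rate limiter (sketch; in-memory, per process)
import time
from collections import defaultdict, deque
from flask import Flask, request, abort

app = Flask(__name__)
WINDOW_SECONDS = 60
MAX_REQUESTS = 100
_hits = defaultdict(deque)

@app.before_request
def rate_limit():
    now = time.time()
    q = _hits[request.remote_addr]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()                 # Drop hits outside the window
    if len(q) >= MAX_REQUESTS:
        abort(429)                  # Too Many Requests
    q.append(now)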

Web Application Firewall

WAF inspects HTTP traffic to block SQL injection, XSS, and other OWASP Top 10 attacks—deploy at the edge with managed rule sets and custom rules for application-specific protection.

# AWS WAF rules (CloudFormation)
Resources:
  WebACL:
    Type: AWS::WAFv2::WebACL
    Properties:
      DefaultAction:
        Allow: {}
      Rules:
        - Name: AWSManagedRulesCommonRuleSet
          Priority: 1
          OverrideAction:
            None: {}
          Statement:
            ManagedRuleGroupStatement:
              VendorName: AWS
              Name: AWSManagedRulesCommonRuleSet
          VisibilityConfig:
            SampledRequestsEnabled: true
            CloudWatchMetricsEnabled: true
            MetricName: CommonRules
        - Name: SQLiRule
          Priority: 2
          Action:
            Block: {}
          Statement:
            SqliMatchStatement:
              FieldToMatch:
                Body: {}
              TextTransformations:
                - Priority: 0
                  Type: URL_DECODE
Request Flow:
┌────────┐    ┌─────┐    ┌─────────┐    ┌─────────────┐
│ Client │───▶│ CDN │───▶│   WAF   │───▶│ Application │
└────────┘    └─────┘    │ (Block  │    └─────────────┘
                         │  SQLi,  │
                         │  XSS)   │
                         └─────────┘