System Design: The Complete Developer's Guide (Beginner to Intermediate) | Mehran Khanjan

Fundamentals

Client-Server Architecture

A computing model where clients (browsers, mobile apps) request resources/services from centralized servers that process requests and return responses. The client handles UI/presentation while the server manages data, business logic, and security.

┌──────────┐         Request          ┌──────────┐
│  CLIENT  │ ───────────────────────► │  SERVER  │
│ (Browser)│ ◄─────────────────────── │  (API)   │
└──────────┘         Response         └──────────┘

Request-Response Model

The fundamental communication pattern where a client sends a request (with method, headers, body) and waits synchronously for the server's response (with status code, headers, body). Every HTTP interaction follows this pattern.

# Simple request-response example
import requests
response = requests.get("https://api.example.com/users/1")  # Request
print(response.status_code)  # 200 - Response
print(response.json())       # {"id": 1, "name": "John"}

Network Protocols (HTTP/HTTPS, TCP/IP, UDP)

Rules governing data transmission: TCP (running over IP) provides reliable, ordered delivery and sets up each connection with a handshake; UDP offers fast, connectionless transmission without delivery guarantees (good for streaming/gaming); HTTP/HTTPS is the application-layer protocol for web communication, with HTTPS adding TLS encryption.

┌─────────────────────────────────────┐
│  Application    (HTTP/HTTPS)        │
├─────────────────────────────────────┤
│  Transport      (TCP / UDP)         │
├─────────────────────────────────────┤
│  Network        (IP)                │
├─────────────────────────────────────┤
│  Link           (Ethernet/WiFi)     │
└─────────────────────────────────────┘

DNS (Domain Name System)

The internet's phonebook that translates human-readable domain names (google.com) into IP addresses (142.250.80.46) through a hierarchical system of resolvers, root servers, TLD servers, and authoritative nameservers.

User types "google.com"
        │
        ▼
┌──────────────┐    ┌──────────────┐    ┌─────────────────┐
│ DNS Resolver │───►│ Root Server  │───►│ .com TLD Server │
└──────────────┘    └──────────────┘    └─────────────────┘
        │                                        │
        │         ┌────────────────────┐         │
        │◄────────│ Authoritative DNS  │◄────────┘
        │         │   (142.250.80.46)  │
        ▼         └────────────────────┘
  Returns IP to browser

IP Addressing & Subnetting

IP addresses uniquely identify devices on networks; IPv4 uses 32-bit addresses (192.168.1.1), IPv6 uses 128-bit. Subnetting divides networks into smaller segments using subnet masks to improve security, performance, and address management.

IP Address:    192.168.1.100
Subnet Mask:   255.255.255.0  (/24)
               └─────┬─────┘└──┬──┘
                 Network     Host
                 Portion    Portion

Network:       192.168.1.0
Usable Range:  192.168.1.1 - 192.168.1.254
Broadcast:     192.168.1.255
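
The values above can be checked with Python's standard `ipaddress` module. This is a small sketch; the variable names are illustrative only.

```python
import ipaddress

# strict=False lets us pass a host address and get its enclosing network.
network = ipaddress.ip_network("192.168.1.100/24", strict=False)

# hosts() excludes the network and broadcast addresses.
hosts = list(network.hosts())

print("Network:  ", network.network_address)    # 192.168.1.0
print("Netmask:  ", network.netmask)            # 255.255.255.0
print("Usable:   ", hosts[0], "-", hosts[-1])   # 192.168.1.1 - 192.168.1.254
print("Broadcast:", network.broadcast_address)  # 192.168.1.255
```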

Latency vs Throughput

Latency is the time for a single request to travel from source to destination (measured in ms); Throughput is the amount of data transferred per unit time (measured in req/sec or Mbps). Low latency doesn't guarantee high throughput—optimize based on your use case.

Latency (Time for ONE request):
├────────────────────────────────►│  100ms

Throughput (Requests per second):
│ ──► ──► ──► ──► ──► ──► ──► ──► │  1000 req/sec

Highway analogy:
- Latency = How fast ONE car travels
- Throughput = How many cars pass per hour

Availability vs Consistency

From the CAP theorem: Availability means every request to a working node receives a response, even if that response may not reflect the latest write; Consistency means every read returns the most recent write. During a network partition, a distributed system has to give up one of the two: banks typically prioritize consistency, social media typically prioritizes availability.

Consistency Priority (CP):              Availability Priority (AP):
┌────────┐      ┌────────┐             ┌────────┐      ┌────────┐
│Node A  │──X───│Node B  │             │Node A  │──X───│Node B  │
│Data: 5 │      │Data: ? │             │Data: 5 │      │Data: 3 │
└────────┘      └────────┘             └────────┘      └────────┘
  "Wait until sync"                      "Serve stale data"
  (May reject requests)                  (Always respond)

Vertical vs Horizontal Scaling

Vertical scaling (scale up) adds more power to existing machines (CPU, RAM, SSD); Horizontal scaling (scale out) adds more machines to the pool. Vertical is simpler but has hardware limits; horizontal can keep growing by adding machines but adds complexity (load balancing, data consistency).

Vertical Scaling:                    Horizontal Scaling:
┌─────────────────┐                 ┌───────┐ ┌───────┐ ┌───────┐
│                 │                 │Server │ │Server │ │Server │
│  BIGGER SERVER  │                 │   1   │ │   2   │ │   3   │
│  32GB → 128GB   │                 └───────┘ └───────┘ └───────┘
│  4CPU → 16CPU   │                      │        │        │
│                 │                      └────────┼────────┘
└─────────────────┘                          Load Balancer

Monolithic Architecture

A single, unified codebase where all components (UI, business logic, data access) are tightly coupled and deployed together as one unit. Simple to develop and deploy initially, but becomes difficult to scale, maintain, and update as the application grows.

┌──────────────────────────────────────────┐
│            MONOLITHIC APP                │
│  ┌──────────┬──────────┬──────────────┐  │
│  │    UI    │ Business │   Data       │  │
│  │   Layer  │  Logic   │   Access     │  │
│  └──────────┴──────────┴──────────────┘  │
│         Single Deployable Unit           │
└──────────────────────────────────────────┘
                    │
              ┌─────┴─────┐
              │ Database  │
              └───────────┘

API Design (REST, SOAP)

REST (Representational State Transfer) applies HTTP methods to resources, with stateless requests and cacheable responses, usually in JSON; it is simple and widely adopted. SOAP (Simple Object Access Protocol) uses XML messages with strict contracts (WSDL), built-in security (WS-Security), and optional transaction support (WS-AtomicTransaction); it remains common in enterprise/financial systems.

# REST API Example
GET    /users          # List users
GET    /users/123      # Get user 123
POST   /users          # Create user
PUT    /users/123      # Update user 123
DELETE /users/123      # Delete user 123

# Response (JSON)
{
    "id": 123,
    "name": "John",
    "email": "john@example.com"
}

Data Formats (JSON, XML, Protocol Buffers)

JSON is human-readable, widely supported, good for web APIs. XML is verbose but supports schemas and namespaces, used in enterprise. Protocol Buffers (protobuf) is Google's binary format—smaller, faster, strongly typed, ideal for microservices and high-performance systems.

JSON (Human-readable):          Protobuf (Binary, compact):
{                               message User {
  "id": 123,                      int32 id = 1;
  "name": "John",                 string name = 2;
  "active": true                  bool active = 3;
}                               }
Size: ~45 bytes                 Encoded size: ~10 bytes, typically faster to parse
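
To see where the size difference comes from, the sketch below hand-encodes the `User` message in the protobuf wire format and compares it with compact JSON. It is a simplified illustration, not the official protobuf library: the helper names are made up, it handles only non-negative integers, and it always writes every field, whereas proto3 omits default values.

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    """Field key: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)

def encode_user(user_id: int, name: str, active: bool) -> bytes:
    name_bytes = name.encode("utf-8")
    return (
        tag(1, 0) + encode_varint(user_id)                          # int32 id = 1
        + tag(2, 2) + encode_varint(len(name_bytes)) + name_bytes   # string name = 2
        + tag(3, 0) + encode_varint(int(active))                    # bool active = 3
    )

user = {"id": 123, "name": "John", "active": True}
json_bytes = json.dumps(user, separators=(",", ":")).encode("utf-8")
proto_bytes = encode_user(123, "John", True)

print(len(json_bytes), len(proto_bytes))  # 38 10
print(proto_bytes.hex())                  # 087b12044a6f686e1801
```

The binary form carries no field names, only small numeric tags, which is why both sides need the shared schema to decode it.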

Basic Components

Web Servers

Software that handles HTTP requests, serves static content (HTML, CSS, images), and forwards dynamic requests to application servers. Examples: Nginx (high performance, reverse proxy), Apache (feature-rich, .htaccess), Caddy (automatic HTTPS).

             ┌─────────────────────────────────────┐
             │           WEB SERVER                │
             │            (Nginx)                  │
HTTP ───────►│  Static files ───► /var/www/html    │
Request      │  Dynamic  ───────► App Server       │
             └─────────────────────────────────────┘

# Nginx config example
server {
    listen 80;
    location / {
        root /var/www/html;
    }
    location /api {
        proxy_pass http://app_server:8080;
    }
}

Application Servers

Executes business logic, processes dynamic content, handles database operations, and runs application code (Java, Python, Node.js). Unlike web servers, they understand and execute programming languages to generate responses dynamically.

┌────────────┐      ┌──────────────────┐      ┌────────────┐
│ Web Server │─────►│  Application     │─────►│  Database  │
│  (Nginx)   │      │  Server          │      │ (Postgres) │
└────────────┘      │  ┌────────────┐  │      └────────────┘
                    │  │ Your Code  │  │
                    │  │ - Routes   │  │
                    │  │ - Logic    │  │
                    │  │ - Auth     │  │
                    │  └────────────┘  │
                    └──────────────────┘
Examples: Gunicorn (Python), Tomcat (Java), Node.js processes managed by PM2

Database Servers

Dedicated systems that store, retrieve, and manage data with features like ACID transactions, indexing, replication, and query optimization. Types include relational (PostgreSQL, MySQL) for structured data and NoSQL (MongoDB, Redis) for flexible schemas or specific access patterns.

┌─────────────────────────────────────────────┐
│              DATABASE SERVER                │
│  ┌─────────┐  ┌─────────┐  ┌─────────────┐  │
│  │ Query   │  │ Storage │  │ Transaction │  │
│  │ Engine  │  │ Engine  │  │ Manager     │  │
│  └─────────┘  └─────────┘  └─────────────┘  │
│           ┌────────────────┐                │
│           │   Data Files   │                │
│           └────────────────┘                │
└─────────────────────────────────────────────┘

Relational: Tables, SQL, ACID, Joins
NoSQL: Documents, Key-Value, Flexible Schema

Reverse Proxy

Sits in front of servers, receiving client requests and forwarding them to appropriate backend servers. Provides load balancing, SSL termination, caching, security (hiding server IPs), and compression—clients don't know which actual server handles their request.

                              ┌─────────────┐
                         ┌───►│  Server 1   │
┌────────┐  ┌─────────┐  │    └─────────────┘
│ Client │─►│ REVERSE │──┤    ┌─────────────┐
│        │  │  PROXY  │──┼───►│  Server 2   │
└────────┘  └─────────┘  │    └─────────────┘
                         │    ┌─────────────┐
                         └───►│  Server 3   │
                              └─────────────┘

Client sees: proxy.example.com
Doesn't know about: server1, server2, server3

Forward Proxy

Sits in front of clients, forwarding their requests to the internet on their behalf. Used for anonymity, access control, caching, and bypassing geo-restrictions—servers don't know the real client IP, only the proxy's IP.

┌──────────┐      ┌─────────────┐      ┌──────────────┐
│ Client A │──┐   │   FORWARD   │      │   Internet   │
└──────────┘  │   │    PROXY    │─────►│   Servers    │
┌──────────┐  ├──►│             │      │              │
│ Client B │──┤   │ - Filtering │      │ Sees only    │
└──────────┘  │   │ - Caching   │      │ proxy IP     │
┌──────────┐  │   │ - Logging   │      └──────────────┘
│ Client C │──┘   └─────────────┘
└──────────┘
              Corporate/School Network

Load Balancers (basics)

Distributes incoming traffic across multiple servers to ensure no single server is overwhelmed. Algorithms include Round Robin (sequential), Least Connections (to least busy), and IP Hash (sticky sessions). Improves availability, reliability, and performance.

                    ┌─────────────────┐
       Requests     │  LOAD BALANCER  │
─────────────────►  │                 │
    1000 req/s      │  Round Robin:   │
                    │  1→A, 2→B, 3→C  │
                    └────────┬────────┘
                 ┌───────────┼───────────┐
                 ▼           ▼           ▼
            ┌────────┐  ┌────────┐  ┌────────┐
            │Server A│  │Server B│  │Server C│
            │~333/s  │  │~333/s  │  │~333/s  │
            └────────┘  └────────┘  └────────┘

Caching (basics)

Stores copies of frequently accessed data in faster storage layers to reduce latency and database load. Types include browser cache, CDN cache, application cache (Redis), and database cache. Key challenge: cache invalidation (when to update stale data).

┌────────┐     ┌───────────┐     ┌──────────┐
│ Client │────►│   CACHE   │────►│ Database │
└────────┘     │  (Redis)  │     └──────────┘
               └───────────┘
                    │
    Cache HIT: 1ms  │  Cache MISS: 100ms
    (from memory)   │  (from database)

# Pseudocode
def get_user(id):
    if cache.exists(id):      # O(1) lookup
        return cache.get(id)  # Cache HIT
    user = db.query(id)       # Cache MISS
    cache.set(id, user, ttl=3600)
    return user

CDN (Content Delivery Network)

A geographically distributed network of edge servers that cache and serve content from locations closest to users. Reduces latency, handles traffic spikes, provides DDoS protection, and offloads origin servers. Essential for global applications serving static assets.

Without CDN:                      With CDN:
User (Tokyo)                      User (Tokyo)
     │                                 │
     │  250ms                          │  20ms
     ▼                                 ▼
Origin (New York)                 Edge (Tokyo)
                                       │ Cache MISS only
                                       ▼
                                  Origin (New York)

┌─────────────────────────────────────────────────┐
│                    CDN Network                  │
│  ┌──────┐   ┌──────┐   ┌──────┐   ┌──────┐      │
│  │Tokyo │   │London│   │Sydney│   │ NYC  │      │
│  │ Edge │   │ Edge │   │ Edge │   │ Edge │      │
│  └──────┘   └──────┘   └──────┘   └──────┘      │
└─────────────────────────────────────────────────┘
Providers: Cloudflare, AWS CloudFront, Akamai

Storage Basics

SQL Databases

Relational databases that store data in structured tables with predefined schemas, using SQL (Structured Query Language) for querying and maintaining data relationships through joins. Examples include PostgreSQL, MySQL, and Oracle.

+----------+       +-----------+
|  Users   |       |  Orders   |
+----------+       +-----------+
| id (PK)  |<------| user_id   |
| name     |       | id (PK)   |
| email    |       | total     |
+----------+       +-----------+

NoSQL Databases

Non-relational databases designed for flexible schemas, horizontal scaling, and specific data models (document, key-value, column-family, graph), often relaxing ACID guarantees in exchange for performance and scalability.

// Document Store (MongoDB example)
{
  "_id": "user123",
  "name": "John",
  "orders": [
    {"id": 1, "total": 99.99},
    {"id": 2, "total": 45.00}
  ]
}

ACID Properties

Four guarantees ensuring reliable database transactions: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data persists).

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- Both succeed or both fail (Atomicity)
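
Atomicity can be observed with Python's built-in `sqlite3` module. The sketch below is an illustration under assumptions: the in-memory database, the starting balances, and the `CHECK (balance >= 0)` constraint are added only to force the second update to fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )

try:
    transfer(conn, src=1, dst=2, amount=100)  # account 1 only holds 50
except sqlite3.IntegrityError:
    pass  # the debit violated the CHECK, so the credit was rolled back too

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 50, 2: 0}
```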

Database Indexing

Data structures (typically B-trees or hash indexes) that speed up data retrieval by creating pointers to rows, trading write performance and storage space for dramatically faster read queries.

Without Index: Full table scan O(n)
┌─────────────────────────┐
│ Scan all 1M rows...     │
└─────────────────────────┘

With Index: B-tree lookup O(log n)
        [M]
       /   \
    [D]     [T]
   /  \    /   \
 [A] [G] [P]  [Z]  → Direct pointer to row
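
The effect of an index can be observed with SQLite's `EXPLAIN QUERY PLAN`, run here through Python's `sqlite3` module. This is a sketch: the table, the index name `idx_users_email`, and the sample rows are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user500@example.com",)

plan_before = query_plan(conn, lookup, args)  # expected: a full scan of users
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)   # expected: a search using the index

print(plan_before)
print(plan_after)
```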

Primary Keys & Foreign Keys

Primary keys uniquely identify each row in a table (ensuring entity integrity), while foreign keys reference primary keys in other tables to establish and enforce relationships (referential integrity).

CREATE TABLE users (
    id INT PRIMARY KEY,           -- Unique identifier
    name VARCHAR(100)
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT,
    FOREIGN KEY (user_id) REFERENCES users(id)  -- Links to users
);

Normalization

Process of organizing database tables to reduce redundancy and dependency by dividing data into smaller related tables, typically following normal forms (1NF, 2NF, 3NF, BCNF).

BEFORE (Denormalized):                AFTER (3NF):
+---------------------------+         +--------+  +----------+
| order_id|cust_name|city   |         | Orders |  | Customers|
+---------+---------+-------+   →     +--------+  +----------+
| 1       | John    | NYC   |         | id     |  | id       |
| 2       | John    | NYC   |         | cust_id|→ | name     |
+---------------------------+         +--------+  | city     |
                                                  +----------+

Denormalization

Intentionally adding redundancy back into normalized databases to improve read performance by reducing joins, commonly used in read-heavy systems and data warehouses at the cost of data consistency complexity.

Normalized (requires JOIN):           Denormalized (single read):
SELECT o.*, c.name, c.city            +---------------------------+
FROM orders o                         | order_id|cust_name|city   |
JOIN customers c                      +---------+---------+-------+
ON o.cust_id = c.id;                  | 1       | John    | NYC   |
                                      +---------------------------+

Scalability & Performance

Horizontal Scaling Strategies

Adding more machines to distribute load rather than upgrading single machine resources (vertical scaling), requiring stateless design, data partitioning, and load balancing to work effectively.

Vertical Scaling:          Horizontal Scaling:
    ┌─────┐                ┌─────┐ ┌─────┐ ┌─────┐
    │ BIG │                │small│ │small│ │small│
    │ BOX │       vs       └─────┘ └─────┘ └─────┘
    └─────┘                      ↑
   (limited)               Load Balancer
                          (unlimited potential)

Database Replication (Master-Slave, Master-Master)

Copying data across multiple database servers for fault tolerance and read scaling; master-slave routes writes to one node and reads to replicas, while master-master allows writes on multiple nodes with conflict resolution.

Master-Slave:                    Master-Master:
   ┌────────┐                   ┌────────┐←──→┌────────┐
   │ Master │ (writes)          │Master 1│    │Master 2│
   └───┬────┘                   └────────┘    └────────┘
       │ replication               (both accept writes,
   ┌───┴───┐                        conflict resolution needed)
┌──┴──┐ ┌──┴──┐
│Slave│ │Slave│ (reads)
└─────┘ └─────┘

Database Partitioning

Dividing a database into smaller pieces within the same instance, either horizontally (splitting rows across partitions by range/list/hash) or vertically (splitting columns into separate tables).

Horizontal Partitioning:              Vertical Partitioning:
┌─────────────────────┐              ┌──────────┐ ┌───────────┐
│ users_2020 (id 1-1M)│              │ users    │ │user_photos│
├─────────────────────┤              │----------│ │-----------│
│ users_2021 (1M-2M)  │              │ id, name │ │ id, blob  │
├─────────────────────┤              └──────────┘ └───────────┘
│ users_2022 (2M-3M)  │               (frequently   (rarely
└─────────────────────┘                accessed)    accessed)

Database Sharding

Distributing data across multiple independent database instances (shards) based on a shard key, enabling horizontal scaling but adding complexity for cross-shard queries and rebalancing.

Shard Key: user_id % 3

        ┌─────────────────┐
        │  Application    │
        └────────┬────────┘
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
┌───────┐   ┌───────┐   ┌───────┐
│Shard 0│   │Shard 1│   │Shard 2│
│id%3=0 │   │id%3=1 │   │id%3=2 │
└───────┘   └───────┘   └───────┘
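
The routing step in the diagram can be sketched in a few lines. The dictionaries below are hypothetical in-memory stand-ins for three independent database instances, and the function names are illustrative.

```python
NUM_SHARDS = 3

def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using modulo on the shard key."""
    return user_id % num_shards

# One dict per shard stands in for a separate database instance.
shards = {index: {} for index in range(NUM_SHARDS)}

def save_user(user_id: int, record: dict) -> None:
    shards[shard_for(user_id)][user_id] = record

def load_user(user_id: int):
    return shards[shard_for(user_id)].get(user_id)

for uid in range(1, 7):
    save_user(uid, {"id": uid})

print({index: sorted(data) for index, data in shards.items()})
# {0: [3, 6], 1: [1, 4], 2: [2, 5]}
```

With plain modulo, changing the number of shards changes the result for most keys, which is one reason rebalancing is costly.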

Consistent Hashing

A distributed hashing scheme that minimizes key remapping when nodes are added/removed by arranging nodes on a virtual ring, essential for distributed caches and databases to avoid massive reshuffling.

         Node A
           ●
        ╱     ╲
      ╱    ●k1  ╲         Adding Node D:
    ╱             ╲       Only keys between C and D
 Node D●           ●Node B    move to D (minimal disruption)
    ╲             ╱
      ╲   ●k2   ╱
        ╲     ╱
           ●
         Node C
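
A minimal ring can be built from a sorted list and a stable hash. The sketch below is illustrative only: the `HashRing` class and node names are made up, MD5 is used purely to spread values around the ring, and production rings usually add virtual nodes to even out the load.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    """Stable position on the ring (MD5 for distribution, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions
        self._nodes = {}      # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def lookup(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._nodes[self._positions[index % len(self._positions)]]

keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.lookup(key) for key in keys}

ring.add("node-d")
after = {key: ring.lookup(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[key] == "node-d" for key in moved))
```

Every key that changes owner lands on the new node; keys owned by the other nodes stay where they were.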

Connection Pooling

Maintaining a cache of reusable database connections to avoid the overhead of establishing new connections for each request, dramatically improving performance in high-throughput applications.

# Without pooling: new connection per request (slow)
conn = db.connect()  # ~50ms overhead each time

# With pooling: reuse from pool (fast)
┌─────────────────────────────────┐
│  Connection Pool (size=10)      │
│  [conn][conn][conn]...[conn]    │
│    ↑      ↑                     │
│  Request1 Request2              │
└─────────────────────────────────┘
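
A pool can be sketched with a thread-safe queue of pre-opened connections. This is a simplified illustration: the `ConnectionPool` class is hypothetical, an in-memory SQLite database stands in for a real database server, and real pools also handle broken connections and health checks.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool that hands out pre-opened connections."""

    def __init__(self, factory, size=10):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection cost once, up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return to the pool instead of closing

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)

seen = set()
for _ in range(5):
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
        seen.add(id(conn))

print(len(seen))  # 2: five requests were served by two connections
```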

Stateless vs Stateful Services

Stateless services don't store client session data between requests (enabling easy horizontal scaling), while stateful services maintain state (requiring sticky sessions or shared state storage).

Stateless:                      Stateful:
┌────────┐                     ┌────────┐
│Request │→ Any Server OK      │Request │→ Must go to 
└────────┘                     └────────┘   same server
     ↓                              ↓
┌────┐┌────┐┌────┐             ┌────────────┐
│ S1 ││ S2 ││ S3 │             │ S1 [state] │
└────┘└────┘└────┘             └────────────┘

Session Management

Handling user session state across requests using server-side storage (memory, Redis, database), client-side tokens (JWT), or hybrid approaches, with stateless tokens preferred for scalability.

Server-Side Sessions:           Token-Based (JWT):
┌────────┐  session_id         ┌────────┐  JWT token
│ Client │────────────→        │ Client │──────────────→
└────────┘                     └────────┘
              ↓                              ↓
        ┌──────────┐                  (self-contained,
        │  Redis   │                   verify signature
        │ Sessions │                   no storage lookup)
        └──────────┘
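
The idea of a self-contained, signed token can be sketched with the standard library. This is not a real JWT (there is no header, expiry claim, or algorithm field, and a maintained JWT library is the usual choice); the `SECRET` value and function names are assumptions made for the illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: in practice, load from secure configuration

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest())

def issue_token(claims: dict) -> str:
    payload = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"

def verify_token(token: str):
    """Return the claims if the signature is valid, otherwise None."""
    try:
        payload, signature = token.split(".")
        if not hmac.compare_digest(signature, _sign(payload)):
            return None
        return json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None

token = issue_token({"user_id": 123})
tampered = token[:-1] + ("A" if token[-1] != "A" else "B")

print(verify_token(token))     # {'user_id': 123}
print(verify_token(tampered))  # None
```

Any server holding the secret can verify the token without a session lookup, which is what makes this approach easy to scale horizontally.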

Read-Through Cache

Cache sits between application and database; on cache miss, the cache itself fetches data from database, populates itself, and returns to client—simplifying application code.

┌─────┐   1.GET    ┌───────┐   2.MISS   ┌────┐
│ App │──────────→ │ Cache │──────────→ │ DB │
└─────┘            └───────┘            └────┘
   ↑                   │ 3.Store + Return
   └───────────────────┘
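
In a read-through setup the loading logic lives in the cache rather than in the caller. A minimal sketch, assuming a hypothetical `ReadThroughCache` class and a dictionary standing in for the database:

```python
class ReadThroughCache:
    """Cache that loads missing keys itself via a loader callback."""

    def __init__(self, loader):
        self._loader = loader  # called on a miss, e.g. a database query
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # miss: cache fetches and stores
        return self._store[key]

# Stand-in for a database table, with a record of every lookup made.
database = {1: {"id": 1, "name": "John"}}
db_calls = []

def load_user(user_id):
    db_calls.append(user_id)
    return database[user_id]

users = ReadThroughCache(load_user)
users.get(1)  # miss: the loader runs
users.get(1)  # hit: served from the cache

print(len(db_calls))  # 1
```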

Write-Through Cache

Every write operation updates both cache and database synchronously before confirming success, ensuring consistency but adding write latency.

┌─────┐  1.WRITE   ┌───────┐  2.WRITE   ┌────┐
│ App │──────────→ │ Cache │──────────→ │ DB │
└─────┘            └───────┘            └────┘
   ↑                                      │
   └────────── 3.ACK (after both) ────────┘

Write-Behind Cache

Writes update cache immediately and asynchronously batch-write to database later, providing low write latency but risking data loss if cache fails before persistence.

┌─────┐  1.WRITE   ┌───────┐
│ App │──────────→ │ Cache │
│     │←────────── │       │
└─────┘  2.ACK     └───┬───┘
       (immediate)     │ 3.Async batch write (later)
                       ▼
                     ┌────┐
                     │ DB │
                     └────┘
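
The write-behind flow can be sketched as a cache that tracks dirty keys and flushes them in batches. The `WriteBehindCache` class is hypothetical, a dictionary stands in for the database, and `flush()` is called by hand here where a real system would run it on a timer or background worker.

```python
class WriteBehindCache:
    """Acknowledges writes from memory and persists them later in batches."""

    def __init__(self, persist_batch):
        self._persist_batch = persist_batch  # callable taking a dict of writes
        self._store = {}
        self._dirty = set()  # keys written since the last flush

    def set(self, key, value):
        self._store[key] = value
        self._dirty.add(key)  # returns immediately; the database is untouched

    def get(self, key):
        return self._store.get(key)

    def flush(self):
        """Write all pending changes in one batch."""
        if not self._dirty:
            return
        batch = {key: self._store[key] for key in self._dirty}
        self._persist_batch(batch)
        self._dirty.clear()

database = {}  # stand-in for the backing database
cache = WriteBehindCache(database.update)

cache.set("user:1", "John")
cache.set("user:1", "Johnny")          # coalesced with the previous write
pending_before_flush = dict(database)  # {} because nothing is persisted yet
cache.flush()

print(pending_before_flush, database)  # {} {'user:1': 'Johnny'}
```

Anything still in the dirty set when the cache process dies is lost, which is the data-loss risk described above.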

Cache-Aside Pattern

Application manages cache directly: checks cache first, on miss fetches from database and populates cache; writes go directly to database with cache invalidation—most common pattern.

def get_user(user_id):
    # 1. Check cache
    user = cache.get(user_id)
    if user is None:
        # 2. Cache miss - fetch from DB
        user = db.query(user_id)
        # 3. Populate cache
        cache.set(user_id, user)
    return user

Cache Invalidation Strategies

Techniques to remove or update stale cached data: TTL (time-based expiry), event-driven invalidation (on writes), version-based invalidation, or active refresh. Cache invalidation is famously one of the "two hard things" in computer science, alongside naming things.

1. TTL:           cache.set(key, value, ttl=300)  # expires in 5min
2. Event-driven:  on_user_update → cache.delete(user_key)
3. Version:       cache.set(f"user:v{version}:{id}", data)
4. Active:        background job refreshes before expiry
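
Strategies 1 and 2 can be combined in a small in-process cache. The sketch below is illustrative: the `TTLCache` class is hypothetical, and a fake clock is injected so the expiry can be shown without sleeping.

```python
import time

class TTLCache:
    """Dictionary-backed cache whose entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily drop the stale entry on read
            return default
        return value

    def delete(self, key):
        """Event-driven invalidation: call this when the source data changes."""
        self._entries.pop(key, None)

now = [0.0]  # fake clock keeps the example deterministic
cache = TTLCache(clock=lambda: now[0])

cache.set("user:1", {"name": "John"}, ttl=300)
fresh = cache.get("user:1")    # within the TTL
now[0] = 301.0
expired = cache.get("user:1")  # past the TTL

print(fresh, expired)  # {'name': 'John'} None
```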

Cache Eviction Policies (LRU, LFU, FIFO)

Algorithms determining which items to remove when cache is full: LRU (least recently used), LFU (least frequently used), FIFO (first in first out)—LRU is most commonly used.

LRU (Least Recently Used):
Access: A B C D E A B (cache size=4)
┌───┬───┬───┬───┐
│ B │ A │ E │ D │  → C evicted (oldest access)
└───┴───┴───┴───┘
  ↑ most recent      least recent

LFU: Evicts the item with the lowest access count
FIFO: Evicts oldest inserted
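
The LRU trace above can be reproduced with `collections.OrderedDict`. This is a minimal sketch with a hypothetical `LRUCache` class; the cached values are placeholders.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def keys_most_recent_first(self):
        return list(reversed(self._items))

cache = LRUCache(capacity=4)
for key in "ABCDEAB":
    if cache.get(key) is None:  # miss: load and insert
        cache.put(key, key.lower())

print(cache.keys_most_recent_first())  # ['B', 'A', 'E', 'D']
```

In this access sequence A and B are evicted and reloaded along the way, and C is the last item evicted.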

Load Balancing

Round Robin

Simplest load balancing algorithm that distributes requests sequentially across servers in circular order, treating all servers equally regardless of current load or capacity.

Requests: 1, 2, 3, 4, 5, 6...

Server A: 1, 4 ←──┐
Server B: 2, 5 ←──┼── Rotating assignment
Server C: 3, 6 ←──┘

Weighted Round Robin

Extension of round robin that assigns more requests to higher-capacity servers based on configured weights, useful when servers have different resources.

Weights: A=5, B=3, C=2 (total=10)

Request distribution per 10 requests:
Server A: █████     (50%)
Server B: ███       (30%)
Server C: ██        (20%)
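
One way to realize these weights is the "smooth" weighted round robin variant, which interleaves servers rather than sending five requests in a row to A. The sketch below is one possible implementation, not the only one; the function name is illustrative.

```python
from collections import Counter

def smooth_weighted_round_robin(weights, count):
    """Yield `count` server names, interleaved in proportion to their weights."""
    total = sum(weights.values())
    current = {server: 0 for server in weights}
    for _ in range(count):
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=current.get)  # ties go to the first listed
        current[chosen] -= total
        yield chosen

weights = {"A": 5, "B": 3, "C": 2}
sequence = list(smooth_weighted_round_robin(weights, 10))

print(sequence)  # ['A', 'B', 'C', 'A', 'A', 'B', 'A', 'C', 'B', 'A']
print(Counter(sequence))
```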

Least Connections

Routes new requests to the server with fewest active connections, effective when requests have varying processing times and helps prevent overloading busy servers.

Current connections:
Server A: ████████ (8)
Server B: ███      (3)  ← Next request goes here
Server C: █████    (5)
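
The selection rule is a minimum over the current connection counts. A small sketch using the numbers from the diagram (function names are illustrative):

```python
active = {"Server A": 8, "Server B": 3, "Server C": 5}

def pick_least_connections(connections):
    """Return the server currently holding the fewest active connections."""
    return min(connections, key=connections.get)

def handle_request(connections):
    server = pick_least_connections(connections)
    connections[server] += 1  # decrement again when the request completes
    return server

first_four = [handle_request(active) for _ in range(4)]
print(first_four)  # ['Server B', 'Server B', 'Server B', 'Server C']
```

Server B absorbs new requests until it catches up with Server C, after which the two share the load.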

Least Response Time

Directs traffic to the server with lowest average response time and fewest connections, optimizing for actual performance rather than just connection count.

Server A: 8 conns, 150ms avg
Server B: 3 conns,  50ms avg  ← Wins (lowest latency)
Server C: 2 conns, 200ms avg

IP Hash

Uses client IP address to determine server assignment, ensuring the same client always reaches the same server—useful for basic session affinity without sticky session overhead.

hash(client_ip) % num_servers = server_index

Client 192.168.1.1 → hash → always Server A
Client 192.168.1.2 → hash → always Server C
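
The formula can be sketched with a stable hash from `hashlib`. Python's built-in `hash()` is randomized per process for strings, so it would not give the same answer across restarts or across several load balancer instances. The server list and function name are illustrative.

```python
import hashlib

servers = ["Server A", "Server B", "Server C"]

def server_for(client_ip: str) -> str:
    """Pick a server from a stable hash of the client IP."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# The same client always maps to the same server.
print(server_for("192.168.1.1") == server_for("192.168.1.1"))  # True
```

As with modulo-based sharding, adding or removing a server changes the mapping for most clients.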

Layer 4 vs Layer 7 Load Balancing

L4 operates at transport layer (TCP/UDP) making fast routing decisions on IP/port, while L7 operates at application layer (HTTP) enabling content-based routing, SSL termination, and header inspection.

Layer 4 (Transport):          Layer 7 (Application):
┌──────────────────┐          ┌──────────────────────────┐
│ IP + Port only   │          │ HTTP headers, URL, body  │
│ Fast, simple     │          │ /api → API servers       │
│ Content-unaware  │          │ /static → CDN            │
└──────────────────┘          │ Slower, more flexible    │
                              └──────────────────────────┘

Health Checks

Periodic probes sent to backend servers to verify availability and remove unhealthy instances from rotation, using TCP checks, HTTP endpoints, or custom scripts.

Load Balancer
     │
     ├──→ Server A: GET /health → 200 OK ✓
     ├──→ Server B: GET /health → 200 OK ✓
     └──→ Server C: GET /health → timeout ✗ (removed)

# Health endpoint example
@app.get("/health")
def health():
    return {"status": "healthy"}

Sticky Sessions

Mechanism ensuring a client's requests consistently route to the same backend server using cookies or session IDs, necessary for stateful applications but reduces load distribution effectiveness.

First request:
Client → LB → Server B → Set-Cookie: SERVERID=B

Subsequent requests:
Client (Cookie: SERVERID=B) → LB → Server B (always)

┌────────┐         ┌────┐
│ Client │════════→│ B  │  (affinity maintained)
└────────┘         └────┘

Data Storage

SQL vs NoSQL Trade-offs

SQL databases offer ACID compliance, strong consistency, and complex joins via structured schemas—ideal for transactional systems. NoSQL sacrifices some consistency (eventual consistency) for horizontal scalability, flexible schemas, and high throughput. Choose SQL when data integrity and relationships matter; choose NoSQL when you need massive scale and schema flexibility.

┌────────────────────────────────┬─────────────────────────┐
│              SQL               │          NoSQL          │
├────────────────────────────────┼─────────────────────────┤
│ ✓ ACID Transactions            │ ✓ Horizontal Scale      │
│ ✓ Complex Joins                │ ✓ Flexible Schema       │
│ ✓ Strong Consistency           │ ✓ High Throughput       │
│ ✗ Hard to scale horizontally   │ ✗ Eventual Consistency  │
└────────────────────────────────┴─────────────────────────┘

Document Stores (MongoDB)

Document stores persist data as JSON/BSON documents, allowing nested structures and variable schemas per record—perfect for content management, catalogs, and user profiles where each entity may have different attributes.

// MongoDB document example
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "John Doe",
  "orders": [
    { "item": "laptop", "price": 999 },
    { "item": "mouse", "price": 25 }
  ],
  "metadata": { "vip": true, "region": "US" }
}

Key-Value Stores (Redis, DynamoDB)

Key-value stores provide O(1) lookups by mapping unique keys to values—ideal for caching, session storage, and leaderboards. Redis operates in-memory for sub-millisecond latency; DynamoDB offers managed persistence with automatic scaling.

┌──────────────┬───────────────────────────┐
│     KEY      │           VALUE           │
├──────────────┼───────────────────────────┤
│ user:1001    │ {"name":"alice","age":30} │
│ session:xyz  │ {"token":"abc123"}        │
│ cache:prod:1 │ <serialized object>       │
└──────────────┴───────────────────────────┘

# Redis commands
SET user:1001 '{"name":"alice"}' EX 3600   # EX sets the expiry in seconds
GET user:1001

Column-Family Stores (Cassandra, HBase)

Column-family stores organize data into rows with dynamic columns grouped into families—optimized for write-heavy workloads, time-series data, and wide-row access patterns. They excel at horizontal scaling across datacenters with tunable consistency.

RowKey: user123
┌─────────────────────────────────────────────────────────┐
│ Column Family: profile    │ Column Family: activity     │
├───────────┬───────────────┼─────────────┬───────────────┤
│ name:John │ email:j@x.com │ login:ts1   │ click:ts2     │
│ age:30    │               │ purchase:ts3│ view:ts4      │
└───────────┴───────────────┴─────────────┴───────────────┘

Graph Databases (Neo4j)

Graph databases store nodes and relationships as first-class citizens, enabling efficient traversal of connected data—ideal for social networks, recommendation engines, fraud detection, and knowledge graphs where relationship queries would require expensive SQL joins.

       ┌───────────┐    FOLLOWS     ┌───────────┐
       │   Alice   │───────────────▶│    Bob    │
       └───────────┘                └───────────┘
             │                            │
             │ LIKES                      │ POSTED
             ▼                            ▼
       ┌───────────┐                ┌───────────┐
       │  Post:123 │◀───COMMENTS────│  Post:456 │
       └───────────┘                └───────────┘

// Cypher Query
MATCH (a:User)-[:FOLLOWS]->(b:User)-[:POSTED]->(p:Post)
WHERE a.name = 'Alice'
RETURN p.title

Time-Series Databases (InfluxDB, TimescaleDB)

Time-series databases optimize for append-heavy, time-stamped data with automatic downsampling, retention policies, and time-range queries—essential for metrics, IoT sensor data, and financial tick data where you query by time windows.

-- InfluxDB Line Protocol
cpu,host=server01,region=us-west value=0.64 1434055562000000000

-- TimescaleDB Query
SELECT time_bucket('5 minutes', time) AS bucket,
       AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY bucket
ORDER BY bucket;
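
The bucketing that `time_bucket` performs can be sketched in plain Python. This is an illustration of the idea under simple assumptions (naive datetimes, buckets aligned to the Unix epoch), not TimescaleDB's implementation, and the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_bucket(ts, width):
    """Round a timestamp down to the start of its bucket."""
    epoch = datetime(1970, 1, 1)
    return ts - ((ts - epoch) % width)


def average_by_bucket(readings, width=timedelta(minutes=5)):
    """Group (timestamp, value) pairs by bucket and average each group."""
    groups = defaultdict(list)
    for ts, value in readings:
        groups[time_bucket(ts, width)].append(value)
    return {bucket: sum(v) / len(v) for bucket, v in sorted(groups.items())}


# Made-up sample readings for illustration.
readings = [
    (datetime(2024, 1, 1, 12, 1), 20.0),
    (datetime(2024, 1, 1, 12, 4), 22.0),
    (datetime(2024, 1, 1, 12, 7), 30.0),
]
for bucket, avg in average_by_bucket(readings).items():
    print(bucket, avg)
```

The first two readings fall into the 12:00 bucket and the third into the 12:05 bucket, so the result has two rows.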

Full-Text Search (Elasticsearch, Solr)

Full-text search engines build inverted indexes to enable fast keyword searches, fuzzy matching, faceted navigation, and relevance scoring—used for product search, log analysis, and any feature requiring "Google-like" search capabilities.

Document: "The quick brown fox"

Inverted Index:
┌─────────┬────────────┐
│  Term   │  Doc IDs   │
├─────────┼────────────┤
│  quick  │  [1, 5, 9] │
│  brown  │  [1, 3]    │
│  fox    │  [1, 7]    │
└─────────┴────────────┘

GET /products/_search
{
  "query": { "match": { "title": "brown fox" }}
}
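
A minimal Python sketch of an inverted index follows. It assumes whitespace tokenization and lowercasing only, with made-up documents; real engines add stemming, stop words, and relevance scoring. Note that this sketch requires every query term to match (AND), whereas an Elasticsearch `match` query combines terms with OR by default.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Made-up documents for illustration.
docs = {1: "The quick brown fox", 3: "A brown bear", 7: "The red fox"}
index = build_index(docs)
print(search(index, "brown fox"))  # [1]
```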

Object Storage (S3)

Object storage keeps data as objects (payload plus metadata) in a flat namespace, accessed via HTTP APIs. It suits unstructured data such as images, videos, backups, and data lakes. Services like S3 offer virtually unlimited capacity, are designed for eleven nines (99.999999999%) of durability, and support lifecycle policies that move or expire objects automatically.

┌────────────────────────────────────────────────────┐
│                    S3 Bucket                       │
├────────────────────────────────────────────────────┤
│  Key: images/2024/photo.jpg                        │
│  ├── Data: <binary blob>                           │
│  ├── Metadata: {content-type, size, custom-tags}   │
│  └── URL: https://bucket.s3.region.amazonaws.com/  │
└────────────────────────────────────────────────────┘

aws s3 cp file.jpg s3://my-bucket/images/file.jpg
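
The flat namespace can be illustrated with a small in-memory stand-in. The `ObjectStore` class below is hypothetical and is not the S3 API; it only shows that keys are opaque strings and that "folders" are a prefix convention.

```python
class ObjectStore:
    """In-memory stand-in for a bucket: a flat key -> (data, metadata) map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Objects are written whole; there is no partial update.
        self._objects[key] = (bytes(data), dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # "Folders" are only a naming convention: listing filters by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ObjectStore()
bucket.put("images/2024/photo.jpg", b"\xff\xd8", content_type="image/jpeg")
bucket.put("backups/db.dump", b"...", content_type="application/octet-stream")
print(bucket.list("images/"))  # ['images/2024/photo.jpg']
```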

Block Storage

Block storage provides raw storage volumes that appear as local disks to VMs, offering low-latency random I/O—used for databases, boot volumes, and applications requiring filesystem control. You manage the filesystem and data organization yourself.

┌─────────────────────────────────────────────┐
│              Block Storage                  │
├─────────────────────────────────────────────┤
│  [Block 0][Block 1][Block 2]...[Block N]    │
│      │                                      │
│      ▼                                      │
│  ┌────────┐                                 │
│  │   VM   │ ← Sees it as /dev/sdb           │
│  │ (ext4) │ ← You format & mount            │
│  └────────┘                                 │
└─────────────────────────────────────────────┘

File Storage vs Object Storage

File storage uses hierarchical directories with POSIX semantics (NFS/SMB), supporting concurrent access and file locking—good for shared workloads and legacy apps. Object storage uses flat key-based access via HTTP, excelling at scale and cost efficiency for write-once-read-many patterns.

File Storage (NFS/EFS)          Object Storage (S3)
├── /home                       bucket/
│   ├── /user1                    ├── user1/file1.txt
│   │   └── file1.txt             ├── user1/file2.txt
│   └── /user2                    └── user2/data.json
│       └── file2.txt           
                                 
✓ POSIX, locks, append          ✓ Unlimited scale
✓ Low latency random I/O        ✓ HTTP API, versioning
✗ Limited scale                 ✗ No in-place edits, higher latency

Communication Patterns

Synchronous Communication

Synchronous communication blocks the caller until the response arrives—simple to reason about but creates tight coupling, cascading failures, and latency accumulation across service chains. Use for real-time operations where immediate feedback is required.

┌─────────┐         Request          ┌────────┐
│ Client  │ ───────────────────────▶ │ Server │
│(blocked)│ ◀─────────────────────── │        │
└─────────┘         Response         └────────┘
          
Timeline: ─────[wait]─────▶

Asynchronous Communication

Asynchronous communication decouples sender and receiver—the caller continues immediately while work happens in the background, improving resilience and throughput. Essential for long-running tasks, cross-service communication, and handling traffic spikes.

┌───────────┐  1. Send    ┌─────────┐    2. Process    ┌────────┐
│  Client   │ ──────────▶ │  Queue  │ ────────────────▶│ Worker │
│(continues)│             └─────────┘                  └────────┘
└───────────┘
     │
     └──▶ 3. Poll/Callback for result (optional)
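
The three steps in the diagram can be sketched with a thread pool standing in for the queue and worker. `generate_report` is a hypothetical slow task; a production system would more likely use a broker and separate worker processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_report(order_id):
    """Stand-in for slow background work (hypothetical task)."""
    time.sleep(0.1)
    return f"report-{order_id}"


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(generate_report, 42)  # 1. send, returns immediately
    print("client continues while the worker runs")
    result = future.result(timeout=5)          # 3. collect the result later

print(result)  # report-42
```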

Message Queues

Message queues provide point-to-point delivery: each message is handed to a single consumer, which makes them a fit for task distribution, work queues, and load leveling. They offer durability, retries, and decoupling between producers and consumers. Most brokers guarantee at-least-once delivery, so consumers should tolerate an occasional duplicate.

              ┌──────────────────────────────┐
              │         Queue (FIFO)         │
Producer ───▶ │ [Msg3] [Msg2] [Msg1] ───────▶│ ───▶ Consumer
              └──────────────────────────────┘
                          │
                          └── Each message → ONE consumer

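# RabbitMQ example with the pika client; `channel` is an open pika channel.
# SQS follows the same pattern through its send_message API.
channel.basic_publish(exchange='', routing_key='tasks', body='job123')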
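
To make the "each message goes to one consumer" rule concrete, here is a sketch with two competing consumers on Python's standard `queue.Queue`. It is an in-process illustration only, with no durability or retries.

```python
import queue
import threading

tasks = queue.Queue()
handled = []                # (consumer name, message) pairs
lock = threading.Lock()


def consumer(name):
    while True:
        msg = tasks.get()
        if msg is None:     # sentinel: no more work for this consumer
            tasks.task_done()
            return
        with lock:
            handled.append((name, msg))
        tasks.task_done()


workers = [threading.Thread(target=consumer, args=(f"c{i}",)) for i in range(2)]
for w in workers:
    w.start()

for job in ["job1", "job2", "job3", "job4"]:
    tasks.put(job)
for _ in workers:
    tasks.put(None)         # one sentinel per consumer

for w in workers:
    w.join()

print(sorted(msg for _, msg in handled))
```

Which consumer handles which job varies between runs, but every job appears exactly once in `handled`.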

Pub-Sub Pattern

Publish-Subscribe broadcasts messages to multiple subscribers via topics—decoupling publishers from consumers and enabling event fan-out. Ideal for event notifications, real-time updates, and building reactive systems.

                    ┌─────────────┐
                    │   Topic:    │
Publisher ─────────▶│  "orders"   │
                    └──────┬──────┘
                           │
         ┌─────────────────┼─────────────────┐
         ▼                 ▼                 ▼
   ┌───────────┐    ┌───────────┐    ┌───────────┐
   │Subscriber1│    │Subscriber2│    │Subscriber3│
   │(inventory)│    │ (billing) │    │(analytics)│
   └───────────┘    └───────────┘    └───────────┘
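
The fan-out in the diagram can be sketched as a tiny in-process broker. The `Broker` class is hypothetical and synchronous; real systems such as Kafka or SNS add persistence, delivery guarantees, and network transport.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process topic broker (no persistence, no network)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan-out: every subscriber of the topic receives the message.
        for callback in self._subscribers[topic]:
            callback(message)


received = defaultdict(list)
broker = Broker()
for name in ("inventory", "billing", "analytics"):
    broker.subscribe("orders", lambda msg, name=name: received[name].append(msg))

broker.publish("orders", {"order_id": 1})
print(dict(received))
```

The publisher never references the subscribers directly, so adding a fourth subscriber requires no change to publishing code.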

Request-Reply Pattern

Request-Reply correlates asynchronous responses with their originating requests using correlation IDs—enabling async communication while maintaining request semantics. Used when you need non-blocking behavior but still require a response.

┌─────────┐  {id:123, data}    ┌───────┐    ┌─────────┐
│Requester│ ────────────────▶  │ Queue │───▶│Responder│
└─────────┘                    └───────┘    └─────────┘
     ▲                                           │
     │        {correlationId:123, result}        │
     └───────────────────────────────────────────┘
                   (via reply queue)
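
A sketch of correlation IDs using two in-process queues follows. The message shapes (`id`, `correlation_id`, `result`) follow the diagram; the uppercase "work" done by the responder is a made-up placeholder.

```python
import queue
import threading
import uuid

request_queue = queue.Queue()
reply_queue = queue.Queue()


def responder():
    """Handle one request and echo its correlation ID on the reply."""
    request = request_queue.get()
    reply_queue.put({
        "correlation_id": request["id"],
        "result": request["data"].upper(),
    })


threading.Thread(target=responder, daemon=True).start()

pending = {}                                  # correlation ID -> original payload
request_id = str(uuid.uuid4())
pending[request_id] = "hello"
request_queue.put({"id": request_id, "data": "hello"})

reply = reply_queue.get(timeout=5)            # the requester could do other work first
original = pending.pop(reply["correlation_id"])
print(original, "->", reply["result"])        # hello -> HELLO
```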

Fire and Forget

Fire-and-forget sends a message without waiting for acknowledgment or response—maximizing throughput for non-critical operations like logging, analytics events, and metrics where occasional message loss is acceptable.

┌────────┐      Event       ┌───────────┐
│ Client │ ────────────────▶│  Handler  │
│(done!) │    (no ack)      │           │
└────────┘                  └───────────┘

// Example: Analytics event
analytics.track("page_view", {page: "/home"});  // Returns immediately
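
One way to get this behavior, sketched in Python, is a bounded buffer that drops events when full instead of blocking the caller. The `track` function and the buffer size are assumptions for illustration, not the API of any analytics library.

```python
import queue

events = queue.Queue(maxsize=2)   # small bounded buffer for the demo
dropped = 0


def track(name, properties):
    """Enqueue an event and return at once; drop it if the buffer is full."""
    global dropped
    try:
        events.put_nowait((name, properties))
    except queue.Full:
        dropped += 1              # loss is accepted, the caller is never blocked


for page in ("/home", "/pricing", "/docs"):
    track("page_view", {"page": page})

print(events.qsize(), dropped)    # 2 1
```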

GraphQL

GraphQL is a query language that lets clients request exactly the data they need in a single request, avoiding the over-fetching and under-fetching common with REST endpoints. It provides a strongly typed schema and introspection; fields are populated by resolvers, which usually need batching to avoid N+1 lookups.

# Single request replaces multiple REST calls
query {
  user(id: "123") {
    name
    email
    orders(last: 5) {
      id
      total
      items { name, price }
    }
  }
}

# Response: exactly what was requested
{
  "data": { "user": { "name": "Alice", "email": "...", "orders": [...] }}
}
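
The "only what was requested" behavior can be sketched as a recursive field selection over plain dictionaries. This is a toy model under stated assumptions: the `select` helper and the sample record are hypothetical, and a real GraphQL server parses the query and calls resolvers rather than filtering an existing object.

```python
def select(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (leaf) or to a nested selection.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    return {
        field: data[field] if sub is None else select(data[field], sub)
        for field, sub in selection.items()
    }


# Made-up record; a real server would call resolvers instead.
user = {
    "id": "123",
    "name": "Alice",
    "email": "alice@example.com",
    "password_hash": "x",
    "orders": [{"id": "o1", "total": 30, "status": "paid"}],
}

query = {"name": None, "orders": {"id": None, "total": None}}
print(select(user, query))
# {'name': 'Alice', 'orders': [{'id': 'o1', 'total': 30}]}
```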

gRPC

gRPC is a high-performance RPC framework using Protocol Buffers for serialization and HTTP/2 for transport—providing streaming, multiplexing, and strong typing. Ideal for inter-service communication where latency and bandwidth matter.

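// user.proto
syntax = "proto3";

service UserService {
  rpc GetUser(UserRequest) returns (User);
  rpc StreamUpdates(Empty) returns (stream UserEvent);
}

message User {
  string id = 1;
  string name = 2;
}

// Supporting messages; field choices here are illustrative assumptions.
message UserRequest { string id = 1; }
message UserEvent { string user_id = 1; string kind = 2; }
message Empty {}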

┌────────┐  HTTP/2 + Protobuf  ┌────────┐
│ Client │ ◀══════════════════▶│ Server │
│ (stub) │  Binary, streaming  │        │
└────────┘                     └────────┘

WebSockets

WebSockets provide full-duplex, persistent connections over a single TCP socket—enabling real-time bidirectional communication for chat applications, live dashboards, gaming, and collaborative editing where low-latency push is essential.

┌────────┐                      ┌────────┐
│ Client │ ──HTTP Upgrade──────▶│ Server │
└────────┘                      └────────┘
     ║         WebSocket            ║
     ║◀═══════════════════════════▶║
     ║     Bidirectional msgs       ║
     ║         (persistent)         ║

// JavaScript
const ws = new WebSocket('wss://server/socket');
ws.onopen = () => ws.send('Hello Server');  // send only once the connection is open
ws.onmessage = (event) => console.log(event.data);

Server-Sent Events (SSE)

SSE provides a simple, unidirectional server-to-client push over HTTP—browsers automatically reconnect and handle event parsing. Ideal for live feeds, notifications, and dashboards where you only need server-to-client streaming.

┌────────┐  GET /events        ┌────────┐
│ Client │ ──────────────────▶ │ Server │
└────────┘                     └────────┘
     ▲                              │
     │   event: update              │
     │   data: {"price": 150.25}    │
     │                              │
     └──────────────────────────────┘
          (keeps connection open)

// Response headers
Content-Type: text/event-stream
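
The wire format shown in the diagram is plain text, which the following Python sketch serializes and parses. The helper names `format_sse` and `parse_sse` are hypothetical, and the parser handles only the `event` and `data` fields of a single event.

```python
import json


def format_sse(data, event=None):
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"   # a blank line ends the event


def parse_sse(chunk):
    """Parse a single event back into (event, data); minimal, no id/retry."""
    event, data_lines = "message", []
    for line in chunk.strip().split("\n"):
        field, _, value = line.partition(": ")
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
    return event, json.loads("\n".join(data_lines))


wire = format_sse({"price": 150.25}, event="update")
print(wire, end="")
print(parse_sse(wire))  # ('update', {'price': 150.25})
```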

Long Polling

Long polling simulates server push by holding HTTP requests open until data is available—simpler than WebSockets but less efficient. Used as a fallback when WebSockets aren't available or for infrequent updates.

Client                           Server
  │                                │
  │──── GET /poll ────────────────▶│
  │                    (holds request)
  │                                │ ← Event occurs
  │◀─── Response ──────────────────│
  │                                │
  │──── GET /poll ────────────────▶│  (immediately reconnect)
  │                    (holds...)   │
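
The server-side "hold the request" step can be sketched with a condition variable. This is an in-process illustration, assuming a single waiting client; `poll` and `publish` are hypothetical names, and a real server would do this inside an HTTP handler.

```python
import threading

new_data = threading.Condition()
messages = []


def poll(timeout=2.0):
    """Server side: hold the request until data arrives or the timeout hits."""
    with new_data:
        new_data.wait_for(lambda: bool(messages), timeout=timeout)
        batch = list(messages)
        messages.clear()
        return batch          # empty list means "timed out, poll again"


def publish(message):
    """Called when an event occurs; wakes any held request."""
    with new_data:
        messages.append(message)
        new_data.notify_all()


threading.Timer(0.1, publish, args=("price changed",)).start()
print(poll())                 # ['price changed'], returned after ~0.1 s
print(poll(timeout=0.1))      # [], nothing new before the timeout
```

The timeout matters in practice: proxies and load balancers often close idle connections, so the server returns an empty response and the client polls again.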

Architecture Patterns

Microservices Architecture

Microservices decompose applications into small, independently deployable services—each owning its data and business capability. This enables team autonomy, technology diversity, and independent scaling, but introduces distributed systems complexity, network latency, and operational overhead.

┌─────────────────────────────────────────────────────────────┐
│                      API Gateway                            │
└───────────┬───────────────┬───────────────┬────────────────┘
            │               │               │
     ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
     │   Users     │ │   Orders    │ │  Payments   │
     │   Service   │ │   Service   │ │   Service   │
     └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
            │               │               │
     ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
     │  Users DB   │ │  Orders DB  │ │ Payments DB │
     └─────────────┘ └─────────────┘ └─────────────┘

Service-Oriented Architecture (SOA)

SOA organizes systems into reusable services that communicate through an enterprise service bus (ESB). It predates microservices and typically comes with heavier governance, shared databases, and centralized orchestration, which makes it better suited to enterprise integration than to modern cloud-native applications.

┌─────────┐  ┌─────────┐  ┌─────────┐
│Service A│  │Service B│  │Service C│
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
═════╪════════════╪════════════╪═════  Enterprise Service Bus (ESB)
     │            │            │
┌────▼────────────▼────────────▼────┐
│         Shared Enterprise DB       │
└────────────────────────────────────┘

Event-Driven Architecture

Event-driven architecture uses events as the primary communication mechanism—services emit events when state changes, and interested services react asynchronously. This enables loose coupling, scalability, and real-time responsiveness but requires careful event schema design and eventual consistency handling.

┌──────────┐     OrderCreated      ┌─────────────┐
│  Order   │ ─────────────────────▶│ Event Broker│
│ Service  │                       │ (Kafka)     │
└──────────┘                       └──────┬──────┘
                                          │
                    ┌─────────────────────┼─────────────────────┐
                    ▼                     ▼                     ▼
             ┌───────────┐         ┌───────────┐         ┌───────────┐
             │ Inventory │         │  Shipping │         │  Billing  │
             │  Service  │         │  Service  │         │  Service  │
             └───────────┘         └───────────┘         └───────────┘

CQRS (Command Query Responsibility Segregation)

CQRS separates the read and write models: commands modify state through one model while queries are served from a separate, read-optimized model. This allows independent scaling and denormalized read stores and fits complex domains, but the read side must be kept in sync with the write side and is usually eventually consistent.

                    ┌───────────────────────────────────┐
                    │           Application             │
                    └───────────────┬───────────────────┘
                                    │
              ┌─────────────────────┴─────────────────────┐
              ▼                                           ▼
     ┌─────────────────┐                        ┌─────────────────┐
     │   Command Side  │                        │   Query Side    │
     │  (Write Model)  │ ───── Events ─────────▶│  (Read Model)   │
     └────────┬────────┘                        └────────┬────────┘
              │                                          │
       ┌──────▼──────┐                           ┌───────▼───────┐
       │ Normalized  │                           │ Denormalized  │
       │     DB      │                           │  Read Store   │
       └─────────────┘                           └───────────────┘
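
The split in the diagram can be sketched in a few lines of Python. All names here (`place_order`, `project`, `customer_total`) are hypothetical, and the projection runs synchronously for simplicity; in a real system it would typically consume events asynchronously.

```python
# Write model: normalized source of truth, changed only through commands.
orders = {}            # order_id -> {"customer": ..., "total": ...}
# Read model: denormalized view shaped for one query.
totals_by_customer = {}


def project(event):
    """Update the read model from an event emitted by the command side."""
    if event["type"] == "OrderPlaced":
        customer = event["customer"]
        totals_by_customer[customer] = totals_by_customer.get(customer, 0) + event["total"]


def place_order(order_id, customer, total):
    """Command: validate, change the write model, emit an event."""
    if order_id in orders:
        raise ValueError("duplicate order")
    orders[order_id] = {"customer": customer, "total": total}
    project({"type": "OrderPlaced", "customer": customer, "total": total})


def customer_total(customer):
    """Query: reads only the read model, never the write model."""
    return totals_by_customer.get(customer, 0)


place_order("o1", "alice", 30)
place_order("o2", "alice", 20)
print(customer_total("alice"))  # 50
```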

Event Sourcing

Event sourcing persists all state changes as immutable events rather than current state—enabling complete audit trails, temporal queries, and state reconstruction. Combined with CQRS, it's powerful for financial systems, but adds storage costs and complexity.

Traditional:  [Current State: balance = $150]

Event Sourced:
┌──────────────────────────────────────────────┐
│                 Event Store                  │
├──────────────────────────────────────────────┤
│ 1. AccountCreated  { id: 123 }               │
│ 2. MoneyDeposited  { amount: 100 }           │
│ 3. MoneyDeposited  { amount: 100 }           │
│ 4. MoneyWithdrawn  { amount: 50 }            │
└──────────────────────────────────────────────┘
         │
         └──▶ Replay events → Current State: $150
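
Replaying the event store above can be sketched as a fold over the events. The event names and amounts come from the diagram; the `apply` and `replay` helpers are hypothetical.

```python
events = [
    {"type": "AccountCreated", "id": 123},
    {"type": "MoneyDeposited", "amount": 100},
    {"type": "MoneyDeposited", "amount": 100},
    {"type": "MoneyWithdrawn", "amount": 50},
]


def apply(balance, event):
    """Pure function: previous state + one event -> next state."""
    if event["type"] == "AccountCreated":
        return 0
    if event["type"] == "MoneyDeposited":
        return balance + event["amount"]
    if event["type"] == "MoneyWithdrawn":
        return balance - event["amount"]
    return balance  # ignore unknown event types


def replay(events, upto=None):
    """Rebuild state from the first `upto` events (all of them by default)."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


print(replay(events))          # 150
print(replay(events, upto=2))  # 100, the balance after the first deposit
```

The `upto` parameter shows the temporal-query benefit: the state at any past point is recoverable by replaying a prefix of the log.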

Hexagonal Architecture

Hexagonal (Ports & Adapters) architecture isolates business logic from external concerns—the core domain defines ports (interfaces), and adapters implement them for databases, APIs, and UIs. This makes the core testable and infrastructure-agnostic.

                    ┌──────────────────────┐
      ┌─────────────│    REST Adapter      │
      │             └──────────────────────┘
      │                      │
      ▼                      ▼
┌─────────┐           ┌─────────────┐           ┌──────────┐
│   CLI   │──(Port)──▶│   Domain    │◀──(Port)──│    DB    │
│ Adapter │           │   (Core)    │           │ Adapter  │
└─────────┘           └─────────────┘           └──────────┘
                             │
                      ┌──────▼──────┐
                      │   Queue     │
                      │   Adapter   │
                      └─────────────┘
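
A minimal sketch of one port and one adapter in Python follows. The names `UserRepository`, `InMemoryUserRepository`, and `RegisterUser` are hypothetical; the point is only that the core depends on an interface it owns, so a database-backed adapter could be swapped in without touching it.

```python
from typing import Optional, Protocol


class UserRepository(Protocol):
    """Port: what the domain needs from storage, owned by the core."""

    def save(self, user_id: str, name: str) -> None: ...
    def find(self, user_id: str) -> Optional[str]: ...


class InMemoryUserRepository:
    """Adapter: one implementation of the port, handy for tests."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name

    def find(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


class RegisterUser:
    """Core use case: depends on the port, not on any database."""

    def __init__(self, repo: UserRepository) -> None:
        self._repo = repo

    def __call__(self, user_id: str, name: str) -> str:
        if self._repo.find(user_id) is not None:
            raise ValueError("user already exists")
        self._repo.save(user_id, name.strip())
        return user_id


repo = InMemoryUserRepository()
register = RegisterUser(repo)
register("u1", "  Alice ")
print(repo.find("u1"))  # Alice
```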

Clean Architecture

Clean Architecture enforces the dependency rule: source-code dependencies point only inward, so outer layers depend on inner layers and never the reverse. Entities sit at the core, use cases define application logic, and frameworks/drivers live at the edges. This creates highly testable, framework-independent systems.

┌─────────────────────────────────────────────────────┐
│              Frameworks & Drivers (DB, Web)         │
│   ┌─────────────────────────────────────────────┐   │
│   │         Interface Adapters (Controllers,    │   │
│   │              Presenters, Gateways)          │   │
│   │   ┌─────────────────────────────────────┐   │   │
│   │   │      Use Cases (Application Logic)  │   │   │
│   │   │   ┌─────────────────────────────┐   │   │   │
│   │   │   │    Entities (Domain Core)   │   │   │   │
│   │   │   └─────────────────────────────┘   │   │   │
│   │   └─────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
      Dependencies point INWARD →

Layered Architecture

Layered architecture organizes code into horizontal layers (presentation, business, data access)—each layer only communicates with adjacent layers. Simple and widely understood, but can lead to tight coupling and anemic domain models.

┌─────────────────────────────────────┐
│       Presentation Layer            │  ← Controllers, Views
├─────────────────────────────────────┤
│       Business Logic Layer          │  ← Services, Domain Logic
├─────────────────────────────────────┤
│       Data Access Layer             │  ← Repositories, ORM
├─────────────────────────────────────┤
│       Database                      │  ← PostgreSQL, etc.
└─────────────────────────────────────┘
            │
            └── Each layer depends only on the layer below

MVC/MVP/MVVM Patterns

These patterns separate presentation from logic: MVC (Controller mediates View-Model), MVP (Presenter handles all logic, View is passive), MVVM (ViewModel exposes bindable data, View binds automatically). MVVM suits reactive UIs; MVP enables easier testing; MVC remains widely used in web frameworks.

MVC:                    MVP:                    MVVM:
┌──────────┐            ┌──────────┐            ┌──────────┐
│   View   │            │   View   │            │   View   │
└────┬─────┘            └────┬─────┘            └────┬─────┘
     │                       │                       │ (data binding)
┌────▼─────┐            ┌────▼─────┐            ┌────▼─────┐
│Controller│            │Presenter │            │ViewModel │
└────┬─────┘            └────┬─────┘            └────┬─────┘
     │                       │                       │
┌────▼─────┐            ┌────▼─────┐            ┌────▼─────┐
│  Model   │            │  Model   │            │  Model   │
└──────────┘            └──────────┘            └──────────┘