System design interviews test your ability to architect scalable solutions. You don’t need to know every technology — focus on trade-offs and clear reasoning.

Framework for Any Design Question

  1. Clarify requirements — functional and non-functional (scale, latency, availability)
  2. Estimate scale — users, requests/sec, data size
  3. High-level design — draw boxes and arrows
  4. Deep dive — pick 2–3 components to detail
  5. Identify bottlenecks — and propose solutions
  6. Discuss trade-offs — why this approach over alternatives

Example: Design a URL Shortener

Requirements

  • Shorten long URLs to 6-character codes
  • Redirect on access
  • Track click counts
  • Scale: 100M stored URLs, 1,000 redirects/sec
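
A quick back-of-envelope estimate turns these requirements into sizing numbers. The sketch below is illustrative only: the 500-byte average row size and the 20% "hot" fraction are assumptions, not figures from the requirements.

```python
# Back-of-envelope sizing for the stated requirements.
TOTAL_URLS = 100_000_000       # from the requirements
REDIRECTS_PER_SEC = 1_000      # from the requirements
AVG_ROW_BYTES = 500            # assumption: URL + code + metadata + index overhead
HOT_FRACTION = 0.20            # assumption: share of links worth keeping in cache

SECONDS_PER_DAY = 24 * 60 * 60

storage_gb = TOTAL_URLS * AVG_ROW_BYTES / 1e9
redirects_per_day = REDIRECTS_PER_SEC * SECONDS_PER_DAY
cache_gb = storage_gb * HOT_FRACTION

print(f"Storage:         ~{storage_gb:.0f} GB")
print(f"Redirects/day:   ~{redirects_per_day / 1e6:.1f} million")
print(f"Cache (hot set): ~{cache_gb:.0f} GB")
```

Under these assumptions the data set fits on a single database node, which suggests that read throughput, not storage, is the first thing to scale.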

High-Level Design

  Client → Load Balancer → API Servers → Database
                              ↓
                           Cache (Redis)
  

Database Schema

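  CREATE TABLE links (
      id           BIGSERIAL PRIMARY KEY,
      short_code   VARCHAR(10) UNIQUE NOT NULL,
      original_url TEXT NOT NULL,
      clicks       BIGINT NOT NULL DEFAULT 0,  -- BIGINT: popular links can exceed the 32-bit range
      created_at   TIMESTAMP DEFAULT NOW()
  );
  -- No separate CREATE INDEX is needed: in PostgreSQL the UNIQUE
  -- constraint on short_code already creates an index for lookups.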
  

Short Code Generation

  • Base62 encoding of auto-increment ID (a-z, A-Z, 0-9)
  • 6 chars = 62^6 ≈ 56.8 billion unique codes
  • Alternative: generate a random code and retry on collision
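
The Base62 option can be sketched in a few lines. This is a minimal illustration, not a production encoder: the function names are hypothetical, and the alphabet order (lowercase, uppercase, digits) is an arbitrary choice that only needs to stay fixed.

```python
import string

# Any fixed ordering of the 62 symbols works; this order is an assumption.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    """Convert a Base62 string back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(125))                # short code for ID 125
print(decode_base62(encode_base62(125))) # round-trips to 125
```

Small IDs produce codes shorter than 6 characters, and sequential IDs produce guessable codes. If either matters, starting the ID sequence at a large offset or switching to the random alternative are two options worth raising.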

Caching Strategy

  Redirect request:
    1. Check Redis for short_code → URL
    2. Cache hit → redirect immediately
    3. Cache miss → query DB, populate cache, redirect
    4. Async: increment click counter (don't block redirect)
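
The four steps above might look like this in code. To keep the sketch runnable without external services, a plain dict stands in for Redis, another dict stands in for the links table, and an in-process queue stands in for the async click pipeline. All names here are hypothetical.

```python
import queue
from typing import Dict, Optional

cache: Dict[str, str] = {}  # stand-in for Redis
database: Dict[str, str] = {"abc123": "https://example.com/long/path"}  # stand-in for the links table
click_events: "queue.Queue[str]" = queue.Queue()  # drained later by a background worker


def resolve(short_code: str) -> Optional[str]:
    """Return the target URL for a short code, or None if the code is unknown."""
    url = cache.get(short_code)            # 1. check the cache
    if url is None:                        # 3. cache miss
        url = database.get(short_code)     #    query the database
        if url is None:
            return None                    #    unknown code: caller responds with 404
        cache[short_code] = url            #    populate the cache
    click_events.put(short_code)           # 4. record the click without blocking
    return url                             # 2. caller issues the redirect


print(resolve("abc123"))  # first call misses the cache and reads the database
print(resolve("abc123"))  # second call is served from the cache
```

Pushing the click onto a queue keeps the counter update off the redirect path; a worker can then batch increments into the database.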
  

Scaling

Component       Scale Strategy
API servers     Horizontal: stateless, add instances
Database        Read replicas, shard by short_code hash
Cache           Redis cluster, TTL for less popular links
Static assets   CDN
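
Sharding by short_code hash can be illustrated with a small routing function. This is a sketch under assumptions: the shard count of 4 and the function name are hypothetical, and SHA-256 is used only because it is stable across processes (Python's built-in hash() is randomized per process for strings).

```python
import hashlib

NUM_SHARDS = 4  # assumption: chosen for illustration only


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


for code in ("abc123", "Zx9kQ2", "hello1"):
    print(code, "→ shard", shard_for(code))
```

Plain modulo hashing remaps most keys when the shard count changes, so it is worth mentioning consistent hashing as the usual alternative when resharding is expected.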

Common Components

Load Balancer

Distributes traffic across servers. Options: AWS ALB, Nginx, HAProxy.
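
Round robin is one simple distribution policy. The sketch below shows only the selection logic; the class name and backend addresses are hypothetical, and real load balancers add health checks, weights, and connection tracking on top.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        backend_list = list(backends)
        if not backend_list:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backend_list)

    def next_backend(self) -> str:
        return next(self._cycle)


lb = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
for _ in range(4):
    print(lb.next_backend())  # the fourth request wraps back to the first backend
```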

Caching

  • Redis/Memcached — in-memory key-value store
  • Cache-aside pattern: read cache → miss → read DB → write cache
  • Set a TTL to bound how long stale data can be served
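
To make the TTL idea concrete, here is a minimal in-process cache with lazy expiry. It is a sketch, not a Redis replacement: the TTLCache class is hypothetical, and the injectable clock exists only so the expiry behaviour can be demonstrated without sleeping. In Redis the same effect comes from setting an expiry on the key.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Key-value cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: evict lazily on read
            return None
        return value


now = [0.0]  # fake clock for the demonstration
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("abc123", "https://example.com")
print(cache.get("abc123"))  # still fresh
now[0] = 61.0
print(cache.get("abc123"))  # expired, so the caller falls back to the database
```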

Message Queue

Decouple services for async processing:

  • AWS SQS, RabbitMQ, Kafka
  • Use for: email sending, analytics, image processing
  API → SQS → Worker Lambda → Process async
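
The decoupling can be demonstrated in-process with the standard library. In this sketch queue.Queue stands in for SQS and a thread stands in for the worker; the handler name and the email job are hypothetical.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()  # stand-in for SQS / RabbitMQ
processed = []


def worker() -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return  # sentinel: shut down
            processed.append(f"sent email to {job['to']}")
        finally:
            jobs.task_done()


def handle_signup(email: str) -> str:
    """API handler: enqueue the slow work and return immediately."""
    jobs.put({"to": email})
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

print(handle_signup("a@example.com"))
print(handle_signup("b@example.com"))

jobs.join()      # wait until both jobs are processed
jobs.put(None)   # ask the worker to stop
thread.join()
print(processed)
```

The handler's response time no longer depends on how long the email takes to send, which is the point to emphasise in an interview.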
  

Database Choices

Type               Use Case                            Examples
Relational (SQL)   Structured data, transactions       PostgreSQL, MySQL
Document (NoSQL)   Flexible schema, horizontal scale   MongoDB, DynamoDB
Key-Value          Caching, sessions                   Redis, DynamoDB
Search             Full-text search                    Elasticsearch

CDN

Cache static content (images, JS, CSS) at edge locations close to users.

Python-Specific Architecture

                    ┌─────────────┐
                    │   Nginx     │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ Gunicorn │ │ Gunicorn │ │ Gunicorn │
        │ FastAPI  │ │ FastAPI  │ │ FastAPI  │
        └────┬─────┘ └────┬─────┘ └────┬─────┘
             │            │            │
             └────────────┼────────────┘
                          ▼
                   ┌─────────────┐
                   │ PostgreSQL  │
                   │  (primary)  │
                   └──────┬──────┘
                          │
                   ┌──────┴──────┐
                   ▼             ▼
             ┌──────────┐  ┌──────────┐
             │ Replica  │  │  Redis   │
             └──────────┘  └──────────┘
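
For the Gunicorn tier in the diagram, a configuration file might look like the sketch below. It assumes the uvicorn package is installed (Gunicorn needs an ASGI worker class to serve FastAPI); the file name, port, and timeout values are illustrative assumptions.

```python
# gunicorn.conf.py (hypothetical file name and values)
import multiprocessing

bind = "127.0.0.1:8000"  # assumed address that Nginx proxies to
workers = multiprocessing.cpu_count() * 2 + 1  # common starting heuristic; tune under load
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker, requires uvicorn
timeout = 30     # seconds before an unresponsive worker is restarted
keepalive = 5    # seconds to hold idle keep-alive connections
```

It would be launched with something like `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a placeholder for the module and FastAPI instance.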
  

See Flask URL Shortener Project for a simplified implementation.

Key Trade-offs to Discuss

Decision                      Option A                   Option B
Consistency vs availability   Strong consistency (SQL)   Eventual consistency (NoSQL)
Sync vs async                 Simple, immediate          Scalable, complex
Monolith vs microservices     Faster to build            Independent scaling
SQL vs NoSQL                  ACID, joins                Flexible schema, scale-out
Push vs pull CDN              Real-time updates          Simpler architecture

CAP Theorem (Brief)

In a distributed system, you can guarantee at most two of:

  • Consistency: all nodes see the same data at the same time
  • Availability: every request gets a (non-error) response
  • Partition tolerance: the system keeps working despite network partitions

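Network partitions cannot be ruled out in practice, so the real choice is between consistency and availability while a partition lasts. Many large-scale web systems choose AP (availability + partition tolerance) and accept eventual consistency; systems where stale reads are costly, such as payments or inventory, often choose CP instead.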