System Design Cases (Middle 3 Deep)

Зачем знать на Middle 3: System design interview — must-pass для Senior+ ролей. От тебя ждут не “знать про CAP”, а уметь развернуть классические кейсы за 45-60 минут, спорить про trade-offs, обосновывать capacity. Ниже — 10 классических кейсов с deep dive (обычно интервьюер уходит вглубь одного-двух).

Содержание

URL Shortener (TinyURL)
Distributed Cache (Redis-like)
Notification System
Rate Limiter
Distributed Queue (Kafka-like)
Search (Elasticsearch-like)
Chat (WhatsApp-like)
Feed (Twitter timeline)
Payment System
GPS Tracking / Logistics
30 общих вопросов System Design
Источники

1. URL Shortener (TinyURL / bit.ly)

Requirements

Functional:

POST /shorten body {url} → {short}.
GET /{short} → 301 redirect.
Custom alias.
Optional TTL.
Analytics (clicks, geo, referer).

Non-functional:

100M URLs/day → ~1200 writes/sec.
100B total URLs за 100 дней retention.
Read:Write = 10:1 → ~12k reads/sec, peak ~25k.
p99 redirect < 50ms.
Availability 99.99%.

Capacity

Storage: 100B * (long URL avg 500B + short 7B + metadata 100B) ≈ 60 TB.
QPS reads peak: 25k.
Bandwidth in: 600 MB/s, out: 30 MB/s.

API

POST /api/v1/shorten
  body: {"long_url": "https://...", "alias": "promo", "ttl": 86400}
  → 201 {"short": "promo", "expires_at": "..."}

GET /:short
  → 301 Location: <long_url>

Data model

PG:

urls (
  short VARCHAR(10) PRIMARY KEY,
  long  TEXT NOT NULL,
  owner_id BIGINT,
  created_at TIMESTAMPTZ,
  expires_at TIMESTAMPTZ
)

ClickHouse for analytics:

clicks (short, ts, ip, country, referer, user_agent) ENGINE=MergeTree

Encoding (deep)

7 символов base62 → 62^7 ≈ 3.5 * 10^12 (хватит на годы).
Варианты ID:
1. Counter (auto-increment) + base62 encode. Risk: leaks order, requires single source of truth.
2. Snowflake: timestamp(41) + worker(10) + seq(12) = 64 bit → base62.
3. Random + check collision: redo on conflict (probability low at 62^7).
4. Pre-generated batch: worker берёт 1000 ID из ID-service, отдаёт локально.

Choice: Snowflake-like (no DB round-trip), encode в base62.

Architecture

                              +---------+
                              |   CDN   |
                              +----+----+
                                   |
                              +----v----+
                              |   LB    |
                              +----+----+
                                   |
                   +---------------+---------------+
                   |                               |
              +----v-----+                    +----v-----+
              | Read API |                    | Write API|
              +----+-----+                    +----+-----+
                   |                               |
              +----v-----+                    +----v-----+
              |  Redis   | <-- write-back --- |   PG     |
              |  cache   |                    | (sharded)|
              +----------+                    +----+-----+
                                                   |
                                              +----v-----+
                                              |   CDC    |
                                              +----+-----+
                                                   |
                                              +----v-----+
                                              | ClickHouse|
                                              +----------+

Read path (deep dive)

Request → CDN (cache hit → 301 return).
CDN miss → LB → Read API.
Read API → Redis (LRU, 10M hot keys, 90%+ hit).
Cache miss → PG (read replica) → write back to cache.
301 response with Cache-Control: max-age=3600.

Write path

Generate short via Snowflake.
INSERT into PG primary.
Write to Redis (write-through).
Async event → Kafka → ClickHouse for indexing.

Bottlenecks + scaling

Write QPS: Snowflake distributed → no bottleneck.
Read on hot keys: CDN + Redis замыкают 95%.
Storage growth: shard PG by hash(short) % N. Add read replicas per shard.
Analytics: don’t aggregate on hot path; Kafka consumer batches into ClickHouse.

Trade-offs

Random ID → no order leak, but no friendly enumeration.
301 vs 302: 301 cacheable (good for traffic), but invalidation hard. Use 302 if URL может меняться.

2. Distributed Cache (Redis-like)

Requirements

Get/Set/Delete by key.
TTL.
LRU eviction.
Cluster (auto-sharding).
HA: replica per shard.
p99 < 1ms intra-DC.

Architecture

   client
     |
   [client-side hash]    <- consistent hashing ring
     |
   shard 0  shard 1  ... shard N
    | |     | |
    M R     M R     <- master + replica per shard

Consistent hashing

Каждый node — N virtual nodes (vnodes).
Key → hash → найти ближайший vnode по часовой.
Add/remove node перемещает только N/total keys.

type Ring struct {
    nodes []uint64 // sorted hashes
    nodeMap map[uint64]string
}

func (r *Ring) GetNode(key string) string {
    h := xxhash.Sum64String(key)
    idx := sort.Search(len(r.nodes), func(i int) bool { return r.nodes[i] >= h })
    if idx == len(r.nodes) { idx = 0 }
    return r.nodeMap[r.nodes[idx]]
}

Eviction (LRU)

Doubly-linked list + hashmap O(1).
Approximated LRU (Redis) — sample N keys, evict oldest. Cheaper.

TTL

Lazy expiration: проверка на read.
Active: periodic scan random keys, evict expired.

Replication

Master accepts writes.
Async push to replicas (low-latency).
Read from replicas (eventual consistency).
Failover: replica → master via sentinel/raft.

Bottlenecks

Hot key: один shard перегружен. Mitigation: hash key + suffix (key:shard1, key:shard2).
Network bandwidth: pipelining, compression.

Trade-offs

Strong vs eventual: chose eventual за speed.
Memory cost: high (everything in RAM).

3. Notification System

Requirements

Multi-channel: email, SMS, push, in-app, webhook.
1M notifications/sec peak.
Templates с переменными.
Rate limiting per user (max 5/day).
Provider abstraction.
Retry с backoff.
Idempotency.

Architecture

+--------+   +--------+   +-----------+
| API    |->| Kafka  |->| Per-channel|
| (POST  |  |        |  | workers    |
| /send) |  +--------+  |            |
+--------+              | email      |
                        | sms        |
                        | push       |
                        +-----------+
                              |
                       provider (SendGrid, Twilio, FCM)
                              |
                        +-----v-----+
                        | Status DB |
                        | (delivery,|
                        | bounces)  |
                        +-----------+

Idempotency

Client supplies Idempotency-Key.
API stores key + result for 24h (Redis).
Duplicate request → return cached.

Rate limit

Sliding window per user.
Redis: INCR user:123:notif:hour with EXPIRE.
Or token bucket (см. файл 45).

Provider fallback

Primary: SendGrid.
Fallback: SES.
Health metrics per provider; circuit breaker switches.

Templates

Storage: Postgres / S3.
Render: Go text/template или Liquid.
Variables: {{.UserName}}, {{formatDate .ts}}.

Retry strategy

Exponential backoff: 1s, 5s, 30s, 5m, 1h.
DLQ after 5 attempts.
Permanent errors (invalid email): no retry, mark as failed.

Trade-offs

At-least-once delivery — accepted, but template idempotency mandatory.
Real-time vs batch — push/SMS real-time, digest emails batched hourly.

4. Rate Limiter

Requirements

Per IP / user / API key.
Multiple tiers (free=100rpm, paid=10krpm).
Distributed (one limit across pods).
Low latency (<5ms).

Algorithms

Token bucket:

Bucket capacity C, refill R per second.
Request consumes 1 token.
Burst allowed up to C.

Leaky bucket:

Same as token but constant rate output. Smooths burst.

Sliding window log:

Store timestamps of each request.
Count в [now-window, now]. Точно, но память.

Sliding window counter:

2 buckets (current + previous window).
Estimate = current + prev * (1 - elapsed_in_current/window).
Approximation, low memory.

Fixed window counter:

Counter per minute. Reset at boundary.
⚠️ Boundary effect: 2x burst at minute change.

Distributed token bucket via Redis + Lua

-- KEYS[1] = key, ARGV[1] = capacity, ARGV[2] = refill_rate, ARGV[3] = now, ARGV[4] = tokens_to_take
local data = redis.call("HMGET", KEYS[1], "tokens", "ts")
local tokens = tonumber(data[1]) or tonumber(ARGV[1])
local ts = tonumber(data[2]) or tonumber(ARGV[3])
local elapsed = math.max(0, tonumber(ARGV[3]) - ts)
tokens = math.min(tonumber(ARGV[1]), tokens + elapsed * tonumber(ARGV[2]))
local allowed = 0
if tokens >= tonumber(ARGV[4]) then
  tokens = tokens - tonumber(ARGV[4])
  allowed = 1
end
redis.call("HMSET", KEYS[1], "tokens", tokens, "ts", ARGV[3])
redis.call("EXPIRE", KEYS[1], 3600)
return {allowed, tokens}

Atomic via Lua → no race.

Edge: 429 response

Headers: X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After.

Bottlenecks

Hot Redis key — sharding by hash(user).
Single Redis node — clustered.

5. Distributed Queue (Kafka-like)

Requirements

High throughput (millions msg/s).
Durable.
Partitioned для parallel consume.
Ordered внутри partition.
At-least-once delivery.

Architecture

producers ----> [broker] ---> consumers
                   |
              partitions: p0, p1, p2, ...
              each replicated x3
              log on disk (append-only)

Partition

Topic = N partitions.
Producer выбирает partition: round-robin, hash(key), explicit.
Order guaranteed внутри partition.
Parallelism = N partitions.

Replication

Один leader per partition + followers.
ISR (In-Sync Replica set).
acks=all — leader ждёт ISR ack.
min.insync.replicas=2 — минимум для durability.

Storage

Append-only log на disk.
Segments по 1GB.
Retention by time (retention.ms=7d) или size.

Consumer group

Каждый consumer в группе — owner partition(s).
Coordinator distributes (sticky assignment).
Offset commit (per partition) — autocommit или manual.

Exactly-once

Transactional producer + idempotent producer + read-committed isolation.
Sequence numbers per producer per partition.

Trade-offs

vs RabbitMQ: Kafka — log (replay), Rabbit — queue (ack/del).
vs SQS: SQS managed, no order guarantee у standard, FIFO у FIFO type.

6. Search (Elasticsearch-like)

Requirements

Full-text search across millions docs.
Faceting (filter by tags, dates).
Ranking (relevance).
p95 < 200ms.

Inverted index

Mapping: term → posting list (doc IDs, positions).
Tokenization, normalization (lowercase, stem).
N-gram for fuzzy.

Sharding

By doc ID hash (each shard ~100M docs).
Each shard — independent Lucene index.

Replicas

N replicas per shard for read scaling.

Query flow

Coordinator parses query.
Fan-out to all shards (or specific).
Each shard returns top-K.
Coordinator merges, returns top-K global.

Ranking

TF-IDF: term frequency * inverse doc frequency.
BM25: модификация TF-IDF (saturation, length normalization).
Custom: boost by signals (recency, popularity).

Index updates

Real-time (Lucene segments + refresh interval 1s).
Bulk indexing для backfill.

Bottlenecks

Hot shard: skewed data. Mitigation: routing-aware sharding.
Deep pagination (page 1000): cost. Use search_after.

7. Chat (WhatsApp / Telegram-like)

Requirements

1:1 + groups.
Real-time delivery.
Online presence.
Offline message storage.
Push for offline.
E2EE optional.

Architecture

client <--ws--> gateway (sticky) <--> message bus (Kafka)
                        |                    |
                        v                    v
                  presence (Redis)     storage (Cassandra)
                        |
                        v
                  push (FCM / APNS)

WebSocket gateway

Sticky session (consistent hash by user_id).
Hold connection.
Send/receive frames.
Heartbeat (every 30s).

Message storage (Cassandra)

Partition: conversation_id.
Clustering: ts DESC.
Query: “last 50 messages in chat” — efficient.

CREATE TABLE messages (
  conversation_id UUID,
  ts TIMESTAMP,
  message_id UUID,
  sender_id UUID,
  body BLOB,
  PRIMARY KEY ((conversation_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

Presence

Redis SET online:users with TTL.
Pub/sub for friends notifications.

Group chat

Fan-out на send (write amplification).
1000-member group: 1 send → 1000 messages stored? Or one stored + fanout на read?
WhatsApp: fan-out на send (но cap group size).

E2EE

Signal Protocol: X3DH + Double Ratchet.
Server видит ciphertext, не plaintext.
Forward secrecy: per-message key.

Push for offline

При offline (last_seen > Xs) → отправить FCM/APNS.

8. Feed (Twitter / Instagram)

Requirements

Followers timeline (chronological + ranked).
500M users, avg 200 follows.
Latency timeline read < 200ms.

Approaches

Fan-out on write (push):

При публикации: для каждого follower’а — INSERT в его timeline.
Pros: O(1) read.
Cons: O(F) write. Celebrity (50M followers) убьёт систему.

Fan-out on read (pull):

Timeline вычисляется при запросе: SELECT из feeds следуемых, sort.
Pros: O(1) write.
Cons: O(F) read.

Hybrid (Twitter):

Push для regular users.
Pull для celebrities.
Merge на read.

Storage

Redis Sorted Set per user (score=ts, member=tweet_id), capped at 1000.

Ranking

Recency baseline.
Engagement signal (likes, replies, retweets).
ML model (LightGBM/transformer).

Bottlenecks

Hot accounts (Elon Musk).
Storage: 500M users * 1000 tweet IDs * 16B ≈ 8TB. Sharded Redis.

9. Payment System

Requirements

Strong consistency (no double-debit).
Idempotency.
Audit trail.
PCI-DSS.
1k TPS peak.

Architecture

client -> API -> Payments service -> Saga orchestrator
                                          |
                          +---------------+------+---------+
                          v               v      v         v
                       AuthZ         Risk eng  Balance  Provider
                       (3DS)                            (Stripe)

Idempotency Key (Stripe-style)

Client supplies Idempotency-Key: <uuid>.
Server stores key + result for 24h.
Duplicate → cached response (no double-charge).

Transaction patterns

Two-phase commit:

Prepare phase on all participants.
Commit if all OK.
Blocking on coordinator failure.
Rarely используется в distributed system.

Saga:

Sequence of local transactions, each with compensating action.
E.g. reserve → charge → ship → ship_fail → refund.

Order placed
  -> Reserve inventory
  -> Charge payment
  -> Ship

Compensations: refund, release_inventory.

Double-entry bookkeeping

Каждая транзакция = pair: debit on one account, credit on another.
Invariant: SUM(all entries) = 0.
Audit clear.

Audit log

Append-only, immutable.
Кripted/signed.
Retain 7+ years (regulatory).

Reconciliation

Daily job: compare internal balance vs provider report.
Discrepancy → manual review.

PCI-DSS

Don’t store PAN. Use tokenization (Stripe handles).
Network isolation (PCI scope).
Annual audit.

10. GPS Tracking / Logistics

Requirements

1M vehicles, 1Hz updates → 1M points/sec.
Real-time dashboard.
Historical query (“vehicle X в pos за вчера”).
Geofencing alerts.

Architecture

device -> MQTT broker -> Kafka -> ingestion svc -> ClickHouse / TimescaleDB
                                       |
                                       v
                                  Redis (latest pos)
                                       |
                                       v
                                  WebSocket gateway -> dashboards

Storage choice

TimescaleDB: PostgreSQL extension, SQL-friendly.
ClickHouse: columnar, fastest analytics.
InfluxDB: purpose-built TSDB.

Compare:

ClickHouse: лучший для batch analytics.
TimescaleDB: лучшее SQL + transactional.
InfluxDB: лучшее для metric-style.

Choose: ClickHouse для history, Redis для latest.

Schema (ClickHouse)

CREATE TABLE positions (
  device_id UInt64,
  ts DateTime64(3),
  lat Float64,
  lon Float64,
  speed Float32,
  heading UInt16
) ENGINE=MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (device_id, ts);

Geofencing

PostGIS: polygons + ST_Contains.
Or RTree in app memory for hot zones.
Event when device crosses boundary → alert.

Map matching

Raw GPS noisy → snap to nearest road.
Algorithms: Hidden Markov Model on graph.

Route optimization

Vehicle Routing Problem (NP-hard).
Heuristics: Clarke-Wright savings, OR-Tools.

Bottlenecks

Ingestion: Kafka partition by device_id, parallel consumers.
Live updates: WebSocket fan-out to dashboards. Use Redis pub/sub.

11. 30 общих вопросов System Design

CAP теорема — когда P happens, что выбираешь и почему?
ACID vs BASE — пример где где?
Eventual consistency — как разрешать конфликты?
Strong vs Weak consistency — пример из payment.
Что такое quorum read/write (N=3, W=2, R=2)?
Чем replication отличается от sharding?
Master-master vs master-slave?
Conflict-free Replicated Data Type (CRDT) — пример?
Vector clock vs Lamport clock — разница?
Что такое read repair / anti-entropy?
Heartbeat и failure detection — phi accrual?
Leader election — Raft высоко-уровнево.
Consistent hashing — что решает?
Bloom filter — где применять?
CDN: edge vs origin, cache invalidation strategy.
Reverse proxy: L4 vs L7?
WebSocket vs SSE vs long-polling?
gRPC vs REST vs GraphQL?
Backpressure — паттерны?
Circuit breaker состояния (closed/open/half-open).
Bulkhead pattern.
Saga vs 2PC.
Outbox pattern — зачем?
CDC — Debezium принцип.
Event sourcing vs CRUD.
CQRS — когда оправдано?
Multi-region active-active vs active-passive.
Latency budget: 200ms p95 — куда уходит время?
Что такое observability triangle (logs, metrics, traces)?
Capacity planning — формула для DB storage growth.

12. Источники

“Designing Data-Intensive Applications” — Martin Kleppmann
“System Design Interview” Vol 1, 2 — Alex Xu
“Designing Distributed Systems” — Brendan Burns
“Database Internals” — Alex Petrov
ByteByteGo blog
Highscalability.com
Kafka: The Definitive Guide
Cassandra: The Definitive Guide
Stripe Engineering Blog
Discord Engineering Blog
Uber Engineering Blog
Twitter Engineering Blog
AWS Architecture Center
Google SRE Workbook
“Cloud Native Patterns” — Cornelia Davis
“Microservice Patterns” — Chris Richardson
“Building Microservices” — Sam Newman
PostgreSQL docs
Redis docs
Elastic / OpenSearch docs