System Design Loop • Reference Guide

SYS DESIGN // BILAL

Interview Loop Reference

TUESDAY • MARK KOCKERBECK

01

Problem Scoping

~5 min · SET THE STAGE

✓
Who are the users? What's the core user problem?
✓
What's the business problem / motivation?
✓
What does success look like? DAU, latency P99, uptime SLA, revenue?
✓
Clarify scope: what's IN and OUT?
✓
Existing systems to integrate with?
✓
Scale signals: how many users, req/sec, data volume?

02

Requirements

~5-7 min · FUNC + NON-FUNC

Functional — What it does

✓
Core user flows (3-5 key operations)
✓
CRUD ops, read/write ratio
✓
Real-time vs batch? Sync vs async?

Non-Functional — How it does it

✓
Availability: 99.9% ≈ 8.8 h/yr downtime · 99.99% ≈ 53 min/yr
✓
Latency: P99 target, read vs write
✓
Consistency: CAP theorem tradeoff
✓
Scalability: peak load, growth rate
✓
Durability: data loss tolerance
✓
Security & Auth: who accesses what
✓
Privacy: PII, GDPR, HIPAA?
✓
Observability: logging, metrics, alerts
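
The availability figures in the list above come from simple arithmetic on the hours in a year. A minimal sketch, assuming a 365-day year; the function name `downtime_budget_hours` is illustrative and not part of the guide:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def downtime_budget_hours(availability_pct: float) -> float:
    """Return the allowed downtime per year, in hours, for an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Print the yearly downtime budget for the two targets named in the checklist.
for pct in (99.9, 99.99):
    hours = downtime_budget_hours(pct)
    print(f"{pct}% -> {hours:.2f} h/yr ({hours * 60:.1f} min/yr)")
```

Each extra "nine" cuts the budget by a factor of ten, which is why it helps to ask which target the business actually needs before designing for it.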

03

Capacity Estimation

~3-5 min · ONLY IF NEEDED

✓
DAU → requests/sec: 1M DAU × 10 req/user/day ÷ 86,400 s ≈ 116 req/sec average (~100 as a round number)
✓
Storage: size/record × records × retention
✓
Bandwidth: read/write throughput
✓
Thumb rules: 1KB/record × 1B records ≈ 1TB · 1Gbps = 125MB/s raw (~100MB/s usable)
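
The estimates above can be reproduced in a few lines. This is a rough sketch rather than a sizing tool: the function names and the 3× peak multiplier are assumptions for illustration, and the inputs are the guide's own example figures.

```python
SECONDS_PER_DAY = 86_400


def avg_requests_per_sec(dau: int, requests_per_user_per_day: float) -> float:
    """Average request rate implied by daily active users and per-user activity."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(bytes_per_record: int, records_per_day: int, retention_days: int) -> int:
    """Total storage: size per record x records per day x days retained."""
    return bytes_per_record * records_per_day * retention_days


avg_qps = avg_requests_per_sec(1_000_000, 10)  # the guide's 1M DAU x 10 req/day example
peak_qps = avg_qps * 3  # assumed peak-to-average ratio; confirm with the interviewer
one_day_of_records = storage_bytes(1_000, 1_000_000_000, 1)  # 1KB x 1B records

print(f"average: {avg_qps:.0f} req/sec, assumed peak: {peak_qps:.0f} req/sec")
print(f"storage: {one_day_of_records / 1e12:.1f} TB")
```

Stating the peak multiplier out loud matters more than its exact value, since provisioning is driven by peak load rather than the average.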

04

High-Level Design

~10-15 min · DRAW THE BOXES

✓
Clients → LB → API → Services → Storage
✓
Key components: CDN, cache, queue, DB, blob
✓
Data flow for main happy path
✓
DB type: SQL (ACID, relations) vs NoSQL (scale, schema-flex)
✓
Storage type: blob · relational · doc · graph · time-series
✓
Comms: sync REST/gRPC vs async queue/events

05

Deep Dive

~10-15 min · PICK 1-2 HARD PARTS

✓
Bottlenecks: where does it fail at scale?
✓
Caching strategy: where, what, TTL, invalidation
✓
DB design: schema, indexes, partitioning/sharding
✓
API design: key endpoints, request/response shapes
✓
Consistency: strong vs eventual, distributed txns
✓
Failure modes: what breaks? how to handle?
✓
Scaling: horizontal vs vertical, stateless services
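
For the caching item above, one common pattern to discuss is cache-aside with a TTL plus explicit invalidation. The sketch below is an in-process illustration only; `TTLCache`, `get_or_load`, and the injectable `clock` are hypothetical names, and a real design would more likely use a shared cache such as Redis or Memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache-aside helper: serve from cache until the entry expires, else reload."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # cache miss or expired: go to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry after a write so the next read reloads it."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a read can be, while `invalidate` covers writes that must become visible sooner. Which of the two to rely on is the tradeoff worth narrating.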

06

Cross-Cutting

WEAVE THROUGHOUT

✓
Fault tolerance: retries, circuit breakers, bulkheads
✓
Idempotency: safe to retry?
✓
Rate limiting & throttling
✓
Auth: AuthN (who?) vs AuthZ (what can they do?)
✓
Data model: normalization vs denormalization tradeoffs
✓
Hot spots / uneven traffic distribution
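
Retries and idempotency from the list above are easiest to explain together: a retry is only safe if the server can recognize a repeated request. The sketch below shows one way to do that, using exponential backoff with jitter on the client and an idempotency key on the server. `TransientError`, `retry_with_backoff`, and `PaymentService` are hypothetical names invented for illustration.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (timeouts, 503s)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on TransientError, waiting a random time up to base_delay * 2^n."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("max_attempts must be at least 1")


class PaymentService:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._receipts: Dict[str, str] = {}
        self.charges = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._receipts:
            return self._receipts[idempotency_key]  # replay: no second charge
        self.charges += 1
        receipt = f"receipt-{self.charges}"
        self._receipts[idempotency_key] = receipt
        return receipt
```

The client must reuse the same key across retries of one logical request. Generating a fresh key per attempt would defeat the deduplication.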

⚠

Red Flags to Avoid

COMMON FAILURE MODES

◆Jumping to design before clarifying requirements
◆Single points of failure — no redundancy
◆No caching strategy mentioned
◆Synchronous calls everywhere — no async/queues
◆No monitoring / observability / alerting
◆Security and auth completely ignored
◆Over-engineering (YAGNI — you ain't gonna need it)
◆Not narrating tradeoffs — just stating decisions

B

Bilal Reminders

MINDSET & COMMUNICATION

01 Think out loud — narrate your reasoning continuously
02 Ask before assuming — confirm before proceeding
03 Drive the conversation — don't wait for prompts
04 Name tradeoffs explicitly — every decision has a cost
05 "I'd validate this with benchmarks" is a valid answer
06 Prioritize depth over breadth — go deep on hard parts