System Design Loop • Reference Guide

SYS DESIGN // BILAL

Interview Loop Reference
TUESDAY • MARK KOCKERBECK
01
Problem Scoping
~5 min · SET THE STAGE
  • Who are the users? What's the core user problem?
  • What's the business problem / motivation?
  • What does success look like? DAU, latency P99, uptime SLA, revenue?
  • Clarify scope: what's IN and OUT?
  • Existing systems to integrate with?
  • Scale signals: how many users, req/sec, data volume?
02
Requirements
~5-7 min · FUNC + NON-FUNC
    Functional — What it does
  • Core user flows (3-5 key operations)
  • CRUD ops, read/write ratio
  • Real-time vs batch? Sync vs async?
    Non-Functional — How it does it
  • Availability: 99.9% ≈ 8.76 h/yr downtime · 99.99% ≈ 52.6 min/yr
  • Latency: P99 target, read vs write
  • Consistency: CAP theorem tradeoff
  • Scalability: peak load, growth rate
  • Durability: data loss tolerance
  • Security & Auth: who accesses what
  • Privacy: PII, GDPR, HIPAA?
  • Observability: logging, metrics, alerts
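The availability figures above follow from simple arithmetic on an 8,760-hour year. A minimal sketch, assuming a 365-day year; the function name is illustrative and not part of the original guide:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Same budget expressed in minutes, handier for 'four nines' and above."""
    return downtime_hours_per_year(availability_pct) * 60
```

Calling `downtime_hours_per_year(99.9)` gives roughly 8.76 hours, and `downtime_minutes_per_year(99.99)` gives roughly 52.6 minutes. Each extra nine cuts the budget by a factor of ten.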
03
Capacity Estimation
~3-5 min · ONLY IF NEEDED
  • DAU → requests/sec: 1M DAU × 10 req/day ÷ 86,400 s ≈ 116 req/sec (round to ~100)
  • Storage: size/record × records × retention
  • Bandwidth: read/write throughput
  • Rules of thumb: 1 KB/record × 1B records ≈ 1 TB · 1 Gbps = 125 MB/s theoretical (~100 MB/s in practice)
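The back-of-envelope numbers in this section can be checked with a few one-line helpers. This is a rough sketch, assuming decimal units (1 KB = 1,000 bytes) and evenly spread traffic; the function names are hypothetical, and real peak load is usually some multiple of the average:

```python
SECONDS_PER_DAY = 86_400


def avg_requests_per_sec(dau: int, requests_per_user_per_day: float) -> float:
    """Average request rate, assuming traffic is spread evenly over the day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(record_bytes: int, records: int, replicas: int = 1) -> int:
    """Raw storage: size per record x record count x replica count."""
    return record_bytes * records * replicas


def gbps_to_mb_per_sec(gbps: float) -> float:
    """Convert link speed in gigabits/sec to megabytes/sec (8 bits per byte)."""
    return gbps * 1000 / 8
```

For example, `avg_requests_per_sec(1_000_000, 10)` is about 116, `storage_bytes(1_000, 1_000_000_000)` is 10^12 bytes (1 TB), and `gbps_to_mb_per_sec(1)` is 125. In an interview, rounding these to 100 req/sec and 100 MB/s is usually precise enough.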
04
High-Level Design
~10-15 min · DRAW THE BOXES
  • Clients → LB → API → Services → Storage
  • Key components: CDN, cache, queue, DB, blob
  • Data flow for main happy path
  • DB type: SQL (ACID, relations) vs NoSQL (scale, schema-flex)
  • Storage type: blob · relational · doc · graph · time-series
  • Comms: sync REST/gRPC vs async queue/events
05
Deep Dive
~10-15 min · PICK 1-2 HARD PARTS
  • Bottlenecks: where does it fail at scale?
  • Caching strategy: where, what, TTL, invalidation
  • DB design: schema, indexes, partitioning/sharding
  • API design: key endpoints, request/response shapes
  • Consistency: strong vs eventual, distributed txns
  • Failure modes: what breaks? how to handle?
  • Scaling: horizontal vs vertical, stateless services
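To make the caching bullet concrete, here is a minimal cache-aside sketch with a TTL and explicit invalidation. It is an in-process illustration only: the `TTLCache` class and its `loader` callback are hypothetical names, and a real design would more likely put a shared cache such as Redis or Memcached in front of the database.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside helper: serve fresh entries, reload on miss or expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still within TTL
        value = loader(key)  # miss or expired: go to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry after a write so the next read reloads it."""
        self._entries.pop(key, None)
```

The tradeoff to narrate: a longer TTL cuts database load but widens the window for stale reads, while invalidating on write narrows that window at the cost of extra coordination.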
06
Cross-Cutting
WEAVE THROUGHOUT
  • Fault tolerance: retries, circuit breakers, bulkheads
  • Idempotency: safe to retry?
  • Rate limiting & throttling
  • Auth: AuthN (who?) vs AuthZ (what can they do?)
  • Data model: normalization vs denormalization tradeoffs
  • Hot spots / uneven traffic distribution
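Retries and idempotency go together: a retry is only safe if repeating the request cannot apply its effect twice. The sketch below pairs exponential backoff with an idempotency key. `ChargeService`, `retry_with_backoff`, and the choice of `ConnectionError` as the retryable failure are all illustrative assumptions, not a prescribed design.

```python
import time
from typing import Any, Callable, Dict


class ChargeService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay: no second charge
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


def retry_with_backoff(
    call: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry on ConnectionError, doubling the delay after each failed attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)
```

If the response to the first `charge` call is lost and the client retries with the same key, the service returns the stored result instead of charging again. Production retry loops typically also add jitter to the delay so that many clients do not retry in lockstep.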
Red Flags to Avoid
COMMON FAILURE MODES
  • Jumping to design before clarifying requirements
  • Single points of failure — no redundancy
  • No caching strategy mentioned
  • Synchronous calls everywhere — no async/queues
  • No monitoring / observability / alerting
  • Security and auth completely ignored
  • Over-engineering (YAGNI — you ain't gonna need it)
  • Not narrating tradeoffs — just stating decisions
Bilal Reminders
MINDSET & COMMUNICATION
  • 01 Think out loud — narrate your reasoning continuously
  • 02 Ask before assuming — confirm before proceeding
  • 03 Drive the conversation — don't wait for prompts
  • 04 Name tradeoffs explicitly — every decision has a cost
  • 05 "I'd validate this with benchmarks" is a valid answer
  • 06 Prioritize depth over breadth — go deep on hard parts