✓
Who are the users? What's the core user problem?
✓
What's the business problem / motivation?
✓
What does success look like?DAU, latency P99, uptime SLA, revenue?
✓
Clarify scope: what's IN and OUT?
✓
Existing systems to integrate with?
✓
Scale signalsHow many users, req/sec, data volume?
Functional — What it does
✓
Core user flows (3-5 key operations)
✓
CRUD ops, read/write ratio
✓
Real-time vs batch? Sync vs async?
Non-Functional — How it does it
✓
Availability99.9% = 8.7h/yr · 99.99% = 52min
✓
LatencyP99 target, read vs write
✓
ConsistencyCAP theorem tradeoff
✓
ScalabilityPeak load, growth rate
✓
DurabilityData loss tolerance
✓
Security & AuthWho accesses what
✓
PrivacyPII, GDPR, HIPAA?
✓
ObservabilityLogging, metrics, alerts
✓
DAU → requests/sec1M DAU × 10 req/day ≈ 100 req/sec
✓
Storage: size/record × records × retention
✓
Bandwidth: read/write throughput
✓
Thumb rules1KB/record · 1B records ≈ 1TB · 1Gbps ≈ 100MB/s
✓
Clients → LB → API → Services → Storage
✓
Key components: CDN, cache, queue, DB, blob
✓
Data flow for main happy path
✓
DB typeSQL (ACID, relations) vs NoSQL (scale, schema-flex)
✓
Storage typeBlob · relational · doc · graph · time-series
✓
CommsSync REST/gRPC vs async queue/events
✓
Bottlenecks: where does it fail at scale?
✓
Caching strategyWhere, what, TTL, invalidation
✓
DB designSchema, indexes, partitioning/sharding
✓
API designKey endpoints, request/response shapes
✓
ConsistencyStrong vs eventual, distributed txns
✓
Failure modes: what breaks? how to handle?
✓
ScalingHorizontal vs vertical, stateless services
✓
Fault toleranceRetries, circuit breakers, bulkheads
✓
Idempotency: safe to retry?
✓
Rate limiting & throttling
✓
Auth: AuthN (who?) vs AuthZ (what can they do?)
✓
Data modelNormalization vs denormalization tradeoffs
✓
Hot spots / uneven traffic distribution
- ◆Jumping to design before clarifying requirements
- ◆Single points of failure — no redundancy
- ◆No caching strategy mentioned
- ◆Synchronous calls everywhere — no async/queues
- ◆No monitoring / observability / alerting
- ◆Security and auth completely ignored
- ◆Over-engineering (YAGNI — you ain't gonna need it)
- ◆Not narrating tradeoffs — just stating decisions
-
01
Think out loud — narrate your reasoning continuously
-
02
Ask before assuming — confirm before proceeding
-
03
Drive the conversation — don't wait for prompts
-
04
Name tradeoffs explicitly — every decision has a cost
-
05
"I'd validate this with benchmarks" is a valid answer
-
06
Prioritize depth over breadth — go deep on hard parts