The 45-Minute System Design Framework
A repeatable time-boxed framework for structuring any system design interview. Covers scoping, estimation, API design, and deep dives, with exact minute allocations for each phase.
The Real Reason Engineers Fail System Design
Most engineers fail system design rounds not because they can't design systems. They fail because they misallocate time. They spend 20 minutes on requirements, realise they have 10 minutes left, and rush through the design without depth. Or they jump straight to components and never establish the scale constraints that should drive every subsequent decision.
The fix is a time-boxed framework you execute the same way every interview, regardless of the problem. Here it is.
The Framework
Five phases. Forty-five minutes. Non-negotiable time allocations.
Phase 1: Requirements (5 minutes)
Establish what you're building and what matters. Two categories:
Functional requirements: what the system does. Ask until you have 3–5 concrete features. "Design a URL shortener" → users can shorten URLs, users can access shortened URLs, users can see click analytics. Stop there. Don't gold-plate.
Non-functional requirements: scale, latency, consistency. These drive architecture. Get specific: "100M DAU, sub-100ms P99 for read, eventual consistency acceptable." If the interviewer says "it depends," give them a number and ask if it's reasonable. An interviewer who corrects your estimate is giving you information; one who shrugs isn't blocking you.
Common mistake: Spending 15 minutes here. Requirements are inputs, not outputs. Five minutes maximum. If you're still discussing requirements at minute 8, you have a time management problem, not a requirements problem.
Phase 2: Capacity Estimation (5 minutes)
Back-of-envelope. The point is not precision. It's establishing the order of magnitude that determines your architecture choices.
Estimate:
- QPS (read and write): DAU × actions/day ÷ 86,400
- Storage: write QPS × object size × retention period
- Bandwidth: read QPS × object size
For a URL shortener at 100M DAU, 1 redirect/day: ~1,200 redirects/second. That's comfortably handled by a single server. Compare that to Twitter's news feed at 500M DAU with 10 reads/day: ~58,000 read QPS. Different architecture.
The number you compute tells you whether you need sharding, caching, CDN, or horizontal scaling at the API tier. Skip this step and every subsequent decision is ungrounded.
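The three formulas above are easy to wrap in a quick sanity-check script. A minimal sketch (the function names and the example inputs are illustrative, not from any particular tool):

```python
SECONDS_PER_DAY = 86_400

def qps(dau: int, actions_per_day: float) -> float:
    """Average queries/second: DAU x actions/day / 86,400."""
    return dau * actions_per_day / SECONDS_PER_DAY

def storage_bytes(write_qps: float, object_size_bytes: int, retention_seconds: int) -> float:
    """Total storage: write QPS x object size x retention period."""
    return write_qps * object_size_bytes * retention_seconds

# URL shortener: 100M DAU, 1 redirect/day each
print(f"{qps(100_000_000, 1):,.0f} redirects/sec")   # ~1,157 -> round to ~1,200

# Twitter news feed: 500M DAU, 10 reads/day each
print(f"{qps(500_000_000, 10):,.0f} read QPS")       # ~57,870 -> round to ~58,000
```

The point of scripting it is discipline, not precision: round aggressively and compare orders of magnitude, because that is all the architecture decision needs.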
Phase 3: High-Level Design (10 minutes)
Draw the skeleton. Client → load balancer → API servers → database. Add the obvious components for the specific problem (cache for read-heavy, message queue for async, CDN for media). Don't go deep yet. Breadth first.
At the end of this phase, your interviewer should understand the full data flow. If they have to ask "but where does the data go after X?", you haven't finished this phase.
Phase 4: Deep Dives (20 minutes)
This is where interviews are won or lost. You have 20 minutes; the interviewer will guide you toward the components they care about. Respond to their signals, but also proactively drive toward the 2–3 hardest problems in the design.
For each component you dive into:
- State the problem it solves
- Give your approach
- Name the trade-off explicitly ("this improves read latency by caching hot data but means we tolerate up to 5 minutes of stale data, acceptable given the requirements")
- Name the failure mode ("cache stampede on cold start, mitigated by request coalescing")
Trade-offs and failure modes are the evidence that you've built this in production, not just read about it. They're what separates L5 answers from L6 answers.
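To make the cache-stampede mitigation concrete, here is a minimal in-process sketch of request coalescing: concurrent misses for the same key wait on one lock so only a single caller hits the database. The `fetch_from_db` function and the dict-based cache are stand-ins; a production system would use Redis plus a distributed lock or a singleflight-style library.

```python
import threading

cache: dict[str, str] = {}
_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def fetch_from_db(key: str) -> str:
    # Placeholder for the expensive backing-store read.
    return f"value-for-{key}"

def get(key: str) -> str:
    """Read-through lookup with request coalescing: on a cold key,
    concurrent callers serialize on a per-key lock so only one of
    them performs the DB fetch instead of stampeding."""
    if key in cache:
        return cache[key]
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:              # another thread may have filled it
            return cache[key]
        value = fetch_from_db(key)    # only one thread reaches the DB
        cache[key] = value
        return value
```

Naming this pattern, and the stampede it prevents, is exactly the kind of failure-mode detail that signals production experience.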
Phase 5: Wrap-Up (5 minutes)
Bottlenecks and future work. What would you change if scale 10x'd? Where are the single points of failure? What monitoring would you add?
This phase is often skipped under time pressure. Don't skip it. It signals operational maturity: you think about systems beyond the happy path.
Common Mistakes
Jumping to components before requirements. "So we'll need a load balancer, a database, probably Redis..." without establishing what you're building or at what scale. This is the most common failure mode at L4. Every component choice must be justified by a requirement or scale number.
Treating the database as a detail. Database choice is a core architectural decision, not an implementation detail. SQL vs NoSQL, consistency model, sharding strategy: these belong in Phase 3, not buried in a footnote.
Not naming trade-offs. Every architectural decision has a cost. If you're only describing benefits, the interviewer will ask about trade-offs. Get there first.
Ignoring the interviewer. System design is a conversation. If your interviewer keeps returning to a component, they want depth there. Redirect your time accordingly.
Worked Example: Design a URL Shortener
Requirements (5 min): Shorten URLs, redirect on access, track click counts. 100M DAU, 1 write/10 reads per user daily.
Estimation (5 min): 100M × 1 write ÷ 86,400 = ~1,200 writes/sec. 100M × 10 reads ÷ 86,400 = ~12,000 reads/sec. Storage: 1,200/sec × 500 bytes × 86,400 × 365 × 5 years ≈ 95 TB. Substantial: plan to shard the mapping table or expire cold slugs.
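The estimation arithmetic checks out in a few lines (note the storage figure comes out in terabytes):

```python
SECONDS_PER_DAY = 86_400
DAU = 100_000_000

write_qps = DAU * 1 / SECONDS_PER_DAY     # ~1,157 -> round to ~1,200
read_qps = DAU * 10 / SECONDS_PER_DAY     # ~11,574 -> round to ~12,000

# 5-year storage at ~500 bytes per slug record, using the rounded write rate
storage = 1_200 * 500 * SECONDS_PER_DAY * 365 * 5
print(f"{storage / 1e12:.1f} TB")         # prints "94.6 TB"
```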
High-Level (10 min): Client → LB → API servers (stateless, horizontal) → Postgres (slug → URL mapping) + Redis cache (hot slugs). CDN for the redirect endpoint. Async click tracking via Kafka → ClickHouse.
Deep Dives (20 min): Slug generation (random 7-char base62 → ~3.5T combinations; at ~1,200 writes/sec the table reaches ~190B slugs over five years, which would exhaust a 6-char space of only ~57B, whereas 7 chars keeps per-insert collision probability low; use DB uniqueness constraint + retry on conflict). Cache strategy (read-through, TTL 24h, LRU eviction; slug mappings are immutable so stale data is not a concern). Click analytics (write to Kafka at redirect time, aggregate in ClickHouse hourly for dashboards, real-time counts not in requirements).
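The slug-generation path can be sketched in a few lines. This is a hypothetical illustration using a 7-character slug for keyspace headroom; the in-memory set stands in for the database uniqueness constraint, where real code would do an INSERT and retry on a conflict error:

```python
import secrets
import string

BASE62 = string.ascii_letters + string.digits   # 62 characters
SLUG_LEN = 7                                    # 62**7 ~ 3.5T combinations

_taken: set[str] = set()   # stand-in for the DB's UNIQUE constraint

def new_slug() -> str:
    """Generate a random base62 slug, retrying on collision."""
    while True:
        slug = "".join(secrets.choice(BASE62) for _ in range(SLUG_LEN))
        if slug not in _taken:      # real code: INSERT ... then retry on conflict
            _taken.add(slug)
            return slug
```

Letting the database enforce uniqueness keeps the generator stateless, which is what makes the API tier horizontally scalable.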
Wrap-Up (5 min): Single point of failure is the Postgres primary. Add read replicas and consider a standby for failover. At 10x scale, slug generation collisions increase; switch to distributed ID generation (Snowflake).