Scaling Patterns Reference

📈 Horizontal vs Vertical Scaling

Vertical (Scale Up)

Add more power (CPU, RAM, storage) to an existing machine

  • ✓ Simple, no code changes
  • ✓ No distributed complexity
  • ✗ Hardware limits (ceiling)
  • ✗ Single point of failure
  • ✗ Expensive at scale

Best for: Databases, quick wins

Horizontal (Scale Out)

Add more machines and spread the load across them

  • ✓ Near-unlimited scaling
  • ✓ Fault tolerance (redundancy)
  • ✓ Cost-effective at scale
  • ✗ Distributed complexity
  • ✗ Data consistency challenges

Best for: Stateless services, high traffic

⚖️ Load Balancing Strategies

Algorithm            | How It Works                                   | Best For
Round Robin          | Rotate through servers sequentially            | Similar-capacity servers
Weighted Round Robin | More traffic to higher-weight servers          | Mixed server capacities
Least Connections    | Route to server with fewest active connections | Long-lived connections
IP Hash              | Same IP always routes to same server           | Session affinity needed
Random               | Random server selection                        | Simple, statistically even
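
The selection rules in the table can be sketched in a few lines of Python. This is a minimal, single-threaded illustration rather than how any particular load balancer implements them; the class names, the function name, and the server labels are hypothetical, and the weighted variant simply repeats each server by its weight, which is bursty compared with the smoother schedules production balancers tend to use.

```python
import hashlib
import itertools


class RoundRobin:
    """Rotate through servers sequentially."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighting: a server with weight 3 appears 3 times per cycle."""

    def __init__(self, weights):  # weights: dict of server -> positive int
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Route to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    """Map a client IP to a server; stable while the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that `ip_hash` only gives session affinity while the server list stays the same: adding or removing a server changes the modulus and reassigns many clients.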

🔀 Database Sharding

Sharding Strategies

  • Hash-based: shard = hash(user_id) % N
    → Even distribution, but changing N remaps most keys
  • Range-based: A-M on shard1, N-Z on shard2
    → Easy range queries, can get unbalanced
  • Directory-based: Lookup table maps key → shard
    → Flexible, but the lookup table can become a bottleneck
  • Geo-based: Shard by user location
    → Lower latency, cross-region queries hard
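
A rough sketch of hash-based and range-based routing, plus a helper that measures why adding a shard is costly under plain modulo hashing. The function names and the single `"N"` boundary are illustrative assumptions. A `hashlib` digest is used instead of Python's built-in `hash()`, because string hashing is randomized per process and would route the same key differently after a restart.

```python
import hashlib
from bisect import bisect_right


def stable_hash(key: str) -> int:
    """Process-independent hash of a key (first 8 bytes of an MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: shard = hash(key) % N."""
    return stable_hash(key) % num_shards


def range_shard(name: str, boundaries=("N",)) -> int:
    """Range-based: names starting before 'N' go to shard 0, the rest to shard 1."""
    return bisect_right(boundaries, name[:1].upper())


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if hash_shard(k, old_n) != hash_shard(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print(range_shard("Alice"), range_shard("Zoe"))
    print(f"moved going from 4 to 5 shards: {fraction_moved(keys, 4, 5):.0%}")
```

Going from 4 to 5 shards, a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about 1 key in 5, so roughly 80% of the data has to move.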

⚠️ Sharding Challenges

  • Cross-shard joins - Very expensive
  • Rebalancing - Moving data is painful
  • Hotspots - Popular keys overwhelm one shard
  • Transactions - ACID across shards is hard
  • Unique constraints - Global uniqueness is complex

📋 Database Replication

Type                           | How It Works                                         | Trade-offs
Primary-Replica (Master-Slave) | Writes → Primary; Reads → Replicas                   | ✓ Read scalability; ✗ Replica lag, write bottleneck
Multi-Primary (Multi-Master)   | Any node accepts writes; conflict resolution needed  | ✓ Write availability; ✗ Conflict complexity
Synchronous                    | Wait for replicas before ack; strong consistency     | ✓ No data loss; ✗ Higher latency
Asynchronous                   | Ack immediately, replicate later; eventual consistency | ✓ Low latency; ✗ Possible data loss
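
The difference between synchronous and asynchronous acknowledgement in a primary-replica setup can be illustrated with a toy in-memory model. Everything here (`Node`, `ReplicatedStore`, `replicate_pending`) is a hypothetical stand-in for a real database, intended only to show where replica lag comes from.

```python
import random


class Node:
    """A toy database node holding key/value data in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Writes go to the primary; reads are served by a replica."""

    def __init__(self, primary, replicas, synchronous):
        self.primary = primary
        self.replicas = replicas
        self.synchronous = synchronous
        self._pending = []  # writes not yet applied to replicas (async mode)

    def write(self, key, value):
        self.primary.data[key] = value
        if self.synchronous:
            # Synchronous: apply to every replica before acknowledging.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: acknowledge now, replicate later.
            self._pending.append((key, value))
        return "ack"

    def replicate_pending(self):
        """Simulate the background replication stream catching up."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.data[key] = value
        self._pending.clear()

    def read(self, key):
        return random.choice(self.replicas).data.get(key)
```

In asynchronous mode a read issued right after `write` can return `None` until `replicate_pending` runs, which is the replica lag listed in the table. If the primary were lost before that point, the acknowledged write would be gone.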

🧩 Common Scaling Patterns

CDN

Cache static content at edge locations

→ Lower latency, reduce origin load

Message Queue

Decouple producers and consumers

→ Handle traffic spikes, async processing
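
As an in-process illustration of the same decoupling, the sketch below uses Python's standard `queue.Queue` as a stand-in for a real broker. The `run_pipeline` name, the doubling "work", and the worker count are assumptions made for the example; a bounded queue is used so that a fast producer is slowed down instead of exhausting memory.

```python
import queue
import threading


def run_pipeline(jobs, num_workers=2):
    """Producer puts jobs on a bounded queue; workers drain it asynchronously."""
    q = queue.Queue(maxsize=100)  # bounded: put() blocks when consumers fall behind
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            with lock:
                results.append(job * 2)  # placeholder for real processing
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:  # the producer does not wait for processing to finish
        q.put(job)
    for _ in threads:  # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return sorted(results)
```

For example, `run_pipeline(range(5))` returns `[0, 2, 4, 6, 8]`; results are sorted because workers finish in no guaranteed order.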

Microservices

Split into independent services

→ Scale bottlenecks independently

CQRS

Separate read and write models

→ Optimize each path independently
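
A minimal sketch of the split, assuming an order-tracking domain that is not part of the original text: the write model validates commands and records events, and a separate read model keeps a denormalized view that is cheap to query. Class and field names are hypothetical.

```python
from collections import defaultdict


class OrderWriteModel:
    """Command side: validates input and records what happened."""

    def __init__(self):
        self.events = []

    def place_order(self, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = {"type": "order_placed", "customer": customer, "amount": amount}
        self.events.append(event)
        return event


class OrderReadModel:
    """Query side: a precomputed view shaped for one kind of read."""

    def __init__(self):
        self.total_by_customer = defaultdict(float)

    def apply(self, event):
        if event["type"] == "order_placed":
            self.total_by_customer[event["customer"]] += event["amount"]

    def total_for(self, customer):
        return self.total_by_customer.get(customer, 0.0)
```

Here the caller forwards each event from `place_order` to `apply`. In a real system that hand-off often runs through a queue or change stream, so the read side may briefly lag behind the write side.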

Read Replicas

Replicate DB for read queries

→ Scale reads without sharding

Connection Pooling

Reuse DB connections

→ Reduce connection overhead
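
A bare-bones pool, sketched here with the standard-library `sqlite3` and `queue` modules so it runs without a database server. The `ConnectionPool` class and its parameters are illustrative; real drivers and frameworks usually ship their own pools with health checks and timeouts.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once, then lend them out."""

    def __init__(self, database, size=3):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be used by another thread.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

Usage looks like `with pool.connection() as conn: conn.execute("SELECT 1")`. The pool size also caps how many connections the application can open against the database at once.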

✅ Scaling Checklist

Before Scaling

  • ☐ Profile and identify bottleneck
  • ☐ Optimize queries and indexes
  • ☐ Add caching layer
  • ☐ Consider vertical scaling first

Horizontal Scaling

  • ☐ Make services stateless
  • ☐ Add load balancer
  • ☐ Externalize session state
  • ☐ Plan for failure (redundancy)

🎯 Interview Template

"For scaling this system, I'd first ensure the application layer is stateless so we can horizontally scale with a load balancer. For the database, I'd start with read replicas for [read-heavy workload]. If we need to scale writes, I'd consider sharding by [user_id/geo] using [hash-based] partitioning."