How do I design a system that handles millions of requests per second?

Layers: (1) CDN for static content (offloads 80%+ of requests). (2) Load balancer distributes to stateless application servers. (3) Application servers are horizontally scaled (auto-scaling groups). (4) Caching layer (Redis/Memcached) for frequently accessed data. (5) Database with read replicas for read scaling. (6) Message queue for async processing (Kafka/SQS). (7) Database sharding for write scaling when needed. Start with the simplest architecture that meets requirements and add complexity only when bottlenecks appear.

How do I choose between SQL and NoSQL databases?

SQL (PostgreSQL, MySQL): use for structured data with relationships, ACID transactions, complex queries (JOINs, aggregations), and when data integrity is critical. NoSQL: MongoDB (flexible schema, rapid development), Redis (caching, sessions), Cassandra (write-heavy, time-series), Elasticsearch (full-text search), DynamoDB (serverless, single-digit ms). Most applications start with PostgreSQL and add specialized databases for specific needs.

What caching strategy should I use?

Cache-aside (lazy loading) for most use cases: check cache, on miss fetch from DB, write to cache. Write-through when reads must always be fresh. Read-through when you want the cache to handle DB fetching. Key decisions: TTL (how long to cache), invalidation strategy (TTL expiry, event-based, manual), cache key design, and cache size limits. Redis is the standard caching solution. CDN for static content. Browser cache for client assets.
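The cache-aside flow described above can be sketched as follows. This is a minimal illustration, not a production client: TTLCache is a hypothetical in-memory stand-in for Redis, and get_user and load_from_db are assumed names that do not come from any library.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache such as Redis (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}        # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]    # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        # Event-based or manual invalidation would call this after a DB write.
        self._store.pop(key, None)


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: check cache, on miss fetch from DB, write to cache."""
    key = f"user:{user_id}"     # cache key design: namespace plus identifier
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds)
    return value
```

The TTL bounds how stale an entry can get; calling delete after a write is the event-based invalidation mentioned above.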

How does database sharding work?

Sharding distributes data across multiple database instances by a shard key. Example: user_id % 4 distributes users across 4 shards. Challenges: (1) Cross-shard queries are expensive (scatter-gather). (2) Shard key selection is critical (must distribute evenly, support common queries). (3) Rebalancing when adding shards requires data migration. (4) Transactions are limited to single shard. Tools: Vitess (MySQL), Citus (PostgreSQL). Consider: do you really need sharding? Read replicas and vertical scaling handle most workloads.
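To make the rebalancing cost concrete, here is a small sketch of the user_id % 4 routing from the example, plus a helper that measures how many keys change shard when a fifth shard is added. The function names are illustrative assumptions.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Modulo sharding: the shard key is user_id."""
    return user_id % num_shards


def fraction_moved(user_ids, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for u in ids if shard_for(u, old_shards) != shard_for(u, new_shards)
    )
    return moved / len(ids)


print(shard_for(42, 4))                          # 2
print(fraction_moved(range(1, 10_001), 4, 5))    # 0.8
```

With plain modulo routing, going from 4 to 5 shards moves 80% of these keys, which is why rebalancing requires a data migration and why range-based or consistent-hash placement is often preferred when shard counts change.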

How do I design for high availability?

Eliminate single points of failure: (1) Multiple application instances behind load balancer. (2) Database replicas with automated failover. (3) Multi-AZ deployment (availability zones). (4) Health checks and auto-restart. (5) Circuit breakers for downstream failures. (6) Graceful degradation (fallback responses). (7) Regular disaster recovery testing. Target: 99.9% = 8.7 hours downtime/year, 99.99% = 52 minutes/year. SLOs define acceptable reliability. Error budget = 1 - SLO.
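The downtime figures follow directly from the error budget. A short sketch of the arithmetic, assuming a 365-day year (the function names are illustrative):

```python
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years


def error_budget(slo: float) -> float:
    """Error budget = 1 - SLO, as a fraction of total time."""
    return 1.0 - slo


def downtime_hours_per_year(slo: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return error_budget(slo) * HOURS_PER_YEAR


for slo in (0.999, 0.9999):
    hours = downtime_hours_per_year(slo)
    print(f"{slo:.2%}: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Under these assumptions 99.9% allows about 8.76 hours per year and 99.99% allows about 52.6 minutes, matching the rounded figures above.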

What is the CAP theorem in practice?

CAP theorem: when a network partition occurs, a distributed system must give up either consistency or availability. The popular phrasing, at most two of Consistency, Availability, and Partition tolerance, is shorthand for this. Since network partitions are inevitable, you choose between CP (consistent but may reject requests during partitions: PostgreSQL with synchronous replication, MongoDB) or AP (available but may return stale data: Cassandra, DynamoDB). These labels describe typical configurations rather than fixed properties, and in practice most systems are tunable: DynamoDB offers eventual or strong consistency per request. Design for eventual consistency where possible and strong consistency only where required (financial transactions).

How do I implement rate limiting?

Algorithms: (1) Token bucket: tokens are added at a fixed rate and each request consumes one. Allows bursts up to the bucket size. (2) Sliding window: count requests in a sliding time window. Precise but memory-intensive. (3) Fixed window: count per interval. Simple but allows up to 2x the rate around window boundaries. Implementation: Redis (INCR + EXPIRE, run inside a Lua script or a MULTI/EXEC transaction so the pair executes atomically), API gateway (Kong, AWS API Gateway), or application middleware. Return: HTTP 429 with a Retry-After header. Apply per: IP, user, API key, endpoint.
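As one possible illustration of the token bucket algorithm, the sketch below keeps state in process memory. The class and parameter names are assumptions; a shared limiter across many servers would keep this state in a store such as Redis instead.

```python
import time


class TokenBucket:
    """In-process token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock
        self._tokens = capacity       # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False    # caller would respond with HTTP 429 and Retry-After
```

A per-client limiter would typically hold one bucket per IP, user, or API key, for example in a dictionary keyed by client identifier.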

How does a CDN work?

CDNs cache content at edge locations worldwide. When a user requests a resource, the CDN serves it from the nearest edge (low latency). On a cache miss, the CDN fetches the resource from the origin server, caches it, and serves it. Benefits: faster page loads, reduced origin load, DDoS protection. Configure: cache TTL, cache key (URL + headers), invalidation (purge), origin shield (a single point for cache misses). Providers: Cloudflare, CloudFront, Fastly, Akamai.

How do I design a URL shortener?

Components: (1) API: POST /shorten (long URL -> short code), GET /:code (redirect). (2) Short code generation: base62 encoding of auto-increment ID, or random string with collision check. (3) Storage: key-value store (Redis/DynamoDB) for code -> URL mapping. (4) Redirect: 301 (permanent, cached) or 302 (temporary, trackable). Scale: cache popular URLs in memory, distribute via consistent hashing. Analytics: log clicks with timestamp, referrer, location. Rate limit creation to prevent abuse.
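The base62 option for short code generation can be sketched like this. The alphabet order is an arbitrary choice made for the example, and the function names are illustrative.

```python
import string

# 62 characters: 0-9, a-z, A-Z. The ordering is an assumption of this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Turn an auto-increment ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on unknown characters."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(encode_base62(125))     # 21
print(decode_base62("21"))    # 125
```

Sequential IDs produce guessable codes; the random-string option with a collision check avoids that at the cost of an extra lookup on creation.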

How do I design a chat application?

Components: (1) WebSocket connections for real-time messaging. (2) Message queue (Kafka/Redis Pub/Sub) for distributing messages across server instances. (3) Database for message persistence (Cassandra for write-heavy, PostgreSQL for small scale). (4) User presence service (Redis EXPIRE for online status). (5) Push notifications for offline users (FCM/APNs). Scale: partition chat rooms by room ID, use connection load balancer (sticky sessions), store messages by time partition. Features: typing indicators (ephemeral events), read receipts, file sharing (object storage).

What is consistent hashing and when do I need it?

Consistent hashing maps both nodes and keys onto a hash ring. Each key is assigned to the next node clockwise. When a node is added or removed, only about K/N keys need to be redistributed on average (K = total keys, N = nodes). Without consistent hashing, adding a node to a modulo-based scheme (hash(key) % N) redistributes most keys. Used by: DynamoDB, CDNs, distributed caches. Redis Cluster takes a related approach, assigning 16,384 fixed hash slots to nodes rather than using a ring. Virtual nodes improve distribution evenness. You need it when distributing data across a set of nodes that changes (scale up/down).
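A hash ring with virtual nodes might look like the sketch below. MD5 is used only as a convenient, stable hash for illustration, and the class name, node labels, and vnode count are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._positions = []    # sorted ring positions
        self._owner = {}        # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring many times (virtual nodes).
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

Adding a node only reassigns keys that now fall just before one of its virtual nodes, so every key that moves ends up on the new node and the rest stay where they were.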

How do I handle database migrations at scale?

Strategies: (1) Expand-contract: add the new column (nullable), dual-write to old and new, backfill existing data, switch reads to the new column, then drop the old one. (2) Online schema migration: tools like gh-ost (GitHub) and pt-online-schema-change (Percona) for MySQL; pg_repack for online table rebuilds on PostgreSQL. (3) Blue-green databases: migrate on a copy, then switch. (4) Backward-compatible changes only: add columns (nullable), never rename or remove in a single step. (5) Feature flags to gradually migrate traffic. Avoid long-held table locks in production during large migrations.

How do I design a notification system?

Components: (1) Event producer: services emit notification events (order confirmed, message received). (2) Notification service: processes events, applies user preferences (channels, quiet hours). (3) Channel adapters: email (SES, SendGrid), SMS (Twilio), push (FCM/APNs), in-app (WebSocket). (4) Template engine: render notification content per channel. (5) Delivery tracking: sent, delivered, read status. Scale: message queue between producer and notification service. Deduplication: idempotency key prevents duplicate notifications. Preferences: per-user, per-notification-type, per-channel settings.

How do I estimate system capacity?

Back-of-the-envelope estimation: (1) Identify peak requests per second (RPS). (2) Calculate storage needs: daily data * retention period. (3) Calculate bandwidth: RPS * average response size. (4) Estimate compute: RPS / requests-per-server. Rules of thumb: 1 web server handles 1-10K RPS (depending on work). Redis handles 100K+ ops/s. PostgreSQL handles 10-50K queries/s. SSD: 100K+ IOPS. Network: 1 Gbps = 125 MB/s. Always measure actual performance and add 2-3x headroom for peaks.
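The four estimation steps can be written as a few lines of arithmetic. All input numbers in the example call are hypothetical placeholders, not measurements, and decimal units (1 MB = 1000 KB) are assumed.

```python
import math


def estimate_capacity(peak_rps, avg_response_kb, daily_data_gb,
                      retention_days, rps_per_server, headroom=2.0):
    """Back-of-the-envelope sizing using decimal units."""
    bandwidth_mb_s = peak_rps * avg_response_kb / 1000      # RPS * response size
    storage_tb = daily_data_gb * retention_days / 1000      # daily data * retention
    servers = math.ceil(peak_rps * headroom / rps_per_server)
    return {
        "bandwidth_mb_s": bandwidth_mb_s,
        "bandwidth_gbps": bandwidth_mb_s * 8 / 1000,        # 125 MB/s = 1 Gbps
        "storage_tb": storage_tb,
        "servers": servers,
    }


# Hypothetical workload: 50K peak RPS, 20 KB responses, 500 GB/day kept 1 year.
print(estimate_capacity(peak_rps=50_000, avg_response_kb=20,
                        daily_data_gb=500, retention_days=365,
                        rps_per_server=5_000, headroom=2.0))
```

For these inputs the sketch gives 1000 MB/s (8 Gbps) of bandwidth, 182.5 TB of storage, and 20 servers with 2x headroom. Replace rps_per_server with a measured figure before relying on the result.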

What is the difference between horizontal and vertical scaling?

Vertical scaling (scale up): add more CPU, RAM, or disk to one machine. Pros: simple, no code changes, works for databases. Cons: hardware limits, single point of failure, expensive at the high end. Horizontal scaling (scale out): add more machines. Pros: theoretically unlimited, fault tolerant, cost effective. Cons: requires stateless services, distributed data management, more complexity. Strategy: for stateful components such as databases, scale vertically first (simpler), then horizontally when you hit limits. Stateless web tiers can scale horizontally from the start behind a load balancer.

How do I design a feed/timeline system?

Two approaches: (1) Fan-out on write (push model): when a user posts, write the post to all followers' timelines immediately. Pros: fast reads (timeline is pre-computed). Cons: slow writes for users with millions of followers (celebrity problem). (2) Fan-out on read (pull model): when a user views their timeline, fetch and merge posts from all followed users. Pros: fast writes. Cons: slow reads for users following many accounts. Hybrid: fan-out on write for normal users, fan-out on read for celebrities. Cache timelines in Redis. Store posts in Cassandra (write-optimized).
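The hybrid approach can be illustrated with an in-memory sketch. The Feed class, its method names, and the celebrity threshold are all assumptions made for the example; a real system would back these dictionaries with Redis and a post store.

```python
from collections import defaultdict


class Feed:
    def __init__(self, celebrity_threshold: int = 10_000):
        self.celebrity_threshold = celebrity_threshold   # hypothetical cutoff
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.timelines = defaultdict(list)   # user -> pre-computed posts (push)
        self.posts = defaultdict(list)       # author -> own posts (pull source)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author: str, timestamp: int, text: str) -> None:
        post = (timestamp, author, text)
        self.posts[author].append(post)
        if not self._is_celebrity(author):
            for user in self.followers[author]:      # fan-out on write
                self.timelines[user].append(post)

    def read_timeline(self, user: str, limit: int = 20) -> list:
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        merged = set(self.timelines[user])
        for author in self.following[user]:
            if self._is_celebrity(author):           # fan-out on read
                merged.update(self.posts[author])
        return sorted(merged, reverse=True)[:limit]  # newest first
```

Normal authors pay the cost at write time, while posts from accounts above the threshold are merged in only when a follower reads their timeline.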

How do I design for disaster recovery?

Levels: (1) Backup and restore: regular backups to separate region, restore when needed. RPO: hours, RTO: hours-days. (2) Pilot light: minimal standby infrastructure in DR region, scale up when needed. RPO: minutes, RTO: hours. (3) Warm standby: scaled-down copy running in DR region. RPO: seconds, RTO: minutes. (4) Multi-region active-active: full capacity in multiple regions, real-time replication. RPO: near-zero, RTO: seconds. Cost increases with each level. Most applications need warm standby. Test DR regularly with game days.

How do message queues improve system design?

Benefits: (1) Decoupling: producer and consumer are independent, can scale separately. (2) Buffering: absorb traffic spikes without overwhelming downstream. (3) Reliability: messages persist in the queue, not lost if consumer is down. (4) Async processing: offload slow tasks (email, report generation) from the request path. (5) Fan-out: one message delivered to multiple consumers. Choose: Kafka for event streaming (ordered, durable, replayable). RabbitMQ for task queues (flexible routing). SQS for simple AWS-native queuing.

How do I handle data consistency in distributed systems?

Strategies: (1) Strong consistency: synchronous replication, consensus (Raft). Used for: financial transactions, inventory counts. Trade-off: higher latency, lower availability. (2) Eventual consistency: async replication, converges over time. Used for: social feeds, product reviews, analytics. Trade-off: stale reads possible. (3) Causal consistency: events that are causally related are seen in order. Middle ground. (4) Saga pattern: distributed transactions with compensating actions. (5) CQRS: separate models for reads and writes. Design for eventual consistency by default; use strong consistency only where required.

What are the key metrics to monitor for system health?

RED method for services: Request rate (throughput), Error rate (failures), Duration (latency p50, p95, p99). USE method for resources: Utilization (% used), Saturation (queue depth), Errors (error count). Key metrics: (1) Availability (uptime %). (2) Latency (p50, p95, p99). (3) Throughput (RPS). (4) Error rate (4xx, 5xx). (5) CPU/memory utilization. (6) Database query time. (7) Cache hit rate. (8) Queue depth. Alert on SLO violations, not individual metrics. Use Prometheus + Grafana or Datadog.
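For the Duration part of RED, latency percentiles can be computed from raw samples as sketched below. This uses the nearest-rank method, which is one of several common percentile definitions; monitoring systems such as Prometheus usually estimate percentiles from histograms instead.

```python
import math


def percentile(samples, p: int):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 50))   # 50
print(percentile(latencies_ms, 95))   # 95
print(percentile(latencies_ms, 99))   # 99
```

Tail percentiles (p95, p99) expose slow requests that an average would hide, which is why they are the usual basis for latency SLOs.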

How do I design a distributed cache?

Architecture: (1) Cache cluster (Redis Cluster, Memcached) with hash-based key distribution (consistent hashing in Memcached clients, fixed hash slots in Redis Cluster). (2) Cache-aside pattern: app checks cache, on miss fetches from DB, writes to cache. (3) TTL for expiration, LRU for eviction. (4) Cache warmup: pre-load popular data on startup. (5) Cache stampede prevention: locking (only one request fetches on miss), jittered TTL. (6) Multi-level caching: L1 (in-process, fastest), L2 (Redis, shared across instances), L3 (CDN, edge). Size: cache hot data (roughly 20% of data often serves 80% of requests). Monitor: hit rate (target 90%+), memory usage, eviction rate.
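The two stampede-prevention techniques named above, jittered TTL and locking so that only one request fetches on a miss, can be sketched as follows. This is a single-process illustration using threads; SingleFlight and jittered_ttl are hypothetical names, and a multi-server deployment would need a distributed lock or request coalescing at the cache layer.

```python
import random
import threading


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.1) -> float:
    """Spread expirations so keys cached together do not expire together."""
    spread = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-spread, spread)


class SingleFlight:
    """Ensure only one caller loads a given key at a time; others wait and reuse."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}    # key -> lock (never pruned in this sketch)

    def load(self, key, cache: dict, loader):
        if key in cache:
            return cache[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key in cache:        # another thread filled it while we waited
                return cache[key]
            value = loader(key)     # only one thread reaches the database
            cache[key] = value
            return value
```

The second cache check inside the lock is what turns many concurrent misses into a single database fetch.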

System Design Guide 2026

Executive Summary

System design in 2026 reflects a mature cloud-native landscape. 90% of organizations use cloud infrastructure, 78% run containerized workloads, and 85% use CDNs. PostgreSQL and Redis dominate their categories. Apache Kafka is the standard for event streaming. The key challenge has shifted from building scalable systems to operating them efficiently: observability, cost optimization, and reliability engineering are the focus areas.

Metric	Share	Change since 2018
Cloud adoption	90%	+45%
CDN usage	85%	+45%
Containerized	78%	+58%
Serverless	42%	+37%

Part 1: Scaling Fundamentals

Vertical scaling adds resources to one machine (more CPU, RAM). It is simple but bounded by hardware limits. Horizontal scaling adds more machines and requires stateless services, load balancing, and distributed data. For stateful components such as databases, start vertical and go horizontal when limits are hit. For the web tier, design for horizontal scaling from the start with stateless services and a load balancer.

Figure: Infrastructure Adoption (2018-2026). Source: OnlineTools4Free Research.

Part 2: Load Balancing

Load balancers distribute traffic across servers. Layer 4 (TCP) routes by IP/port. Layer 7 (HTTP) routes by URL, headers, cookies. Algorithms: round-robin (simple), least connections (efficient), IP hash (sticky sessions). NGINX and HAProxy dominate software load balancing. Cloud: AWS ALB/NLB, GCP Load Balancer.
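Two of the algorithms above, round-robin and least connections, can be sketched in a few lines. The class names and server labels are illustrative, and real load balancers add health checks, weights, and concurrency control on top of this.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1
```

Round-robin assumes requests cost about the same; least connections adapts better when some requests are much slower than others.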

Load Balancer Comparison (2026)

Load Balancer	Type	Algorithms	Best For
NGINX	Software/Reverse Proxy	Round-robin, least-conn, ip-hash, weighted	High-performance reverse proxy, static serving
HAProxy	Software	Round-robin, leastconn, source, uri	TCP/HTTP load balancing, health checking
AWS ALB	Managed (AWS)	Round-robin, least outstanding requests	AWS HTTP/HTTPS load balancing
AWS NLB	Managed (AWS)	Flow hash	Ultra-low latency TCP/UDP, static IP
Cloudflare LB	Managed (Global)	Round-robin, geo, weighted	Global load balancing with CDN
Envoy	Software/Service Mesh	Round-robin, least-request, ring-hash	Service mesh, gRPC, observability

Part 3: Caching Strategies

Caching stores frequently accessed data in fast storage (memory). Levels: browser cache, CDN edge cache, application cache (Redis), database query cache. Cache-aside is the most common pattern: check cache, fetch from DB on miss, write to cache. Redis is the standard caching solution. Cache invalidation is famously one of the hard problems in computer science, so decide up front how each cached item gets invalidated.

Caching Strategies Comparison

Strategy	Description	Consistency	Best For
Cache-Aside (Lazy Loading)	App checks cache first. On miss, fetches from DB and writes to cache. Most common pattern.	Eventual (stale reads possible)	Read-heavy workloads, general purpose
Write-Through	App writes to cache and DB simultaneously. Cache is always up-to-date.	Strong	Write-heavy with read requirements
Write-Behind (Write-Back)	App writes to cache only. Cache asynchronously writes to DB. Higher throughput.	Eventual (data loss risk)	Very high write throughput
Read-Through	Cache sits between app and DB. On miss, cache fetches from DB automatically.	Eventual	Simplifying cache logic in app code
Refresh-Ahead	Cache proactively refreshes entries before they expire based on access patterns.	Eventual (fresher)	Predictable access patterns, hot data

Part 4: Database Selection

Database selection depends on data model, consistency requirements, query patterns, and scale. PostgreSQL is the default for most applications (ACID, JSON, full-text search). MongoDB for flexible schemas. Redis for caching and real-time data. Elasticsearch for search. Kafka for event streaming. DynamoDB for serverless. Most applications start with PostgreSQL and add specialized databases as needed.

Database Comparison (2026)

Database	Type	Consistency	Scalability	Best For
PostgreSQL	Relational	ACID	Vertical + read replicas	General purpose, complex queries, JSONB
MySQL	Relational	ACID	Vertical + replicas + Vitess	Web applications, WordPress, read-heavy
MongoDB	Document	Tunable	Horizontal (sharding)	Flexible schema, rapid development, JSON data
Redis	Key-Value / Cache	Eventual (replicas)	Cluster mode (horizontal)	Caching, sessions, real-time leaderboards
Elasticsearch	Search Engine	Near real-time	Horizontal (shards)	Full-text search, log analytics, faceted search
DynamoDB	Key-Value / Document	Eventual or strong	Horizontal (managed)	AWS serverless, single-digit ms latency
Cassandra	Wide Column	Tunable (AP)	Horizontal (multi-DC)	Write-heavy, time-series, geo-distributed
ClickHouse	Columnar/OLAP	Eventual	Horizontal	Analytics, real-time aggregation, log analysis

Part 5: Message Queues

Message queues enable async communication between services. Benefits: decoupling, buffering traffic spikes, reliability (broker persists messages). Kafka for event streaming (ordered, durable, replayable). RabbitMQ for task queues (flexible routing). SQS for simple AWS-native queuing. Choose based on: ordering requirements, throughput needs, and operational complexity tolerance.

Part 6: Core System Design Concepts

CAP theorem: distributed systems choose between consistency and availability during partitions. Consistent hashing minimizes key redistribution when nodes change. Database sharding distributes data horizontally. Read replicas scale reads. Rate limiting protects services. Circuit breakers prevent cascading failures. Each concept addresses a specific scalability or reliability challenge.

System Design Concepts Reference

Concept	Category	Description
Horizontal Scaling	Scaling	Adding more machines to handle increased load. Requires stateless services, load balancing, and distributed data. Preferred over vertical scaling for web applications.
Vertical Scaling	Scaling	Adding more CPU, RAM, or storage to an existing machine. Simpler but has hardware limits. Good for databases that are hard to distribute (PostgreSQL).
CDN (Content Delivery Network)	Caching	Geographically distributed cache for static and dynamic content. Reduces latency by serving from edge locations near users. Providers: Cloudflare, AWS CloudFront, Fastly, Akamai.
Message Queue	Async	Asynchronous communication between services. Decouples producers and consumers. Handles traffic spikes. Tools: Kafka, RabbitMQ, SQS, NATS. Essential for event-driven architecture.
Database Sharding	Data	Distributing data across multiple database instances by a shard key. Enables horizontal scaling of databases. Challenges: cross-shard queries, rebalancing, shard key selection.
Read Replicas	Data	Copies of the primary database that serve read queries. Write to primary, read from replicas. Eventual consistency. Simple way to scale read-heavy workloads without sharding.
Rate Limiting	Protection	Limiting requests per client per time window. Algorithms: token bucket, sliding window, fixed window. Prevents abuse and protects services. HTTP 429 Too Many Requests.
Circuit Breaker	Resilience	Stop calling a failing downstream service. States: Closed (normal), Open (fail fast), Half-Open (trial). Prevents cascading failures. Tools: Resilience4j, Polly, Envoy.
Consistent Hashing	Distribution	A hashing scheme that minimizes key redistribution when nodes are added/removed. Used by: distributed caches, load balancers, CDNs. Redis Cluster uses a related fixed hash-slot scheme instead of a ring. Only about K/N keys need to move when adding a node.
CAP Theorem	Theory	A distributed system can provide at most two of three guarantees: Consistency (all nodes see the same data), Availability (every request gets a response), Partition tolerance (system works despite network partitions). In practice, you choose CP or AP.

Figure: Infrastructure Trends (2022-2026). Source: OnlineTools4Free Research.

Part 7: Best Practices

Architecture: start simple, scale when needed. Use CDN for static content. Cache aggressively (Redis). Design stateless services for horizontal scaling. Use message queues for async processing. Database: start with PostgreSQL, add read replicas before sharding. Reliability: implement health checks, circuit breakers, graceful degradation. Operations: monitor with RED/USE methods, set SLOs, maintain error budgets.

Glossary (50 Terms)

Load Balancer

Networking

Distributes incoming network traffic across multiple servers to ensure no single server is overwhelmed. Algorithms: round-robin, least connections, IP hash, weighted. Layer 4 (TCP) or Layer 7 (HTTP). Software: NGINX, HAProxy. Managed: AWS ALB/NLB, GCP LB.

CDN

Caching

Content Delivery Network: geographically distributed servers that cache and serve content from edge locations near users. Reduces latency, offloads origin servers, and protects against DDoS. Static (images, CSS, JS) and dynamic (API acceleration). Providers: Cloudflare, CloudFront, Fastly.

Caching

Performance

Storing frequently accessed data in fast storage (memory) to reduce database load and latency. Levels: browser cache, CDN, application cache (Redis), database query cache. Strategies: cache-aside, write-through, write-behind. Invalidation is the hardest part to get right.

Database Sharding

Data

Partitioning data across multiple database instances. Each shard holds a subset of data determined by a shard key. Enables horizontal scaling of databases. Challenges: cross-shard queries, hot spots, rebalancing. Tools: Vitess (MySQL), Citus (PostgreSQL).

Read Replica

Data

A copy of the primary database that serves read queries. Writes go to the primary, replicas sync asynchronously. Eventual consistency (slight lag). Simple scaling for read-heavy workloads. All major databases support replicas.

Message Queue

Messaging

Async communication between services. Producer sends messages, consumer processes them. Decouples services, handles spikes, enables retry. Types: point-to-point (SQS), pub/sub (Kafka). Tools: Kafka, RabbitMQ, SQS, NATS.

Horizontal Scaling

Scaling

Adding more machines to handle increased load. Requires: stateless services, load balancing, distributed data. More resilient than vertical scaling (no single point of failure). Standard approach for web applications.

Vertical Scaling

Scaling

Adding more resources (CPU, RAM) to an existing machine. Simpler but has hardware limits and creates a single point of failure. Good for: databases hard to distribute, initial scaling.

CAP Theorem

Theory

In a distributed system, you can guarantee at most two of: Consistency (all reads return latest write), Availability (every request gets a response), Partition tolerance (system works during network partitions). Since partitions are inevitable, choose CP (consistent but may be unavailable) or AP (available but may return stale data).

Consistent Hashing

Distribution

A hashing scheme where adding/removing nodes requires redistributing only about K/N keys (K=keys, N=nodes). Nodes are placed on a hash ring. Keys map to the next node clockwise. Used by: DynamoDB, CDNs, distributed caches. Redis Cluster uses a related scheme based on 16,384 fixed hash slots.

Rate Limiting

Protection

Controlling request frequency per client. Algorithms: token bucket (allows burst), sliding window (precise), fixed window (simple). Return HTTP 429. Implement at: API gateway, load balancer, or application. Prevents abuse and protects backend services.

Circuit Breaker

Resilience

Stops calling failing services to prevent cascading failures. Closed (normal) -> Open (fail fast) -> Half-Open (trial). Trips when failure rate exceeds threshold. Combine with retry, timeout, and fallback patterns.
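A minimal sketch of the three states follows. As a simplification it trips on a count of consecutive failures rather than a failure rate, and the class, exception, and parameter names are assumptions rather than the API of Resilience4j, Polly, or any other library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold   # consecutive failures to trip
        self.reset_timeout = reset_timeout           # seconds to stay open
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half-open"     # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if (self.state == "half-open"
                    or self._failures >= self.failure_threshold):
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"            # success closes the circuit again
        return result
```

Callers would typically catch CircuitOpenError and return a fallback response, which is where the breaker combines with graceful degradation.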

ACID

Data

Database transaction properties: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes). Standard for relational databases. NoSQL databases often relax ACID for scalability.

BASE

Data

Alternative to ACID for distributed systems: Basically Available (system is available), Soft state (state may change without input), Eventually consistent (system converges to consistent state). Used by NoSQL databases, event-driven systems.

Reverse Proxy

Networking

A server that sits in front of web servers and forwards client requests. Provides: load balancing, SSL termination, caching, compression, and security (hiding origin servers). NGINX, HAProxy, Caddy, Traefik. Every production deployment should use one.

DNS

Networking

Domain Name System: translates domain names to IP addresses. DNS-based load balancing distributes traffic geographically. TTL controls cache duration. DNS failover for disaster recovery. Services: Route 53, Cloudflare DNS, Google Cloud DNS.

API Gateway

Networking

Single entry point for API requests. Handles: routing, auth, rate limiting, transformation, caching. Decouples clients from backend topology. Tools: Kong, AWS API Gateway, Traefik, Envoy.

Idempotency

Design

An operation producing the same result regardless of repetition count. GET, PUT, DELETE are naturally idempotent. POST is not. Use idempotency keys for non-idempotent operations to prevent duplicates.
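The idempotency-key technique can be illustrated with a hypothetical payment handler. PaymentService and its fields are made up for the example; a real service would persist the key-to-response mapping in a database with an expiry.

```python
class PaymentService:
    """Toy handler showing how an idempotency key prevents duplicate charges."""

    def __init__(self):
        self._responses = {}    # idempotency key -> stored response
        self.charges = []       # side effects actually performed

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            # Retry of a request we already handled: replay the stored result.
            return self._responses[idempotency_key]
        self.charges.append(amount_cents)
        response = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("order-123", 500)
retry = service.charge("order-123", 500)    # e.g. client retried after a timeout
print(first == retry, len(service.charges))  # True 1
```

The client generates the key once per logical operation and reuses it on every retry, so a repeated POST has the same effect as a single one.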

Eventual Consistency

Consistency

A consistency model where updates propagate asynchronously. The system converges to a consistent state over time. Standard in distributed systems, NoSQL databases, and event-driven architectures. Trade-off: higher availability at the cost of temporary staleness.

Strong Consistency

Consistency

Every read returns the most recent write. Requires synchronous replication or consensus (Raft, Paxos). Lower availability during partitions. Used by: relational databases, distributed consensus systems (etcd, ZooKeeper).

Partitioning

Data

Dividing data across multiple nodes. Horizontal (sharding): different rows on different nodes. Vertical: different columns on different nodes. Range-based: partition by key range. Hash-based: partition by hash of key. Enables horizontal scaling.

Replication

Data

Copying data across multiple nodes for redundancy and read scaling. Single-leader: one primary handles writes, replicas sync. Multi-leader: multiple nodes accept writes (conflict resolution needed). Leaderless: any node accepts reads/writes (quorum-based).

Consensus

Distributed Systems

Agreement among distributed nodes on a single value. Algorithms: Raft (etcd, Consul), Paxos (Google Spanner), Zab (ZooKeeper). Used for: leader election, distributed locks, configuration management. Requires majority of nodes (quorum) to agree.

Bloom Filter

Data Structure

A probabilistic data structure that tests whether an element is in a set. False positives possible, false negatives impossible. Very space-efficient. Used by: databases (check if key exists before disk read), CDNs, spam filters, web crawlers.
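A small Bloom filter sketch shows why false negatives are impossible while false positives are not. The bit-array size, number of hashes, and use of SHA-256 are arbitrary choices for illustration, not tuned values.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Every added item sets all of its bits, so a lookup for it can never fail; an item that was never added may still find all of its bits set by other items, which is the false-positive case.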

Write-Ahead Log (WAL)

Data

A log where all changes are written before being applied to the database. Enables crash recovery (replay log after crash), replication (send log to replicas), and CDC (read log for change events). Used by PostgreSQL, MySQL, Kafka.

Leader Election

Distributed Systems

The process of choosing one node as the leader (primary) in a distributed system. The leader handles writes or coordination. If the leader fails, a new election occurs. Algorithms: Raft, Paxos, Bully. Tools: etcd, ZooKeeper, Consul.

Back Pressure

Resilience

A flow control mechanism where a system signals upstream that it cannot handle more load. Prevents overwhelming downstream services. Implementation: reject requests (429), queue and process slowly, reduce producer rate. Essential for streaming systems.

Data Lake

Data

A centralized repository for structured and unstructured data at any scale. Store raw data and transform on read (schema-on-read). Technologies: S3, HDFS, Delta Lake, Apache Iceberg. Used for: analytics, ML training, data science exploration.

Data Warehouse

Data

A system optimized for analytical queries on structured data. Schema-on-write, columnar storage, pre-aggregated. Technologies: Snowflake, BigQuery, Redshift, ClickHouse. Used for: business intelligence, reporting, dashboards.

Event Sourcing

Pattern

Storing state changes as immutable events rather than current state. Current state derived by replaying events. Benefits: full audit trail, temporal queries, event replay. Challenges: schema evolution, snapshot management.

CQRS

Pattern

Command Query Responsibility Segregation: separate models for reads and writes. Write model handles commands, read model optimized for queries. Can use different databases. Benefits: independent scaling, optimized data models.

Service Discovery

Infrastructure

Mechanism for services to find each other in dynamic environments. Client-side: query registry (Consul, Eureka). Server-side: load balancer routes (K8s Service). DNS-based: service names resolve to IPs.

Observability

Operations

Understanding system state from external outputs. Three pillars: metrics (Prometheus), logs (ELK/Loki), traces (Jaeger/Tempo). OpenTelemetry unifies instrumentation. Essential for operating distributed systems.

SLO/SLI/SLA

Operations

SLI (Service Level Indicator): measurable metric (latency, availability). SLO (Service Level Objective): target for SLI (99.9% availability). SLA (Service Level Agreement): contractual commitment with consequences. Error budget = 1 - SLO.

Twelve-Factor App

Methodology

A methodology for building SaaS applications. Principles: codebase in version control, explicit dependencies, config in environment, backing services as resources, build/release/run separation, stateless processes, port binding, concurrency via processes, disposability, dev/prod parity, logs as event streams, admin processes.

Blue-Green Deployment

Deployment

Two identical environments. Deploy to idle, test, switch traffic. Instant rollback by switching back. Requires double infrastructure during transition.

Canary Deployment

Deployment

Deploy new version to small traffic subset (5%), monitor metrics, gradually increase. Catches issues under real traffic. Tools: Istio, Argo Rollouts, Flagger.

Graceful Degradation

Resilience

System continues to function with reduced capability when components fail. Example: show cached product catalog when catalog service is down. Provide fallback responses, disable non-essential features, prioritize core functionality.

Thundering Herd

Problem

Many clients simultaneously requesting the same resource after cache expiration or service recovery. Causes: cache stampede, service restart. Prevention: jitter on cache TTL, request coalescing, circuit breaker, staggered retry with backoff.

Hot Spot

Problem

A node or partition receiving disproportionately more traffic than others. Causes: poor shard key selection (celebrity user, popular product), time-based partitioning during peak hours. Prevention: random suffix on keys, pre-splitting, separate hot data.

Webhook

Integration

Server pushes notifications to a client URL when events occur. Client registers callback URL; server sends POST on event. Used for: payment confirmations, CI/CD triggers, integrations. Must handle: retries, signature verification, idempotency.

WebSocket

Protocol

Full-duplex communication over a single TCP connection. Bidirectional real-time data. Used for: chat, live dashboards, gaming, collaborative editing. Alternative: SSE (Server-Sent Events) for server-to-client only.

gRPC

Protocol

High-performance RPC framework using Protocol Buffers and HTTP/2. Features: bidirectional streaming, code generation, strong typing, smaller payloads than JSON. Used for: internal microservice communication, mobile backends.

Object Storage

Storage

Storage for unstructured data (files, images, videos) with flat namespace and HTTP API. Services: AWS S3, GCS, Azure Blob, MinIO (self-hosted). Features: durability (11 nines), versioning, lifecycle policies, event notifications. Standard for storing user uploads and static assets.

Connection Pooling

Performance

Maintaining a pool of reusable database connections. Avoids the overhead of creating/destroying connections per request. Tools: PgBouncer (PostgreSQL), ProxySQL (MySQL), application-level pools (HikariCP). Essential for performance at scale.

Geo-Replication

Data

Replicating data across multiple geographic regions. Benefits: low latency for global users, disaster recovery. Challenges: cross-region consistency, compliance (data residency). Services: CockroachDB, Spanner, DynamoDB Global Tables, Cosmos DB.

Backpressure

Resilience

Flow control mechanism where a system signals upstream that it cannot handle more load, so that fast producers slow down instead of overwhelming slower consumers. Implementations: reject with 429, bounded queues, reactive streams (Project Reactor, RxJS).
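
A bounded queue combined with a 429 response is probably the simplest form of backpressure. The sketch below assumes a hypothetical `BoundedIntake` class sitting in front of a worker; the status codes 202 and 429 follow HTTP conventions, and everything else is illustrative.

```python
import queue

class BoundedIntake:
    """Accepts work until the queue is full, then tells callers to back off."""

    def __init__(self, capacity):
        self._jobs = queue.Queue(maxsize=capacity)

    def submit(self, job) -> int:
        try:
            self._jobs.put_nowait(job)
        except queue.Full:
            return 429  # Too Many Requests: the upstream should slow down or retry later
        return 202      # Accepted for asynchronous processing

    def take(self):
        """Called by a worker; frees one slot for new submissions."""
        return self._jobs.get_nowait()
```

Rejecting early keeps memory use and latency bounded; the alternative, an unbounded queue, only hides the overload until the process runs out of memory.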

Content Negotiation

HTTP

HTTP mechanism for selecting response format. Client sends Accept header, server responds in requested format. Used for: JSON vs XML, language selection, API versioning via media types.
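
Server-side selection from an Accept header can be sketched as follows. This is a simplified reading of the HTTP rules: it handles q-values and wildcard ranges but ignores other media-type parameters, and the function names `parse_accept` and `negotiate` are assumptions for the example.

```python
from typing import List, Optional, Tuple

def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges

def _specificity(media_range: str, media_type: str) -> int:
    """2 = exact match, 1 = type/* match, 0 = */* match, -1 = no match."""
    if media_range == media_type:
        return 2
    if media_range == "*/*":
        return 0
    range_type, _, range_sub = media_range.partition("/")
    if range_sub == "*" and media_type.startswith(range_type + "/"):
        return 1
    return -1

def negotiate(header: str, supported: List[str]) -> Optional[str]:
    """Pick the supported type the client prefers; None means respond 406."""
    if not header.strip():
        return supported[0] if supported else None  # no Accept header: anything goes
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in supported:
        top_spec, q = -1, 0.0
        for media_range, range_q in ranges:
            spec = _specificity(media_range, media_type.lower())
            if spec > top_spec:  # the most specific matching range decides q
                top_spec, q = spec, range_q
        if q > best_q:  # ties keep the server's own ordering
            best, best_q = media_type, q
    return best
```

For example, `negotiate("application/xml;q=0.9, application/json", ["application/json", "application/xml"])` selects `application/json`, because an omitted q-value counts as 1.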

Health Check

Operations

Endpoint (/health, /ready) reporting service status. Liveness: is the process running? Readiness: can it handle traffic? Used by load balancers and orchestrators for routing and restart decisions.
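
The difference between liveness and readiness can be sketched as two small handlers. The paths match the ones named above; the response bodies, the `checks` dictionary, and the function names are assumptions for illustration and would normally be wired into a web framework.

```python
from typing import Callable, Dict, Tuple

def liveness() -> Tuple[int, dict]:
    """If this code runs at all, the process is alive."""
    return 200, {"status": "alive"}

def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Ready only when every dependency check passes; otherwise 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a failing check must not crash the endpoint
    ok = all(results.values())
    status = "ready" if ok else "not ready"
    return (200 if ok else 503), {"status": status, "checks": results}

def route(path: str, checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Tiny dispatcher standing in for a web framework's router."""
    if path == "/health":
        return liveness()
    if path == "/ready":
        return readiness(checks)
    return 404, {"error": "not found"}
```

Keeping dependency checks out of the liveness handler is deliberate: a database outage should take the instance out of rotation (readiness) without causing the orchestrator to restart a healthy process (liveness).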

Distributed Lock

Coordination

A lock mechanism that works across multiple machines. Ensures only one process can access a resource at a time. Implementations: Redis (Redlock), ZooKeeper, etcd. Use sparingly: prefer idempotent designs over distributed locks.

FAQ (20 Questions)

Citations and Sources

Martin Kleppmann. “Designing Data-Intensive Applications.” 2017. https://dataintensive.net

Alex Xu. “System Design Interview.” 2022. https://www.amazon.com/System-Design-Interview-insiders-Second/dp/B08CMF2CQF

AWS. “AWS Well-Architected Framework.” 2026. https://docs.aws.amazon.com/wellarchitected/

Google. “Google SRE Book.” 2016. https://sre.google/sre-book/table-of-contents/

Cloudflare. “Cloudflare Learning Center.” 2026. https://www.cloudflare.com/learning/

Redis. “Redis Documentation.” 2026. https://redis.io/docs/

Apache Kafka. “Kafka Documentation.” 2026. https://kafka.apache.org/documentation/

PostgreSQL. “PostgreSQL Documentation.” 2026. https://www.postgresql.org/docs/

NGINX. “NGINX Documentation.” 2026. https://nginx.org/en/docs/

Microsoft. “Cloud Design Patterns.” 2026. https://learn.microsoft.com/en-us/azure/architecture/patterns/

Werner Vogels. “All Things Distributed.” 2025. https://www.allthingsdistributed.com

Executive Summary

Cloud adoption: 90% (+45% since 2018)

CDN usage: 85% (+45% since 2018)

Containerized: 78% (+58% since 2018)

Serverless: 42% (+37% since 2018)

Part 1: Scaling Fundamentals

Infrastructure Adoption (2018-2026)

Source: OnlineTools4Free Research

Part 2: Load Balancing

Load Balancer Comparison (2026)

Load Balancer	Type	Algorithms	Best For
NGINX	Software/Reverse Proxy	Round-robin, least-conn, ip-hash, weighted	High-performance reverse proxy, static serving
HAProxy	Software	Round-robin, leastconn, source, uri	TCP/HTTP load balancing, health checking
AWS ALB	Managed (AWS)	Round-robin, least outstanding requests	AWS HTTP/HTTPS load balancing
AWS NLB	Managed (AWS)	Flow hash	Ultra-low latency TCP/UDP, static IP
Cloudflare LB	Managed (Global)	Round-robin, geo, weighted	Global load balancing with CDN
Envoy	Software/Service Mesh	Round-robin, least-request, ring-hash	Service mesh, gRPC, observability

Part 3: Caching Strategies

Caching Strategies Comparison

Strategy	Description	Consistency	Best For
Cache-Aside (Lazy Loading)	App checks cache first. On miss, fetches from DB and writes to cache. Most common pattern.	Eventual (stale reads possible)	Read-heavy workloads, general purpose
Write-Through	App writes to the cache and the DB synchronously in the same operation, so cached entries stay up to date at the cost of extra write latency.	Strong	Reads that must always see the latest write
Write-Behind (Write-Back)	App writes to cache only. Cache asynchronously writes to DB. Higher throughput.	Eventual (data loss risk)	Very high write throughput
Read-Through	Cache sits between app and DB. On miss, cache fetches from DB automatically.	Eventual	Simplifying cache logic in app code
Refresh-Ahead	Cache proactively refreshes entries before they expire based on access patterns.	Eventual (fresher)	Predictable access patterns, hot data
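
The cache-aside row above can be sketched as a small wrapper around a loader function. This is a minimal in-process illustration: a dictionary stands in for Redis or Memcached, and `CacheAside`, `load_from_db`, and the injectable `clock` are hypothetical names introduced for the example.

```python
import time

class CacheAside:
    """Check the cache first; on a miss, load from the DB and populate the cache."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, expires_at); stand-in for Redis

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                         # cache hit
        value = self._load(key)                     # cache miss: go to the DB
        self._cache[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)
```

Between a write to the database and the TTL expiry or an explicit `invalidate`, readers may see stale data, which is the eventual consistency noted in the table.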

Part 4: Database Selection

Database Comparison (2026)

Database	Type	Consistency	Scalability	Best For
PostgreSQL	Relational	ACID	Vertical + read replicas	General purpose, complex queries, JSONB
MySQL	Relational	ACID	Vertical + replicas + Vitess	Web applications, WordPress, read-heavy
MongoDB	Document	Tunable	Horizontal (sharding)	Flexible schema, rapid development, JSON data
Redis	Key-Value / Cache	Eventual (replicas)	Cluster mode (horizontal)	Caching, sessions, real-time leaderboards
Elasticsearch	Search Engine	Near real-time	Horizontal (shards)	Full-text search, log analytics, faceted search
DynamoDB	Key-Value / Document	Eventual or strong	Horizontal (managed)	AWS serverless, single-digit ms latency
Cassandra	Wide Column	Tunable (AP)	Horizontal (multi-DC)	Write-heavy, time-series, geo-distributed
ClickHouse	Columnar/OLAP	Eventual	Horizontal	Analytics, real-time aggregation, log analysis

Part 5: Message Queues

Part 6: Core System Design Concepts

System Design Concepts Reference

Concept	Category	Description
Horizontal Scaling	Scaling	Adding more machines to handle increased load. Requires stateless services, load balancing, and distributed data. Preferred over vertical scaling for web applications.
Vertical Scaling	Scaling	Adding more CPU, RAM, or storage to an existing machine. Simpler but has hardware limits. Good for databases that are hard to distribute (PostgreSQL).
CDN (Content Delivery Network)	Caching	Geographically distributed cache for static and dynamic content. Reduces latency by serving from edge locations near users. Providers: Cloudflare, AWS CloudFront, Fastly, Akamai.
Message Queue	Async	Asynchronous communication between services. Decouples producers and consumers. Handles traffic spikes. Tools: Kafka, RabbitMQ, SQS, NATS. Essential for event-driven architecture.
Database Sharding	Data	Distributing data across multiple database instances by a shard key. Enables horizontal scaling of databases. Challenges: cross-shard queries, rebalancing, shard key selection.
Read Replicas	Data	Copies of the primary database that serve read queries. Write to primary, read from replicas. Eventual consistency. Simple way to scale read-heavy workloads without sharding.
Rate Limiting	Protection	Limiting requests per client per time window. Algorithms: token bucket, sliding window, fixed window. Prevents abuse and protects services. HTTP 429 Too Many Requests.
Circuit Breaker	Resilience	Stop calling a failing downstream service. States: Closed (normal), Open (fail fast), Half-Open (trial). Prevents cascading failures. Tools: Resilience4j, Polly, Envoy.
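Consistent Hashing	Distribution	A hashing scheme that minimizes key redistribution when nodes are added or removed: with K keys and N nodes, only about K/N keys move on average when a node joins or leaves. Used by: client-side sharding for distributed caches (for example ketama with Memcached), Cassandra and Dynamo-style stores, load balancers, CDNs. Redis Cluster uses a related but different scheme based on 16384 fixed hash slots.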
CAP Theorem	Theory	A distributed system can provide at most two of three guarantees: Consistency (all nodes see the same data), Availability (every request gets a response), Partition tolerance (system works despite network partitions). In practice, you choose CP or AP.
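
The consistent hashing row can be made concrete with a small hash ring. This is a sketch under assumptions: `HashRing` is a hypothetical class, MD5 is used only as a convenient stable hash, and 100 virtual nodes per server is an arbitrary choice to even out the distribution.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable 64-bit position on the ring (MD5 is used for spread, not security)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns many small arcs of the ring rather than one large arc.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]

if __name__ == "__main__":
    ring = HashRing(["cache-a", "cache-b", "cache-c"])
    keys = [f"user:{i}" for i in range(10000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add("cache-d")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.1%} of keys moved")
```

When a node is added, the only keys that change owner are the ones the new node takes over, which is why the moved share stays near one over the new node count. With plain `hash(key) % N`, by contrast, most keys would be reassigned.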

Infrastructure Trends (2022-2026)

Source: OnlineTools4Free Research

Part 7: Best Practices

Glossary (50 Terms)

Load Balancer

Networking

CDN

Caching

Performance

Database Sharding

Data

Read Replica

Data

Message Queue

Messaging

Horizontal Scaling

Scaling

Vertical Scaling

Scaling

Adding more resources (CPU, RAM) to an existing machine. Simpler but has hardware limits and creates a single point of failure. Good for: databases hard to distribute, initial scaling.

CAP Theorem

Theory

Consistent Hashing

Distribution

Rate Limiting

Protection

Circuit Breaker

Resilience

ACID

Data

BASE

Data

Reverse Proxy

Networking

DNS

Networking

API Gateway

Networking

Single entry point for API requests. Handles: routing, auth, rate limiting, transformation, caching. Decouples clients from backend topology. Tools: Kong, AWS API Gateway, Traefik, Envoy.

Idempotency

Design

Eventual Consistency

Consistency

Strong Consistency

Consistency

Partitioning

Data

Replication

Data

Consensus

Distributed Systems

Bloom Filter

Data Structure

Write-Ahead Log (WAL)

Data

Leader Election

Distributed Systems

Back Pressure

Resilience

Data Lake

Data

Data Warehouse

Data

Event Sourcing

Pattern

CQRS

Pattern

Service Discovery

Infrastructure

Observability

Operations

SLO/SLI/SLA

Operations

Twelve-Factor App

Methodology

Blue-Green Deployment

Deployment

Two identical environments. Deploy to idle, test, switch traffic. Instant rollback by switching back. Requires double infrastructure during transition.

Canary Deployment

Deployment

Deploy new version to small traffic subset (5%), monitor metrics, gradually increase. Catches issues under real traffic. Tools: Istio, Argo Rollouts, Flagger.

Graceful Degradation

Resilience
