Load Balancer Sizing Guide

Calculate how many server instances you need for your traffic load

A load balancer sizing guide helps infrastructure engineers calculate the minimum number of server instances needed to handle a traffic load — and how many to provision for redundancy. The calculations are based on Little's Law and standard capacity planning formulas.

Typical concurrency per instance, by runtime: Node.js (single-threaded event loop): 100-500 concurrent connections; Java/Go (thread pools or goroutines): 200-1,000; PHP-FPM: equal to the configured worker count.

How to Size Your Load Balancer Setup

Load balancer sizing uses Little's Law: the number of concurrent requests in a system equals throughput multiplied by average response time. This determines your minimum instance count.

Step 1: Measure Peak Traffic

Use your current metrics or estimate from business requirements. Peak RPS is typically 3-5x your average RPS. For a new service, estimate from user count × actions per session ÷ session duration.
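The estimate above can be sketched in a few lines. This is a minimal sketch; the function names and the example inputs (2,000 concurrent users, 30 actions over a 10-minute session) are illustrative assumptions, not values from the guide.

```python
def estimated_rps(concurrent_users: int, actions_per_session: int,
                  session_duration_s: float) -> float:
    # RPS from business inputs: users x actions per session / session length.
    return concurrent_users * actions_per_session / session_duration_s

def peak_rps(avg_rps: float, multiplier: float = 4.0) -> float:
    # Peak is typically 3-5x average; 4x is a middle-of-the-road default.
    return avg_rps * multiplier

# Hypothetical inputs: 2,000 concurrent users, 30 actions per 10-minute session.
avg = estimated_rps(2_000, 30, 600)   # 2000 * 30 / 600 = 100.0 RPS
peak = peak_rps(avg)                  # 100.0 * 4 = 400.0 RPS
```

Size for the peak figure, not the average; the average only tells you what the system does most of the day.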

Step 2: Benchmark Response Time and Concurrency

Run load tests with k6, Locust, or Gatling to measure: average response time at your target RPS, and how many concurrent connections each instance can handle before response time degrades. Response time typically increases nonlinearly beyond 70-75% CPU.

Step 3: Apply Little's Law

Concurrent requests = RPS × response_time_seconds. At 500 RPS with 200ms response time: 500 × 0.2 = 100 concurrent requests. If each instance handles 60 concurrent at 65% CPU utilization: ceiling(100/60) = 2 instances minimum. Apply 2x redundancy = 4 instances total.
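The worked example above can be expressed directly. A minimal sketch, assuming the same numbers from the text (500 RPS, 200 ms, 60 concurrent per instance, 2x redundancy); function and parameter names are illustrative.

```python
import math

def size_instances(rps: float, response_time_s: float,
                   concurrency_per_instance: int,
                   redundancy_factor: float = 2.0) -> tuple[int, int]:
    # Little's Law: concurrent requests in-flight = throughput x response time.
    concurrent = rps * response_time_s
    # Minimum instances: round up so no instance exceeds its tested capacity.
    base = math.ceil(concurrent / concurrency_per_instance)
    # Provision extra capacity for failover.
    total = math.ceil(base * redundancy_factor)
    return base, total

base, total = size_instances(500, 0.2, 60)  # -> (2, 4)
```

Note that `concurrency_per_instance` should come from your load tests at the target CPU utilization (Step 2), not from the instance's theoretical maximum.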

Step 4: Choose Load Balancer Algorithm

Round-robin for uniform requests. Least-connections for mixed workloads (some requests much slower than others). IP-hash for WebSocket connections that need sticky sessions. For cloud deployments, use a managed load balancer (AWS ALB, GCP Cloud Load Balancing) rather than self-managed nginx.

Frequently Asked Questions

Is this load balancer sizing guide free?

Yes, completely free. All calculations run in your browser.

How do I calculate how many servers I need?

Use Little's Law to get the concurrent request count, then divide by per-instance capacity: instances = ceil((requests/second × avg response time in seconds) ÷ concurrent requests each instance handles at its target CPU utilization). For example, 500 RPS × 0.2s response time = 100 concurrent requests. If each instance handles 50 concurrent at 70% CPU, that's ceil(100/50) = 2 instances. Always add redundancy: provision 2-3x the minimum for failover.

What is Little's Law and why does it apply to load balancing?

Little's Law states that average concurrent requests (L) = throughput (λ) × average response time (W). L = λW. In practice: 1,000 RPS at 200ms avg = 200 concurrent requests in-flight at any moment. If each instance handles 50 concurrent at 70% CPU, you need 4 instances plus headroom for spikes.

What is the target CPU utilization for a web server?

Target 60-70% CPU utilization under normal load, leaving headroom for traffic spikes. Running at 90%+ CPU causes degraded response times and leaves no room for sudden traffic increases. At 70% utilization you have roughly 1.4x of headroom before saturation (0.7 × 1.4 ≈ 1.0); a brief 2x spike will saturate CPU and queue requests, but the server degrades gracefully rather than crashing as long as the spike isn't sustained.

What load balancer algorithm should I use?

Round-robin works for stateless services with similar request processing times. Least-connections is better when request processing times vary significantly — it routes to the server with fewest active connections. IP Hash is needed for sticky sessions (WebSockets, file uploads). Weighted round-robin is used when instances have different capacities.

How many instances should I have for high availability?

Minimum 2 instances for redundancy — if one fails, the other handles all traffic. For production, 3+ instances (N+1 redundancy) is recommended so one failure doesn't put remaining instances near capacity. For critical services, N+2 redundancy (2 can fail simultaneously) is common.
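A quick way to sanity-check a redundancy plan is to ask whether the surviving instances still cover peak load. A minimal sketch, reusing the 100-concurrent / 60-per-instance example from earlier; the function name and inputs are illustrative.

```python
def survives_failures(instances: int, concurrency_per_instance: int,
                      peak_concurrent: float, failures: int) -> bool:
    # After `failures` instances drop out, can the rest absorb peak load?
    remaining = instances - failures
    return remaining > 0 and remaining * concurrency_per_instance >= peak_concurrent

# 100 concurrent requests at peak, each instance handles 60:
survives_failures(2, 60, 100, failures=1)  # -> False: 1 x 60 < 100
survives_failures(3, 60, 100, failures=1)  # -> True:  2 x 60 >= 100
survives_failures(4, 60, 100, failures=2)  # -> True:  N+2 also holds
```

This is why the bare minimum of 2 is fragile here: losing one instance leaves the survivor over capacity, while N+1 (3 instances) keeps the system healthy through a single failure.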