A load balancer sizing guide helps infrastructure engineers calculate the minimum number of server instances needed to handle a traffic load, and how many to provision for redundancy, based on Little's Law and standard capacity-planning formulas.
The number of concurrent connections one instance can handle varies by runtime: Node.js (single event loop) typically 100-500, Java or Go typically 200-1,000, and PHP-FPM a number equal to its configured worker count.
How to Size Your Load Balancer Setup
Load balancer sizing uses Little's Law: the number of concurrent requests in a system equals throughput multiplied by average response time. This determines your minimum instance count.
Step 1: Measure Peak Traffic
Use your current metrics or estimate from business requirements. Peak RPS is typically 3-5x your average RPS. For a new service, estimate from user count × actions per session ÷ session duration.
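For a new service, the estimate above can be sketched numerically. Every figure below is an illustrative assumption, not a benchmark:

```python
# Estimating RPS for a new service from user behavior (all values assumed).
active_users = 5_000         # assumed concurrently active users
actions_per_session = 20     # assumed requests per user session
session_duration_s = 600     # assumed 10-minute session

# user count x actions per session / session duration
avg_rps = active_users * actions_per_session / session_duration_s

# Peak is typically 3-5x average; 4x is used here as a midpoint.
peak_rps = 4 * avg_rps

print(f"avg ~{avg_rps:.0f} RPS, plan for peak ~{peak_rps:.0f} RPS")
```

Size against the peak figure, not the average: the pool must survive the worst hour, not the typical one.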
Step 2: Benchmark Response Time and Concurrency
Run load tests with k6, Locust, or Gatling to measure: average response time at your target RPS, and how many concurrent connections each instance can handle before response time degrades. Response time typically increases nonlinearly beyond 70-75% CPU.
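Once you have load-test results, the degradation knee can be read off programmatically. The measurements and the 50% degradation threshold below are illustrative assumptions, not real benchmark output:

```python
# Hypothetical load-test results: (concurrent connections, avg response ms).
measurements = [(10, 180), (20, 185), (40, 195), (60, 210), (80, 320), (100, 650)]

baseline_ms = measurements[0][1]
# Assumed rule of thumb: >50% slower than baseline counts as degraded.
degradation_limit = 1.5 * baseline_ms

# Highest concurrency where response time is still within the limit.
per_instance_capacity = max(c for c, ms in measurements if ms <= degradation_limit)
print(f"one instance handles ~{per_instance_capacity} concurrent before degrading")
```

That per-instance capacity number is the divisor used in the next step.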
Step 3: Apply Little's Law
Concurrent requests = RPS × response_time_seconds. At 500 RPS with 200ms response time: 500 × 0.2 = 100 concurrent requests. If each instance handles 60 concurrent at 65% CPU utilization: ceiling(100/60) = 2 instances minimum. Apply 2x redundancy = 4 instances total.
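The arithmetic above can be wrapped in a small sizing function; the 2x redundancy default mirrors the worked example:

```python
import math

def required_instances(rps, response_time_s, per_instance_concurrency,
                       redundancy_factor=2):
    """Size a pool via Little's Law: concurrent = RPS x response time."""
    concurrent = rps * response_time_s                       # in-flight requests
    minimum = math.ceil(concurrent / per_instance_concurrency)
    return minimum, minimum * redundancy_factor

# Worked example from the text: 500 RPS at 200 ms, 60 concurrent per instance.
minimum, provisioned = required_instances(500, 0.2, 60)
print(minimum, provisioned)   # 2 minimum, 4 with 2x redundancy
```

The `ceil` matters: 100 concurrent requests across instances that handle 60 each needs 2 instances, not 1.67.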
Step 4: Choose Load Balancer Algorithm
Round-robin for uniform requests. Least-connections for mixed workloads (some requests much slower than others). IP-hash for WebSocket connections that need sticky sessions. For cloud deployments, use a managed load balancer (AWS ALB, GCP Cloud Load Balancing) rather than self-managed nginx.
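The least-connections choice can be illustrated with a minimal selection sketch; the instance names and connection counts are made up for illustration:

```python
# Current open connections per backend (illustrative values).
active = {"app-1": 14, "app-2": 3, "app-3": 9}

def pick_least_connections(conns):
    """Route to the backend with the fewest active connections."""
    return min(conns, key=conns.get)

target = pick_least_connections(active)
print(target)   # app-2, since it has the fewest active connections
```

Round-robin would simply cycle through the list regardless of load, which is why it only suits uniform request costs.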
Frequently Asked Questions
How do I calculate how many servers I need?
Use Little's Law: concurrent requests = requests/second × avg response time in seconds; then divide by the number of concurrent requests each instance can handle at your target CPU utilization. For example, 500 RPS × 0.2s response time = 100 concurrent requests. If each instance handles 50 concurrent at 70% CPU, that's 2 instances minimum. Always add redundancy: provision 2-3x the minimum for failover.
What is Little's Law and why does it apply to load balancing?
Little's Law states that average concurrent requests (L) = throughput (λ) × average response time (W). L = λW. In practice: 1,000 RPS at 200ms avg = 200 concurrent requests in-flight at any moment. If each instance handles 50 concurrent at 70% CPU, you need 4 instances plus headroom for spikes.
What is the target CPU utilization for a web server?
Target 60-70% CPU utilization under normal load, leaving headroom for traffic spikes. Running at 90%+ CPU causes degraded response times and leaves no room for sudden traffic increases. At 70% utilization, you can absorb roughly a 1.4x traffic spike before saturating; a sustained 2x spike would exceed capacity and requires scaling out.
What load balancer algorithm should I use?
Round-robin works for stateless services with similar request processing times. Least-connections is better when request processing times vary significantly — it routes to the server with fewest active connections. IP Hash is needed for sticky sessions (WebSockets, file uploads). Weighted round-robin is used when instances have different capacities.
How many instances should I have for high availability?
Minimum 2 instances for redundancy — if one fails, the other handles all traffic. For production, 3+ instances (N+1 redundancy) is recommended so one failure doesn't put remaining instances near capacity. For critical services, N+2 redundancy (2 can fail simultaneously) is common.
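The N+1 arithmetic can be sketched directly: when one of N equally loaded instances fails, its traffic redistributes across the survivors. The utilization figures here are illustrative assumptions:

```python
def utilization_after_failure(n, per_instance_util, failures=1):
    """Per-instance utilization after `failures` instances drop out,
    assuming traffic redistributes evenly across the survivors."""
    return per_instance_util * n / (n - failures)

# 2 instances at 60%: the lone survivor would need 120% capacity (overload).
print(utilization_after_failure(2, 0.60))
# 3 instances at 60% (N+1): survivors run at roughly 90%, tight but alive.
print(utilization_after_failure(3, 0.60))
```

This is why N+1 sizing checks the post-failure utilization, not just the instance count: two instances are "redundant" only if each normally runs below 50%.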