Why You Need a Load Balancer
When you decide to horizontally scale, you add more servers. But clients can't talk to "the server" anymore—there are multiple. You need something in front that receives requests and sends them to the right place. That's a load balancer.
Clients talk to the load balancer. The load balancer routes traffic to one of your web servers. Clients no longer talk to the servers directly.
For better security, the servers inside your cluster can be reachable only through private IPs. The load balancer sits at the edge (or in a DMZ) and is the only public entry point. That's the typical setup inside a VPC (Virtual Private Cloud): your servers live in a private network, and traffic comes in through the load balancer.
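To make that concrete, here is a toy sketch of a load balancer acting as the single public entry point and forwarding requests round-robin to backends on private addresses. The backend IPs, ports, and GET-only handling are assumptions for illustration, not a production setup.

```python
# Toy load balancer: the only publicly reachable process.
# Backend addresses below are made-up private IPs for illustration.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Backends live on private IPs; only the load balancer is public.
BACKENDS = ["http://10.0.1.10:8080", "http://10.0.1.11:8080"]
_backends = itertools.cycle(BACKENDS)  # simple round-robin rotation

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pick the next backend and forward the request path to it.
        backend = next(_backends)
        with urllib.request.urlopen(backend + self.path) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The load balancer binds to a public interface; clients only ever see it.
    HTTPServer(("0.0.0.0", 8000), ProxyHandler).serve_forever()
```

A real deployment would use nginx, HAProxy, or a cloud load balancer rather than hand-rolled proxying, but the shape is the same: one public listener, many private backends.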
What Does a Load Balancer Do?
Spread the Load
The load balancer distributes requests across your servers so no single server gets overwhelmed. Each server does less work, stays responsive, and you handle more traffic overall.
Improve Availability
When a server goes down (crash, reboot, deployment), the load balancer stops sending traffic to it and routes only to healthy servers. Users don't see errors—traffic just goes somewhere else. When the server comes back, the load balancer can start sending traffic to it again.
Load Balancing Algorithms
How does the load balancer decide which server gets the next request? Common algorithms:
| Algorithm | How it works |
|---|---|
| Round Robin | Send each request to the next server in order. Simple, fair for equal-capacity servers. |
| Least Connections | Send to the server with the fewest active connections. Good when requests have different durations. |
| Least Response Time | Send to the server with the fastest recent response time. Routes away from slow or busy servers. |
| Least Bandwidth | Send to the server currently using the least bandwidth. |
| Weighted | Some servers are stronger than others. Assign weights and distribute proportionally. |
Pick based on your workload. Round Robin is the simplest; Least Connections or Least Response Time often work better when requests vary in cost or duration.
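To illustrate, here are minimal sketches of a few strategies from the table above. The `Server` fields (`active_connections`, `weight`) and the in-memory server list are assumptions made for the example; a real load balancer tracks this state internally.

```python
# Sketches of three selection strategies, assuming a simple Server record.
import itertools
import random
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    active_connections: int = 0  # requests currently in flight
    weight: int = 1              # relative capacity

servers = [Server("web-1", weight=3), Server("web-2"), Server("web-3")]

# Round Robin: hand out servers in order, wrapping around at the end.
_rr = itertools.cycle(servers)
def round_robin() -> Server:
    return next(_rr)

# Least Connections: pick the server with the fewest in-flight requests.
def least_connections() -> Server:
    return min(servers, key=lambda s: s.active_connections)

# Weighted: pick servers in proportion to their weight
# (web-1 gets roughly 3x the traffic of the others here).
def weighted() -> Server:
    return random.choices(servers, weights=[s.weight for s in servers])[0]
```

In practice the balancer would also increment and decrement `active_connections` as requests start and finish, and Least Response Time would feed recent latency measurements into a similar `min()` selection.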
Health Checks
How does the load balancer know which servers are still alive?
Health checks. The load balancer periodically hits an endpoint on each server (e.g., /health) or checks if a port is open. If the server doesn't respond in time or returns an error, the load balancer marks it unhealthy and stops sending traffic until it starts responding again.
Without health checks, the load balancer would keep sending requests to dead or overloaded servers. Health checks are what make failover and high availability possible.
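A health checker can be as simple as a loop that polls each backend and maintains a set of healthy servers for the routing logic to choose from. This is a minimal sketch: the `/health` path matches the example above, while the backend URLs, interval, timeout, and 200-only success rule are placeholder assumptions.

```python
# Minimal health-check loop: poll each backend's /health endpoint and
# keep a shared "healthy" set that the routing code reads from.
import time
import urllib.error
import urllib.request

BACKENDS = ["http://10.0.1.10:8080", "http://10.0.1.11:8080"]
CHECK_INTERVAL = 5  # seconds between rounds of checks
TIMEOUT = 2         # seconds before a check counts as a failure

healthy = set(BACKENDS)  # routing code should only pick from this set

def check_backends():
    for backend in BACKENDS:
        try:
            with urllib.request.urlopen(backend + "/health", timeout=TIMEOUT) as resp:
                ok = resp.status == 200
        except (urllib.error.URLError, OSError):
            ok = False  # timeout, connection refused, etc.
        if ok:
            healthy.add(backend)      # recovered: resume sending traffic
        else:
            healthy.discard(backend)  # unhealthy: stop sending traffic

if __name__ == "__main__":
    while True:
        check_backends()
        time.sleep(CHECK_INTERVAL)
```

Real load balancers add refinements on top of this loop, such as requiring several consecutive failures before marking a server unhealthy so a single slow response doesn't trigger failover.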