System Design

Load Balancer

February 14, 2026 · 3 min read

Why You Need a Load Balancer

When you decide to horizontally scale, you add more servers. But clients can't talk to "the server" anymore—there are multiple. You need something in front that receives requests and sends them to the right place. That's a load balancer.

Clients talk to the load balancer. The load balancer routes traffic to one of your web servers. Clients no longer talk to the servers directly.

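To make that concrete, here's a toy sketch in Python of what a load balancer does at its core: accept a client request, pick a backend, and forward the request. The backend addresses and ports are made up for illustration, and a real load balancer handles far more (TLS, timeouts, error handling, concurrency).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from itertools import cycle
import urllib.request

# Hypothetical backend addresses on a private network; clients never see these.
BACKENDS = cycle(["http://10.0.0.11:8080", "http://10.0.0.12:8080"])

class LoadBalancer(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)  # pick the next backend (simple rotation here)
        with urllib.request.urlopen(backend + self.path) as resp:
            status, body = resp.status, resp.read()
        # Relay the backend's response to the client.
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients connect here; the backends are only reached through this process.
    HTTPServer(("0.0.0.0", 8000), LoadBalancer).serve_forever()
```

Clients only ever connect to this process; the backends can sit on private addresses behind it.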

For even more security, the servers inside your cluster can be reachable only through private IPs. The load balancer sits at the edge (or in a DMZ) and is the only public entry point. That's the idea behind a VPC (Virtual Private Cloud)—your servers live in a private network, and traffic comes in through the load balancer.


What Does a Load Balancer Do?

Spread the Load

The load balancer distributes requests across your servers so no single server gets overwhelmed. Each server does less work, stays responsive, and you handle more traffic overall.

Improve Availability

When a server goes down (crash, reboot, deployment), the load balancer stops sending traffic to it and routes only to healthy servers. Users don't see errors—traffic just goes somewhere else. When the server comes back, the load balancer can start sending traffic to it again.


Load Balancing Algorithms

How does the load balancer decide which server gets the next request? Common algorithms:

| Algorithm | How it works |
| --- | --- |
| Round Robin | Send each request to the next server in order. Simple, fair for equal-capacity servers. |
| Least Connections | Send to the server with the fewest active connections. Good when requests have different durations. |
| Least Response Time | Send to the server with the fastest recent response time. Routes away from slow or busy servers. |
| Least Bandwidth | Send to the server currently using the least bandwidth. |
| Weighted | Some servers are stronger than others. Assign weights and distribute proportionally. |

Pick based on your workload. Round Robin is the simplest; Least Connections or Least Response Time often work better when requests vary in cost or duration.
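
As a rough sketch, here's how a few of these selection strategies might look in code. The server names, weights, and connection counts are made up for illustration:

```python
import random
from itertools import cycle

class Server:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight              # relative capacity, used by weighted selection
        self.active_connections = 0       # would be updated as requests start and finish

def least_connections(servers):
    # Pick the server currently handling the fewest in-flight requests.
    return min(servers, key=lambda s: s.active_connections)

def weighted_choice(servers):
    # Pick a server at random, with probability proportional to its weight.
    return random.choices(servers, weights=[s.weight for s in servers], k=1)[0]

# Round robin is just cycling through the pool in a fixed order.
pool = [Server("web-1", weight=2), Server("web-2"), Server("web-3")]
round_robin = cycle(pool)

print(next(round_robin).name)         # web-1, then web-2, then web-3, then web-1, ...
pool[0].active_connections = 7        # pretend web-1 is busy
print(least_connections(pool).name)   # web-2 (fewest active connections)
print(weighted_choice(pool).name)     # web-1 about half the time (weight 2 of 4 total)
```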


Health Checks

How does the load balancer know which servers are still alive?

Health checks. The load balancer periodically hits an endpoint on each server (e.g., /health) or checks if a port is open. If the server doesn't respond in time or returns an error, the load balancer marks it unhealthy and stops sending traffic until it starts responding again.

Without health checks, the load balancer would keep sending requests to dead or overloaded servers. Health checks are what make failover and high availability possible.
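
Here's a minimal sketch of an active health checker, assuming each server exposes a /health endpoint; the endpoint name, timeout, and check interval are arbitrary choices, not a standard:

```python
import time
import urllib.request

# Hypothetical backends and their current health status (True = healthy).
BACKENDS = {
    "http://10.0.0.11:8080": True,
    "http://10.0.0.12:8080": True,
}

def is_healthy(backend, timeout=2.0):
    # Healthy = the /health endpoint answers with HTTP 200 within the timeout.
    try:
        with urllib.request.urlopen(backend + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error, ...
        return False

def health_check_loop(interval=5.0):
    # Re-check every backend on a fixed interval; the routing algorithm should
    # only ever pick from the servers currently marked healthy.
    while True:
        for backend in BACKENDS:
            BACKENDS[backend] = is_healthy(backend)
        healthy = [b for b, ok in BACKENDS.items() if ok]
        print("healthy backends:", healthy)
        time.sleep(interval)
```

Real load balancers typically require several consecutive failed checks before marking a server unhealthy (and several successful ones before bringing it back), so a single slow response doesn't flap it in and out of rotation.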