Cloud Load Balancer Selection and Configuration: ALB, NLB, or CLB? Choose Right, Not Expensive

Last Black Friday, an e‑commerce client called me at midnight, panicked. “We added thirty more instances, but the system is still melting down. Users are retrying like crazy.”

I logged into their console. CPU and memory were fine on the surviving instances. But their load balancer’s health checks had marked half the backend targets as unhealthy. Traffic was slammed onto the remaining healthy ones. I looked at the settings: interval 5 seconds, timeout 2 seconds, two failures and the target was removed. Their backend occasionally saw a 300ms latency spike during peak load—just over 2 seconds. The load balancer kicked them out.

A single health check parameter nearly killed their biggest sales day.

This is the overlooked reality of load balancing: choosing the right type is only half the battle. Misconfigure it, and you’re still in trouble.

Today, let’s talk about cloud load balancers. Not the “load balancing is important” fluff, but a practical guide: ALB vs NLB vs CLB—how to choose? Health checks, sticky sessions, slow start—how to configure them? And what traps will ruin your day?

01 Layer 7 or Layer 4? The First Big Decision

Cloud providers offer three main types of load balancers: Layer 7 (HTTP/HTTPS), Layer 4 (TCP/UDP), and sometimes a hybrid. Your choice depends on your traffic and requirements.

Layer 7 (ALB / CLB Layer‑7)

These understand HTTP. They can route based on URL paths, headers, cookies. They support SSL termination, WebSockets, redirects, and rewrites.

Good for: Web applications, API gateways, microservice entry points. Anywhere you need path‑based or host‑based routing.

Not good for: Pure TCP traffic (like database connections, gaming long‑polling), extreme performance requirements (Layer‑7 processing adds overhead).

Layer 4 (NLB / CLB Layer‑4)

These operate at the transport layer. They look at IP and port, not application data. Performance is extremely high, latency is low, and they support millions of concurrent connections.

Good for: TCP/UDP traffic, database proxies, gaming services, IoT ingestion.

Not good for: Scenarios that need URL routing, cookie‑based stickiness, or SSL termination.

Counter‑intuitive truth: Not every workload belongs on Layer 7. Many people default to Layer 7 because “more features must be better.” But Layer‑7 processing can be several times heavier than Layer‑4. For pure TCP traffic, using Layer 7 is just wasting capacity.

Real example: A gaming company used a Layer‑7 load balancer for TCP long‑polling connections. Each instance topped out at 50,000 connections. Switching to Layer‑4 NLB, the same instance handled over a million connections.

02 Core Configurations: Details That Make or Break You

Choose the right type. Then configure it correctly.

Health Checks

This is the most fragile configuration—and the one that affects stability the most.

Interval: Default is 5‑10 seconds. Shorter isn’t always better. Too short adds load and may treat momentary jitter as failure. For most applications, 10 seconds is fine. For latency‑sensitive services, 5 seconds may be appropriate.
Timeout: Usually 1/2 to 2/3 of the interval. If your backend sometimes processes slowly (e.g., during peak load), set timeout generously. A timeout that’s too tight will mark healthy backends as unhealthy.
Healthy threshold: How many consecutive successes before marking a target healthy again. Default is 2‑3. Setting it too low might send traffic to an instance that hasn’t fully warmed up.
Unhealthy threshold: How many consecutive failures before marking a target unhealthy. Default is 2‑3. Too sensitive, and you’ll eject instances for minor hiccups. Too lenient, and you’ll keep failing instances in rotation.

Golden rule: It’s better to keep a questionable target than to eject a healthy one. A questionable target might cause a few slow responses. Ejecting a healthy one can overload your remaining capacity and trigger a cascade.

Sticky Sessions (Session Affinity)

When enabled, all requests from a client go to the same backend. Useful for legacy applications that store session state locally.

But the cost is high: Stickiness can cause load imbalance. One “heavy” user can overload a single backend while others sit idle. Avoid it if you can. If you must keep state, move it to a shared store like Redis. Stateless backends are always better.

Slow Start

Newly registered targets receive a gradually increasing share of traffic, not the full load immediately. This prevents a just‑started instance from being crushed while its caches are cold and its JVM is still warming up.

For flash sales or any traffic surge, slow start is essential. Without it, your freshly scaled instances will get the full traffic spike immediately and likely time out.

03 Common Traps (And How to Avoid Them)

Trap 1: Cross‑zone data transfer charges

Many people don’t realize that cross‑zone traffic can incur charges. If your targets are unevenly distributed across Availability Zones, the load balancer may forward a request from a client in AZ‑A to a target in AZ‑B—and you pay for that cross‑zone transfer. Solution: Place at least one target in each AZ. Then the load balancer can keep traffic local.

Trap 2: Misconfigured connection timeouts

The idle timeout between the load balancer and the backend must be set appropriately. Too short, and long‑running requests get cut off (client sees 504). Too long, and you may accumulate zombie connections. For typical web apps, 60‑120 seconds works. For file uploads or streaming, increase it.

Trap 3: Inconsistent keep‑alive settings

Load balancers often reuse connections to backends. If your backend’s keep‑alive timeout is shorter than the load balancer’s, the backend may close the connection while the load balancer still thinks it’s open. Result: errors. Solution: Match the keep‑alive settings on both sides.

04 Selection Decision Table

Workload	Recommended Type	Why
Standard web / API	ALB / L7 CLB	Path‑based routing, SSL termination, rewrites
Microservice gateway	ALB / L7 CLB	Header‑based routing
TCP long‑polling	NLB / L4 CLB	High performance, massive connections
UDP traffic	NLB / L4 CLB	L7 doesn’t support UDP
Database proxy	NLB / L4 CLB	Pure TCP, no L7 features needed
Global multi‑region	GA / Global Accelerator	Intelligent routing across regions
WebSockets	ALB / L7 CLB	Native support
gRPC	ALB / L7 CLB	Native gRPC routing; L4 just passes raw bytes

05 A Real Story: Fixing a Black Friday Near‑Miss

Back to the e‑commerce client. We debugged through the night.

The root cause: health check timeout was 2 seconds, interval 5 seconds, unhealthy threshold 2. During the traffic surge, backend response times occasionally exceeded 2 seconds. The load balancer marked those targets unhealthy. They were ejected, then re‑added after passing a few checks, then ejected again. Traffic oscillated. A cascading failure was imminent.

We adjusted the settings: timeout increased to 5 seconds, interval left at 5 seconds, unhealthy threshold changed from 2 to 3. The health checks stabilized. Traffic balanced again.

The next day, peak traffic was even higher than the night before, but the system held.

Their ops lead later said: “I used to think health checks were just a checkbox. Now I know—set them wrong, and they’re worse than having no checks at all.”

The Bottom Line

A load balancer seems simple—just distribute traffic. But the details matter. Choose the wrong type, and you leave performance on the table. Configure health checks badly, and you destabilize your whole system.

Remember the rules: Prefer Layer 4 unless you truly need Layer 7 features. Make health checks tolerant, not aggressive. Avoid sticky sessions unless you have no other choice. Enable slow start for any scaled event. Watch your cross‑zone traffic.

That ops lead added one more thing: “A load balancer is like the front door of a building. If the door isn’t wide enough, it doesn’t matter how big the rooms are inside.”

How wide is your front door?