Our migration from Kubernetes built-in NLB to ALB controller

A standard trap is trusting the in-tree Kubernetes service load balancer on AWS. When you delete a Service of type LoadBalancer, the in-tree controller frequently fails to delete the underlying AWS resources. You end up with dozens of orphaned Network Load Balancers silently racking up massive cloud bills. Transitioning to the out-of-tree AWS Load Balancer Controller is mandatory to stop the bleeding.
April 17, 2026
Pierre Mavro
CTO & Co-founder

Key Points:

  • The maintenance dead-end: The built-in Kubernetes NLB integration is legacy code. AWS does not actively maintain it, leading to unresolved bugs and orphaned infrastructure.
  • Feature limitations: The in-tree controller cannot handle modern networking requirements like PROXY protocol IP preservation or fine-grained target group attributes.
  • Migration hazards: Switching controllers provisions an entirely new load balancer with a new DNS name. Managing this DNS crossover without dropping traffic requires strict routing governance.

Working with Kubernetes Services is convenient, especially when you can deploy Load Balancers via cloud providers simply by declaring type: LoadBalancer.
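That convenience looks like this in practice — a minimal manifest (service name and ports are illustrative) where the cloud provider does all the heavy lifting:

```yaml
# minimal Service; the in-tree cloud provider provisions the load balancer for you
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical service name
spec:
  type: LoadBalancer   # this one line triggers AWS NLB creation
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```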

At Qovery, our orchestration engine initially relied on the Kubernetes built-in Network Load Balancer (NLB). It seemed like the rational choice for maintaining cloud-agnostic deployments without adding extra dependencies.

The reality of Day-2 operations proved otherwise. We were forced to migrate to the AWS Load Balancer Controller (ALB Controller) to simplify management, stop billing leaks, and gain access to necessary routing features. If you are operating Amazon EKS clusters in production, moving to the out-of-tree controller from day one is non-negotiable.

The 1,000-cluster reality: why in-tree controllers fail at scale

Relying on the default Kubernetes load balancer works fine in a local development cluster. At an enterprise scale of thousands of clusters, however, legacy in-tree cloud providers create a massive financial and operational liability. An orphaned load balancer on a single cluster is an annoyance.

Across a fleet of hundreds of Amazon EKS clusters, orphaned load balancers generate thousands of dollars in cloud waste every month. Resolving this requires migrating to the AWS Load Balancer Controller and utilizing an Agentic Kubernetes Management Platform to enforce strict, standardized ingress configurations globally.

Why we started with the in-tree NLB controller

For our customers and many platform engineers, the built-in NLB is the default choice because it ships natively with Kubernetes.

  • Kubernetes native: It uses native objects, reducing the need for deep AWS-specific knowledge.
  • Cloud-agnostic intent: It theoretically makes it easier to migrate to other cloud providers without rewriting complex ingress manifests. As a platform managing multi-cloud deployments, we need this layer to stay transparent to our customers.
  • Low initial overhead: It requires zero additional Helm charts or IAM roles to install.

The operational cost of legacy code

Migration to the ALB Controller came four years after we initially adopted the built-in NLB. We survived without it for a long time, but the technical debt eventually compounded into critical failures.

We began facing severe infrastructure leaks. When a developer deleted an environment, the Kubernetes Service was removed, but the underlying AWS Network Load Balancer was not cleaned up correctly. AWS support confirmed they were no longer prioritizing fixes for the in-tree load balancer code, directing everyone to use their out-of-tree AWS Load Balancer Controller instead.

When you use the Kubernetes built-in NLB, you are entirely on your own. We had to manually instrument our Rust-based Qovery Engine to hunt down and delete orphaned AWS resources via the AWS API to enforce Kubernetes cost optimization.

// fix for NLBs not properly removed by the legacy in-tree controller
pub fn clean_up_deleted_k8s_nlb(
    event_details: EventDetails,
    target: &DeploymentTarget,
) -> Result<(), Box<EngineError>> {
    // custom logic to force-delete orphaned AWS load balancers:
    // list NLBs via the AWS API, diff them against live Kubernetes Services,
    // and delete the leftovers to prevent massive cloud billing leaks
}
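To illustrate the reconciliation such cleanup needs — with made-up data, not the actual Qovery Engine internals — the core logic diffs the owning-Service tags on AWS load balancers (the in-tree provider typically records these under kubernetes.io/service-name) against the Services that still exist in the cluster:

```rust
use std::collections::HashSet;

// Sketch: find load balancers whose owning Service no longer exists.
// Inputs are illustrative stand-ins for AWS API and Kubernetes API results.
fn find_orphaned_load_balancers<'a>(
    live_services: &[&str],                  // "namespace/name" of Services still in the cluster
    lb_owner_tags: &'a [(&'a str, &'a str)], // (load balancer ARN, owning-Service tag value)
) -> Vec<&'a str> {
    let live: HashSet<&str> = live_services.iter().copied().collect();
    lb_owner_tags
        .iter()
        .filter(|(_, owner)| !live.contains(owner))
        .map(|(arn, _)| *arn)
        .collect()
}

fn main() {
    let live = ["production/api-gateway"];
    let lbs = [
        ("arn:aws:elasticloadbalancing:eu-west-3:111111111111:loadbalancer/net/a/1", "production/api-gateway"),
        ("arn:aws:elasticloadbalancing:eu-west-3:111111111111:loadbalancer/net/b/2", "staging/old-env"),
    ];
    // only the load balancer whose Service was deleted is flagged for removal
    for arn in find_orphaned_load_balancers(&live, &lbs) {
        println!("orphaned: {arn}");
    }
}
```

The actual deletion step then goes through the AWS API, which is why this has to live in the engine rather than in Kubernetes itself.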

Feature gaps forced the migration

Beyond the bugs, we needed to leverage advanced AWS networking features that the built-in controller simply ignores. Moving to the AWS Load Balancer Controller provided access to critical annotations:

  • PROXY protocol support: service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*". This annotation preserves the client source IP address, which is mandatory for strict security auditing and rate limiting.
  • Direct pod routing: service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip". This bypasses kube-proxy and routes traffic directly to the pod IP addresses, reducing network hops and lowering latency.
  • Target group attributes: service.beta.kubernetes.io/aws-load-balancer-target-group-attributes. This allows fine-tuned control over the AWS target groups, such as enabling deregistration delay or sticky sessions directly from the Kubernetes manifest.
For example, a Service that opts into the AWS Load Balancer Controller with direct pod routing:

apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8443
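The advanced annotations from the list above drop into the same metadata.annotations block; the values here are illustrative, not recommendations:

```yaml
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "deregistration_delay.timeout_seconds=30,stickiness.enabled=true"
```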

The deployment hazard you must anticipate

When you migrate an existing Service from the in-tree controller to the AWS Load Balancer Controller, things will break if you are not careful.

The biggest failure point is DNS routing. The new controller provisions an entirely new load balancer with a completely new AWS DNS name. If you simply update your Service annotations on a live production deployment, Kubernetes will detach the old load balancer and spin up the new one. Because your external DNS (like Route53 or Cloudflare) still points to the old load balancer name, you will drop 100% of your incoming traffic while you wait for the new DNS records to propagate.

You must provision the new Service alongside the old one, update your DNS CNAME records, wait out the TTL expiration, and only then decommission the legacy Service.
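In practice that means two Services running side by side during the crossover. A sketch of the temporary second Service (the name is hypothetical; it points at the same pods as the legacy one):

```yaml
# temporary Service managed by the AWS Load Balancer Controller;
# the legacy in-tree Service keeps serving traffic until DNS has moved over
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-albc        # hypothetical name, runs alongside "api-gateway"
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway            # same pods as the legacy Service
  ports:
    - port: 443
      targetPort: 8443
```

Only after your CNAME points at the new load balancer and the old TTL has fully expired is it safe to delete the original Service.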

Intent-based ingress with Qovery

Installing the AWS Load Balancer Controller requires configuring strict AWS IAM roles for Service Accounts (IRSA), deploying the Helm chart, and managing webhook certificates. Doing this manually across thousands of clusters introduces massive configuration drift.
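For reference, a manual install of the controller is typically driven by Helm values along these lines (the cluster name and IAM role ARN are placeholders you must supply):

```yaml
# values for the eks/aws-load-balancer-controller Helm chart
clusterName: my-eks-cluster                     # placeholder
serviceAccount:
  create: true
  name: aws-load-balancer-controller
  annotations:
    # IRSA: bind the controller to an IAM role carrying the required ELB permissions
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/aws-load-balancer-controller  # placeholder
```

Multiply the IAM policy, the role trust relationship, and these values by every cluster in a fleet, and the drift risk becomes obvious.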

Qovery abstracts this complexity. As an Agentic Kubernetes Management Platform, Qovery natively handles the AWS Load Balancer Controller lifecycle across your Amazon EKS fleet.

# .qovery.yml
application:
  api-gateway:
    build_mode: docker
    ports:
      - internal_port: 8443
        publicly_accessible: true
        routing_type: custom_domain

Instead of fighting raw Kubernetes annotations and Terraform state files, platform teams declare their routing intent.

Qovery provisions the correct load balancers, attaches the target groups, and configures the networking automatically. This eliminates cost leaks from orphaned resources and ensures your ingress layer is permanently maintained.

FAQs

Why did AWS stop maintaining the in-tree Kubernetes load balancer?

The Kubernetes community mandated moving all cloud-specific provider code out of the core Kubernetes repository to reduce bloat and separate release cycles. AWS shifted all development focus to the out-of-tree AWS Load Balancer Controller, leaving the built-in controller as legacy code that receives no new features or non-critical bug fixes.

What happens when you delete an in-tree LoadBalancer Service on Amazon EKS?

Due to unpatched bugs in the legacy in-tree controller, deleting the Kubernetes Service frequently fails to trigger the deletion of the corresponding AWS Network Load Balancer. This leaves orphaned load balancers running in your AWS account, quietly consuming your cloud budget until you manually audit and delete them via the AWS console.

How do I migrate to the AWS Load Balancer Controller without downtime?

Migrating a Service to the new controller provisions a completely new AWS load balancer with a different DNS name. To avoid downtime, you must deploy the new Service alongside the old one, update your DNS CNAME records to point to the new load balancer, wait for the DNS TTL to expire globally, and then delete the legacy Service.
