Our migration from the Kubernetes built-in NLB to the AWS Load Balancer Controller



Key Points:
- The maintenance dead-end: The built-in Kubernetes NLB integration is legacy code. AWS does not actively maintain it, leading to unresolved bugs and orphaned infrastructure.
- Feature limitations: The in-tree controller cannot handle modern networking requirements like PROXY protocol IP preservation or fine-grained target group attributes.
- Migration hazards: Switching controllers provisions an entirely new load balancer with a new DNS name. Managing this DNS crossover without dropping traffic requires strict routing governance.
Working with Kubernetes Services is convenient, especially when your cloud provider can provision a load balancer for you simply by declaring type: LoadBalancer.
At Qovery, our orchestration engine initially relied on the Kubernetes built-in Network Load Balancer (NLB). It seemed like the rational choice for maintaining cloud-agnostic deployments without adding extra dependencies.
The reality of Day-2 operations proved otherwise. We were forced to migrate to the AWS Load Balancer Controller (ALB Controller) to simplify management, stop billing leaks, and gain access to necessary routing features. If you are operating Amazon EKS clusters in production, moving to the out-of-tree controller from day one is non-negotiable.
The 1,000-cluster reality: why in-tree controllers fail at scale
Relying on the default Kubernetes load balancer works perfectly in a local development cluster. At an enterprise scale of thousands of clusters, relying on legacy in-tree cloud providers creates a massive financial and operational liability. An orphaned load balancer on a single cluster is an annoyance.
Across a fleet of hundreds of Amazon EKS clusters, orphaned load balancers generate thousands of dollars in cloud waste every month. Resolving this requires migrating to the AWS Load Balancer Controller and utilizing an Agentic Kubernetes Management Platform to enforce strict, standardized ingress configurations globally.
Why did we start with the in-tree NLB controller?
For our customers and many platform engineers, the built-in NLB is the default choice because it ships natively with Kubernetes.
- Kubernetes native: It uses native objects, reducing the need for deep AWS-specific knowledge.
- Cloud-agnostic intent: It theoretically makes it easier to migrate to other cloud providers without rewriting complex ingress manifests. As a platform managing multi-cloud deployments, we must maintain transparency for our customers.
- Low initial overhead: It requires zero additional Helm charts or IAM roles to install.
The operational cost of legacy code
Migration to the ALB Controller came four years after we initially adopted the built-in NLB. We survived without it for a long time, but the technical debt eventually compounded into critical failures.
We began facing severe infrastructure leaks. When a developer deleted an environment, the Kubernetes Service was removed, but the underlying AWS Network Load Balancer was not cleaned up correctly. AWS support confirmed they were no longer prioritizing fixes for the in-tree load balancer code, directing everyone to use their out-of-tree AWS Load Balancer Controller instead.
When you use the Kubernetes built-in NLB, you are entirely on your own. We had to manually instrument our Rust-based Qovery Engine to hunt down and delete orphaned AWS resources via the AWS API to enforce Kubernetes cost optimization.
// Fix for NLBs not properly removed by the legacy in-tree controller.
pub fn clean_up_deleted_k8s_nlb(
    event_details: EventDetails,
    target: &DeploymentTarget,
) -> Result<(), Box<EngineError>> {
    // Custom logic: list load balancers via the AWS API, match them
    // against live Kubernetes Services, and force-delete the orphans
    // to prevent massive cloud billing leaks.
    Ok(())
}

Feature gaps forced the migration
Beyond the bugs, we needed to leverage advanced AWS networking features that the built-in controller simply ignores. Moving to the AWS Load Balancer Controller provided access to critical annotations:
- PROXY protocol support: service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*". This annotation preserves the client source IP address, which is mandatory for strict security auditing and rate limiting.
- Direct pod routing: service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip". This bypasses kube-proxy and routes traffic directly to pod IP addresses, reducing network hops and lowering latency.
- Target group attributes: service.beta.kubernetes.io/aws-load-balancer-target-group-attributes. This allows fine-tuned control over the AWS target groups, such as tuning the deregistration delay or enabling sticky sessions directly from the Kubernetes manifest.
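As a sketch of how these annotations combine, the fragment below enables PROXY protocol on all ports and sets two target group attributes; the attribute values shown (a 30-second drain, source-IP stickiness) are illustrative choices, not the values we run in production:

```yaml
metadata:
  annotations:
    # Preserve the client source IP via PROXY protocol v2 on all ports
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    # Comma-separated key=value pairs applied to the AWS target group
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.timeout_seconds=30,stickiness.enabled=true,stickiness.type=source_ip
```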
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8443

The deployment hazard you must anticipate
When you migrate an existing Service from the in-tree controller to the AWS Load Balancer Controller, things will break if you are not careful.
The biggest failure point is DNS routing. The new controller provisions an entirely new load balancer with a completely new AWS DNS name. If you simply update your Service annotations on a live production deployment, Kubernetes will detach the old load balancer and spin up the new one. Because your external DNS (like Route53 or Cloudflare) still points to the old load balancer name, you will drop 100% of your incoming traffic while you wait for the new DNS records to propagate.
You must provision the new Service alongside the old one, update your DNS CNAME records, wait out the TTL expiration, and only then decommission the legacy Service.
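In practice, that crossover means cloning the Service under a temporary name so both load balancers serve traffic at once. A minimal sketch of the parallel Service, assuming the manifest from earlier (the -alb suffix is a hypothetical naming convention, not a Qovery requirement):

```yaml
# Second Service, managed by the AWS Load Balancer Controller,
# deployed alongside the legacy in-tree Service.
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-alb   # hypothetical temporary name for the crossover
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway      # same selector: both load balancers target the same pods
  ports:
    - port: 443
      targetPort: 8443
```

Once your DNS CNAME points at the new load balancer's hostname and the old record's TTL has expired, the original Service can be deleted safely.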
🚀 Real-world proof
Hyperline wanted to accelerate their time to market and avoid the overhead of building custom DevOps pipelines for developer testing.
⭐ The result: Eliminated the need for a dedicated DevOps engineer, saving significant costs and improving deployment confidence through automated ephemeral environments. Read the Hyperline case study.
Intent-based ingress with Qovery
Installing the AWS Load Balancer Controller requires configuring strict AWS IAM roles for Service Accounts (IRSA), deploying the Helm chart, and managing webhook certificates. Doing this manually across thousands of clusters introduces massive configuration drift.
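The IRSA piece of that setup boils down to a ServiceAccount annotated with an IAM role ARN, which the controller's Helm chart is then pointed at. A sketch, where the account ID and role name are placeholders for whatever your IAM setup defines:

```yaml
# ServiceAccount bound to an IAM role via IRSA; the Helm chart is
# configured to use this account instead of creating its own.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/AWSLoadBalancerControllerRole
```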
Qovery abstracts this complexity. As an Agentic Kubernetes Management Platform, Qovery natively handles the AWS Load Balancer Controller lifecycle across your Amazon EKS fleet.
# .qovery.yml
application:
  api-gateway:
    build_mode: docker
    ports:
      - internal_port: 8443
        publicly_accessible: true
        routing_type: custom_domain
Instead of fighting raw Kubernetes annotations and Terraform state files, platform teams declare their routing intent.
Qovery provisions the correct load balancers, attaches the target groups, and configures the networking automatically. This eliminates cost leaks from orphaned resources and ensures your ingress layer is permanently maintained.
FAQs
Why did AWS stop maintaining the in-tree Kubernetes load balancer?
The Kubernetes community mandated moving all cloud-specific provider code out of the core Kubernetes repository to reduce bloat and separate release cycles. AWS shifted all development focus to the out-of-tree AWS Load Balancer Controller, leaving the built-in controller as legacy code that receives no new features or non-critical bug fixes.
What happens when you delete an in-tree LoadBalancer Service on Amazon EKS?
Due to unpatched bugs in the legacy in-tree controller, deleting the Kubernetes Service frequently fails to trigger the deletion of the corresponding AWS Network Load Balancer. This leaves orphaned load balancers running in your AWS account, quietly consuming your cloud budget until you manually audit and delete them via the AWS console.
How do I migrate to the AWS Load Balancer Controller without downtime?
Migrating a Service to the new controller provisions a completely new AWS load balancer with a different DNS name. To avoid downtime, you must deploy the new Service alongside the old one, update your DNS CNAME records to point to the new load balancer, wait for the DNS TTL to expire globally, and then delete the legacy Service.
