We Build The Future Of The Cloud ⚡️



This video has been recorded during our retreat in the south of France 🇫🇷 in Feb 2022.
Join us! We are hiring...

Suggested articles

The part teams consistently underestimate is that OPA Gatekeeper, the tool most people reach for first, only enforces policy at the cluster level. It blocks non-compliant resources from being created within a single cluster. Propagating consistent Gatekeeper policies across 300 clusters, and detecting when those policies drift, is a fleet orchestration problem that Gatekeeper was not designed to solve.


Training a model in a notebook is easy. What breaks teams is the step after, serving it reliably without haemorrhaging cloud budget or burying your SREs in YAML. The common trap: picking a platform that handles the model but not the surrounding stack. An AI deployment platform should orchestrate the full application graph (inference endpoints, vector databases, caching layers, and frontends) inside a single VPC, with GPU autoscaling that doesn't require a dedicated platform engineer to babysit.


The mistake teams make early is assuming Kubernetes namespaces provide sufficient isolation between workloads or teams. They do not. Namespaces share the control plane, the node pool, and the underlying network fabric. A misconfigured workload in one namespace can exhaust node capacity or crash the API server for every other namespace simultaneously. That is when the multi-cluster conversation starts.


Migrating from nginx Ingress to Envoy Gateway? Discover how Alan migrated 100+ services in one month, the technical hurdles they faced (like Content-Length normalization), and why staging isn't always enough.


Kubernetes assigns an entire physical GPU to a single pod by default. NVIDIA MIG solves the hardware partitioning side: one A100 becomes up to seven isolated slices. The part teams underestimate is the orchestration layer: device plugin configuration, node labeling, taints, and pod affinity rules all need to be correct before Kubernetes can actually schedule onto those slices.


The cluster coming up is the easy part. What catches teams off guard is what happens six months later: certificates expire without a single alert, node pools run at 40% over-provisioned because nobody revisited the initial resource requests, and a manual kubectl patch applied during a 2am incident is now permanent state. Agentic control planes enforce declared state continuously. Monitoring tools just report the problem.


The instinct when setting up Kubernetes observability is to instrument everything and send it all to your APM vendor. That works fine at ten nodes. At a hundred, the bill becomes a board-level conversation. The less obvious problem is the fix most teams reach for: aggressive sampling. That is how intermittent failures affecting 1% of requests disappear from your monitoring entirely.


Scaling your deployments to zero is only half the battle. If your cluster autoscaler does not aggressively bin-pack and terminate the underlying worker nodes, you are still paying for idle metal. True environment sleeping requires tight integration between your ingress layer and your node provisioner to actually realize FinOps savings.

It’s time to change the way you manage K8s
Turn Kubernetes into your strategic advantage with Qovery, automating the heavy lifting while you stay in control.


.webp)