Jean-Gabriel Gill-Couture·LinkedIn
TIL: publishNotReadyAddresses — the last hurdle to get our NATS SuperCluster online across 3 OKD sites
A classic chicken-and-egg clustering problem: JetStream requires quorum across 3 sites before it becomes healthy, but Kubernetes won't route traffic to pods that are not yet ready. The fix was a single Kubernetes Service flag, publishNotReadyAddresses, that opens traffic to pods before they're ready.
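As a rough illustration of where that flag lives, here is a minimal Python sketch that builds a Service manifest with `publishNotReadyAddresses` set. The field itself is part of the Kubernetes Service spec; the Service name, the `app` label, and the port number are hypothetical placeholders, not values taken from the post.

```python
import json


def service_with_not_ready_addresses(name: str, app_label: str, port: int) -> dict:
    """Build a Service manifest whose endpoints include pods that are not Ready.

    The name, label, and port passed in are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Publish pod addresses even before readiness probes pass,
            # so peers can reach each other while quorum is still forming.
            "publishNotReadyAddresses": True,
            "selector": {"app": app_label},
            "ports": [{"name": "gateway", "port": port, "targetPort": port}],
        },
    }


# Hypothetical values, for illustration only.
manifest = service_with_not_ready_addresses("nats-gateway", "nats", 7222)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.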
Read on LinkedIn →
Jean-Gabriel Gill-Couture·LinkedIn
Why hyperscalers are thinking inside the box — and we achieve a PUE of 0.1
Andrew Ng is right about hyperscalers achieving PUE of 1.1–1.2. But we deploy compute within existing buildings as heating appliances, achieving PUE of 0.1. The energy used to heat a home matches the energy drawn by the compute needed to host all household workloads.
Read on LinkedIn →
Jean-Gabriel Gill-Couture·LinkedIn
Nobody supported RTX 5090 on OKD 4.20? We did it — and contributed back to NVIDIA
When a client needed GPU support on OKD 4.20 (CentOS Stream CoreOS 10), the official NVIDIA operator had open tickets and no support for our versions. We rolled up our sleeves, made it work, and contributed two open-source fixes — including one directly to NVIDIA.
Read on LinkedIn →
Sylvain Tremblay·LinkedIn
How a 'Healthy' Router pod silently broke 33% of production traffic
All Router pods reported Healthy, but 33% of requests returned 502. A NotReady node meant one router stopped receiving config updates — yet external HAProxy kept sending it traffic. The fix: a custom Rust DaemonSet watchdog that validates the Control Plane ↔ Data Plane link.
Read on LinkedIn →
Sylvain Tremblay·LinkedIn · 144 reactions · 97 comments · 20k+ impressions
The x86-64-v3 wall: virtualization won't save your old hardware
Tried deploying OKD 4.21 on HP Gen8 servers with Ivy Bridge CPUs. SCOS10 (RHEL 10 baseline) mandates AVX2 and FMA — which Ivy Bridge simply doesn't have. KVM is a hardware accelerator, not an emulator. If your CPU predates Haswell (2013), you've hit the v3 wall.
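To make the v3 wall concrete, the following Python sketch checks a CPU flag string for the instruction-set extensions that x86-64-v3 adds on top of v2. It is an illustrative check, not the post's own tooling: the flag names follow Linux `/proc/cpuinfo` spelling (LZCNT is listed there as `abm`, and `xsave` is used here as a stand-in for OSXSAVE), the v2 baseline is not verified, and the sample flag string is a hand-written subset rather than output captured from real hardware.

```python
# Features added by x86-64-v3, spelled as in Linux /proc/cpuinfo.
# Assumption: "abm" stands for LZCNT and "xsave" approximates OSXSAVE.
X86_64_V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def missing_v3_flags(cpu_flags: str) -> set[str]:
    """Return the v3 features absent from a space-separated flag string."""
    return set(X86_64_V3_FLAGS) - set(cpu_flags.split())


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Read the first 'flags' line from /proc/cpuinfo (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Hand-written, Ivy Bridge-like subset of flags (illustrative, not a real dump).
sample_flags = "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt avx f16c xsave rdrand"
missing = missing_v3_flags(sample_flags)
print("missing v3 features:", sorted(missing))
```

On a Linux host, `missing_v3_flags(read_cpu_flags())` returning an empty set suggests the CPU exposes the v3 additions. Running the same check inside a KVM guest only reflects what the host CPU can pass through, which is the point the post makes about virtualization.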
Read on LinkedIn →
Sylvain Tremblay·LinkedIn
When every test costs money, you test less — a cloud billing story
A client had idle physical machines and a recurring feeling that his cloud bill didn't match the value received. We built him a dual-firewall, fully HA private infrastructure on second-hand hardware using OPNsense, OKD, and Ceph. Result: variable billing dropped to zero, and his team now deploys entire stacks in three clicks.
Read on LinkedIn →
Sylvain Tremblay·LinkedIn Article
A morning bare-metal adventure: replacing a server in an MCD with no service interruption
Servers heat my house. Here is how I replaced a faulty server in a Kubernetes (OKD) cluster hosting a 248 TiB Ceph cluster, in about 2 hours, with no service interruption at all. A detailed account of the physical and software steps.
Read on LinkedIn →