Case Study
March 26, 2025

How a Fortune 1000 Enterprise Standardized Multi-Cloud (EKS & GKE) Upgrades for 30+ Add-Ons, Avoided 6x Costs, and Achieved an 80% Reduction in Prep Time

Written by
Chkk Team

Challenge

The Platform Engineering team at a Fortune 1000 enterprise had built a Platform on EKS and GKE, supporting hundreds of software developers running thousands of applications. These applications were mission-critical, as online transactions constituted a sizeable fraction of the company's revenue. The platform ran 30+ complex, open-source add-ons, including Istio/Envoy for Service Mesh, Cilium for Networking & Security, HashiCorp Vault, Gloo, Redis, and Postgres.

GKE and EKS each require at least three Kubernetes upgrades every year, and each add-on requires at least one or two upgrades per year. This continuous upgrade treadmill created several challenges for the Platform team:

  • 500% increase in cluster costs – GKE and EKS had launched Extended Support, and any cluster not upgraded in time incurred a surcharge that made it 6x as expensive to run.
  • Risk of forced upgrades – Falling behind on upgrade cycles could have resulted in forced upgrades, an extreme and disruptive event that posed a business continuity risk.
  • Complex add-ons had to be upgraded separately and without application disruption – To contain the blast radius during cluster upgrades, the Platform team needed to upgrade Istio, Cilium, Vault, and other add-ons separately from the clusters. The team also had to ensure zero downtime during each upgrade cycle.
  • Standardization of a multi-cloud upgrade strategy – GKE and EKS upgrades had diverged into snowflakes, with different team members acting as Subject Matter Experts (SMEs) for each cloud. The Platform team sought to standardize the upgrade process and eliminate dependence on SMEs.
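
To make the size of this treadmill concrete, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (three Kubernetes upgrades per year, 30 add-ons at one to two upgrades each, and a 500% cost increase). Because the post describes these as minimums, the results are lower bounds. The function names are hypothetical and are not part of any Chkk or cloud provider API.

```python
def upgrades_per_cluster_per_year(k8s_upgrades, addon_count, upgrades_per_addon):
    """Total upgrade events a single cluster needs in one year."""
    return k8s_upgrades + addon_count * upgrades_per_addon


def cost_multiplier(percent_increase):
    """Convert a percentage increase into a multiple of the original cost."""
    return 1 + percent_increase / 100


# Figures from the post: 3 Kubernetes upgrades, 30 add-ons, 1 to 2 upgrades per add-on.
low = upgrades_per_cluster_per_year(3, 30, 1)
high = upgrades_per_cluster_per_year(3, 30, 2)
print(f"Upgrade events per cluster per year: {low} to {high}")

# A 500% increase on top of the original cost means paying 6x in total.
print(f"Extended Support cost multiplier: {cost_multiplier(500):.0f}x")
```

Under these assumptions, a single cluster faces somewhere between 33 and 63 upgrade events a year, which is why the 500% increase and the 6x figure refer to the same penalty.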

Solution

The Platform team implemented Chkk’s Operational Safety Platform to simplify upgrade management and ensure that upgrades were disruption-free.

  • Accelerating Upgrade Process – Chkk automated key upgrade tasks such as dependency analysis, release note processing, and impact assessment across hundreds of add-ons, cutting down research and planning time by up to 8x.
  • Upgrade Copilot & Preverified Plans – Chkk’s Upgrade Copilot automated tedious pre-work and delivered Preverified Upgrade Plans for clusters and add-ons, tested on a digital twin of the team’s infrastructure, ensuring safe, well-orchestrated upgrades. Separate Upgrade Plans for add-ons enabled the team to execute fleetwide upgrades of complex add-ons and soak them for months before performing cluster upgrades.
  • Repeatable Upgrades with Curated Workflows – Chkk standardized workflows and enabled task delegation, reducing reliance on expert knowledge and making complex upgrades repeatable and efficient.
  • A Standardized, Single Pane-of-Glass for All Upgrades – Chkk Upgrade Copilot became a single pane-of-glass for EKS and GKE upgrades. Customizations in Upgrade Plans ensured that expert knowledge was codified and carried over to future upgrades. All past upgrades were also retained as a system of record, so the decisions and collaboration behind each upgrade remained accessible to all team members.
  • Conformance to Operational Safety Guardrails – The Platform team used Chkk’s Guardrails to update hundreds of Helm charts owned by application teams, ensuring conformance to safety primitives at the source of their software development lifecycle.

"Chkk has transformed our Kubernetes upgrade strategy, eliminating costly delays and manual overhead while ensuring a standard multi-cloud upgrade process that works across EKS and GKE. With Chkk, we’ve not only reduced upgrade costs but also improved business continuity by proactively preventing forced upgrades and disruptions." – Director of Infrastructure

Outcomes

By implementing Chkk, this Fortune 1000 Platform team achieved significant operational and financial benefits:

  • Avoided the 6x cost increase associated with Extended Support.
  • 200% increase in upgrade productivity, ensuring business, regulatory, and compliance goals were met.
  • 80% reduction in upgrade preparation time, eliminating weeks of manual research and validation.
  • 2 FTEs Repurposed for High-Value Work – With Chkk handling upgrade complexity, skilled engineers were freed from routine, manual tasks and could focus on strategic initiatives.
  • Improved operational efficiency – The Platform team could focus on strategic initiatives rather than break-fix efforts.
  • Eliminated Upgrade Bottlenecks and Knowledge Silos – Chkk enabled multiple team members to take ownership of complex add-on upgrades, breaking reliance on a handful of experts and allowing work to be parallelized.
  • Standardization of Operational Safety Guardrails – Chkk’s Guardrails were used to define a “conformance standard” that all application teams adopted, making safety a key primitive throughout the software development lifecycle.
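
The percentages and multipliers quoted in this post can be related with simple arithmetic. The Python sketch below is illustrative only: it converts a percentage reduction in time into a speedup factor and back, using the 80% and 8x figures from this post. The function names are hypothetical.

```python
def speedup_from_reduction(percent_reduction):
    """An N% reduction in time taken corresponds to this speedup factor."""
    return 100 / (100 - percent_reduction)


def reduction_from_speedup(speedup):
    """A task done `speedup` times faster takes this much less time, in percent."""
    return 100 * (1 - 1 / speedup)


# 80% less preparation time means the prep work takes one fifth as long.
print(f"80% reduction = {speedup_from_reduction(80):.0f}x faster")

# The "up to 8x" figure quoted for research and planning, expressed as a reduction.
print(f"8x faster = {reduction_from_speedup(8):.1f}% less time")
```

Read this way, the 80% reduction in preparation time is a 5x speedup, while the "up to 8x" figure for research and planning corresponds to an 87.5% reduction in the best case.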

"Before Chkk, I was the bottleneck for every complex add-on upgrade, stuck in a cycle of manual work and firefighting. With Chkk’s workflows, I could finally delegate upgrades to other team members who were eager to dive into the challenges. We parallelized upgrades across the team, making the process faster, more efficient, and no longer dependent on a few experts." – Principal Platform Engineer

Takeaways

  1. Frequent Kubernetes Releases Demand Proactive Upgrade Management – With EKS and GKE each requiring at least three Kubernetes upgrades a year, teams must adopt automation to stay ahead of deprecations, avoid forced upgrades, and prevent costly disruptions.
  2. Add-On Complexity Requires an Out-of-Band, Fleetwide Upgrade Strategy – Upgrading Kubernetes is not just about the control plane; mission-critical add-ons like Istio, Cilium, Keycloak, and Kafka require separate upgrade workflows to minimize risk and maintain stability.
  3. Breaking Bottlenecks Enables Scale and Efficiency – Chkk empowered more team members to take on complex upgrades, eliminating reliance on a few experts and allowing work to be parallelized for faster execution.
  4. Automation Turns Upgrades from a Burden into a Competitive Advantage – By streamlining upgrades, reducing manual effort, and ensuring smooth transitions, Chkk helped the team focus on innovation rather than firefighting infrastructure changes.
