Case Study
March 26, 2025

How a Fortune 500 Enterprise Avoided $500K in EKS Extended Support Fees, Achieved 80% Reduction in Prep Time, and Boosted Upgrade Productivity by 200%

Written by
Chkk Team

Challenge

The Platform Engineering team at the Fortune 500 enterprise had a Kubernetes Platform on EKS that supported hundreds of software developers running thousands of applications. These applications were mission-critical because online transactions accounted for a sizeable fraction of the company's revenue; conservative estimates put the cost of downtime at $3.2K per minute (roughly $200K per hour).
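The per-minute and per-hour downtime figures can be reconciled with a quick calculation. The snippet below is only an illustrative sketch; the helper name `hourly_downtime_cost` is hypothetical and not part of any tool mentioned in this case study.

```python
# Hypothetical helper: convert a per-minute downtime cost into an hourly figure.
def hourly_downtime_cost(cost_per_minute: float) -> float:
    """Return the cost of one hour of downtime given a per-minute cost."""
    return cost_per_minute * 60


# $3.2K per minute is the estimate quoted in the case study.
per_hour = hourly_downtime_cost(3_200)
print(f"${per_hour:,.0f} per hour")  # $192,000 per hour
```

The exact result is $192K per hour, which the case study rounds to $200K.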

EKS requires at least three Kubernetes upgrades every year, and each add-on requires at least one to two upgrades per year. This posed the following challenges for the Platform team:

  • 500% increase in cluster costs – EKS had launched Extended Support, and every cluster in Extended Support was billed at 6x the standard rate (a 500% surcharge), which amounted to approximately $500K in additional costs per year.
  • Risk of forced upgrades – Falling behind on upgrade cycles could have resulted in forced upgrades, an extreme and disruptive event that posed a business continuity risk.
  • Complex add-ons had to be upgraded separately and without disruption – To contain the blast radius during cluster upgrades, the Platform team needed to upgrade Contour, Envoy, Prometheus, Nginx, Redis, Kong, and other stateful add-ons separately from the clusters.
  • Extensive coordination required with application teams – The Platform team needed to regularly communicate upgrade readiness with application teams, ensuring that they migrated off removed APIs with each upgrade cycle. This was a multi-step coordination process owned by a dedicated Technical Project Manager (TPM) and involving engineers, Directors, and VPs from both the Platform and application teams.
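The relationship between a 6x price and a 500% increase can be made concrete with a short calculation. This is a rough sketch, not a billing tool: the hourly rates below ($0.10 standard, $0.60 Extended Support, per cluster) are assumptions based on published EKS list prices and do not come from the case study, and the function names are hypothetical. The case study does not state the size of the fleet, so the cluster count is left as a parameter.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

# Assumed per-cluster control plane prices in USD per hour (not from the case study).
STANDARD_RATE = 0.10
EXTENDED_RATE = 0.60


def percent_increase(standard: float, extended: float) -> float:
    """Percentage increase when moving from the standard to the extended rate."""
    return (extended - standard) / standard * 100


def annual_surcharge(clusters: int,
                     standard: float = STANDARD_RATE,
                     extended: float = EXTENDED_RATE) -> float:
    """Extra yearly cost of keeping `clusters` clusters in Extended Support."""
    return clusters * (extended - standard) * HOURS_PER_YEAR


print(f"{percent_increase(STANDARD_RATE, EXTENDED_RATE):.0f}% increase")  # 500% increase
print(f"${annual_surcharge(1):,.0f} extra per cluster per year")  # $4,380 extra per cluster per year
```

Under these assumed rates, each cluster left in Extended Support adds about $4,380 per year, so the total surcharge scales linearly with the number of clusters that fall behind.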

Solution

The Platform team implemented Chkk’s Operational Safety Platform to simplify upgrade management and ensure that the upgrades were disruption-free.

  • Accelerating Upgrade Process – Chkk automated key upgrade tasks such as dependency analysis, release note processing, and impact assessment across hundreds of add-ons, cutting down research and planning time by up to 8x.
  • Upgrade Copilot & Preverified Plans – Chkk’s Upgrade Copilot automated tedious pre-work and delivered Preverified Upgrade Plans for clusters and add-ons, tested on a digital twin of the enterprise’s infrastructure, ensuring safe, well-orchestrated upgrades. Separate Upgrade Plans for add-ons enabled the team to execute fleetwide upgrades of complex add-ons and soak them for months before performing cluster upgrades.
  • Repeatable Upgrades with Curated Workflows – Chkk standardized workflows and enabled task delegation, reducing reliance on expert knowledge and making complex upgrades repeatable and efficient.
  • Streamlining Communications with Application Teams – Chkk highlighted upgrade risks and considerations that could have caused application disruptions or stalled the upgrade. Each risk came with a Knowledge Base Article (KBA) detailing the risk, its severity and impact, which resources were impacted, and how to mitigate it. Each risk included its own workflows for notification and risk lifecycle management, improving coordination and reducing friction between the Platform and application teams.

"Managing Kubernetes upgrades at this scale was becoming unsustainable, with rising costs, complex add-on dependencies, and constant coordination challenges. Chkk gave us a structured, automated approach that not only kept us ahead of EKS releases but also eliminated nearly $500K in unnecessary costs while ensuring business continuity." – VP of Infrastructure

Outcomes

By implementing Chkk, this Fortune 500 enterprise achieved significant operational and financial benefits:

  • Avoided $403K per year in Extended Support costs (up to $500K per year).
  • 200% increase in upgrade productivity, ensuring business, regulatory, and compliance goals were met.
  • 80% reduction in upgrade preparation time, eliminating weeks of manual research and validation.
  • 4 FTEs repurposed for high-value work – With Chkk handling upgrade complexity, skilled engineers were freed from routine, manual tasks and could focus on strategic initiatives.
  • Improved operational efficiency – The Platform team could focus on strategic initiatives rather than break-fix efforts.
  • Seamless upgrade communications between platform and application teams – Chkk enabled clear, automated risk communication, reducing friction between the Platform and application teams and ensuring smooth, disruption-free upgrades.

"Before Chkk, upgrades were a painstaking process, requiring constant coordination with application teams and careful sequencing of add-on upgrades. With Chkk, we streamlined workflows, automated risk assessments, and improved communication, allowing us to execute upgrades with confidence and reclaim valuable engineering time." – Director, Platform Engineering

Takeaways

  1. Frequent Kubernetes Releases Demand Proactive Upgrade Management – With EKS accelerating its release cycles, teams must adopt automation to stay ahead of deprecations, avoid forced upgrades, and prevent costly disruptions.
  2. Add-On Complexity Requires an Out-of-Band, Fleetwide Upgrade Strategy – Upgrading Kubernetes is not just about the control plane; mission-critical add-ons like Istio, Cilium, Keycloak, and Kafka require separate upgrade workflows to minimize risk and maintain stability.
  3. Clear Communication Reduces Upgrade Friction – Proactively identifying risks and providing structured, automated communication workflows between the Platform and application teams streamlines upgrades and minimizes disruptions.
  4. Automation Increases Efficiency and Reduces Costs – By automating upgrade planning, dependency analysis, and validation, Chkk helped the team reduce manual effort, repurpose skilled engineers for strategic work, and avoid nearly $500K in Extended Support costs.
