Using Self-Managed Amazon Elastic Kubernetes Service Like It’s AWS Fargate

Containers Cake

“Apologize,” a Baby Lullaby version of the original OneRepublic song, by Twinkle Twinkle Little Rockstar 🎵

I once met someone who was surprised when I told them to go use AWS Fargate and that I would cheer them on.

I don’t like every egg in the same basket.

But I don’t like a lot of eggs in a ton of baskets either.

I’ve been re-looking into container solutions outside of Kubernetes – it’s easy to get silo-ed into one particular type of architecture and think “This is the best, ever!” It is good to question your assumptions every once in a while. Stay architecturally humble.

It is, perhaps, easy to assume that someone whose job title names a particular architecture will always defend that architecture – my title is currently “Director of Engineering for Kubernetes,” so despite having lived and breathed other tools, I can completely understand why it may seem like I have drunk the CNCF koolaid.

However, I truly believe teams should build on (1) what works for their workload, (2) within the context of the larger business they sit under, and (3) with the resources readily available to them.

That can go in multiple directions depending on where cost savings and efficiency really lie for a team – I accept, darkly, that architecture often plays only a small part in that choice.

AWS Fargate is an abstracted layer on top of Amazon ECS (Elastic Container Service) or Amazon EKS (Elastic Kubernetes Service) – it is ideal if teams want to pay per vCPU, memory, and storage and do not have a dedicated resourcing group who can manage updating Kubernetes for them. It is especially great for small teams. If giving up the “latest” instance types, price-performance tuning, and full customizability and visibility into resources doesn’t bother them, then it can get a lot done. However, it won’t fit every size of workload, and it is not the same as building on an abstracted layer on top of EC2 where teams can still see their servers.

Fargate is good if teams have a handful of pods or tasks they need to run in a limited time window, because they won’t be charged for capacity they don’t use – similar to AWS Lambda, which is why container image support for Lambda makes this conversation exceptionally confusing for tasks that last less than 15 minutes (the function timeout of AWS Lambda). The time factor for a Fargate task is more of a cost and orchestration problem, as we’ll get to later.
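To make that murkiness concrete, here is a back-of-the-envelope sketch. The prices, task size, and duration are all illustrative assumptions (roughly us-east-1 shaped numbers), not a pricing reference:

```python
# Rough comparison of one short-lived task on Fargate vs. Lambda.
# All prices below are assumptions; check current AWS pricing for real decisions.

FARGATE_VCPU_PER_HOUR = 0.04048       # assumed $/vCPU-hour
FARGATE_GB_PER_HOUR = 0.004445        # assumed $/GB-hour
LAMBDA_GB_SECOND = 0.0000166667       # assumed $/GB-second
LAMBDA_REQUEST = 0.0000002            # assumed $/request

def fargate_task_cost(vcpu: float, memory_gb: float, minutes: float) -> float:
    hours = minutes / 60
    return vcpu * FARGATE_VCPU_PER_HOUR * hours + memory_gb * FARGATE_GB_PER_HOUR * hours

def lambda_invocation_cost(memory_gb: float, minutes: float) -> float:
    seconds = minutes * 60
    return memory_gb * seconds * LAMBDA_GB_SECOND + LAMBDA_REQUEST

# A 10-minute, 1 vCPU / 2 GB job fits under Lambda's 15-minute timeout,
# so both services are in play and the decision gets murky.
print(f"Fargate: ${fargate_task_cost(1, 2, 10):.5f}")
print(f"Lambda:  ${lambda_invocation_cost(2, 10):.5f}")
```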

Moving on…the summary of what you get with Fargate if you like Kubernetes but want something easier: a hyper-abstracted, Kubernetes-like experience without much compute customization or host-level visibility, because the architect trades the management, configuration, and scaling of the underlying infrastructure for simplicity – and pays a bit more for it.

And therein is the issue.

The moment you want to do anything else, it isn’t a great fit. It can be really expensive at scale – even more so than hiring a team to manage Kubernetes.

Why & When Companies Use Kubernetes

Kubernetes is a solution for enterprise companies. I’ve seen startups talk about using it, and if I were doing a startup a second time, I would absolutely not use Kubernetes on a <$2M seed round.

Kubernetes is best when used at massive scale – when centralizing 20+ microservices across 50+ nodes at a minimum (ideally, hundreds of hosts). It is absolutely HORRIBLE for “baby clusters” with only a few containers or pods because the cost proposition really is not there.

What is a Baby Cluster? A baby cluster is a cluster smaller than 10 hosts, where auxiliary workloads take up a disproportionate share of resources compared to internal customer application workloads. Just like real babies, they need a lot of attention: they alert just as much as any grown-up cluster and still require you to care about them. When companies look closely at the operational cost of maintaining clusters, auxiliary services, and internal customer applications in terms of people (“time is money, friend”), K8s is best when you’ve got a bunch of containers, pods, and applications in only a handful of clusters, rather than spread across 50+ clusters that are all very small and snow-flakey in their underlying host compute choices and add-on configuration.
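To put rough numbers on that, here is an illustrative sketch – the “two nodes’ worth of auxiliary overhead per cluster” figure is an assumption made up for the example, not a benchmark:

```python
# Illustrative (made-up) numbers: how much of a cluster the auxiliary tooling
# eats on a baby cluster vs. a consolidated one. Assumes roughly fixed
# per-cluster overhead (monitoring, security agents, ingress, etc.).

def overhead_fraction(total_nodes: int, overhead_node_equivalents: float) -> float:
    """Fraction of the cluster consumed by auxiliary workloads."""
    return overhead_node_equivalents / total_nodes

# Assume the auxiliary stack costs roughly 2 nodes' worth of capacity per cluster.
for nodes in (5, 10, 50, 200):
    print(f"{nodes:>3} nodes: {overhead_fraction(nodes, 2):.0%} of capacity is overhead")
# 5-node baby cluster -> 40% overhead; 200-node shared cluster -> 1%.
```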

Is Kubernetes okay for spiky workloads? It is “okay” for teams that have spiky workloads – but only if they have built tooling to scale nodes down and spin them back up, and are comfortable waiting for scale-up. This isn’t a Kubernetes problem – developers have the same issue on EC2 with any auto-scaling workload if they scale down to 0 worker nodes – and they’d still need some system nodes in Kubernetes.
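For the curious, a minimal sketch of what that scale-down/scale-up tooling can look like against an EKS managed node group, using boto3. The cluster and node group names are placeholders, and in practice most teams would lean on Cluster Autoscaler or Karpenter instead of calling the API by hand:

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

def scale_worker_nodes(cluster: str, nodegroup: str, desired: int, max_size: int) -> None:
    # System/add-on pods still need somewhere to run, so keep a separate
    # small "system" node group and never scale that one to zero.
    eks.update_nodegroup_config(
        clusterName=cluster,
        nodegroupName=nodegroup,
        scalingConfig={"minSize": 0, "maxSize": max_size, "desiredSize": desired},
    )

# Off-hours: park the spiky workload's nodes...
scale_worker_nodes("payments-cluster", "batch-workers", desired=0, max_size=20)
# ...and accept that scaling back up means waiting on EC2 capacity and node bootstrap.
scale_worker_nodes("payments-cluster", "batch-workers", desired=10, max_size=20)
```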

Will I see more disruption in Kubernetes? Not handling graceful termination of a host at the application level is an issue no matter where an application sits. Neither Kubernetes nor Fargate is a panacea for not following best practices in application design. In Kubernetes there are specific concepts for handling termination and limiting disruption (for example, Pod Disruption Budgets), but even then there are workloads that, regardless of EC2, Kubernetes, or Fargate, need a plan for stateful applications or long-lived sessions in the event a host MUST die. Hosts do not live forever, for security reasons.
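As a concrete example of one of those concepts, here is a minimal Pod Disruption Budget sketch using the official Kubernetes Python client (assuming a recent client with the policy/v1 API; the app and namespace names are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="checkout-pdb", namespace="shop"),
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=2,  # voluntary evictions are blocked if they would drop below 2 pods
        selector=client.V1LabelSelector(match_labels={"app": "checkout"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="shop", body=pdb)
# A PDB only limits *voluntary* disruption (node drains, upgrades); stateful apps
# and long-lived sessions still need their own plan for the day a host must die.
```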

Monitoring & Observability

I see people drawn to Kubernetes for more control over, and visibility into, their servers and applications while still having orchestration and containers. They want it all – they want to use their 3rd-party security tooling, 3rd-party performance monitoring tools, 3rd-party cost tools, and tooling to inspect network traffic across services, pods, and nodes with granular accuracy.

They don’t just want to monitor the service – they want to monitor specific aspects of Linux too.

Cost

They also want to benefit from EC2 Instance Savings Plans rather than being forced into Compute Savings Plans, because they want to save more money (up to 72% vs. up to 66%). They care a lot about the price-performance of instance types against their workloads, but not to the point where they change their compute type more than once a year.
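In plain arithmetic, using the maximum advertised discounts above (real rates depend on term, instance family, and region) and an assumed $100k/month of on-demand spend:

```python
# The 72% vs. 66% argument in rough numbers. These are *maximum* advertised
# discounts (long-commitment, upfront-style); actual rates vary.

on_demand_monthly = 100_000  # assumed on-demand spend for a fleet, $/month

ec2_instance_sp = on_demand_monthly * (1 - 0.72)  # EC2 Instance Savings Plan
compute_sp = on_demand_monthly * (1 - 0.66)       # Compute Savings Plan

print(f"EC2 Instance SP: ${ec2_instance_sp:,.0f}/month")
print(f"Compute SP:      ${compute_sp:,.0f}/month")
print(f"Price of flexibility: ${compute_sp - ec2_instance_sp:,.0f}/month")
# The Instance plan is cheaper, but it locks you to an instance family and region,
# which is exactly the flexibility fast instance-type churn needs.
```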

This is why, if a team runs a Kubernetes cluster for a bunch of other people, that team has to help make those decisions for them and understand exactly when they should be made. The cost problem is further exacerbated when companies manage too many clusters with too many workload types and too many opinions about which instance type each specific application should be on – you lose the customization debate on time alone.

Where Kubernetes Goes South: Using it Like Fargate Without Giving Up the Infrastructure Part

Larger enterprises can see the cost benefits of using Kubernetes over Fargate but may at first adopt Kubernetes as if it were Fargate, which brings me absolute maniacal joy because it is a culture challenge, not an architecture challenge. 🙂

  • Companies may operate in silo-ed teams that do not centralize or share their applications. They struggle to multi-tenant because of billing and reporting requirements, performance/noisy-neighbor fears, or because they see security as binary instead of a spectrum of choices. This results in increased cost and less efficiency.
  • Companies may want to upgrade their instance types at a speed that prevents them from using EC2 Instance Savings Plans (faster than one year, pushing them into Compute Savings Plans) or at a speed that prevents them from seeing price-performance gains (slower than three years). They may do this across a sea of users who cannot all update at the same time, across too many clusters, accounts, and node pools to reasonably consult on this topic per workload.
  • Companies may not have centralized choices around observability and monitoring, resulting in duplicated use cases and increased costs across their monitoring portfolio. While a few eggs in different baskets is good for evaluation, too many eggs, or duplicated use cases, can also be expensive.
  • Application teams may run only 2-3 services for their primary business case in an entire cluster while an infrastructure team is running 5-10 auxiliary services to monitor everything about those 3 services and the hosts they run on for cost, security, and performance…ironically resulting in increased costs that multi-tenancy with other application teams would solve.

Kubernetes has this reputation of being complex – and sure, it is on its own, but so is Fargate…and the other 912384792837489 container services AWS, GCP, and Azure all have that we could learn if time wasn’t also money.

Amazon EKS charges per cluster-hour on top of what teams are already paying for the hosts underneath it, and, if teams don’t update their cluster, EKS charges an additional price once the version passes end of standard support – a fee that gets worse the more clusters teams own. It still isn’t nearly as bad as using Fargate, which charges per vCPU-hour and per GB-hour, with 293749827349732 applications in the cost picture at enterprise scale.
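To see why that fee compounds with cluster count, here is a rough sketch. The $0.10 and $0.60 per cluster-hour figures are assumptions for standard and extended support pricing – check the current EKS pricing page before quoting them:

```python
# Why "the end-of-support fee gets worse the more clusters you own," in rough numbers.
HOURS_PER_YEAR = 8760
STANDARD = 0.10   # assumed $/cluster-hour, current versions
EXTENDED = 0.60   # assumed $/cluster-hour, versions in extended support

def annual_control_plane_cost(clusters: int, fraction_on_old_versions: float) -> float:
    stale = clusters * fraction_on_old_versions
    fresh = clusters - stale
    return (fresh * STANDARD + stale * EXTENDED) * HOURS_PER_YEAR

# 50 baby clusters, half of them behind on upgrades:
print(f"${annual_control_plane_cost(50, 0.5):,.0f}/year")  # ~$153,300
# 5 consolidated, current clusters:
print(f"${annual_control_plane_cost(5, 0.0):,.0f}/year")   # ~$4,380
```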

But is it okay to use Fargate? Where it makes sense? Yes.

Just make sure you want to use Fargate as Fargate, and are not using Fargate to avoid the hard problems of trying to save money at scale.

It’s clear to me that what is hardest, and most complex, isn’t the tools at all. They do their jobs well, independently.

But it’s exceptionally hard to do big business well independently so that we can all have the cake and eat it too.

Header Image by Henley Design Studio from Unsplash.