“Planet Earth II Suite” by Hans Zimmer, Jacob Shea, Jasha Klebe 🎵
This week I ran into a convention that reminded me of “the dress” (if you recall the 2015 viral meme). You may have seen the dress as blue/black while others saw the same picture as white/gold. Someone still has to decide whether it’s appropriate for a wedding. One of us has to lean in and wear the convention so we can even still go; we don’t have time to pick out a new one. It’s best to get perspective from the opinions of those we trust.
My husband sees the dress as blue/black. I see it as white/gold. All I wanted to do was be able to put myself in his shoes and see it from his perspective, but it’s challenging to do so. I don’t have his eyes or his brain.
A brilliant engineer on my team, Daniel, found a condition along those lines that made me smile. It’s in public vendor documentation, so I feel comfortable discussing it – you can share in our combined joy. In the creation of security group resources, the Terraform AWS EKS module used to handle tagging one way, and the EKS service itself now handles tagging another way. I was curious how those conventions came to be.
In the simplest terms, for this tag: if the security group is created by the EKS service today, the tag starts with eks-cluster-sg- and then appends the cluster name you provide, per AWS’s standard; but if that same resource is instead created by an older version of the module, it starts with the cluster name and appends -eks_cluster_sg. First, look at the AWS EKS documentation:
“Amazon EKS adds the following tags to the security group. If you remove the tags, Amazon EKS adds them back to the security group whenever your cluster is updated.” This means, congrats! Conform, fam! It may have been important to you that these tags stay the same, but oh well.
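To make the difference concrete, here is a minimal sketch of the two conventions side by side. The cluster name and exact strings are hypothetical illustrations of the pattern described above, not values pulled from either codebase:

```hcl
locals {
  cluster_name = "prod-games" # hypothetical cluster name

  # Convention when the EKS service creates the cluster security group:
  # prefix first, then the cluster name.
  eks_service_convention = "eks-cluster-sg-${local.cluster_name}"

  # Convention from older versions of the Terraform AWS EKS module:
  # cluster name first, then the suffix.
  older_module_convention = "${local.cluster_name}-eks_cluster_sg"
}

output "sg_conventions" {
  value = {
    eks_service  = local.eks_service_convention  # "eks-cluster-sg-prod-games"
    older_module = local.older_module_convention # "prod-games-eks_cluster_sg"
  }
}
```

Same cluster, same resource, two orderings – and anything downstream that parses or filters on that string now has two cases to handle.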
At massive companies, teams create thousands, tens of thousands, even hundreds of thousands of resources using different modules, tools, and patterns for microservices, and the naming conventions matter because they are tied to dependent systems. In a perfect world, teams would use centralized conventions layered on top of those shared by communities and vendors (and those conventions wouldn’t change). The issue is: chaos makes attaining that 100% of the time impossible. You can get close. Ish. But some changes imposed by external factors just hurt.
I was interested in how AWS may have evaluated this convention, and how the Terraform AWS EKS community evaluated this particular standard as they changed it over time, especially since security groups, along with other resources, act as a bi-directional dependency for 100+ other services. I understood the change to be just a difference in the convention of a tag…but the devil is in the details…tags don’t just change in repos for fun.
My favorite answer came, wisely, from a colleague named Al. Simply: perspective.
Getting Perspectives
When engineers design systems or write code, they start from the perspective of how they would use the system. As they work through the design, they evaluate the perspectives of others, and that changes the design.
Scenario One: A team tells an engineer that what they have will not work because they’ve already lived through the pain. No one has a great recommendation on what will work.
In this case, the engineer may build the “risky” design anyway to find out why it won’t work. The engineer aims to better understand what they are trying to solve and knows that even their peers’ information is dated. You have to agree to trust each other enough to break things. I see this in problem spaces where a good solution hasn’t been found and we only know what doesn’t work.
The act of educating others on where we’ve been through the trial of building, while it carries an upfront cost of revisiting the wheel, ends up being an accelerant toward solving the larger problem, as each person builds their own understanding of what didn’t work on top of their unique subject matter expertise.
Scenario Two: The people an architect reaches out to for perspective intend to use the system differently than the architect had assumed when the requirements were written.
There is an assumption that if we gather enough requirements, and if they are detailed, scenario two won’t happen. I’d rather assume scenario two will happen – get feedback early.
When I do not believe we’ve evaluated enough why we may want to do something, I ask whether we know the use cases to which that decision applies – I question it, intentionally, even if I feel like I want to do it. One may find that a pattern applies only in specific contexts, not for every workload. For example, if someone is learning infrastructure as code, they are repeatedly replacing and tearing down infrastructure. If someone is in sales, they most certainly are creating and tearing down infrastructure for demos. And if they are load testing or practicing DR, they are too. But that doesn’t mean the act of doing so applies to every architecture stack – or even every part of the architecture (you have to decide which option is riskier when testing bi-directional dependencies: starting from scratch or just dealing with it).
Scenario Three: A system’s semantics will change because your shoes look a whole lot different from mine and we both need to use the system.
This one is perhaps the most common scenario. My approach may be the complete opposite of another’s – whether as a customer of third parties, as the recipient of a business requirement from an enforcement team, or simply because we use the same tools while being responsible for different goals. In my systems I care about clusters first, because that’s how users of EKS narrow things down quickly; a third-party vendor may care about resource types in theirs, because their changes impact a blast radius at a different isolation level, and we are organized around our problem spaces in fundamentally different ways. I care about how changes impact games; they care about how their changes impact everyone.
Bi-Directional Dependencies as a Matter of Perspective
If you hate your life and want to understand how not-easy the problem of “getting perspectives” is – when evaluating the design of an architecture, the impact of a migration, or a simple update to a portion of a stack – I recommend reading up on Terraform bi-directional dependencies around the recreation of security groups, and understanding, for even just this one resource, how many co-dependencies there are on the AWS side with regard to dependency graphing.
From Terraform, “A simple security group name change ‘forces new’ the security group – Terraform destroys the security group and creates a new one. (Likewise, description, name_prefix, or vpc_id cannot be changed.) Attempting to recreate the security group leads to a variety of complications depending on how it is used. Security groups are generally associated with other resources – more than 100 AWS Provider resources reference security groups…the dependency relationship actually goes both directions causing the Security Group Deletion Problem. AWS does not allow you to delete the security group associated with another resource (e.g., the aws_instance)….Terraform does not model bi-directional dependencies like this, but, even if it did, simply knowing the dependency situation would not be enough to solve it. For example, some resources must always have an associated security group while others don’t need to.”
The simplest way to describe what’s going on on this page is that a resource may need to exist before another one does. One cannot simply “replace” some of those choices easily. In the case of bi-directional dependencies, when you change anything you need to understand both (1) the dependency of one part of a system on another and (2) whether it cares about the order of creation and operations. An engineer has to put themselves in that service’s shoes and understand just how significant their change is in the grand scheme of our shared dependencies – sometimes one hopes for insignificant or minor, and, sadly, is blessed with the universe’s irony, where simplicity is an illusion.
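A minimal sketch of that bi-directional shape, with hypothetical names and IDs: in Terraform’s graph the instance depends on the security group, while at the API level AWS refuses to delete the security group as long as the instance still references it.

```hcl
resource "aws_security_group" "cluster" {
  name   = "prod-games-eks_cluster_sg" # changing this "forces new" (destroy + create)
  vpc_id = "vpc-0123456789abcdef0"     # hypothetical VPC

  lifecycle {
    # A common mitigation when the replacement gets a new name: create the
    # new security group first so dependents can be re-pointed before the
    # old one is destroyed.
    create_before_destroy = true
  }
}

resource "aws_instance" "node" {
  ami                    = "ami-0123456789abcdef0" # hypothetical AMI
  instance_type          = "t3.medium"
  vpc_security_group_ids = [aws_security_group.cluster.id]
}
```

The ordering matters on both sides: Terraform has to create the replacement and re-point its dependents before AWS will let the old security group go.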
My guess is that the reason AWS places the cluster name after the security group prefix, from a taxonomy perspective, is that perhaps the security group needs to exist first as a matter of perspective – and thus many semantics, down to the tagging convention, are written from the perspective of order of operations as AWS sees them, not as the users of the AWS EKS service do.
But I won’t really know without seeking perspective by creating this first.
Image Credit: NASA/JPL-Caltech/MSSS “NASA’s Curiosity Mars Rover Gets a Major Software Upgrade.” April 13, 2023 – An excerpt from NASA as a matter of perspective…
“[Curiosity] can now do more of what the team calls “thinking while driving” – something NASA’s newest Mars rover, Perseverance, can perform in a more advanced way to navigate around rocks and sand traps. When Perseverance drives, it constantly snaps pictures of the terrain ahead, processing them with a dedicated computer so it can autonomously navigate during one continuous drive.
Curiosity doesn’t have a dedicated computer for this purpose. Instead, it drives in segments, halting to process imagery of the terrain after each segment. That means it needs to start and stop repeatedly over the course of a long drive. The new software will help the venerable rover process images faster, allowing it to spend more time on the move.”