To Be in it for the Gold, You Have to be in it for the Story

Uncharted – Drake’s Fortune: Nate’s Theme by Greg Edmonson, Andrew Skeet as played by the London Philharmonic Orchestra 🎵

My husband and I have watched 3 treasure hunting shows. We saw Pirate Gold of Adak Island on Netflix first. It’s a metaphor for real-life Minesweeper and MacGyver tactics of civil engineering. You’re lured in by the idea that a pirate buried $300M on Adak Island in 1892, but the reality is that, even if a pirate did, within moments you’re concerned for the safety of the team. Surprisingly, it’s not because of potential pirate traps. No. It’s because of our choice as humans to discard missiles from war on the same island with reckless abandon. A treasure happens to be buried around our dumb choices and propensity to be the source of our own irony. Go watch it.

I promise this relates to development. But let’s keep going so you have more context. I’m not as good at easter eggs and masterminding as some pop stars.

We then finished 2 seasons of Lost Gold of World War II on the History Channel, which leaves viewers with a cliffhanger concerned that the entire team may now be in the witness protection program. I watched it to relax because looking at epic drone footage of the Philippines lets me index into my Uncharted persona. I walked down “the aisle” to Nate’s theme song from the game.

Now, my husband and I are watching the full 10 seasons of The Curse of Oak Island. It’s a roller coaster of digging and disappointment. You’ll uncover a side of President Franklin D. Roosevelt you never knew. I’m on Season 5, and they still haven’t found “the” treasure despite finding weird artifacts and poor decisions of other people. They’ve definitely spent millions of dollars – and before they did, 6 others died in the process of trying. Supposedly they may find the Ark of the Covenant, Shakespeare’s lost manuscripts (a hilarious thought given everything is covered in water), or Marie Antoinette’s jewels. Every five seconds something is tied to the Knights Templar, the history of Spain, British colonialism, or French history. Every episode is the same and yet I still can’t stop watching it.

It’s struck me, as an engineering manager with a business past™, that there is a reason I like the dark humor in these shows. With that level of frustration on top of millions of dollars, one would think they were developers. At some point they ask each other why they keep going.

It’s always the story over the treasure.

A Tale of Racing Game Splines

You may have used a tool so bad it crippled the entire team. If you had to use a third party spline tool before 2014 I’d be shocked if that did not resonate deep in your dark soul.

You see, splines, which let you draw geometry or move cameras over a procedural path, have come a long way. It’s no surprise to me that both Unity and Unreal have looked at this area and that the title of this ’22 blog post includes the words “maintaining creative flow with Splines.” That used to be so hard in some contexts.

Without explicitly naming the racing game a former team and great friends were working on: the spline tool we had controlled a road in the game. The road’s design was controlled by “fun,” which was controlled by playtests. Every time we had to change the road, the one 3D level designer on the project had to rebuild the terrain around the changes. And the roads changed a lot. If you do the mental math on the manual labor involved, you immediately see why people are now using AI to solve spline problems and relate those two systems. Terrain should adapt to changes in level design so that a level designer doesn’t have to hand-manipulate other geometry around the spline every time it changes just so cars can’t collide with it. He’d have to drive on it, change it, move the world around, and use a myriad of other tactics to look for collisions and make sure that after “fun changed” the landscape still looked believable. You know, like entire trees weren’t now floating. The systems should be related and smart enough to adjust not just the terrain, but also the decorative materials and objects around those changes. But those options were not available or robust 10 years ago.
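To make the “systems should be related” idea concrete, here is a minimal Python sketch of one small piece of it: after the road changes, automatically flag the props that now sit inside the road corridor instead of finding them by driving the track. Everything in it is an assumption for illustration. The road is a 2D polyline already sampled from the spline, props are 2D points, and the function names and sample coordinates are hypothetical rather than any engine’s or asset’s API.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p (2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment: both ends are the same point
    # Project p onto the segment's line, then clamp to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def props_on_road(road, props, half_width):
    """Return indices of props closer than half_width to the road centerline.

    `road` is a polyline with at least two points.
    """
    flagged = []
    for i, p in enumerate(props):
        nearest_dist = min(
            math.dist(p, closest_point_on_segment(p, a, b))
            for a, b in zip(road, road[1:])
        )
        if nearest_dist < half_width:
            flagged.append(i)
    return flagged

# Hypothetical data: the road is re-routed after a playtest.
old_road = [(0.0, 0.0), (10.0, 0.0)]
new_road = [(0.0, 0.0), (10.0, 4.0)]
trees = [(5.0, 3.0), (5.0, -5.0), (13.0, 0.0)]

clear_before = props_on_road(old_road, trees, half_width=2.0)
blocked_after = props_on_road(new_road, trees, half_width=2.0)
```

A check like this only finds the problem. Moving the flagged props and reshaping the terrain under them is the harder part that modern spline tooling takes on.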

There wasn’t a tools team, and this was a contract project, which meant everyone got to live in the “tools pain” of third-party tools from the Unity Asset Store. Arguably this is what is very challenging for development agencies and third parties: they don’t have the resources to create centralized tools and keep investing in the problem spaces that slow them down. I’m reminded of this story because at the beginning of Season 5 of Oak Island a winter storm has washed away one of the roads they need to bring in equipment to continue looking for treasure. It’s going to cost them tons of money. For us? We had to update the road every time we wanted to retest “fun” with playtesters. The faster we could update that road, the faster we could deliver. But the work perpetually slowed itself down in a cycle we were not contractually able to build a solution for, as fixing that problem would have been a cost to the company, not the client. It would have been like the Oak Island road washing away over and over and never being able to fix it, only re-spread gravel and try to drive again.

I look back on this, and you would think that I’d wish I had done more to advocate for this specific tool, but I don’t. One may think I’d blame game development timelines, or shipping, but I don’t. In the end, we shipped the game, it went into a ton of malls and kiosks, and the customer sold a ton of perfume. It achieved its goal. I’m really proud of what we accomplished, but I do remember the pain and frustration of that problem while we built the game. If we had made more racing games or generally used a lot more splines, I would have wanted to invest in splines. That tool impacted how those on that team navigated speed and decisions later in their careers. It created a “tools passion.” It showed us the value of investing in systems, not only content.

I think if you want to invest in tools as a business, you invest in platforms, and it fundamentally changes the future of your company, not just one project. Companies have to think about how the tools one team makes could help another team, and find ways to cross-share and centralize up front, not later. They have to build an internal business justification around a tool’s repeated use, or its use across multiple games and customers.

The ask to invest in a platform or a specific line of tools to solve a complex pain is a cross-company strategic initiative: not for just one game, and for many years. You have to trust that the “treasure” is there even if you haven’t seen the results yet, because you anticipate, from lived history you’ve dug back up in your collective pasts, that the pain is large enough and impacts the core delivery window.

A Tale of 7 Environments to 2

In another line of “things that really slowed my former teams down but changed how I think,” there were a few games with far too many release environments. I’ve seen up to 7 release environments on one game (that includes backend, client-side builds, HockeyApp, TestFlight, etc. in that pattern). These days I love the idea of “two” (stage and production), and then each eng gets their own dev environment (local or in the cloud depending on where you sit). In Kubernetes land, teams can pool and siphon off prod hosts in the same environment to isolate testing, but the mental model is different: it comes from wanting to live in production instead of trying to prevent workloads from getting to it out of caution. I think a lot about paying that down safely by measuring where we’ve all been (change in blast radius & severity).

If teams design releases around “People can’t ever see a single failure or bug,” they inevitably end up spending thousands to millions of dollars in wasted OPEX and compute. I educate a LOT on the questions “Do you need this infrastructure?” and “How fast do you want to get to production, and why not as fast as possible?” to really make sure there is a good answer. The more experience teams have, the more they realize that getting to production faster is still cheaper (and safer) long term than doing everything to never get to it, and that even 6 reviewers can still miss the 1 thing that would have failed a change. Teams cannot know unless their design, their work, is in that system. Let integration tests catch what they can and stop a release if they do; if a release fails, update the integration tests from the lessons. Iterate systems from results, not the lack thereof. That said, it takes a while to make that transition and make sure it works for everyone. Not every part of a game should have the same DORA metrics: apps, infrastructure, and clients are all very different worlds.

I remember after coming off the non-high of 7 environments I said “I will not be doing this for PrinceNapped.” On PrinceNapped my team only had 2 cloud environments and it felt much better. Our environments were decisively simple. Our processes even more so.

Our goal was: What if we don’t “Release” at all?

In PrinceNapped, we tried to design around “avoid releases entirely” to test changes (client and server side). For example, this is a Google Sheet version that mapped to the level database in RDS (not automatically, but to track what we were going to do and discuss; I would then change the database value). Sure, you can tie this to a release for tracking, and this was tied to tests, but the concept was “How little code do we need to change to get to production?”

The string IDs mapped to scenes. We learned that (1) in the future it would have been better to have 1 scene with all objects and then populate their locations upon load using a configuration file (this would have cut down the binary on the client). What we had gave us (2) the ability to swap levels for users. This let us test without pushing a client-side binary. Had we done (1) and used cloud-hosted configuration files for puzzle piece mapping, it would have been even faster and would still have required no client update, only database changes.
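As a rough illustration of that pattern, here is a minimal Python sketch of a server-side level table that maps level numbers to scene string IDs, so swapping a level is a data change and not a client release. This is an assumption-heavy stand-in, not the actual PrinceNapped schema: the in-memory dictionary plays the role of the database rows, and the field names, scene IDs, and numbers are all hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LevelConfig:
    scene_id: str      # string ID the client already knows how to load
    move_target: int   # hypothetical tuning value
    score_target: int  # hypothetical tuning value

# Stand-in for rows in the level database (values are invented).
LEVEL_TABLE = {
    1: LevelConfig("scene_intro_a", move_target=20, score_target=1000),
    2: LevelConfig("scene_bridge_b", move_target=18, score_target=1500),
}

def resolve_level(table, level_number):
    """What the client asks the server: which scene and targets for this level?"""
    return table[level_number]

def swap_scene(table, level_number, new_scene_id):
    """Return a new table with one level pointing at a different scene.

    The client binary is untouched; only the data it reads has changed.
    """
    updated = dict(table)
    updated[level_number] = replace(table[level_number], scene_id=new_scene_id)
    return updated

# Swap level 2 to another scene that already ships in the client.
test_table = swap_scene(LEVEL_TABLE, 2, "scene_bridge_c")
level_two = resolve_level(test_table, 2)
```

The swap only works for scenes the client already contains, which is why approach (1), one scene populated from a configuration file, would have removed even that limit.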

We were able to swap move targets and score targets, among other variables. We would run ads, change the values, run ads again. Ultimately what we were trying to do was pay this down and truly understand what the pain points were for players. We were invested in that. The “red” needed to get lower every time on Facebook in the FTUE. 7 years ago, I wanted to make sure 50% of those who played Level 1 made it past level 10 before the game launched on mobile (red being the delta between each level for which a user didn’t show up again over an identified test window).
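The arithmetic behind that “red” is simple enough to sketch. The Python below is a minimal example under stated assumptions: `cohort[0]` is the number of players who played Level 1 in a test window, `cohort[k]` is how many of them made it past level k, and every number is invented for illustration and is not real game data.

```python
def level_dropoff(cohort):
    """Fraction of players lost between each consecutive pair of entries.

    cohort[0] = players who played Level 1;
    cohort[k] = players from that group who made it past level k.
    """
    drops = []
    for prev, cur in zip(cohort, cohort[1:]):
        drops.append((prev - cur) / prev if prev else 0.0)
    return drops

def fraction_past(cohort, level):
    """Share of the Level 1 cohort that made it past `level`."""
    return cohort[level] / cohort[0] if cohort[0] else 0.0

# Invented counts for one test window, Level 1 through level 10.
cohort = [1000, 900, 800, 750, 700, 650, 620, 590, 560, 530, 510]

red = level_dropoff(cohort)                   # the delta per level
goal_met = fraction_past(cohort, 10) >= 0.50  # the 50% target
```

Re-running this after each change to move or score targets shows which level’s delta moved, and that per-level view is the feedback loop the ad tests were feeding.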

The data above was completely based on trying to make the player experience better: making sure the puzzles were not so hard that players disengaged. Everything else had to conform to that goal. In treasure hunting shows, teams all have the same underlying goal: keep poking at the story, keep trying to make changes. I think the biggest thing that speaks to me about these shows is that sometimes it’s the big overarching choices, everything from not investing in tools to having 7 environments out of fear, that slow down developers the most and prevent teams from helping players, helping the business, or even getting what they really need.

I used to say “Make good choices.” Now? I say “Make epic changes,” and by that I mean ones that transform businesses and how they operate. Be extreme in what you do and remove constraints, and you really will see results you can measure.

We cannot guarantee good choices; we can only get as close as possible with luck, by trying truly big bets, and by asking for charters and process change that make a difference. If one’s goal is to get better and find answers, then we must be in it for the story as learned from trial, not only the treasure.

Image Credit: Image from Unsplash by Nadjib BR.