👋 Hey, this is Chibs! Welcome to Regular Dev.
I’m trying to strip software engineering of all the wrong things social media has made it out to be, based on my own career, experiences, and journey. It’s sort of like telling the truth about software engineering; everything I write here is objectively aimed at that.
Have you ever broken an application in production? (By mistake, of course.) It’s a rollercoaster of emotions. Whether it’s as little as leaving IAM credentials in client-side code, busting everyone’s favourite React component, or deleting the entire prod database and triggering nuclear fission in the process, it’s all good. We’re here for therapy, so gather around.
Disclaimer: you’re responsible for following my advice 🫱🏻🫲🏼
What we are going to cover in this episode:
🔒 A rite of passage? - It’s like an initiation
👁️ The myth - An alternative process
🏥 Handling these situations - Don’t do a witch-hunt
🔥 Firefighting - The act of fighting fires
This is me apologizing for taking a three-month break without saying anything. I’ve been working on a wide range of projects, exploring other interests, learning, fighting aliens (literally), not finding Time when I needed it the most 🤧, and most importantly, travelling upward out of this Earth (this is not a SpaceX reference).
I promise to do better. Arigatou!
If you find typos in this one, I’m sorry.
🔒 A Rite of Passage?
I have a couple of stories to tell. They’re true stories about my friends, but I won’t use their real names. It’s really just to show that even the best of us make mistakes: some trivial, some mind-boggling, like this 👇🏾, very deserving of awards and a lot of “damn”.
Let’s assume your name is Isaac, and you’re a backend developer who’s just been given access to your company’s codebase, and you’re afraid to break something in it… (a very non-specific example) 🌚 How do you hope to master it if you don’t get in there and break things? (Not intentionally, of course.) I reckon everyone who’s confident enough to build stuff and lead other engineers today has, at one point or another, made errors that broke the codebase, the sound barrier, and the ozone layer. It’s not weird. I’d even argue that it’s a rite of passage: great engineers not only know what to do, they know even better what not to do. It’s not the event or the consequences that should be looked at. It’s the process and the learning that come with the event that stay with you and make you careful, detailed, and rightly paranoid. I’d say this is one of the most important reasons to do outage post-mortems.
So listen up, Isaac. My friend Emmanuel once forgot to reconvert from kobo to naira and pushed to prod. It’s little, yeah, but the ripple effect is nuclear scale. Imagine a customer tries to send 1,000 NGN. You process it in coinage, i.e. 1,000 × 100 = 100,000 kobo, but forget to convert back. The other user gets an alert of 100,000 NGN. That’s a bomb! I also remember pushing a feature and forgetting to mention that there were new environment variables to be added to the server. Luckily, it was an internal application. These moments are undesirable and make you feel like less of an engineer (you may even think about switching to selling wigs on Instagram), but they make up your processes and experience. The first time is always the worst, but after that, you’re more confident in facing these problems head-on. It’s a given that you may keep doing this all your career (unintentionally, of course): junior, senior, CTO, anyone.
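To make Emmanuel’s bug concrete, here is a minimal Python sketch of one way to guard against it: keep amounts in kobo inside a small wrapper type and convert back to naira in exactly one place. The names (`Kobo`, `naira_to_kobo`, `format_naira`, `buggy_alert`) are made up for illustration, and this is not what his codebase actually looked like.

```python
from dataclasses import dataclass

KOBO_PER_NAIRA = 100  # 1 NGN = 100 kobo


@dataclass(frozen=True)
class Kobo:
    """An amount in kobo, the minor unit. Hypothetical type for illustration."""
    value: int


def naira_to_kobo(naira: int) -> Kobo:
    """Convert whole naira into kobo for internal processing."""
    return Kobo(naira * KOBO_PER_NAIRA)


def format_naira(amount: Kobo) -> str:
    """The single place where kobo is converted back to naira for display."""
    naira, kobo = divmod(amount.value, KOBO_PER_NAIRA)
    return f"{naira:,}.{kobo:02d} NGN"


def buggy_alert(amount_in_kobo: int) -> str:
    """The bug from the story: a kobo figure printed as if it were naira."""
    return f"{amount_in_kobo:,}.00 NGN"


sent = naira_to_kobo(1000)
print(buggy_alert(sent.value))  # 100,000.00 NGN (the bomb)
print(format_naira(sent))       # 1,000.00 NGN
```

Because `format_naira` only accepts a `Kobo`, passing it a bare integer is something a type checker or a reviewer can flag before it ever reaches prod.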
👁️ The Myth
No matter how brilliant or careful an engineer is, we cannot rule out the human factor: people are prone to mistakes and errors. This should be the default mindset of engineers and engineering teams. Everyone can make a mistake. Sometimes it’s because of negligence, a lack of attention to detail, or distraction. Other times it’s burnout, fatigue, personal problems, health, etc. No one should ever expect a perfect designer or engineer. You shouldn’t even expect that from yourself.
Don’t make the kind of mistakes that can for instance cause the AI invasion and destruction of mankind though. I said small mistakes, not world-dominating ones 🫠
🏥 Handling these situations
For Engineering Leaders - An engineering leader who accepts and practices this will be able to build a culture where blame is not readily thrown around. Even post-mortems are most beneficial to the team when they’re not a witch-hunt but an effort to discover the problem and find out why it happened. If it’s because of a mistake someone on the team made, place priority on helping that person learn from it. If it comes from a usually high-flying engineer, then something else may be happening that affects their focus. In all, get to the root without throwing anyone under the bus, and you’ll have a grateful team who will be willing to give their best without being afraid to try and fail a few times.
For Engineers - Do not be overly confident in what you’ve done. It stops you from tracking back through your processes and checking for grey areas. Sometimes the problem is not an error but the way you’ve done something: an okay implementation or design that doesn’t suit the context or scale of your company. For instance, generating a PDF invoice and sending it to the user within a REST call is an okay implementation, but once the application has scaled to thousands of users, doing that in real time may consume resources and slow down other users. You may then have to consider queuing the work and doing it in the background. So it’s always a good thing to not just push code or a screen, but to consider the context and the business a lot.
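The invoice example can be sketched roughly like this, using only Python’s standard library. It is a toy, assuming a single in-process worker thread. A real system would more likely use a proper job queue, and `render_invoice_pdf` and `handle_checkout` are hypothetical stand-ins rather than anything from a real framework.

```python
import queue
import threading

invoice_jobs = queue.Queue()  # order IDs waiting for an invoice
generated = []                # file names produced by the worker


def render_invoice_pdf(order_id):
    """Stand-in for slow PDF rendering (hypothetical helper)."""
    return f"invoice-{order_id}.pdf"


def worker():
    """Background loop: pull jobs off the queue until a None sentinel arrives."""
    while True:
        order_id = invoice_jobs.get()
        if order_id is None:
            invoice_jobs.task_done()
            break
        generated.append(render_invoice_pdf(order_id))
        invoice_jobs.task_done()


def handle_checkout(order_id):
    """Request handler: enqueue the invoice job and return immediately."""
    invoice_jobs.put(order_id)
    return {"order_id": order_id, "invoice": "queued"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_checkout(i) for i in (1, 2, 3)]
invoice_jobs.join()     # demo only: wait for the background work to finish
invoice_jobs.put(None)  # tell the worker to stop
thread.join()

print(responses[0])  # {'order_id': 1, 'invoice': 'queued'}
print(generated)     # ['invoice-1.pdf', 'invoice-2.pdf', 'invoice-3.pdf']
```

The point is that `handle_checkout` never waits for the PDF, so a slow render for one user no longer holds up the response for everyone else.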
For Engineering Teams - A proper review system should always be in place. It could be anything from peer reviews to prototyping, to QA, to first deploying to a staging environment that is close enough to production. All of these make it easier to catch a problem before it goes to production.
Situations like breaking production can be turned into memories, like Flickr did. Build a culture around it, and you’ll be encouraging confident engineers. If engineers receive harsh punishments for breaking stuff, your workplace becomes tame and boring, and may even slip into toxic. Everyone who comes into a team wants to make an impact, and you can’t always do that without breaking something else (not intentionally, of course) 😂
🔥 Firefighting
Noun
fire·fight·er | \ ˈfī(-ə)r-ˌfī-tər \
Definition: a person who fights fires in production
One great way to nip this in the bud is to build an engineering culture that looks a lot like carrying out systematic fire drills. Netflix had a system in place, Chaos Monkey, that sometimes killed a few servers to test how resilient their systems were. Teams could try breaking things systematically to learn troubleshooting, test how quickly they can get things back up, and ultimately discover application weaknesses, breakpoints, and ways to improve their systems. If your system can be broken because a wrong type got to production, then maybe the mistake is not first the developer’s but the system’s, and your automated tests and CI/CD pipelines may need to be revisited and optimised. This is a nicer way to do things, trust me. You don’t want to be firefighting for the first time at the peak of application traffic, on launch day, or in the middle of an attack, so it should always be part of the culture to break things and fix them up better than before. The principle of chaos engineering is to run experiments on production because, truth be told, no environment can really be like your production, no matter how well you simulate it. Real users and servers will always be unpredictable, and nothing prepares you for that. I’ve seen it firsthand.
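If you want a feel for what a fire drill looks like before touching a real server, here is a tiny simulation in Python. Everything in it is hypothetical: `ServerPool` is an in-memory stand-in for a fleet, and `run_fire_drill` just removes random members and checks whether enough are left to keep serving. It is not how Netflix’s tooling works, only the shape of the idea.

```python
import random


class ServerPool:
    """Hypothetical in-memory stand-in for a fleet of servers."""

    def __init__(self, names, min_healthy):
        self.alive = set(names)
        self.min_healthy = min_healthy  # servers needed to keep serving traffic

    def kill(self, name):
        self.alive.discard(name)

    def is_serving(self):
        return len(self.alive) >= self.min_healthy


def run_fire_drill(pool, kills, rng):
    """Kill `kills` random servers and report whether the pool still serves."""
    victims = rng.sample(sorted(pool.alive), kills)
    for name in victims:
        pool.kill(name)
    return {"killed": victims, "still_serving": pool.is_serving()}


rng = random.Random(42)  # seeded so the drill is repeatable
pool = ServerPool(["web-1", "web-2", "web-3", "web-4"], min_healthy=2)
report = run_fire_drill(pool, kills=2, rng=rng)
print(report)
```

With four servers and a minimum of two, the pool survives losing two of them. Run the drill once more on the same pool and it drops below the minimum, which is exactly the kind of weakness you would rather discover on a quiet afternoon than on launch day.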
My final, rather chaotic advice: kill a few production servers today and watch the world burn, then proceed to fight all the fires. That way you save the world, sacrificing only a few servers. We win!
Final Thoughts:
I think this is a beautiful, beautiful thread you need to go back to every time you think you’ve done something very silly as an engineer. I should probably do a compilation of some of the most profound ones.
As a tech leader, engineer, manager, or designer, what do you think?