Developer tooling and boiled frogs

This is an article about software tooling and the developer experience.

Background

Right now I am spending most of my work time on an embedded project. It’s C++ running on bare-metal target hardware. I’ve got all of 768K of RAM to play with, but that’s still a tad more than my first home micro had.

Our tooling and infrastructure grew mostly organically. I was the one who initialised the git repo.

CI chain 🔗

We have a CI chain, of course. It has grown over time and does a few tricks:

  • build for the target hardware (various targets; RAM and flash builds)
  • build the Linux native port and run unit tests
  • perform various analyses (clang-tidy, sonarcloud, code coverage, valgrind)
  • code style enforcement (clang-format) 👮
  • prepare a signed upgrade image with the development signing key
  • assemble a whole-of-flash image for the factory 🏭
  • build and run some unit tests on a development board

The dev board is on-premises but the rest happens in the cloud, taking a matter of minutes on every push. It’s reasonably slick.

You can run all the build and analysis steps locally. We have a script that spins up a container emulating our CI provider, so you don’t even need the dev tools installed; all you need is to pull our in-house container image.
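
Under the hood that script boils down to something like this. It’s a sketch only: the image name, mount point and entry script here are made-up stand-ins, not our real ones.

```sh
# Run the same container image the CI provider uses, with the repo mounted
# in, and kick off the build/analysis entry point inside it.
docker run --rm -it \
    -v "$(pwd)":/workspace -w /workspace \
    registry.example.com/team/ci-build-image:latest \
    ./ci/build_and_test.sh
```

The appeal is that a local run and a cloud run go through the same toolchain image, so the results match.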

How much effort to spend on tooling? 💸

Many a savvy manager or product owner will tell you to not spend too much time on tooling. They have a point; time spent working on tooling and infra is not delivering features, after all. Many managers have been bitten by infrastructure blow-outs and would rather we focus on building the stuff they want us to build.

But here’s the thing. When they say that, they don’t mean “forget about tooling and never speak of it again”. You are allowed to bring the subject up again in future, and you are allowed to try to educate them as to the value.

The frog boils 📉

We now have over 300 source files that get compiled into the image, including some vendor and autogenerated code.

There’s the rub. We didn’t start out with anywhere near that many. The CI chain used to be really fast, under one minute. Local builds were fast, and even running the clang-tidy script wasn’t that awful (though I did build a cache for it, which helped).

The size and complexity grew over time, and here we are today.

It’s like the old tale about boiling frogs. (Disclaimer: I don’t even know if this is true, and I definitely don’t suggest you try it, but it’s an interesting thought experiment.)

What happens if you drop a frog into a pot of boiling water?

The frog doesn’t like it and jumps out.

But what happens if you drop a frog into a pot of cold water, then light the stove? The frog doesn’t realise anything is wrong. It boils to death before it thinks to jump out.

Builds with a cold cache were taking a couple of minutes locally, and about 8 minutes in the cloud. (Cold-cache builds happen depressingly often when you change a low-level header that is included all over the place… which points to another observation, that maybe our architecture needs work, but that’s for another day.)

A full static analysis with a cold cache was taking over 5 minutes locally, and 22 minutes in the cloud. (The difference is because our company-provided laptops have 16 cores, while our CI provider runs our container with 4 vCPU cores and tightly limited RAM.)

Goodness me, 300 compile units now? This water sure is getting warm! 😰

It’s a poor developer experience. Even running the code style checker script takes five whole seconds. That’s 5 seconds in which you can get distracted, pop over to Slack, and whoops! ten minutes later, you’re trying to remember what it was you were doing.

I’ve been in this business long enough to know that keeping myself and my team focussed is vitally important to getting things done and keeping the stakeholders happy. An efficient DX is part of that puzzle.

Like the mythical frog, I don’t remember noticing these tasks growing longer.

A concrete walk-through 🔧

OK, so you’ve decided you need to jump out of the pan and spend some time sharpening your tools. Great.

The most important advice I can offer is to understand where the pain points are. 🔍

If this is stuff you work with every day, you may already know where the pain points are, or you may be blind to them because you haven’t noticed how poor the DX has become. Open your eyes, and you may find they are in plain sight. Don’t optimise for its own sake; optimise where it will help.
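
A rough first pass is simply to time the steps you actually run day to day; the script names below are hypothetical stand-ins for whatever your own workflow invokes.

```sh
# Time each routine developer-facing step and see which ones hurt.
# (These paths are illustrative; substitute your own scripts.)
for step in ./scripts/check_format.sh ./scripts/run_static_analysis.sh; do
    echo "== $step =="
    time "$step" > /dev/null
done
```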

Here’s what I noted about our tooling:

  • The compilation time itself is already pretty well optimised and parallelised thanks to meson, ninja and ccache.
  • The code style checker is something we run very frequently as part of our workflow (it’s in our pre-commit script). At 5 seconds, that’s like sand in your gears.
  • The static analyser is a different type of pain point. We run it interactively but we don’t like to because it’s a slow ol' beast. On the cloud, it is the slowest part of the build; it is probably the reason why we’re regularly incurring overage on our CI build minutes. (Now, that’s not a big cost compared to an engineer’s salary, but it’s not zero and it has been noticed, so maybe worth spending a little time on.)
  • The unit test run on a dev board is a bottleneck, practically by definition, as it uses a self-hosted CI runner plugged into our pipeline.

Code style checks: Parallelise!

We use clang-format with an in-house style file.

The heart of our checker script is a shell function.

The tl;dr version is:

  • We were invoking clang-format in a tight loop, once per input file. 🤦
  • Now, we count the files, divide by (slightly more than) the number of CPU cores available, and have xargs parcel out the work into that many chunks. 🚅
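
In sketch form, the after-picture looks roughly like this. The file globs and the “cores plus a couple” fudge factor are illustrative rather than lifted verbatim from our script.

```sh
# Check formatting of all tracked C++ sources, fanning the work out across
# the cores instead of invoking clang-format once per file.
check_format() {
    local files count jobs chunk
    files=$(git ls-files '*.cpp' '*.h')        # files to check
    count=$(echo "$files" | wc -l)
    jobs=$(( $(nproc) + 2 ))                   # slightly more chunks than cores
    chunk=$(( (count + jobs - 1) / jobs ))     # files per chunk, rounded up
    echo "$files" | xargs -n "$chunk" -P "$jobs" \
        clang-format --dry-run -Werror         # exits non-zero on style violations
}
```

Most of the win comes from no longer paying clang-format’s start-up cost a few hundred times over; the parallelism is on top of that.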

Static analysis: Reconsider the load, cache, and parallelise where it will help!

The learnings from this one are:

  • Do the heavy lifting in parallel (we were already doing this)
  • Caching positive analysis results helps (we were doing that too)
  • Don’t analyse unit tests or test harness/mocks/fakes unless you have a good reason
  • Don’t analyse target-specific files for obsolete hardware targets unless you have a good reason
  • Where you have a string-and-glue shell phase to prepare the analysis run, it can be worth parallelising that as well.
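
The filtering and fan-out part looks something like this as a sketch; the paths, exclusion patterns and chunk size are illustrative, and it assumes jq is available to read the compile database.

```sh
# Analyse only the files we care about: pull the file list from the compile
# database, drop tests/mocks and retired hardware targets, then fan the rest
# out to clang-tidy across the cores.
jq -r '.[].file' build/compile_commands.json \
    | grep -Ev '/(tests|mocks|fakes)/' \
    | grep -Ev '/targets/(old_board_a|old_board_b)/' \
    | xargs -n 8 -P "$(nproc)" \
        clang-tidy -p build --quiet
```

The cache of clean results sits in front of this sort of thing, so day-to-day runs mostly only pay for files that actually changed.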

Results 🚀

(Timed on my work laptop, which has 16 cores. It’s running Ubuntu 22.04 on WSL on Windows 11.)

Action/Phase            Time before    Time after     Speed-up
Code style check        4.7 seconds    210 ms         22.3x
Static analysis setup   14 seconds     3.5 seconds    4x

My next steps

  • The on-target testing phase is a bottleneck, so we’re going to add more hardware.
  • There might be some mileage in using pre-compiled headers to speed up cold-cache compilation.
    • This isn’t something the places I have worked have traditionally done, but I know it’s popular when you’re dealing with the Win32 C++ API.
    • Our codebase has a “platform” layer; this smells like a good starting point for precompilation.
  • Coverage analysis. We already do this, but there’s another pain point while genhtml crunches the data and generates the HTML report. This one could be trickier; it’s a single-threaded perl script, invoked by meson.
  • Stack usage analysis. We’ve got some code paths that can be provoked to overflow on debug builds. This is a different sort of pain point; while we’ve configured the silicon to detect and hard-fault on overflow, I’d like to set up some analysis to automatically find these situations ahead of time.
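
One possible shape for that analysis, assuming a GCC-based cross toolchain: build with -fstack-usage so every translation unit emits a .su file of per-function frame sizes, then flag anything over a threshold. The limit and paths below are made up, and this only catches big single frames rather than deep call paths, but it’s a cheap early warning.

```sh
# Compile with -fstack-usage (added to the build flags), then scan the
# generated .su files for functions whose stack frame exceeds a limit.
LIMIT=512   # bytes per frame; an illustrative threshold
find build -name '*.su' -exec cat {} + \
    | awk -F'\t' -v limit="$LIMIT" '($2 + 0) > (limit + 0)' \
    | sort -t$'\t' -k2,2 -nr   # worst offenders first
```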

Closing thoughts

How long to spend on infrastructure? 💸

Circling back to a question I asked earlier, how long is a piece of string?

It depends where you are and what your current DX is like.

10-20% of your developer time seems a reasonable benchmark; it’s enough that you can get a handful of DX tickets or story points through every iteration without becoming too much of a drag on progress. After all, if you’re working agile (and who isn’t, these days?), constant improvement is the key.

How’s your developer experience looking?