Faster C++ code style checks: detail

`xargs`, `nproc` and shell arrays, oh my!
This is a companion article to Developer tooling and boiled frogs.
Before: 4.7 seconds 💩
In this codebase, style checks were performed by `clang-format` with an in-house style file. They ran in a pre-commit hook (as well as in CI). A fast pre-commit hook is critical for good DX; waiting nearly 5 seconds on every commit really sucks.
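For context, here's roughly how such a hook gets wired up (a sketch; the script path is hypothetical):

```sh
# Git runs .git/hooks/pre-commit on every `git commit`, so the whole style
# check sits on the critical path of every single commit.
ln -s ../../scripts/check-style.sh .git/hooks/pre-commit
```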
The entry point to the check was a shell script which invoked clang-format, amongst other things. Like all such scripts, it was not truly original, being assembled from snippets and ideas found around the internet. Here’s the gist of it:
```sh
check_files() {
    # arguments are filenames. if any failed, returns 1; if all passed, returns 0
    err=0
    for file in $@; do   # unquoted on purpose: word-splits the newline-separated argument
        ext=${file##*.}
        case "$ext" in
            c|cpp|h|hpp)
                if ! clang-format -Werror -style=file -n "$file" ; then
                    err=1
                    echo "code style check failed for $file"
                fi
                ;;
            *)
                # unsupported file type; ignore
                ;;
        esac
    done
    return $err
}

# ...

check_files "$(git ls-files)"
```
Do you see the problems?
- It’s single-threaded! Almost all desktop and laptop CPUs have had multiple cores for 15 or more years, so this is truly Dark Ages style processing.
- We’re invoking `clang-format` far more times than we need to: once per file. Every invocation has to parse the config file, which is wasteful (the sketch after this list shows the difference).
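You can see the second problem for yourself with a rough sketch like this (illustrative paths, not a rigorous benchmark):

```sh
# One clang-format process per file: pays process startup and .clang-format
# parsing for every single file.
time for f in src/*.cpp; do clang-format -Werror -style=file -n "$f"; done

# One clang-format process for all the files: pays those costs exactly once.
time clang-format -Werror -style=file -n src/*.cpp
```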
After: 0.21 seconds 🚀
```bash
check_files() {
    # arguments are filenames. if any failed, returns 1; if all passed, returns 0
    FILES=()
    for file in $@; do
        ext=${file##*.}
        case "$ext" in
            c|cpp|h|hpp)
                FILES+=("$file")
                ;;
            *)
                # unsupported file type; ignore
                ;;
        esac
    done

    FILES_COUNT="${#FILES[@]}"
    CORES=$(nproc)
    # CORES+3 gave best performance for me. YMMV.
    JOB_SIZE=$((FILES_COUNT / (CORES + 3)))
    if [ ${JOB_SIZE} -eq 0 ]; then JOB_SIZE=1; fi
    echo "${FILES[@]}" | xargs -P 0 -n ${JOB_SIZE} clang-format -Werror -style=file -n || return 1
    return 0
}
```
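For completeness, the entry point is unchanged; the shebang is the part that matters now (a sketch, assuming the same calling convention as before):

```bash
#!/usr/bin/env bash
# Must be bash, not sh: FILES=() and FILES+=(...) are bashisms.
check_files "$(git ls-files)"
```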
There’s a bit to unpack here:
- We’ve moved from `sh` to `bash` to take advantage of array variables.
- Rather than immediately running `clang-format` on every file that passes the filter, we add them to the `FILES` array so we can use `xargs`. (Yes, I know this could be better structured. This was but a single step of evolution; first, make it work.)
- We use `xargs` in parallel mode (see the toy demo after this list).
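Here's the toy demo promised above, showing what `-n` and `-P` do together (`echo` stands in for `clang-format`):

```bash
# Four "files", jobs of two: xargs runs two echo processes, in parallel.
printf '%s\n' a.cpp b.cpp c.cpp d.cpp | xargs -P 0 -n 2 echo checking
# Output (order may vary, since the two jobs run concurrently):
#   checking a.cpp b.cpp
#   checking c.cpp d.cpp
```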
We’ve naively assumed that checking each file takes roughly the same amount of time. This isn’t quite true (I found about a factor of 3 variation amongst them) but the performance gains were good enough that it doesn’t seem worth worrying about at the present time.
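If you want to check that assumption on your own codebase, a rough sketch like this (assuming GNU `date` with nanosecond support) will show the spread:

```bash
# Time each file individually and sort; the slowest-to-fastest ratio shows
# how uneven the per-file work really is.
for f in $(git ls-files '*.c' '*.cpp' '*.h' '*.hpp'); do
  start=$(date +%s%N)
  clang-format -Werror -style=file -n "$f" >/dev/null 2>&1
  end=$(date +%s%N)
  printf '%6d ms  %s\n' $(( (end - start) / 1000000 )) "$f"
done | sort -n
```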
The key here is combining `-n $JOB_SIZE` with `-P 0` (unlimited parallelisation) and doing some shell arithmetic to figure out what that job size should be.

- `xargs` with neither option runs a single `clang-format` with all the input files in a single invocation. This is good (about a 6x speedup) but we can do better! It is still single-threaded, iterating through the files in order.
- `xargs -P $(nproc)` sounds interesting but is no better on its own. The man page says it all: "Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done."
- I’ve counted the files, divided by (slightly more than) the number of processor cores available, and told `xargs` to parcel the files out in jobs of that size (the worked example below runs the numbers). Of course, if you’re doing this yourself, you’ll want to repeat the experiment to figure out the sweet spot for your hardware and your codebase.
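Here's that arithmetic worked through with illustrative numbers (1000 files, 8 cores; neither figure is from the real codebase):

```bash
FILES_COUNT=1000
CORES=8
JOB_SIZE=$((FILES_COUNT / (CORES + 3)))  # 1000 / 11 = 90 (integer division)
echo "$JOB_SIZE"
# xargs then spawns ceil(1000 / 90) = 12 clang-format processes: comfortably
# more jobs than the 8 cores, which helps hide the slow stragglers.
```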
🚅 Final speedup for this activity: 22.3x 🚀 (timings as measured on my laptop for that codebase on that day; your mileage may vary, may contain nuts, etc.)