Tue, 15 Apr 2014
We had some requests to get GHC (the Glasgow Haskell Compiler) up and running on two new Ubuntu architectures: arm64, added in 13.10, and ppc64el, added in 14.04. This has been something of a saga, and has involved rather more late-night hacking than is probably good for me.
Book the First: Recalled to a life of strange build systems
You might not know it from the sheer bulk of uploads I do sometimes, but I actually don't speak a word of Haskell and it's not very high up my list of things to learn. But I am a pretty experienced build engineer, and I enjoy porting things to new architectures: I'm firmly of the belief that breadth of architecture support is a good way to shake out certain categories of issues in code, that it's worth doing aggressively across an entire distribution, and that, even if you don't think you need something now, new requirements have a habit of coming along when you least expect them and you might as well be prepared in advance. Furthermore, it annoys me when we have excessive noise in our build failure and proposed-migration output and I often put bits and pieces of spare time into gardening miscellaneous problems there, and at one point there was a lot of Haskell stuff on the list and it got a bit annoying to have to keep sending patches rather than just fixing things myself, and ... well, I ended up as probably the only non-Haskell-programmer on the Debian Haskell team and found myself fixing problems there in my free time. Life is a bit weird sometimes.
Bootstrapping packages on a new architecture is a bit of a black art that only a fairly small number of relatively bitter and twisted people know very much about. Doing it in Ubuntu is specifically painful because we've always forbidden direct binary uploads: all binaries have to come from a build daemon. Compilers in particular often tend to be written in the language they compile, and it's not uncommon for them to build-depend on themselves: that is, you need a previous version of the compiler to build the compiler, stretching back to the dawn of time where somebody put things together with a big magnet or something. So how do you get started on a new architecture? Well, what we do in this case is we construct a binary somehow (usually involving cross-compilation) and insert it as a build-dependency for a proper build in Launchpad. The ability to do this is restricted to a small group of Canonical employees, partly because it's very easy to make mistakes and partly because things like the classic "Reflections on Trusting Trust" are in the backs of our minds somewhere. We have an iron rule for our own sanity that the injected build-dependencies must themselves have been built from the unmodified source package in Ubuntu, although there can be source modifications further back in the chain. Fortunately, we don't need to do this very often, but it does mean that as somebody who can do it I feel an obligation to try and unblock other people where I can.
As far as constructing those build-dependencies goes, sometimes we look for binaries built by other distributions (particularly Debian), and that's pretty straightforward. In this case, though, these two architectures are pretty new and the Debian ports are only just getting going, and as far as I can tell none of the other distributions with active arm64 or ppc64el ports (or trivial name variants) has got as far as porting GHC yet. Well, OK. This was somewhere around the Christmas holidays and I had some time. Muggins here cracks his knuckles and decides to have a go at bootstrapping it from scratch. It can't be that hard, right? Not to mention that it was a blocker for over 600 entries on that build failure list I mentioned, which is definitely enough to make me sit up and take notice; we'd even had the odd customer request for it.
Several attempts later and I was starting to doubt my sanity, not least for trying in the first place. We ship GHC 7.6, and upgrading to 7.8 is not a project I'd like to tackle until the much more experienced Haskell folks in Debian have switched to it in unstable. The porting documentation for 7.6 has bitrotted more or less beyond usability, and the corresponding documentation for 7.8 really isn't backportable to 7.6. I tried building 7.8 for ppc64el anyway, picking that on the basis that we had quicker hardware for it and didn't seem likely to be particularly more arduous than arm64 (ho ho), and I even got to the point of having a cross-built stage2 compiler (stage1, in the cross-building case, is a GHC binary that runs on your starting architecture and generates code for your target architecture) that I could copy over to a ppc64el box and try to use as the base for a fully-native build, but it segfaulted incomprehensibly just after spawning any child process. Compilers tend to do rather a lot, especially when they're built to use GCC to generate object code, so this was a pretty serious problem, and it resisted analysis. I poked at it for a while but didn't get anywhere, and I had other things to do so declared it a write-off and gave up.
Book the Second: The golden thread of progress
In March, another mailing list conversation prodded me into finding a blog entry by Karel Gardas on building GHC for arm64. This was enough to be worth another look, and indeed it turned out that (with some help from Karel in private mail) I was able to cross-build a compiler that actually worked and could be used to run a fully-native build that also worked. Of course this was 7.8, since as I mentioned cross-building 7.6 is unrealistically difficult unless you're considerably more of an expert on GHC's labyrinthine build system than I am. OK, no problem, right? Getting a GHC at all is the hard bit, and 7.8 must be at least as capable as 7.6, so it should be able to build 7.6 easily enough ...
Not so much. What I'd missed here was that compiler engineers generally only care very much about building the compiler with older versions of itself, and if the language in question has any kind of deprecation cycle then the compiler itself is likely to be behind on various things compared to more typical code since it has to be buildable with older versions. This means that the removal of some deprecated interfaces from 7.8 posed a problem, as did some changes in certain primops that had gained an associated compatibility layer in 7.8 but nobody had gone back to put the corresponding compatibility layer into 7.6. GHC supports running Haskell code through the C preprocessor, and there's a
This was all interesting, and finally all that work was actually paying off in terms of getting to watch a slew of several hundred build failures vanish from arm64 (the final count was something like 640, I think). The fly in the ointment was that ppc64el was still blocked, as the problem there wasn't building 7.6, it was getting a working 7.8. But now I really did have other much more urgent things to do, so I figured I just wouldn't get to this by release time and stuck it on the figurative shelf.
Book the Third: The track of a bug
Then, last Friday, I cleared out my urgent pile and thought I'd have another quick look. (I get a bit obsessive about things like this that smell of "interesting intellectual puzzle".) slyfox on the #ghc IRC channel gave me some general debugging advice and, particularly usefully, a reduced example program that I could use to debug just the process-spawning problem without having to wade through noise from running the rest of the compiler. I reproduced the same problem there, and then found that the program crashed earlier (in
You see, the vast majority of porting bugs come down to what I might call gross properties of the architecture. You have things like whether it's 32-bit or 64-bit, big-endian or little-endian, whether
From some painstaking gdb work, one thing I eventually noticed was that
There's still more to do. I gather there may be a Google Summer of Code project in Linaro to write proper native code generation for GHC on arm64: this would make things a good deal faster, but also enable GHCi (the interpreter) and Template Haskell, and thus clear quite a few more build failures. Since there's already native code generation for ppc64 in GHC, getting it going for ppc64el would probably only be a couple of days' work at this point. But these are niceties by comparison, and I'm more than happy with what I got working for 14.04.
The upshot of all of this is that I may be the first non-Haskell-programmer to ever port GHC to two entirely new architectures. I'm not sure if I gain much from that personally aside from a lot of lost sleep and being considered extremely strange. It has, however, been by far the most challenging set of packages I've ported, and a fascinating trip through some odd corners of build systems and undefined behaviour that I don't normally need to touch.