Colin Watson
   


About
Colin Watson's blog
cjwatson@debian.org

Sun, 26 Oct 2014

Moving on, but not too far

The Ubuntu Code of Conduct says:

Step down considerately: When somebody leaves or disengages from the project, we ask that they do so in a way that minimises disruption to the project. They should tell people they are leaving and take the proper steps to ensure that others can pick up where they left off.

I've been working on Ubuntu for over ten years now, almost right from the very start; I'm Canonical's employee #17 due to working out a notice period in my previous job, but I was one of the founding group of developers. I occasionally tell the story that Mark originally hired me mainly to work on what later became Launchpad Bugs due to my experience maintaining the Debian bug tracking system, but then not long afterwards Jeff Waugh got in touch and said "hey Colin, would you mind just sorting out some installable CD images for us?". This is where you imagine one of those movie time-lapse clocks ... At some point it became fairly clear that I was working on Ubuntu, and the bug system work fell to other people. Then, when Matt Zimmerman could no longer manage the entire Ubuntu team in Canonical by himself, Scott James Remnant and I stepped up to help him out. I did that for a couple of years, starting the Foundations team in the process. As the team grew I found that my interests really lay in hands-on development rather than in management, so I switched over to being the technical lead for Foundations, and have made my home there ever since. Over the years this has given me the opportunity to do all sorts of things, particularly working on our installers and on the GRUB boot loader, leading the development work on many of our archive maintenance tools, instituting the +1 maintenance effort and proposed-migration, and developing the Click package manager, and I've had the great pleasure of working with many exceptionally talented people.

However. In recent months I've been feeling a general sense of malaise and what I've come to recognise with hindsight as the symptoms of approaching burnout. I've been working long hours for a long time, and while I can draw on a lot of experience by now, it's been getting harder to summon the enthusiasm and creativity to go with that. I have a wonderful wife, amazing children, and lovely friends, and I want to be able to spend a bit more time with them. After ten years doing the same kinds of things, I've accreted history with and responsibility for a lot of projects. One of the things I always loved about Foundations was that it's a broad church, covering a wide range of software and with a correspondingly wide range of opportunities; but, over time, this has made it difficult for me to focus on things that are important because there are so many areas where I might be called upon to help. I thought about simply stepping down from the technical lead position and remaining in the same team, but I decided that that wouldn't make enough of a difference to what matters to me. I need a clean break and an opportunity to reset my habits before I burn out for real.

One of the things that has consistently held my interest through all of this has been making sure that the infrastructure for Ubuntu keeps running reliably and that other developers can work efficiently. As part of this, I've been able to do a lot of work over the years on Launchpad where it was a good fit with my remit: this has included significant performance improvements to archive publishing, moving most archive administration operations from excessively-privileged command-line operations to the webservice, making build cancellation reliable across the board, and moving live filesystem building from an unscalable ad-hoc collection of machines into the Launchpad build farm. The Launchpad development team has generally welcomed help with open arms, and in fact I joined the ~launchpad team last year.

So, the logical next step for me is to make this informal involvement permanent. As such, at the end of this year I will be moving from Ubuntu Foundations to the Launchpad engineering team.

This doesn't mean me leaving Ubuntu. Within Canonical, Launchpad development is currently organised under the Continuous Integration team, which is part of Ubuntu Engineering. I'll still be around in more or less the usual places and available for people to ask me questions. But I will in general be trying to reduce my involvement in Ubuntu proper to things that are closely related to the operation of Launchpad, and a small number of low-effort things that I'm interested enough in to find free time for them. I still need to sort out a lot of details, but it'll very likely involve me handing over project leadership of Click, drastically reducing my involvement in the installer, and looking for at least some help with boot loader work, among others. I don't expect my Debian involvement to change, and I may well find myself more motivated there now that it won't be so closely linked with my day job, although it's possible that I will pare some things back that I was mostly doing on Ubuntu's behalf. If you ask me for help with something over the next few months, expect me to be more likely to direct you to other people or suggest ways you can help yourself out, so that I can start disentangling myself from my current web of projects.

Please contact me sooner or later if you're interested in helping out with any of the things I'm visible in right now, and we can see what makes sense. I'm looking forward to this!


Tue, 15 Apr 2014

Porting GHC: A Tale of Two Architectures

We had some requests to get GHC (the Glasgow Haskell Compiler) up and running on two new Ubuntu architectures: arm64, added in 13.10, and ppc64el, added in 14.04. This has been something of a saga, and has involved rather more late-night hacking than is probably good for me.

Book the First: Recalled to a life of strange build systems

You might not know it from the sheer bulk of uploads I do sometimes, but I actually don't speak a word of Haskell and it's not very high up my list of things to learn. But I am a pretty experienced build engineer, and I enjoy porting things to new architectures: I'm firmly of the belief that breadth of architecture support is a good way to shake out certain categories of issues in code, that it's worth doing aggressively across an entire distribution, and that, even if you don't think you need something now, new requirements have a habit of coming along when you least expect them and you might as well be prepared in advance. Furthermore, it annoys me when we have excessive noise in our build failure and proposed-migration output and I often put bits and pieces of spare time into gardening miscellaneous problems there, and at one point there was a lot of Haskell stuff on the list and it got a bit annoying to have to keep sending patches rather than just fixing things myself, and ... well, I ended up as probably the only non-Haskell-programmer on the Debian Haskell team and found myself fixing problems there in my free time. Life is a bit weird sometimes.

Bootstrapping packages on a new architecture is a bit of a black art that only a fairly small number of relatively bitter and twisted people know very much about. Doing it in Ubuntu is specifically painful because we've always forbidden direct binary uploads: all binaries have to come from a build daemon. Compilers in particular often tend to be written in the language they compile, and it's not uncommon for them to build-depend on themselves: that is, you need a previous version of the compiler to build the compiler, stretching back to the dawn of time where somebody put things together with a big magnet or something. So how do you get started on a new architecture? Well, what we do in this case is we construct a binary somehow (usually involving cross-compilation) and insert it as a build-dependency for a proper build in Launchpad. The ability to do this is restricted to a small group of Canonical employees, partly because it's very easy to make mistakes and partly because things like the classic "Reflections on Trusting Trust" are in the backs of our minds somewhere. We have an iron rule for our own sanity that the injected build-dependencies must themselves have been built from the unmodified source package in Ubuntu, although there can be source modifications further back in the chain. Fortunately, we don't need to do this very often, but it does mean that as somebody who can do it I feel an obligation to try and unblock other people where I can.

As far as constructing those build-dependencies goes, sometimes we look for binaries built by other distributions (particularly Debian), and that's pretty straightforward. In this case, though, these two architectures are pretty new and the Debian ports are only just getting going, and as far as I can tell none of the other distributions with active arm64 or ppc64el ports (or trivial name variants) has got as far as porting GHC yet. Well, OK. This was somewhere around the Christmas holidays and I had some time. Muggins here cracks his knuckles and decides to have a go at bootstrapping it from scratch. It can't be that hard, right? Not to mention that it was a blocker for over 600 entries on that build failure list I mentioned, which is definitely enough to make me sit up and take notice; we'd even had the odd customer request for it.

Several attempts later and I was starting to doubt my sanity, not least for trying in the first place. We ship GHC 7.6, and upgrading to 7.8 is not a project I'd like to tackle until the much more experienced Haskell folks in Debian have switched to it in unstable. The porting documentation for 7.6 has bitrotted more or less beyond usability, and the corresponding documentation for 7.8 really isn't backportable to 7.6. I tried building 7.8 for ppc64el anyway, picking that on the basis that we had quicker hardware for it and it didn't seem likely to be particularly more arduous than arm64 (ho ho). I even got to the point of having a cross-built stage2 compiler (stage1, in the cross-building case, is a GHC binary that runs on your starting architecture and generates code for your target architecture) that I could copy over to a ppc64el box and try to use as the base for a fully-native build, but it segfaulted incomprehensibly just after spawning any child process. Compilers tend to spawn child processes rather a lot, especially when they're built to use GCC to generate object code, so this was a pretty serious problem, and it resisted analysis. I poked at it for a while but didn't get anywhere, and I had other things to do so declared it a write-off and gave up.

Book the Second: The golden thread of progress

In March, another mailing list conversation prodded me into finding a blog entry by Karel Gardas on building GHC for arm64. This was enough to be worth another look, and indeed it turned out that (with some help from Karel in private mail) I was able to cross-build a compiler that actually worked and could be used to run a fully-native build that also worked. Of course this was 7.8, since as I mentioned cross-building 7.6 is unrealistically difficult unless you're considerably more of an expert on GHC's labyrinthine build system than I am. OK, no problem, right? Getting a GHC at all is the hard bit, and 7.8 must be at least as capable as 7.6, so it should be able to build 7.6 easily enough ...

Not so much. What I'd missed here was that compiler engineers generally only care very much about building the compiler with older versions of itself, and if the language in question has any kind of deprecation cycle then the compiler itself is likely to be behind on various things compared to more typical code since it has to be buildable with older versions. This means that the removal of some deprecated interfaces from 7.8 posed a problem, as did some changes in certain primops that had gained an associated compatibility layer in 7.8 but nobody had gone back to put the corresponding compatibility layer into 7.6. GHC supports running Haskell code through the C preprocessor, and there's a __GLASGOW_HASKELL__ definition with the compiler's version number, so this was just a slog tracking down changes in git and adding #ifdef-guarded code that coped with the newer compiler (remembering that stage1 will be built with 7.8 and stage2 with stage1, i.e. 7.6, from the same source tree). More inscrutably, GHC has its own packaging system called Cabal which is also used by the compiler build process to determine which subpackages to build and how to link them against each other, and some crucial subpackages weren't being built: it looked like it was stuck on picking versions from "stage0" (i.e. the initial compiler used as an input to the whole process) when it should have been building its own. Eventually I figured out that this was because GHC's use of its packaging system hadn't anticipated this case, and was selecting the higher version of the ghc package itself from stage0 rather than the version it was about to build for itself, and thus never actually tried to build most of the compiler. Editing ghc_stage1_DEPS in ghc/stage1/package-data.mk after its initial generation sorted this out. 

One late night building round and round in circles for a while until I had something stable, and a Debian source upload to add basic support for the architecture name (and other changes which were a bit over the top in retrospect: I didn't need to touch the embedded copy of libffi, as we build with the system one), and I was able to feed this all into Launchpad and watch the builders munch away very satisfyingly at the Haskell library stack for a while.

This was all interesting, and finally all that work was actually paying off in terms of getting to watch a slew of several hundred build failures vanish from arm64 (the final count was something like 640, I think). The fly in the ointment was that ppc64el was still blocked, as the problem there wasn't building 7.6, it was getting a working 7.8. But now I really did have other much more urgent things to do, so I figured I just wouldn't get to this by release time and stuck it on the figurative shelf.

Book the Third: The track of a bug

Then, last Friday, I cleared out my urgent pile and thought I'd have another quick look. (I get a bit obsessive about things like this that smell of "interesting intellectual puzzle".) slyfox on the #ghc IRC channel gave me some general debugging advice and, particularly usefully, a reduced example program that I could use to debug just the process-spawning problem without having to wade through noise from running the rest of the compiler. I reproduced the same problem there, and then found that the program crashed earlier (in stg_ap_0_fast, part of the run-time system) if I compiled it with +RTS -Da -RTS. I nailed it down to a small enough region of assembly that I could see all of the assembly, the source code, and an intermediate representation or two from the compiler, and then started meditating on what makes ppc64el special.

You see, the vast majority of porting bugs come down to what I might call gross properties of the architecture. You have things like whether it's 32-bit or 64-bit, big-endian or little-endian, whether char is signed or unsigned, that sort of thing. There's a big table on the Debian wiki that handily summarises most of the important ones. Sometimes you have to deal with distribution-specific things like whether GL or GLES is used; often, especially for new variants of existing architectures, you have to cope with foolish configure scripts that think they can guess certain things from the architecture name and get it wrong (assuming that powerpc* means big-endian, for instance). We often have to update config.guess and config.sub, and on ppc64el we have the additional hassle of updating libtool macros too. But I've done a lot of this stuff and I'd accounted for everything I could think of. ppc64el is actually a lot like amd64 in terms of many of these porting-relevant properties, and not even that far off arm64 which I'd just successfully ported GHC to, so I couldn't be dealing with anything particularly obvious. There was some hand-written assembly which certainly could have been problematic, but I'd carefully checked that this wasn't being used by the "unregisterised" (no specialised machine dependencies, so relatively easy to port but not well-optimised) build I was using. A problem around spawning processes suggested a problem with SIGCHLD handling, but I ruled that out by slowing down the first child process that it spawned and using strace to confirm that SIGSEGV was the first signal received. What on earth was the problem?

From some painstaking gdb work, one thing I eventually noticed was that stg_ap_0_fast's local stack appeared to be being corrupted by a function call, specifically a call to the colourfully-named debugBelch. Now, when IBM's toolchain engineers were putting together ppc64el based on ppc64, they took the opportunity to fix a number of problems with their ABI: there's an OpenJDK bug with a handy list of references. One of the things I noticed there was that there were some stack allocation optimisations in the new ABI, which affected functions that don't call any vararg functions and don't call any functions that take enough parameters that some of them have to be passed on the stack rather than in registers. debugBelch takes varargs: hmm. Now, the calling code isn't quite in C as such, but in a related dialect called "Cmm", a variant of C-- (yes, minus), that GHC uses to help bridge the gap between the functional world and its code generation, and which is compiled down to C by GHC. When importing C functions into Cmm, GHC generates prototypes for them, but it doesn't do enough parsing to work out the true prototype; instead, they all just get something like extern StgFunPtr f(void);. In most architectures you can get away with this, because the arguments get passed in the usual calling convention anyway and it all works out, but on ppc64el this means that the caller doesn't generate enough stack space and then the callee tries to save its varargs onto the stack in an area that in fact belongs to the caller, and suddenly everything goes south. Things were starting to make sense.

Now, debugBelch is only used in optional debugging code; but runInteractiveProcess (the function associated with the initial round of failures) takes no fewer than twelve arguments, plenty to force some of them onto the stack. I poked around the GCC patch for this ABI change a bit and determined that it only optimised away the stack allocation if it had a full prototype for all the callees, so I guessed that changing those prototypes to extern StgFunPtr f(); might work: it's still technically wrong, not least because omitting the parameter list is an obsolescent feature in C11, but it's at least just omitting information about the parameter list rather than actively lying about it. I tweaked that and ran the cross-build from scratch again. Lo and behold, suddenly I had a working compiler, and I could go through the same build-7.6-using-7.8 procedure as with arm64, much more quickly this time now that I knew what I was doing. One upstream bug, one Debian upload, and several bootstrapping builds later, and GHC was up and running on another architecture in Launchpad. Success!

Epilogue

There's still more to do. I gather there may be a Google Summer of Code project in Linaro to write proper native code generation for GHC on arm64: this would make things a good deal faster, but also enable GHCi (the interpreter) and Template Haskell, and thus clear quite a few more build failures. Since there's already native code generation for ppc64 in GHC, getting it going for ppc64el would probably only be a couple of days' work at this point. But these are niceties by comparison, and I'm more than happy with what I got working for 14.04.

The upshot of all of this is that I may be the first non-Haskell-programmer to ever port GHC to two entirely new architectures. I'm not sure if I gain much from that personally aside from a lot of lost sleep and being considered extremely strange. It has, however, been by far the most challenging set of packages I've ported, and a fascinating trip through some odd corners of build systems and undefined behaviour that I don't normally need to touch.


Sat, 18 Jan 2014

Testing wanted: GRUB 2.02~beta2 Debian/Ubuntu packages

This is mostly a repost of my ubuntu-devel mail for a wider audience, but see below for some additions.

I'd like to upgrade to GRUB 2.02 for Ubuntu 14.04; it's currently in beta. This represents a year and a half of upstream development, and contains many new features, which you can see in the NEWS file.

Obviously I want to be very careful with substantial upgrades to the default boot loader. So, I've put this in trusty-proposed, and filed a blocking bug to ensure that it doesn't reach trusty proper until it's had a reasonable amount of manual testing. If you are already using trusty and have some time to try this out, it would be very helpful to me. I suggest that you only attempt this if you're comfortable driving apt-get directly and recovering from errors at that level, and if you're willing to spend time working with me on narrowing down any problems that arise.

Please ensure that you have rescue media to hand before starting testing. The simplest way to upgrade is to enable trusty-proposed, upgrade ONLY packages whose names start with "grub" (e.g. use apt-get dist-upgrade to show the full list, say no to the upgrade, and then pass all the relevant package names to apt-get install), and then (very important!) disable trusty-proposed again. Provided that there were no errors in this process, you should be safe to reboot. If there were errors, you should be able to downgrade back to 2.00-22 (or 1.27+2.00-22 in the case of grub-efi-amd64-signed).

Please report your experiences (positive and negative) with this upgrade in the tracking bug. I'm particularly interested in systems that are complex in any way: UEFI Secure Boot, non-trivial disk setups, manual configuration, that kind of thing. If any of the problems you see are also ones you saw with earlier versions of GRUB, please identify those clearly, as I want to prioritise handling regressions over anything else. I've assigned myself to that bug to ensure that messages to it are filtered directly into my inbox.

I'll add a couple of things that weren't in my ubuntu-devel mail. Firstly, this is all in Debian experimental as well (I do all the work in Debian and sync it across, so the grub2 source package in Ubuntu is a verbatim copy of the one in Debian these days). There are some configuration differences applied at build time, but a large fraction of test cases will apply equally well to both. I don't have a definite schedule for pushing this into jessie yet - I only just finished getting 2.00 in place there, and the release schedule gives me a bit more time - but I certainly want to ship jessie with 2.02 or newer, and any test feedback would be welcome. It's probably best to just e-mail feedback to me directly for now, or to the pkg-grub-devel list.

Secondly, a couple of news sites have picked this up and run it as "Canonical intends to ship Ubuntu 14.04 LTS with a beta version of GRUB". This isn't in fact my intent at all. I'm doing this now because I think GRUB 2.02 will be ready in non-beta form in time for Ubuntu 14.04, and indeed that putting it in our development release will help to stabilise it; I'm an upstream GRUB developer too and I find the exposure of widely-used packages very helpful in that context. It will certainly be much easier to upgrade to a beta now and a final release later than it would be to try to jump from 2.00 to 2.02 in a month or two's time.

Even if there's some unforeseen delay and 2.02 isn't released in time, though, I think nearly three months of stabilisation is still plenty to yield a boot loader that I'm comfortable with shipping in an LTS. I've been backporting a lot of changes to 2.00 and even 1.99, and, as ever for an actively-developed codebase, it gets harder and harder over time (in particular, I've spent longer than I'd like hunting down and backporting fixes for non-512-byte sector disks). While I can still manage it, I don't want to be supporting 2.00 for five more years after upstream has moved on; I don't think that would be in anyone's best interests. And I definitely want some of the new features which aren't sensibly backportable, such as several of the new platforms (ARM, ARM64, Xen) and various networking improvements; I can imagine a number of our users being interested in things like optional signature verification of files GRUB reads from disk, improved Mac support, and the TrueCrypt ISO loader, just to name a few. This should be a much stronger base for five-year support.


Fri, 26 Oct 2012

Automatic installability checking

I've just finished deploying automatic installability checking for Ubuntu's development release, which is more or less equivalent to the way that uploads are promoted from Debian unstable to testing. See my ubuntu-devel post and my ubuntu-devel-announce post for details. This now means that we'll be opening the archive for general development once glibc 2.16 packages are ready.

I'm very excited about this because it's something I've wanted to do for a long, long time. In fact, back in 2004 when I had my very first telephone conversation with a certain spaceman about this crazy Debian-based project he wanted me to work on, I remember talking about Debian's testing migration system and some ways I thought it could be improved. I don't remember the details of that conversation any more and what I just deployed may well bear very little resemblance to it, but it should transform the extent to which our development release is continuously usable.

The next step is to hook in autopkgtest results. This will allow us to do a degree of automatic testing of reverse-dependencies when we upgrade low-level libraries.


Sun, 27 May 2012

OpenSSH 6.0p1

OpenSSH 6.0p1 was released a little while back; this weekend I belatedly got round to uploading packages of it to Debian unstable and Ubuntu quantal.

I was a bit delayed by needing to put together an improvement to privsep sandbox selection that particularly matters in the context of distributions. One of the experts on seccomp_filter has commented favourably on it, but I haven't yet had a comment from upstream themselves, so I may need to refine this depending on what they say.

(This is a good example of how it matters that software is often not built on the system that it's going to run on, and in particular that the kernel version is rather likely to be different. Where possible it's always best to detect kernel capabilities at run-time rather than at build-time.)
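To make the run-time approach a bit more concrete, here is a minimal sketch in Python. It is not taken from the OpenSSH patch: the function pick_sandbox, the two probe functions, and the "rlimit" fallback name are all assumptions made up for illustration. The idea is simply to try each candidate on the machine that is actually running the program, and to fall back when the kernel says it doesn't know about the feature, rather than baking the answer in when the package is built.

```python
import errno


def pick_sandbox(candidates):
    """Return the name of the first sandbox whose probe succeeds at run time.

    `candidates` is an ordered list of (name, probe) pairs. Each probe is a
    callable that raises OSError if the running kernel lacks the feature.
    """
    for name, probe in candidates:
        try:
            probe()
        except OSError as exc:
            # ENOSYS / EINVAL are treated here as "this kernel doesn't
            # support it"; anything else is a real error and is re-raised.
            if exc.errno in (errno.ENOSYS, errno.EINVAL):
                continue
            raise
        return name
    return "none"


# Hypothetical probes standing in for real kernel checks.
def probe_seccomp_filter():
    # Pretend we are running on a kernel without seccomp filter support.
    raise OSError(errno.EINVAL, "seccomp filter mode not supported")


def probe_rlimit():
    # Pretend this fallback is always available.
    return None


chosen = pick_sandbox([
    ("seccomp_filter", probe_seccomp_filter),
    ("rlimit", probe_rlimit),
])
print(chosen)
```

The ordering of the candidate list expresses preference, so the same binary picks the stronger sandbox on a capable kernel and quietly degrades on an older one.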

I didn't make it very clear in the changelog, but using the new seccomp_filter sandbox currently requires UsePrivilegeSeparation sandbox in sshd_config as well as a capable kernel. I won't change the default here in advance of upstream, who still consider privsep sandboxing experimental.


Fri, 02 Mar 2012

libpipeline 1.2.1 released

I've released libpipeline 1.2.1, and uploaded it to Debian unstable. This is a bug-fix release:

  • Retry reads and writes on EINTR.
  • Fix opening of output files requested by pipeline_want_outfile; these are now created if they do not already exist, and truncated if they do.
  • <pipeline.h> is now wrapped in extern "C" when used in a C++ compilation unit.
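
For anyone unfamiliar with the first item, the general pattern looks roughly like the following. This is a Python sketch of the idea only, not libpipeline's C code; retry_on_eintr and flaky_read are names invented for the example. (Python 3.5 and later already retry most standard library system calls on EINTR by themselves, so the explicit loop is here purely to show the technique.)

```python
import errno


def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:
            # Only an interrupted call is retried; other errors propagate.
            if exc.errno != errno.EINTR:
                raise


# A stand-in for read(2) that is interrupted twice before succeeding.
attempts = []


def flaky_read(n):
    attempts.append(n)
    if len(attempts) < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return b"x" * n


data = retry_on_eintr(flaky_read, 4)
print(data, len(attempts))
```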


Mon, 30 Jan 2012

APT resolver bugs

I've managed to go for eleven years working on Debian and nearly eight on Ubuntu without ever needing to teach myself how APT's resolver works. I get the impression that there's a certain mystique about it in general (alternatively, I'm just the last person to figure this out). Recently, though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be bugs in the resolver, and I thought it might be interesting to walk through the process of fixing them based on the Debug::pkgProblemResolver=true log files.

Breakage with Breaks

The first was Ubuntu bug #922485 (apt.log). To understand the log, you first need to know that APT makes up to ten passes of the resolver to attempt to fix broken dependencies by upgrading, removing, or holding back packages; if there are still broken packages after this point, it's generally because it's got itself stuck in some kind of loop, and it bails out rather than carrying on forever. The current pass number is shown in each "Investigating" log entry, so they start with "Investigating (0)" and carry on up to at most "Investigating (9)". Any packages that you see still being investigated on the tenth pass are probably something to do with whatever's going wrong.
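If the log is long, a few lines of Python can pull out the pass numbers for you. This is only a sketch: it assumes that every relevant entry starts with "Investigating (N) package" at the beginning of a line, as in the excerpts quoted in this post, and the helper names packages_by_pass and last_pass_suspects are my own.

```python
import re
from collections import defaultdict

# Matches e.g. "Investigating (8) dpkg [ i386 ] < ... > ( admin )"
INVESTIGATING = re.compile(r"^Investigating \((\d+)\) (\S+)")


def packages_by_pass(log_text):
    """Map each pass number to the packages investigated in it, in order."""
    passes = defaultdict(list)
    for line in log_text.splitlines():
        match = INVESTIGATING.match(line)
        if match:
            pass_number = int(match.group(1))
            package = match.group(2)
            if package not in passes[pass_number]:
                passes[pass_number].append(package)
    return dict(passes)


def last_pass_suspects(log_text, final_pass=9):
    """Packages still being investigated on the final (tenth) pass."""
    return packages_by_pass(log_text).get(final_pass, [])


sample = """\
Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
"""

suspects = last_pass_suspects(sample)
print(suspects)
```

Anything this reports for pass 9 is a good place to start reading the log backwards.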

In this case, most packages have been resolved by the end of the fourth pass, but xserver-xorg-core is causing some trouble. (Not a particular surprise, as it's an important package with lots of relationships.) We can see that each breakage is:

Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )

This is a Breaks (a relatively new package relationship type introduced a few years ago as a sort of weaker form of Conflicts) on a virtual package, which means that in order to unpack xserver-xorg-core each package that provides xserver-xorg-video-6 must be deconfigured. Much like Conflicts, APT responds to this by upgrading providing packages to versions that don't provide the offending virtual package if it can, and otherwise removing them. We can see it doing just that in the log (some lines omitted):

Investigating (0) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-tseng:i386
Investigating (1) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-i740:i386
Investigating (2) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-nv:i386

OK, so that makes sense - presumably upgrading those packages didn't help at the time. But look at the pass numbers. Rather than just fixing all the packages that provide xserver-xorg-video-6 in a single pass, which it would be perfectly able to do, it only fixes one per pass. This means that if a package Breaks a virtual package which is provided by more than ten installed packages, the resolver will fail to handle that situation. On inspection of the code, this was being handled correctly for Conflicts by carrying on through the list of possible targets for the dependency relation in that case, but apparently when Breaks support was implemented in APT this case was overlooked. The fix is to carry on through the list of possible targets for any "negative" dependency relation, not just Conflicts, and I've filed a patch as Debian bug #657695.
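The effect of the pass limit is easy to see in a toy model. The following Python sketch is emphatically not APT's code: resolve, MAX_PASSES, and the driver names are invented for illustration, and the only behaviour it models is "handle one provider per pass" versus "carry on through all providers in one pass".

```python
MAX_PASSES = 10  # the resolver gives up after ten passes


def resolve(providers, fix_all_per_pass):
    """Toy resolver: return (passes used, providers still unresolved)."""
    remaining = list(providers)
    passes_used = 0
    for _ in range(MAX_PASSES):
        if not remaining:
            break
        passes_used += 1
        if fix_all_per_pass:
            # Behaviour after the fix: carry on through every target
            # of the negative dependency within the same pass.
            remaining.clear()
        else:
            # Behaviour before the fix for Breaks: one target per pass.
            remaining.pop(0)
    return passes_used, remaining


# Twelve hypothetical installed packages providing the broken virtual package.
drivers = ["video-driver-%d" % i for i in range(12)]

buggy = resolve(drivers, fix_all_per_pass=False)
fixed = resolve(drivers, fix_all_per_pass=True)
print(buggy)
print(fixed)
```

With twelve providers, the one-per-pass variant uses up all ten passes and still has two providers left over, which is exactly the "stuck after the tenth pass" symptom in the log; the other variant is done after a single pass.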

My cup overfloweth

The second bug I looked at was Ubuntu bug #917173 (apt.log). Just as in the previous case, we can see the resolver "running out of time" by reaching the end of the tenth pass with some dependencies still broken. This one is a lot less obvious, though. The last few entries clearly indicate that the resolver is stuck in a loop:

Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386

The new version of dpkg requires upgrading dpkg-dev, but it can't because of something wrong with libdpkg-perl. Following the breadcrumb trail back through the log, we find:

Investigating (1) libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl )
Broken libdpkg-perl:i386 Depends on perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
  Considering perl:i386 1472 as a solution to libdpkg-perl:i386 12
  Holding Back libdpkg-perl:i386 rather than change perl:i386
Investigating (1) perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl:i386 Depends on perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl ) (= 5.14.2-6ubuntu1)
  Considering perl-base:i386 5806 as a solution to perl:i386 1472
  Removing perl:i386 rather than change perl-base:i386
Investigating (1) perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl-base:i386 PreDepends on libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (>= 2.11)
  Considering libc6:i386 -17473 as a solution to perl-base:i386 5806
  Added libc6:i386 to the remove list
Investigating (0) libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs )
Broken libc6:i386 Depends on libc-bin [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (= 2.11.1-0ubuntu7.8)
  Considering libc-bin:i386 10358 as a solution to libc6:i386 -17473
  Removing libc6:i386 rather than change libc-bin:i386

So ultimately the problem is something to do with libc6; but what? As Steve Langasek said in the bug, libc6's dependencies have been very carefully structured, and surely we would have seen some hint of it elsewhere if they were wrong. At this point ideally I wanted to break out GDB or at the very least experiment a bit with apt-get, but due to some tedious local problems I hadn't been able to restore the apt-clone state file for this bug onto my system so that I could attack it directly. So I fell back on the last refuge of the frustrated debugger and sat and thought about it for a bit.

Eventually I noticed something. The numbers after the package names in the third line of each of these log entries are "scores": roughly, the more important a package is, the higher its score should be. The function that calculates these is pkgProblemResolver::MakeScores() in apt-pkg/algorithms.cc. Reading this, I noticed that the various values added up to make each score are almost all provably positive, for example:

         Scores[I->ID] += abs(OldScores[D.ParentPkg()->ID]);

The only exceptions are an initial -1 or -2 points for Priority: optional or Priority: extra packages respectively, or some values that could theoretically be configured to be negative but weren't in this case. OK. So how come libc6 has such a huge negative score of -17473, when one would normally expect it to be an extremely powerful package with a large positive score?

Oh. This is computer programming, not mathematics ... and each score is stored in a signed short, so in a sufficiently large upgrade all those bonus points add up to something larger than 32767 and everything goes haywire. Bingo. Make it an int instead - the number of installed packages is going to be on the order of tens of thousands at most, so it's not as though it'll make a substantial difference to the amount of memory used - and chances are everything will be fine. I've filed a patch as Debian bug #657732.
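The wraparound is easy to reproduce outside APT. The sketch below is Python rather than C++, and it assumes a 16-bit two's-complement short, which is what typical platforms provide; the helper names are mine, and 48063 is just an illustrative total that happens to wrap to the score seen in the log, not a value taken from it.

```python
def wrap_int16(value):
    """Wrap an integer the way a 16-bit two's-complement signed short does."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def add_score(score, bonus):
    """Add a bonus to a score that is stored in a signed short."""
    return wrap_int16(score + bonus)


# One point past the maximum flips the sign.
print(add_score(32767, 1))   # -32768

# A "true" total of 48063 (illustrative) would be stored as libc6's score.
print(wrap_int16(48063))     # -17473
```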

I'd expected this to be a pretty challenging pair of bugs. While I certainly haven't lost any respect for the APT maintainers for dealing with this stuff regularly, it wasn't as bad as I thought. I'd expected to have to figure out how to retune some slightly out-of-balance heuristics and not really know whether I'd broken anything else in the process; but in the end both patches were very straightforward.

Mon, 24 Oct 2011

Quality in Ubuntu 12.04 LTS

As is natural for an LTS cycle, lots of people are thinking and talking about work focused on quality rather than features. With Canonical extending LTS support to five years on the desktop for 12.04, much of this is quite rightly focused on the desktop. I'm really not a desktop hacker in any way, shape, or form, though. I spent my first few years in Ubuntu working mainly on the installer - I still do, although I do some other things now too - and I used to say only half-jokingly that my job was done once X started. Of course there are plenty of bugs I can fix, but I wanted to see if I could do something with a bit more structure, so I got to thinking about projects we could work on at the foundations level that would make a big difference.

Image build pipeline

One difficulty we have is that quite a few of our bugs - especially installer bugs, although this goes for some other things too - are only really caught when people are doing coordinated image testing just before a milestone release. Now, it takes a while to do all the builds and then it takes a while to test them. The excellent work of the QA team has meant that testing is much quicker now than it used to be, and a certain amount of smoke-testing is automated (particularly for server images). On the other hand, the build phase has only got longer as we've added more flavours and architectures, particularly as some parts of the process are still serialised per architecture or subarchitecture so ARM builds in particular take a very long time indeed. Exact timings are a bit difficult to get for various reasons, but I think the minimum time between a developer uploading a fix and us having a full set of candidate images on all architectures including that fix is currently somewhere north of eight hours, and that's with people cutting corners and pulling strings which is a suboptimal thing to have to do around release time. This obviously makes us reluctant to respin for anything short of showstopper bugs. If we could get things down to something closer to two hours, respins would be a much less horrible proposition and so we might be able to fix a few bugs that are serious but not showstoppers, not to mention that the release team would feel less burned out.

We discussed this problem at the release sprint, and came up with a laundry list of improvements; I've scheduled this for discussion at UDS in case we can think of any more. Please come along if you're interested!

One thing in particular that I'm working on is refactoring Germinate, a tool which dates right back to our first meeting before Ubuntu was even called Ubuntu and whose job is to expand dependencies starting from our lists of "seed" packages; we use this, among other things, to generate Task fields in the archive and to decide which packages to copy into our images. This was acceptably quick in 2004, but now that we run it forty times (eight flavours multiplied by five architectures) at the end of every publisher run it's actually become rather a serious performance problem: cron.germinate takes about ten minutes, which is over a third of the typical publisher runtime. It parses Packages files eight times as often as it needs to, Sources files forty times as often as it needs to, and recalculates the dependency tree of the base system five times as often as it needs to. I am confident that we can significantly reduce the runtime here, and I think there's some hope that we might be able to move the publisher back to a 30-minute cycle, which would increase the velocity of Ubuntu development in general.
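As a rough illustration of where the repeated work comes from, here is a Python sketch that counts parses with and without caching. It is not Germinate's code: the function names and placeholder flavour and architecture names are invented, and it assumes that Packages files vary only by architecture while Sources files are shared by everything, which is what the eight-times and forty-times figures imply.

```python
from functools import lru_cache

FLAVOURS = [f"flavour{i}" for i in range(8)]   # placeholder names
ARCHES = [f"arch{i}" for i in range(5)]        # placeholder names
parse_calls = {"Packages": 0, "Sources": 0}


def parse_packages(arch):
    """Stand-in for parsing one architecture's Packages file."""
    parse_calls["Packages"] += 1
    return f"Packages index for {arch}"


def parse_sources():
    """Stand-in for parsing the Sources file."""
    parse_calls["Sources"] += 1
    return "Sources index"


# Cached variants: each distinct input is parsed only once.
cached_packages = lru_cache(maxsize=None)(parse_packages)
cached_sources = lru_cache(maxsize=None)(parse_sources)


def run_all(get_packages, get_sources):
    """Run every flavour/architecture combination and count the parses."""
    for key in parse_calls:
        parse_calls[key] = 0
    for _flavour in FLAVOURS:
        for arch in ARCHES:
            get_packages(arch)
            get_sources()
    return dict(parse_calls)


naive = run_all(parse_packages, parse_sources)
cached = run_all(cached_packages, cached_sources)
print(naive)   # {'Packages': 40, 'Sources': 40}
print(cached)  # {'Packages': 5, 'Sources': 1}
```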

Maintaining the development release

Our release cycle always starts with syncing and merging packages from Debian unstable (or testing in the case of LTS cycles). The vast majority of packages in Ubuntu arrive this way, and generally speaking if we didn't do this we would fall behind in ways that would be difficult to recover from later. However, this does mean that we get a "big bang" of changes at the start of the cycle, and it takes a while for the archive to be usable again. Furthermore, even once we've taken care of this, we have a long-established rhythm where the first part of the cycle is mainly about feature development and the second part of the cycle is mainly about stabilisation. As a result, we've got used to the archive being fairly broken for the first few months, and we even tell people that they shouldn't expect things to work reliably until somewhere approaching beta.

This makes some kind of sense from the inside. But how are you supposed to do feature development that relies on other things in the development release?

In the first few years of Ubuntu, this question didn't matter very much. Nearly all the people doing serious feature development were themselves serious Ubuntu developers; they were capable of fixing problems in the development release as they went along, and while it got in their way a little bit it wasn't all that big a deal. Now, though, we have people focusing on things like Unity development, and we shouldn't assume that just because somebody is (say) an OpenGL expert or a window management expert they should be able to recover from arbitrary failures in development release upgrades. One of the best things we could do to help the 12.04 desktop be more stable is to have the entire system be less unstable as we go along, so that developers further up the stack don't have to be distracted by things wobbling underneath them. Plus, it's just good software engineering to keep the basics working as you go along: it should always build, it should always install, it should always upgrade. Ubuntu is too big to do something like having everyone stop any time the build breaks, the way you might do in a smaller project, but we shouldn't let things slide for months either.

I've been talking to Rick Spencer and the other Ubuntu engineering leads at Canonical about this. Canonical has a system of "rotations", where you can go off to another team for a while if you're in need of a change or want to branch out a bit; so I proposed that we allow our engineers to spend a month or two at a time on what I'm calling the +1 Maintenance Team, whose job is simply to keep the development release buildable, installable, and upgradeable at all times. Rick has been very receptive to this, and we're going to be running this as a trial throughout the 12.04 cycle, with probably about three people at a time. As well as being professional archive gardeners, these people will also work on developing infrastructure to help us keep better track of what we need to do. For instance, we could deploy better tools from Debian QA to help us track uninstallable packages, or we could enhance some of our many existing reports to have bug links and/or comment facilities, or we could spruce up the weather report; there are lots of things we could do to make our own lives easier.

By 12.04, I would like, in no particular order:

  • Precise to have been more or less continuously usable from Alpha 1 onward for people with reasonable general technical ability
  • Canonical engineering teams outside Ubuntu (DX, Ubuntu One, Launchpad, etc.) to be comfortable with running the development release on at least one system from Alpha 2 onward
  • Installability problems in daily image builds to be dealt with within one working day, or preferably before they even make it to daily builds
  • The archive to be close to consistent as we start milestone preparation, rather than the release team having to scramble to make it so
  • A very significant reduction in our long-term backlog of automatically-detected problems

Of course, this overlaps to a certain extent with the kinds of things that the MOTU team have been doing for years, not to mention with what all developers should be doing to keep their own houses in reasonable order, and I'd like us to work together on this; we're trying to provide some extra hands here to make Ubuntu better for everyone, not take over! I would love this to be an opportunity to re-energise MOTU and bring some new people on board.

I've registered a couple of blueprints (priorities, infrastructure) for discussion at UDS. These are deliberately open-ended skeleton sessions, and I'll try to make sure they're scheduled fairly early in the week, so that we have time for break-out sessions later on. If you're interested, please come along and give your feedback!

Thu, 06 Oct 2011

Top ideas on Ubuntu Brainstorm (August 2011)

The Ubuntu Technical Board conducts a regular review of the most popular Ubuntu Brainstorm ideas (previous reviews conducted by Matt Zimmerman and Martin Pitt). This time it was my turn. Apologies for the late arrival of this review.

Contact lens in the Unity Dash (#27584)

Unity supports Lenses, which provide a consistent way for users to quickly search for information via the Dash. Current lenses include Applications, Files, and Music, but a number of people have asked for contacts to be accessible using the same interface.

While Canonical's DX team isn't currently working on this for Ubuntu 11.10 or 12.04, we'd love somebody who's interested in this to get involved. Allison Randal explains how to get started, including some skeleton example code and several useful links.

Displaying Ubuntu version information (#27460)

Several people have asked for it to be more obvious what Ubuntu version they're running, as well as other general information about their system.

John Lea, user experience architect on the Unity team, responds that in Ubuntu 11.10 the new LightDM greeter shows the Ubuntu version number, making that basic information very easily visible. For more detail, System Settings -> System Info provides a simple summary.

Volume adjustments for headphone use (#27275)

People often find that they need to adjust their sound volume when plugging in or removing headphones. It seems as though the computer ought to be able to remember this kind of thing and do it automatically; after all, a major goal of Ubuntu is to make the desktop Just Work.

David Henningsson, a member of Canonical's OEM Services group and an Ubuntu audio developer, responds on his blog with a summary of how PulseAudio jack detection has improved matters in Ubuntu 11.10, and what's left to do:

The good news: in the upcoming Ubuntu Oneiric (11.10), this is actually working. The bad news: it isn't working for everyone.

Making it easier to find software to handle a file (#28148)

Ubuntu is not always as helpful as it could be when you don't have the right software installed to handle a particular file.

Michael Vogt, one of the developers of the Ubuntu Software Center, responded to this. It seems that most of the pieces to make this work nicely are in place, but there are a few more bits of glue required:

Thanks a lot for this suggestion. I like the idea and it's something that software-center itself supports now. In the coming version 5.0 we will offer to "sort by top-rated" (based on the ratings&reviews data). It's also possible to search for an application based on its mime data. To search for a mime-type, you can enter "mime:text/html" or "mime:audio/ogg" into the search field. What is needed however is better integration into the file manager nautilus. I will make sure this gets attention at the next developer meeting and filed bug #860536 about it.

In nautilus, there is now a button called "Find applications online" available as an option when opening an unknown file or when the user selects "open with...other application" in the context menu. But that will not use the data from software-center.

Show pop-up alert on low battery (#28037)

Some users have reported on Brainstorm that they are not alerted frequently enough when their laptop's battery is low, as they clearly ought to be.

This is an odd one, because there are already several power alert levels and this has been working well for us for some time. Nevertheless, enough people have voted for this idea that there must be something behind it, perhaps a bug that only affects certain systems. Martin Pitt, technical lead of the Ubuntu desktop team, has responded directly to the Brainstorm idea with a description of the current system and how to file a bug when it does not work as intended.

Sat, 09 Apr 2011

man-db 2.6.0

I've released man-db 2.6.0 (announcement, NEWS, ChangeLog), and uploaded it to Debian unstable. Ubuntu is rapidly approaching beta freeze so I'm not going to try to cram this into 11.04; it'll be in 11.10.

Tue, 15 Mar 2011

Wubi bug 693671

I spent most of last week working on Ubuntu bug 693671 ("wubi install will not boot - phase 2 stops with: Try (hd0,0): NTFS5"), which was quite a challenge to debug since it involved digging into parts of the Wubi boot process I'd never really touched before. Since I don't think much of this is very well-documented, I'd like to spend a bit of time explaining what was involved, in the hope that it will help other developers in the future.

Wubi is a system for installing Ubuntu into a file in a Windows filesystem, so that it doesn't require separate partitions and can be uninstalled like any other Windows application. The purpose of this is to make it easy for Windows users to try out Ubuntu without the need to worry about repartitioning, before they commit to a full installation. Wubi started out as an external project, and initially patched the installer on the fly to do all the rather unconventional things it needed to do; we integrated it into Ubuntu 8.04 LTS, which involved turning these patches into proper installer facilities that could be accessed using preseeding, so that Wubi only needs to handle the Windows user interface and other Windows-specific tasks.

Anyone familiar with a GNU/Linux system's boot process will immediately see that this isn't as simple as it sounds. Of course, ntfs-3g is a pretty solid piece of software so we can handle the Windows filesystem without too much trouble, and loopback mounts are well-understood so we can just have the initramfs loop-mount the root filesystem. Where are you going to get the kernel and initramfs from, though? Well, we used to copy them out to the NTFS filesystem so that GRUB could read them, but this was overly complicated and error-prone. When we switched to GRUB 2, we could instead use its built-in loopback facilities, and we were able to simplify this. So all was more or less well, except for the elephant in the room. How are you going to load GRUB?

In a Wubi installation, NTLDR (or BOOTMGR in Windows Vista and newer) still owns the boot process. Ubuntu is added as a boot menu option using BCDEdit. You might then think that you can just have the Windows boot loader chain-load GRUB. Unfortunately, NTLDR only loads 16 sectors - 8192 bytes - from disk. GRUB won't fit in that: the smallest core.img you can generate at the moment is over 18 kilobytes. Thus, you need something that is small enough to be loaded by NTLDR, but that is intelligent enough to understand NTFS to the point where it can find a particular file in the root directory of a filesystem, load boot loader code from it, and jump to that. The answer for this was GRUB4DOS. Most of GRUB4DOS is based on GRUB Legacy, which is not of much interest to us any more, but it includes an assembly-language program called GRLDR that supports doing this very thing for FAT, NTFS, and ext2. In Wubi, we build GRLDR as wubildr.mbr, and build a specially-configured GRUB core image as wubildr.

Now, the messages shown in the bug report suggested a failure either within GRLDR or very early in GRUB. The first thing I did was to remember that GRLDR has been integrated into the grub-extras ntldr-img module suitable for use with GRUB 2, so I tried building wubildr.mbr from that; no change, but this gave me a modern baseline to work on. OK; now to try QEMU (you can use tricks like qemu -hda /dev/sda if you're very careful not to do anything that might involve writing to the host filesystem from within the guest, such as recursively booting your host OS ... [update: Tollef Fog Heen and Zygmunt Krynicki both point out that you can use the -snapshot option to make this safer]). No go; it hung somewhere in the middle of NTLDR. Still, I could at least insert debug statements, copy the built wubildr.mbr over to my test machine, and reboot for each test, although it would be slow and tedious. Couldn't I?

Well, yes, I mostly could, but that 8192-byte limit came back to bite me, along with an internal 2048-byte limit that GRLDR allocates for its NTFS bootstrap code. There were only a few spare bytes. Something like this would more or less fit, to print a single mark character at various points so that I could see how far it was getting:

	pushal
	xorw	%bx, %bx	/* video page 0 */
	movw	$0x0e4d, %ax	/* print 'M' */
	int	$0x10
	popal

In a few places, if I removed some code I didn't need on my test machine (say, CHS compatibility), I could even fit in cheap and nasty code to print a single register in hex (as long as you didn't mind 'A' to 'F' actually being ':' to '?' in ASCII; and note that this is real-mode code, so the loop counter is %cx not %ecx):

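	/* print %edx in dumbed-down hex */
	pushal
	xorw	%bx, %bx	/* video page 0 */
	movb	$0xe, %ah	/* BIOS teletype output */
	movw	$8, %cx		/* eight nibbles in %edx */
1:
	roll	$4, %edx	/* bring the next nibble into the low bits */
	movb	%dl, %al
	andb	$0xf, %al
	addb	$0x30, %al	/* to ASCII: 10-15 come out as ':' to '?' */
	int	$0x10
	loop	1b
	popal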
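For anyone reading that kind of output back, here is a small Python emulation of what the routine is meant to print. It is only a sketch of the intended behaviour described above (each nibble offset by ASCII '0', with no fix-up for values above 9), and the function name is mine.

```python
def dumbed_down_hex(edx):
    """Return the eight characters the routine would print for a 32-bit value."""
    out = []
    for _ in range(8):                                 # eight nibbles, as with %cx = 8
        edx = ((edx << 4) | (edx >> 28)) & 0xFFFFFFFF  # rotate left by four bits
        nibble = edx & 0xF                             # low nibble of %dl
        out.append(chr(0x30 + nibble))                 # '0' + nibble, no A-F fix-up
    return "".join(out)


print(dumbed_down_hex(0x00008200))  # 00008200
print(dumbed_down_hex(0xDEADBEEF))  # =>:=;>>?
```

Rotating left first means the most significant nibble is printed first, so the digits read in the usual order.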

After a considerable amount of work tracking down problems by bisection like this, I also observed that GRLDR's NTFS code bears quite a bit of resemblance in its logical flow to GRUB 2's NTFS module, and indeed the same person wrote much of both. Since I knew that the latter worked, I could use it to relieve my brain of trying to understand assembly code logic directly, and could compare the two to look for discrepancies. I did find a few of these, and corrected a simple one. Testing at this point suggested that the boot process was getting as far as GRUB but still wasn't printing anything. I removed some Ubuntu patches which quieten down GRUB's startup: still nothing - so I switched my attentions to grub-core/kern/i386/pc/startup.S, which contains the first code executed from GRUB's core image. Code before the first call to real_to_prot (which switches the processor into protected mode) succeeded, while code after that point failed. Even more mysteriously, code added to real_to_prot before the actual switch to protected mode failed too. Now I was clearly getting somewhere interesting, but what was going on? What I really wanted was to be able to single-step, or at least see what was at the memory location it was supposed to be jumping to.

Around this point I was venting on IRC, and somebody asked if it was reproducible in QEMU. Although I'd tried that already, I went back and tried again. Ubuntu's qemu is actually built from qemu-kvm, and if I used qemu -no-kvm then it worked much better. Excellent! Now I could use GDB:

(gdb) target remote | qemu -gdb stdio -no-kvm -hda /dev/sda

This let me run until the point when NTLDR was about to hand over control, then interrupt and set a breakpoint at 0x8200 (the entry point of startup.S). This revealed that the address that should have been real_to_prot was in fact garbage. I set a breakpoint at 0x7c00 (GRLDR's entry point) and stepped all the way through to ensure it was doing the right thing. In the process it was helpful to know that GDB and QEMU don't handle real mode very well between them. Useful tricks here were:

  • Use set architecture i8086 before disassembling real-mode code (and set architecture i386 to switch back).
  • GDB prints addresses relative to the current segment base, but if you want to enter an address then you need to calculate a linear address yourself. For example, breakpoints must be set at (CS << 4) + IP, rather than just at IP.
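The linear address calculation from the second trick is simple enough to script. This is a minimal Python sketch with helper names of my own; the segment values in the examples are illustrative rather than read from a real session.

```python
def linear_address(segment, offset):
    """Real-mode linear address: (segment << 4) + offset."""
    return (segment << 4) + offset


def gdb_breakpoint(cs, ip):
    """Format a GDB command that breaks at the linear address of CS:IP."""
    return f"break *{linear_address(cs, ip):#x}"


# Two different CS:IP pairs can name the same linear address.
print(hex(linear_address(0x0000, 0x7C00)))  # 0x7c00
print(hex(linear_address(0x07C0, 0x0000)))  # 0x7c00
print(gdb_breakpoint(0x0820, 0x0000))       # break *0x8200
```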

Single-stepping showed that GRLDR was loading the entirety of wubildr correctly and jumping to it. The first instruction it jumped to wasn't in startup.S, though, and then I remembered that we prefix the core image with grub-core/boot/i386/pc/lnxboot.S. Stepping through this required a clear head since it copies itself around and changes segment registers a few times. The interesting part was at real_code_2, where it copies a sector of the kernel to the target load address, and then checks a known offset to find out whether the "kernel" is in fact GRUB rather than a Linux kernel. I checked that offset by hand, and there was the smoking gun. GRUB recently acquired Reed-Solomon error correction on its core image, to allow it to recover from other software writing over sectors in the boot track. This moved the magic number lnxboot.S was checking somewhat further into the core image, after the first sector. lnxboot.S couldn't find it because it hadn't copied it yet! A bit of adjustment and all was well again.

The lesson for me from all of this has been to try hard to get an interactive debugger working. Really hard. It's worth quite a bit of up-front effort if it saves you from killing neurons stepping through pages of code by hand. I think the real-mode debugging tricks I picked up should be useful for working on GRUB in the future.

Sat, 11 Dec 2010

libpipeline 1.1.0 released

I've released libpipeline 1.1.0, and uploaded it to Debian unstable. The changes are mostly just to add a few occasionally useful interfaces:

  • Add pipecmd_exec to execute a single command, replacing the current process; this is analogous to execvp.
  • Add pipecmd_clearenv to clear a command's environment; this is analogous to clearenv.
  • Add pipecmd_get_nargs to get the number of arguments to a command.

The shared library actually ends up being a few kilobytes smaller on Debian than 1.0.0, probably because I tweaked the set of Gnulib modules I'm using.

Mon, 06 Dec 2010

NTP synchronisation problems

The Ubuntu Technical Board is currently conducting a review of the top ten Brainstorm issues users have raised about Ubuntu, and Matt asked me to investigate and respond to Idea #25301: Keeping the time accurate over the Internet by default.

My first reaction was "hey, that's odd - I thought we already did that?". We install the ntpdate package by default (it's deprecated upstream in favour of other tools, but that shouldn't be important here). ntpdate is run from /etc/network/if-up.d/ntpdate, in other words every time you connect to a network, which should be acceptably frequent for most people, so it really ought to Just Work by default. But this is one of the top ten problems where users have gone to the trouble of proposing solutions on Brainstorm, so it couldn't be that simple. What was going on?

I brought up a clean virtual machine with a development version of Natty (the current Ubuntu development version, which will eventually become 11.04), and had a look in its logs: it was indeed synchronising its time from ntp.ubuntu.com, and I didn't think anything in that area had changed recently. On the other hand, I had occasionally noticed that my own laptop wasn't always synchronising its time quite right, but I'd put it down to local weirdness as my network isn't always very stable. Maybe this wasn't so local after all?

So, I started tracing through the scripts to figure out what was going on. It turned out that I had an empty /etc/ntp.conf file on my laptop. The /usr/sbin/ntpdate-debian script assumed that that meant I had a full NTP server installed (I don't), and fetched the list of servers from it; since the file was empty, it ended up synchronising time from no servers, that is, not synchronising at all. I removed the file and all was well.

That left the question of where that file came from. It didn't seem to be owned by any package; I was pretty sure I hadn't created it by hand either. I had a look through some bug reports, and soon found "ntpdate 1:4.2.2.p4+dfsg-1ubuntu2 has a flawed configuration file". It turns out that time-admin (System -> Administration -> Time and Date) creates an empty /etc/ntp.conf file if you press the reload button (tooltip: "Synchronise now"), as part of an attempt to update NTP configuration. Aha!

Once I knew where the problems were, it was easy to fix them. I've uploaded the following changes, which will be in the 11.04 release:

  • Disregard empty ntp.conf files in ntpdate-debian.
  • Remove an empty /etc/ntp.conf file on fresh installation of the ntp package, so that it doesn't interfere with creating the normal configuration file.
  • Don't create the NTP configuration file in the time-admin backend if it doesn't exist already.
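The first of these changes amounts to a small tweak in how the server list is chosen. The real ntpdate-debian is a shell script, so the Python below is only a sketch of the logic: the function names are invented, and the parsing of "server" lines is a simplification.

```python
import os
import tempfile

DEFAULT_SERVERS = ["ntp.ubuntu.com"]  # stands in for NTPSERVERS in /etc/default/ntpdate


def servers_from_conf(path):
    """Collect the hosts named on 'server' lines of an ntp.conf-style file."""
    servers = []
    with open(path) as conf:
        for line in conf:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "server":
                servers.append(fields[1])
    return servers


def choose_servers(conf_path, defaults=DEFAULT_SERVERS):
    """Use ntp.conf only if it exists and is non-empty; otherwise fall back."""
    if os.path.exists(conf_path) and os.path.getsize(conf_path) > 0:
        return servers_from_conf(conf_path)
    return list(defaults)


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "ntp.conf")
    open(empty, "w").close()          # the empty file that time-admin left behind
    print(choose_servers(empty))      # ['ntp.ubuntu.com']
```

Before the fix, the equivalent of the size check was missing, so an empty file yielded an empty server list and no synchronisation at all.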

I've also sent these changes to Debian and GNOME as appropriate.

There are still a few problems. The "Synchronise now" button doesn't work quite right in general (bug #90524), and if your network doesn't allow time synchronisation from ntp.ubuntu.com then you'll have to change the value of NTPSERVERS in /etc/default/ntpdate. Furthermore, the time-admin interface is confusing and makes it seem as though the default is not to synchronise the time automatically; this interface is being redesigned at the moment, which should be a good opportunity to make it less confusing, and I will contact the designers to mention this problem. On the whole, though, I think that many fewer people should have this kind of problem in Ubuntu 11.04.

It's always possible that I missed some other problem that breaks automatic time synchronisation for people. Please do file a bug report if it still doesn't work for you in 11.04, or contact me directly (cjwatson at ubuntu.com).

Thu, 02 Dec 2010

man-db on Fedora

I just found out by chance that Fedora 14 switched from their old man package to man-db. This is great news: it should now be the beginning of the end of the divergence of man implementations that happened way back in the mid-1990s, when two different people took John W. Eaton's man package and developed it in different directions without being aware of each other's existence. For a while it looked as though man-db was stuck on just the Debian family and openSUSE, but a number of distributions have switched over in the last few years. As of now, the only remaining major distribution not using man-db is Gentoo, and they have a bug for switching which I think should be unblocked fairly soon.

In some ways man-db's package name didn't help it; people thought that the main difference was that man-db had a database backend stuck around apropos. These days, the database is one of the least important parts of man-db as far as I'm concerned. Other ways in which it's very significantly superior to anything man could do without years of equivalent effort include correct encoding support, robust child process handling, and use of more modern development facilities (dear catgets: you belong to a previous millennium, so please go away). I'm glad that Fedora has recognised this.

Fri, 29 Oct 2010

libpipeline 1.0.0 released

In my previous post, I described the pipeline library from man-db and asked whether people were interested in a standalone release of it. Several people expressed interest, and so I've now released libpipeline version 1.0.0. It's in the Debian NEW queue, and my PPA contains packages of it for Ubuntu lucid and maverick.

I gave a lightning talk on this at UDS in Orlando, and my slides are available. I hope there'll be a video at some point which I can link to.

Thanks to Scott James Remnant for code review (some time back), Ian Jackson for an extensive design review, and Kees Cook and Matthias Klose for helpful conversations.

[] permanent link

Sun, 03 Oct 2010

Pipeline library

When I took over man-db in 2001, one of the major problems that became evident after maintaining it for a while was the way it handled subprocesses. The nature of man and friends means that it spends a lot of time calling sequences of programs such as zsoelim < input-file | tbl | nroff -mandoc -Tutf8. Back then, it was using C library facilities such as system and popen for all this, and I had to deal with several bugs where those functions were being called with untrusted input as arguments without properly escaping metacharacters. Of course it was possible to chase around every such call inserting appropriate escaping functions, but this was always bound to be error-prone. One of the tasks that rapidly became important to me was therefore arranging to start subprocesses in a way that was fundamentally immune to this kind of bug.

In higher-level languages, there are usually standard constructs which are safer than just passing a command line to the shell. For example, in Perl you can use system($command, $arg1, $arg2, ...) to invoke a program with arguments without the interference of the shell, and perlipc(1) describes various facilities for connecting them together. In Python, the subprocess module allows you to create pipelines easily and safely (as long as you remember the SIGPIPE gotcha). C has the fork and execve primitives, but assembling these to construct full-blown pipelines correctly is difficult and error-prone, so many programmers don't bother and use the simple but unsafe library facilities instead.
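To give a sense of what assembling those primitives involves, here is a minimal sketch of a two-command pipeline built directly from pipe, fork, dup2 and execvp. It is not code from man-db or groff; the names spawn and run_pipe2 are made up for this illustration, and it deliberately ignores signal handling, EINTR and pipelines of more than two commands, which is where much of the real difficulty lies.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] with stdin and/or stdout redirected
 * to the given descriptors (-1 means "leave alone").  close_fd is the
 * pipe end this child does not use.  No shell is involved, so
 * metacharacters in arguments are passed through literally.
 * Returns the child's pid, or -1 if fork failed. */
static pid_t spawn(char *const argv[], int in_fd, int out_fd, int close_fd)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;             /* parent, or fork failure (-1) */
    if (in_fd != -1) {
        dup2(in_fd, STDIN_FILENO);
        close(in_fd);
    }
    if (out_fd != -1) {
        dup2(out_fd, STDOUT_FILENO);
        close(out_fd);
    }
    if (close_fd != -1)
        close(close_fd);
    execvp(argv[0], argv);
    _exit(127);                 /* only reached if exec failed */
}

/* Run "left | right"; return the exit status of right, or -1 on error. */
int run_pipe2(char *const left[], char *const right[])
{
    int fds[2], status;
    pid_t lpid, rpid;

    if (pipe(fds) == -1)
        return -1;
    lpid = spawn(left, -1, fds[1], fds[0]);
    rpid = spawn(right, fds[0], -1, fds[1]);
    /* The parent must close both ends, or right never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);
    if (lpid > 0)
        waitpid(lpid, NULL, 0);
    if (rpid <= 0 || lpid <= 0) {
        if (rpid > 0)
            waitpid(rpid, NULL, 0);
        return -1;
    }
    if (waitpid(rpid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_pipe2 with { "echo", "one;two", NULL } and { "grep", "-q", "one;two", NULL } returns grep's exit status; the semicolon reaches echo as ordinary text because no shell ever parses it.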

I wrote a couple of thousand lines of library code in man-db to address this problem, loosely and now quite distantly based on code in groff. In the following examples, function names starting with command_, pipeline_, or decompress_ are real functions in the library, while any other function names are pseudocode.

Constructing the simplified example pipeline from my first paragraph using this library looks like this:

pipeline *p;
int status;

p = pipeline_new ();
p->want_infile = "input-file";
pipeline_command_args (p, "zsoelim", NULL);
pipeline_command_args (p, "tbl", NULL);
pipeline_command_args (p, "nroff", "-mandoc", "-Tutf8", NULL);
pipeline_start (p);
status = pipeline_wait (p);
pipeline_free (p);

You might want to construct a command more dynamically:

command *manconv = command_new_args ("manconv", "-f", from_code,
                                     "-t", "UTF-8", NULL);
if (quiet)
	command_arg (manconv, "-q");
pipeline_command (p, manconv);

Perhaps you want an environment variable set only while running a certain command:

command *less = command_new ("less");
command_setenv (less, "LESSCHARSET", lesscharset);

You might find yourself needing to pass the output of one pipeline to several other pipelines, in a "tee" arrangement:

pipeline *source, *sink1, *sink2;

source = make_source ();
sink1 = make_sink1 ();
sink2 = make_sink2 ();
pipeline_connect (source, sink1, sink2, NULL);
/* Pump data among these pipelines until there's nothing left. */
pipeline_pump (source, sink1, sink2, NULL);
pipeline_free (sink2);
pipeline_free (sink1);
pipeline_free (source);

Maybe one of your commands is actually an in-process function, rather than an external program:

command *inproc = command_new_function ("in-process", &func, NULL, NULL);
pipeline_command (p, inproc);

Sometimes your program needs to consume the output of a pipeline, rather than sending it all to some other subprocess:

pipeline *p = make_pipeline ();
const char *line;

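/* Peek at the first line without consuming it; it may be NULL if the
 * pipeline produced no output. */
line = pipeline_peekline (p);
if (line && strstr (line, "coding: UTF-8"))
	printf ("Unicode text follows:\n");
while ((line = pipeline_readline (p)) != NULL)
	printf ("  %s", line);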
pipeline_free (p);

man-db deals with compressed files a lot, so I wrote an add-on library for opening compressed files (which is somewhat man-db-specific, but the implementation wasn't difficult given the underlying library):

pipeline *decomp_file = decompress_open (compressed_filename);
pipeline *decomp_stdin = decompress_fdopen (fileno (stdin));

This library has been in production in man-db for over five years now. The very careful signal handling code has been reviewed independently and the whole thing has been run through multiple static analysis tools, although I would always welcome more review; in particular I have no idea what it would take to make it safe for use in threaded programs since I generally avoid threading wherever possible. There have been a handful of bugs, which I've fixed promptly, and I've added various new features to support particular requirements of man-db (though in as general a way as possible). Every so often I see somebody asking about subprocess handling in C, and I wonder if I should split this library out into a standalone package so that it can be used elsewhere. Web searches for things like "pipeline library" and "libpipeline" don't reveal anything that's a particularly close match for what I have. The licensing would be GPLv2 or later; this isn't likely to be negotiable since some of the original code wasn't mine and in any case I don't feel particularly bad about giving an advantage to GPLed programs. For more details on the interface, the header file is well-commented.

Is there enough interest in this to make the effort of producing a separate library package worthwhile? As well as the general effort of creating a new package, I'd need to do some work to disentangle it from a few bits and pieces specific to man-db. If you maintain a specific package that could use this and you're interested, please contact me with details, mentioning any extensions you think you'd need. I intentionally haven't enabled comments on my blog for various reasons, but you can e-mail me at cjwatson at debian.org or man-db-devel at nongnu.org.

[] permanent link

Sat, 28 Aug 2010

Windows applications making GRUB 2 unbootable

If you find that running Windows makes a GRUB 2-based system unbootable (Debian bug, Ubuntu bug), then I'd like to hear from you. This is a bug in which some proprietary Windows-based software overwrites particular sectors in the gap between the master boot record and the first partition, sometimes called the "embedding area". GRUB Legacy and GRUB 2 both normally use this part of the disk to store one of their key components: GRUB Legacy calls this component Stage 1.5, while GRUB 2 calls it the core image (comparison). However, Stage 1.5 is less useful than the core image (for example, the latter provides a rescue shell which can be used to recover from some problems), and is therefore rather smaller: somewhere around 10KB vs. 24KB for the common case of ext[234] on plain block devices. It seems that the Windows-based software writes to a sector which is after the end of Stage 1.5, but before the end of the core image. This is why the problem appears to be new with GRUB 2.

At least some occurrences of this are with software which writes a signature to the embedding area which hangs around even after uninstallation (even with one of those tools that tracks everything the installation process did and reverses it, I gather), so that you cannot uninstall and reinstall the application to defeat a trial period. This seems like a fine example of an antifeature, especially given its destructive consequences for free software, and is in general a poor piece of engineering; what happens if multiple such programs want to use the same sector, I wonder? They clearly aren't doing much checking that the sector is unused, not that that's really possible anyway. While I do not normally think that GRUB should go to any great lengths to accommodate proprietary software, this is a case where we need to defend ourselves against the predatory practices of some companies making us look bad: a relatively small number of people do enough detective work to realise that it's the fault of a particular Windows application, but many more simply blame our operating system because it won't start any more.

I believe that it may be possible to assemble a collection of signatures of such software, and arrange to avoid the disk sectors they have stolen. Indeed, I have a first draft of the necessary code. This is not a particularly pleasant solution, but it seems to be the most practical way around the problem; I'm hoping that several of the programs at fault are using common "licence manager" code or something like that, so that we can address most of the problems with a relatively small number of signatures. In order to do this, I need to hear from as many people as possible who are affected by this problem.

If you suffer from this problem, then please do the following:

  • Save the output of fdisk -lu to a file. In this output, take note of the start sector of the first partition (usually 63, but might also be 2048 on recent installations, or occasionally something else). If this is something other than 63, then replace 63 in the following items with your number.
  • Save the contents of the embedding area to a file (replace /dev/sda with your disk device if it's something else): dd if=/dev/sda of=sda.1 count=63
  • Do whatever you do to make GRUB unbootable (presumably starting Windows), then boot into a recovery environment. Before you reinstall GRUB, save the new contents of the embedding area to a different file: dd if=/dev/sda of=sda.2 count=63
  • Follow up to either the Debian or the Ubuntu bug with these three files (the output of fdisk -lu, and the embedding area before and after making GRUB unbootable).
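Once both dumps exist, the interesting question is which sectors differ between them. The following is a minimal sketch of that comparison, assuming 512-byte sectors. changed_sectors is a hypothetical helper written for this illustration; it is not part of GRUB or of the draft code mentioned above.

```c
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512         /* assumption: 512-byte sectors */

/* Compare the first nsectors sectors of two disk dumps (for example the
 * sda.1 and sda.2 files above) and store the numbers of the sectors that
 * differ in changed[], up to max entries.  Sector 0 is the boot sector
 * itself; it is compared like any other.  Returns the total number of
 * differing sectors, or -1 if either file cannot be read in full. */
long changed_sectors(const char *before_path, const char *after_path,
                     long nsectors, long *changed, long max)
{
    FILE *before = fopen(before_path, "rb");
    FILE *after = fopen(after_path, "rb");
    unsigned char old_data[SECTOR_SIZE], new_data[SECTOR_SIZE];
    long count = 0, sector;

    if (!before || !after) {
        count = -1;
    } else {
        for (sector = 0; sector < nsectors; sector++) {
            if (fread(old_data, 1, SECTOR_SIZE, before) != SECTOR_SIZE ||
                fread(new_data, 1, SECTOR_SIZE, after) != SECTOR_SIZE) {
                count = -1;     /* short file or read error */
                break;
            }
            if (memcmp(old_data, new_data, SECTOR_SIZE) != 0) {
                if (count < max)
                    changed[count] = sector;
                count++;
            }
        }
    }
    if (before)
        fclose(before);
    if (after)
        fclose(after);
    return count;
}
```

With the default start sector of 63, calling changed_sectors ("sda.1", "sda.2", 63, changed, 63) would list every sector in the dump that the offending software touched.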

I hope that this will help me to assemble enough information to fix this bug at least for most people, and of course if you provide this information then I can make sure to fix your particular version of this problem. Thanks in advance!

[/debian] permanent link

Sat, 10 Jul 2010

debhelper statistics, redux

Apropos of my previous post, I see that dh has now overtaken CDBS as the most popular rules helper system of its kind in Debian unstable, and shows no particular sign of slowing its rate of uptake any time soon. The resolution of the graph is such that you can't see it yet, but dh drew dead level with CDBS on Thursday, and today 3836 packages are using dh as opposed to 3823 using CDBS.

[/debian] permanent link

Fri, 02 Jul 2010

GRUB 2: With luck ...

... this version, or something not too far away from it, might actually stand a chance of getting into testing.

I've just uploaded grub2 1.98+20100702-1. The most significant set of changes in this release is that it switches /boot/grub/device.map and the grub-pc/install_devices debconf question over to stable device names under /dev/disk/by-id (on Linux kernels). The code implementing this is reasonably careful, and it should make it quite difficult for people to accidentally fail to upgrade their installed GRUB core image; I explained the problems that such a failure tends to cause in the previous post in this series. There will probably be a few small glitches I need to clear up, but I've given this much more extensive testing than usual so I hope I won't break too many people's computers (again).

I did this work first in Ubuntu as one of my major goals for 10.04 LTS, which exposed a few problems that I wanted to fix before inflicting it on Debian as well (fixes for those are now under testing for 10.04.1). Most significantly, I felt it was necessary to start offering partitions in the select list for grub-pc/install_devices, but I went a bit overboard and offered all partitions in a giant list. This seemed like a good idea at the time, but it tended to confuse people into just selecting everything in the list, which in particular tended to make Windows unbootable! So I dialled that back a bit, and in the version I just merged it will only offer the partitions mounted on /, /boot, and /boot/grub (de-duplicating if necessary). This seems like a reasonable compromise between confusing people too much and forcing them to install only to MBRs.
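For readers unfamiliar with the stable names discussed above: on Linux they are symlinks in /dev/disk/by-id that point at the usual device nodes. A minimal sketch of finding such a name for a given device follows. This is not the code in the grub2 package; stable_name_for is a hypothetical helper, and it simply returns the first matching symlink in directory order even though one disk may have several.

```c
#define _XOPEN_SOURCE 700       /* for realpath() under strict C modes */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look through by_id_dir (normally "/dev/disk/by-id") for a symlink that
 * resolves to the same path as device, and copy its full name to out.
 * Returns 0 on success, -1 if there is no match or on error. */
int stable_name_for(const char *by_id_dir, const char *device,
                    char *out, size_t outlen)
{
    char want[PATH_MAX], got[PATH_MAX], candidate[PATH_MAX];
    struct dirent *entry;
    DIR *dir;
    int n, ret = -1;

    if (!realpath(device, want))
        return -1;
    dir = opendir(by_id_dir);
    if (!dir)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;           /* skips ".", ".." and hidden entries */
        n = snprintf(candidate, sizeof candidate, "%s/%s",
                     by_id_dir, entry->d_name);
        if (n < 0 || (size_t) n >= sizeof candidate)
            continue;           /* name too long to handle */
        if (realpath(candidate, got) && strcmp(got, want) == 0 &&
            strlen(candidate) < outlen) {
            strcpy(out, candidate);
            ret = 0;
            break;
        }
    }
    closedir(dir);
    return ret;
}
```

On a typical Linux system, stable_name_for ("/dev/disk/by-id", "/dev/sda", buf, sizeof buf) would fill buf with one of the by-id names for the first disk, if any exist.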

My next priority will be making whatever fixes are necessary to get this version into testing, since the problems with /dev/mapper symlinks in testing aren't getting any less urgent, and this is finally a version that shouldn't break for most people due to the kernel's switch to libata. I expect that I'll try to get mdadm 1.x metadata sorted out immediately after that.

Other improvements since my last entry have included:

  • Further documentation work. Thanks to Vladimir Serbinenko (and to Jordan Uggla for hosting it temporarily), there's now an HTML version of the GRUB manual from trunk online, which includes new sections on embedded configuration files, the various GRUB image files, device.map, and (shortly) a summary of changes from GRUB Legacy.
  • Video improvements: among other things, UEFI systems whose firmware uses the Graphics Output Protocol should now work rather better, and GRUB now includes specific support for some cards often used with minimal firmware support under emulation.
  • A fix to handle large memory maps exposed by some UEFI firmware.
  • Automatic configuration support for Fedora 13. You may need os-prober 1.39 from unstable as well.
  • Automatic configuration support for Linux on Xen.
  • Skip LVM snapshots rather than failing when they're present.

[/debian] permanent link

Mon, 21 Jun 2010

GRUB 2 boot problems

(This is partly a repost of material I've posted to bug reports and to debian-release, put together with some more detail for a wider audience.)

You could be forgiven for looking at the RC bug activity on grub2 over the last couple of days and thinking that it's all gone to hell in a handbasket with recent uploads. In fact, aside from an interesting case which turned out to be due to botched handling of the GRUB Legacy to GRUB 2 chainloading setup (which prompted me to fix three other RC bugs along the way), all the recent problems people have been having have been duplicates of one of these bugs which have existed essentially forever:

When GRUB boots, its boot sector first loads its "core image", which is usually embedded in the gap between the boot sector and the first partition on the same disk as the boot sector. This core image then figures out where to find /boot/grub, and loads grub.cfg from it as well as more GRUB modules.

The thing that tends to go wrong here is that the core image must be from the same version of GRUB as any modules it loads. /boot/grub/*.mod are updated only by grub-install, so this normally works OK. However, for various reasons (deliberate or accidental) some people install GRUB to multiple disks. In this case, grub-install might update /boot/grub/*.mod along with the core image on one disk, but your BIOS might actually be booting from a different disk. The effect of this will be that you'll have an old core image and new modules, which will probably blow up in any number of possible ways. Quite often, this problem lies dormant for a while because GRUB happens not to change in a way that causes incompatibility between the core image and modules, but then we get massive spikes of bug reports any time the interface does change. Since these bugs sometimes bite people upgrading from testing to unstable, they get interpreted as regressions from the version in testing even though that isn't strictly true (but it tends not to be very productive to argue this line; after all, people's computers suddenly don't boot!). Any problem that causes the core image to be installed to a disk other than the one actually being booted from, or not to be installed at all, will show up this way sooner or later.

On 2010-06-10, there was a substantial upstream change to the handling of list iterators (to reduce core image size and make code clearer and faster) which introduced an incompatibility between old core images and newer modules. This caused a bunch of dormant problems to flare up again, and so there was a flood of reports of booting problems with 1.98+20100614-1 and newer, often described as "the unaligned pointer bug" due to how it happened to manifest this time round. In previous cases, GRUB reported undefined symbols on boot, but it's all essentially the same problem even though there are different symptoms.

The confusing bit when handling bug reports is that not only are there different symptoms with the same cause, but there are also multiple causes for the same symptom! This takes a certain amount of untangling, especially when lots of people have thought "ooh, that bug looks a bit like mine" and jumped in with their own comments. Working through this was a worthwhile exercise, as it came up with an entirely new cause for a problem I thought was fairly well-understood (thanks to debugging assistance from Sedat Dilek). If you had set up GRUB 2 to be automatically chainloaded from GRUB Legacy (which happens automatically on upgrade from the latter to the former), never got round to running upgrade-from-grub-legacy once you confirmed it worked, and then later ran grub-install by hand for one reason or another, then the core image you installed by hand would never be updated and would eventually fall over the next time the core/modules interface changed. Fixing future cases of this was easy enough, but fixing existing cases involved figuring out how to detect whether an installed GRUB boot sector came from GRUB Legacy or GRUB 2, which isn't as easy as you might think. Fortunately, it turns out that there are a limited number of jump offsets that have ever been used in the second byte of the boot sector, and none of the GRUB 2 values clash with the only value ever used in GRUB Legacy; so, if you still have /boot/grub/stage2 et al on upgrade, we scan all disks for a GRUB 2 boot sector, and if we find one then we offer to complete the upgrade to GRUB 2.
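The boot sector check described above can be sketched roughly as follows. This is an illustration, not the code in the grub2 maintainer scripts: classify_boot_sector is a hypothetical name, the assumption that the first byte is the x86 short-jump opcode (0xEB) is mine, and the actual offset values are not given in this post, so the caller has to supply them.

```c
#include <stddef.h>

enum boot_loader { LOADER_UNKNOWN, LOADER_GRUB_LEGACY, LOADER_GRUB2 };

/* x86 opcode for a short (8-bit relative) JMP; assumed to be the first
 * byte of the boot sector. */
#define JMP_SHORT 0xEB

/* Classify a boot sector by the operand of its initial jump, which is
 * the second byte of the sector.  legacy_offset is the single value used
 * by GRUB Legacy and grub2_offsets[] the values used by GRUB 2; both are
 * parameters because the real values are not stated in the text. */
enum boot_loader classify_boot_sector(const unsigned char *sector, size_t len,
                                      unsigned char legacy_offset,
                                      const unsigned char *grub2_offsets,
                                      size_t n_grub2)
{
    size_t i;

    if (len < 2 || sector[0] != JMP_SHORT)
        return LOADER_UNKNOWN;
    if (sector[1] == legacy_offset)
        return LOADER_GRUB_LEGACY;
    for (i = 0; i < n_grub2; i++)
        if (sector[1] == grub2_offsets[i])
            return LOADER_GRUB2;
    return LOADER_UNKNOWN;
}
```

A real check would also need to confirm that the sector belongs to GRUB at all, since other boot loaders may well begin with the same instruction.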

Unless anything new shows up, that just leaves the problems that were already understood. Today, I posted a patch to generate stable device names in device.map by default. If this is accepted, then we can do something or other to fix up device.map on upgrade, switch over to /dev/disk/by-id names in grub-pc/install_devices at the same time, and that should take care of the vast majority of this kind of upgrade bug. I think at that point it should be feasible to get a new version into testing, and we should be down from 18 RC bugs towards the end of last month to around 6. We can then start attacking things like the lack of support for mdadm 1.x metadata.

Since my last blog entry on GRUB 2, improvements have included:

  • Substantial work on info grub, with, among other things, new sections on /etc/default/grub and on configuring authentication.
  • A workaround for GRUB's inability to probe dm-crypt devices, thanks to Marc Haber.
  • Several build fixes for architectures I wasn't testing, and a fix for broken nested partition handling on Debian GNU/kFreeBSD. I'm now testing GNU/kFreeBSD locally.
  • Rather less cruft in fs.lst, partmap.lst, and video.lst, which should speed up booting a bit by e.g. avoiding unnecessary filesystem probing.
  • upgrade-from-grub-legacy actually now installs GRUB 2 to the boot sector (!).
  • Ask for confirmation if grub-pc/install_devices is left empty.

The next upstream snapshot will bring several improvements to EFI video support, mainly thanks to Vladimir Serbinenko. I've been working on making grub-install actually work on UEFI systems as one of my goals for the next Ubuntu release, and I hope to get this landed in the not-too-distant future.

[/debian] permanent link

Fri, 04 Jun 2010

Hacking on grub2

Various people observed in a long thread on debian-devel that the grub2 package was in a bit of a mess in terms of its release-critical bug count, and Jordi and Stefano both got in touch with me directly to gently point out that I probably ought to be doing something about it as one of the co-maintainers.

Actually, I don't think grub2 was in quite as bad a state as its 18 RC bugs suggested. Of course every boot loader failure is critical to the person affected by it, not to mention that GRUB 2 offers more complex functionality than any other boot loader (e.g. LVM and RAID), and so it tends to accumulate RC bugs at rather a high rate. That said, we'd been neglecting its bug list for some time; Robert and Felix have both been taking some time off, Jordi mostly only cared about PowerPC and can't do that any more due to hardware failure, and I hadn't been able to pick up the slack.

Most of my projects at work for the next while involve GRUB in one way or another, so I decided it was a perfectly reasonable use of work time to do something about this; I was going to need fully up-to-date snapshots anyway, and practically all the Debian grub2 bugs affect Ubuntu too. Thus, with the exception of some other little things like releasing the first Maverick alpha, I've spent pretty much the last week and a half solidly trying to get the grub2 package back into shape, with four uploads so far.

The RC issues that remain are:

  • upgrade-from-grub-legacy problems (#547944, #550477):
    I think this has just been traditionally undertested. I'm setting up a KVM image now with GRUB Legacy which I can snapshot just before and after running upgrade-from-grub-legacy, and I should be able to unpick the bugs this way.
  • LVM snapshots break GRUB's LVM module (#574863):
    Sean has been working on this and seems to be nearly there. Yay.
  • RAID metadata version 1.x not supported (#492897):
    This became rather more of an issue recently since mdadm switched its default from the old 0.90 format which GRUB understood. Felix put together a branch implementing the hard parts of this a while back, and I've been trying to finish it off. The hard bit is dealing with device naming, especially as the new-format and rather more useful names under /dev/md/ don't show up during d-i after creating RAID volumes; I think this is because we always create them as /dev/md0 etc. It's looking tractable, though.
  • Another odd problem probing RAID (#548648):
    Not sure about this one, and I'll need to work with Josip on it as soon as I get a chance.
  • Stable device naming (#554790) and consequential problems due to grub-install not being properly run (#557425 and many other sub-RC bugs):
    Ubuntu's been carrying a patch to rearrange device presentation in the postinst, which Robert OKed in principle ages ago and so I've been intending to merge it for a while, but there are a few known problems with it that I need to fix first. One known unfixable problem is that it will have to ask some people which devices they want GRUB to be installed on, even if they'd answered that question before: this will be one-time, and it's because it recorded the answer using unstable device names and so has in some sense forgotten. Simple cases (e.g. single-disk) can be handled without needing to ask again, though.
  • Alignment errors on SPARC (#560823):
    I have no idea what's going on here, I'm afraid. I'll try to trace it, but may have to downgrade it at some point since after all we don't install GRUB by default on SPARC yet.
  • Fonts not shown in gfxmenu (#564844):
    Apparently fixed upstream, but I couldn't find the responsible commit so I want to make sure I can get gfxmenu working before closing this.
  • Sensitivity to out-of-date device.map files (#575076 and other sub-RC bugs):
    We're trying to get rid of device.map in general. It was fine in the 1990s but it's hopeless now. Unfortunately there are still a small number of problems with running entirely without one, and one of my patches to help is controversial upstream, so we probably won't get to that for squeeze. In the meantime we'll probably just need some extra sanity-checking and robustness in the event that there's an incorrect or out-of-date device.map lying around, which we may just be able to do in the maintainer scripts or something if necessary.
  • Seriously weird failures to load initramfs (#582342):
    If anyone can produce a reproduction recipe for this, that would really help me out. There are too many reports to discount as user error, but I haven't seen this myself yet.
  • Build failure on sparc (unfiled):
    We've been discussing this upstream, but for the time being I'm just going to stop building grub-emu on sparc as a workaround.

If we can fix that lot, or even just the ones that are reasonably well-understood, I think we'll be in reasonable shape. I'd also like to make grub-mkconfig a bit more robust in the event that the root filesystem isn't one that GRUB understands (#561855, #562672), and I'd quite like to write some more documentation.

On the upside, progress has been good. We have multiple terminal support thanks to a new upstream snapshot (#506707), update-grub runs much faster (#508834, #574088), we have DM-RAID support with a following wind (#579919), the new scheme with symlinks under /dev/mapper/ works (#550704), we have basic support for btrfs / as long as you have something GRUB understands properly on /boot (#540786), we have full info documentation covering all the user-adjustable settings in /etc/default/grub, and a host of other smaller fixes. I'm hoping we can keep this up.

If you'd like to help, contact me, especially if there's something particular that isn't being handled that you think you could work on. GRUB 2 is actually quite a pleasant codebase to work on once you get used to its layout; it's certainly much easier to fix bugs in than GRUB Legacy ever was, as far as I'm concerned. Thanks to tools like grub-probe and grub-fstest, it's very often possible to fix problems without needing to reboot for anything other than a final sanity check (although KVM certainly helps), and you can often debug very substantial bits of the boot loader - the bits that actually go wrong - using standard tools such as strace and gdb. Upstream is helpful and I've been able to get many of the problems above fixed directly there. If you have a sound knowledge of C and a decent level of understanding of the environment a boot loader needs to operate in - or for that matter specialist knowledge of interesting device types - then you should be able to find something to do.

[/debian] permanent link

Mon, 10 May 2010

OpenSSH 5.5p1 for Lucid

For various reasons, I chose to leave Ubuntu 10.04 LTS using OpenSSH 5.3p1. The new features in 5.4p1 such as certificate authentication, the new smartcard handling, netcat mode, and tab-completion in sftp are great, but unfortunately it was available just a little bit too late for me to be able to land it for 10.04 LTS. I realise that many Lucid users want to make use of these features for one reason or another, though, so as a compromise here's a PPA containing OpenSSH 5.5p1 for Lucid.

I intend to keep this up to date for as long as I reasonably can, and I'm happy to accept bug reports on it in the usual place.

[/ubuntu] permanent link

Fri, 26 Mar 2010

Thoughts on 3.0 (quilt) format

Note: I wrote most of this before Neil Williams' recent comments on the 3.0 family of formats, so despite the timing this isn't really a reaction to that although I do have a couple of responses. On the whole I think I agree that the Lintian message is a bit heavy-handed and I'm not sure I'm thrilled about the idea of the default source format being changed (though I can see why the dpkg maintainers are interested in that). That said, as far as I personally am concerned, there is a vast cognitive benefit to me in having as much as possible be common to all my packages. Once I have more than a couple of packages that require patching and benefit from the 3.0 (quilt) format as a result, I find it in my interest to use it for all my non-native packages even if they're patchless right now, so that for instance if they need patches in the future I can handle them the same way. It's not unheard of for me to apply temporary patches even to packages I actively maintain upstream, so I don't discount those either. I haven't decided what to do with my native packages yet; unless they're big enough for bzip2 compression to be worthwhile, there doesn't seem to be much immediate advantage to 3.0 (native).

Anyway, on to the main body of this post:

I've been one of the holdouts resisting use of patch systems for a long time, on the basis that I felt strongly that dpkg-source -x ought to give you the source that's actually built, rather than having to mess around with debian/rules targets in order to see it. Now that the 3.0 (quilt) format is available to fix this bug, I felt that I ought to revisit my resistance and start trying to use it. Migrating to it from monolithic diffs is of course a bit more work than migrating to it from other patch systems, so it's taken me a little while to get round to it. I'd been thinking about holding off until there was better integration with revision control (e.g. bzr looms), as I feel that patch files really ought to be an export format, but I eventually decided that I shouldn't let the perfect be the enemy of the good. I have enough experience with co-maintaining packages that use build-time patch systems to be able to compare my reactions.

After experimenting with a couple of small packages, I moved over to the deep end and converted openssh a few weekends ago, since quite a few people have requested over the years that the Debian changes to openssh be easier to audit. This was a substantial job - over 6000 lines of upstream patches - but not actually as much work as I expected. I took a fairly simplistic approach: first, I unapplied all the upstream patches from my tree; then I ran bzr di | interdiff -q /dev/stdin /dev/null >x, reduced it to a single logically-discrete patch, applied it to a new quilt patch using quilt fold, and repeated until x was empty. This was maybe an hour or two of work, and then I went through and tagged all the patches according to DEP-3, which took another few hours. After the first pass, I ended up with 38 patches and a much clearer idea of what has been forwarded upstream and what hasn't; I currently have 5 patches to forward or eliminate, down from 18.

Good things:

  • I don't lose any of my history. Since all the patches remain applied to the tree in revision control (this is what dpkg-source -x gives you, so it's the natural representation in revision control too), bzr blame works just as you'd expect and displays both upstream and Debian changes at once. I rely on tools like blame a lot, and I really hate the way build-time patch systems make it hard to use revision control when the tree is in a built state, so this was a hard requirement for me.
  • I've used patch tagging before, so I was expecting some benefits, but viscerally I feel much more in control. It's so much less laborious now to see what I need to do by way of forwarding. I don't regret waiting for 3.0 (quilt) to become available, but I hadn't realised quite how much I was being held back beforehand.
  • Adding new patches is pretty natural, much more so than with build-time patch systems. You can create and apply the patch, test-build, and commit when it works. I much prefer this over having to clean the tree before committing (or commit just part of the tree, which is error-prone). The more that committing to a Debian package feels like committing to an upstream project, the better.
  • There's definitely something to be said for patch-tracker being more useful. It deals with DEP-3 to the extent of linkifying URLs, although it might be nice if patch descriptions were displayed on the overview page for each version.

Bad things:

  • It's a bit awkward to set things up when checking out from revision control; I didn't really want to check in the .pc directory, and the tree checks out in the patched state (as it should), so I needed some way for developers to get quilt working easily after a checkout. This is sort of the reverse of the previous problem, where users had to do something special after dpkg-source -x, and I consider it less serious so I'm willing to put up with it. I ended up with a rune in debian/rules that ought to live somewhere more common.
  • Everything ends up represented twice in revision control: the patch files, plus the changes to the patched files themselves. I'm OK with this although it is a little inelegant.
  • Although I haven't had to do it yet, I expect that merging new upstream releases will be a bit harder. bzr will deal with resolving conflicts in the patched files themselves, and that's why I use a revision control system after all, but then I'll have to go and refresh all the patches and will probably end up doing some of the same conflict resolution a second time. I think the best answer right now is to quilt pop -a, force a merge despite the modified working tree, and then quilt push && quilt refresh -pab until I get back to the top of the stack, modulo slight fiddliness when a patch disappears entirely; thus effectively using quilt's conflict resolution rather than bzr's. I suppose this will serve as additional incentive to reduce my patch count. I know that people have been working on making this work nicely with topgit, although I'm certainly not going to put up with the rest of git due to that; I'm happy to wait for looms to become usable and integrated. :-)
  • It would be nice if there were some standard DEP-3 way to note that a patch has been accepted or rejected upstream, beyond just putting it in the description. In particular, it seems to me that listing patches accepted upstream could be used to speed up the process of merging new upstream releases.
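
The pop, merge, push-and-refresh loop described in the merging point above could be driven by something like the following Python sketch. It has not been tried against a real merge; reapply_stack and run_quilt are names invented here, and it assumes that quilt is on the PATH and that quilt pop -a and the bzr merge have already been done. It stops at the first patch that fails to apply or refresh, so that conflicts can be resolved by hand before carrying on.

```python
import subprocess


def run_quilt(args):
    """Run one quilt command; return (exit status, standard output)."""
    proc = subprocess.run(["quilt"] + args, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def reapply_stack(run=run_quilt):
    """Push and refresh each unapplied patch in turn.

    Returns (patches refreshed, patch that failed or None).
    """
    status, out = run(["unapplied"])
    pending = out.splitlines() if status == 0 else []
    refreshed = []
    for patch in pending:
        # Apply the next patch in the series.
        status, _ = run(["push"])
        if status != 0:
            return refreshed, patch
        # Regenerate the patch file against the merged tree.
        status, _ = run(["refresh", "-pab"])
        if status != 0:
            return refreshed, patch
        refreshed.append(patch)
    return refreshed, None
```

Passing the runner in as a parameter is only there so that the loop can be exercised without touching a real tree.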

On the whole I'm satisfied with this, and the benefits definitely outweigh the costs. Thanks to the dpkg team for all their work on this!

[/debian] permanent link

Mon, 22 Mar 2010

parted 2.2 transition

I've started the transition of parted 2.2 to unstable. This is a major update needed for sensible support of newer hard disks with alignment requirements different from the archaic cylinder alignment tradition. I posted to debian-boot with a summary of the partman changes involved.

[/debian] permanent link

Wed, 03 Mar 2010

debhelper statistics

I don't know if anyone else has been tracking this recently, but a while back I got curious about the relative proportions of dh(1) and CDBS in the archive, and started running some daily analysis on the Lintian lab. Apologies for my poor graphing abilities, but the graph is here (occasionally updated):

Although dh is still a bit behind CDBS, the steady upward trend is quite striking - it looks set to break 20% soon, up from under 13% in September - compared with CDBS which has been sitting within half a percentage point of 25% the whole time.

Incidentally, was that an ftpmaster trying to sign his name in the graph over Christmas or something? :-)

[/debian] permanent link

Sun, 21 Feb 2010

Catching up

I did a bit of catching up on my Debian backlog over the last week or so. Among the things I got round to:

  • I released man-db 2.5.7. This was mostly an "I've been meaning to do this for ages" kind of thing to reduce the bug list a bit, closing ten Debian bugs, but there were a few interesting things in there as well, such as always saving cat pages in UTF-8 and recoding to the user's locale at display time (long overdue), adjusting the search order for localised manual pages by request of quite a few non-native English speakers to prefer a page in the right section over a page in the right language, and a cute gimmick to make things like man /usr/bin/time display the appropriate manual page rather than the text of the executable. See the NEWS file for more details.
  • binfmt-support now installs cleanly on non-Linux systems, even if it doesn't do anything useful yet.
  • I fixed a couple of shell bugs in groff.
  • halibut now complies with the Debian Vim policy, even though I can't say I entirely agree with it in this case.
  • I fixed a really odd build failure in troffcvt. Yay imake, or something.
  • All Debian patches to putty are now upstream, or will be once I upload a new snapshot. Thanks to Simon Tatham and Jacob Nevins.
  • I did a few bits and pieces of packaging cleanup with an eye on my DDPO list, and added some watch files where they were missing.
  • I responded to an offer to take over icoutils maintenance.

So nothing really earth-shaking, and as ever openssh could use some attention, but I feel a bit better about my backlog now. I do still have a critical bug in makepasswd to fix, and a sponsored upload of parrot; those are the next two things on my to-do list.

[/debian] permanent link

Fri, 13 Nov 2009

Tissue of lies

In case it isn't obvious, in "Ubuntu 9.10 SP1 coming in spring 2010", "Ubuman" is blatantly lying in attributing a number of statements to me. None of the text there was written by me, and if you thought any of it was true then you should probably make sure your troll radar is working properly. Nice joke, but try harder next time - it doesn't even look like my writing style.

(I wouldn't normally bother to respond, since I'm probably just giving it more publicity, but apparently one or two people may already have been taken in by it. One person was sensible enough to write to me and check the facts.)

[/ubuntu] permanent link

Fri, 31 Jul 2009

Keysigning bits

If you're generating one of these shiny new RSA keys, do please remember to generate an encryption subkey too if you expect people to sign it - at least your more obscure UIDs. I'm not going to mail unencrypted signatures around unless I have some out-of-band knowledge that the e-mail address actually belongs to the person I met.

I generated a new 4096-bit RSA key myself at DebConf (baa!), and have just published a key transition document. Please consider signing my new key if you signed my old one.

[] permanent link

Tue, 14 Jul 2009

man-db: 'man -K'

I recently implemented man -K (full-text search over all manual pages) in man-db. This was inspired by a similar feature in Federico Lucifredi's man package (formerly maintained by Andries Brouwer). I think I did a much better job of it, though. The man package just forks grep for every manual page; man-db takes advantage of the pipeline library I wrote for it a while back and does it entirely in-process (decompression requires a fork but no exec, while the man package has to exec gunzip as well).

The upshot is that, with a hot cache, man-db takes around 40 seconds to search all manual pages on my laptop; the man package (also with a hot cache) takes around five minutes, and interactive performance goes down the drain while it's doing it since it's spawning subprocesses like crazy. If I limit to a single section, the disparity is closer to 3x than 10x, but it's still very noticeable. It's interesting how much good libraries can do to help guide efficient approaches to problems.
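
The difference in approach can be illustrated with a short Python sketch. This is not man-db's implementation (which is C built on the pipeline library, and which does fork for decompression); it only shows the general idea of decompressing and searching each page inside one process instead of spawning grep and gunzip per page. The function name and the assumption that every page is a .gz file are mine.

```python
import gzip
import os
import re


def search_pages(root, pattern):
    """Yield paths of gzip-compressed pages under root whose text matches pattern."""
    regex = re.compile(pattern.encode())
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Decompress in-process; no subprocess is started per page.
                with gzip.open(path, "rb") as page:
                    data = page.read()
            except (OSError, EOFError):
                # Unreadable or corrupt page: skip it rather than abort.
                continue
            if regex.search(data):
                yield path
```

Something like list(search_pages("/usr/share/man", "SIGPIPE")) would then walk the whole tree without spawning any subprocesses.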

Of course, a proper full-text search engine would be much better still, but that's a project for some other time ...

[] permanent link

Thu, 02 Jul 2009

Python SIGPIPE handling

Enrico writes about creating pipelines with Python's subprocess module, and notes that you need to take care to close stdout in non-final subprocesses so that subprocesses get SIGPIPE correctly. This is correct as far as it goes (and true in any language, although there's a Python bug report requesting that subprocess be able to do this itself), but there's an additional gotcha with Python that he missed.

Python ignores SIGPIPE on startup, because it prefers to check every write and raise an IOError exception rather than taking the signal. This is all well and good for Python itself, but most Unix subprocesses don't expect to work this way. Thus, when you are creating subprocesses from Python, it is very important to set SIGPIPE back to the default action. Before I realised this was necessary, I wrote code that caused serious data loss due to a child process carrying on out of control after its parent process died!

import signal
import subprocess

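def subprocess_setup():
    # Python sets SIGPIPE to SIG_IGN (ignored) on startup. This is usually
    # not what non-Python subprocesses expect, so restore the default action
    # in the child before it execs.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

# Example command; substitute whatever you actually want to run.
command = ["ls", "-l"]
subprocess.Popen(command, preexec_fn=subprocess_setup)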

I filed a patch a while back to add a restore_sigpipe option to subprocess.Popen, which would take care of this. As I say in that bug report, in a future release I think this ought to be made the default, as it's very easy to get things dangerously wrong right now.
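
Putting both pieces of advice together (close stdout of non-final subprocesses in the parent, and restore SIGPIPE in the children), a two-stage pipeline might look like the sketch below. The choice of yes and head is only for illustration, and the function names are made up for this example.

```python
import signal
import subprocess


def restore_sigpipe():
    # Runs in the child between fork and exec.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def first_line_of_yes():
    """Run the equivalent of `yes | head -n 1`.

    Returns (head's output, exit status of yes).
    """
    producer = subprocess.Popen(["yes"], stdout=subprocess.PIPE,
                                preexec_fn=restore_sigpipe)
    consumer = subprocess.Popen(["head", "-n", "1"], stdin=producer.stdout,
                                stdout=subprocess.PIPE,
                                preexec_fn=restore_sigpipe)
    # Close our copy of the pipe so that only head holds the read end;
    # once head exits, the next write by yes raises SIGPIPE.
    producer.stdout.close()
    output = consumer.communicate()[0]
    return output, producer.wait()
```

If either step is left out, yes has no reason to stop: with the parent still holding the read end the pipe never breaks, and with SIGPIPE ignored the writer only sees a write error, which it may or may not act on.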

[] permanent link

Thu, 28 May 2009

code_swarm video of Ubuntu uploads

Joey Hess posted a draft of a code_swarm video for d-i a couple of weeks ago, which reminded me that I've been meaning to do something similar for Ubuntu for a while now as it's just about our archive's fifth birthday. I have a more or less complete archive of all our -changes mailing lists locally (I think I'm missing some of the very early ones, before the end of July 2004; let me know if you were one of the very early Canonical employees and have a record of these), and with the aid of launchpadlib it's fairly easy to map all the e-mail addresses into Launchpad user names, massage out some of the more obvious duplicates, and then treat the stream of uploads as if it were a stream of commits.

If you haven't seen code_swarm before, each dot represents an upload, and the dots "swarm" around their corresponding committers' names; more active committers have larger swarms of dots and brighter names. I assigned a colour to each of our archive components (uploads aren't really at the C code vs. Python code vs. translations vs. whatever kind of granularity that you see in other code_swarm videos), which mostly means that people who predominantly upload to main are in roughly an Ubuntu tan colour, people who predominantly upload to universe are coloured bluish, and people with a good mixture tend to come out coloured green. If I get a bit more time I may try to figure out enough about video editing software to add some captions.

Here's the video (194 MB).

[/ubuntu] permanent link

Thu, 05 Mar 2009

Bug triage, redux

I've been a bit surprised by the strong positive response to my previous post. People generally seemed to think it was quite non-ranty; maybe I should clean the rust off my flamethrower. :-) My hope was that I'd be able to persuade people to change some practices, so I guess that's a good thing.

Of course, there are many very smart people doing bug triage very well, and I don't want to impugn their fine work. Like its medical namesake, bug triage is a skilled discipline. While it's often repetitive, and there are lots of people showing up with similar symptoms, a triage nurse can really make a difference by spotting urgent cases, cleaning up some of the initial blood, and referring the patient quickly to a doctor for attention. Or, if a pattern of cases suddenly appears, a triage nurse might be able to warn of an incipient epidemic. [Note: I have no medical experience, so please excuse me if I'm talking crap here. :-)] The bug triagers who do this well are an absolute godsend; especially when they respond to repetitive tasks with tremendously useful pieces of automation like bughelper. The cases I have trouble with are more like somebody showing up untrained, going through everyone in the waiting room, and telling each of them that they just need to go home, get some rest, and stop complaining so much. Sometimes of course they'll be right, but without taking the time to understand the problem they're probably going to do more harm than good.

Ian Jackson reminded me that it's worth mentioning the purpose of bug reports on free software: namely, to improve the software. The GNU Project has some advice to maintainers on this. I think sometimes we stray into regarding bug reports more like support tickets. In that case it would be appropriate to focus on resolving each case as quickly as possible, if necessary by means of a workaround rather than by a software change, and only bother the developers when necessary. This is the wrong way to look at bug reports, though. The reason that we needed to set up a bug triage community in Ubuntu was that we had a relatively low developer-to-package ratio and a very high user-to-developer ratio, and we were getting a lot of bug reports that weren't fleshed out enough for a developer to investigate them without spending a lot of time in back-and-forth with the reporter, so a number of people volunteered to take care of the initial back-and-forth so that good clear bug reports could be handed over to developers. This is all well and good, and indeed I encouraged it because I was personally finding myself unable to keep up with incoming bugs and actually fix anything at the same time. Somewhere along the way, though, some people got the impression that what we wanted was a first-line support firewall to try to defend developers from users, which of course naturally leads to ideas such as closing wishlist bugs containing ideas because obviously those important developers wouldn't want to be bothered by them, and closing old bugs because clearly they must just be getting in developers' way. Let me be clear about this now: I absolutely appreciate help getting bug reports into a state where I can deal with them efficiently, but I do not want to be defended from my users! I don't have a basis from which to state that all developers feel the same way, but my guess is that most do.

Antti-Juhani Kaijanaho said he'd experienced most of these problems in Debian. I hadn't actually intended my post to go to Planet Debian - I'd forgotten that the "ubuntu" category on my blog goes there too, which generally I see as a feature, but if I'd remembered that I would have been a little clearer that I was talking about Ubuntu bug triage. If I had been talking about Debian bug triage I'd probably have emphasised different things. Nevertheless, it's interesting that at least one Debian (and non-Ubuntu) developer had experienced similar problems.

Justin Dugger mentions a practice of marking duplicate bugs invalid that he has problems with. I agree that this is suboptimal and try not to do it myself. That said, I don't object to it to the same extent. Given that the purpose of bugs is to improve the software, the real goal is to be able to spend more time fixing bugs, not to get bugs into the ideal state when the underlying problem has already been solved. If it's a choice between somebody having to spend time tracking down the exact duplicate bug number versus fixing another bug, I know which I'd take. Obviously, when doing this, it's worth apologising that you weren't able to find the original bug number, and explaining what the user can do if they believe that you're mistaken (particularly if it's a bug that's believed to be fixed); the stock text people often use for this doesn't seem informative enough to me.

Sebastien Bacher commented that preferred bug triage practices differ among teams: for instance, the Ubuntu desktop team deals with packages that are very much to the forefront of users' attention and so get a lot of duplicate bugs. Indeed - and bug triagers who are working closely with the desktop team on this are almost certainly doing things the way the developers on the desktop team prefer, so I have no problem with that. The best advice I can give bug triagers is that their ultimate aim is to help developers, and so they should figure out which developers they need to work with and go and talk to them! That way, rather than duplicating work or being counterproductive, they can tailor their work to be most effective. Everybody wins.

[/ubuntu] permanent link

Mon, 02 Mar 2009

Bug triage rants

I hate to say this, but often when somebody does lots of bug triage on a package I work on, I find it to be a net loss for me. I end up having to go through all the things that were changed, correct a bunch of them, occasionally pacify angry bug submitters, and all the rest of it, and often the benefits are minimal at best.

I would very much like this not to be the case. Bug triage is supposed to help developers be more efficient, and I think most people who do bug triage are generally well-intentioned and eager to help. Accordingly, here is a series of mini-rants intended to have educational value.

  • Bugs are not like fruit.

    Fruit goes bad if you leave it too long. By and large, bugs don't, especially if they're on software that doesn't change very much. There is no reason why a bug filed against a package in Ubuntu 4.10 where the relevant code hasn't changed much since shouldn't still be perfectly valid. Even if it isn't, it deserves proper consideration.

    My biggest single annoyance with bug triage is people coming around and asking if bugs are still valid when they haven't put any effort into reproducing them themselves. This annoys bug submitters too; every so often somebody replies and says "didn't you even bother to check?". This gives a very bad impression of us as a project - wouldn't it be better if we looked as if we knew what we were talking about? There is a good reason to do this kind of check, of course: random undiagnosed crash reports and the like may well go away due to related changes, and it is occasionally worth checking. But if the bug is already well-understood and/or well-described, you should just go and check whether it's still there rather than asking.

    As I understand it, the intended workflow is that people file bugs, then if they aren't clear enough bug triagers work with the submitter to gather information until they are, then they're passed to developers for further work. We seem to have added an extra step wherein submitters must periodically give their bug a health-check, and if they don't then it gets closed as being out of date. In a small minority of cases this is useful; in most cases, frankly, it makes us look a bit clueless. Can we please stop doing this? The more we waste people's time doing this, the less likely it is that they'll bother to respond to us, and this might help our statistics but doesn't help the project as a whole.

    I know that there's a problem with bug count. I think every project of non-trivial size has that problem. But, honestly, the right answer is to fix more bugs - and, personally, I would be able to spend more time doing that if I weren't often running around trying to make sure that bugs I care about aren't getting overenthusiastically closed just because somebody thinks they've been lying around too long.

    There is a good way to expire bugs like this, of course. It goes something like this: "I've read through your bug and tried to reproduce it with a current release, but I'm afraid I can't do so. Are you still experiencing it? If not, then I think it might have been fixed by [this change I found in the package's history that seems to be related]." You can't do this en masse, but you'll get a much better response from submitters, you'll learn more doing it, and in the process of doing the necessary investigation of each bug you'll find that there are many cases you don't have to ask about at all.

  • Wishlist bugs are not intrinsically bad.

    There are certainly cases where something is far too broad or vague for a bug report; but there are also plenty of cases, probably far more, where the wish in question is a relatively small change to the program, or doesn't need any more sophisticated tracking, and a wishlist bug is just right. If you don't know the program very well, it may be difficult to tell whether a wishlist bug is appropriate or not; in that case, just leave the bug alone.

    Please, for the love of all that's holy, don't close wishlist bugs saying that people should use Brainstorm or write a specification instead! If you don't want to see wishlist bugs in your statistics, just filter them out; it's quite easy to do. Even worse, don't tell people that something probably isn't a good idea when you aren't familiar with the software; people who have gone to the effort of writing up their idea for us deserve a response from somebody who knows the software well. I've encountered cases where friends of mine submitted a bug report (sometimes even at my request) and then a triager told them it was a bad idea and closed their bug. This sort of thing puts people off Ubuntu.

    Specifications are software design documents. As such, they are best written by software designers. People who tell other people to go and write a specification may not realise that as a result of doing this for three years it's now essentially impossible to find anything in the specification system! The intent was never that every user of Ubuntu would need to write a specification to get anything changed; specifications are used by developers to document the results of discussions and write up plans. They are not a straightforward alternative to wishlist bugs, nor do they turn out to work very well as what many formal processes call "requirements documents"; the process of refining the latter in the context of Ubuntu might involve wishlist bugs, mailing list threads, wiki pages, private discussions with developers, or things of that nature, and probably shouldn't involve creating a specification until the requirements-gathering process is well underway.

  • Closing a bug is taking an item off somebody's to-do list.

    You wouldn't go up to a colleague's whiteboard and take an eraser to it unless you were sure that was OK, would you? Yet people seem to do that all the time with bugs. It's OK when the bug is really just like a support request - "help, it crashed, what do I do?" - and either you're pretty sure it's user error or there's just no way to get enough information to fix it. But once the initial triage process is done, now it's on somebody's to-do list.

    This is closely related to ...

  • If a developer has accepted it, leave it alone.

    Every so often I find that there's a bug that I have accepted by way of a bug comment or setting to Triaged or whatever, or even a bug that I filed on a package I work on as a reminder to myself, and somebody comes along and asks for more information or asks if we can still reproduce it or something. The hit rate on this kind of thing is extraordinarily low. There's a good chance that the developer went and verified the bug against the code, and in that case it certainly doesn't need more information (or they would have asked for it) and it probably isn't going to go away without anyone noticing.

    In most other free software projects, developers file bug reports themselves as a reminder about things that need to be done, and people leave them alone unless they're intending to help with the fix. In Ubuntu, developers also have to spend time making sure that those to-do items don't get expired. Nobody is helped by this.

    launchpad-gm-scripts includes a Greasemonkey script called lp_karma_suffix, which can help you to identify developers without having to spend lots of time clicking around.

  • Check whether the package is being actively worked on.

    Some packages are actively worked on in Ubuntu; some aren't (e.g. we just sync packages from Debian, or they're basically orphaned, or whatever). It's worth checking which is which before doing any kind of extensive triage work. If it's being actively worked on, why not go and talk to the developer(s) in question first? It's only polite, and it will probably help you to do a better job.

[/ubuntu] permanent link

Mon, 27 Oct 2008

Totem BBC plugin

A while back, the BBC approached Canonical about providing seamless access to unencumbered BBC content for all Ubuntu users (in the UK and elsewhere). We agreed to approach this by way of a plugin for our primary media player, Totem, and asked Collabora Multimedia to do the plugin development work.

The results are in what will shortly be released as Ubuntu 8.10, and are looking really rather good. At present the content available from the BBC is mostly audio, but support for video is in place and the feed is expected to be fleshed out over time. We have a genre classification scheme in place, and will see how that scales as the amount of available content grows. The code has been submitted upstream, although there are still a few issues to work out there.

This is not the same thing as iPlayer; all the content available here is DRM-free. Some of it is geographically restricted to the UK, and these restrictions are handled on the server side to make sure that the client is free of encumbrances.

Christian Schaller from Collabora posted about this a little while ago. Since then, the UI has been improved somewhat and some I/O issues have been fixed to the point where we felt comfortable enabling the BBC plugin (as well as the YouTube plugin) by default in Ubuntu 8.10. Here's a screenshot of the current interface.

This is exciting stuff with a lot of potential. To try it out, run Applications -> Sound & Video -> Movie Player and select the "BBC" entry from the drop-down box labelled "Playlist". If you find bugs, please report them!

[/ubuntu] permanent link

Mon, 23 Jun 2008

Re: Perl is strange

Christoph: That's because =~ binds more tightly than +. This does what you meant:

$ perl -le 'print "yoo" if (1 + 1) =~ /3/'

perlop(1) has a useful table of precedence.

[/debian] permanent link

Don't use sshkeygen.com to generate keys!

To my horror, I recently saw this online SSH key generator.

I hope nobody reading this needs to be told why this is a bad idea. However, in case you do, here are a few reasons:

  • Every SSH implementation I know of that supports public key authentication - certainly all the major ones - also provides a key generation utility. Even aside from all the good reasons not to, there is simply no reason why you should need to use a web-based tool in the first place.
  • How can you trust the person running this site? Without implying that I know he or she is untrustworthy (I don't), and with the best will in the world, it's a big Internet with a lot of nasty people on it. Do you really want somebody you don't know in a position to keep a copy of all your private keys?
  • Even if the person is trustworthy, the server running sshkeygen.com is now a giant blinking target. If lots of people use it, there is every incentive in the world for the bad guys to try to take control of it so that they can keep a copy of all your private keys. (Or, as we know from recent bitter experience, they can just give out keys from a limited set and it will probably take a couple of years before anyone notices ...)
  • The front page of sshkeygen.com says that the keys are escrowed. The plain English meaning of this would be that the operator of that site keeps a copy of the private key, to be held in trust in case (presumably) you lose it and need to retrieve it. Normally this sort of thing depends on a legal trust relationship, perhaps linked to a contract. What does it mean here? Is it just a buzzword? If it isn't, then this just makes sshkeygen.com even more of a target.
  • sshkeygen.com delivers keys to you over unencrypted HTTP. Yes, this is on its to-do list. That isn't really an excuse.
  • Even if keys were delivered over HTTPS, that still relies on people diligently checking the authenticity of the certificate. A self-signature (as suggested as an alternative in the to-do list) would be impossible to check with any reliability; and will people who have trouble with non-web-based key generation software really be able or inclined to confirm the signature chain? Browsers typically don't enforce this very strictly, or if they do they provide fairly simple ways to bypass the enforcement, simply because so many sites have broken or poorly-signed SSL certificates, and keeping up with all the CAs is pretty hard work too.
  • Furthermore, delivering private keys over HTTPS makes that SSL certificate a single giant blinking target. Might it be compromised? How would you tell? What servers would need to be compromised in order to get a copy of the private SSL key?
  • Sure, Debian is in an awkward position here given the recent OpenSSL random number generation vulnerability. However, how do you know that sshkeygen.com is running on a system that doesn't suffer from this? (As it happens, I have checked, and it doesn't appear to suffer from this vulnerability - but most people won't check and won't know how to check.)
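
The first point above is easy to demonstrate: with OpenSSH, local key generation is a single ssh-keygen invocation. The sketch below builds and runs that command from Python. keygen_command and generate_key are names invented for this example, the 4096-bit RSA choice is only an illustration, and because no -N option is passed ssh-keygen should prompt for the passphrase itself.

```python
import subprocess


def keygen_command(path, comment, bits=4096):
    """Build an ssh-keygen command line (hypothetical helper)."""
    return ["ssh-keygen", "-t", "rsa", "-b", str(bits),
            "-C", comment, "-f", path]


def generate_key(path, comment):
    """Generate a key pair locally; the private key never leaves this machine."""
    # No -N option here, so ssh-keygen asks for the passphrase on the terminal.
    subprocess.check_call(keygen_command(path, comment))
```

The private key is written to the given path and the public half to the same path with .pub appended; nothing goes over the network at any point.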

I think this is probably being done in innocent seriousness (although I kind of hope it's a joke in poor taste), and have e-mailed the contact address offering to explain why it's a bad idea.

[] permanent link

Sat, 12 Apr 2008

Desktop automounting pain

Ubuntu's live CD installer, Ubiquity, needs to suppress desktop automounting while it's doing partitioning and generally messing about with mount points, otherwise its temporary mount points end up busy on unmount due to some smart-arse desktop component that decides to open a window for it.

To date, it employs the following methods, each of which was sufficient at the time:

  • Set the /desktop/gnome/volume_manager/automount_drives and /desktop/gnome/volume_manager/automount_media gconf keys to false.
  • Tell kded to unload its medianotifier module, and load it again just before the installer exits.
  • Set the /apps/nautilus/desktop/volumes_visible gconf key to false.
  • Set the AutomountDrives and AutomountMedia keys in $HOME/.config/Thunar/volmanrc to FALSE.
  • Set the /apps/nautilus/preferences/media_automount and /apps/nautilus/preferences/media_automount_open gconf keys to false.
  • The entire installer is run under hal-lock --interface org.freedesktop.Hal.Device.Storage --exclusive.
  • Set the /apps/nautilus/preferences/media_autorun_never gconf key to true (experimental, but apparently now required since nautilus uses the gio volume monitor).
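
For a sense of how much of this is pure repetition, the gconf part of the list above can be written as one table and a loop. This is a sketch, not Ubiquity's actual code: the helper names are invented, it assumes gconftool-2 is available, and it leaves out what a real installer would also need, namely saving the previous values and restoring them on exit.

```python
import subprocess

# Keys and values taken from the list above.
GCONF_SETTINGS = [
    ("/desktop/gnome/volume_manager/automount_drives", False),
    ("/desktop/gnome/volume_manager/automount_media", False),
    ("/apps/nautilus/desktop/volumes_visible", False),
    ("/apps/nautilus/preferences/media_automount", False),
    ("/apps/nautilus/preferences/media_automount_open", False),
    ("/apps/nautilus/preferences/media_autorun_never", True),
]


def gconf_commands(settings=GCONF_SETTINGS):
    """Return one gconftool-2 command line per boolean key."""
    return [
        ["gconftool-2", "--type", "bool", "--set", key,
         "true" if value else "false"]
        for key, value in settings
    ]


def apply_settings(run=subprocess.call):
    """Run each command; the runner is a parameter so it can be stubbed out."""
    for command in gconf_commands():
        run(command)
```

The kded, Thunar and hal-lock entries each need their own mechanism and are not covered here.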

This is getting ridiculous. Dear desktop implementors: please pick a configuration mechanism and stick to it, and provide backward compatibility if you can't. This is not a rocket-science concept.

I rather liked the hal-lock mechanism; it was simple and involved minimal fuss. I had hoped that it might end up as a standard, but I guess that would be too easy.

[/ubuntu] permanent link

Thu, 31 Jan 2008

Vim omni completion for Launchpad bugs

I hacked together a little timesaver for developers this morning: omni completion for Launchpad bugs in Vim's debchangelog mode. To use it, install vim 7.1-138+1ubuntu3 once it hits the mirrors, open up a debian/changelog file, type "LP: #", and hit Ctrl-X Ctrl-O. It'll think for a while and then give you a list of all the bugs open in Launchpad against the package in question, from which you can select to insert the bug number into your changelog.

Here's a screenshot to make it clearer:

Thanks to Stefano Zacchiroli for doing the same for Debian bugs back in July.

[/ubuntu] permanent link

Tue, 29 Jan 2008

UTF-8 manual pages

See Encodings in man-db for context.

Yesterday, I uploaded man-db 2.5.1-1 to unstable. With this version, not only is it possible to install manual pages in UTF-8 (as with 2.5.0, although with fewer bugs), but it's also possible to ask man to produce a version of an arbitrary page in the encoding of your choice, and have it guess the source encoding for you fairly reliably. This finally provides enough support to have debhelper automatically recode manual pages to UTF-8.
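
A common heuristic for this kind of guessing is to try decoding the page as UTF-8 and, if that fails, fall back to a legacy encoding. The Python sketch below shows that heuristic only; it is not taken from man-db, the function name is invented, and the ISO-8859-1 fallback is an assumption for illustration (a real tool would pick the fallback per language).

```python
def guess_and_recode(data, fallback="iso-8859-1", target="utf-8"):
    """Guess the source encoding of data and recode it.

    Returns (guessed source encoding, data recoded to target).
    """
    try:
        # Text in a legacy 8-bit encoding is very rarely valid UTF-8,
        # so a successful decode is a strong hint.
        text = data.decode("utf-8")
        source = "utf-8"
    except UnicodeDecodeError:
        text = data.decode(fallback)
        source = fallback
    return source, text.encode(target)
```

Pure ASCII pages decode successfully as UTF-8 and pass through unchanged, which is the desired result for them.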

It'll probably take a little while to shake out the corner-case bugs, but I'm generally pretty happy with this. Once the new man-db and debhelper land in testing, I'll send a note to debian-devel-announce and push harder on my policy amendment.

Considering the historical state of man-db when it comes to localisation, and all of the dependencies and general yak-shaving that had to be tackled to get here, this represents the end of probably several hundred hours of work, so I'm pretty happy that this is out the door. The only remaining step is to add UTF-8 input support to groff, which fortunately Brian M. Carlson is working on. After that, we can reasonably claim to have dragged manual pages kicking and screaming into the 21st century.

[] permanent link

Thu, 29 Nov 2007

aptitude safe-upgrade

Erich: I do sometimes wonder why we don't relax the definition of "safe" upgrades to include installing new packages but still not removing old ones. I know that many of my uses of dist-upgrade are just for when something grows a new dependency that I didn't previously have installed.

(Of course this wouldn't always help as it wouldn't account for a new dependency that conflicted with an old dependency, but never mind. It would certainly do wonders for the metapackage case.)

[/debian] permanent link