Colin Watson
   


About
Colin Watson's blog
cjwatson@debian.org

Subscribe
Subscribe to a syndicated feed of my blog.

Flavours


Powered by Blosxom

       
Fri, 26 Oct 2012

Automatic installability checking

I've just finished deploying automatic installability checking for Ubuntu's development release, which is more or less equivalent to the way that uploads are promoted from Debian unstable to testing. See my ubuntu-devel post and my ubuntu-devel-announce post for details. This now means that we'll be opening the archive for general development once glibc 2.16 packages are ready.

I'm very excited about this because it's something I've wanted to do for a long, long time. In fact, back in 2004 when I had my very first telephone conversation with a certain spaceman about this crazy Debian-based project he wanted me to work on, I remember talking about Debian's testing migration system and some ways I thought it could be improved. I don't remember the details of that conversation any more and what I just deployed may well bear very little resemblance to it, but it should transform the extent to which our development release is continuously usable.

The next step is to hook in autopkgtest results. This will allow us to do a degree of automatic testing of reverse-dependencies when we upgrade low-level libraries.

[/ubuntu] permanent link

Mon, 30 Jan 2012

APT resolver bugs

I've managed to go for eleven years working on Debian and nearly eight on Ubuntu without ever needing to teach myself how APT's resolver works. I get the impression that there's a certain mystique about it in general (alternatively, I'm just the last person to figure this out). Recently, though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be bugs in the resolver, and I thought it might be interesting to walk through the process of fixing them based on the Debug::pkgProblemResolver=true log files.

Breakage with Breaks

The first was Ubuntu bug #922485 (apt.log). To understand the log, you first need to know that APT makes up to ten passes of the resolver to attempt to fix broken dependencies by upgrading, removing, or holding back packages; if there are still broken packages after this point, it's generally because it's got itself stuck in some kind of loop, and it bails out rather than carrying on forever. The current pass number is shown in each "Investigating" log entry, so they start with "Investigating (0)" and carry on up to at most "Investigating (9)". Any packages that you see still being investigated on the tenth pass are probably something to do with whatever's going wrong.

In this case, most packages have been resolved by the end of the fourth pass, but xserver-xorg-core is causing some trouble. (Not a particular surprise, as it's an important package with lots of relationships.) We can see that each breakage is:

Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )

This is a Breaks (a relatively new package relationship type introduced a few years ago as a sort of weaker form of Conflicts) on a virtual package, which means that in order to unpack xserver-xorg-core each package that provides xserver-xorg-video-6 must be deconfigured. Much like Conflicts, APT responds to this by upgrading providing packages to versions that don't provide the offending virtual package if it can, and otherwise removing them. We can see it doing just that in the log (some lines omitted):

Investigating (0) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-tseng:i386
Investigating (1) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-i740:i386
Investigating (2) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-nv:i386

OK, so that makes sense - presumably upgrading those packages didn't help at the time. But look at the pass numbers. Rather than just fixing all the packages that provide xserver-xorg-video-6 in a single pass, which it would be perfectly able to do, it only fixes one per pass. This means that if a package Breaks a virtual package which is provided by more than ten installed packages, the resolver will fail to handle that situation. On inspection of the code, this was being handled correctly for Conflicts by carrying on through the list of possible targets for the dependency relation in that case, but apparently when Breaks support was implemented in APT this case was overlooked. The fix is to carry on through the list of possible targets for any "negative" dependency relation, not just Conflicts, and I've filed a patch as Debian bug #657695.

My cup overfloweth

The second bug I looked at was Ubuntu bug #917173 (apt.log). Just as in the previous case, we can see the resolver "running out of time" by reaching the end of the tenth pass with some dependencies still broken. This one is a lot less obvious, though. The last few entries clearly indicate that the resolver is stuck in a loop:

Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386

The new version of dpkg requires upgrading dpkg-dev, but it can't because of something wrong with libdpkg-perl. Following the breadcrumb trail back through the log, we find:

Investigating (1) libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl )
Broken libdpkg-perl:i386 Depends on perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
  Considering perl:i386 1472 as a solution to libdpkg-perl:i386 12
  Holding Back libdpkg-perl:i386 rather than change perl:i386
Investigating (1) perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl:i386 Depends on perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl ) (= 5.14.2-6ubuntu1)
  Considering perl-base:i386 5806 as a solution to perl:i386 1472
  Removing perl:i386 rather than change perl-base:i386
Investigating (1) perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl-base:i386 PreDepends on libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (>= 2.11)
  Considering libc6:i386 -17473 as a solution to perl-base:i386 5806
  Added libc6:i386 to the remove list
Investigating (0) libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs )
Broken libc6:i386 Depends on libc-bin [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (= 2.11.1-0ubuntu7.8)
  Considering libc-bin:i386 10358 as a solution to libc6:i386 -17473
  Removing libc6:i386 rather than change libc-bin:i386

So ultimately the problem is something to do with libc6; but what? As Steve Langasek said in the bug, libc6's dependencies have been very carefully structured, and surely we would have seen some hint of it elsewhere if they were wrong. At this point ideally I wanted to break out GDB or at the very least experiment a bit with apt-get, but due to some tedious local problems I hadn't been able to restore the apt-clone state file for this bug onto my system so that I could attack it directly. So I fell back on the last refuge of the frustrated debugger and sat and thought about it for a bit.

Eventually I noticed something. The numbers after the package names in the third line of each of these log entries are "scores": roughly, the more important a package is, the higher its score should be. The function that calculates these is pkgProblemResolver::MakeScores() in apt-pkg/algorithms.cc. Reading this, I noticed that the various values added up to make each score are almost all provably positive, for example:

         Scores[I->ID] += abs(OldScores[D.ParentPkg()->ID]);

The only exceptions are an initial -1 or -2 points for Priority: optional or Priority: extra packages respectively, or some values that could theoretically be configured to be negative but weren't in this case. OK. So how come libc6 has such a huge negative score of -17473, when one would normally expect it to be an extremely powerful package with a large positive score?

Oh. This is computer programming, not mathematics ... and each score is stored in a signed short, so in a sufficiently large upgrade all those bonus points add up to something larger than 32767 and everything goes haywire. Bingo. Make it an int instead - the number of installed packages is going to be on the order of tens of thousands at most, so it's not as though it'll make a substantial difference to the amount of memory used - and chances are everything will be fine. I've filed a patch as Debian bug #657732.

I'd expected this to be a pretty challenging pair of bugs. While I certainly haven't lost any respect for the APT maintainers for dealing with this stuff regularly, it wasn't as bad as I thought. I'd expected to have to figure out how to retune some slightly out-of-balance heuristics and not really know whether I'd broken anything else in the process; but in the end both patches were very straightforward.

[/ubuntu] permanent link

Mon, 24 Oct 2011

Quality in Ubuntu 12.04 LTS

As is natural for an LTS cycle, lots of people are thinking and talking about work focused on quality rather than features. With Canonical extending LTS support to five years on the desktop for 12.04, much of this is quite rightly focused on the desktop. I'm really not a desktop hacker in any way, shape, or form, though. I spent my first few years in Ubuntu working mainly on the installer - I still do, although I do some other things now too - and I used to say only half-jokingly that my job was done once X started. Of course there are plenty of bugs I can fix, but I wanted to see if I could do something with a bit more structure, so I got to thinking about projects we could work on at the foundations level that would make a big difference.

Image build pipeline

One difficulty we have is that quite a few of our bugs - especially installer bugs, although this goes for some other things too - are only really caught when people are doing coordinated image testing just before a milestone release. Now, it takes a while to do all the builds and then it takes a while to test them. The excellent work of the QA team has meant that testing is much quicker now than it used to be, and a certain amount of smoke-testing is automated (particularly for server images). On the other hand, the build phase has only got longer as we've added more flavours and architectures, particularly as some parts of the process are still serialised per architecture or subarchitecture so ARM builds in particular take a very long time indeed. Exact timings are a bit difficult to get for various reasons, but I think the minimum time between a developer uploading a fix and us having a full set of candidate images on all architectures including that fix is currently somewhere north of eight hours, and that's with people cutting corners and pulling strings which is a suboptimal thing to have to do around release time. This obviously makes us reluctant to respin for anything short of showstopper bugs. If we could get things down to something closer to two hours, respins would be a much less horrible proposition and so we might be able to fix a few bugs that are serious but not showstoppers, not to mention that the release team would feel less burned out.

We discussed this problem at the release sprint, and came up with a laundry list of improvements; I've scheduled this for discussion at UDS in case we can think of any more. Please come along if you're interested!

One thing in particular that I'm working on is refactoring Germinate, a tool which dates right back to our first meeting before Ubuntu was even called Ubuntu and whose job is to expand dependencies starting from our lists of "seed" packages; we use this, among other things, to generate Task fields in the archive and to decide which packages to copy into our images. This was acceptably quick in 2004, but now that we run it forty times (eight flavours multiplied by five architectures) at the end of every publisher run it's actually become rather a serious performance problem: cron.germinate takes about ten minutes, which is over a third of the typical publisher runtime. It parses Packages files eight times as often as it needs to, Sources files forty times as often as it needs to, and recalculates the dependency tree of the base system five times as often as it needs to. I am confident that we can significantly reduce the runtime here, and I think there's some hope that we might be able to move the publisher back to a 30-minute cycle, which would increase the velocity of Ubuntu development in general.

Maintaining the development release

Our release cycle always starts with syncing and merging packages from Debian unstable (or testing in the case of LTS cycles). The vast majority of packages in Ubuntu arrive this way, and generally speaking if we didn't do this we would fall behind in ways that would be difficult to recover from later. However, this does mean that we get a "big bang" of changes at the start of the cycle, and it takes a while for the archive to be usable again. Furthermore, even once we've taken care of this, we have a long-established rhythm where the first part of the cycle is mainly about feature development and the second part of the cycle is mainly about stabilisation. As a result, we've got used to the archive being fairly broken for the first few months, and we even tell people that they shouldn't expect things to work reliably until somewhere approaching beta.

This makes some kind of sense from the inside. But how are you supposed to do feature development that relies on other things in the development release?

In the first few years of Ubuntu, this question didn't matter very much. Nearly all the people doing serious feature development were themselves serious Ubuntu developers; they were capable of fixing problems in the development release as they went along, and while it got in their way a little bit it wasn't all that big a deal. Now, though, we have people focusing on things like Unity development, and we shouldn't assume that just because somebody is (say) an OpenGL expert or a window management expert that they should be able to recover from arbitrary failures in development release upgrades. One of the best things we could do to help the 12.04 desktop be more stable is to have the entire system be less unstable as we go along, so that developers further up the stack don't have to be distracted by things wobbling underneath them. Plus, it's just good software engineering to keep the basics working as you go along: it should always build, it should always install, it should always upgrade. Ubuntu is too big to do something like having everyone stop any time the build breaks, the way you might do in a smaller project, but we shouldn't let things slide for months either.

I've been talking to Rick Spencer and the other Ubuntu engineering leads at Canonical about this. Canonical has a system of "rotations", where you can go off to another team for a while if you're in need of a change or want to branch out a bit; so I proposed that we allow our engineers to spend a month or two at a time on what I'm calling the +1 Maintenance Team, whose job is simply to keep the development release buildable, installable, and upgradeable at all times. Rick has been very receptive to this, and we're going to be running this as a trial throughout the 12.04 cycle, with probably about three people at a time. As well as being professional archive gardeners, these people will also work on developing infrastructure to help us keep better track of what we need to do. For instance, we could deploy better tools from Debian QA to help us track uninstallable packages, or we could enhance some of our many existing reports to have bug links and/or comment facilities, or we could spruce up the weather report; there are lots of things we could do to make our own lives easier.

By 12.04, I would like, in no particular order:

  • Precise to have been more or less continuously usable from Alpha 1 onward for people with reasonable general technical ability
  • Canonical engineering teams outside Ubuntu (DX, Ubuntu One, Launchpad, etc.) to be comfortable with running the development release on at least one system from Alpha 2 onward
  • Installability problems in daily image builds to be dealt with within one working day, or preferably before they even make it to daily builds
  • The archive to be close to consistent as we start milestone preparation, rather than the release team having to scramble to make it so
  • A very significant reduction in our long-term backlog of automatically-detected problems

Of course, this overlaps to a certain extent with the kinds of things that the MOTU team have been doing for years, not to mention with what all developers should be doing to keep their own houses in reasonable order, and I'd like us to work together on this; we're trying to provide some extra hands here to make Ubuntu better for everyone, not take over! I would love this to be an opportunity to re-energise MOTU and bring some new people on board.

I've registered a couple of blueprints (priorities, infrastructure) for discussion at UDS. These are deliberately open-ended skeleton sessions, and I'll try to make sure they're scheduled fairly early in the week, so that we have time for break-out sessions later on. If you're interested, please come along and give your feedback!

[/ubuntu] permanent link

Thu, 06 Oct 2011

Top ideas on Ubuntu Brainstorm (August 2011)

The Ubuntu Technical Board conducts a regular review of the most popular Ubuntu Brainstorm ideas (previous reviews conducted by Matt Zimmerman and Martin Pitt). This time it was my turn. Apologies for the late arrival of this review.

Contact lens in the Unity Dash (#27584)

Unity supports Lenses, which provide a consistent way for users to quickly search for information via the Dash. Current lenses include Applications, Files, and Music, but a number of people have asked for contacts to be accessible using the same interface.

While Canonical's DX team isn't currently working on this for Ubuntu 11.10 or 12.04, we'd love somebody who's interested in this to get involved. Allison Randal explains how to get started, including some skeleton example code and several useful links.

Displaying Ubuntu version information (#27460)

Several people have asked for it to be more obvious what Ubuntu version they're running, as well as other general information about their system.

John Lea, user experience architect on the Unity team, responds that in Ubuntu 11.10 the new LightDM greeter shows the Ubuntu version number, making that basic information very easily visible. For more detail, System Settings -> System Info provides a simple summary.

Volume adjustments for headphone use (#27275)

People often find that they need to adjust their sound volume when plugging in or removing headphones. It seems as though the computer ought to be able to remember this kind of thing and do it automatically; after all, a major goal of Ubuntu is to make the desktop Just Work.

David Henningson, a member of Canonical's OEM Services group and an Ubuntu audio developer, responds on his blog with a summary of how PulseAudio jack detection has improved matters in Ubuntu 11.10, and what's left to do:

The good news: in the upcoming Ubuntu Oneiric (11.10), this is actually working. The bad news: it isn't working for everyone.

Making it easier to find software to handle a file (#28148)

Ubuntu is not always as helpful as it could be when you don't have the right software installed to handle a particular file.

Michael Vogt, one of the developers of the Ubuntu Software Center, responded to this. It seems that most of the pieces to make this work nicely are in place, but there are a few more bits of glue required:

Thanks a lot for this suggestion. I like the idea and it's something that software-center itself supports now. In the coming version 5.0 we will offer to "sort by top-rated" (based on the ratings&reviews data). It's also possible to search for an application based on its mime data. To search for a mime-type, you can enter "mime:text/html" or "mime:audio/ogg" into the search field. What is needed however is better integration into the file manager nautilus. I will make sure this gets attention at the next developer meeting and filed bug #860536 about it.

In nautilus, there is now a button called "Find applications online" available as an option when opening an unknown file or when the user selects "open with...other application" in the context menu. But that will not use the data from software-center.

Show pop-up alert on low battery (#28037)

Some users have reported on Brainstorm that they are not alerted frequently enough when their laptop's battery is low, as they clearly ought to be.

This is an odd one, because there are already several power alert levels and this has been working well for us for some time. Nevertheless, enough people have voted for this idea that there must be something behind it, perhaps a bug that only affects certain systems. Martin Pitt, technical lead of the Ubuntu desktop team, has responded directly to the Brainstorm idea with a description of the current system and how to file a bug when it does not work as intended.

[/ubuntu] permanent link

Tue, 15 Mar 2011

Wubi bug 693671

I spent most of last week working on Ubuntu bug 693671 ("wubi install will not boot - phase 2 stops with: Try (hd0,0): NTFS5"), which was quite a challenge to debug since it involved digging into parts of the Wubi boot process I'd never really touched before. Since I don't think much of this is very well-documented, I'd like to spend a bit of time explaining what was involved, in the hope that it will help other developers in the future.

Wubi is a system for installing Ubuntu into a file in a Windows filesystem, so that it doesn't require separate partitions and can be uninstalled like any other Windows application. The purpose of this is to make it easy for Windows users to try out Ubuntu without the need to worry about repartitioning, before they commit to a full installation. Wubi started out as an external project, and initially patched the installer on the fly to do all the rather unconventional things it needed to do; we integrated it into Ubuntu 8.04 LTS, which involved turning these patches into proper installer facilities that could be accessed using preseeding, so that Wubi only needs to handle the Windows user interface and other Windows-specific tasks.

Anyone familiar with a GNU/Linux system's boot process will immediately see that this isn't as simple as it sounds. Of course, ntfs-3g is a pretty solid piece of software so we can handle the Windows filesystem without too much trouble, and loopback mounts are well-understood so we can just have the initramfs loop-mount the root filesystem. Where are you going to get the kernel and initramfs from, though? Well, we used to copy them out to the NTFS filesystem so that GRUB could read them, but this was overly complicated and error-prone. When we switched to GRUB 2, we could instead use its built-in loopback facilities, and we were able to simplify this. So all was more or less well, except for the elephant in the room. How are you going to load GRUB?

In a Wubi installation, NTLDR (or BOOTMGR in Windows Vista and newer) still owns the boot process. Ubuntu is added as a boot menu option using BCDEdit. You might then think that you can just have the Windows boot loader chain-load GRUB. Unfortunately, NTLDR only loads 16 sectors - 8192 bytes - from disk. GRUB won't fit in that: the smallest core.img you can generate at the moment is over 18 kilobytes. Thus, you need something that is small enough to be loaded by NTLDR, but that is intelligent enough to understand NTFS to the point where it can find a particular file in the root directory of a filesystem, load boot loader code from it, and jump to that. The answer for this was GRUB4DOS. Most of GRUB4DOS is based on GRUB Legacy, which is not of much interest to us any more, but it includes an assembly-language program called GRLDR that supports doing this very thing for FAT, NTFS, and ext2. In Wubi, we build GRLDR as wubildr.mbr, and build a specially-configured GRUB core image as wubildr.

Now, the messages shown in the bug report suggested a failure either within GRLDR or very early in GRUB. The first thing I did was to remember that GRLDR has been integrated into the grub-extras ntldr-img module suitable for use with GRUB 2, so I tried building wubildr.mbr from that; no change, but this gave me a modern baseline to work on. OK; now to try QEMU (you can use tricks like qemu -hda /dev/sda if you're very careful not to do anything that might involve writing to the host filesystem from within the guest, such as recursively booting your host OS ... [update: Tollef Fog Heen and Zygmunt Krynicki both point out that you can use the -snapshot option to make this safer]). No go; it hung somewhere in the middle of NTLDR. Still, I could at least insert debug statements, copy the built wubildr.mbr over to my test machine, and reboot for each test, although it would be slow and tedious. Couldn't I?

Well, yes, I mostly could, but that 8192-byte limit came back to bite me, along with an internal 2048-byte limit that GRLDR allocates for its NTFS bootstrap code. There were only a few spare bytes. Something like this would more or less fit, to print a single mark character at various points so that I could see how far it was getting:

	pushal
	xorw	%bx, %bx	/* video page 0 */
	movw	$0x0e4d, %ax	/* print 'M' */
	int	$0x10
	popal

In a few places, if I removed some code I didn't need on my test machine (say, CHS compatibility), I could even fit in cheap and nasty code to print a single register in hex (as long as you didn't mind 'A' to 'F' actually being ':' to '?' in ASCII; and note that this is real-mode code, so the loop counter is %cx not %ecx):

	/* print %edx in dumbed-down hex */
	pushal
	xorw	%bx, %bx
	movb	$0xe, %ah
	movw	$8, %cx
1:
	roll	$4, %edx
	movb	%dl, %al
	andb	$0xf, %al
	int	$0x10
	loop	1b
	popal

After a considerable amount of work tracking down problems by bisection like this, I also observed that GRLDR's NTFS code bears quite a bit of resemblance in its logical flow to GRUB 2's NTFS module, and indeed the same person wrote much of both. Since I knew that the latter worked, I could use it to relieve my brain of trying to understand assembly code logic directly, and could compare the two to look for discrepancies. I did find a few of these, and corrected a simple one. Testing at this point suggested that the boot process was getting as far as GRUB but still wasn't printing anything. I removed some Ubuntu patches which quieten down GRUB's startup: still nothing - so I switched my attentions to grub-core/kern/i386/pc/startup.S, which contains the first code executed from GRUB's core image. Code before the first call to real_to_prot (which switches the processor into protected mode) succeeded, while code after that point failed. Even more mysteriously, code added to real_to_prot before the actual switch to protected mode failed too. Now I was clearly getting somewhere interesting, but what was going on? What I really wanted was to be able to single-step, or at least see what was at the memory location it was supposed to be jumping to.

Around this point I was venting on IRC, and somebody asked if it was reproducible in QEMU. Although I'd tried that already, I went back and tried again. Ubuntu's qemu is actually built from qemu-kvm, and if I used qemu -no-kvm then it worked much better. Excellent! Now I could use GDB:

(gdb) target remote | qemu -gdb stdio -no-kvm -hda /dev/sda

This let me run until the point when NTLDR was about to hand over control, then interrupt and set a breakpoint at 0x8200 (the entry point of startup.S). This revealed that the address that should have been real_to_prot was in fact garbage. I set a breakpoint at 0x7c00 (GRLDR's entry point) and stepped all the way through to ensure it was doing the right thing. In the process it was helpful to know that GDB and QEMU don't handle real mode very well between them. Useful tricks here were:

  • Use set architecture i8086 before disassembling real-mode code (and set architecture i386 to switch back).
  • GDB prints addresses relative to the current segment base, but if you want to enter an address then you need to calculate a linear address yourself. For example, breakpoints must be set at (CS << 4) + IP, rather than just at IP.

Single-stepping showed that GRLDR was loading the entirety of wubildr correctly and jumping to it. The first instruction it jumped to wasn't in startup.S, though, and then I remembered that we prefix the core image with grub-core/boot/i386/pc/lnxboot.S. Stepping through this required a clear head since it copies itself around and changes segment registers a few times. The interesting part was at real_code_2, where it copies a sector of the kernel to the target load address, and then checks a known offset to find out whether the "kernel" is in fact GRUB rather than a Linux kernel. I checked that offset by hand, and there was the smoking gun. GRUB recently acquired Reed-Solomon error correction on its core image, to allow it to recover from other software writing over sectors in the boot track. This moved the magic number lnxboot.S was checking somewhat further into the core image, after the first sector. lnxboot.S couldn't find it because it hadn't copied it yet! A bit of adjustment and all was well again.

The lesson for me from all of this has been to try hard to get an interactive debugger working. Really hard. It's worth quite a bit of up-front effort if it saves you from killing neurons stepping through pages of code by hand. I think the real-mode debugging tricks I picked up should be useful for working on GRUB in the future.

[/ubuntu] permanent link

Mon, 06 Dec 2010

NTP synchronisation problems

The Ubuntu Technical Board is currently conducting a review of the top ten Brainstorm issues users have raised about Ubuntu, and Matt asked me to investigate and respond to Idea #25301: Keeping the time accurate over the Internet by default.

My first reaction was "hey, that's odd - I thought we already did that?". We install the ntpdate package by default (although it's deprecated upstream in favour of other tools, but that shouldn't be important here). ntpdate is run from /etc/network/if-up.d/ntpdate, in other words every time you connect to a network, which should be acceptably frequent for most people, so it really ought to Just Work by default. But this is one of the top ten problems where users have gone to the trouble of proposing solutions on Brainstorm, so it couldn't be that simple. What was going on?

I brought up a clean virtual machine with a development version of Natty (the current Ubuntu development version, which will eventually become 11.04), and had a look in its logs: it was indeed synchronising its time from ntp.ubuntu.com, and I didn't think anything in that area had changed recently. On the other hand, I had occasionally noticed that my own laptop wasn't always synchronising its time quite right, but I'd put it down to local weirdness as my network isn't always very stable. Maybe this wasn't so local after all?

So, I started tracing through the scripts to figure out what was going on. It turned out that I had an empty /etc/ntp.conf file on my laptop. The /usr/sbin/ntpdate-debian script assumed that that meant I had a full NTP server installed (I don't), and fetched the list of servers from it; since the file was empty, it ended up synchronising time from no servers, that is, not synchronising at all. I removed the file and all was well.

That left the question of where that file came from. It didn't seem to be owned by any package; I was pretty sure I hadn't created it by hand either. I had a look through some bug reports, and soon found ntpdate 1:4.2.2.p4+dfsg-1ubuntu2 has a flawed configuration file. It turns out that time-admin (System -> Administration -> Time and Date) creates an empty /etc/ntp.conf file if you press the reload button (tooltip: "Synchronise now"), as part of an attempt to update NTP configuration. Aha!

Once I knew where the problems were, it was easy to fix them. I've uploaded the following changes, which will be in the 11.04 release:

  • Disregard empty ntp.conf files in ntpdate-debian.
  • Remove an empty /etc/ntp.conf file on fresh installation of the ntp package, so that it doesn't interfere with creating the normal configuration file.
  • Don't create the NTP configuration file in the time-admin backend if it doesn't exist already.

I've also sent these changes to Debian and GNOME as appropriate.

There are still a few problems. The "Synchronise now" button doesn't work quite right in general (bug #90524), and if your network doesn't allow time synchronisation from ntp.ubuntu.com then you'll have to change the value of NTPSERVERS in /etc/default/ntpdate. Furthermore, the time-admin interface is confusing and makes it seem as though the default is not to synchronise the time automatically; this interface is being redesigned at the moment, which should be a good opportunity to make it less confusing, and I will contact the designers to mention this problem. On the whole, though, I think that many fewer people should have this kind of problem in Ubuntu 11.04.

It's always possible that I missed some other problem that breaks automatic time synchronisation for people. Please do file a bug report if it still doesn't work for you in 11.04, or contact me directly (cjwatson at ubuntu.com).

[/ubuntu] permanent link

Mon, 10 May 2010

OpenSSH 5.5p1 for Lucid

For various reasons, I chose to leave Ubuntu 10.04 LTS using OpenSSH 5.3p1. The new features in 5.4p1 such as certificate authentication, the new smartcard handling, netcat mode, and tab-completion in sftp are great, but unfortunately it was available just a little bit too late for me to be able to land it for 10.04 LTS. I realise that many Lucid users want to make use of these features for one reason or another, though, so as a compromise here's a PPA containing OpenSSH 5.5p1 for Lucid.

I intend to keep this up to date for as long as I reasonably can, and I'm happy to accept bug reports on it in the usual place.

[/ubuntu] permanent link

Fri, 13 Nov 2009

Tissue of lies

In case it isn't obvious, in "Ubuntu 9.10 SP1 coming in spring 2010", "Ubuman" is blatantly lying in attributing a number of statements to me. None of the text there was written by me, and if you thought any of it was true then you should probably make sure your troll radar is working properly. Nice joke, but try harder next time - it doesn't even look like my writing style.

(I wouldn't normally bother to respond, since I'm probably just giving it more publicity, but apparently one or two people may already have been taken in by it. One person was sensible enough to write to me and check the facts.)

[/ubuntu] permanent link

Thu, 28 May 2009

code_swarm video of Ubuntu uploads

Joey Hess posted a draft of a code_swarm video for d-i a couple of weeks ago, which reminded me that I've been meaning to do something similar for Ubuntu for a while now as it's just about our archive's fifth birthday. I have a more or less complete archive of all our -changes mailing lists locally (I think I'm missing some of the very early ones, before the end of July 2004; let me know if you were one of the very early Canonical employees and have a record of these), and with the aid of launchpadlib it's fairly easy to map all the e-mail addresses into Launchpad user names, massage out some of the more obvious duplicates, and then treat the stream of uploads as if it were a stream of commits.

If you haven't seen code_swarm before, each dot represents an upload, and the dots "swarm" around their corresponding committers' names; more active committers have larger swarms of dots and brighter names. I assigned a colour to each of our archive components (uploads aren't really at the C code vs. Python code vs. translations vs. whatever kind of granularity that you see in other code_swarm videos), which mostly means that people who predominantly upload to main are in roughly an Ubuntu tan colour, people who predominantly upload to universe are coloured bluish, and people with a good mixture tend to come out coloured green. If I get a bit more time I may try to figure out enough about video editing software to add some captions.

Here's the video (194 MB).

[/ubuntu] permanent link

Thu, 05 Mar 2009

Bug triage, redux

I've been a bit surprised by the strong positive response to my previous post. People generally seemed to think it was quite non-ranty; maybe I should clean the rust off my flamethrower. :-) My hope was that I'd be able to persuade people to change some practices, so I guess that's a good thing.

Of course, there are many very smart people doing bug triage very well, and I don't want to impugn their fine work. Like its medical namesake, bug triage is a skilled discipline. While it's often repetitive, and there are lots of people showing up with similar symptoms, a triage nurse can really make a difference by spotting urgent cases, cleaning up some of the initial blood, and referring the patient quickly to a doctor for attention. Or, if a pattern of cases suddenly appears, a triage nurse might be able to warn of an incipient epidemic. [Note: I have no medical experience, so please excuse me if I'm talking crap here. :-)] The bug triagers who do this well are an absolute godsend; especially when they respond to repetitive tasks with tremendously useful pieces of automation like bughelper. The cases I have trouble with are more like somebody showing up untrained, going through everyone in the waiting room, and telling each of them that they just need to go home, get some rest, and stop complaining so much. Sometimes of course they'll be right, but without taking the time to understand the problem they're probably going to do more harm than good.

Ian Jackson reminded me that it's worth mentioning the purpose of bug reports on free software: namely, to improve the software. The GNU Project has some advice to maintainers on this. I think sometimes we stray into regarding bug reports more like support tickets. In that case it would be appropriate to focus on resolving each case as quickly as possible, if necessary by means of a workaround rather than by a software change, and only bother the developers when necessary. This is the wrong way to look at bug reports, though. The reason that we needed to set up a bug triage community in Ubuntu was that we had a relatively low developer-to-package ratio and a very high user-to-developer ratio, and we were getting a lot of bug reports that weren't fleshed out enough for a developer to investigate them without spending a lot of time in back-and-forth with the reporter, so a number of people volunteered to take care of the initial back-and-forth so that good clear bug reports could be handed over to developers. This is all well and good, and indeed I encouraged it because I was personally finding myself unable to keep up with incoming bugs and actually fix anything at the same time. Somewhere along the way, though, some people got the impression that what we wanted was a first-line support firewall to try to defend developers from users, which of course naturally leads to ideas such as closing wishlist bugs containing ideas because obviously those important developers wouldn't want to be bothered by them, and closing old bugs because clearly they must just be getting in developers' way. Let me be clear about this now: I absolutely appreciate help getting bug reports into a state where I can deal with them efficiently, but I do not want to be defended from my users! I don't have a basis from which to state that all developers feel the same way, but my guess is that most do.

Antti-Juhani Kaijanaho said he'd experienced most of these problems in Debian. I hadn't actually intended my post to go to Planet Debian - I'd forgotten that the "ubuntu" category on my blog goes there too, which generally I see as a feature, but if I'd remembered that I would have been a little clearer that I was talking about Ubuntu bug triage. If I had been talking about Debian bug triage I'd probably have emphasised different things. Nevertheless, it's interesting that at least one Debian (and non-Ubuntu) developer had experienced similar problems.

Justin Dugger mentions a practice of marking duplicate bugs invalid that he has problems with. I agree that this is suboptimal and try not to do it myself. That said, this is not something I object to to the same extent. Given that the purpose of bugs is to improve the software, the real goal is to be able to spend more time fixing bugs, not to get bugs into the ideal state when the underlying problem has already been solved. If it's a choice between somebody having to spend time tracking down the exact duplicate bug number versus fixing another bug, I know which I'd take. Obviously, when doing this, it's worth apologising that you weren't able to find the original bug number, and explaining what the user can do if they believe that you're mistaken (particularly if it's a bug that's believed to be fixed); the stock text people often use for this doesn't seem informative enough to me.

Sebastien Bacher commented that preferred bug triage practices differ among teams: for instance, the Ubuntu desktop team deals with packages that are very much to the forefront of users' attention and so get a lot of duplicate bugs. Indeed - and bug triagers who are working closely with the desktop team on this are almost certainly doing things the way the developers on the desktop team prefer, so I have no problem with that. The best advice I can give bug triagers is that their ultimate aim is to help developers, and so they should figure out which developers they need to work with and go and talk to them! That way, rather than duplicating work or being counterproductive, they can tailor their work to be most effective. Everybody wins.

[/ubuntu] permanent link

Mon, 02 Mar 2009

Bug triage rants

I hate to say this, but often when somebody does lots of bug triage on a package I work on, I find it to be a net loss for me. I end up having to go through all the things that were changed, correct a bunch of them, occasionally pacify angry bug submitters, and all the rest of it, and often the benefits are minimal at best.

I would very much like this not to be the case. Bug triage is supposed to help developers be more efficient, and I think most people who do bug triage are generally well-intentioned and eager to help. Accordingly, here is a series of mini-rants intended to have educational value.

  • Bugs are not like fruit.

    Fruit goes bad if you leave it too long. By and large, bugs don't, especially if they're on software that doesn't change very much. There is no reason why a bug filed against a package in Ubuntu 4.10 where the relevant code hasn't changed much since shouldn't still be perfectly valid. Even if it isn't, it deserves proper consideration.

    My biggest single annoyance with bug triage is people coming around and asking if bugs are still valid when they haven't put any effort into reproducing them themselves. This annoys bug submitters too; every so often somebody replies and says "didn't you even bother to check?". This gives a very bad impression of us as a project - wouldn't it be better if we looked as if we knew what we were talking about? There is a good reason to do this kind of check, of course: random undiagnosed crash reports and the like may well go away due to related changes, and it is occasionally worth checking. But if the bug is already well-understood and/or well-described, you should just go and check whether it's still there rather than asking.

    As I understand it, the intended workflow is that people file bugs, then if they aren't clear enough bug triagers work with the submitter to gather information until they are, then they're passed to developers for further work. We seem to have added an extra step wherein submitters must periodically give their bug a health-check, and if they don't then it gets closed as being out of date. In a small minority of cases this is useful; in most cases, frankly, it makes us look a bit clueless. Can we please stop doing this? The more we waste people's time doing this, the less likely it is that they'll bother to respond to us, and this might help our statistics but doesn't help the project as a whole.

    I know that there's a problem with bug count. I think every project of non-trivial size has that problem. But, honestly, the right answer is to fix more bugs - and, personally, I would be able to spend more time doing that if I weren't often running around trying to make sure that bugs I care about aren't getting overenthusiastically closed just because somebody thinks they've been lying around too long.

    There is a good way to expire bugs like this, of course. It goes something like this: "I've read through your bug and tried to reproduce it with a current release, but I'm afraid I can't do so. Are you still experiencing it? If not, then I think it might have been fixed by [this change I found in the package's history that seems to be related]." You can't do this en masse, but you'll get a much better response from submitters, you'll learn more doing it, and in the process of doing the necessary investigation of each bug you'll find that there are many cases you don't have to ask about at all.

  • Wishlist bugs are not intrinsically bad.

    There are certainly cases where something is far too broad or vague for a bug report; but there are also plenty of cases, probably far more, where the wish in question is a relatively small change to the program, or doesn't need any more sophisticated tracking, and a wishlist bug is just right. If you don't know the program very well, it may be difficult to tell whether a wishlist bug is appropriate or not; in that case, just leave the bug alone.

    Please, for the love of all that's holy, don't close wishlist bugs saying that people should use Brainstorm or write a specification instead! If you don't want to see wishlist bugs in your statistics, just filter them out; it's quite easy to do. Even worse, don't tell people that something probably isn't a good idea when you aren't familiar with the software; people who have gone to the effort of writing up their idea for us deserve a response from somebody who knows the software well. I've encountered cases where friends of mine submitted a bug report (sometimes even at my request) and then a triager told them it was a bad idea and closed their bug. This sort of thing puts people off Ubuntu.

    Specifications are software design documents. As such, they are best written by software designers. People who tell other people to go and write a specification may not realise that as a result of doing this for three years it's now essentially impossible to find anything in the specification system! The intent was never that every user of Ubuntu would need to write a specification to get anything changed; specifications are used by developers to document the results of discussions and write up plans. They are not a straightforward alternative to wishlist bugs, nor do they turn out to work very well as what many formal processes call "requirements documents"; the process of refining the latter in the context of Ubuntu might involve wishlist bugs, mailing list threads, wiki pages, private discussions with developers, or things of that nature, and probably shouldn't involve creating a specification until the requirements-gathering process is well underway.

  • Closing a bug is taking an item off somebody's to-do list.

    You wouldn't go up to a colleague's whiteboard and take an eraser to it unless you were sure that was OK, would you? Yet people seem to do that all the time with bugs. It's OK when the bug is really just like a support request - "help, it crashed, what do I do?" - and either you're pretty sure it's user error or there's just no way to get enough information to fix it. But once the initial triage process is done, now it's on somebody's to-do list.

    This is closely related to ...

  • If a developer has accepted it, leave it alone.

    Every so often I find that there's a bug that I have accepted by way of a bug comment or setting to Triaged or whatever, or even a bug that I filed on a package I work on as a reminder to myself, and somebody comes along and asks for more information or asks if we can still reproduce it or something. The hit rate on this kind of thing is extraordinarily low. There's a good chance that the developer went and verified the bug against the code, and in that case it certainly doesn't need more information (or they would have asked for it) and it probably isn't going to go away without anyone noticing.

    In most other free software projects, developers file bug reports themselves as a reminder about things that need to be done, and people leave them alone unless they're intending to help with the fix. In Ubuntu, developers also have to spend time making sure that those to-do items don't get expired. Nobody is helped by this.

    launchpad-gm-scripts includes a Greasemonkey script called lp_karma_suffix, which can help you to identify developers without having to spend lots of time clicking around.

  • Check whether the package is being actively worked on.

    Some packages are actively worked on in Ubuntu; some aren't (e.g. we just sync packages from Debian, or they're basically orphaned, or whatever). It's worth checking which is which before doing any kind of extensive triage work. If it's being actively worked on, why not go and talk to the developer(s) in question first? It's only polite, and it will probably help you to do a better job.

[/ubuntu] permanent link

Mon, 27 Oct 2008

Totem BBC plugin

A while back, the BBC approached Canonical about providing seamless access to unencumbered BBC content for all Ubuntu users (in the UK and elsewhere). We agreed to approach this by way of a plugin for our primary media player, Totem, and asked Collabora Multimedia to do the plugin development work.

The results are in what will shortly be released as Ubuntu 8.10, and are looking really rather good. At the moment the content available from the BBC at present is mostly audio, but support for video is in place and the feed is expected to be fleshed out here over time. We have a genre classification scheme in place, and will see how that scales as the amount of available content grows. The code has been submitted upstream, although there are still a few issues to work out there.

This is not the same thing as iPlayer; all the content available here is DRM-free. Some of it is geographically restricted to the UK, and these restrictions are handled on the server side to make sure that the client is free of encumbrances.

Christian Schaller from Collabora posted about this a little while ago. Since then, the UI has been improved somewhat and some I/O issues have been fixed to the point where we felt comfortable enabling the BBC plugin (as well as the YouTube plugin) by default in Ubuntu 8.10. Here's a screenshot of the current interface.

This is exciting stuff with a lot of potential. To try it out, run Applications -> Sound & Video -> Movie Player and select the "BBC" entry from the drop-down box labelled "Playlist". If you find bugs, please report them!

[/ubuntu] permanent link

Sat, 12 Apr 2008

Desktop automounting pain

Ubuntu's live CD installer, Ubiquity, needs to suppress desktop automounting while it's doing partitioning and generally messing about with mount points, otherwise its temporary mount points end up busy on unmount due to some smart-arse desktop component that decides to open a window for it.

To date, it employs the following methods, each of which was sufficient at the time:

  • Set the /desktop/gnome/volume_manager/automount_drives and /desktop/gnome/volume_manager/automount_media gconf keys to false.
  • Tell kded to unload its medianotifier module, and load it again just before the installer exits.
  • Set the /apps/nautilus/desktop/volumes_visible gconf key to false.
  • Set the AutomountDrives and AutomountMedia keys in $HOME/.config/Thunar/volmanrc to FALSE.
  • Set the /apps/nautilus/preferences/media_automount and /apps/nautilus/preferences/media_automount_open gconf keys to false.
  • The entire installer is run under hal-lock --interface org.freedesktop.Hal.Device.Storage --exclusive.
  • Set the /apps/nautilus/preferences/media_autorun_never gconf key to true (experimental, but apparently now required since nautilus uses the gio volume monitor).

This is getting ridiculous. Dear desktop implementors: please pick a configuration mechanism and stick to it, and provide backward compatibility if you can't. This is not a rocket-science concept.

I rather liked the hal-lock mechanism; it was simple and involved minimal fuss. I had hoped that it might end up as a standard, but I guess that would be too easy.

[/ubuntu] permanent link

Thu, 31 Jan 2008

Vim omni completion for Launchpad bugs

I hacked together a little timesaver for developers this morning: omni completion for Launchpad bugs in Vim's debchangelog mode. To use it, install vim 7.1-138+1ubuntu3 once it hits the mirrors, open up a debian/changelog file, type "LP: #", and hit Ctrl-X Ctrl-O. It'll think for a while and then give you a list of all the bugs open in Launchpad against the package in question, from which you can select to insert the bug number into your changelog.

Here's a screenshot to make it clearer:

Thanks to Stefano Zacchiroli for doing the same for Debian bugs back in July.

[/ubuntu] permanent link

Tue, 03 Jan 2006

Single-stage installer

Hot on the heels of Joey's tale of getting rid of base-config (the second stage of the installer) in Debian, we've now pretty much got rid of it in Ubuntu Dapper too. The upshot of this is that rather than asking a bunch of questions, installing the base system, and rebooting to install everything else, we now just install everything in one go and reboot into a completed system.

This does mean that, if your system doesn't boot, you don't get to find out about it for a bit longer. However, it has lots of advantages in terms of speed (the much-maligned archive-copier mostly goes away), reducing code duplication (base-config had a bunch of infrastructure of its own which was done better in the core installer anyway), comprehensibility, and killing off some annoying bugs like #13561 (duplicate mirror questions in netboot installs), #15213 (second stage hangs if you skip archive-copier in the first stage), and #19571 (kernel messages scribble over base-config's UI).

To go with Joey's Debian timeline, the Ubuntu history looks a bit like this:

  • 2004 (Jul): First base-config modifications for Ubuntu; we need to install the default desktop rather than dropping into tasksel.
  • 2004 (Aug): Mark phones me up and asks if I can make the installer not need the CD in the second stage by copying all the packages across beforehand. Although it's a bit awkward, I can see the UI advantages in that, so I write archive-copier at the Canonical conference in Oxford.
  • 2004 (Sep): Mark asks me if we can ask the timezone, user/password, and apt configuration questions before the first reboot. With less than a month to go until our first release, I have a heart-attack at how much needs to be done, and it eventually gets deferred to Hoary.
  • 2005 (Jan): Matt fixes up debconf's passthrough frontend for use on the live CD, and we realise that this is an obvious way to run bits of base-config before the first reboot. It's rather messy and takes until March or so before it really works right, but we get there in the end.
  • 2005 (Apr): I get "put a progress bar in front of the dpkg output in the second stage" as a goal for Breezy. Naïvely, I think it's a simple matter of programming, since I'd already done something similar for debootstrap and base-installer the previous year.
  • 2005 (May): I hack progress bar support into debconf. Nothing actually uses it for anything yet, except as a convenient passthrough stub.
  • 2005 (Jul/Aug): I actually try to implement the second-stage progress bar and realise that it's about an order of magnitude harder than I thought, requiring a whole load of extra infrastructure in apt. Fortunately Michael Vogt saves the day here by writing lots of working code, and the progress bar works by early August.
  • 2005 (Sep-Dec): Upstream d-i development ramps back up again, with tzsetup, clock-setup, apt-setup, and user-setup all being cranked out in short order and the corresponding pieces removed from base-config. I merge these as they mature, and manage to get agreement on including the Ubuntu debconf template changes in upstream apt-setup, which helps the diff size a lot.
  • 2005 (Nov/Dec): Joey and I chat one evening about the Ubuntu second-stage progress bar work, and we end up designing and writing debconf-apt-progress based on its ideas, after which Joey knocks up pkgsel in no time flat.
  • 2006 (Jan): The rest of the pieces land in Ubuntu, and we drop base-config out of the installer. To my surprise, nearly everything still just works.

Although it caused some friction, I'm glad that we did the first cuts of many of these things outside Debian and got to try things out before landing version-2-quality code in Debian. The end result is much nicer than the intermediate ones ever were.

[/ubuntu] permanent link