APT resolver bugs

I’ve managed to go for eleven years working on Debian and nearly eight on Ubuntu without ever needing to teach myself how APT’s resolver works. I get the impression that there’s a certain mystique about it in general (alternatively, I’m just the last person to figure this out). Recently, though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be bugs in the resolver, and I thought it might be interesting to walk through the process of fixing them based on the Debug::pkgProblemResolver=true log files.

Breakage with Breaks

The first was Ubuntu bug #922485 (apt.log). To understand the log, you first need to know that APT makes up to ten passes of the resolver to attempt to fix broken dependencies by upgrading, removing, or holding back packages; if there are still broken packages after this point, it’s generally because it’s got itself stuck in some kind of loop, and it bails out rather than carrying on forever. The current pass number is shown in each “Investigating” log entry, so they start with “Investigating (0)” and carry on up to at most “Investigating (9)”. Any packages that you see still being investigated on the tenth pass are probably something to do with whatever’s going wrong.

In this case, most packages have been resolved by the end of the fourth pass, but xserver-xorg-core is causing some trouble. (Not a particular surprise, as it’s an important package with lots of relationships.) We can see that each breakage is:

Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )

This is a Breaks (a relatively new package relationship type introduced a few years ago as a sort of weaker form of Conflicts) on a virtual package, which means that in order to unpack xserver-xorg-core each package that provides xserver-xorg-video-6 must be deconfigured. Much like Conflicts, APT responds to this by upgrading providing packages to versions that don’t provide the offending virtual package if it can, and otherwise removing them. We can see it doing just that in the log (some lines omitted):

Investigating (0) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-tseng:i386
Investigating (1) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-i740:i386
Investigating (2) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-nv:i386

OK, so that makes sense - presumably upgrading those packages didn’t help at the time. But look at the pass numbers. Rather than just fixing all the packages that provide xserver-xorg-video-6 in a single pass, which it would be perfectly able to do, it only fixes one per pass. This means that if a package Breaks a virtual package which is provided by more than ten installed packages, the resolver will fail to handle that situation. On inspection of the code, this was being handled correctly for Conflicts by carrying on through the list of possible targets for the dependency relation in that case, but apparently when Breaks support was implemented in APT this case was overlooked. The fix is to carry on through the list of possible targets for any “negative” dependency relation, not just Conflicts, and I’ve filed a patch as Debian bug #657695.

My cup overfloweth

The second bug I looked at was Ubuntu bug #917173 (apt.log). Just as in the previous case, we can see the resolver “running out of time” by reaching the end of the tenth pass with some dependencies still broken. This one is a lot less obvious, though. The last few entries clearly indicate that the resolver is stuck in a loop:

Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386

The new version of dpkg requires upgrading dpkg-dev, but it can’t because of something wrong with libdpkg-perl. Following the breadcrumb trail back through the log, we find:

Investigating (1) libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl )
Broken libdpkg-perl:i386 Depends on perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
  Considering perl:i386 1472 as a solution to libdpkg-perl:i386 12
  Holding Back libdpkg-perl:i386 rather than change perl:i386

Investigating (1) perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl:i386 Depends on perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl ) (= 5.14.2-6ubuntu1)
  Considering perl-base:i386 5806 as a solution to perl:i386 1472
  Removing perl:i386 rather than change perl-base:i386

Investigating (1) perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl-base:i386 PreDepends on libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (>= 2.11)
  Considering libc6:i386 -17473 as a solution to perl-base:i386 5806
  Added libc6:i386 to the remove list

Investigating (0) libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs )
Broken libc6:i386 Depends on libc-bin [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (= 2.11.1-0ubuntu7.8)
  Considering libc-bin:i386 10358 as a solution to libc6:i386 -17473
  Removing libc6:i386 rather than change libc-bin:i386

So ultimately the problem is something to do with libc6; but what? As Steve Langasek said in the bug, libc6’s dependencies have been very carefully structured, and surely we would have seen some hint of it elsewhere if they were wrong. At this point ideally I wanted to break out GDB or at the very least experiment a bit with apt-get, but due to some tedious local problems I hadn’t been able to restore the apt-clone state file for this bug onto my system so that I could attack it directly. So I fell back on the last refuge of the frustrated debugger and sat and thought about it for a bit.

Eventually I noticed something. The numbers after the package names in the third line of each of these log entries are “scores”: roughly, the more important a package is, the higher its score should be. The function that calculates these is pkgProblemResolver::MakeScores() in apt-pkg/algorithms.cc. Reading this, I noticed that the various values added up to make each score are almost all provably positive, for example:

Scores[I->ID] += abs(OldScores[D.ParentPkg()->ID]);

The only exceptions are an initial -1 or -2 points for Priority: optional or Priority: extra packages respectively, or some values that could theoretically be configured to be negative but weren’t in this case. OK. So how come libc6 has such a huge negative score of -17473, when one would normally expect it to be an extremely powerful package with a large positive score?

Oh. This is computer programming, not mathematics … and each score is stored in a signed short, so in a sufficiently large upgrade all those bonus points add up to something larger than 32767 and everything goes haywire. Bingo. Make it an int instead - the number of installed packages is going to be on the order of tens of thousands at most, so it’s not as though it’ll make a substantial difference to the amount of memory used - and chances are everything will be fine. I’ve filed a patch as Debian bug #657732.

I’d expected this to be a pretty challenging pair of bugs. While I certainly haven’t lost any respect for the APT maintainers for dealing with this stuff regularly, it wasn’t as bad as I thought. I’d expected to have to figure out how to retune some slightly out-of-balance heuristics and not really know whether I’d broken anything else in the process; but in the end both patches were very straightforward.