APT resolver bugs
I’ve managed to go for eleven years working on Debian and nearly eight on
Ubuntu without ever needing to teach myself how APT’s resolver works. I get
the impression that there’s a certain mystique about it in general
(alternatively, I’m just the last person to figure this out). Recently,
though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be
bugs in the resolver, and I thought it might be interesting to walk through
the process of fixing them based on the Debug::pkgProblemResolver=true
log files.
Breakage with Breaks
The first was Ubuntu bug #922485 (apt.log). To understand the log, you first need to know that APT makes up to ten passes of the resolver to attempt to fix broken dependencies by upgrading, removing, or holding back packages; if there are still broken packages after this point, it’s generally because it’s got itself stuck in some kind of loop, and it bails out rather than carrying on forever. The current pass number is shown in each “Investigating” log entry, so they start with “Investigating (0)” and carry on up to at most “Investigating (9)”. Any packages that you see still being investigated on the tenth pass are probably something to do with whatever’s going wrong.
In this case, most packages have been resolved by the end of the fourth
pass, but xserver-xorg-core
is causing some trouble. (Not a particular
surprise, as it’s an important package with lots of relationships.) We can
see that each breakage is:
Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )
This is a
Breaks
(a relatively new package relationship type introduced a few years ago as a
sort of weaker form of Conflicts
) on a virtual package, which means that
in order to unpack xserver-xorg-core
each package that provides
xserver-xorg-video-6
must be deconfigured. Much like Conflicts
, APT
responds to this by upgrading providing packages to versions that don’t
provide the offending virtual package if it can, and otherwise removing
them. We can see it doing just that in the log (some lines omitted):
Investigating (0) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-tseng:i386
Investigating (1) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-i740:i386
Investigating (2) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-nv:i386
OK, so that makes sense - presumably upgrading those packages didn’t help at
the time. But look at the pass numbers. Rather than just fixing all the
packages that provide xserver-xorg-video-6
in a single pass, which it
would be perfectly able to do, it only fixes one per pass. This means that
if a package Breaks
a virtual package which is provided by more than ten
installed packages, the resolver will fail to handle that situation. On
inspection of the code, this was being handled correctly for Conflicts
by
carrying on through the list of possible targets for the dependency relation
in that case, but apparently when Breaks
support was implemented in APT
this case was overlooked. The fix is to carry on through the list of
possible targets for any “negative” dependency relation, not just
Conflicts
, and I’ve filed a patch as Debian
bug #657695.
My cup overfloweth
The second bug I looked at was Ubuntu bug #917173 (apt.log). Just as in the previous case, we can see the resolver “running out of time” by reaching the end of the tenth pass with some dependencies still broken. This one is a lot less obvious, though. The last few entries clearly indicate that the resolver is stuck in a loop:
Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
The new version of dpkg
requires upgrading dpkg-dev
, but it can’t
because of something wrong with libdpkg-perl
. Following the breadcrumb
trail back through the log, we find:
Investigating (1) libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl )
Broken libdpkg-perl:i386 Depends on perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Considering perl:i386 1472 as a solution to libdpkg-perl:i386 12
Holding Back libdpkg-perl:i386 rather than change perl:i386
Investigating (1) perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl:i386 Depends on perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl ) (= 5.14.2-6ubuntu1)
Considering perl-base:i386 5806 as a solution to perl:i386 1472
Removing perl:i386 rather than change perl-base:i386
Investigating (1) perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl-base:i386 PreDepends on libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (>= 2.11)
Considering libc6:i386 -17473 as a solution to perl-base:i386 5806
Added libc6:i386 to the remove list
Investigating (0) libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs )
Broken libc6:i386 Depends on libc-bin [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (= 2.11.1-0ubuntu7.8)
Considering libc-bin:i386 10358 as a solution to libc6:i386 -17473
Removing libc6:i386 rather than change libc-bin:i386
So ultimately the problem is something to do with libc6; but what? As
Steve Langasek said in the
bug,
libc6’s dependencies have been very carefully structured, and surely we
would have seen some hint of it elsewhere if they were wrong. At this point
ideally I wanted to break out GDB or at the very least experiment a bit with
apt-get
, but due to some tedious local problems I hadn’t been able to
restore the apt-clone
state file for this bug onto my system so that I
could attack it directly. So I fell back on the last refuge of the
frustrated debugger and sat and thought about it for a bit.
Eventually I noticed something. The numbers after the package names in the
third line of each of these log entries are “scores”: roughly, the more
important a package is, the higher its score should be. The function that
calculates these is pkgProblemResolver::MakeScores()
in
apt-pkg/algorithms.cc.
Reading this, I noticed that the various values added up to make each score
are almost all provably positive, for example:
Scores[I->ID] += abs(OldScores[D.ParentPkg()->ID]);
The only exceptions are an initial -1 or -2 points for Priority: optional
or Priority: extra
packages respectively, or some values that could
theoretically be configured to be negative but weren’t in this case. OK.
So how come libc6
has such a huge negative score of -17473, when one would
normally expect it to be an extremely powerful package with a large positive score?
Oh. This is computer programming, not mathematics … and each score is
stored in a signed short
, so in a sufficiently large upgrade all those
bonus points add up to something larger than 32767 and everything goes
haywire. Bingo. Make it an int
instead - the number of installed
packages is going to be on the order of tens of thousands at most, so it’s
not as though it’ll make a substantial difference to the amount of memory
used - and chances are everything will be fine. I’ve filed a patch as
Debian bug #657732.
I’d expected this to be a pretty challenging pair of bugs. While I certainly haven’t lost any respect for the APT maintainers for dealing with this stuff regularly, it wasn’t as bad as I thought. I’d expected to have to figure out how to retune some slightly out-of-balance heuristics and not really know whether I’d broken anything else in the process; but in the end both patches were very straightforward.