<h1>Free software activity in January/February 2024</h1>
<p>Two months into my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/going-freelance.html">new gig</a> and it’s going
great! <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/task-management.html">Tracking my time</a> has taken a bit of
getting used to, but having something that amounts to a queryable database
of everything I’ve done has also allowed some helpful introspection.</p>
<p>Freexian <a href="https://www.freexian.com/about/debian-contributions/">sponsors</a> up
to 20% of my time on Debian tasks of my choice. In fact I’ve been spending
the bulk of my time on
<a href="https://freexian-team.pages.debian.net/debusine/">debusine</a> which is itself
intended to accelerate work on Debian, but more details on that later.
While I contribute to Freexian’s
<a href="https://www.freexian.com/tags/debian-contributions/">summaries</a> now, I’ve
also decided to start writing monthly posts about my free software activity
as many others do, to get into some more detail.</p>
<h2>January 2024</h2>
<ul>
<li>I <a href="https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/272">added Incus
support</a>
to autopkgtest. <a href="https://linuxcontainers.org/incus/">Incus</a> is a system
container and virtual machine manager, forked from <a href="https://github.com/canonical/lxd">Canonical’s
<span class="caps">LXD</span></a>. I switched my laptop over to it
and then quickly found that it was inconvenient not to be able to run
Debian package test suites using
<a href="https://manpages.debian.org/man/autopkgtest">autopkgtest</a>, so I tweaked
autopkgtest’s existing <span class="caps">LXD</span> integration to support using either <span class="caps">LXD</span> or Incus (there’s a usage sketch after this list).</li>
<li>I discovered <a href="https://metacpan.org/dist/Perl-Critic">Perl::Critic</a> and
used it to tidy up some poor practices in several of my packages,
including debconf. Perl used to be my language of choice but I’ve been
mostly using Python for over a decade now, so I’m not as fluent as I used
to be and some mechanical assistance with spotting common errors is
helpful; besides, I’m generally a big fan of applying static analysis to
everything possible in the hope of reducing bug density. Of course, this
did result in a couple of regressions
(<a href="https://salsa.debian.org/pkg-debconf/debconf/-/commit/4f8b9f969679fa4a38aca8da2702057ea861ffae">1</a>,
<a href="https://salsa.debian.org/pkg-debconf/debconf/-/commit/7274bf66e82b2557156813f93ed0592539a2ac1c">2</a>),
but at least we caught them fairly quickly.</li>
<li>I did some overdue debconf maintenance, mainly around tidying up error
message handling in several places (<a href="https://bugs.debian.org/797071">1</a>,
<a href="https://bugs.debian.org/754123">2</a>,
<a href="https://bugs.debian.org/682508">3</a>).</li>
<li>I did some routine maintenance to move several of my upstream projects to
a new <a href="https://www.gnu.org/software/gnulib/manual/html_node/Stable-Branches.html">Gnulib stable
branch</a>.</li>
<li><a href="https://salsa.debian.org/debian/debmirror">debmirror</a> includes a <a href="https://salsa.debian.org/debian/debmirror/-/blob/master/mirror_size">useful
summary</a>
of how big a Debian mirror is, but it hadn’t been updated since 2010 and
the script to do so had bitrotted quite badly. I <a href="https://salsa.debian.org/debian/debmirror/-/commit/7ae93742377d9205c57b7e47ef96d4663110f0ff">fixed
that</a>
and added a recurring task for myself to refresh this every six months.</li>
</ul>
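<p>To make the Incus item above a bit more concrete, the sketch below is roughly how I’d expect the new backend to be used. It’s not a transcript from the merge request - the package and image names are placeholders, and the helper and image naming are assumed to mirror the existing <span class="caps">LXD</span> backend - so check the autopkgtest documentation for your version.</p>
<div class="highlight"><pre><code># Build an Incus testbed image for unstable (assumed to parallel
# autopkgtest-build-lxd).
autopkgtest-build-incus images:debian/sid

# Run a package's tests in that testbed; the part after "--" selects the
# Incus virtualization backend.
autopkgtest hello -- incus autopkgtest/debian/sid/amd64
</code></pre></div>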
<h2>February 2024</h2>
<ul>
<li>Some time back I added AppArmor and seccomp confinement to man-db. This
was mainly motivated by a desire to <a href="https://forum.snapcraft.io/t/support-for-man-pages/2299/24">support manual pages in
snaps</a> (which
is <a href="https://bugs.launchpad.net/snapd/+bug/1575593">still open</a> several
years later …), but since reading manual pages involves a <a href="https://www.gnu.org/software/groff/">non-trivial
text processing toolchain mostly written in
C++</a>, I thought it was reasonable to
assume that some day it might have a vulnerability even though its track
record has been good; so <code>man</code> now restricts the system calls that
<code>groff</code> can execute and the parts of the file system that it can access.
I stand by this, but it did cause some problems that have needed a
succession of small fixes over the years. This month I issued
<a href="https://lists.debian.org/debian-lts-announce/2024/02/msg00001.html"><span class="caps">DLA</span>-3731-1</a>,
backporting some of those fixes to buster. (There’s a quick sketch of how to poke at this confinement after this list.)</li>
<li>I spent some time chasing a <a href="https://bugs.debian.org/1063413">console-setup build
failure</a> following the removal of
kFreeBSD support, which was uploaded by mistake. I suggested a <a href="https://salsa.debian.org/holgerw/console-setup/-/merge_requests/1">set of
fixes</a>
for this, but the author of the change to remove kFreeBSD support decided
to take a different approach (fair enough), so I’ve abandoned this.</li>
<li>I updated the <a href="https://tracker.debian.org/pkg/zope.testrunner">Debian zope.testrunner
package</a> to 6.3.1.</li>
<li>openssh:<ul>
<li>A Freexian collaborator had a problem with automating installations
involving changes to <code>/etc/ssh/sshd_config</code>. This turned out to be
resolvable without any changes, but in the process of investigating I
noticed that my dodgy arrangements to avoid
<a href="https://manpages.debian.org/man/ucf">ucf</a> prompts in certain cases
had bitrotted slightly, which meant that some people might be prompted
unnecessarily. I <a href="https://salsa.debian.org/ssh-team/openssh/-/commit/b9671cc74475922fa61e9ebdba56ec84446d19ac">fixed this and arranged for it not to happen
again</a>.</li>
<li>Following a <a href="https://lists.debian.org/debian-devel/2024/02/msg00239.html">recent debian-devel
discussion</a>,
I realized that some particularly awkward code in the OpenSSH
packaging was now obsolete, and <a href="https://salsa.debian.org/ssh-team/openssh/-/commit/a6c7b9ef532489671e3a654ad38102cc30d94b5a">removed
it</a>.</li>
</ul>
</li>
<li>I backported a <a href="https://bugs.debian.org/1027387">python-channels-redis
fix</a> to bookworm. I wasn’t the first
person to run into this, but I rediscovered it while working on debusine
and it was confusing enough that it seemed worth fixing in stable.</li>
<li>I fixed a <a href="https://bugs.debian.org/1064699">simple build failure in
storm</a>.</li>
<li>I dug into a very confusing cluster of celery build failures
(<a href="https://bugs.debian.org/1056232">1</a>,
<a href="https://bugs.debian.org/1058317">2</a>,
<a href="https://bugs.debian.org/1063345">3</a>), and tracked the hardest bit down
to a <a href="https://github.com/python/cpython/issues/115874">Python 3.12
regression</a>, now fixed
in unstable thanks to Stefano Rivera. Getting celery back into testing
is blocked on the <a href="https://wiki.debian.org/ReleaseGoals/64bit-time">64-bit <code>time_t</code>
transition</a> for now, but
once that’s out of the way it should flow smoothly again.</li>
</ul>
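<p>For anyone curious about the man-db confinement mentioned above, here’s a small sketch of how to poke at it on a Debian system. It isn’t taken from the advisory; it just uses the AppArmor tooling plus the <code>MAN_DISABLE_SECCOMP</code> variable that man-db documents for debugging.</p>
<div class="highlight"><pre><code># Check that the man-db AppArmor profiles are loaded and enforcing.
sudo aa-status | grep man

# Reading a page normally runs groff under the seccomp filter.
man ls

# If the sandbox ever gets in the way (e.g. with an unusual LD_PRELOAD),
# it can be turned off for a single invocation while debugging.
MAN_DISABLE_SECCOMP=1 man ls
</code></pre></div>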
<h1>Task management</h1>
<p>Now that I’m <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/going-freelance.html">freelancing</a>, I need to
actually track my time, which is something I’ve had the luxury of not having
to do before. That meant something of a rethink of the way I’ve been
keeping track of my to-do list. Up to now that was a combination of things
like the bug lists for the projects I’m working on at the moment, whatever
task tracking system Canonical was using at the time (Jira when I left),
and a giant flat text file in which I recorded logbook-style notes of what
I’d done each day plus a few extra notes at the bottom to remind myself of
particularly urgent tasks. I <em>could</em> have started manually adding times to
each logbook entry, but ugh, let’s not.</p>
<p>In general, I had the following goals (which were a bit reminiscent of my
<a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/new-address-book.html">address book</a>):</p>
<ul>
<li>free software throughout</li>
<li>storage under my control</li>
<li>ability to annotate tasks with URLs (especially bugs and merge requests)</li>
<li>lightweight time tracking (I’m <span class="caps">OK</span> with having to explicitly tell it when
I start and stop tasks)</li>
<li>ability to drive everything from the command line</li>
<li>decent filtering so I don’t have to look at my entire to-do list all the time</li>
<li>ability to easily generate billing information for multiple clients</li>
<li>optionally, integration with Android (mainly so I can tick off personal
tasks like “change bedroom lightbulb” or whatever that don’t involve
being near a computer)</li>
</ul>
<p>I didn’t do an elaborate evaluation of multiple options, because I’m not
trying to come up with the best possible solution for a client here. Also,
there are a bazillion to-do list trackers out there and if I tried to
evaluate them all I’d never do anything else. I just wanted something that
works well enough for me.</p>
<p>Since it <a href="https://fosstodon.org/@dondelelcaro/111682622624262162">came up on
Mastodon</a>: a bunch
of people swear by <a href="https://orgmode.org/">Org mode</a>, which I know can do at
least some of this sort of thing. However, I don’t use Emacs and don’t plan
to use Emacs. <a href="https://github.com/nvim-orgmode/orgmode">nvim-orgmode</a> does
have some support for time tracking, but when I’ve tried <code>vim</code>-based
versions of Org mode in the past I’ve found they haven’t really fitted my
brain very well.</p>
<h2>Taskwarrior and Timewarrior</h2>
<p>One of the other Freexian collaborators mentioned
<a href="https://taskwarrior.org/">Taskwarrior</a> and
<a href="https://timewarrior.net/">Timewarrior</a>, so I had a look at those.</p>
<p>The basic idea of Taskwarrior is that you have a <code>task</code> command that tracks
each task as a blob of <span class="caps">JSON</span> and provides subcommands to let you add, modify,
and remove tasks with a minimum of friction. <code>task add</code> adds a task, and
you can add metadata like <code>project:Personal</code> (I always make sure every task
has a project, for ease of filtering). Just running <code>task</code> shows you a task
list sorted by Taskwarrior’s idea of urgency, with an <span class="caps">ID</span> for each task, and
there are various other reports with different filtering and verbosity.
<code>task <id> annotate</code> lets you attach more information to a task. <code>task <id>
done</code> marks it as done. So far so good, so a redacted version of my to-do
list looks like this:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>task<span class="w"> </span>ls
ID<span class="w"> </span>A<span class="w"> </span>Project<span class="w"> </span>Tags<span class="w"> </span>Description
<span class="m">17</span><span class="w"> </span>Freexian<span class="w"> </span>Add<span class="w"> </span>Incus<span class="w"> </span>support<span class="w"> </span>to<span class="w"> </span>autopkgtest<span class="w"> </span><span class="o">[</span><span class="m">2</span><span class="o">]</span>
<span class="w"> </span><span class="m">7</span><span class="w"> </span>Columbiform<span class="w"> </span>Figure<span class="w"> </span>out<span class="w"> </span>Lloyds<span class="w"> </span>online<span class="w"> </span>banking<span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span>
<span class="w"> </span><span class="m">2</span><span class="w"> </span>Debian<span class="w"> </span>Fix<span class="w"> </span>troffcvt<span class="w"> </span><span class="k">for</span><span class="w"> </span>groff<span class="w"> </span><span class="m">1</span>.23.0<span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span>
<span class="m">11</span><span class="w"> </span>Personal<span class="w"> </span>Replace<span class="w"> </span>living<span class="w"> </span>room<span class="w"> </span>curtain<span class="w"> </span>rail
</code></pre></div>
<p>Once I got comfortable with it, this was already a big improvement. I
haven’t bothered to learn all the filtering gadgets yet, but it was easy
enough to see that I could do something like <code>task all project:Personal</code> and
it’d show me both pending and completed tasks in that project, and that all
the data was stored in <code>~/.task</code> - though I have to say that there are
enough reporting bells and whistles that I haven’t needed to poke around
manually. In combination with the regular backups that I do anyway (you do
too, right?), this gave me enough confidence to abandon my previous
text-file logbook approach.</p>
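<p>For the sake of a worked example, a typical round trip through the commands above looks something like this (the project, description, and <span class="caps">URL</span> are invented, and the IDs will differ):</p>
<div class="highlight"><pre><code># Add a task under a project and annotate it with a bug URL.
task add project:Debian 'Fix troffcvt for groff 1.23.0'
task 2 annotate https://bugs.debian.org/NNNNNN

# Look at what's pending in one project, then mark the task done.
task project:Debian ls
task 2 done

# Pending and completed tasks for a project, for the occasional audit.
task all project:Debian
</code></pre></div>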
<p>Next was time tracking. Timewarrior integrates with Taskwarrior, albeit in
<a href="https://timewarrior.net/docs/taskwarrior/">an only semi-packaged way</a>, and
it was easy enough to set that up. Now I can do:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>task<span class="w"> </span><span class="m">25</span><span class="w"> </span>start
Starting<span class="w"> </span>task<span class="w"> </span>00a9516f<span class="w"> </span><span class="s1">'Write blog post about task tracking'</span>.
Started<span class="w"> </span><span class="m">1</span><span class="w"> </span>task.
Note:<span class="w"> </span><span class="s1">'"Write blog post about task tracking"'</span><span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>new<span class="w"> </span>tag.
Tracking<span class="w"> </span>Columbiform<span class="w"> </span><span class="s2">"Write blog post about task tracking"</span>
<span class="w"> </span>Started<span class="w"> </span><span class="m">2024</span>-01-10T11:28:38
<span class="w"> </span>Current<span class="w"> </span><span class="m">38</span>
<span class="w"> </span>Total<span class="w"> </span><span class="m">0</span>:00:00
You<span class="w"> </span>have<span class="w"> </span>more<span class="w"> </span>urgent<span class="w"> </span>tasks.
Project<span class="w"> </span><span class="s1">'Columbiform'</span><span class="w"> </span>is<span class="w"> </span><span class="m">25</span>%<span class="w"> </span><span class="nb">complete</span><span class="w"> </span><span class="o">(</span><span class="m">3</span><span class="w"> </span>of<span class="w"> </span><span class="m">4</span><span class="w"> </span>tasks<span class="w"> </span>remaining<span class="o">)</span>.
</code></pre></div>
<p>When I stop work on something, I do <code>task active</code> to find the <span class="caps">ID</span>, then <code>task
<id> stop</code>. Timewarrior does the tedious stopwatch business for me, and I
can manually enter times if I forget to start/stop a task. Then the really
useful bit: I can do something like <code>timew summary :month <name-of-client></code>
and it tells me how much to bill that client for this month. Perfect.</p>
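<p>The time-tracking side of that flow, again as an illustrative sketch rather than a transcript (the client name is one of mine from above; the interval is invented):</p>
<div class="highlight"><pre><code># See what's running, stop it, and check where the week went.
task active
task 25 stop
timew summary :week

# Per-client total for the month, for billing.
timew summary :month Columbiform

# If I forgot to start the clock, backfill an interval by hand.
timew track 10:00 - 11:30 Columbiform 'Write blog post about task tracking'
</code></pre></div>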
<p>I also started using <a href="https://github.com/vit-project/vit"><span class="caps">VIT</span></a> to simplify
the day-to-day flow a little, which means I’m normally just using one or two
keystrokes rather than typing longer commands. That isn’t really necessary
from my point of view, but it does save some time.</p>
<h2>Android integration</h2>
<p>I left Android integration for a bit later since it wasn’t essential. When
I got round to it, I have to say that it felt a bit clumsy, but it did
eventually work.</p>
<p>The first step was to <a href="https://gothenburgbitfactory.github.io/taskserver-setup/">set up a
taskserver</a>. Most
of the setup procedure was <span class="caps">OK</span>, but I wanted to use Let’s Encrypt to minimize
the amount of messing around with CAs I had to do. Getting this to work
involved hitting things with sticks a bit, and there’s still a local <span class="caps">CA</span>
involved for client certificates. What I ended up with was a <code>certbot</code>
setup with the <code>webroot</code> authenticator and a custom deploy hook as follows
(with <code>cert_name</code> replaced by a <span class="caps">DNS</span> name in my house domain):</p>
<div class="highlight"><pre><span></span><code><span class="ch">#! /bin/sh</span>
<span class="nb">set</span><span class="w"> </span>-eu
<span class="nv">cert_name</span><span class="o">=</span>taskd.example.org
<span class="nv">found</span><span class="o">=</span><span class="nb">false</span>
<span class="k">for</span><span class="w"> </span>domain<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nv">$RENEWED_DOMAINS</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s2">"</span><span class="nv">$domain</span><span class="s2">"</span><span class="w"> </span><span class="k">in</span>
<span class="w"> </span><span class="nv">$cert_name</span><span class="o">)</span>
<span class="w"> </span><span class="nv">found</span><span class="o">=</span>:
<span class="w"> </span><span class="p">;;</span>
<span class="w"> </span><span class="k">esac</span>
<span class="k">done</span>
<span class="nv">$found</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">exit</span><span class="w"> </span><span class="m">0</span>
install<span class="w"> </span>-m<span class="w"> </span><span class="m">644</span><span class="w"> </span><span class="s2">"/etc/letsencrypt/live/</span><span class="nv">$cert_name</span><span class="s2">/fullchain.pem"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/var/lib/taskd/pki/fullchain.pem
install<span class="w"> </span>-m<span class="w"> </span><span class="m">640</span><span class="w"> </span>-g<span class="w"> </span>Debian-taskd<span class="w"> </span><span class="s2">"/etc/letsencrypt/live/</span><span class="nv">$cert_name</span><span class="s2">/privkey.pem"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/var/lib/taskd/pki/privkey.pem
systemctl<span class="w"> </span>restart<span class="w"> </span>taskd.service
</code></pre></div>
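<p>For completeness, this is roughly how such a hook gets wired up; the webroot path is a placeholder and, as above, <code>taskd.example.org</code> stands in for the real name. The key pieces are the <code>webroot</code> authenticator and registering the deploy hook so it runs on every renewal.</p>
<div class="highlight"><pre><code># One-off issuance; certbot remembers the deploy hook for future renewals.
certbot certonly --webroot -w /var/www/html -d taskd.example.org \
    --deploy-hook /etc/letsencrypt/renewal-hooks/deploy/taskd

# Dry-run a renewal to make sure the webroot challenge still succeeds.
certbot renew --dry-run
</code></pre></div>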
<p>I could then set this in <code>/etc/taskd/config</code> (<code>server.crl.pem</code> and
<code>ca.cert.pem</code> were generated using the documented taskserver setup procedure):</p>
<div class="highlight"><pre><span></span><code><span class="n">server</span><span class="o">.</span><span class="n">key</span><span class="o">=/</span><span class="k">var</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">taskd</span><span class="o">/</span><span class="n">pki</span><span class="o">/</span><span class="n">privkey</span><span class="o">.</span><span class="n">pem</span>
<span class="n">server</span><span class="o">.</span><span class="n">cert</span><span class="o">=/</span><span class="k">var</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">taskd</span><span class="o">/</span><span class="n">pki</span><span class="o">/</span><span class="n">fullchain</span><span class="o">.</span><span class="n">pem</span>
<span class="n">server</span><span class="o">.</span><span class="n">crl</span><span class="o">=/</span><span class="k">var</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">taskd</span><span class="o">/</span><span class="n">pki</span><span class="o">/</span><span class="n">server</span><span class="o">.</span><span class="n">crl</span><span class="o">.</span><span class="n">pem</span>
<span class="n">ca</span><span class="o">.</span><span class="n">cert</span><span class="o">=/</span><span class="k">var</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">taskd</span><span class="o">/</span><span class="n">pki</span><span class="o">/</span><span class="n">ca</span><span class="o">.</span><span class="n">cert</span><span class="o">.</span><span class="n">pem</span>
</code></pre></div>
<p>Then I could set <code>taskd.ca</code> on my laptop to
<code>/usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt</code> and otherwise follow
the client setup instructions, run <code>task sync init</code> to get things started,
and then <code>task sync</code> every so often to sync changes between my laptop and
the taskserver.</p>
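<p>Spelling the client side out a little more (all values are placeholders; the credentials string is the one generated by <code>taskd add user</code> on the server):</p>
<div class="highlight"><pre><code>task config taskd.server taskd.example.org:53589
task config taskd.ca /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
task config taskd.certificate ~/.task/colin.cert.pem
task config taskd.key ~/.task/colin.key.pem
task config taskd.credentials 'Columbiform/Colin Watson/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
task sync init
</code></pre></div>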
<p>I used <a href="https://play.google.com/store/apps/details?id=com.ccextractor.taskwarriorflutter">TaskWarrior
Mobile</a>
as the client. I have to say I wouldn’t want to use that client as my
primary task tracking interface: the setup procedure is clunky even beyond
the necessity of copying a client certificate around, it expects you to give
it a <code>.taskrc</code> rather than having a proper settings interface for that, and
it only seems to let you add a task if you specify a due date for it. It
also lacks Timewarrior integration, so I can only really use it when I don’t
care about time tracking, e.g. personal tasks. But that’s really all I
need, so it meets my minimum requirements.</p>
<h2>Next?</h2>
<p>Considering this is literally the first thing I tried, I have to say I’m
pretty happy with it. There are a bunch of optional extras I haven’t tried
yet, but in general it kind of has the <code>vim</code> nature for me: if I need
something it’s very likely to exist or easy enough to build, but the
features I don’t use don’t get in my way.</p>
<p>I wouldn’t recommend any of this to somebody who didn’t already spend most
of their time in a terminal - but I do. I’m glad people have gone to all
the effort to build this so I didn’t have to.</p>
<h1>OpenUK New Year’s Honours</h1>
<p>Apparently I got an <a href="https://openuk.uk/2024-honours-list/">honour</a> from OpenUK.</p>
<p>There are a bunch of people I know on that list. Chris Lamb and Mark Brown
are familiar names from <a href="https://www.debian.org/">Debian</a>. Colin King and
Jonathan Riddell are people I know from past work in
<a href="https://ubuntu.com/">Ubuntu</a>. I’ve admired David MacIver’s work on
<a href="https://hypothesis.works/">Hypothesis</a> and Richard Hughes’ work on
<a href="https://fwupd.org/">firmware updates</a> from afar. And there are a bunch of
other excellent projects represented there:
<a href="https://www.openstreetmap.org/">OpenStreetMap</a>,
<a href="https://www.textualize.io/">Textualize</a>, and my alma mater of
<a href="https://www.cam.ac.uk/">Cambridge</a> to name but a few.</p>
<p>My friend Stuart Langridge
<a href="https://www.kryogenix.org/days/2021/01/10/openuk-honours/">wrote</a> about
being on a similar list a few years ago, and I can’t do much better than to
echo it: in particular he wrote about the way the open source development
community is often at best unwelcoming to people who don’t look like Stuart
and I do. I can’t tell a whole lot about demographic distribution just by
looking at a list of names, but while these honours still seem to be skewed
somewhat male, I’m fairly sure they’re doing a lot better in terms of gender
balance than my “home” project of Debian is, for one. I hope this is a sign
of improvement for the future, and I’ll do what I can to pay it forward.</p>
<h1>Going freelance</h1>
<p>I’ve mentioned this in a
<a href="https://mastodon.ie/@cjwatson/111348289616136892">couple</a> of
<a href="https://www.linkedin.com/posts/colin-watson-79535025b_columbiform-activity-7138117110676779008-ooSD">other</a>
places, but I realized I never got round to posting about it on my own blog
rather than on other people’s services. How remiss of me.</p>
<p>Anyway: after much soul-searching, I decided a few months ago that it was
time for me to move on from <a href="https://canonical.com/">Canonical</a> and the
<a href="https://launchpad.net/">Launchpad</a> team there. Nearly 20 years is a long
time to spend at any company, and although there are a bunch of people I’ll
miss, Launchpad is in a reasonable state where I can let other people have a turn.</p>
<p>I’m now in business for myself as a freelance developer! My new company is
<a href="https://www.columbiform.co.uk/">Columbiform</a>, and I’m focusing on Debian
packaging and custom Python development. My
<a href="https://www.columbiform.co.uk/services.html">services</a> page has some
self-promotion on the sorts of things I can do.</p>
<p>My first gig, and the one that made it viable to make this jump, is at
<a href="https://www.freexian.com/">Freexian</a> where I’m helping with an exciting
infrastructure project that we hope will start making Debian developers’
lives easier in the near future. This is likely to take up most of my time
at least through to the end of 2024, but I may have some spare cycles.
<a href="https://www.columbiform.co.uk/contact.html">Drop me a line</a> if you have
something where you think I could be a good fit, and we can have a talk
about it.</p>
<h1>Reproducible man-db databases</h1>
<p>I’ve released man-db 2.11.0
(<a href="https://lists.nongnu.org/archive/html/man-db-announce/2022-10/msg00000.html">announcement</a>,
<a href="https://gitlab.com/cjwatson/man-db/-/blob/2.11.0/NEWS.md"><span class="caps">NEWS</span></a>), and
uploaded it to Debian unstable.</p>
<p>The biggest chunk of work here was fixing some extremely long-standing
issues with how the database is built. Despite being in the package name,
man-db’s database is much less important than it used to be: most uses of
<code>man(1)</code> haven’t required it in a long time, and both hardware and
<a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/man-db-K.html">software</a>
<a href="https://lists.nongnu.org/archive/html/man-db-announce/2022-02/msg00000.html">improvements</a>
mean that even some searches can be done by brute force without needing
prior indexing. However, the database is still needed for the <code>whatis(1)</code>
and <code>apropos(1)</code> commands.</p>
<p>The database has a simple format - no relational structure here, it’s just a
simple key-value database using old-fashioned <span class="caps">DBM</span>-like interfaces and
composing a few fields to form values - but there are a number of subtleties
involved. The issues tend to amount to this: what does a manual page name
mean? At first glance it might seem simple, because you have file names
that look something like <code>/usr/share/man/man1/ls.1.gz</code> and that’s obviously
<code>ls(1)</code>. Some pages are symlinks to other pages (which we track separately
because it makes it easier to figure out which entries to update when the
contents of the file system change), and sometimes multiple pages are even
hard links to the same file.</p>
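<p>man-db ships a couple of small tools that make it easy to see what this looks like in practice. A quick sketch (the exact output format isn’t something to rely on, but it shows the key-value flavour of the database):</p>
<div class="highlight"><pre><code># Rebuild the database for all configured manual page hierarchies.
sudo mandb

# Dump the index database in human-readable form and look at one key.
accessdb | grep '^ls '

# The commands that still need the database.
whatis ls
apropos 'list directory contents'
</code></pre></div>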
<p>The real complications come with “whatis references”. Pages can list a
bunch of names in their <code>NAME</code> section, and the historical expectation is
that it should be possible to use those names as arguments to <code>man(1)</code> even
if they don’t also appear in the file system (although Debian policy has
<a href="https://www.debian.org/doc/debian-policy/ch-docs.html#manual-pages">deprecated relying on
this</a>
for some time). Not only does that mean that <code>man(1)</code> sometimes needs to
consult the database, but it also means that the database is inherently more
complicated, since a page might list something in its <code>NAME</code> section that
conflicts with an actual file name in the file system, and now you need a
priority system to resolve ambiguities. There are some other possible
causes of ambiguity as well.</p>
<p>The people working on <a href="https://reproducible-builds.org/">reproducible
builds</a> in Debian branched out to the
related challenge of reproducible installations some time ago: can you take
a collection of packages, bootstrap a file system image from them, and
reproduce that exact same image somewhere else? This is useful for the same
sorts of reasons that reproducible builds are useful: it lets you verify
that an image is built from the components it’s supposed to be built from,
and doesn’t contain any other skulduggery by accident or design. One of the
people working on this <a href="https://bugs.debian.org/1010957">noticed</a> that
man-db’s database files were an obstacle to that: in particular, the exact
contents of the database seemed to depend on the order in which files were
scanned when building it. The reporter proposed solving this by processing
files in sorted order, but I wasn’t keen on that approach: firstly because
it would mean we could no longer process files in an order that makes it
more efficient to read them all from disk (still valuable on rotational
disks), but mostly because the differences seemed to point to other bugs.</p>
<p>Having understood this, there then followed several late nights of very
fiddly work on the details of how the database is maintained. None of this
was conceptually difficult: it mainly amounted to ensuring that we maintain
a consistent <a href="https://en.wikipedia.org/wiki/Well-order">well-order</a> for
different entries that we might want to insert for a given database key, and
that we consider the same names for insertion regardless of the order in
which we encounter files. As usual, the tricky bit is making sure that we
have the right data structures to support this. man-db is written in C
which is not very well-supplied with built-in data structures, and
originally much of the code was written in a style that tried to minimize
memory allocations; this came at the cost of ownership and lifetime often
being rather unclear, and it was often difficult to make changes without
causing leaks or double-frees. Over the years I’ve been gradually
introducing better encapsulation to make things easier to follow, and I had
to do another round of that here. There were also some problems with
caching being done at slightly the wrong layer: we need to make use of a
“trace” of the chain of links followed to resolve a page to its ultimate
source file, but we were incorrectly caching that trace and reusing it for
any link to the same file, with incorrect results in many cases.</p>
<p>Oh, and after doing all that I found that the on-disk representation of a
<span class="caps">GDBM</span> database is insertion-order-dependent, so I ended up having to manually
reorganize the database at the end by reading it all in and writing it all
back out in sorted order, which feels really weird to me coming from
spending most of my time with PostgreSQL these days. Fortunately the
database is small so this takes negligible time.</p>
<p>None of this is particularly glamorous work, but it paid off:</p>
<div class="highlight"><pre><span></span><code><span class="gp"># </span><span class="nb">export</span><span class="w"> </span><span class="nv">SOURCE_DATE_EPOCH</span><span class="o">=</span><span class="s2">"</span><span class="k">$(</span>date<span class="w"> </span>+%s<span class="k">)</span><span class="s2">"</span>
<span class="gp"># </span>mkdir<span class="w"> </span>emptydir<span class="w"> </span>disorder
<span class="gp"># </span>disorderfs<span class="w"> </span>--multi-user<span class="o">=</span>yes<span class="w"> </span>--shuffle-dirents<span class="o">=</span>yes<span class="w"> </span>--reverse-dirents<span class="o">=</span>no<span class="w"> </span>emptydir<span class="w"> </span>disorder
<span class="gp"># </span><span class="nb">export</span><span class="w"> </span><span class="nv">TMPDIR</span><span class="o">=</span><span class="s2">"</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">/disorder"</span>
<span class="gp"># </span>mmdebstrap<span class="w"> </span>--variant<span class="o">=</span>standard<span class="w"> </span>--hook-dir<span class="o">=</span>/usr/share/mmdebstrap/hooks/merged-usr<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>unstable<span class="w"> </span>out1.tar
<span class="gp"># </span>mmdebstrap<span class="w"> </span>--variant<span class="o">=</span>standard<span class="w"> </span>--hook-dir<span class="o">=</span>/usr/share/mmdebstrap/hooks/merged-usr<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>unstable<span class="w"> </span>out2.tar
<span class="gp"># </span>cmp<span class="w"> </span>out1.tar<span class="w"> </span>out2.tar
<span class="gp"># </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="go">0</span>
</code></pre></div>
<h1>Launchpad now supports SSH Ed25519 keys and RSA SHA-2 signatures</h1>
<p>As of 2022-02-16, Launchpad supports a couple of features on its <span class="caps">SSH</span>
endpoints (<code>git.launchpad.net</code>, <code>bazaar.launchpad.net</code>, <code>ppa.launchpad.net</code>,
and <code>upload.ubuntu.com</code>) that it previously didn’t: <a href="https://bugs.launchpad.net/bugs/907675">Ed25519 public
keys</a> (a well-regarded format,
supported by OpenSSH since 6.5 in 2014) and <a href="https://bugs.launchpad.net/bugs/1933722">signatures with existing <span class="caps">RSA</span>
public keys using <span class="caps">SHA</span>-2 rather than
<span class="caps">SHA</span>-1</a> (supported by OpenSSH since
7.2 in 2016).</p>
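<p>Not from the original announcement, but for anyone wanting to try this out, the client side looks roughly like this (you need OpenSSH 6.5 or newer for Ed25519 and 7.2 or newer for <span class="caps">RSA</span> <span class="caps">SHA</span>-2 signatures):</p>
<div class="highlight"><pre><code># Generate an Ed25519 key, then register the .pub half with Launchpad.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_launchpad

# Connecting with -v shows the negotiated key exchange, host key, and
# signature algorithms in the debug output.
ssh -v git.launchpad.net
</code></pre></div>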
<p>I’m hesitant to call these features “new”, since they’ve been around for a
long time elsewhere, and people might quite reasonably ask why it’s taken us
so long. The problem has always been that Launchpad can’t really use a
normal <span class="caps">SSH</span> server such as OpenSSH because it needs features that aren’t
practical to implement that way, such as virtual filesystems and dynamic
user key authorization against the Launchpad database. Instead, we use
<a href="https://twistedmatrix.com/trac/wiki/TwistedConch">Twisted Conch</a>, which is
a very extensible Python <span class="caps">SSH</span> implementation that has generally served us
well. The downside is that, because it’s an independent implementation and
one that occupies a relatively small niche, it often lags behind in terms of
newer protocol features.</p>
<p>Catching up to this point has been something we’ve been working on for
around five years, although it’s taken a painfully long time for a variety
of reasons which I thought some people might find interesting to go into, at
least people who have the patience for details of the <span class="caps">SSH</span> protocol. Many of
the delays were my own responsibility, although realistically we probably
couldn’t have added Ed25519 support before OpenSSL/<code>cryptography</code> work that
landed in 2019.</p>
<ul>
<li>In 2015, we did some similar work on <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/ssh-sha-2-support-in-twisted.html"><span class="caps">SHA</span>-2 key exchange and <span class="caps">MAC</span>
algorithms</a>.</li>
<li>In 2016, various other contributors were working on <span class="caps">ECDSA</span> and Ed25519
support (e.g. <a href="https://github.com/twisted/twisted/pull/533">#533</a> and
<a href="https://github.com/twisted/twisted/pull/644">#644</a>). At the time, it
seemed best to keep an eye on this but mainly leave them to it. I’m very
glad that some people worked on this before me - studying their PRs
helped a lot, even parts that didn’t end up being merged directly.</li>
<li>In 2017, it became clear that this was likely to need some more
attention, but before we could do anything else we had to revamp
Launchpad’s build system to use <a href="https://pip.pypa.io/en/stable/">pip</a>
rather than <a href="https://www.buildout.org/en/latest/">buildout</a>, since
without that we couldn’t upgrade to any newer versions of Twisted. That
proved to be a substantial piece of yak-shaving: first we had to upgrade
Launchpad off Ubuntu 12.04, and then the actual <a href="https://code.launchpad.net/~cjwatson/launchpad/virtualenv-pip/+merge/331388">build system
rewrite</a>
was a complicated project of its own.</li>
<li>In 2018, I fixed an <a href="https://bugs.launchpad.net/bugs/830679">authentication
hang</a> that happened if a client
even tried to offer <span class="caps">ECDSA</span> or Ed25519 public keys to Launchpad, and we got
<span class="caps">ECDSA</span> support fully working in Launchpad. We also discovered as a result
of automated interoperability tests run as part of the Debian OpenSSH
packaging that Twisted needed to gain support for the new
<code>openssh-key-v1</code> private key format, which became a prerequisite for
Ed25519 support since OpenSSH only ever writes those keys in the new
format, and so I <a href="https://github.com/twisted/twisted/pull/1193">fixed
that</a>.</li>
<li>In 2019, Python’s <a href="https://pypi.org/project/cryptography/">cryptography</a>
package gained support for X25519 (the Diffie-Hellman key exchange
function based on <a href="https://en.wikipedia.org/wiki/Curve25519">Curve25519</a>)
and Ed25519, and it became somewhat practical to add support to Twisted
on top of that. However, it required OpenSSL 1.1.1b, and it seemed
unlikely that we would be in a position to upgrade all the relevant bits
of Launchpad’s infrastructure to use that in the near term. I at least
managed to add <a href="https://github.com/twisted/twisted/pull/1202">curve25519-sha256 key exchange
support</a> to Twisted based
on some <a href="https://github.com/twisted/twisted/pull/644">previous work</a> by
another contributor, and I prepared <a href="https://github.com/twisted/twisted/pull/1210">support for Ed25519
keys</a> in Twisted even
though I knew we weren’t going to be able to use it yet.</li>
<li>2020 was … well, everyone knows what 2020 was like, plus we had a new
baby. I did some experimentation in spare moments, but I didn’t really
have the focus to be able to move this sort of complex problem forward.</li>
<li>In 2021, I bit the bullet and started seriously working on <a href="https://github.com/twisted/twisted/pull/1607">fallback
mechanisms to allow us to use
Ed25519</a> even on systems
lacking a sufficient version of OpenSSL, though I found myself blocked on
figuring out type-checking issues following a code review. It then
became clear on the release of <a href="https://www.openssh.com/releasenotes.html#8.8p1">OpenSSH
8.8</a> that we were going
to have to deal with <span class="caps">RSA</span> <span class="caps">SHA</span>-2 signatures as well, since otherwise
OpenSSH in Ubuntu soon wouldn’t be able to authenticate to Launchpad by
default (which also caused me to delay <a href="https://bugs.debian.org/996391">uploading 8.8 to Debian
unstable</a> for a while). To deal with
that, I first had to add <a href="https://github.com/twisted/twisted/pull/1666"><span class="caps">SSH</span> extension
negotiation</a> to Twisted.</li>
<li>Finally, in 2022, I added <a href="https://github.com/twisted/twisted/pull/1692"><span class="caps">RSA</span> <span class="caps">SHA</span>-2 signature
support</a> to Twisted,
finally unblocked myself on the type-checking issue with the Ed25519
fallback mechanism, quickly put together a <a href="https://git.launchpad.net/~launchpad/twisted/+git/twisted/commit/?id=536a8934be619044fc95f51822139b96edea9dcc">similar fallback mechanism
for
X25519</a>,
backported the whole mess to Twisted 20.3.0 since we currently can’t use
anything newer due to the somewhat old version of Python 3 that we’re
running, promptly ran into and fixed a
<a href="https://github.com/twisted/twisted/pull/1696">regression</a> that affected
<span class="caps">SFTP</span> uploads to <code>ppa.launchpad.net</code> and <code>upload.ubuntu.com</code>, and finally
added Ed25519 as a permissible key type in Launchpad’s authserver.</li>
</ul>
<p>Phew! Thanks to everyone who works on Twisted, <code>cryptography</code>, and OpenSSL
- it’s been really useful to be able to build on solid lower-level
cryptographic primitives - and to those who helped with code review.</p>
<h1>Launchpad now runs on Python 3!</h1>
<p>After a <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/lp-python3-progress.html">very long porting journey</a>,
<a href="https://launchpad.net/">Launchpad</a> is finally running on Python 3 across
all of our systems.</p>
<p>I wanted to take a bit of time to reflect on why my emotional responses to
this port differ so much from those of some others who’ve done large ports,
such as the <a href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/">Mercurial
maintainers</a>.
It’s hard to deny that we’ve had to burn a lot of time on this, which I’m
sure has had an opportunity cost, and from one point of view it’s
essentially running to stand still: there is no single compelling feature
that we get solely by porting to Python 3, although it’s clearly a
prerequisite for tidying up old compatibility code and being able to use
modern language facilities in the future. And yet, on the whole, I found
this a rewarding project and enjoyed doing it.</p>
<p>Some of this may be because by inclination I’m a maintenance programmer and
actually enjoy this sort of thing. My default view tends to be that
software version upgrades may be a pain but it’s much better to get that
pain over with as soon as you can rather than trying to hold back the tide;
you can certainly get involved and try to shape where things end up, but
rightly or wrongly I can’t think of many cases when a righteously indignant
user base managed to arrange for the old version to be maintained in
perpetuity so that they never had to deal with the new thing (<span class="caps">OK</span>, maybe Perl
5 counts here).</p>
<p>I think a more compelling difference between Launchpad and Mercurial,
though, may be that very few other people really had a vested interest in
what Python version Launchpad happened to be running, because it’s all
server-side code (aside from some client libraries such as
<a href="https://pypi.org/project/launchpadlib"><code>launchpadlib</code></a>, which were ported
years ago). As such, we weren’t trying to do this with the internet having
Strong Opinions at us. We were doing this because it was obviously the only
long-term-maintainable path forward, and in more recent times because some
of our library dependencies were starting to drop support for Python 2 and
so it was obviously going to become a practical problem for us sooner or
later; but if we’d just stayed on Python 2 forever then fundamentally hardly
anyone else would really have cared directly, only maybe about some indirect
consequences of that. I don’t follow Mercurial development so I may be
entirely off-base, but if other people were yelling at me about how late my
project was to finish its port, that <em>in itself</em> would make me feel more
negatively about the project even if I thought it was a good idea. Having
most of the pressure come from ourselves rather than from outside meant that
wasn’t an issue for us.</p>
<p>I’m somewhat inclined to think of the process as an extreme version of
paying down technical debt. Moving from Python 2.7 to 3.5, as we just did,
means skipping over multiple language versions in one go, and if similar
changes had been made more gradually it would probably have felt a lot more
like the typical dependency update treadmill. I appreciate why not everyone
might want to think of it this way: maybe this is just my own rationalization.</p>
<h2>Reflections on porting to Python 3</h2>
<p>I’m not going to defend the Python 3 migration process; it was pretty rough
in a lot of ways. Nor am I going to spend much effort relitigating it here,
as it’s already been done to death elsewhere, and as I understand it the
core Python developers have got the message loud and clear by now. At a
bare minimum, a lot of valuable time was lost early in Python 3’s lifetime
hanging on to flag-day-type porting strategies that were impractical for
large projects, when it should have been providing for “bilingual”
strategies (code that runs in both Python 2 and 3 for a transitional period)
which is where most libraries and most large migrations ended up in
practice. For instance, the early advice to library maintainers to maintain
two parallel versions or perhaps translate dynamically with <code>2to3</code> was
entirely impractical in most non-trivial cases and wasn’t what most people
ended up doing, and yet the idea that <code>2to3</code> is all you need still floats
around Stack Overflow and the like as a result. (These days, I would
probably point people towards something more like <a href="https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/">Eevee’s porting
<span class="caps">FAQ</span></a>
as somewhere to start.)</p>
<p>There are various fairly straightforward things that people often suggest
could have been done to smooth the path, and I largely agree: not removing
the <code>u''</code> string prefix only to put it back in 3.3, fewer gratuitous
compatibility breaks in the name of tidiness, and so on. But if I had a
time machine, the number one thing I would ask to have been done differently
would be introducing type annotations in Python 2 before Python 3 branched
off. It’s true that it’s <a href="https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code">technically
possible</a>
to do type annotations in Python 2, but the fact that it’s a different
syntax that would have to be fixed later is offputting, and in practice it
wasn’t widely used in Python 2 code. To make a significant difference to
the ease of porting, annotations would need to have been introduced early
enough that lots of Python 2 library code used them so that porting code
didn’t have to be quite so much of an exercise of manually figuring out the
exact nature of string types from context.</p>
<p>Launchpad is a complex piece of software that interacts with multiple
domains: for example, it deals with a database, <span class="caps">HTTP</span>, web page rendering,
Debian-format archive publishing, and multiple revision control systems, and
there’s often overlap between domains. Each of these tends to imply
different kinds of string handling. Web page rendering is normally done
mainly in Unicode, converting to bytes as late as possible; revision control
systems normally want to spend most of their time working with bytes,
although the exact details vary; <span class="caps">HTTP</span> is of course bytes on the wire, but
Python’s <span class="caps">WSGI</span> interface has some <a href="https://www.python.org/dev/peps/pep-3333/#a-note-on-string-types">string type
subtleties</a>.
In practice I found myself thinking about at least four string-like “types”
(that is, things that in a language with a stricter type system I might well
want to define as distinct types and restrict conversion between them):
bytes, text, “ordinary” native strings (<code>str</code> in either language, encoded to
<span class="caps">UTF</span>-8 in Python 2), and native strings with <span class="caps">WSGI</span>’s encoding rules. Some of
these are emergent properties of writing in the intersection of Python 2 and
3, which is effectively a specialized language of its own without coherent
official documentation whose users must intuit its behaviour by comparing
multiple sources of information, or by referring to unofficial porting
guides: not a very satisfactory situation. Fortunately much of the
complexity collapses once it becomes possible to write solely in Python 3.</p>
<p>Some of the difficulties we ran into are not ones that are typically thought
of as Python 2-to-3 porting issues, because they were changed later in
Python 3’s development process. For instance, the <code>email</code> module was
substantially improved in around the 3.2/3.3 timeframe to handle Python 3’s
bytes/text model more correctly, and since Launchpad sends quite a few
different kinds of email messages and has some quite picky tests for exactly
what it emits, this entailed a lot of work in our email sending code and in
our test suite to account for that. (It took me a while to work out whether
we should be treating raw email messages as bytes or as text; bytes turned
out to work best.) 3.4 made some tweaks to the implementation of
quoted-printable encoding that broke a number of our tests in ways that took
some effort to fix, because the tests needed to work on both 2.7 and 3.5.
The list goes on. I got quite proficient at digging through Python’s git
history to figure out when and why some particular bit of behaviour had changed.</p>
<p>One of the thorniest problems was parsing <span class="caps">HTTP</span> form data. We mainly rely on
<a href="https://pypi.org/project/zope.publisher"><code>zope.publisher</code></a> for this, which
in turn relied on
<a href="https://docs.python.org/3/library/cgi.html"><code>cgi.FieldStorage</code></a>; but
<code>cgi.FieldStorage</code> is <a href="https://bugs.python.org/issue27777">badly broken in some
situations</a> on Python 3. Even if that
bug were fixed in a more recent version of Python, we can’t easily use
anything newer than 3.5 for the first stage of our port due to the version
of the base <span class="caps">OS</span> we’re currently running, so it wouldn’t help much. In the
end I fixed some minor issues in the
<a href="https://pypi.org/project/multipart"><code>multipart</code></a> module (and was kindly
given co-maintenance of it) and <a href="https://github.com/zopefoundation/zope.publisher/pull/55">converted <code>zope.publisher</code> to use
it</a>. Although
this took a while to sort out, it seems to have gone very well.</p>
<p>A couple of other interesting late-arriving issues were around
<a href="https://docs.python.org/3/library/pickle.html"><code>pickle</code></a>. For most things
we normally prefer safer formats such as <span class="caps">JSON</span>, but there are a few cases
where we use pickle, particularly for our session databases. One of my
colleagues pointed out that I needed to remember to tell <code>pickle</code> to <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398534">stick
to protocol
2</a>,
so that we’d be able to switch back and forward between Python 2 and 3 for a
while; quite right, and we later ran into a similar problem with
<a href="https://docs.python.org/3/library/marshal.html"><code>marshal</code></a> too. A more
surprising problem was that <code>datetime.datetime</code> objects pickled on Python 2
<a href="https://bugs.python.org/issue22005">require special care</a> when unpickling
on Python 3; rather than the approach that ended up being implemented and
<a href="https://docs.python.org/3/library/pickle.html#pickle.Unpickler">documented</a>
for Python 3.6, though, I preferred a <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/399133">custom
unpickler</a>,
both so that things would work on Python 3.5 and so that I wouldn’t have to
risk affecting the decoding of other pickled strings in the session database.</p>
<h2>General lessons</h2>
<p>Writing this over a year after Python 2’s end-of-life date, and certainly
nowhere near the leading edge of Python 3 porting work, it’s perhaps more
useful to look at this in terms of the lessons it has for other large
technical debt projects.</p>
<p>I mentioned in my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/lp-python3-progress.html">previous article</a> that
I used the approach of an enormous and frequently-rebased git branch as a
working area for the port, committing often and sometimes combining and
extracting commits for review once they seemed to be ready. A port of this
scale would have been entirely intractable without a tool of similar power
to <code>git rebase</code>, so I’m very glad that we finished migrating to git in 2019.
I relied on this right up to the end of the port, and it also allowed for
quick assessments of how much more there was to land. <a href="https://git-scm.com/docs/git-worktree">git
worktree</a> was also helpful, in that I
could easily maintain working trees built for each of Python 2 and 3 for comparison.</p>
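<p>A minimal sketch of that working-tree arrangement, with invented branch and directory names:</p>
<div class="highlight"><pre><code># Keep the long-running port branch rebased onto current master.
git checkout python3-port
git rebase master

# Add a second working tree from the same repository, so Python 2 and
# Python 3 trees can be built and compared side by side.
git worktree add ../launchpad-py2 master
</code></pre></div>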
<p>As is usual for most multi-developer projects, all changes to Launchpad need
to go through code review, although we sometimes make exceptions for very
simple and obvious changes that can be self-reviewed. Since I knew from the
outset that this was going to generate a lot of changes for review, I
therefore structured my work to try to make it as easy as
possible for my colleagues to review it. This generally involved keeping
most changes to a somewhat manageable size of 800 lines or less (although
this wasn’t always possible), and arranging commits mainly according to the
kind of change they made rather than their location. For example, when I
needed to fix issues with <code>/</code> in Python 3 being true division rather than
floor division, I did so in <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/396326">one
commit</a>
across the various places where it mattered and took care not to mix it with
other unrelated changes. This is good practice for nearly any kind of
development, but it was especially important here since it allowed reviewers
to consider a clear explanation of what I was doing in the commit message
and then skim-read the rest of it much more quickly.</p>
<p>It was vital to keep the codebase in a working state at all times, and
deploy to production reasonably often: this way if something went wrong the
amount of code we had to debug to figure out what had happened was always
tractable. (Although I can’t seem to find it now to link to it, I saw an
account a while back of a company that had taken a flag-day approach instead
with a large codebase. It seemed to work for them, but I’m certain we
couldn’t have made it work for Launchpad.)</p>
<p>I can’t speak too highly of Launchpad’s test suite, much of which originated
before my time. Without its extensive coverage of all sorts of
interesting edge cases at both the unit and functional level, and a
corresponding culture of maintaining that test suite well when making new
changes, it would have been impossible to be anything like as confident of
the port as we were.</p>
<p>As part of the porting work, we split out a couple of substantial chunks of
the Launchpad codebase that could easily be decoupled from the core: its
<a href="https://launchpad.net/lp-mailman">Mailman integration</a> and its <a href="https://launchpad.net/lp-codeimport">code import
worker</a>. Both of these had substantial
dependencies with complex requirements for porting to Python 3, and
arranging to be able to do these separately on their own schedule was
absolutely worth it. Like disentangling balls of wool, any opportunity you
can take to make things less tightly-coupled is probably going to make it
easier to disentangle the rest. (I can see a tractable way forward to
porting the code import worker, so we may well get that done soon. Our
Mailman integration will need to be rewritten, though, since it currently
depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)</p>
<h2>Python lessons</h2>
<p>Our <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/storm-py3.html">database layer</a> was already in pretty good
shape for a port, since at least the modern bits of its table modelling
interface were already strict about using Unicode for text columns. If you
have any kind of pervasive low-level framework like this, then making it be
pedantic at you in advance of a Python 3 port will probably incur much less
swearing in the long run, as you won’t be trying to deal with quite so many
bytes/text issues at the same time as everything else.</p>
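<p>A minimal sketch of the kind of pedantry that helps, assuming nothing about Storm’s internals: a text column descriptor that refuses bytes outright, so mistakes surface at the call site rather than in the database.</p>
<div class="highlight"><pre><span></span><code>class TextColumn:
    """Illustrative only: reject bytes at assignment time so that bytes/text
    confusion shows up long before a Python 3 port."""

    def __set_name__(self, owner, name):
        self._name = '_' + name

    def __set__(self, obj, value):
        if not isinstance(value, str):  # 'unicode' in Python 2 terms
            raise TypeError('%s must be text, not %s'
                            % (self._name[1:], type(value).__name__))
        setattr(obj, self._name, value)

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return getattr(obj, self._name)


class Person:
    display_name = TextColumn()

p = Person()
p.display_name = 'Colin'       # fine
try:
    p.display_name = b'Colin'
except TypeError as e:
    print(e)                   # display_name must be text, not bytes
</code></pre></div>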
<p>Early in our port, we established a standard set of
<a href="https://docs.python.org/3/library/__future__.html"><code>__future__</code></a> imports
and started incrementally converting files over to them, mainly because we
weren’t yet sure what else to do and it seemed likely to be helpful.
<code>absolute_import</code> was definitely reasonable (and not often a problem in our
code), and <code>print_function</code> was annoying but necessary. In hindsight I’m
not sure about <code>unicode_literals</code>, though. For files that only deal with
bytes and text it was reasonable enough, but as I mentioned above there were
also a number of cases where we needed literals of the language’s native
<code>str</code> type, i.e. bytes in Python 2 and text in Python 3: this was
particularly noticeable in <span class="caps">WSGI</span> contexts, but also cropped up in <a href="https://github.com/zopefoundation/zope.configuration/pull/19">some other
surprising
places</a>. We
generally either omitted <code>unicode_literals</code> or used <code>six.ensure_str</code> in such
cases, but it was definitely a bit awkward and maybe I should have listened
more to people telling me it might be a bad idea.</p>
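<p>For the record, the pattern we ended up with in native-<code>str</code>-sensitive code looked roughly like this (a hypothetical example assuming <code>six</code> is available, not code taken from Launchpad): the standard <code>__future__</code> header plus <code>six.ensure_str</code> wherever the native string type is required.</p>
<div class="highlight"><pre><span></span><code>from __future__ import absolute_import, print_function, unicode_literals

import six

def simple_app(environ, start_response):
    # With unicode_literals in effect these literals are text even on
    # Python 2, but WSGI wants the native str type (bytes on 2, text on 3),
    # so six.ensure_str converts in whichever direction is needed.
    start_response(six.ensure_str('200 OK'),
                   [(six.ensure_str('Content-Type'),
                     six.ensure_str('text/plain'))])
    return [b'Hello, world!\n']
</code></pre></div>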
<p>A lot of Launchpad’s early tests used
<a href="https://docs.python.org/3/library/doctest.html">doctest</a>, mainly in the
<a href="https://docs.python.org/3/library/doctest.html#simple-usage-checking-examples-in-a-text-file">style</a>
where you have text files that interleave narrative commentary with
examples. The development team later reached consensus that this was best
avoided in most cases, but by then there were far too many doctests to
conveniently rewrite in some other form. Porting doctests to Python 3 is
really annoying. You run into all the little changes in how objects are
represented as text (particularly <code>u'...'</code> versus <code>'...'</code>, but plenty of
other cases as well); you have next to no tools to do anything useful like
skipping individual bits of a doctest that don’t apply; using <code>__future__</code>
imports requires the rather obscure approach of adding the relevant names to
the doctest’s globals in the relevant <code>DocFileSuite</code> or <code>DocTestSuite</code>;
dealing with many exception tracebacks requires something like
<a href="https://github.com/zopefoundation/zope.testing/blob/master/src/zope/testing/renormalizing.py"><code>zope.testing.renormalizing</code></a>;
and whatever code refactoring tools you’re using probably don’t work
properly. Basically, don’t have done that. It did all turn out to be
tractable for us in the end, and I managed to avoid using much in the way of
fragile doctest extensions aside from the aforementioned
<code>zope.testing.renormalizing</code>, but it was not an enjoyable experience.</p>
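<p>If you do find yourself stuck with doctests, the machinery looks roughly like this (a hypothetical sketch assuming <code>zope.testing</code> is installed; the file name is made up):</p>
<div class="highlight"><pre><span></span><code>from __future__ import print_function, unicode_literals

import doctest
import re

from zope.testing import renormalizing

# Rewrite Python 2's u'...' reprs so a single expected output passes on
# both versions.
checker = renormalizing.RENormalizing([
    (re.compile(r"u('[^']*')"), r"\1"),
])

def test_suite():
    return doctest.DocFileSuite(
        'narrative.txt',
        checker=checker,
        # doctest picks up __future__ features from objects in the globals,
        # so the examples themselves get print_function/unicode_literals.
        globs={
            'print_function': print_function,
            'unicode_literals': unicode_literals,
        },
        optionflags=doctest.ELLIPSIS,
    )
</code></pre></div>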
<h2>Regressions</h2>
<p>I know of nine regressions that reached Launchpad’s production systems as a
result of this porting work; of course there were various other regressions
caught by <span class="caps">CI</span> or in manual testing. (Considering the size of this project, I
count it as a resounding success that there were only nine production
issues, and that for the most part we were able to fix them quickly.)</p>
<h3>Equality testing of removed database objects</h3>
<p>One of the things we had to do while porting to Python 3 was to
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398087">implement</a>
the <code>__eq__</code>, <code>__ne__</code>, and <code>__hash__</code> special methods for all our database
objects. This was quite conceptually fiddly, because doing this requires
knowing each object’s primary key, and that may not yet be available if
we’ve created an object in Python but not yet flushed the actual <code>INSERT</code>
statement to the database (most of our primary keys are auto-incrementing
sequences). We thus had to take care to flush pending <span class="caps">SQL</span> statements in
such cases in order to ensure that we know the primary keys.</p>
<p>However, it’s possible to have a problem at the other end of the object
lifecycle: that is, a Python object might still be reachable in memory even
though the underlying row has been <code>DELETE</code>d from the database. In most
cases we don’t keep removed objects around for obvious reasons, but it can
happen in caching code, and buildd-manager
<a href="https://bugs.launchpad.net/launchpad/+bug/1916522">crashed</a> as a result (in
fact while it was still running on Python 2). We had to <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398498">take extra
care</a>
to avoid this problem.</p>
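<p>In outline, the pattern looks something like this (a simplified sketch that assumes an auto-incrementing integer primary key called <code>id</code>; Launchpad’s real base class also has to handle the removed-object case described above):</p>
<div class="highlight"><pre><span></span><code>from storm.store import Store

class DBObjectMixin:
    """Illustrative primary-key-based identity for database objects."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        # Flush first so that auto-incrementing primary keys have actually
        # been assigned before we compare them.
        store = Store.of(self)
        if store is not None:
            store.flush()
        return self.id is not None and self.id == other.id

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        store = Store.of(self)
        if store is not None:
            store.flush()
        return hash((type(self), self.id))
</code></pre></div>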
<h3>Debian imports crashed on non-<span class="caps">UTF</span>-8 filenames</h3>
<p>Python 2 has some <a href="https://bugs.launchpad.net/launchpad/+bug/1917449">unfortunate
behaviour</a> around passing
bytes or Unicode strings (depending on the platform) to <code>shutil.rmtree</code>, and
the combination of some <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398367">porting
work</a>
and a particular source package in Debian that contained a non-<span class="caps">UTF</span>-8 file
name caused us to run into this. The
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398971">fix</a>
was to ensure that the argument passed to <code>shutil.rmtree</code> is a <code>str</code>
regardless of Python version.</p>
<p>We’d actually run into <a href="https://code.launchpad.net/~cjwatson/turnip/+git/turnip/+merge/359051">something
similar</a>
before: it’s a subtle porting gotcha, since it’s quite easy to end up
passing Unicode strings to <code>shutil.rmtree</code> if you’re in the process of
porting your code to Python 3, and you might easily not notice if the file
names in your tests are all encoded using <span class="caps">UTF</span>-8.</p>
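<p>The workaround itself is tiny (a sketch assuming <code>six</code>); the hard part is remembering that you need it:</p>
<div class="highlight"><pre><span></span><code>import shutil
import six

def remove_tree(path):
    # On Python 2, passing a unicode path makes os.listdir() return unicode
    # names where decodable and byte strings where not, and shutil.rmtree
    # then falls over on the mixture.  Forcing the native str type avoids
    # the problem on both versions.
    shutil.rmtree(six.ensure_str(path))
</code></pre></div>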
<h3>lazr.restful ETags</h3>
<p>We eventually got far enough along that we could switch one of our four
appserver machines (we have quite a number of other machines too, but the
appservers handle web and <span class="caps">API</span> requests) to Python 3 and see what happened.
By this point our extensive test suite had shaken out the vast majority of
the things that could go wrong, but there was always going to be room for
some interesting edge cases.</p>
<p>A member of the Ubuntu kernel team reported that they were seeing an increase in
<a href="https://httpstatusdogs.com/412-precondition-failed">412 Precondition
Failed</a> errors in some
of their scripts that use our webservice <span class="caps">API</span>. These can happen when you’re
trying to modify an existing resource: the underlying protocol involves
sending an <code>If-Match</code> header with the <code>ETag</code> that the client thinks the
resource has, and if this doesn’t match the <code>ETag</code> that the server calculates
for the resource then the client has to refresh its copy of the resource and
try again. We initially thought that this might be legitimate since it can
happen in normal operation if you collide with another client making changes
to the same resource, but it soon became clear that something stranger was
going on: we were getting inconsistent <code>ETag</code>s for the same object even when
it was unchanged. Since we’d recently switched a quarter of our appservers
to Python 3, that was a natural suspect.</p>
<p>Our <code>lazr.restful</code> package provides the framework for our webservice <span class="caps">API</span>,
and roughly speaking it generates <code>ETag</code>s by serializing objects into some
kind of canonical form and hashing the result. Unfortunately the
serialization was dependent on the Python version in a few ways, and in
particular it serialized lists of strings such as lists of bug tags
differently: Python 2 used <code>[u'foo', u'bar', u'baz']</code> where Python 3 used
<code>['foo', 'bar', 'baz']</code>. In <code>lazr.restful</code> 1.0.3 we <a href="https://code.launchpad.net/~cjwatson/lazr.restful/etag-json/+merge/402920">switched to using
<span class="caps">JSON</span></a>
for this, removing the Python version dependency and ensuring consistent
behaviour between appservers.</p>
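<p>The problem is easy to reproduce in miniature (illustrative only; <code>lazr.restful</code>’s real serialization is more involved than a bare <code>repr</code>):</p>
<div class="highlight"><pre><span></span><code>import hashlib
import json

tags = [u'foo', u'bar', u'baz']

# repr() output differs between Python 2 ("[u'foo', ...]") and Python 3
# ("['foo', ...]"), so hashing it gives each appserver a different ETag.
etag_from_repr = hashlib.sha1(repr(tags).encode('utf-8')).hexdigest()

# JSON has one spelling on both versions, so the hashes agree.
etag_from_json = hashlib.sha1(json.dumps(tags).encode('utf-8')).hexdigest()

print(etag_from_repr, etag_from_json)
</code></pre></div>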
<h3>Memory leaks</h3>
<p>This problem took the longest to solve. We noticed fairly quickly from our
graphs that the appserver machine we’d switched to Python 3 had a serious
memory leak. Our appservers had always been a bit leaky, but now it wasn’t
so much “a small hole that we can bail occasionally” as “the boat is sinking rapidly”:</p>
<p><img alt="A serious memory leak" src="https://www.chiark.greenend.org.uk/~cjwatson/blog/images/chaenomeles-leak.png"></p>
<p>(Yes, this got in the way of working out what was going on with <code>ETag</code>s for
a while.)</p>
<p>I spent ages messing around with various attempts to fix this. Since only
a quarter of our appservers were affected, and we could get by on 75%
capacity for a while, it wasn’t urgent but it was definitely annoying.
After spending some quality time with
<a href="https://mg.pov.lt/objgraph/">objgraph</a>, for
some time I thought <a href="https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/">traceback reference
cycles</a>
might be at fault, and I sent a number of fixes to various upstream projects
for those (e.g.
<a href="https://github.com/zopefoundation/zope.pagetemplate/pull/27">zope.pagetemplate</a>).
Those didn’t help the leaks much though, and after a while it became clear
to me that this couldn’t be the sole problem: Python has a cyclic garbage
collector that will eventually collect reference cycles as long as there are
no strong references to any objects in them, although it might not happen
very quickly. Something else must be going on.</p>
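<p>For anyone facing something similar, a typical <code>objgraph</code> session looks roughly like this (the object type here is hypothetical, and rendering the graph needs Graphviz installed):</p>
<div class="highlight"><pre><span></span><code>import gc

import objgraph

gc.collect()
# Take this snapshot before and after a batch of requests; types whose
# counts keep climbing are the interesting ones.
objgraph.show_growth(limit=10)

# Then walk backwards from a few instances of a suspicious type to see
# what is keeping them alive.
suspects = objgraph.by_type('BugNotification')[:3]
objgraph.show_backrefs(suspects, max_depth=5, filename='backrefs.png')
</code></pre></div>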
<p>Debugging reference leaks in any non-trivial and long-running Python program
is extremely arduous, especially with ORMs that naturally tend to end up
with lots of cycles and caches. After a while I formed a hypothesis that
<a href="https://pypi.org/project/zope.server">zope.server</a> might be keeping a
strong reference to something, although I never managed to nail it down more
firmly than that. This was an attractive theory as we were already in the
process of migrating to <a href="https://docs.gunicorn.org/en/stable/">Gunicorn</a> for
other reasons anyway, and Gunicorn also has a convenient
<a href="https://docs.gunicorn.org/en/stable/settings.html#max-requests"><code>max_requests</code></a>
setting that’s good at mitigating memory leaks. Getting this all in place
took some time, but once we did we found that everything was much more stable:</p>
<p><img alt="A rather flat memory graph" src="https://www.chiark.greenend.org.uk/~cjwatson/blog/images/chaenomeles-stable.png"></p>
<p>This isn’t completely satisfying as we never quite got to the bottom of the
leak itself, and it’s entirely possible that we’ve only papered over it
using <code>max_requests</code>: I expect we’ll gradually back off on how frequently we
restart workers over time to try to track this down. However,
pragmatically, it’s no longer an operational concern.</p>
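<p>The mitigation itself is just configuration, something along these lines (illustrative values, not Launchpad’s production settings):</p>
<div class="highlight"><pre><span></span><code># gunicorn.conf.py
workers = 8
# Recycle each worker after roughly this many requests, with some jitter so
# they don't all restart at once; this caps how far a slow leak can grow.
max_requests = 1000
max_requests_jitter = 100
</code></pre></div>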
<h3>Mirror prober <span class="caps">HTTPS</span> proxy handling</h3>
<p>After we switched our script servers to Python 3, we had several reports of
<a href="https://bugs.launchpad.net/launchpad/+bug/1935999">mirror probing
failures</a>. (Launchpad
keeps lists of Ubuntu archive and image mirrors, and probes them every so
often to check that they’re reasonably complete and up to date.) This only
affected <span class="caps">HTTPS</span> mirrors when probed via a proxy server, support for which is
a relatively recent feature in Launchpad and involved some code that we
never managed to unit-test properly: of course this is exactly the code that
went wrong. Sadly I wasn’t able to sort out that gap, but at least the
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/405688">fix</a>
was simple.</p>
<h3>Non-<span class="caps">MIME</span>-encoded email headers</h3>
<p>As I mentioned above, there were substantial changes in the <code>email</code> package
between Python 2 and 3, and indeed between minor versions of Python 3. Our
test coverage here is pretty good, but it’s an area where it’s very easy to
have gaps. We noticed that a script that processes incoming email was
crashing on messages with headers that were non-<span class="caps">ASCII</span> but not
<a href="https://datatracker.ietf.org/doc/html/rfc2047.html"><span class="caps">MIME</span>-encoded</a> (and
indeed then crashing again when it tried to send a notification of the
crash!). The only examples of these I looked at were spam, but we still
didn’t want to crash on them.</p>
<p>The
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/405924">fix</a>
involved being somewhat more careful about both the handling of headers
returned by Python’s email parser and the building of outgoing email
notifications. This seems to be working well so far, although I wouldn’t be
surprised to find the odd other incorrect detail in this sort of area.</p>
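<p>The failure mode is easy to demonstrate (a contrived message rather than one of the real spam examples):</p>
<div class="highlight"><pre><span></span><code>import email

raw = b'Subject: caf\xe9 menu\r\n\r\nHello\r\n'
msg = email.message_from_bytes(raw)

subject = msg['Subject']
# The parser surrogate-escapes the undecodable byte, so this is
# 'caf\udce9 menu', and subject.encode('utf-8') raises UnicodeEncodeError.
# Recover the original bytes and decode them forgivingly instead.
cleaned = subject.encode('utf-8', 'surrogateescape').decode('utf-8', 'replace')
print(cleaned)
</code></pre></div>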
<h3>Failure to handle non-<span class="caps">ISO</span>-8859-1 <span class="caps">URL</span>-encoded form input</h3>
<p>Remember how I said that parsing <span class="caps">HTTP</span> form data was thorny? After we
finished upgrading all our appservers to Python 3, people started reporting
that they <a href="https://bugs.launchpad.net/launchpad/+bug/1937345">couldn’t post Unicode comments to
bugs</a>, which turned out
to happen only if the attempt was made using JavaScript, and was because I
hadn’t quite managed to get <span class="caps">URL</span>-encoded form data working properly with
<code>zope.publisher</code> and <code>multipart</code>. The current standard describes the
<span class="caps">URL</span>-encoded format for form data as <a href="https://url.spec.whatwg.org/#application/x-www-form-urlencoded">“in many ways an aberrant
monstrosity”</a>,
so this was no great surprise.</p>
<p>Part of the problem was some <a href="https://github.com/zopefoundation/zope.publisher/issues/65">very strange
choices</a> in
<code>zope.publisher</code> dating back to 2004 or earlier, which I attempted to <a href="https://github.com/zopefoundation/zope.publisher/pull/66">clean
up and simplify</a>.
The rest was that Python 2’s <code>urlparse.parse_qs</code> unconditionally decodes
percent-encoded sequences as <span class="caps">ISO</span>-8859-1 if they’re passed in as part of a
Unicode string, so <code>multipart</code> needs to <a href="https://github.com/defnull/multipart/pull/36">work around
this</a> on Python 2.</p>
<p>I’m still not completely confident that this is correct in all situations,
but at least now that we’re on Python 3 everywhere the matrix of cases we
need to care about is smaller.</p>
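<p>To see the Python 2 behaviour that <code>multipart</code> has to work around, compare what the default decoding does on Python 3 with what Python 2’s <code>urlparse.parse_qs</code> effectively did for Unicode input (this snippet runs on Python 3 only):</p>
<div class="highlight"><pre><span></span><code>from urllib.parse import parse_qs

form = 'comment=caf%C3%A9'

# Python 3 decodes percent-escapes as UTF-8 by default:
print(parse_qs(form))                          # {'comment': ['café']}

# Python 2's urlparse.parse_qs, given a unicode string, effectively did the
# equivalent of this, i.e. ISO-8859-1, producing mojibake for UTF-8 input:
print(parse_qs(form, encoding='iso-8859-1'))   # {'comment': ['cafÃ©']}
</code></pre></div>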
<h3>Inconsistent marshalling of Loggerhead’s disk cache</h3>
<p>We use <a href="https://pypi.org/project/loggerhead">Loggerhead</a> for providing web
browsing of Bazaar branches. When we upgraded one of its two servers to
Python 3, we immediately noticed that the one still on Python 2 was failing
to read back its revision information cache, which it stores in a database
on disk. (We noticed this because it caused a deployment to fail: when we
tried to roll out new code to the instance still on Python 2, Nagios checks
had already caused an incompatible cache to be written for one branch from
the Python 3 instance.)</p>
<p>This turned out to be a similar problem to the <code>pickle</code> issue mentioned
above, except this one was with <code>marshal</code>, which I didn’t think to look for
because it’s a relatively obscure module mostly used for internal purposes
by Python itself; I’m not sure that Loggerhead should really be using it in
the first place. The fix was
<a href="https://code.launchpad.net/~cjwatson/loggerhead/marshal-version/+merge/406291">relatively</a>
<a href="https://code.launchpad.net/~cjwatson/loggerhead/fix-marshal-version/+merge/406308">straightforward</a>,
complicated mainly by now needing to cope with throwing away unreadable
cache data.</p>
<p>Ironically, if we’d just gone ahead and taken the nominally riskier path of
upgrading both servers at the same time, we might never have had a problem here.</p>
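<p>The general shape of the problem and fix looks like this (a sketch; Loggerhead’s cache format has more to it, and text values still need separate care):</p>
<div class="highlight"><pre><span></span><code>import marshal

entry = {'revid': b'abc123', 'revno': 42}

# marshal.dumps defaults to the newest format the running interpreter
# supports (4 on Python 3), which an older interpreter may not be able to
# read back.  Pinning the version keeps the on-disk cache mutually readable.
MARSHAL_VERSION = 2
blob = marshal.dumps(entry, MARSHAL_VERSION)
print(marshal.loads(blob))
</code></pre></div>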
<h3>Intermittent bzr failures</h3>
<p>Finally, after we upgraded one of our two Bazaar codehosting servers to
Python 3, we had a
<a href="https://bugs.launchpad.net/launchpad/+bug/1938335">report</a> of intermittent
<code>bzr branch</code> hangs. After some digging I found this in our logs:</p>
<div class="highlight"><pre><span></span><code><span class="gt">Traceback (most recent call last):</span>
<span class="w"> </span><span class="c">...</span>
File <span class="nb">"/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py"</span>, line <span class="m">136</span>, in <span class="n">addWindowBytes</span>
<span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">startWriting</span><span class="p">()</span>
File <span class="nb">"/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py"</span>, line <span class="m">88</span>, in <span class="n">startWriting</span>
<span class="w"> </span><span class="n">resumeProducing</span><span class="p">()</span>
File <span class="nb">"/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py"</span>, line <span class="m">894</span>, in <span class="n">resumeProducing</span>
<span class="w"> </span><span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">pipes</span><span class="o">.</span><span class="n">itervalues</span><span class="p">():</span>
<span class="gr">builtins.AttributeError</span>: <span class="n">'dict' object has no attribute 'itervalues'</span>
</code></pre></div>
<p>I’d seen this before in our git hosting service: it was a bug in Twisted’s
Python 3 port, <a href="https://github.com/twisted/twisted/pull/1478">fixed</a> after
20.3.0 but unfortunately after the last release that supported Python 2, so
we had to backport that patch. Using the same backport dealt with this.</p>
<h2><a href="https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/">Onwards!</a></h2>SSH quoting2021-06-11T11:22:21+01:002021-06-11T11:22:21+01:00Colin Watsontag:www.chiark.greenend.org.uk,2021-06-11:/~cjwatson/blog/ssh-quoting.html<p>A while back there was a thread on one of our company mailing lists about
<span class="caps">SSH</span> quoting, and I posted a long answer to it. Since then a few people have
asked me questions that caused me to reach for it, so I thought it might be
helpful if I …</p><p>A while back there was a thread on one of our company mailing lists about
<span class="caps">SSH</span> quoting, and I posted a long answer to it. Since then a few people have
asked me questions that caused me to reach for it, so I thought it might be
helpful if I were to anonymize the original question and post my answer here.</p>
<p>The question was why a sequence of commands involving <code>ssh</code> and fiddly
quoting produced the output they did. The first example was this:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ssh<span class="w"> </span>user@machine.local<span class="w"> </span>bash<span class="w"> </span>-lc<span class="w"> </span><span class="s2">"cd /tmp;pwd"</span>
/home/user
</code></pre></div>
<p>Oh hi, my dubious life choices have been such that this is my specialist subject!</p>
<p>This is because <span class="caps">SSH</span> command-line parsing is not quite what you expect.</p>
<p>First, recall that your local shell will apply its usual parsing, and the
actual <span class="caps">OS</span>-level execution of <code>ssh</code> will be like this:</p>
<div class="highlight"><pre><span></span><code><span class="o">[</span><span class="n">0</span><span class="o">]</span><span class="err">:</span><span class="w"> </span><span class="n">ssh</span>
<span class="o">[</span><span class="n">1</span><span class="o">]</span><span class="err">:</span><span class="w"> </span><span class="k">user</span><span class="nv">@machine</span><span class="p">.</span><span class="k">local</span>
<span class="o">[</span><span class="n">2</span><span class="o">]</span><span class="err">:</span><span class="w"> </span><span class="n">bash</span>
<span class="o">[</span><span class="n">3</span><span class="o">]</span><span class="err">:</span><span class="w"> </span><span class="o">-</span><span class="n">lc</span>
<span class="o">[</span><span class="n">4</span><span class="o">]</span><span class="err">:</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="p">;</span><span class="n">pwd</span>
</code></pre></div>
<p>Now, the <span class="caps">SSH</span> wire protocol only takes a single string as the command, with
the expectation that it should be passed to a shell by the remote end. The
OpenSSH client deals with this by taking all its arguments after things like
options and the target, which in this case are:</p>
<div class="highlight"><pre><span></span><code>[0]: bash
[1]: -lc
[2]: cd /tmp;pwd
</code></pre></div>
<p>It then joins them with a single space:</p>
<div class="highlight"><pre><span></span><code>bash -lc cd /tmp;pwd
</code></pre></div>
<p>This is passed as a string to the server, which then passes that entire
string to a shell for evaluation, so as if you’d typed this directly on the server:</p>
<div class="highlight"><pre><span></span><code>sh -c 'bash -lc cd /tmp;pwd'
</code></pre></div>
<p>The shell then parses this as two commands:</p>
<div class="highlight"><pre><span></span><code>bash -lc cd /tmp
pwd
</code></pre></div>
<p>The directory change thus happens in a subshell (actually it doesn’t quite
even do that, because <code>bash -lc cd /tmp</code> in fact ends up just calling <code>cd</code>
because of the way <code>bash -c</code> parses multiple arguments), and then that
subshell exits, then <code>pwd</code> is called in the outer shell which still has the
original working directory.</p>
<p>The second example was this:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ssh<span class="w"> </span>user@machine.local<span class="w"> </span>bash<span class="w"> </span>-lc<span class="w"> </span><span class="s2">"pwd;cd /tmp;pwd"</span>
/home/user
/tmp
</code></pre></div>
<p>Following the logic above, this ends up as if you’d run this on the server:</p>
<div class="highlight"><pre><span></span><code>sh -c 'bash -lc pwd; cd /tmp; pwd'
</code></pre></div>
<p>The third example was this:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ssh<span class="w"> </span>user@machine.local<span class="w"> </span>bash<span class="w"> </span>-lc<span class="w"> </span><span class="s2">"cd /tmp;cd /tmp;pwd"</span>
/tmp
</code></pre></div>
<p>And this is as if you’d run:</p>
<div class="highlight"><pre><span></span><code>sh -c 'bash -lc cd /tmp; cd /tmp; pwd'
</code></pre></div>
<p>Now, I wouldn’t have implemented the <span class="caps">SSH</span> client this way, because I agree
that it’s confusing. But <code>/usr/bin/ssh</code> is used as a transport for other
things so much that changing its behaviour now would be enormously
disruptive, so it’s probably impossible to fix. (I have occasionally
agitated on openssh-unix-dev@ for at least documenting this better, but
haven’t made much headway yet; I need to get round to preparing a
documentation patch.) Once you know about it you can use the proper
quoting, though. In this case that would simply be:</p>
<div class="highlight"><pre><span></span><code><span class="n">ssh</span><span class="w"> </span><span class="k">user</span><span class="nv">@machine</span><span class="p">.</span><span class="k">local</span><span class="w"> </span><span class="s1">'cd /tmp;pwd'</span>
</code></pre></div>
<p>Or if you do need to specifically invoke <code>bash -l</code> there for some reason
(I’m assuming that the original example was reduced from something more
complicated), then you can minimise your confusion by passing the whole
thing as a single string in the form you want the remote <code>sh -c</code> to see, in
a way that ensures that the quotes are preserved and sent to the server
rather than being removed by your local shell:</p>
<div class="highlight"><pre><span></span><code><span class="n">ssh</span><span class="w"> </span><span class="k">user</span><span class="nv">@machine</span><span class="p">.</span><span class="k">local</span><span class="w"> </span><span class="s1">'bash -lc "cd /tmp;pwd"'</span>
</code></pre></div>
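<p>If you’re building such commands from a program rather than typing them, one approach (not from the original question, just a sketch) is to quote each remote word yourself, so that the remote <code>sh -c</code> reconstructs exactly the argument list you intended:</p>
<div class="highlight"><pre><span></span><code>import shlex
import subprocess

remote_argv = ['bash', '-lc', 'cd /tmp; pwd']
# Quote each word locally; the remote shell will undo exactly one level of
# quoting, leaving the intended argv for bash.
command = ' '.join(shlex.quote(arg) for arg in remote_argv)
subprocess.run(['ssh', 'user@machine.local', command], check=True)
</code></pre></div>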
<p>Shell parsing is hard.</p>Porting Launchpad to Python 3: progress report2020-09-25T12:01:40+01:002020-09-25T12:01:40+01:00Colin Watsontag:www.chiark.greenend.org.uk,2020-09-25:/~cjwatson/blog/lp-python3-progress.html<p><a href="https://launchpad.net/">Launchpad</a> still requires Python 2, which in 2020
is <a href="https://www.python.org/doc/sunset-python-2/">a bit of a problem</a>.
Unlike a lot of the rest of 2020, though, there’s good reason to be
optimistic about progress.</p>
<p>I’ve been porting Python 2 code to Python 3 on and off for a long time, from …</p><p><a href="https://launchpad.net/">Launchpad</a> still requires Python 2, which in 2020
is <a href="https://www.python.org/doc/sunset-python-2/">a bit of a problem</a>.
Unlike a lot of the rest of 2020, though, there’s good reason to be
optimistic about progress.</p>
<p>I’ve been porting Python 2 code to Python 3 on and off for a long time, from
back when I was on the Ubuntu Foundations team and maintaining things like
the <a href="https://launchpad.net/ubiquity">Ubiquity installer</a>. When I moved to
Launchpad in 2015 it was certainly on my mind that this was a large body of
code still stuck on Python 2. One option would have been to just accept
that and leave it as it is, maybe doing more backporting work over time as
support for Python 2 fades away. I’ve long been of the opinion that this
would doom Launchpad to being unmaintainable in the long run, and since I
genuinely love working on Launchpad - I find it an incredibly rewarding
project - this wasn’t something I was willing to accept. We’re already
seeing some of our important dependencies dropping support for Python 2,
which is perfectly reasonable on their terms but which is starting to become
a genuine obstacle to delivering important features when we need new
features from newer versions of those dependencies. It also looks as though
it may be difficult for us to run on Ubuntu 20.04 <span class="caps">LTS</span> (we’re currently on
16.04, with an upgrade to 18.04 in progress) as long as we still require
Python 2, since we have some system dependencies that 20.04 no longer
provides. And then there are exciting new features like <a href="https://docs.python.org/3/library/typing.html">type
hints</a> and
<a href="https://docs.python.org/3/library/asyncio.html">async/await</a> that we’d like
to be able to use.</p>
<p>However, until last year there were so many blockers that even considering a
port was barely conceivable. What changed in 2019 was sorting out a
trifecta of core dependencies. We <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/storm-py3.html">ported</a> our
database layer, <a href="https://storm.canonical.com/">Storm</a>. We
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/376781">upgraded</a>
to modern versions of our <a href="https://www.zope.org/">Zope</a> Toolkit dependencies
(after contributing various fixes upstream, including some substantial
changes to Zope’s <a href="https://pypi.org/project/zope.testrunner/">test runner</a>
that we’d carried as local patches for some years). And we
<a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/373805">ported</a>
our Bazaar code hosting infrastructure to
<a href="https://www.breezy-vcs.org/">Breezy</a>. With all that in place, a port
seemed more of a realistic possibility.</p>
<p>Still, even with this, it was never going to be a matter of just following
some <a href="http://python3porting.com/">standard porting advice</a> and calling it
good. Launchpad has almost a million lines of Python code in its <a href="https://git.launchpad.net/launchpad">main git
tree</a>, and around 250 dependencies of
which a number are quite Launchpad-specific. In a project that size, not
only is following standard porting advice an extremely time-consuming task
in its own right, but just about every strange corner case is going to show
up somewhere. (Did you know that <code>StringIO.StringIO(None)</code> and
<code>io.StringIO(None)</code> do different things even after you account for the
native string vs. Unicode text difference? How about <a href="https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/385711">the behaviour of
<code>.union()</code> on a subclass of
<code>frozenset</code></a>?)
Launchpad’s test suite is fortunately extremely thorough, but even just
starting up the test suite involves importing most of the data model code,
so before you can start taking advantage of it you have to make a large
fraction of the codebase be at least syntactically-correct Python 3 code and
use only modules that exist in Python 3 while still working in Python 2; in
a project this size that turns out to be a large effort on its own, and can
be quite
<a href="https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names">risky</a>
in places.</p>
<p>Canonical’s product engineering teams work on a six-month cycle, but it just
isn’t possible to cram this sort of thing into six months unless you do
literally nothing else, and “please can we put all feature development on
hold while we run to stand still” is a pretty tough sell to even the most
understanding management. Fortunately, we’ve been able to grow the
<a href="https://launchpad.net/~launchpad">Launchpad team</a> in the last year or so,
and so it’s been possible to put “Python 3” on our roadmap on the
understanding that we aren’t going to get all the way there in one cycle,
while still being able to do other substantial feature development work as well.</p>
<p>So, with all that preamble, what have we done this cycle? We’ve taken a
two-pronged approach. From one end, we identified 147 classes that needed
to be ported away from some compatibility code in our database layer that
was substantially less friendly to Python 3: we’ve ported 38 of those, so
there’s clearly a fair bit more to do, but we were able to distribute this
work out among the team quite effectively. From the other end, it was clear
that it would be very inefficient to do general porting work when any
attempt to even run the test suite would run straight into the same crashes
in the same order, so I set myself a target of getting the test suite to
start up, and started hacking on an <a href="https://git.launchpad.net/~cjwatson/launchpad?h=py3">enormous git
branch</a> that I never
expected to try to land directly: instead, I felt free to commit just about
anything that looked reasonable and moved things forward even if it was very
rough, and every so often went back to tidy things up and cherry-pick
individual commits into a form that included some kind of explanation and
passed existing tests so that I could propose them for review.</p>
<p>This strategy has been dramatically more successful than anything I’ve tried
before at this scale. So far this cycle, considering only Launchpad’s main
git tree, we’ve landed 137 Python-3-relevant merge proposals for a total of
39552 lines of <code>git diff</code> output, keeping our existing tests passing along
the way and deploying incrementally to production. We have about 27000 more
lines of patch at varying degrees of quality to tidy up and merge. Our main
development branch is only perhaps 10 or 20 more patches away from the test
suite being able to start up, at which point we’ll be able to get a buildbot
running so that multiple developers can work on this much more easily and
see the effect of their work. With the full unlanded patch stack, about 75%
of the test suite passes on Python 3! This still leaves a long tail of
several thousand tests to figure out and fix, but it’s a much more
incrementally-tractable kind of problem than where we started.</p>
<p>Finally: the funniest (to me) bug I’ve encountered in this effort was the
one I encountered in the test runner and fixed in
<a href="https://github.com/zopefoundation/zope.testrunner/pull/106">zopefoundation/zope.testrunner#106</a>:
IDs of failing tests were written to a pipe, so if you have a test suite
that’s large enough and broken enough then eventually that pipe would reach
its capacity and your test runner would just give up and hang. Pretty
annoying when it meant an overnight test run didn’t give useful results, but
also eloquent commentary of sorts.</p>Porting Storm to Python 32019-09-22T08:56:42+01:002019-09-22T08:56:42+01:00Colin Watsontag:www.chiark.greenend.org.uk,2019-09-22:/~cjwatson/blog/storm-py3.html<p>We released <a href="https://storm.canonical.com/">Storm</a> 0.21 on Friday (the
release announcement seems to be stuck in moderation, but you can look at
the <a href="https://bazaar.launchpad.net/+branch/storm/view/head:/NEWS"><span class="caps">NEWS</span></a> file
directly). For me, the biggest part of this release was adding Python 3 support.</p>
<p>Storm is a really nice and lightweight <span class="caps">ORM</span> (object-relational mapper) for
Python …</p><p>We released <a href="https://storm.canonical.com/">Storm</a> 0.21 on Friday (the
release announcement seems to be stuck in moderation, but you can look at
the <a href="https://bazaar.launchpad.net/+branch/storm/view/head:/NEWS"><span class="caps">NEWS</span></a> file
directly). For me, the biggest part of this release was adding Python 3 support.</p>
<p>Storm is a really nice and lightweight <span class="caps">ORM</span> (object-relational mapper) for
Python, developed by Canonical. We use it for some major products
(<a href="https://launchpad.net/">Launchpad</a> and
<a href="https://landscape.canonical.com/">Landscape</a> are the ones I know of), and
it’s also free software and used by some other folks as well. Other popular
ORMs for Python include <a href="http://sqlobject.org/">SQLObject</a>,
<a href="https://www.sqlalchemy.org/">SQLAlchemy</a> and the
<a href="https://www.djangoproject.com/">Django</a> <span class="caps">ORM</span>; we use those in various places
too depending on the context, but personally I’ve always preferred Storm for
the readability of code that uses it and for how easy it is to debug and
extend it.</p>
<p>It’s been a problem for a while that Storm only worked with Python 2. It’s
one of a handful of major blockers to getting Launchpad running on Python 3,
which we definitely want to do; <a href="https://github.com/stoq/stoq">stoq</a> ended
up with a local fork of Storm to cope with this; and it was recently
<a href="https://bugs.debian.org/933983">removed from Debian</a> for this and other
reasons. None of that was great. So, with significant assistance from a
large patch contributed by Thiago Bellini, and with patient code review from
Simon Poirier and some of my other colleagues, we finally managed to get
that sorted out in this release.</p>
<p>In many ways, Storm was in fairly good shape already for a project that
hadn’t yet been ported to Python 3: while its internal idea of which strings
were bytes and which text required quite a bit of untangling in the way that
Python 2 code usually does, its normal class used for text database columns
was already <code>Unicode</code> which only accepted text input (<code>unicode</code> in Python
2), so it could have been a lot worse; this also means that applications
that use Storm tend to get at least this part right even in Python 2. Aside
from the bytes/text thing, many of the required changes were just the usual
largely-mechanical ones that anyone who’s done 2-to-3 porting will be
familiar with. But there were some areas that required non-trivial thought,
and I’d like to talk about some of those here.</p>
<h2>Exception types</h2>
<p>Concrete database implementations such as
<a href="http://initd.org/psycopg/">psycopg2</a> raise implementation-specific
exception types. The inheritance hierarchy for these is defined by the
<a href="https://www.python.org/dev/peps/pep-0249/">Python Database <span class="caps">API</span></a> (<span class="caps">DB</span>-<span class="caps">API</span>),
but the actual exception classes aren’t in a common place; rather, you might
get an instance of <code>psycopg2.errors.IntegrityError</code> when using PostgreSQL
but an instance of <code>sqlite3.IntegrityError</code> when using SQLite. To make
things easier for applications that don’t have a strict requirement for a
particular database backend, Storm arranged to inject its own virtual
exception types as additional base classes of these concrete exceptions by
patching their <code>__bases__</code> attribute, so for example, you could import
<code>IntegrityError</code> from <code>storm.exceptions</code> and catch that rather than having
to catch each backend-specific possibility.</p>
<p>Although this was always a bit of a cheat, it worked well in practice for a
while, but the first sign of trouble even before porting to Python 3 was
with psycopg2 2.5. This release started implementing its <span class="caps">DB</span>-<span class="caps">API</span> exception
types in a C extension, which meant that it was no longer possible to patch
<code>__bases__</code>. To get around that, a few years ago I landed a
<a href="https://code.launchpad.net/~cjwatson/storm/psycopg-2.5/+merge/278330">patch</a>
to Storm to use <code>abc.ABCMeta.register</code> instead to register the <span class="caps">DB</span>-<span class="caps">API</span>
exceptions as virtual subclasses of Storm’s exceptions, which solved the
problem for Python 2. However, even at the time I landed that, I knew that
it would be a porting obstacle due to <a href="https://bugs.python.org/issue12029">Python issue
12029</a>; Django ran into that as well.</p>
<p>In the end, I opted to
<a href="https://code.launchpad.net/~cjwatson/storm/refactor-exception-wrapping/+merge/369319">refactor</a>
how Storm handles exceptions: it now wraps cursor and connection objects in
such a way as to catch <span class="caps">DB</span>-<span class="caps">API</span> exceptions raised by their methods and
properties and re-raise them using wrapper exception types that inherit from
both the appropriate subclass of <code>StormError</code> and the original <span class="caps">DB</span>-<span class="caps">API</span>
exception type, and with some care I even managed to avoid this being
painfully repetitive. Out-of-tree database backends will need to make some
minor adjustments (removing <code>install_exceptions</code>, adding an
<code>_exception_module</code> property to their <code>Database</code> subclass, adjusting the
<code>raw_connect</code> method of their <code>Database</code> subclass to do exception wrapping,
and possibly implementing <code>_make_combined_exception_type</code> and/or
<code>_wrap_exception</code> if they need to add extra attributes to the wrapper
exceptions). Applications that follow the usual Storm idiom of catching
<code>StormError</code> or any of its subclasses should continue to work without
needing any changes.</p>
<h2>SQLObject compatibility</h2>
<p>Storm includes some <span class="caps">API</span> compatibility with SQLObject; this was from before
my time, but I believe it was mainly because Launchpad and possibly
Landscape previously used SQLObject and this made the port to Storm very
much easier. It still works fine for the parts of Launchpad that haven’t
been ported to Storm, but I wouldn’t be surprised if there were newer
features of SQLObject that it doesn’t support.</p>
<p>The main question here was what to do with <code>StringCol</code> and its associated
<code>AutoUnicodeVariable</code>. I opted to make these explicitly only accept text on
Python 3, since the main reason for them to accept bytes was to allow using
them with Python 2 native strings (i.e. <code>str</code>), and on Python 3 <code>str</code> is
already text so there’s much less need for the porting affordance in that case.</p>
<p>Since releasing 0.21 I realised that the <code>StringCol</code> implementation in
SQLObject itself in fact accepts both bytes and text even on Python 3, so
it’s possible that we’ll need to change this in the future, although we
haven’t yet found any real code using Storm’s SQLObject compatibility layer
that might rely on this. Still, it’s much easier for Storm to start out on
the stricter side and perhaps become more lenient than it is to go the other
way round.</p>
<h2>inspect.getargspec</h2>
<p>Storm had some fairly complicated use of <code>inspect.getargspec</code> on Python 2 as
part of its test mocking arrangements. This didn’t work in Python 3 due to
some subtleties relating to bound methods. I
<a href="https://code.launchpad.net/~cjwatson/storm/py3-mocker-inspect/+merge/371174">switched</a>
to the modern <code>inspect.signature</code> <span class="caps">API</span> in Python 3 to fix this, which in any
case is rather simpler with the exception of a wrinkle in how method
descriptors work.</p>
<p>(It’s possible that these mocking arrangements could be simplified nowadays
by using some more off-the-shelf mocking library; I haven’t looked into that
in any detail.)</p>
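<p>The modern <span class="caps">API</span> is pleasantly small; for example (not Storm’s code, just a demonstration of the bound-method wrinkle):</p>
<div class="highlight"><pre><span></span><code>import inspect

class Example:
    def find(self, cls, *args, **kwargs):
        pass

# inspect.signature on a bound method already excludes 'self', which is the
# sort of detail that needed special-casing with the old getargspec-based
# approach (and getargspec itself is gone as of Python 3.11).
sig = inspect.signature(Example().find)
print(list(sig.parameters))   # ['cls', 'args', 'kwargs']
</code></pre></div>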
<h2>What’s next?</h2>
<p>I’m <a href="https://bugs.debian.org/940876">working on getting Storm back into
Debian</a> now, which will be with Python 3
support only since Debian is in the process of gradually removing Python 2
module support. Other than that I don’t really have any particular plans
for Storm at the moment (although of course I’m not the only person with an
interest in it), aside from ideally avoiding leaving six years between
releases again. I expect we can go back into bug-fixing mode there for a while.</p>
<p>From the Launchpad side, I’ve recently made progress on one of the other
major Python 3 blockers (porting Bazaar code hosting to
<a href="https://www.breezy-vcs.org/">Breezy</a>, coming soon). There are still some
other significant blockers, the largest being migrating to Mailman 3,
subvertpy fixes so that we can port code importing to Breezy as well, and
porting the lazr.restful stack; but we may soon be able to reach the point
where it’s possible to start running interesting subsets of the test suite
using Python 3 and categorising the failures, at which point we’ll be able
to get a much better idea of how far we still have to go. Porting a project
with the best part of a million lines of code and around three hundred
dependencies is always going to take a while, but I’m happy to be making
progress there, both due to Python 2’s impending end of upstream support and
so that eventually we can start using new language facilities.</p>man-db 2.8.72019-08-27T06:55:25+01:002019-08-27T06:55:25+01:00Colin Watsontag:www.chiark.greenend.org.uk,2019-08-27:/~cjwatson/blog/man-db-2.8.7.html<p>I’ve released man-db 2.8.7
(<a href="https://lists.nongnu.org/archive/html/man-db-announce/2019-08/msg00002.html">announcement</a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS?id=2.8.7"><span class="caps">NEWS</span></a>),
and uploaded it to Debian unstable.</p>
<p>There are a few things of note that I wanted to talk about here. Firstly, I
made some further improvements to the seccomp sandbox originally introduced
in 2.8.0. I do still think it …</p><p>I’ve released man-db 2.8.7
(<a href="https://lists.nongnu.org/archive/html/man-db-announce/2019-08/msg00002.html">announcement</a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS?id=2.8.7"><span class="caps">NEWS</span></a>),
and uploaded it to Debian unstable.</p>
<p>There are a few things of note that I wanted to talk about here. Firstly, I
made some further improvements to the seccomp sandbox originally introduced
in 2.8.0. I do still think it’s correct to try to confine subprocesses this
way as a defence against malicious documents, but it’s also been a pretty
rough ride for some users, especially those who use various kinds of VPNs or
antivirus programs that install themselves using <code>/etc/ld.so.preload</code> and
cause other programs to perform additional system calls. As well as a few
specific tweaks, a <a href="https://lwn.net/Articles/796108/">recent discussion on
<span class="caps">LWN</span></a> reminded me that it would be better
to make seccomp return <code>EPERM</code> rather than raising <code>SIGSYS</code>, since that’s
easier to handle gracefully: in particular, it fixes <a href="https://bugs.debian.org/902257">an odd corner case
related to glibc’s nscd handling</a>.</p>
<p>Secondly, there was a <a href="https://savannah.nongnu.org/bugs/?56734">build failure on
macOS</a> that took a while to figure
out, not least because I don’t have a macOS test system myself. In 2.8.6 I
tried to make life easier for people on this platform with a <a href="https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=056e8c7c012b00261133259d6438ff8303a8c36c"><code>CFLAGS</code>
tweak</a>,
but I made it a bit too general and accidentally took away configure’s
ability to detect undefined symbols properly, which caused very confusing
failures. More importantly, I hadn’t really thought through why this change
was necessary and whether it was a good idea. man-db uses private shared
libraries to keep its executable size down, and it passes <code>-no-undefined</code> to
<code>libtool</code> to declare that those shared libraries have no undefined symbols
after linking, which is necessary to build shared libraries on some
platforms. But the <code>CFLAGS</code> tweak above directly contradicts this! So,
instead of playing core wars with my own build system, I did some
refactoring so that the assertion that man-db’s shared libraries have no
undefined symbols after linking is actually true: this involved <a href="https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=2519d2ffe769a4059bfe475a092afa40722eb38d">moving
decompression code out of
<code>libman</code></a>,
and arranging for the code in <code>libmandb</code> to take the database path as a
parameter rather than as a global variable (something I’ve meant to fix for
ages anyway;
<a href="https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=252d7cbc2328b27457aafcbd6fa5958a8be9fded">252d7cbc23</a>,
<a href="https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=036aa910ea000d716bcf0f4bcbcee3a54a848be7">036aa910ea</a>,
<a href="https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=a97d977b0bfc1ed34c3021b8d6702b047e8251af">a97d977b0b</a>).
Lesson: don’t make build system changes you don’t quite understand.</p>Buster upgrade2019-05-05T01:10:48+01:002019-05-05T01:10:48+01:00Colin Watsontag:www.chiark.greenend.org.uk,2019-05-05:/~cjwatson/blog/buster-upgrade.html<p>I upgraded my home server from Debian stretch to buster recently, which is
something I normally do once we’re frozen: this is a system that was first
installed in 1999 and has a lot of complicated stuff on it, and while I try
to keep it as cleanly-maintained as …</p><p>I upgraded my home server from Debian stretch to buster recently, which is
something I normally do once we’re frozen: this is a system that was first
installed in 1999 and has a lot of complicated stuff on it, and while I try
to keep it as cleanly-maintained as I can it still often runs into some
interesting problems. Things went largely <span class="caps">OK</span> this time round, although
there were a few snags of various degrees of severity, some of which weren’t
Debian’s fault.</p>
<p>As ever, <code>etckeeper</code> made it much more comfortable to make non-trivial
configuration file changes without fearing that I was going to lose information.</p>
<ul>
<li>
<p>The first <code>apt full-upgrade</code> failed part-way through with “dependency
problems prevent processing triggers for desktop-file-utils” for what
didn’t seem like a particularly good reason; <code>dpkg --configure -a</code> sorted
it out and I was able to resume the upgrade from there. I think I’ve
seen a report of this somewhere recently as it rang a bell, though I
haven’t yet found it.</p>
</li>
<li>
<p>I had a number of truly annoying configuration file resolutions to
perform. There’s not much to be done about that except try to gradually
move things to <code>.d</code> directories where available, and other such
strategies to minimise the local differences I’m maintaining.</p>
</li>
<li>
<p>I had an old backup disk that had failed some time ago but was still
plugged in and occasionally generating <span class="caps">ATA</span> errors. These made some parts
of the upgrade excruciatingly slow, so as soon as I got to a point where
I had to reboot anyway I took the opportunity to open up the case and
unplug it.</p>
</li>
<li>
<p>I hit <a href="https://bugs.debian.org/919621">#919621 “lvm2: Update unexpectedly activates system <span class="caps">ID</span> check,
bypassing impossible”</a>. Fortunately I
noticed the problem before rebooting due to warning messages from various
things, and I adjusted my <span class="caps">LVM</span> configuration to set a system <span class="caps">ID</span> matching
the one in my volume group. Unfortunately I forgot to run
<code>update-initramfs -u</code> after doing so, and so I ended up having to use
<code>break=premount</code> on the kernel command line and fix things up in the same
way in the initramfs until I could update it properly. I’m not sure what
the right fix for this is, although it probably only affects some rather
old VGs; I created mine in 2004.</p>
</li>
<li>
<p>I ran into <a href="https://bugs.debian.org/924881">#924881 “postgresql: buster upgrade breaks older postgresql
(9.6) and newer postgresql (11) is also
inoperative”</a> (in fact a bug in
<code>ssl-cert</code>). It was correct to reject the snakeoil certificate, but the
upgrade failure mode was pretty graceless and it would have been helpful
for something to notice the situation and prompt me to regenerate the certificate.</p>
</li>
<li>
<p>My networking wasn’t happy after the upgrade; I ended up with some
missing addresses, which I’m prepared to believe was the fault of my very
old and badly-organised <code>/etc/network/interfaces</code> file, so I rearranged
it to follow what seems to be the modern best practice of handling
multiple addresses on an interface by just having one <code>iface</code> stanza per
address using the same interface name, rather than <code>pre-up ip addr add</code>
lines or alias interfaces or anything like that. After that, the
interface sometimes refused to come up at all with “<span class="caps">ADDRCONF</span>(NETDEV_UP):
eth0: link is not ready” messages. Some web-searching and grepping of
the kernel source led me to the idea that listing <code>inet6</code> stanzas before
<code>inet</code> stanzas for a given interface name was likely to be helpful, and
so it proved: I now have an <code>/etc/network/interfaces</code> that both works and
is much easier to read.</p>
</li>
<li>
<p>I had to do some manual steps to get Icinga Web 2 authentication working
again: I followed the <a href="https://icinga.com/docs/icingaweb2/latest/doc/80-Upgrading/#upgrading-pgsql-db">upstream directions to upgrade the database
schema</a>,
and I had to run <code>a2enmod php7.3</code> manually since the previous enablement
of <code>php7.0</code> wasn’t carried over. (I’m not completely sure if the first
step was required, but the second certainly was.)</p>
</li>
</ul>
<p>Other than that, everything seems to be working well now.</p>binfmt-support 2.2.02019-01-25T11:21:57+00:002019-01-25T11:21:57+00:00Colin Watsontag:www.chiark.greenend.org.uk,2019-01-25:/~cjwatson/blog/binfmt-support-2.2.0.html<p>I’ve released binfmt-support 2.2.0. These are the major changes since 2.1.8:</p>
<ul>
<li>Remove support for the old procfs interface, which has been unused since
Linux 2.4.13 and which caused trouble in environments where we can’t use
modprobe. Thanks to Bastian Blank.</li>
<li>Sort formats …</li></ul><p>I’ve released binfmt-support 2.2.0. These are the major changes since 2.1.8:</p>
<ul>
<li>Remove support for the old procfs interface, which has been unused since
Linux 2.4.13 and which caused trouble in environments where we can’t use
modprobe. Thanks to Bastian Blank.</li>
<li>Sort formats by name in the output of <code>update-binfmts --display</code>.</li>
<li>Building binfmt-support now requires Autoconf >= 2.63.</li>
<li>Add a new <code>--unimport</code> action, which is the inverse of <code>--import</code>.</li>
<li>Don’t enable formats on import or disable them on unimport unless
<code>/proc/sys/fs/binfmt_misc</code> is already mounted. This avoids causing
cleanup problems in chroots.</li>
<li><code>--fix-binary yes</code> is incompatible with detectors. Warn the user if they
try to use both at once. Thanks to Stefan Agner.</li>
</ul>
<p>In the corresponding Debian upload (2.2.0-1), I’ve changed <span class="caps">README</span>.Debian to
recommend using <code>update-binfmts --unimport <name></code> in the prerm rather than
a more complicated <code>update-binfmts --package <package> --remove <name>
<path></code> command. I don’t intend to push for existing packages to switch
over to this before buster, though, since the stricter package relationships
needed to arrange for a new enough version of binfmt-support to be present
when the prerm runs would make the upgrade path more complicated, and it
isn’t an urgent change.</p>Deploying Swift2018-12-04T01:37:11+00:002018-12-04T01:37:11+00:00Colin Watsontag:www.chiark.greenend.org.uk,2018-12-04:/~cjwatson/blog/deploying-swift.html<p>Sometimes I want to deploy <a href="https://docs.openstack.org/swift/">Swift</a>, the
OpenStack object storage system.</p>
<p>Well, no, that’s not true. I basically never actually want to deploy Swift
as such. What I generally want to do is to debug some bit of production
service deployment machinery that relies on Swift for getting build …</p><p>Sometimes I want to deploy <a href="https://docs.openstack.org/swift/">Swift</a>, the
OpenStack object storage system.</p>
<p>Well, no, that’s not true. I basically never actually want to deploy Swift
as such. What I generally want to do is to debug some bit of production
service deployment machinery that relies on Swift for getting build
artifacts into the right place, or maybe the parts of the
<a href="https://launchpad.net/">Launchpad</a> librarian (our blob storage service)
that use Swift. I could find an existing private or public cloud that
offers the right <span class="caps">API</span> and test with that, but sometimes I need to test with
particular versions, and in any case I have a terribly slow internet
connection and shuffling large build artifacts back and forward over the
relevant bit of wet string makes it painfully slow to test things.</p>
<p>For a while I’ve had an Ubuntu 12.04 <span class="caps">VM</span> lying around with an
<a href="https://releases.openstack.org/icehouse/">Icehouse</a>-based Swift deployment
that I put together by hand. It works, but I didn’t keep good notes and
have no real idea how to reproduce it, not that I really want to keep
limping along with manually-constructed VMs for this kind of thing anyway;
and I don’t want to be dependent on obsolete releases forever. For the
sorts of things I’m doing I need to make sure that authentication works
broadly the same way as it does in a real production deployment, so I want
to have <a href="https://docs.openstack.org/keystone/">Keystone</a> too. At the same
time, I definitely don’t want to do anything close to a full OpenStack
deployment of my own: it’s much too big a sledgehammer for this particular
nut, and I don’t really have the hardware for it.</p>
<p>Here’s my solution to this, which is compact enough that I can run it on my
laptop, and while it isn’t completely automatic it’s close enough that I can
spin it up for a test and discard it when I’m finished (so I haven’t worried
very much about producing something that runs efficiently). It relies on
<a href="https://docs.jujucharms.com/">Juju</a> and
<a href="https://linuxcontainers.org/lxd/"><span class="caps">LXD</span></a>. I’ve only tested it on Ubuntu
18.04, using <a href="https://releases.openstack.org/queens/">Queens</a>; for anything
else you’re on your own. In general, I probably can’t help you if you run
into trouble with the directions here: this is provided “as is”, without
warranty of any kind, and all that kind of thing.</p>
<p>First, install Juju and <span class="caps">LXD</span> if necessary, following the instructions
provided by those projects, and also install the <code>python-openstackclient</code>
package as you’ll need it later. You’ll want to <a href="https://docs.jujucharms.com/2.4/en/tut-lxd">set Juju up to use
<span class="caps">LXD</span></a>, and you should probably
make sure that the shells you’re working in don’t have <code>http_proxy</code> set as
it’s quite likely to confuse things unless you’ve arranged for your proxy to
be able to cope with your local <span class="caps">LXD</span> containers. Then add a
<a href="https://docs.jujucharms.com/2.4/en/juju-concepts#model">model</a>:</p>
<div class="highlight"><pre><span></span><code>juju<span class="w"> </span>add-model<span class="w"> </span>swift
</code></pre></div>
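<p>(If you don’t already have a controller on the local <span class="caps">LXD</span> cloud, the
setup step linked above boils down to something like the following, run before the
<code>add-model</code>; this is a sketch assuming Juju 2.x, where the built-in <span class="caps">LXD</span>
cloud is called <code>localhost</code>, and the controller name here simply matches the one that
appears in the status output later on.)</p>
<div class="highlight"><pre><span></span><code>juju bootstrap localhost lxd
</code></pre></div>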
<p>At this point there’s a bit of complexity that you normally don’t have to
worry about with Juju. The <a href="https://jujucharms.com/swift-storage">swift-storage
charm</a> wants to mount something to use
for storage, which with the <span class="caps">LXD</span> provider in practice ends up being some kind
of loopback mount. Unfortunately, being able to perform loopback mounts
exposes too much kernel attack surface, so <span class="caps">LXD</span> doesn’t allow unprivileged
containers to do it.
(<a href="https://bugs.launchpad.net/charm-swift-storage/+bug/1250965">Ideally</a> the
swift-storage charm would just let you use directory storage instead.) To
make the containers we’re about to create privileged enough for this to
work, run:</p>
<div class="highlight"><pre><span></span><code>lxc<span class="w"> </span>profile<span class="w"> </span><span class="nb">set</span><span class="w"> </span>juju-swift<span class="w"> </span>security.privileged<span class="w"> </span><span class="nb">true</span>
lxc<span class="w"> </span>profile<span class="w"> </span>device<span class="w"> </span>add<span class="w"> </span>juju-swift<span class="w"> </span>loop-control<span class="w"> </span>unix-char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="nv">major</span><span class="o">=</span><span class="m">10</span><span class="w"> </span><span class="nv">minor</span><span class="o">=</span><span class="m">237</span><span class="w"> </span><span class="nv">path</span><span class="o">=</span>/dev/loop-control
<span class="k">for</span><span class="w"> </span>i<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>seq<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">255</span><span class="k">)</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>lxc<span class="w"> </span>profile<span class="w"> </span>device<span class="w"> </span>add<span class="w"> </span>juju-swift<span class="w"> </span>loop<span class="nv">$i</span><span class="w"> </span>unix-block<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="nv">major</span><span class="o">=</span><span class="m">7</span><span class="w"> </span><span class="nv">minor</span><span class="o">=</span><span class="nv">$i</span><span class="w"> </span><span class="nv">path</span><span class="o">=</span>/dev/loop<span class="nv">$i</span>
<span class="k">done</span>
</code></pre></div>
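<p>(Optionally, you can sanity-check the result with the following; it should show
<code>security.privileged: "true"</code> and the loop devices you just added.)</p>
<div class="highlight"><pre><span></span><code>lxc profile show juju-swift
</code></pre></div>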
<p>Now we can start deploying things! Save this to a file, e.g.
<code>swift.bundle</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nt">series</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bionic</span>
<span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s">"Swift</span><span class="nv"> </span><span class="s">in</span><span class="nv"> </span><span class="s">a</span><span class="nv"> </span><span class="s">box"</span>
<span class="nt">applications</span><span class="p">:</span>
<span class="w"> </span><span class="nt">mysql</span><span class="p">:</span>
<span class="w"> </span><span class="nt">charm</span><span class="p">:</span><span class="w"> </span><span class="s">"cs:mysql-62"</span>
<span class="w"> </span><span class="nt">channel</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">candidate</span>
<span class="w"> </span><span class="nt">num_units</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">options</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dataset-size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">512M</span>
<span class="w"> </span><span class="nt">keystone</span><span class="p">:</span>
<span class="w"> </span><span class="nt">charm</span><span class="p">:</span><span class="w"> </span><span class="s">"cs:keystone"</span>
<span class="w"> </span><span class="nt">num_units</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">swift-storage</span><span class="p">:</span>
<span class="w"> </span><span class="nt">charm</span><span class="p">:</span><span class="w"> </span><span class="s">"cs:swift-storage"</span>
<span class="w"> </span><span class="nt">num_units</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">to</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">keystone</span><span class="p p-Indicator">]</span>
<span class="w"> </span><span class="nt">options</span><span class="p">:</span>
<span class="w"> </span><span class="nt">block-device</span><span class="p">:</span><span class="w"> </span><span class="s">"/etc/swift/storage.img|5G"</span>
<span class="w"> </span><span class="nt">swift-proxy</span><span class="p">:</span>
<span class="w"> </span><span class="nt">charm</span><span class="p">:</span><span class="w"> </span><span class="s">"cs:swift-proxy"</span>
<span class="w"> </span><span class="nt">num_units</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">to</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">mysql</span><span class="p p-Indicator">]</span>
<span class="w"> </span><span class="nt">options</span><span class="p">:</span>
<span class="w"> </span><span class="nt">zone-assignment</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">auto</span>
<span class="w"> </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="nt">relations</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"keystone:shared-db"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"mysql:shared-db"</span><span class="p p-Indicator">]</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"swift-proxy:swift-storage"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"swift-storage:swift-storage"</span><span class="p p-Indicator">]</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"swift-proxy:identity-service"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"keystone:identity-service"</span><span class="p p-Indicator">]</span>
</code></pre></div>
<p>And run:</p>
<div class="highlight"><pre><span></span><code>juju<span class="w"> </span>deploy<span class="w"> </span>./swift.bundle
</code></pre></div>
<p>This will take a while. You can run <code>juju status</code> to see how it’s going in
general terms, or <code>juju debug-log</code> for detailed logs from the individual
containers as they’re putting themselves together. When it’s all done, it
should look something like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">Model</span><span class="w"> </span><span class="n">Controller</span><span class="w"> </span><span class="n">Cloud</span><span class="o">/</span><span class="n">Region</span><span class="w"> </span><span class="n">Version</span><span class="w"> </span><span class="n">SLA</span>
<span class="n">swift</span><span class="w"> </span><span class="n">lxd</span><span class="w"> </span><span class="n">localhost</span><span class="w"> </span><span class="mf">2.3</span><span class="o">.</span><span class="mi">1</span><span class="w"> </span><span class="n">unsupported</span>
<span class="n">App</span><span class="w"> </span><span class="n">Version</span><span class="w"> </span><span class="n">Status</span><span class="w"> </span><span class="n">Scale</span><span class="w"> </span><span class="n">Charm</span><span class="w"> </span><span class="n">Store</span><span class="w"> </span><span class="n">Rev</span><span class="w"> </span><span class="n">OS</span><span class="w"> </span><span class="n">Notes</span>
<span class="n">keystone</span><span class="w"> </span><span class="mf">13.0</span><span class="o">.</span><span class="mi">1</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">keystone</span><span class="w"> </span><span class="n">jujucharms</span><span class="w"> </span><span class="mi">290</span><span class="w"> </span><span class="n">ubuntu</span>
<span class="n">mysql</span><span class="w"> </span><span class="mf">5.7</span><span class="o">.</span><span class="mi">24</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">mysql</span><span class="w"> </span><span class="n">jujucharms</span><span class="w"> </span><span class="mi">62</span><span class="w"> </span><span class="n">ubuntu</span>
<span class="n">swift</span><span class="o">-</span><span class="n">proxy</span><span class="w"> </span><span class="mf">2.17</span><span class="o">.</span><span class="mi">0</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">swift</span><span class="o">-</span><span class="n">proxy</span><span class="w"> </span><span class="n">jujucharms</span><span class="w"> </span><span class="mi">75</span><span class="w"> </span><span class="n">ubuntu</span>
<span class="n">swift</span><span class="o">-</span><span class="n">storage</span><span class="w"> </span><span class="mf">2.17</span><span class="o">.</span><span class="mi">0</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">swift</span><span class="o">-</span><span class="n">storage</span><span class="w"> </span><span class="n">jujucharms</span><span class="w"> </span><span class="mi">250</span><span class="w"> </span><span class="n">ubuntu</span>
<span class="n">Unit</span><span class="w"> </span><span class="n">Workload</span><span class="w"> </span><span class="n">Agent</span><span class="w"> </span><span class="n">Machine</span><span class="w"> </span><span class="n">Public</span><span class="w"> </span><span class="n">address</span><span class="w"> </span><span class="n">Ports</span><span class="w"> </span><span class="n">Message</span>
<span class="n">keystone</span><span class="o">/</span><span class="mi">0</span><span class="o">*</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="n">idle</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.133</span><span class="w"> </span><span class="mi">5000</span><span class="o">/</span><span class="n">tcp</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">ready</span>
<span class="n">mysql</span><span class="o">/</span><span class="mi">0</span><span class="o">*</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="n">idle</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.44</span><span class="w"> </span><span class="mi">3306</span><span class="o">/</span><span class="n">tcp</span><span class="w"> </span><span class="n">Ready</span>
<span class="n">swift</span><span class="o">-</span><span class="n">proxy</span><span class="o">/</span><span class="mi">0</span><span class="o">*</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="n">idle</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.44</span><span class="w"> </span><span class="mi">8080</span><span class="o">/</span><span class="n">tcp</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">ready</span>
<span class="n">swift</span><span class="o">-</span><span class="n">storage</span><span class="o">/</span><span class="mi">0</span><span class="o">*</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="n">idle</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.133</span><span class="w"> </span><span class="n">Unit</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">ready</span>
<span class="n">Machine</span><span class="w"> </span><span class="n">State</span><span class="w"> </span><span class="n">DNS</span><span class="w"> </span><span class="n">Inst</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="n">Series</span><span class="w"> </span><span class="n">AZ</span><span class="w"> </span><span class="n">Message</span>
<span class="mi">0</span><span class="w"> </span><span class="n">started</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.133</span><span class="w"> </span><span class="n">juju</span><span class="o">-</span><span class="n">d3e703</span><span class="o">-</span><span class="mi">0</span><span class="w"> </span><span class="n">bionic</span><span class="w"> </span><span class="n">Running</span>
<span class="mi">1</span><span class="w"> </span><span class="n">started</span><span class="w"> </span><span class="mf">10.36</span><span class="o">.</span><span class="mf">63.44</span><span class="w"> </span><span class="n">juju</span><span class="o">-</span><span class="n">d3e703</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="n">bionic</span><span class="w"> </span><span class="n">Running</span>
</code></pre></div>
<p>At this point you have what should be a working installation, but with only
administrative privileges set up. Normally you want to create at least one
normal user. To do this, start by creating a configuration file granting
administrator privileges (this one comes verbatim from the <a href="https://api.jujucharms.com/charmstore/v5/openstack-base/archive/openrc">openstack-base
bundle</a>,
though with one
<a href="https://github.com/openstack-charmers/openstack-charm-testing/commit/720a55eb629653bd78194a0cfbc21864406252d7#diff-291a64b96dd1baf50cca0b4c158a47b2">change</a>
that isn’t yet in the charm store version at the time of writing):</p>
<div class="highlight"><pre><span></span><code><span class="nv">_OS_PARAMS</span><span class="o">=</span><span class="k">$(</span>env<span class="w"> </span><span class="p">|</span><span class="w"> </span>awk<span class="w"> </span><span class="s1">'BEGIN {FS="="} /^OS_/ {print $1;}'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>paste<span class="w"> </span>-sd<span class="w"> </span><span class="s1">' '</span><span class="k">)</span>
<span class="k">for</span><span class="w"> </span>param<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nv">$_OS_PARAMS</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">[</span><span class="w"> </span><span class="s2">"</span><span class="nv">$param</span><span class="s2">"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"OS_AUTH_PROTOCOL"</span><span class="w"> </span><span class="o">]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="k">continue</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">[</span><span class="w"> </span><span class="s2">"</span><span class="nv">$param</span><span class="s2">"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"OS_CACERT"</span><span class="w"> </span><span class="o">]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="k">continue</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
<span class="w"> </span><span class="nb">unset</span><span class="w"> </span><span class="nv">$param</span>
<span class="k">done</span>
<span class="nb">unset</span><span class="w"> </span>_OS_PARAMS
<span class="nv">_keystone_unit</span><span class="o">=</span><span class="k">$(</span>juju<span class="w"> </span>status<span class="w"> </span>keystone<span class="w"> </span>--format<span class="w"> </span>yaml<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>awk<span class="w"> </span><span class="s1">'/units:$/ {getline; gsub(/:$/, ""); print $1; exit}'</span><span class="k">)</span>
<span class="nv">_keystone_ip</span><span class="o">=</span><span class="k">$(</span>juju<span class="w"> </span>run<span class="w"> </span>--unit<span class="w"> </span><span class="si">${</span><span class="nv">_keystone_unit</span><span class="si">}</span><span class="w"> </span><span class="s1">'unit-get private-address'</span><span class="k">)</span>
<span class="nv">_password</span><span class="o">=</span><span class="k">$(</span>juju<span class="w"> </span>run<span class="w"> </span>--unit<span class="w"> </span><span class="si">${</span><span class="nv">_keystone_unit</span><span class="si">}</span><span class="w"> </span><span class="s1">'leader-get admin_passwd'</span><span class="k">)</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_AUTH_URL</span><span class="o">=</span><span class="si">${</span><span class="nv">OS_AUTH_PROTOCOL</span><span class="k">:-</span><span class="nv">http</span><span class="si">}</span>://<span class="si">${</span><span class="nv">_keystone_ip</span><span class="si">}</span>:5000/v3
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_USERNAME</span><span class="o">=</span>admin
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_PASSWORD</span><span class="o">=</span><span class="si">${</span><span class="nv">_password</span><span class="si">}</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_USER_DOMAIN_NAME</span><span class="o">=</span>admin_domain
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_PROJECT_DOMAIN_NAME</span><span class="o">=</span>admin_domain
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_PROJECT_NAME</span><span class="o">=</span>admin
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_REGION_NAME</span><span class="o">=</span>RegionOne
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_IDENTITY_API_VERSION</span><span class="o">=</span><span class="m">3</span>
<span class="c1"># Swift needs this:</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_AUTH_VERSION</span><span class="o">=</span><span class="m">3</span>
<span class="c1"># Gnocchi needs this</span>
<span class="nb">export</span><span class="w"> </span><span class="nv">OS_AUTH_TYPE</span><span class="o">=</span>password
</code></pre></div>
<p>Source this into a shell: for instance, if you saved this to
<code>~/.swiftrc.juju-admin</code>, then run:</p>
<div class="highlight"><pre><span></span><code>. ~/.swiftrc.juju-admin
</code></pre></div>
<p>You should now be able to run <code>openstack endpoint list</code> and see a table for
the various services exposed by your deployment. Then you can create a
dummy project and a user with enough privileges to use Swift:</p>
<div class="highlight"><pre><span></span><code><span class="nv">USERNAME</span><span class="o">=</span>your-username
<span class="nv">PASSWORD</span><span class="o">=</span>your-password
openstack<span class="w"> </span>domain<span class="w"> </span>create<span class="w"> </span>SwiftDomain
openstack<span class="w"> </span>project<span class="w"> </span>create<span class="w"> </span>--domain<span class="w"> </span>SwiftDomain<span class="w"> </span>--description<span class="w"> </span>Swift<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>SwiftProject
openstack<span class="w"> </span>user<span class="w"> </span>create<span class="w"> </span>--domain<span class="w"> </span>SwiftDomain<span class="w"> </span>--project-domain<span class="w"> </span>SwiftDomain<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--project<span class="w"> </span>SwiftProject<span class="w"> </span>--password<span class="w"> </span><span class="s2">"</span><span class="nv">$PASSWORD</span><span class="s2">"</span><span class="w"> </span><span class="s2">"</span><span class="nv">$USERNAME</span><span class="s2">"</span>
openstack<span class="w"> </span>role<span class="w"> </span>add<span class="w"> </span>--project<span class="w"> </span>SwiftProject<span class="w"> </span>--user-domain<span class="w"> </span>SwiftDomain<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--user<span class="w"> </span><span class="s2">"</span><span class="nv">$USERNAME</span><span class="s2">"</span><span class="w"> </span>Member
</code></pre></div>
<p>(This is intended for testing rather than for doing anything particularly
sensitive. If you cared about keeping the password secret then you’d use
the <code>--password-prompt</code> option to <code>openstack user create</code> instead of
supplying the password on the command line.)</p>
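<p>For reference, the more careful variant of the <code>user create</code> step would look
something like this (the same flags as above, just prompting for the password instead of
passing it on the command line):</p>
<div class="highlight"><pre><span></span><code>openstack user create --domain SwiftDomain --project-domain SwiftDomain \
    --project SwiftProject --password-prompt "$USERNAME"
</code></pre></div>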
<p>Now create a configuration file granting privileges for the user you just
created. I felt like automating this to at least some degree:</p>
<div class="highlight"><pre><span></span><code>touch ~/.swiftrc.juju
chmod 600 ~/.swiftrc.juju
sed '/^_password=/d;
s/\( OS_PROJECT_DOMAIN_NAME=\).*/\1SwiftDomain/;
s/\( OS_PROJECT_NAME=\).*/\1SwiftProject/;
s/\( OS_USER_DOMAIN_NAME=\).*/\1SwiftDomain/;
s/\( OS_USERNAME=\).*/\1'"$USERNAME"'/;
s/\( OS_PASSWORD=\).*/\1'"$PASSWORD"'/' \
<~/.swiftrc.juju-admin >~/.swiftrc.juju
</code></pre></div>
<p>Source this into a shell. For example:</p>
<div class="highlight"><pre><span></span><code>. ~/.swiftrc.juju
</code></pre></div>
<p>You should now find that <code>swift list</code> works. Success! Now you can <code>swift
upload</code> files, or just start testing whatever it was that you were actually
trying to test in the first place.</p>
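<p>For example (with a made-up container and file name):</p>
<div class="highlight"><pre><span></span><code>swift post test-container                      # create a container
swift upload test-container ./artifact.tar.gz  # upload a file into it
swift list test-container                      # ... and check that it arrived
</code></pre></div>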
<p>This is not a setup I expect to leave running for a long time, so to tear it
down again:</p>
<div class="highlight"><pre><span></span><code>juju destroy-model swift
</code></pre></div>
<p>This will probably get stuck trying to remove the <code>swift-storage</code> unit,
since nothing deals with detaching the loop device. If that happens, find
the relevant device in <code>losetup -a</code> from another window and use <code>losetup -d</code>
to detach it; <code>juju destroy-model</code> should then be able to proceed.</p>
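<p>Concretely, that recovery looks something like this (the loop device number will vary;
substitute whatever <code>losetup -a</code> shows for the storage image):</p>
<div class="highlight"><pre><span></span><code>losetup -a | grep storage.img   # find the loop device backing /etc/swift/storage.img
sudo losetup -d /dev/loop3      # detach it, using the device shown above
</code></pre></div>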
<p>Credit to the Juju and <span class="caps">LXD</span> teams and to the maintainers of the various
charms used here, as well as of course to the OpenStack folks: their work
made it very much easier to put this together.</p>
<p><em>2019-01-18: Edited to deploy to two containers rather than four, and to
incorporate a <code>~/.swiftrc.juju-admin</code> change to cope with that.</em></p>An odd test failure2017-12-19T13:52:52+00:002017-12-19T13:52:52+00:00Colin Watsontag:www.chiark.greenend.org.uk,2017-12-19:/~cjwatson/blog/odd-test-failure.html<p>Weird test failures are great at teaching you things that you didn’t realise
you might need to know.</p>
<p><a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/mysterious-bug-with-twisted-plugins.html">As previously
mentioned</a>, I’ve been
working on converting Launchpad from <a href="http://www.buildout.org/">Buildout</a> to
<a href="https://virtualenv.pypa.io/en/stable/">virtualenv</a> and
<a href="https://pip.pypa.io/en/stable/">pip</a>, and I finally landed that change on
our development branch today. The final landing was …</p><p>Weird test failures are great at teaching you things that you didn’t realise
you might need to know.</p>
<p><a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/mysterious-bug-with-twisted-plugins.html">As previously
mentioned</a>, I’ve been
working on converting Launchpad from <a href="http://www.buildout.org/">Buildout</a> to
<a href="https://virtualenv.pypa.io/en/stable/">virtualenv</a> and
<a href="https://pip.pypa.io/en/stable/">pip</a>, and I finally landed that change on
our development branch today. The final landing was mostly quite smooth,
except for one test failure on our buildbot that I hadn’t seen before:</p>
<div class="highlight"><pre><span></span><code><span class="x">ERROR: lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked</span>
<span class="x">worker ID: unknown worker (bug in our subunit output?)</span>
<span class="x">----------------------------------------------------------------------</span>
<span class="gt">Traceback (most recent call last):</span>
<span class="gr">_StringException</span>: <span class="n">log: {{{</span>
<span class="x">36.384 creating repository in file:///tmp/testbzr-6CwSLV.tmp/lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked/work/stacked-on/.bzr/.</span>
<span class="x">36.388 creating branch <bzrlib.branch.BzrBranchFormat7 object at 0xeb85b36c> in file:///tmp/testbzr-6CwSLV.tmp/lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked/work/stacked-on/</span>
<span class="x">}}}</span>
<span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/lib/lp/codehosting/codeimport/tests/test_worker.py"</span>, line <span class="m">1108</span>, in <span class="n">test_stacked</span>
<span class="w"> </span><span class="n">stacked_on</span><span class="o">.</span><span class="n">fetch</span><span class="p">(</span><span class="n">Branch</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">source_details</span><span class="o">.</span><span class="n">url</span><span class="p">))</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/branch.py"</span>, line <span class="m">186</span>, in <span class="n">open</span>
<span class="w"> </span><span class="n">possible_transports</span><span class="o">=</span><span class="n">possible_transports</span><span class="p">,</span> <span class="n">_unsupported</span><span class="o">=</span><span class="n">_unsupported</span><span class="p">)</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py"</span>, line <span class="m">689</span>, in <span class="n">open</span>
<span class="w"> </span><span class="n">_unsupported</span><span class="o">=</span><span class="n">_unsupported</span><span class="p">)</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py"</span>, line <span class="m">718</span>, in <span class="n">open_from_transport</span>
<span class="w"> </span><span class="n">find_format</span><span class="p">,</span> <span class="n">transport</span><span class="p">,</span> <span class="n">redirected</span><span class="p">)</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/transport/__init__.py"</span>, line <span class="m">1719</span>, in <span class="n">do_catching_redirections</span>
<span class="w"> </span><span class="k">return</span> <span class="n">action</span><span class="p">(</span><span class="n">transport</span><span class="p">)</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py"</span>, line <span class="m">706</span>, in <span class="n">find_format</span>
<span class="w"> </span><span class="n">probers</span><span class="o">=</span><span class="n">probers</span><span class="p">)</span>
File <span class="nb">"/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py"</span>, line <span class="m">1155</span>, in <span class="n">find_format</span>
<span class="w"> </span><span class="k">raise</span> <span class="n">errors</span><span class="o">.</span><span class="n">NotBranchError</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="n">transport</span><span class="o">.</span><span class="n">base</span><span class="p">)</span>
<span class="gr">NotBranchError</span>: <span class="n">Not a branch: "/tmp/tmpdwqrc6/trunk/".</span>
</code></pre></div>
<p>When I investigated this locally, I found that I could reproduce it if I ran
just that test on its own, but not if I ran it together with the other tests
in the same class. That’s certainly my favourite way round for test
isolation failures to present themselves (it’s more usual to find state from
one test leaking out and causing another one to fail, which can make for a
very time-consuming exercise of trying to find the critical combination),
but it’s still pretty odd.</p>
<p>I stepped through the <code>Branch.open</code> call in each case in the hope of some
enlightenment. The interesting difference was that the custom probers
installed by the <code>bzr-svn</code> plugin weren’t installed when I ran that one test
on its own, so it was trying to open a branch as a Bazaar branch rather than
using the foreign-branch logic for Subversion, and this presumably depended
on some configuration that only some tests put in place. I was on the verge
of just explicitly setting up that plugin in the test suite’s <code>setUp</code>
method, but I was still curious about exactly what was breaking this.</p>
<p>Launchpad installs several Bazaar plugins, and
<code>lib/lp/codehosting/__init__.py</code> is responsible for putting most of these in
place: anything in Launchpad itself that uses Bazaar is generally supposed
to do something like <code>import lp.codehosting</code> to set everything up. I
therefore put a breakpoint at the top of <code>lp.codehosting</code> and stepped
through it to see whether anything was going wrong in the initial setup.
Sure enough, I found that <code>bzrlib.plugins.svn</code> was failing to import due to
an exception raised by <code>bzrlib.i18n.load_plugin_translations</code>, which was
being swallowed silently but meant that its custom probers weren’t being
installed. Here’s what that function looks like:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">load_plugin_translations</span><span class="p">(</span><span class="n">domain</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Load the translations for a specific plugin.</span>
<span class="sd"> :param domain: Gettext domain name (usually 'bzr-PLUGINNAME')</span>
<span class="sd"> """</span>
<span class="n">locale_base</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span>
<span class="n">unicode</span><span class="p">(</span><span class="vm">__file__</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">getfilesystemencoding</span><span class="p">()))</span>
<span class="n">translation</span> <span class="o">=</span> <span class="n">install_translations</span><span class="p">(</span><span class="n">domain</span><span class="o">=</span><span class="n">domain</span><span class="p">,</span>
<span class="n">locale_base</span><span class="o">=</span><span class="n">locale_base</span><span class="p">)</span>
<span class="n">add_fallback</span><span class="p">(</span><span class="n">translation</span><span class="p">)</span>
<span class="k">return</span> <span class="n">translation</span>
</code></pre></div>
<p>In this case, <code>sys.getfilesystemencoding</code> was returning <code>None</code>, which isn’t
a valid <code>encoding</code> argument to <code>unicode</code>. But why would that be? It gave
me a sensible result when I ran it from a Python shell in this environment.
A bit of head-scratching later and it occurred to me to look at a backtrace:</p>
<div class="highlight"><pre><span></span><code>(Pdb) bt
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(703)<module>()
-> main()
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(694)main()
-> execsitecustomize()
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(548)execsitecustomize()
-> import sitecustomize
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/sitecustomize.py(7)<module>()
-> lp_sitecustomize.main()
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp_sitecustomize.py(193)main()
-> dont_wrap_bzr_branch_classes()
/home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp_sitecustomize.py(139)dont_wrap_bzr_branch_classes()
-> import lp.codehosting
> /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp/codehosting/__init__.py(54)<module>()
-> load_plugins([_get_bzr_plugins_path()])
</code></pre></div>
<p>I wonder if there’s something interesting about being imported from a
<code>sitecustomize</code> hook? Sure enough, when I went to look at Python for where
<code>sys.getfilesystemencoding</code> is set up, I found this in <code>Py_InitializeEx</code>:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">Py_NoSiteFlag</span><span class="p">)</span>
<span class="w"> </span><span class="n">initsite</span><span class="p">();</span><span class="w"> </span><span class="cm">/* Module site */</span>
<span class="w"> </span><span class="p">...</span>
<span class="cp">#if defined(Py_USING_UNICODE) && defined(HAVE_LANGINFO_H) && defined(CODESET)</span>
<span class="w"> </span><span class="cm">/* On Unix, set the file system encoding according to the</span>
<span class="cm"> user's preference, if the CODESET names a well-known</span>
<span class="cm"> Python codec, and Py_FileSystemDefaultEncoding isn't</span>
<span class="cm"> initialized by other means. Also set the encoding of</span>
<span class="cm"> stdin and stdout if these are terminals, unless overridden. */</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">overridden</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="n">Py_FileSystemDefaultEncoding</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>I <a href="https://code.launchpad.net/~cjwatson/launchpad/avoid-importing-bzr-plugins-from-site/+merge/335379">moved this out of
sitecustomize</a>,
and it’s working better now. But did you know that a <code>sitecustomize</code> hook
can’t safely use anything that depends on <code>sys.getfilesystemencoding</code>? I
certainly didn’t, until it bit me.</p>Kitten Block equivalent for Firefox 572017-11-16T19:15:48+00:002017-11-16T19:15:48+00:00Colin Watsontag:www.chiark.greenend.org.uk,2017-11-16:/~cjwatson/blog/kitten-block-equivalent-for-firefox-57.html<p>I’ve been using <a href="https://addons.mozilla.org/en-US/firefox/addon/kitten-block/">Kitten
Block</a> for
years, since I don’t really need the blood pressure spike caused by
accidentally following links to certain <span class="caps">UK</span> newspapers. Unfortunately it
hasn’t been ported to Firefox 57. I tried emailing the author a couple of
months ago, but my email bounced …</p><p>I’ve been using <a href="https://addons.mozilla.org/en-US/firefox/addon/kitten-block/">Kitten
Block</a> for
years, since I don’t really need the blood pressure spike caused by
accidentally following links to certain <span class="caps">UK</span> newspapers. Unfortunately it
hasn’t been ported to Firefox 57. I tried emailing the author a couple of
months ago, but my email bounced.</p>
<p>However, if your primary goal is just to block the websites in question
rather than seeing kitten pictures as such (let’s face it, the internet is
not short of alternative sources of kitten pictures), then it’s easy to do
with <a href="https://addons.mozilla.org/en-GB/firefox/addon/ublock-origin/">uBlock
Origin</a>.
After installing the extension if necessary, go to Tools → Add-ons →
Extensions → uBlock Origin → Preferences → My filters, and add
<code>www.dailymail.co.uk</code> and <code>www.express.co.uk</code>, each on its own line. (Of
course you can easily add more if you like.) Voilà: instant tranquility.</p>
<p>Incidentally, this also works fine on Android. The fact that it was easy to
install a good ad blocker without having to mess about with a rooted device
or strange proxy settings was the main reason I switched to Firefox on my phone.</p>A mysterious bug with Twisted plugins2017-09-26T11:20:14-04:002017-09-26T11:20:14-04:00Colin Watsontag:www.chiark.greenend.org.uk,2017-09-26:/~cjwatson/blog/mysterious-bug-with-twisted-plugins.html<p>I fixed a bug in Launchpad recently that led me deeper than I expected.</p>
<p>Launchpad uses <a href="http://www.buildout.org/">Buildout</a> as its build system for
Python packages, and it’s served us well for many years. However, we’re
using 1.7.1, which doesn’t support ensuring that packages required using
setuptools …</p><p>I fixed a bug in Launchpad recently that led me deeper than I expected.</p>
<p>Launchpad uses <a href="http://www.buildout.org/">Buildout</a> as its build system for
Python packages, and it’s served us well for many years. However, we’re
using 1.7.1, which doesn’t support ensuring that packages required using
setuptools’ <a href="https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords">setup_requires
keyword</a>
only ever come from the local index <span class="caps">URL</span> when one is specified; that’s an
essential constraint we need to be able to impose so that our build system
isn’t immediately sensitive to downtime or changes in PyPI. There are
various issues/PRs about this in Buildout (e.g.
<a href="https://github.com/buildout/buildout/pull/238">#238</a>), but even if those
are fixed it’ll almost certainly only be in Buildout v2, and upgrading to
that is its own kettle of fish for other reasons. All this is a serious
problem for us because newer versions of many of our vital dependencies
(<a href="http://twistedmatrix.com/">Twisted</a> and
<a href="https://pypi.python.org/pypi/testtools">testtools</a>, to name but two) use
<code>setup_requires</code> to pull in <a href="https://pypi.python.org/pypi/pbr">pbr</a>, and so
we’ve been stuck on old versions for some time; this is part of why
Launchpad doesn’t yet support newer <span class="caps">SSH</span> key types, for instance. This
situation obviously isn’t sustainable.</p>
<p>To deal with this, I’ve been working for some time on switching to
<a href="https://virtualenv.pypa.io/en/stable/">virtualenv</a> and
<a href="https://pip.pypa.io/en/stable/">pip</a>. This is harder than you might think:
Launchpad is a long-lived and complicated project, and it had quite a number
of explicit and implicit dependencies on Buildout’s configuration and
behaviour. Upgrading our infrastructure from Ubuntu 12.04 to 16.04 has
helped a lot (12.04’s baseline virtualenv and pip have some deficiencies
that would have required a more complicated bootstrapping procedure). I’ve
dealt with most of these: for example, I had to reorganise a lot of our
helper scripts
(<a href="https://code.launchpad.net/~cjwatson/launchpad/simplify-buildout-bin-python-easy/+merge/314976">1</a>,
<a href="https://code.launchpad.net/~cjwatson/launchpad/simplify-buildout-bin-shell/+merge/314973">2</a>,
<a href="https://code.launchpad.net/~cjwatson/launchpad/simplify-buildout-bin-test/+merge/323743">3</a>),
but there are still a few more things to go.</p>
<p>One remaining problem was that our Buildout configuration relied on building
several different environments with different Python paths for various
things. While this would technically be possible by way of building
multiple virtualenvs, this would inflate our build time even further (we’re
already going to have to cope with some slowdown as a result of using
virtualenv, because the build system now has to do a lot more than
constructing a glorified link farm to a bunch of cached eggs), and it seems
like unnecessary complexity. The obvious thing to do seemed to be to
collapse these into a single environment, since there was no obvious reason
why it should actually matter if
<a href="https://pypi.python.org/pypi/txpkgupload">txpkgupload</a> and
<a href="https://pypi.python.org/pypi/txlongpoll">txlongpoll</a> were carefully kept
off the path when running most of Launchpad: so <a href="https://code.launchpad.net/~cjwatson/launchpad/simplify-buildout-recipes/+merge/330159">I did
that</a>.</p>
<p>Then our build system <a href="http://lpbuildbot.canonical.com/builders/lp-devel-precise/builds/1582/steps/shell_9/logs/summary">got very
sad</a>.</p>
<p>Hmm, I thought. To keep our test times somewhat manageable, we run them in
parallel across 20 containers, and we randomise the order in which they run
to try to shake out test isolation bugs. It’s not completely unknown for
there to be some oddities resulting from that. So I ran it again. <a href="http://lpbuildbot.canonical.com/builders/lp-devel-precise/builds/1583/steps/shell_9/logs/summary">Nope,
but slightly differently sad this
time</a>.
Furthermore, I couldn’t reproduce these failures locally no matter how hard
I tried. Oh dear. This was obviously not going to be a good day.</p>
<p>In fact I spent a while on various different guesswork-based approaches. I
found <a href="https://bugs.launchpad.net/ampoule/+bug/571334">bug 571334</a> in
Ampoule, an <span class="caps">AMP</span>-based process pool implementation that we use for some job
runners, and proposed a
<a href="https://code.launchpad.net/~cjwatson/ampoule/process-error-not-ready/+merge/330848">fix</a>
for that, but cherry-picking that fix into Launchpad didn’t help matters. I
tried backing out subsets of my changes and determined that if both
<code>txlongpoll</code> and <code>txpkgupload</code> were absent from the Python module path in
the context of the tests in question then everything was fine. I tried
running <code>strace</code> locally and staring at the output for some time in the hope
of enlightenment: that reminded me that the two packages in question install
modules under <code>twisted.plugins</code>, which did at least establish a reason they
might affect the environment that was more plausible than magic, but nothing
much more specific than that.</p>
<p>On Friday I was fiddling about with this again and trying to insert some
more debugging when I noticed some interesting behaviour around <a href="https://twistedmatrix.com/documents/current/core/howto/plugin.html#plugin-caching">plugin
caching</a>.
If I caused the <code>txpkgupload</code> plugin to raise an exception when loaded, the
Twisted plugin system would remove its <code>dropin.cache</code> (because it was stale)
and not create a new one (because there was now no content to put in it).
After that, running the relevant tests would fail as I’d seen in our
buildbot. Aha! This meant that I could also reproduce it by doing an even
cleaner build than I’d previously tried to do, by removing the cached
<code>txpkgupload</code> and <code>txlongpoll</code> eggs and allowing the build system to
recreate them. When they were recreated, they didn’t contain
<code>dropin.cache</code>, instead allowing that to be created on first use.</p>
<p>Based on this clue I was able to get to the answer relatively quickly.
Ampoule has a specialised bootstrapping sequence for its worker processes
that starts by doing this:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">twisted.application</span> <span class="kn">import</span> <span class="n">reactors</span>
<span class="n">reactors</span><span class="o">.</span><span class="n">installReactor</span><span class="p">(</span><span class="n">reactor</span><span class="p">)</span>
</code></pre></div>
<p>Now, <code>twisted.application.reactors.installReactor</code> calls
<code>twisted.plugin.getPlugins</code>, so the very start of this bootstrapping
sequence is going to involve loading all plugins found on the module path (I
assume it’s possible to write a plugin that adds an alternative reactor
implementation). If <code>dropin.cache</code> is up to date, then it will just get the
information it needs from that; but if it isn’t, it will go ahead and import
the plugin. If the plugin happens (as Twisted code often does) to run <code>from
twisted.internet import reactor</code> at some point while being imported, then
that will install the platform’s default reactor, and <em>then</em>
<code>twisted.application.reactors.installReactor</code> will raise
<code>ReactorAlreadyInstalledError</code>. Since Ampoule turns this into an info-level
log message for some reason, and the tests in question only passed through
error-level messages or higher, this meant that all we could see was that a
worker process had exited non-zero but not why.</p>
<p>The Twisted documentation
<a href="https://twistedmatrix.com/documents/current/core/howto/plugin.html#plugin-caching">recommends</a>
generating the plugin cache at build time for other reasons, but we weren’t
doing that. <a href="https://code.launchpad.net/~cjwatson/launchpad/build-twisted-plugin-cache/+merge/331240">Fixing
that</a>
makes everything work again.</p>
<p>There are still a few more things needed to get us onto pip, but we’re now
pretty close. After that we can finally start bringing our dependencies up
to date.</p>env —chdir2017-08-30T00:54:42+01:002017-08-30T00:54:42+01:00Colin Watsontag:www.chiark.greenend.org.uk,2017-08-30:/~cjwatson/blog/env-chdir.html<p>I was recently asked to sort things out so that
<a href="https://snapcraft.io/">snap</a> builds on <a href="https://launchpad.net/">Launchpad</a>
could themselves install snaps as build-dependencies. To make this work we
need to start doing builds in <a href="https://linuxcontainers.org/lxd/"><span class="caps">LXD</span>
containers</a> rather than in chroots. As a
result I’ve been doing some quite extensive refactoring of
<a href="https://launchpad.net/launchpad-buildd">launchpad-buildd …</a></p><p>I was recently asked to sort things out so that
<a href="https://snapcraft.io/">snap</a> builds on <a href="https://launchpad.net/">Launchpad</a>
could themselves install snaps as build-dependencies. To make this work we
need to start doing builds in <a href="https://linuxcontainers.org/lxd/"><span class="caps">LXD</span>
containers</a> rather than in chroots. As a
result I’ve been doing some quite extensive refactoring of
<a href="https://launchpad.net/launchpad-buildd">launchpad-buildd</a>: it previously
had the assumption that it was going to use a chroot for everything baked
into lots of untested helper shell scripts, and I’ve been rewriting those in
Python with unit tests and with a single <code>Backend</code> abstraction that isolates
the high-level logic from the details of where each build is being performed.</p>
<p>This is all interesting work in its own right, but it’s not what I want to
talk about here. While I was doing all this refactoring, I ran across a
couple of methods I wrote a while back which looked something like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">chroot</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">echo</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Run a command in the chroot.</span>
<span class="sd"> :param args: the command and arguments to run.</span>
<span class="sd"> """</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">set_personality</span><span class="p">(</span>
<span class="n">args</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">arch</span><span class="p">,</span> <span class="n">series</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">series</span><span class="p">)</span>
<span class="k">if</span> <span class="n">echo</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Running in chroot: </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span>
<span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s2">"'</span><span class="si">%s</span><span class="s2">'"</span> <span class="o">%</span> <span class="n">arg</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">))</span>
<span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">check_call</span><span class="p">([</span>
<span class="s2">"/usr/bin/sudo"</span><span class="p">,</span> <span class="s2">"/usr/sbin/chroot"</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">chroot_path</span><span class="p">]</span> <span class="o">+</span> <span class="n">args</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">run_build_command</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">env</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">echo</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Run a build command in the chroot.</span>
<span class="sd"> This is unpleasant because we need to run it in /build under sudo</span>
<span class="sd"> chroot, and there's no way to do this without either a helper</span>
<span class="sd"> program in the chroot or unpleasant quoting. We go for the</span>
<span class="sd"> unpleasant quoting.</span>
<span class="sd"> :param args: the command and arguments to run.</span>
<span class="sd"> :param env: dictionary of additional environment variables to set.</span>
<span class="sd"> """</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="n">shell_escape</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span>
<span class="k">if</span> <span class="n">env</span><span class="p">:</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"env"</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span>
<span class="s2">"</span><span class="si">%s</span><span class="s2">=</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">shell_escape</span><span class="p">(</span><span class="n">value</span><span class="p">))</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">env</span><span class="o">.</span><span class="n">items</span><span class="p">()]</span> <span class="o">+</span> <span class="n">args</span>
<span class="n">command</span> <span class="o">=</span> <span class="s2">"cd /build && </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="s2">" "</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">chroot</span><span class="p">([</span><span class="s2">"/bin/sh"</span><span class="p">,</span> <span class="s2">"-c"</span><span class="p">,</span> <span class="n">command</span><span class="p">],</span> <span class="n">echo</span><span class="o">=</span><span class="n">echo</span><span class="p">)</span>
</code></pre></div>
<p>(I’ve already replaced the <code>chroot</code> method with a call to <code>Backend.run</code>, but
it’s easier to see what I’m talking about in the original form.)</p>
<p>One thing to notice about this code is that it uses several <em>adverbial</em>
commands: that is, commands that run another command in a different way.
For example, <code>sudo</code> runs another command as another user, while <code>chroot</code>
runs another command with a different root directory, and <code>env</code> runs another
command with different environment variables set. These commands chain
neatly, and they also have the useful property that they take the subsidiary
command and its arguments as a list of arguments. coreutils has <a href="https://www.gnu.org/software/coreutils/manual/html_node/Modified-command-invocation.html">several
other
commands</a>
that behave this way, and
<a href="http://www.greenend.org.uk/rjk/sw/adverbio.html">adverbio</a> is another
useful example.</p>
<p>By contrast, <code>su -c</code> is something you might call a “quasi-adverbial”
command: it does modify the behaviour of another command, but it takes it as
a single argument which it then passes to <code>sh -c</code>. Every time you have
something that’s passed to a shell like this, you need a corresponding layer
of shell quoting to escape any shell metacharacters that should be
interpreted literally. This is often cumbersome and is easy to get wrong.
My Python implementation is as follows, and I wouldn’t be totally surprised
to discover that it contained a bug:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">re</span>
<span class="n">non_meta_re</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^[a-zA-Z0-9+,./:=@_-]+$'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">shell_escape</span><span class="p">(</span><span class="n">arg</span><span class="p">):</span>
<span class="k">if</span> <span class="n">non_meta_re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">arg</span><span class="p">):</span>
<span class="k">return</span> <span class="n">arg</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="s2">"'</span><span class="si">%s</span><span class="s2">'"</span> <span class="o">%</span> <span class="n">arg</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"'"</span><span class="p">,</span> <span class="s2">"'</span><span class="se">\\</span><span class="s2">''"</span><span class="p">)</span>
</code></pre></div>
<p>Python >= 3.3 has
<a href="https://docs.python.org/3/library/shlex#shlex.quote">shlex.quote</a>, which is
an improvement and we should probably use that instead, but it’s still
another thing to forget to call. This is why process-spawning libraries
such as Python’s <a href="https://docs.python.org/3/library/subprocess">subprocess</a>,
Perl’s <a href="http://perldoc.perl.org/functions/system.html">system</a> and
<a href="http://perldoc.perl.org/functions/open.html">open</a>, and my own
<a href="http://libpipeline.nongnu.org/">libpipeline</a> for C encourage programmers to
use a list syntax and to avoid involving the shell entirely wherever possible.</p>
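<p>As a small illustration of the difference (this isn’t code from
launchpad-buildd, and the filename is made up), passing a list to
<code>subprocess</code> needs no quoting at all, while anything that goes through
<code>sh -c</code> needs every argument quoted explicitly:</p>
<div class="highlight"><pre><span></span><code>import shlex
import subprocess

filename = "it's got spaces &amp; quotes.txt"

# Preferred: pass an argument list and never involve the shell, so there is
# no quoting layer to get wrong.
subprocess.check_call(["touch", filename])

# If a shell really is unavoidable (say, a remote "sh -c"), every argument
# interpolated into the command string must be quoted; forgetting
# shlex.quote here is the classic bug.
command = "ls -l %s" % shlex.quote(filename)
subprocess.check_call(["/bin/sh", "-c", command])
</code></pre></div>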
<p>One thing that the standard Unix tools don’t let you do in an adverbial way
is to change your working directory, and I’ve run into this annoying
limitation several times. This means that it’s difficult to chain that
operation together with other adverbs, for example to run a command in a
particular working directory inside a chroot. The workaround I used above
was to invoke a shell that runs <code>cd /build && ...</code>, but that’s another
command that’s only quasi-adverbial, since the extra shell means an extra
layer of shell quoting.</p>
<p>(Ian Jackson rightly observes that you can in fact write the necessary
adverb as something like <code>sh -ec 'cd "$1"; shift; exec "$@"' chdir</code>. I
think that’s a bit uglier than I ideally want to use in production code, but
you might reasonably think that it’s worth it to avoid the extra layer of
shell quoting.)</p>
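<p>Expressed as an argument list, that adverb does at least compose with the
other adverbs without adding another quoting layer for the caller’s own
arguments. A rough sketch (the chroot path and build command here are hypothetical):</p>
<div class="highlight"><pre><span></span><code>import subprocess

# The directory and the command are passed as separate positional arguments
# rather than being interpolated into the shell script, so no escaping is
# needed for them.
chdir_adverb = ["sh", "-ec", 'cd "$1"; shift; exec "$@"', "chdir"]

subprocess.check_call(
    ["sudo", "chroot", "/srv/chroot/build"]
    + chdir_adverb
    + ["/build", "dpkg-buildpackage", "-b"])
</code></pre></div>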
<p>I therefore decided that this was a feature that belonged in
<a href="https://www.gnu.org/software/coreutils/">coreutils</a>, and after <a href="https://lists.gnu.org/archive/html/coreutils/2017-08/msg00053.html">a bit of
mailing list
discussion</a>
we felt it was best implemented as a new option to
<a href="https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html">env(1)</a>.
I sent a patch for this which has been
<a href="https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=57dea5ed07471b2192cc5edf08993e663a3f6802">accepted</a>.
This means that we have a new composable adverb, <code>env --chdir=NEWDIR</code>, which
will allow the <code>run_build_command</code> method above to be rewritten as something
like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">run_build_command</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">env</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">echo</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Run a build command in the chroot.</span>
<span class="sd"> :param args: the command and arguments to run.</span>
<span class="sd"> :param env: dictionary of additional environment variables to set.</span>
<span class="sd"> """</span>
<span class="n">env_args</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"env"</span><span class="p">,</span> <span class="s2">"--chdir=/build"</span><span class="p">]</span>
<span class="k">if</span> <span class="n">env</span><span class="p">:</span>
<span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">env</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">env_args</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"</span><span class="si">%s</span><span class="s2">=</span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>
<span class="bp">self</span><span class="o">.</span><span class="n">chroot</span><span class="p">(</span><span class="n">env_args</span> <span class="o">+</span> <span class="n">args</span><span class="p">,</span> <span class="n">echo</span><span class="o">=</span><span class="n">echo</span><span class="p">)</span>
</code></pre></div>
<p>The <code>env --chdir</code> option will be in coreutils 8.28. We won’t be able to use
it in launchpad-buildd until that’s available in all Ubuntu series we might
want to build for, so in this particular application that’s going to take a
few years; but other applications may well be able to make use of it sooner.</p>New address book2017-06-27T12:57:27+01:002017-06-27T12:57:27+01:00Colin Watsontag:www.chiark.greenend.org.uk,2017-06-27:/~cjwatson/blog/new-address-book.html<p>I’ve had a kludgy mess of electronic address books for most of two decades,
and have got rather fed up with it. My stack consisted of:</p>
<ul>
<li><code>~/.mutt/aliases</code>, a flat text file consisting of <code>mutt</code> <code>alias</code> commands</li>
<li><a href="http://www.spinnaker.de/lbdb/">lbdb</a> configuration to query
<code>~/.mutt/aliases</code>, Debian’s <span class="caps">LDAP</span> database, and Canonical …</li></ul><p>I’ve had a kludgy mess of electronic address books for most of two decades,
and have got rather fed up with it. My stack consisted of:</p>
<ul>
<li><code>~/.mutt/aliases</code>, a flat text file consisting of <code>mutt</code> <code>alias</code> commands</li>
<li><a href="http://www.spinnaker.de/lbdb/">lbdb</a> configuration to query
<code>~/.mutt/aliases</code>, Debian’s <span class="caps">LDAP</span> database, and Canonical’s <span class="caps">LDAP</span> database,
so that I can search by name with Ctrl-t in <code>mutt</code> when composing a new message</li>
<li>Google Contacts, which I used from Android and was completely separate
from all of the above</li>
</ul>
<p>The biggest practical problem with this was that I had the address book that
was most convenient for me to add things to (Google Contacts) and the one I
used when sending email, and no sensible way to merge them or move things
between them. I also wasn’t especially comfortable with having all my
contact information in a proprietary web service.</p>
<p>My goals for a replacement address book system were:</p>
<ul>
<li>free software throughout</li>
<li>storage under my control</li>
<li>single common database</li>
<li>minimal manual transcription when consolidating existing databases</li>
<li>integration with Android such that I can continue using the same
contacts, messaging, etc. apps</li>
<li>integration with <code>mutt</code> such that I can continue using the same query interface</li>
<li>not having to write my own software, because honestly</li>
</ul>
<p>I think I have all this now!</p>
<h2>New stack</h2>
<p>The obvious basic technology to use is
<a href="https://en.wikipedia.org/wiki/CardDAV">CardDAV</a>: it’s fairly complex,
admittedly, but lots of software supports it and one of my goals was not
having to write my own thing. This meant I needed a CardDAV server, some
way to sync the database to and from both Android and the system where I run
<code>mutt</code>, and whatever query glue was necessary to get <code>mutt</code> to understand vCards.</p>
<p>There are lots of different alternatives here, and if anything the problem
was an embarrassment of choice. In the end I just decided to go for things
that looked roughly the right shape for me and tried not to spend too much
time in analysis paralysis.</p>
<h3>CardDAV server</h3>
<p>I went with <a href="https://www.jelmer.uk/xandikos-intro.html">Xandikos</a> for the
server, largely because I know Jelmer and have generally had pretty good
experiences with their software, but also because using Git for history of
the backend storage seems like something my future self will thank me for.</p>
<p>It isn’t packaged in stretch, but it’s in Debian unstable, so I installed it
from there.</p>
<p>Rather than the standalone mode suggested on the web page, I decided to set
it up in what felt like a more robust way using <span class="caps">WSGI</span>. I installed
<code>gunicorn</code> and <code>python3-gunicorn</code>, created the following file in
<code>/etc/systemd/system/xandikos.socket</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Xandikos socket</span>
<span class="k">[Socket]</span>
<span class="na">ListenStream</span><span class="o">=</span><span class="s">/run/xandikos.socket</span>
<span class="k">[Install]</span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">sockets.target</span>
</code></pre></div>
<p>… and the following file in <code>/etc/systemd/system/xandikos.service</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Xandikos CalDAV/CardDAV server</span>
<span class="na">Documentation</span><span class="o">=</span><span class="s">man:xandikos(1)</span>
<span class="na">Requires</span><span class="o">=</span><span class="s">xandikos.socket</span>
<span class="k">[Service]</span>
<span class="na">User</span><span class="o">=</span><span class="s">xandikos</span>
<span class="na">Group</span><span class="o">=</span><span class="s">xandikos</span>
<span class="na">Restart</span><span class="o">=</span><span class="s">on-failure</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/bin/python3 /usr/bin/gunicorn --bind=unix:/run/xandikos.socket xandikos.wsgi:app</span>
<span class="na">ExecReload</span><span class="o">=</span><span class="s">/bin/kill -s HUP $MAINPID</span>
<span class="na">ExecStop</span><span class="o">=</span><span class="s">/bin/kill -s TERM $MAINPID</span>
<span class="na">Environment</span><span class="o">=</span><span class="s">XANDIKOSPATH=/srv/xandikos/collections</span>
<span class="na">ProtectSystem</span><span class="o">=</span><span class="s">strict</span>
<span class="na">ProtectKernelTunables</span><span class="o">=</span><span class="s">yes</span>
<span class="na">ProtectControlGroups</span><span class="o">=</span><span class="s">yes</span>
<span class="na">PrivateDevices</span><span class="o">=</span><span class="s">yes</span>
<span class="na">PrivateTmp</span><span class="o">=</span><span class="s">yes</span>
<span class="na">ReadWritePaths</span><span class="o">=</span><span class="s">/run/xandikos.socket /srv/xandikos</span>
</code></pre></div>
<p>The path (<code>/srv/xandikos/collections</code>) was arbitrary. You need to create
the <code>xandikos</code> user and group first (<code>adduser --system --group
--no-create-home --disabled-login xandikos</code>). I created <code>/srv/xandikos</code>
owned by <code>xandikos:xandikos</code> and mode 0700. You should also run <code>sudo -u
xandikos xandikos -d /srv/xandikos/collections --autocreate</code> and then Ctrl-c
it after a short time (I think it would be nicer if there were a way to <a href="https://bugs.debian.org/866093">ask
the <span class="caps">WSGI</span> wrapper to do this</a>). If you
aren’t using systemd then you can of course write equivalent init scripts instead.</p>
<p>For Apache setup, I kept it reasonably simple: I ran <code>a2enmod proxy_http</code>,
used <code>htpasswd</code> to create <code>/etc/apache2/xandikos.passwd</code> with a username and
password for myself, added a virtual host in
<code>/etc/apache2/sites-available/xandikos.conf</code>, and enabled it with <code>a2ensite
xandikos</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nt"><VirtualHost</span><span class="w"> </span><span class="s">*:443</span><span class="nt">></span>
<span class="w"> </span><span class="nb">ServerName</span><span class="w"> </span>xandikos.example.org
<span class="w"> </span><span class="nb">ServerAdmin</span><span class="w"> </span>me@example.org
<span class="w"> </span><span class="nb">ErrorLog</span><span class="w"> </span><span class="sx">/var/log/apache2/xandikos-error.log</span>
<span class="w"> </span><span class="nb">TransferLog</span><span class="w"> </span><span class="sx">/var/log/apache2/xandikos-access.log</span>
<span class="w"> </span><span class="nt"><Location</span><span class="w"> </span><span class="s">/</span><span class="nt">></span>
<span class="w"> </span><span class="nb">ProxyPass</span><span class="w"> </span><span class="s2">"unix:/run/xandikos.socket|http://xandikos.riva.dynamic.greenend.org.uk/"</span>
<span class="w"> </span><span class="nb">AuthType</span><span class="w"> </span>Basic
<span class="w"> </span><span class="nb">AuthName</span><span class="w"> </span><span class="s2">"Xandikos"</span>
<span class="w"> </span><span class="nb">AuthBasicProvider</span><span class="w"> </span>file
<span class="w"> </span><span class="nb">AuthUserFile</span><span class="w"> </span><span class="s2">"/etc/apache2/xandikos.passwd"</span>
<span class="w"> </span><span class="nb">Require</span><span class="w"> </span>valid-user
<span class="w"> </span><span class="nt"></Location></span>
<span class="nt"></VirtualHost></span>
</code></pre></div>
<p>You should of course adjust the <code>ProxyPass</code> line to match your own deployment.</p>
<p>Then <code>service apache2 reload</code>, set the new virtual host up with <a href="https://letsencrypt.org/">Let’s
Encrypt</a>, reloaded again, and off we go.</p>
<h3>Android integration</h3>
<p>I installed <a href="https://www.davx5.com/">DAVx⁵</a> from the Play Store: it cost a
few pounds, but I was <span class="caps">OK</span> with that since it’s GPLv3 and I’m happy to help
fund free software. I created two accounts, one for my existing Google
Contacts database (and in fact calendaring as well, although I don’t intend
to switch over to self-hosting that just yet), and one for the new Xandikos
instance. The Google setup was a bit fiddly because I have two-step
verification turned on so I had to create an app-specific password. The
Xandikos setup was straightforward: base <span class="caps">URL</span>, username, password, and done.</p>
<p>Since I didn’t completely trust the new setup yet, I followed what seemed
like the most robust option from the <a href="https://www.davx5.com/faq/existing-contacts-are-not-synced">DAVx⁵ contacts syncing
documentation</a>,
and used the stock contacts app to export my Google Contacts account to a
<code>.vcf</code> file and then import that into the appropriate DAVx⁵ account (which
showed up automatically). This seemed straightforward and everything got
pushed to Xandikos. There are some weird delays in syncing contacts that I
don’t entirely understand, but it all seems to get there in the end.</p>
<p><em>2019-06-13: Followed rename of DAVdroid to DAVx⁵. At the moment Google
Contacts support seems to be flaky at best; see the <a href="https://forums.bitfire.at/tags/google">DAVx⁵
forums</a> for tips.</em></p>
<h3>mutt integration</h3>
<p>First off I needed to sync the contacts. (In fact I happen to run <code>mutt</code> on
the same system where I run Xandikos at the moment, but I don’t want to rely
on that, and going through the CardDAV server means that I don’t have to
poke holes for myself using filesystem permissions.) I used
<a href="https://vdirsyncer.pimutils.org/">vdirsyncer</a> for this. In
<code>~/.vdirsyncer/config</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[general]</span>
<span class="na">status_path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"~/.vdirsyncer/status/"</span>
<span class="k">[pair contacts]</span>
<span class="na">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"contacts_local"</span>
<span class="na">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"contacts_remote"</span>
<span class="na">collections</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">["from a", "from b"]</span>
<span class="k">[storage contacts_local]</span>
<span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"filesystem"</span>
<span class="na">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"~/.contacts/"</span>
<span class="na">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">".vcf"</span>
<span class="k">[storage contacts_remote]</span>
<span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"carddav"</span>
<span class="na">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"<Xandikos base URL>"</span>
<span class="na">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"<my username>"</span>
<span class="na">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"<my password>"</span>
</code></pre></div>
<p>Running <code>vdirsyncer discover</code> and <code>vdirsyncer sync</code> then synced everything
into <code>~/.contacts/</code>. I added an hourly <code>crontab</code> entry to run <code>vdirsyncer
-v WARNING sync</code>.</p>
<p>Next, I needed a command-line address book tool based on this.
<a href="https://github.com/scheibler/khard">khard</a> looked about right and is in
stretch, so I installed that. In <code>~/.config/khard/khard.conf</code> (this is
mostly just the example configuration, but I preferred to sort by first name
since not all my contacts have neat first/last names):</p>
<div class="highlight"><pre><span></span><code><span class="k">[addressbooks]</span>
<span class="k">[[contacts]]</span>
<span class="na">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">~/.contacts/<UUID of my contacts collection>/</span>
<span class="k">[general]</span>
<span class="na">debug</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
<span class="na">default_action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">list</span>
<span class="na">editor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">vim</span>
<span class="na">merge_editor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">vimdiff</span>
<span class="k">[contact table]</span>
<span class="c1"># display names by first or last name: first_name / last_name</span>
<span class="na">display</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">first_name</span>
<span class="c1"># group by address book: yes / no</span>
<span class="na">group_by_addressbook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
<span class="c1"># reverse table ordering: yes / no</span>
<span class="na">reverse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
<span class="c1"># append nicknames to name column: yes / no</span>
<span class="na">show_nicknames</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
<span class="c1"># show uid table column: yes / no</span>
<span class="na">show_uids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">yes</span>
<span class="c1"># sort by first or last name: first_name / last_name</span>
<span class="na">sort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">first_name</span>
<span class="k">[vcard]</span>
<span class="c1"># extend contacts with your own private objects</span>
<span class="c1"># these objects are stored with a leading "X-" before the object name in the vcard files</span>
<span class="c1"># every object label may only contain letters, digits and the - character</span>
<span class="c1"># example:</span>
<span class="c1"># private_objects = Jabber, Skype, Twitter</span>
<span class="na">private_objects</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Jabber, Skype, Twitter</span>
<span class="c1"># preferred vcard version: 3.0 / 4.0</span>
<span class="na">preferred_version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">3.0</span>
<span class="c1"># Look into source vcf files to speed up search queries: yes / no</span>
<span class="na">search_in_source_files</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
<span class="c1"># skip unparsable vcard files: yes / no</span>
<span class="na">skip_unparsable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no</span>
</code></pre></div>
<p>Now <code>khard list</code> shows all my contacts. So far so good. Apparently there
are some <a href="https://github.com/scheibler/khard#khard">awkward vCard compatibility
issues</a> with creating or modifying
contacts from the <code>khard</code> end. I’ve tried adding one address from
<code>~/.mutt/aliases</code> using <code>khard</code> and it seems to at least minimally work for
me, but I haven’t explored this very much yet.</p>
<p>I had to install python3-vobject 0.9.4.1-1 from experimental to fix
<a href="https://github.com/eventable/vobject/issues/39">eventable/vobject#39</a>
saving certain vCard files.</p>
<p>Finally, <code>mutt</code> integration. I already had <code>set query_command="lbdbq '%s'"</code>
in <code>~/.muttrc</code>, and I wanted to keep that in place since I still wanted to
use <span class="caps">LDAP</span> querying as well. I had to write a very small amount of code for
this (perhaps I should contribute this to <code>lbdb</code> upstream?), in
<code>~/.lbdb/modules/m_khard</code>:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#! /bin/sh</span>
m_khard_query<span class="w"> </span><span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>khard<span class="w"> </span>email<span class="w"> </span>--parsable<span class="w"> </span>--remove-first-line<span class="w"> </span>--search-in-source-files<span class="w"> </span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
<span class="o">}</span>
</code></pre></div>
<p>My full <code>~/.lbdb/rc</code> now reads as follows (you probably won’t want the <span class="caps">LDAP</span>
stuff, but I’ve included it here for completeness):</p>
<div class="highlight"><pre><span></span><code>MODULES_PATH="$MODULES_PATH $HOME/.lbdb/modules"
METHODS='m_muttalias m_khard m_ldap'
LDAP_NICKS='debian canonical'
</code></pre></div>
<h2>Next steps</h2>
<p>I’ve deleted one account from Google Contacts just to make sure that
everything still works (e.g. I can still search for it when composing a new
message), but I haven’t yet deleted everything. I won’t be adding anything
new there though.</p>
<p>I need to push everything from <code>~/.mutt/aliases</code> into the new system. This
is only about 30 contacts so shouldn’t take too long.</p>
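<p>For simple one-address aliases this sort of thing is easy enough to script;
a hypothetical sketch (it ignores group aliases and anything else unusual,
which would still need doing by hand):</p>
<div class="highlight"><pre><span></span><code>import re

ALIAS_RE = re.compile(r"^alias\s+(\S+)\s+(.+?)\s*&lt;([^&gt;]+)&gt;\s*$")

def aliases_to_vcards(path):
    """Turn simple one-address mutt alias lines into minimal vCard 3.0 text,
    ready to be imported via khard or dropped into the contacts collection."""
    with open(path) as aliases:
        for line in aliases:
            match = ALIAS_RE.match(line.strip())
            if match is None:
                continue  # comments, group aliases, etc.
            nick, name, email = match.groups()
            yield "\r\n".join([
                "BEGIN:VCARD",
                "VERSION:3.0",
                "FN:%s" % name,
                "N:%s;;;;" % name,
                "NICKNAME:%s" % nick,
                "EMAIL;TYPE=INTERNET:%s" % email,
                "END:VCARD",
                "",
            ])
</code></pre></div>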
<p>Overall this feels like a big improvement! It wasn’t a trivial amount of
setup for just me, but it means I have both better usability for myself and
more independence from proprietary services, and I think I can add extra
users with much less effort if I need to.</p>
<h2>Postscript</h2>
<p>A day later and I’ve consolidated all my accounts from Google Contacts and
<code>~/.mutt/aliases</code> into the new system, with the exception of one group that
I had defined as a <code>mutt</code> alias and need to work out what to do with. This
all went smoothly.</p>
<p>I’ve filed the new <code>lbdb</code> module as
<a href="https://bugs.debian.org/866178">#866178</a>, and the <code>python3-vobject</code> bug as
<a href="https://bugs.debian.org/866181">#866181</a>.</p>The sad tale of CVE-2015-13362016-12-11T23:42:55+00:002016-12-11T23:42:55+00:00Colin Watsontag:www.chiark.greenend.org.uk,2016-12-11:/~cjwatson/blog/cve-2015-1336.html<p>Today I released man-db 2.7.6
(<a href="https://lists.nongnu.org/archive/html/man-db-announce/2016-12/msg00000.html">announcement</a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS?id=2.7.6"><span class="caps">NEWS</span></a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/log/?h=2.7.6">git log</a>), and
uploaded it to Debian unstable. The major change in this release was a set
of fixes for two security vulnerabilities,
<a href="http://www.halfdog.net/Security/2015/SetgidDirectoryPrivilegeEscalation/">one</a>
of which affected all man-db installations since 2.3.12 (or 2.3.10-66 in
Debian), and …</p><p>Today I released man-db 2.7.6
(<a href="https://lists.nongnu.org/archive/html/man-db-announce/2016-12/msg00000.html">announcement</a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS?id=2.7.6"><span class="caps">NEWS</span></a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/log/?h=2.7.6">git log</a>), and
uploaded it to Debian unstable. The major change in this release was a set
of fixes for two security vulnerabilities,
<a href="http://www.halfdog.net/Security/2015/SetgidDirectoryPrivilegeEscalation/">one</a>
of which affected all man-db installations since 2.3.12 (or 2.3.10-66 in
Debian), and <a href="http://www.halfdog.net/Security/2015/MandbSymlinkLocalRootPrivilegeEscalation/">the
other</a>
of which was specific to Debian and its derivatives.</p>
<p>It’s probably obvious from the dates here that this has not been my finest
hour in terms of responding to security issues in a timely fashion, and I
apologise for that. Some of this is just the usual life reasons, which I
shan’t bore you by reciting, but some of it has been that fixing this
properly in man-db was genuinely rather complicated and delicate. Since
I’ve previously advocated man-db over some of its competitors on the basis
of a better security posture, I think it behooves me to write up a longer description.</p>
<p>I took over maintaining man-db over fifteen years ago in slightly unexpected
circumstances (I got annoyed with its bug list and made a couple of
non-maintainer uploads, and then the previous maintainer
<a href="https://www.debian.org/News/2001/20010402b">died</a>, so I ended up taking
over both in Debian and upstream). I was a fairly new developer at the
time, and there weren’t a lot of people I could ask questions of, but I did
my best to recover as much of the history as I could and learn from it. One
thing that became clear very quickly, both from my own inspection and from
the bug list, was that most of the code had been written in a rather more
innocent time. It was absolutely riddled with dangerous uses of the shell,
poor temporary file handling, buffer overruns, and various common-or-garden
deficiencies of that kind. I spent several years reworking large swathes of
the codebase to be more robust against those kinds of bugs by design, and
for example <a href="http://libpipeline.nongnu.org/">libpipeline</a> came out of that effort.</p>
<p>The most subtle and risky set of problems came from the fact that the <code>man</code>
and <code>mandb</code> programs were installed set-user-id to the <code>man</code> user. Part of
this was so that <code>man</code> could maintain preformatted “cat pages”, and part of
it was so that users could run <code>mandb</code> if the system databases were out of
date (this is now much less useful since most package managers, including
<code>dpkg</code>, support some kind of trigger mechanism that can run <code>mandb</code> whenever
new system-level manual pages are installed). One of the first things I did
was to make this optional, and this has been a disabled-by-default <code>debconf</code>
option in Debian for a long time now. But it’s still a supported option and
is enabled by default upstream, and when running setuid <code>man</code> and <code>mandb</code>
need to take care to drop privileges when dealing with user-controlled data
and to write files with the appropriate ownership and permissions.</p>
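<p>(The general pattern, for a program that is set-user-id to a non-root user
like <code>man</code>, is to switch the effective IDs back to the invoking user’s real
IDs before touching anything user-controlled, and to restore them only for
the operations that genuinely need them. A very rough Python sketch of the
idea follows; man-db itself is written in C and is rather more careful than this.)</p>
<div class="highlight"><pre><span></span><code>import os

def drop_privileges():
    """Temporarily become the invoking user: set the effective group ID
    first, then the effective user ID, and remember the old values so that
    they can be restored later via the saved set-user-ID/set-group-ID."""
    saved = (os.geteuid(), os.getegid())
    os.setegid(os.getgid())
    os.seteuid(os.getuid())
    return saved

def regain_privileges(saved):
    """Switch back to the IDs the program was installed with."""
    saved_euid, saved_egid = saved
    os.seteuid(saved_euid)
    os.setegid(saved_egid)
</code></pre></div>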
<p>My predecessor had problems related to this such as
<a href="https://bugs.debian.org/26002">Debian #26002</a>, and one of the ways they
dealt with them was to make <code>/var/cache/man/</code> set-group-id root, in order
that files written to that directory would have consistent group ownership.
This always struck me as rather strange and I meant to do something about it
at some point, but until the first vulnerability report above I regarded it
as mainly a curiosity, since nothing in there was group-writeable anyway.
As a result, with the more immediate aim of making the system behave
consistently and dealing with bug reports, various bits of code had accreted
that assumed that <code>/var/cache/man/</code> would be <code>man:root 2755</code>, and not all of
it was immediately obvious.</p>
<p>This interacted with the second vulnerability report in two ways. Firstly,
at some level it caused it because I was dealing with the day-to-day
problems rather than thinking at a higher level: a
<a href="https://bugs.debian.org/129340">series</a>
<a href="https://bugs.debian.org/619726">of</a> <a href="https://bugs.debian.org/734063">bugs</a>
led me down the path of whacking problems over the head with a recursive
<code>chown</code> of <code>/var/cache/man/</code> from <code>cron</code>, rather than working out why things
got that way in the first place. Secondly, once I’d done that, I couldn’t
remove the <code>chown</code> without a much more extensive excursion into all the code
that dealt with cache files, for fear of reintroducing those bugs. So
although the fix for the second vulnerability is <a href="https://anonscm.debian.org/cgit/pkg-man-db/man-db.git/commit/?id=2f47ed4e682183f60f9aeed7f69f61e162019b20">very simple in
itself</a>,
I couldn’t get there without dealing with the first vulnerability.</p>
<p>In some ways, of course, cat pages are a bit of an anachronism. Most modern
systems can format pages quickly enough that it’s not much of an issue.
However, I’m loath to drop the feature entirely: I’m generally wary of
assuming that, just because I have a fast system, everyone else does too. So,
instead, I
<a href="http://git.savannah.gnu.org/cgit/man-db.git/commit/?id=31552334cecee82809059ec598a37d9ea82683f0">did</a>
what I should have done years ago: make <code>man</code> and <code>mandb</code> set-group-id <code>man</code>
as well as set-user-id <code>man</code>, at which point we can simply make all the
cache files and directories be owned by <code>man:man</code> and drop the setgid bit on
cache directories. This should be simpler and less prone to
difficult-to-understand problems.</p>
<p>I expect that my next substantial upstream release will switch to
<code>--disable-setuid</code> by default to reduce exposure, though, and distributions
can start thinking about whether they want to follow that (Fedora already
does, for example). If this becomes widely disabled without complaints then
that would be good evidence that it’s reasonable to drop the feature
entirely. I’m not in a rush, but if you do need cat pages then now is a
good time to write to me and tell me why.</p>
<p>This is the fiddliest set of vulnerabilities I’ve dealt with in man-db for
quite some time, so I hope that if there are more then I can get back to my
previous quick response time.</p>No more “Hash Sum Mismatch” errors2016-04-08T15:06:03+01:002016-04-08T15:06:03+01:00Colin Watsontag:www.chiark.greenend.org.uk,2016-04-08:/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html<p>The Debian repository format was designed a long time ago. The oldest
versions of it were produced with the help of tools such as
<code>dpkg-scanpackages</code> and consumed by <code>dselect</code> access methods such as
<code>dpkg-ftp</code>. The access methods just fetched a <code>Packages</code> file (perhaps
compressed) and used it as an index …</p><p>The Debian repository format was designed a long time ago. The oldest
versions of it were produced with the help of tools such as
<code>dpkg-scanpackages</code> and consumed by <code>dselect</code> access methods such as
<code>dpkg-ftp</code>. The access methods just fetched a <code>Packages</code> file (perhaps
compressed) and used it as an index of which packages were available; each
package had an <span class="caps">MD5</span> checksum to defend against transport errors, but being
from a more innocent age there was no repository signing or other protection
against man-in-the-middle attacks.</p>
<p>An important and intentional feature of the early format was that, apart
from the top-level <code>Packages</code> file, all other files were <em>static</em> in the
sense that, once published, their content would never change without also
changing the file name. This means that repositories can be efficiently
copied around using <code>rsync</code> without having to tell it to re-checksum all
files, and it avoids network races when fetching updates: the repository
you’re updating from might change in the middle of your update, but as long
as the repository maintenance software keeps superseded packages around for
a suitable grace period, you’ll still be able to fetch them.</p>
<p>The repository format evolved rather organically over time as different
needs arose, by what one might call distributed consensus among the
maintainers of the various client tools that consumed it. Of course all
sorts of fields were added to the index files themselves, which have an
extensible format so that this kind of thing is usually easy to do. At some
point a <code>Sources</code> index for source packages was added, which worked pretty
much the same way as <code>Packages</code> except for having a different set of fields.
But by far the most significant change to the repository structure was the
“package pools” project.</p>
<p>The original repository layout put the packages themselves under the
<code>dists/</code> tree along with the index files. The <code>dists/</code> tree is organised by
“suite” (modern examples of which would be “stable”, “stable-updates”,
“testing”, “unstable”, “xenial”, “xenial-updates”, and so on). This meant
that making a release of Debian tended to involve copying lots of data
around, and implementing the “testing” suite would have been very costly.
Package pools solved this problem by moving individual package files out of
<code>dists/</code> and into a new <code>pool/</code> tree, allowing those files to be shared
between multiple suites with only a negligible cost in disk space and mirror
bandwidth. From a database design perspective this is obviously much more
sensible. As part of this project, the original Debian “dinstall”
repository maintenance scripts were
<a href="https://lists.debian.org/debian-devel-announce/2000/10/msg00007.html">replaced</a>
by “da-katie” or “dak”, which among other things used a new <code>apt-ftparchive</code>
program to build the index files; this replaced <code>dpkg-scanpackages</code> and
<code>dpkg-scansources</code>, and included its own database cache which made a big
difference to performance at the scale of a distribution.</p>
<p>A few months after the initial implementation of package pools, <code>Release</code>
files were added. These formed a sort of meta-index for each suite, telling
<span class="caps">APT</span> which index files were available (<code>main/binary-i386/Packages</code>,
<code>non-free/source/Sources</code>, and so on) and what their checksums were.
Detached signatures were added alongside that (<code>Release.gpg</code>) so that it was
now possible to fetch packages securely given a public key for the
repository, and <a href="https://lists.debian.org/debian-devel/2003/12/msg01986.html">client-side verification
support</a> for
this eventually made its way into Debian and Ubuntu. The repository
structure stayed more or less like this for several years.</p>
<p>At some point along the way, those of us by now involved in repository
maintenance realised that an important property had been lost. I mentioned
earlier that the original format allowed race-free updates, but this was no
longer true with the introduction of the <code>Release</code> file. A client now had
to fetch <code>Release</code> and then fetch whichever other index files such as
<code>Packages</code> they wanted, typically in separate <span class="caps">HTTP</span> transactions. If a
client was unlucky, these transactions would fall on either side of a mirror
update and they’d get a “Hash Sum Mismatch” error from <span class="caps">APT</span>. Worse, if a
<em>mirror</em> was unlucky and also didn’t go to special lengths to verify index
integrity (most don’t), its own updates could span an update of its upstream
mirror and then all its clients would see mismatches until the next mirror
update. This was compounded by using detached signatures, so <code>Release</code> and
<code>Release.gpg</code> were fetched separately and could be out of sync.</p>
<p>Fixing this has been a long road (the first time I remember talking about
this was in late 2007!), and we’ve had to take care to maintain
client/server compatibility along the way. The first step was to add
inline-signed versions of the <code>Release</code> file, called <code>InRelease</code>, so that
there would no longer be a race between fetching <code>Release</code> and fetching its
signature. <span class="caps">APT</span> has had this for a while, Debian’s repository supports it as
of <code>stretch</code>, and we finally <a href="https://bugs.launchpad.net/launchpad/+bug/804252">implemented it for
Ubuntu</a> six months ago.
Dealing with the other index files is more complicated, though; it isn’t
sensible to inline them, as clients usually only need to fetch a small
fraction of all the indexes available for a given suite.</p>
<p>The solution we’ve ended up with, thanks to Michael Vogt’s work implementing
it in <span class="caps">APT</span>, is called
<a href="https://wiki.debian.org/RepositoryFormat#indices_acquisition_via_hashsums_.28by-hash.29">by-hash</a>
and should be familiar in concept to people who’ve used <code>git</code>: with the
exception of the top-level <code>InRelease</code> file, index files for suites that
support the by-hash mechanism may now be fetched using a <span class="caps">URL</span> based on one of
their hashes listed in <code>InRelease</code>. This means that clients can now operate
like this:</p>
<ul>
<li>Fetch <code>dists/xenial/InRelease</code></li>
<li>Fetch
<code>dists/xenial/main/binary-amd64/by-hash/SHA256/46316a202cdae76a73b555414741b11d08c66620b76c470a1623cedcc8a14740</code>
(and so on)</li>
<li>Fetch individual package files</li>
</ul>
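<p>Concretely, the by-hash <span class="caps">URL</span> is just the usual index path with the final
component replaced by <code>by-hash/SHA256/&lt;digest&gt;</code>, where the digest is the one
listed for that file in <code>InRelease</code>. An illustrative sketch of the path
construction (a real client takes the hash from <code>InRelease</code> rather than
recomputing it):</p>
<div class="highlight"><pre><span></span><code>import hashlib
import posixpath

def by_hash_url(index_path, index_bytes):
    """Rewrite e.g. dists/xenial/main/binary-amd64/Packages into the
    corresponding by-hash URL, using the SHA256 of the file's contents."""
    digest = hashlib.sha256(index_bytes).hexdigest()
    return posixpath.join(posixpath.dirname(index_path),
                          "by-hash", "SHA256", digest)

print(by_hash_url("dists/xenial/main/binary-amd64/Packages",
                  b"Package: example\n"))
</code></pre></div>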
<p>This is now <a href="https://bugs.launchpad.net/launchpad/+bug/1430011">enabled by default in
Ubuntu</a>. It’s only there
as of xenial (16.04), since earlier versions of Ubuntu don’t have the
necessary support in <span class="caps">APT</span>. With this, hash mismatches on updates should be a
thing of the past.</p>
<p>There will still be some people who won’t yet benefit from this.
<code>debmirror</code> doesn’t support by-hash yet; <code>apt-cacher-ng</code> only supports it as
of xenial, although there’s an <a href="https://bugs.debian.org/819852">easy configuration
workaround</a>. Full archive mirrors must make
sure that they put new by-hash files in place before new <code>InRelease</code> files
(I just fixed our <a href="https://wiki.ubuntu.com/Mirrors/Scripts">recommended two-stage sync
script</a> to do this;
<a href="https://launchpad.net/ubumirror">ubumirror</a> still needs some work; Debian’s
<a href="https://www.debian.org/mirror/ftpmirror#how">ftpsync</a> is almost correct but
needs a tweak for its handling of translation files, which I’ve sent to its
maintainers). Other mirrors and proxies that have specific handling of the
repository format may need similar changes.</p>
<p>Please let me know if you see strange things happening as a result of this
change. It’s useful to check the output of <code>apt -o
Debug::Acquire::http=true update</code> to see exactly what requests are being issued.</p>Re-signing PPAs2016-03-30T10:20:32+01:002016-03-30T10:20:32+01:00Colin Watsontag:www.chiark.greenend.org.uk,2016-03-30:/~cjwatson/blog/re-signing-ppas.html<p>Julian has
<a href="https://juliank.wordpress.com/2016/03/14/dropping-sha-1-support-in-apt/">written</a>
about their efforts to strengthen security in <span class="caps">APT</span>, and shortly before that
<a href="https://bugs.launchpad.net/bugs/1556666">notified</a> us that Launchpad’s
signatures on <acronym title="Personal Package Archives">PPAs</acronym> use
weak <span class="caps">SHA</span>-1 digests. Unfortunately we hadn’t noticed that before; GnuPG’s
defaults tend to result in weak digests unless carefully tweaked, which is a …</p><p>Julian has
<a href="https://juliank.wordpress.com/2016/03/14/dropping-sha-1-support-in-apt/">written</a>
about their efforts to strengthen security in <span class="caps">APT</span>, and shortly before that
<a href="https://bugs.launchpad.net/bugs/1556666">notified</a> us that Launchpad’s
signatures on <acronym title="Personal Package Archives">PPAs</acronym> use
weak <span class="caps">SHA</span>-1 digests. Unfortunately we hadn’t noticed that before; GnuPG’s
defaults tend to result in weak digests unless carefully tweaked, which is a shame.</p>
<p>I started on the necessary fixes for this immediately we heard of the
problem, but it’s taken a little while to get everything in place, and I
thought I’d explain why since some of the problems uncovered are interesting
in their own right.</p>
<p>Firstly, there was the relatively trivial matter of <a href="https://code.launchpad.net/~cjwatson/launchpad/digest-algo-sha512/+merge/289052">using <span class="caps">SHA</span>-512 digests
on new
signatures</a>.
This was mostly a matter of adjusting our configuration, although writing
the test was a bit tricky since
<a href="https://pypi.python.org/pypi/pygpgme">PyGPGME</a> isn’t as helpful as it could
be. (Simpler repository implementations that call <code>gpg</code> from the command
line should probably just add the <code>--digest-algo SHA512</code> option instead of
imitating this.)</p>
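<p>For a simple repository signed with command-line <code>gpg</code>, that amounts to
something like the following (a sketch, not Launchpad’s code; the key ID and
paths are placeholders):</p>
<div class="highlight"><pre><span></span><code>import subprocess

def sign_release(release_path, key_id):
    """Produce both the detached Release.gpg and the inline-signed InRelease,
    forcing a SHA-512 digest rather than relying on GnuPG's default."""
    base = ["gpg", "--batch", "--yes", "--local-user", key_id,
            "--digest-algo", "SHA512"]
    subprocess.check_call(
        base + ["--armor", "--detach-sign",
                "--output", release_path + ".gpg", release_path])
    subprocess.check_call(
        base + ["--clearsign",
                "--output", release_path.replace("Release", "InRelease"),
                release_path])

sign_release("dists/xenial/Release", "0xDEADBEEF")  # hypothetical key ID and path
</code></pre></div>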
<p>After getting that in place, any change to a suite in a <span class="caps">PPA</span> will result in
it being re-signed with <span class="caps">SHA</span>-512, which is good as far as it goes, but we
also want to re-sign PPAs that haven’t been modified. Launchpad hosts more
than 50000 active PPAs, though, a significant percentage of which include
packages for Ubuntu releases recent enough that we’d want to re-sign them for
this. We can’t expect everyone to push new uploads, and we need to
run this through at least some part of our usual publication machinery
rather than just writing a hacky shell script to do the job (which would
have no idea which keys to sign with, to start with); but forcing full
reprocessing of all those PPAs would take a prohibitively long time, and at
the moment we need to interrupt normal <span class="caps">PPA</span> publication to do this kind of
work. I therefore had to spend some quality time working out how to make
things go fast enough.</p>
<p>The first couple of changes
(<a href="https://code.launchpad.net/~cjwatson/launchpad/publish-distro-careful-release/+merge/289401">1</a>,
<a href="https://code.launchpad.net/~cjwatson/launchpad/publish-distro-disable-steps/+merge/289658">2</a>)
were to add options to our publisher script to let us run just the one step
we need in “careful” mode: that is, forcibly re-run the <code>Release</code> file
processing step even if it thinks nothing has changed, and entirely disable
the other steps such as generating <code>Packages</code> and <code>Sources</code> files. Then
last week I finally got around to timing things on one of our staging
systems so that we could estimate how long a full run would take. It was
taking a little over two seconds per archive, which meant that if we were to
re-sign all published PPAs then that would take more than 33 hours!
Obviously this wasn’t viable; even just re-signing xenial would be
prohibitively slow.</p>
<p>The next question was where all that time was going. I thought perhaps that
the actual signing might be slow for some reason, but it was taking about
half a second per archive: not great, but not enough to account for most of
the slowness. The main part of the delay was in fact when we committed the
database transaction after processing each archive, but not in the actual
PostgreSQL commit, rather in the <acronym title="object-relational
mapper"><span class="caps">ORM</span></acronym> <code>invalidate</code> method called to prepare for a commit.</p>
<p>Launchpad uses the excellent <a href="https://storm.canonical.com/">Storm</a> for all
of its database interactions. One property of this <span class="caps">ORM</span> (and possibly of
others; I’ll cheerfully admit to not having spent much time with other ORMs)
is that it uses a
<a href="https://docs.python.org/2/library/weakref.html#weakref.WeakValueDictionary">WeakValueDictionary</a>
to keep track of the objects it’s populated with database results. Before
it commits a transaction, it iterates over all those “alive” objects to note
that if they’re used in future then information needs to be reloaded from
the database first. Usually this is a very good thing: it saves us from
having to think too hard about data consistency at the application layer.
But in this case, one of the things we did at the start of the publisher
script was:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">getPPAs</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">distribution</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Find private package archives for the selected distribution."""</span>
<span class="k">if</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">isCareful</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">careful_publishing</span><span class="p">)</span> <span class="ow">or</span>
<span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">include_non_pending</span><span class="p">):</span>
<span class="k">return</span> <span class="n">distribution</span><span class="o">.</span><span class="n">getAllPPAs</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">distribution</span><span class="o">.</span><span class="n">getPendingPublicationPPAs</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">getTargetArchives</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">distribution</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Find the archive(s) selected by the script's options."""</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">partner</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[</span><span class="n">distribution</span><span class="o">.</span><span class="n">getArchiveByComponent</span><span class="p">(</span><span class="s1">'partner'</span><span class="p">)]</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">ppa</span><span class="p">:</span>
<span class="k">return</span> <span class="nb">filter</span><span class="p">(</span><span class="n">is_ppa_public</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">getPPAs</span><span class="p">(</span><span class="n">distribution</span><span class="p">))</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">private_ppa</span><span class="p">:</span>
<span class="k">return</span> <span class="nb">filter</span><span class="p">(</span><span class="n">is_ppa_private</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">getPPAs</span><span class="p">(</span><span class="n">distribution</span><span class="p">))</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">copy_archive</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">getCopyArchives</span><span class="p">(</span><span class="n">distribution</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[</span><span class="n">distribution</span><span class="o">.</span><span class="n">main_archive</span><span class="p">]</span>
</code></pre></div>
<p>That innocuous-looking <code>filter</code> means that we do all the public/private
filtering of PPAs up-front and return a list of all the PPAs we intend to
operate on. This means that all those objects are alive as far as Storm is
concerned and need to be considered for invalidation on every commit, and
the time required for that stacks up when many thousands of objects are
involved: this is essentially <a href="http://accidentallyquadratic.tumblr.com/">accidentally
quadratic</a> behaviour, because all
archives are considered when committing changes to each archive in turn.
Normally this isn’t too bad because only a few hundred PPAs need to be
processed in any given run; but if we’re running in a mode where we’re
processing all PPAs rather than just ones that are pending publication, then
suddenly this balloons to the point where it takes a couple of seconds. The
<a href="https://code.launchpad.net/~cjwatson/launchpad/publish-distro-many-ppas/+merge/289925">fix</a>
is very simple, using an
<a href="https://docs.python.org/2/library/stdtypes.html#typeiter">iterator</a> instead
so that we don’t need to keep all the objects alive:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">ifilter</span>
<span class="k">def</span> <span class="nf">getTargetArchives</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">distribution</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Find the archive(s) selected by the script's options."""</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">partner</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[</span><span class="n">distribution</span><span class="o">.</span><span class="n">getArchiveByComponent</span><span class="p">(</span><span class="s1">'partner'</span><span class="p">)]</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">ppa</span><span class="p">:</span>
<span class="k">return</span> <span class="n">ifilter</span><span class="p">(</span><span class="n">is_ppa_public</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">getPPAs</span><span class="p">(</span><span class="n">distribution</span><span class="p">))</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">private_ppa</span><span class="p">:</span>
<span class="k">return</span> <span class="n">ifilter</span><span class="p">(</span><span class="n">is_ppa_private</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">getPPAs</span><span class="p">(</span><span class="n">distribution</span><span class="p">))</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">copy_archive</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">getCopyArchives</span><span class="p">(</span><span class="n">distribution</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="p">[</span><span class="n">distribution</span><span class="o">.</span><span class="n">main_archive</span><span class="p">]</span>
</code></pre></div>
<p>After that, I turned to that half a second for signing. A good chunk of
that was accounted for by the <code>signContent</code> method taking a fingerprint
rather than a key, despite the fact that we normally already had the key in
hand; this caused us to have to ask <span class="caps">GPGME</span> to reload the key, which requires
two subprocess calls. Converting this to <a href="https://code.launchpad.net/~cjwatson/launchpad/faster-gpg-operations/+merge/289950">take a key rather than a
fingerprint</a>
gets the per-archive time down to about a quarter of a second on our staging
system, about eight times faster than where we started.</p>
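<p>To make the shape of that change concrete, here is a purely illustrative
sketch (this is not Launchpad’s or <span class="caps">GPGME</span>’s real <span class="caps">API</span>; the names are made up):
the point is simply to pass the already-loaded key object around rather than a
fingerprint that forces a fresh key lookup for every archive signed.</p>
<div class="highlight"><pre><code>class FakeKeyring:
    """Stand-in for a keyring; loading a key is the expensive step."""

    def load_key(self, fingerprint):
        # In the real code this lookup cost two subprocess calls.
        return {"fingerprint": fingerprint}


def sign_by_fingerprint(keyring, fingerprint, content):
    key = keyring.load_key(fingerprint)   # reloaded on every signature
    return (key["fingerprint"], content)


def sign_by_key(key, content):
    return (key["fingerprint"], content)  # caller already holds the key


keyring = FakeKeyring()
key = keyring.load_key("0123456789ABCDEF")
for series in ("xenial", "wily", "vivid"):
    sign_by_key(key, "Release for " + series)
</code></pre></div>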
<p>Using this, we’ve now re-signed all xenial <code>Release</code> files in PPAs using
<span class="caps">SHA</span>-512 digests. On production, this took about 80 minutes to iterate over
around 70000 archives, of which 1761 were modified. Most of the time
appears to have been spent skipping over unmodified archives; even a few
hundredths of a second per archive adds up quickly there. The remaining
time comes out to around 0.4 seconds per modified archive. There’s
certainly still room for speeding this up a bit.</p>
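<p>As a rough back-of-envelope check of those figures (using only the numbers
quoted above, so treat it as a sanity check rather than a measurement):</p>
<div class="highlight"><pre><code># All numbers are the ones quoted above.
total_archives = 70000
modified = 1761
total_seconds = 80 * 60

modified_seconds = modified * 0.4                      # ~704 s, about 12 minutes
per_skipped = (total_seconds - modified_seconds) / (total_archives - modified)

print("time spent on modified archives: ~%d s" % modified_seconds)
print("per skipped archive: ~%.3f s" % per_skipped)    # ~0.06 s, a few hundredths
</code></pre></div>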
<p>We wouldn’t want to do this procedure every day, but it’s acceptable for
occasional tasks like this. I expect that we’ll re-sign wily,
vivid, and trusty <code>Release</code> files soon in the same way.</p>SSH SHA-2 support in Twisted2015-12-02T20:42:25+00:002015-12-02T20:42:25+00:00Colin Watsontag:www.chiark.greenend.org.uk,2015-12-02:/~cjwatson/blog/ssh-sha-2-support-in-twisted.html<p>Launchpad operates a few <span class="caps">SSH</span> endpoints: <code>bazaar.launchpad.net</code> and
<code>git.launchpad.net</code> for code hosting, and <code>upload.ubuntu.com</code> and
<code>ppa.launchpad.net</code> for uploading packages. None of these are
straightforward OpenSSH servers, because they don’t give ordinary shell
access and they authenticate against users’ <span class="caps">SSH</span> keys recorded …</p><p>Launchpad operates a few <span class="caps">SSH</span> endpoints: <code>bazaar.launchpad.net</code> and
<code>git.launchpad.net</code> for code hosting, and <code>upload.ubuntu.com</code> and
<code>ppa.launchpad.net</code> for uploading packages. None of these are
straightforward OpenSSH servers, because they don’t give ordinary shell
access and they authenticate against users’ <span class="caps">SSH</span> keys recorded in Launchpad;
both of these are much easier to do with <span class="caps">SSH</span> server code that we can use in
library form as part of another service. We use
<a href="https://pypi.python.org/pypi/Twisted">Twisted</a> for several other tasks
where we need event-based networking code, and its
<a href="https://twistedmatrix.com/trac/wiki/TwistedConch">conch</a> package is a good
fit for this.</p>
<p>Of course, this means that it’s important that conch keeps up to date with
the cryptographic state of the art in other <span class="caps">SSH</span> implementations, and this
hasn’t always been the case. OpenSSH 7.0 <a href="http://www.openssh.com/txt/release-7.0">dropped support for some old
algorithms</a>, including disabling the
1024-bit <code>diffie-hellman-group1-sha1</code> key exchange method at run-time.
Unfortunately, this also happened to be the only key exchange method that
Launchpad’s <span class="caps">SSH</span> endpoints supported (conch supported the slightly better
<code>diffie-hellman-group-exchange-sha1</code> method as well, but that was disabled
in Launchpad due to a missing piece of configuration). <a href="https://bugs.launchpad.net/bugs/1445619"><span class="caps">SHA</span>-2
support</a> was clearly called for,
and the fact that we had to get this sorted out in conch first meant that
everything took a bit longer than we’d hoped.</p>
<p>In <a href="https://twistedmatrix.com/pipermail/twisted-python/2015-November/029993.html">Twisted
15.5</a>,
we contributed support for several conch improvements:</p>
<ul>
<li><a href="https://twistedmatrix.com/trac/ticket/7717">diffie-hellman-group14-sha1 key
exchange</a> (mostly by Ian
Moore, finished off by me)</li>
<li><a href="https://twistedmatrix.com/trac/ticket/7672">diffie-hellman-group-exchange-sha256 key exchange</a></li>
<li><a href="https://twistedmatrix.com/trac/ticket/8108">hmac-sha2-256 and hmac-sha2-512 MACs</a></li>
</ul>
<p>Together with some adjustments to the
<a href="https://pypi.python.org/pypi/lazr.sshserver">lazr.sshserver</a> package we use
to glue all this together (adding support for <span class="caps">DH</span> group exchange), these are
enough to allow us not to rely on <span class="caps">SHA</span>-1 at all, and these improvements have
now been rolled out to all four endpoints listed above. I’ve thus also
uploaded OpenSSH 7.1 packages to Debian unstable.</p>
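<p>If you want to check what your own Twisted installation offers before relying
on it, conch exposes its algorithm lists as attributes on the transport class.
This is just a quick inspection sketch; whether the <span class="caps">SHA</span>-2 entries show up
depends entirely on the Twisted version you have installed.</p>
<div class="highlight"><pre><code># Inspect the key exchange methods and MACs that conch advertises.
from twisted.conch.ssh import transport

kex = transport.SSHTransportBase.supportedKeyExchanges
macs = transport.SSHTransportBase.supportedMACs

print("key exchanges: %r" % (kex,))
print("MACs: %r" % (macs,))
print("SHA-2 kex: %s" % (b"diffie-hellman-group-exchange-sha256" in kex,))
print("SHA-2 MACs: %s" % (b"hmac-sha2-256" in macs and b"hmac-sha2-512" in macs,))
</code></pre></div>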
<p>If you also run a Twisted-based <span class="caps">SSH</span> server, upgrade it now! Otherwise it
will be <a href="http://www.openssh.com/legacy.html">harder</a> for users of recent
OpenSSH client versions to use your server, and for good reason.</p>Moving on, but not too far2014-10-26T18:54:34-04:002014-10-26T18:54:34-04:00Colin Watsontag:www.chiark.greenend.org.uk,2014-10-26:/~cjwatson/blog/moving-on-but-not-too-far.html<p>The <a href="http://www.ubuntu.com/about/about-ubuntu/conduct">Ubuntu Code of
Conduct</a> says:</p>
<blockquote>
<p><strong>Step down considerately</strong>: When somebody leaves or disengages from the
project, we ask that they do so in a way that minimises disruption to the
project. They should tell people they are leaving and take the proper
steps to ensure that others can pick …</p></blockquote><p>The <a href="http://www.ubuntu.com/about/about-ubuntu/conduct">Ubuntu Code of
Conduct</a> says:</p>
<blockquote>
<p><strong>Step down considerately</strong>: When somebody leaves or disengages from the
project, we ask that they do so in a way that minimises disruption to the
project. They should tell people they are leaving and take the proper
steps to ensure that others can pick up where they left off.</p>
</blockquote>
<p>I’ve been working on Ubuntu for over ten years now, almost right from the
very start; I’m Canonical’s employee #17 due to working out a notice period
in my previous job, but I was one of the founding group of developers. I
occasionally tell the story that Mark originally hired me mainly to work on
what later became Launchpad Bugs due to my experience maintaining the Debian
bug tracking system, but then not long afterwards Jeff Waugh got in touch
and said “hey Colin, would you mind just sorting out some installable <span class="caps">CD</span>
images for us?”. This is where you imagine one of those movie time-lapse
clocks … At some point it became fairly clear that I was working on
Ubuntu, and the bug system work fell to other people. Then, when Matt
Zimmerman could no longer manage the entire Ubuntu team in Canonical by
himself, Scott James Remnant and I stepped up to help him out. I did that
for a couple of years, starting the Foundations team in the process. As the
team grew I found that my interests really lay in hands-on development
rather than in management, so I switched over to being the technical lead
for Foundations, and have made my home there ever since. Over the years
this has given me the opportunity to do all sorts of things, particularly
working on our installers and on the <span class="caps">GRUB</span> boot loader, leading the
development work on many of our archive maintenance tools, instituting the
+1 maintenance effort and proposed-migration, and developing the Click
package manager, and I’ve had the great pleasure of working with many
exceptionally talented people.</p>
<p>However. In recent months I’ve been feeling a general sense of malaise and
what I’ve come to recognise with hindsight as the symptoms of approaching
burnout. I’ve been working long hours for a long time, and while I can draw
on a lot of experience by now, it’s been getting harder to summon the
enthusiasm and creativity to go with that. I have a wonderful wife, amazing
children, and lovely friends, and I want to be able to spend a bit more time
with them. After ten years doing the same kinds of things, I’ve accreted
history with and responsibility for a lot of projects. One of the things I
always loved about Foundations was that it’s a broad church, covering a wide
range of software and with a correspondingly wide range of opportunities;
but, over time, this has made it difficult for me to focus on things that
are important because there are so many areas where I might be called upon
to help. I thought about simply stepping down from the technical lead
position and remaining in the same team, but I decided that that wouldn’t
make enough of a difference to what matters to me. I need a clean break and
an opportunity to reset my habits before I burn out for real.</p>
<p>One of the things that has consistently held my interest through all of this
has been making sure that the infrastructure for Ubuntu keeps running
reliably and that other developers can work efficiently. As part of this,
I’ve been able to do <a href="https://dev.launchpad.net/Contributions#colin_watson">a lot of
work</a> over the years
on <a href="https://launchpad.net/">Launchpad</a> where it was a good fit with my
remit: this has included significant performance improvements to archive
publishing, moving most archive administration operations from
excessively-privileged command-line operations to the webservice, making
build cancellation reliable across the board, and moving live filesystem
building from an unscalable ad-hoc collection of machines into the Launchpad
build farm. The Launchpad development team has generally welcomed help with
open arms, and in fact I joined the <a href="https://launchpad.net/~launchpad">~launchpad
team</a> last year.</p>
<p>So, the logical next step for me is to make this informal involvement
permanent. As such, at the end of this year I will be moving from Ubuntu
Foundations to the Launchpad engineering team.</p>
<p>This doesn’t mean me leaving Ubuntu. Within Canonical, Launchpad
development is currently organised under the Continuous Integration team,
which is part of Ubuntu Engineering. I’ll still be around in more or less
the usual places and available for people to ask me questions. But I will
in general be trying to reduce my involvement in Ubuntu proper to things
that are closely related to the operation of Launchpad, and a small number
of low-effort things that I’m interested enough in to find free time for.
I still need to sort out a lot of details, but it’ll very likely
involve me handing over project leadership of Click, drastically reducing my
involvement in the installer, and looking for at least some help with boot
loader work, among others. I don’t expect my Debian involvement to change,
and I may well find myself more motivated there now that it won’t be so
closely linked with my day job, although it’s possible that I will pare some
things back that I was mostly doing on Ubuntu’s behalf. If you ask me for
help with something over the next few months, expect me to be more likely to
direct you to other people or suggest ways you can help yourself out, so
that I can start disentangling myself from my current web of projects.</p>
<p>Please contact me sooner or later if you’re interested in helping out with
any of the things I’m visible in right now, and we can see what makes sense.
I’m looking forward to this!</p>Porting GHC: A Tale of Two Architectures2014-04-15T02:36:01+01:002014-04-15T02:36:01+01:00Colin Watsontag:www.chiark.greenend.org.uk,2014-04-15:/~cjwatson/blog/porting-ghc-a-tale-of-two-architectures.html<p>We had
<a href="https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2013-December/014795.html">some</a>
<a href="https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2014-March/014907.html">requests</a>
to get <a href="http://www.haskell.org/ghc/"><span class="caps">GHC</span></a> (the Glasgow Haskell Compiler) up
and running on two new Ubuntu architectures:
<acronym title="64-bit ARM, a.k.a. aarch64">arm64</acronym>, added in 13.10,
and <acronym title="little-endian 64-bit PowerPC">ppc64el</acronym>, added
in 14.04. This has been something of a saga, and has involved rather more
late-night hacking than is probably good for me …</p><p>We had
<a href="https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2013-December/014795.html">some</a>
<a href="https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2014-March/014907.html">requests</a>
to get <a href="http://www.haskell.org/ghc/"><span class="caps">GHC</span></a> (the Glasgow Haskell Compiler) up
and running on two new Ubuntu architectures:
<acronym title="64-bit ARM, a.k.a. aarch64">arm64</acronym>, added in 13.10,
and <acronym title="little-endian 64-bit PowerPC">ppc64el</acronym>, added
in 14.04. This has been something of a saga, and has involved rather more
late-night hacking than is probably good for me.</p>
<h2>Book the First: Recalled to a life of strange build systems</h2>
<p>You might not know it from the sheer bulk of uploads I do sometimes, but I
actually don’t speak a word of Haskell and it’s not very high up my list of
things to learn. But I am a pretty experienced build engineer, and I enjoy
porting things to new architectures: I’m firmly of the belief that breadth
of architecture support is a good way to shake out certain categories of
issues in code, that it’s worth doing aggressively across an entire
distribution, and that, even if you don’t think you need something now, new
requirements have a habit of coming along when you least expect them and you
might as well be prepared in advance. Furthermore, it annoys me when we
have excessive noise in our <a href="http://qa.ubuntuwire.com/ftbfs/">build failure</a>
and <a href="https://wiki.ubuntu.com/ProposedMigration">proposed-migration</a> output
and I often put bits and pieces of spare time into gardening miscellaneous
problems there, and at one point there was a lot of Haskell stuff on the
list and it got a bit annoying to have to keep sending patches rather than
just fixing things myself, and … well, I ended up as probably the only
non-Haskell-programmer on the Debian Haskell team and found myself fixing
problems there in my free time. Life is a bit weird sometimes.</p>
<p>Bootstrapping packages on a new architecture is a bit of a black art that
only a fairly small number of relatively bitter and twisted people know very
much about. Doing it in Ubuntu is specifically painful because we’ve always
forbidden direct binary uploads: all binaries have to come from a build
daemon. Compilers in particular often tend to be written in the language
they compile, and it’s not uncommon for them to build-depend on themselves:
that is, you need a previous version of the compiler to build the compiler,
stretching back to the dawn of time where somebody put things together with
a big magnet or something. So how do you get started on a new architecture?
Well, what we do in this case is we construct a binary somehow (usually
involving cross-compilation) and insert it as a build-dependency for a
proper build in Launchpad. The ability to do this is restricted to a small
group of Canonical employees, partly because it’s very easy to make mistakes
and partly because things like the classic “<a href="http://cm.bell-labs.com/who/ken/trust.html">Reflections on Trusting
Trust</a>” are in the backs of our
minds somewhere. We have an iron rule for our own sanity that the injected
build-dependencies must themselves have been built from the unmodified
source package in Ubuntu, although there can be source modifications further
back in the chain. Fortunately, we don’t need to do this very often, but it
does mean that as somebody who can do it I feel an obligation to try and
unblock other people where I can.</p>
<p>As far as constructing those build-dependencies goes, sometimes we look for
binaries built by other distributions (particularly Debian), and that’s
pretty straightforward. In this case, though, these two architectures are
pretty new and the Debian ports are only just getting going, and as far as I
can tell none of the other distributions with active arm64 or ppc64el ports
(or trivial name variants) has got as far as porting <span class="caps">GHC</span> yet. Well, <span class="caps">OK</span>.
This was somewhere around the Christmas holidays and I had some time.
Muggins here cracks his knuckles and decides to have a go at bootstrapping
it from scratch. It can’t be that hard, right? Not to mention that it was
a blocker for over 600 entries on that build failure list I mentioned, which
is definitely enough to make me sit up and take notice; we’d even had the
odd customer request for it.</p>
<p>Several attempts later and I was starting to doubt my sanity, not least for
trying in the first place. We ship <span class="caps">GHC</span> 7.6, and upgrading to 7.8 is not a
project I’d like to tackle until the much more experienced Haskell folks in
Debian have switched to it in unstable. The <a href="https://ghc.haskell.org/trac/ghc/wiki/Building/Porting">porting documentation for
7.6</a> has bitrotted
more or less beyond usability, and the <a href="https://ghc.haskell.org/trac/ghc/wiki/CrossCompilation">corresponding documentation for
7.8</a> really isn’t
backportable to 7.6. I tried building 7.8 for ppc64el anyway, picking that
on the basis that we had quicker hardware for it and it didn’t seem likely to
be particularly more arduous than arm64 (ho ho), and I even got to the point
of having a cross-built stage2 compiler (stage1, in the cross-building case,
is a <span class="caps">GHC</span> binary that runs on your starting architecture and generates code
for your target architecture) that I could copy over to a ppc64el box and
try to use as the base for a fully-native build, but it segfaulted
incomprehensibly just after spawning any child process. Compilers tend to
do rather a lot, especially when they’re built to use <span class="caps">GCC</span> to generate object
code, so this was a pretty serious problem, and it resisted analysis. I
poked at it for a while but didn’t get anywhere, and I had other things to
do so declared it a write-off and gave up.</p>
<h2>Book the Second: The golden thread of progress</h2>
<p>In March, another mailing list conversation prodded me into finding a <a href="https://ghcarm.wordpress.com/2014/01/18/unregisterised-ghc-head-build-for-arm64-platform/">blog
entry by Karel
Gardas</a>
on building <span class="caps">GHC</span> for arm64. This was enough to be worth another look, and
indeed it turned out that (with some help from Karel in private mail) I was
able to cross-build a compiler that actually worked and could be used to run
a fully-native build that also worked. Of course this was 7.8, since as I
mentioned cross-building 7.6 is unrealistically difficult unless you’re
considerably more of an expert on <span class="caps">GHC</span>’s labyrinthine build system than I am.
<span class="caps">OK</span>, no problem, right? Getting a <span class="caps">GHC</span> at all is the hard bit, and 7.8 must
be at least as capable as 7.6, so it should be able to build 7.6 easily
enough …</p>
<p>Not so much. What I’d missed here was that compiler engineers generally
only care very much about building the compiler with <em>older</em> versions of
itself, and if the language in question has any kind of deprecation cycle
then the compiler itself is likely to be behind on various things compared
to more typical code since it has to be buildable with older versions. This
means that the removal of some deprecated interfaces from 7.8 posed a
problem, as did some changes in certain <acronym title="primitive
operations">primops</acronym> that had gained an associated compatibility
layer in 7.8 but nobody had gone back to put the corresponding compatibility
layer into 7.6. <span class="caps">GHC</span> supports running Haskell code through the C
preprocessor, and there’s a <code>__GLASGOW_HASKELL__</code> definition with the
compiler’s version number, so this was just a slog tracking down changes in
git and adding <code>#ifdef</code>-guarded code that coped with the newer compiler
(remembering that stage1 will be built with 7.8 and stage2 with stage1, i.e.
7.6, from the same source tree). More inscrutably, <span class="caps">GHC</span> has its own
packaging system called Cabal which is also used by the compiler build
process to determine which subpackages to build and how to link them against
each other, and some crucial subpackages weren’t being built: it looked like
it was stuck on picking versions from “stage0” (i.e. the initial compiler
used as an input to the whole process) when it should have been building its
own. Eventually I figured out that this was because <span class="caps">GHC</span>’s use of its
packaging system hadn’t anticipated this case, and was selecting the higher
version of the <code>ghc</code> package itself from stage0 rather than the version it
was about to build for itself, and thus never actually tried to build most
of the compiler. Editing <code>ghc_stage1_DEPS</code> in <code>ghc/stage1/package-data.mk</code>
after its initial generation sorted this out. One late night building round
and round in circles for a while until I had something stable, and a Debian
source upload to add basic support for the architecture name (and other
changes which were a bit over the top in retrospect: I didn’t need to touch
the embedded copy of libffi, as we build with the system one), and I was
able to feed this all into Launchpad and watch the builders munch away very
satisfyingly at the Haskell library stack for a while.</p>
<p>This was all interesting, and finally all that work was actually paying off
in terms of getting to watch a slew of several hundred build failures vanish
from arm64 (the final count was something like 640, I think). The fly in
the ointment was that ppc64el was still blocked, as the problem there wasn’t
building 7.6, it was getting a working 7.8. But now I really did have other
much more urgent things to do, so I figured I just wouldn’t get to this by
release time and stuck it on the figurative shelf.</p>
<h2>Book the Third: The track of a bug</h2>
<p>Then, last Friday, I cleared out my urgent pile and thought I’d have another
quick look. (I get a bit obsessive about things like this that smell of
“interesting intellectual puzzle”.) slyfox on the #ghc <span class="caps">IRC</span> channel gave me
some general debugging advice and, particularly usefully, a reduced example
program that I could use to debug just the process-spawning problem without
having to wade through noise from running the rest of the compiler. I
reproduced the same problem there, and then found that the program crashed
earlier (in <code>stg_ap_0_fast</code>, part of the run-time system) if I compiled it
with <code>+RTS -Da -RTS</code>. I nailed it down to a small enough region of assembly
that I could see all of the assembly, the source code, and an intermediate
representation or two from the compiler, and then started meditating on what
makes ppc64el special.</p>
<p>You see, the vast majority of porting bugs come down to what I might call
gross properties of the architecture. You have things like whether it’s
32-bit or 64-bit, big-endian or little-endian, whether <code>char</code> is signed or
unsigned, that sort of thing. There’s a <a href="https://wiki.debian.org/ArchitectureSpecificsMemo">big
table</a> on the Debian wiki
that handily summarises most of the important ones. Sometimes you have to
deal with distribution-specific things like whether <span class="caps">GL</span> or <span class="caps">GLES</span> is used;
often, especially for new variants of existing architectures, you have to
cope with foolish configure scripts that think they can guess certain things
from the architecture name and get it wrong (assuming that <code>powerpc*</code> means
big-endian, for instance). We often have to update <code>config.guess</code> and
<code>config.sub</code>, and on ppc64el we have the additional hassle of updating
libtool macros too. But I’ve done a lot of this stuff and I’d accounted for
everything I could think of. ppc64el is actually a lot like amd64 in terms
of many of these porting-relevant properties, and not even that far off
arm64 which I’d just successfully ported <span class="caps">GHC</span> to, so I couldn’t be dealing
with anything particularly obvious. There was some hand-written assembly
which certainly could have been problematic, but I’d carefully checked that
this wasn’t being used by the “unregisterised” (no specialised machine
dependencies, so relatively easy to port but not well-optimised) build I was
using. A problem around spawning processes suggested an issue with
<code>SIGCHLD</code> handling, but I ruled that out by slowing down the first child
process that it spawned and using <code>strace</code> to confirm that <code>SIGSEGV</code> was the
first signal received. What on earth was the problem?</p>
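<p>(A tiny aside before the answer: a couple of those gross properties are easy
to check from Python on whichever box you happen to be porting on, though
things like <code>char</code> signedness you still have to look up in the table.)</p>
<div class="highlight"><pre><code># Quick check of two of the "gross properties" mentioned above.
import struct
import sys

print("pointer size: %d-bit" % (8 * struct.calcsize("P")))
print("byte order: %s-endian" % sys.byteorder)
</code></pre></div>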
<p>From some painstaking gdb work, one thing I eventually noticed was that
<code>stg_ap_0_fast</code>’s local stack appeared to be being corrupted by a function
call, specifically a call to the colourfully-named <code>debugBelch</code>. Now, when
<span class="caps">IBM</span>’s toolchain engineers were putting together ppc64el based on ppc64, they
took the opportunity to fix a number of problems with their <span class="caps">ABI</span>: there’s an
<a href="https://bugs.openjdk.java.net/browse/JDK-8035647">OpenJDK bug</a> with a handy
list of references. One of the things I noticed there was that there were
some <a href="http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.html">stack allocation
optimisations</a> in
the new <span class="caps">ABI</span>, which affected functions that don’t call any vararg functions
and don’t call any functions that take enough parameters that some of them
have to be passed on the stack rather than in registers. <code>debugBelch</code> takes
varargs: hmm. Now, the calling code isn’t quite in C as such, but in a
related dialect called “Cmm”, a variant of C-- (yes, minus), that <span class="caps">GHC</span> uses
to help bridge the gap between the functional world and its code generation,
and which is compiled down to C by <span class="caps">GHC</span>. When importing C functions into
Cmm, <span class="caps">GHC</span> generates prototypes for them, but it doesn’t do enough parsing to
work out the true prototype; instead, they all just get something like
<code>extern StgFunPtr f(void);</code>. On most architectures you can get away with
this, because the arguments get passed in the usual calling convention
anyway and it all works out, but on ppc64el this means that the caller
doesn’t generate enough stack space and then the callee tries to save its
varargs onto the stack in an area that in fact belongs to the caller, and
suddenly everything goes south. Things were starting to make sense.</p>
<p>Now, <code>debugBelch</code> is only used in optional debugging code; but
<code>runInteractiveProcess</code> (the function associated with the initial round of
failures) takes no fewer than twelve arguments, plenty to force some of them
onto the stack. I poked around the <span class="caps">GCC</span> patch for this <span class="caps">ABI</span> change a bit and
determined that it only optimised away the stack allocation if it had a full
prototype for all the callees, so I guessed that changing those prototypes
to <code>extern StgFunPtr f();</code> might work: it’s still technically wrong, not
least because omitting the parameter list is an obsolescent feature in C11,
but it’s at least just omitting information about the parameter list rather
than actively lying about it. I tweaked that and ran the cross-build from
scratch again. Lo and behold, suddenly I had a working compiler, and I
could go through the same build-7.6-using-7.8 procedure as with arm64, much
more quickly this time now that I knew what I was doing. One <a href="https://ghc.haskell.org/trac/ghc/ticket/8965">upstream
bug</a>, one Debian upload, and
several bootstrapping builds later, and <span class="caps">GHC</span> was up and running on another
architecture in Launchpad. Success!</p>
<h2>Epilogue</h2>
<p>There’s still more to do. I gather there may be a Google Summer of Code
project in Linaro to write proper native code generation for <span class="caps">GHC</span> on arm64:
this would make things a good deal faster, but also enable GHCi (the
interpreter) and Template Haskell, and thus clear quite a few more build
failures. Since there’s already native code generation for ppc64 in <span class="caps">GHC</span>,
getting it going for ppc64el would probably only be a couple of days’ work
at this point. But these are niceties by comparison, and I’m more than
happy with what I got working for 14.04.</p>
<p>The upshot of all of this is that I may be the first non-Haskell-programmer
to ever port <span class="caps">GHC</span> to two entirely new architectures. I’m not sure if I gain
much from that personally aside from a lot of lost sleep and being
considered extremely strange. It has, however, been by far the most
challenging set of packages I’ve ported, and a fascinating trip through some
odd corners of build systems and undefined behaviour that I don’t normally
need to touch.</p>Testing wanted: GRUB 2.02~beta2 Debian/Ubuntu packages2014-01-18T01:46:55+00:002014-01-18T01:46:55+00:00Colin Watsontag:www.chiark.greenend.org.uk,2014-01-18:/~cjwatson/blog/testing-wanted-grub-2.02-beta2.html<p>This is mostly a repost of my <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2014-January/037978.html">ubuntu-devel
mail</a>
for a wider audience, but see below for some additions.</p>
<p>I’d like to upgrade to <span class="caps">GRUB</span> 2.02 for Ubuntu 14.04; it’s currently in beta.
This represents a year and a half of upstream development, and contains many …</p><p>This is mostly a repost of my <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2014-January/037978.html">ubuntu-devel
mail</a>
for a wider audience, but see below for some additions.</p>
<p>I’d like to upgrade to <span class="caps">GRUB</span> 2.02 for Ubuntu 14.04; it’s currently in beta.
This represents a year and a half of upstream development, and contains many
new features, which you can see in the
<a href="http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=NEWS"><span class="caps">NEWS</span></a> file.</p>
<p>Obviously I want to be very careful with substantial upgrades to the default
boot loader. So, I’ve put this in trusty-proposed, and filed a <a href="https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1269992">blocking
bug</a> to ensure
that it doesn’t reach trusty proper until it’s had a reasonable amount of
manual testing. If you are already using trusty and have some time to try
this out, it would be very helpful to me. I suggest that you only attempt
this if you’re comfortable driving <code>apt-get</code> directly and recovering from
errors at that level, and if you’re willing to spend time working with me on
narrowing down any problems that arise.</p>
<p>Please ensure that you have rescue media to hand before starting testing.
The simplest way to upgrade is to enable trusty-proposed, upgrade <span class="caps">ONLY</span>
packages whose names start with “grub” (e.g. use <code>apt-get dist-upgrade</code> to
show the full list, say no to the upgrade, and then pass all the relevant
package names to <code>apt-get install</code>), and then (very important!) disable
trusty-proposed again. Provided that there were no errors in this process,
you should be safe to reboot. If there were errors, you should be able to
downgrade back to 2.00-22 (or 1.27+2.00-22 in the case of grub-efi-amd64-signed).</p>
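<p>If you’d like to see exactly which packages that procedure is going to touch
before running it, python-apt can list the candidates without installing
anything. This is only a sketch (it assumes python-apt is installed and
trusty-proposed is enabled, and it only inspects the cache); do the actual
upgrade with <code>apt-get</code> as described above.</p>
<div class="highlight"><pre><code># List the grub packages that a dist-upgrade would touch, without installing.
import apt

cache = apt.Cache()
cache.upgrade(dist_upgrade=True)

for pkg in cache.get_changes():
    if pkg.name.startswith("grub"):
        installed = pkg.installed.version if pkg.installed else "none"
        print("%s: %s -> %s" % (pkg.name, installed, pkg.candidate.version))
</code></pre></div>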
<p>Please report your experiences (positive and negative) with this upgrade in
the <a href="https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1269992">tracking
bug</a>. I’m
particularly interested in systems that are complex in any way: <span class="caps">UEFI</span> Secure
Boot, non-trivial disk setups, manual configuration, that kind of thing. If
any of the problems you see are also ones you saw with earlier versions of
<span class="caps">GRUB</span>, please identify those clearly, as I want to prioritise handling
regressions over anything else. I’ve assigned myself to that bug to ensure
that messages to it are filtered directly into my inbox.</p>
<p>I’ll add a couple of things that weren’t in my ubuntu-devel mail. Firstly,
this is all in Debian experimental as well (I do all the work in Debian and
sync it across, so the grub2 source package in Ubuntu is a verbatim copy of
the one in Debian these days). There are some configuration differences
applied at build time, but a large fraction of test cases will apply equally
well to both. I don’t have a definite schedule for pushing this into jessie
yet - I only just finished getting 2.00 in place there, and the release
schedule gives me a bit more time - but I certainly want to ship jessie with
2.02 or newer, and any test feedback would be welcome. It’s probably best
to just e-mail feedback to me directly for now, or to the pkg-grub-devel list.</p>
<p>Secondly, a couple of news sites have picked this up and run it as
“Canonical intends to ship Ubuntu 14.04 <span class="caps">LTS</span> with a beta version of <span class="caps">GRUB</span>”.
This isn’t in fact my intent at all. I’m doing this now because I think
<span class="caps">GRUB</span> 2.02 will be ready in non-beta form in time for Ubuntu 14.04, and
indeed that putting it in our development release will help to stabilise it;
I’m an upstream <span class="caps">GRUB</span> developer too and I find the exposure of widely-used
packages very helpful in that context. It will certainly be much easier to
upgrade to a beta now and a final release later than it would be to try to
jump from 2.00 to 2.02 in a month or two’s time.</p>
<p>Even if there’s some unforeseen delay and 2.02 isn’t released in time,
though, I think nearly three months of stabilisation is still plenty to
yield a boot loader that I’m comfortable with shipping in an <span class="caps">LTS</span>. I’ve been
backporting a lot of changes to 2.00 and even 1.99, and, as ever for an
actively-developed codebase, it gets harder and harder over time (in
particular, I’ve spent longer than I’d like hunting down and backporting
fixes for non-512-byte sector disks). While I can still manage it, I don’t
want to be supporting 2.00 for five more years after upstream has moved on;
I don’t think that would be in anyone’s best interests. And I definitely
want some of the new features which aren’t sensibly backportable, such as
several of the new platforms (<span class="caps">ARM</span>, <span class="caps">ARM64</span>, Xen) and various networking
improvements; I can imagine a number of our users being interested in things
like optional signature verification of files <span class="caps">GRUB</span> reads from disk, improved
Mac support, and the TrueCrypt <span class="caps">ISO</span> loader, just to name a few. This should
be a much stronger base for five-year support.</p>Automatic installability checking2012-10-26T10:18:26+01:002012-10-26T10:20:07+01:00Colin Watsontag:www.chiark.greenend.org.uk,2012-10-26:/~cjwatson/blog/automatic-installability-checking.html<p>I’ve just finished deploying automatic installability checking for Ubuntu’s
development release, which is more or less equivalent to the way that
uploads are promoted from Debian unstable to testing. See <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2012-October/036043.html">my ubuntu-devel
post</a>
and <a href="https://lists.ubuntu.com/archives/ubuntu-devel-announce/2012-October/000989.html">my ubuntu-devel-announce
post</a>
for details. This now means that we’ll be opening the …</p><p>I’ve just finished deploying automatic installability checking for Ubuntu’s
development release, which is more or less equivalent to the way that
uploads are promoted from Debian unstable to testing. See <a href="https://lists.ubuntu.com/archives/ubuntu-devel/2012-October/036043.html">my ubuntu-devel
post</a>
and <a href="https://lists.ubuntu.com/archives/ubuntu-devel-announce/2012-October/000989.html">my ubuntu-devel-announce
post</a>
for details. This now means that we’ll be opening the archive for general
development once glibc 2.16 packages are ready.</p>
<p>I’m very excited about this because it’s something I’ve wanted to do for a
long, long time. In fact, back in 2004 when I had my very first telephone
conversation with a certain spaceman about this crazy Debian-based project
he wanted me to work on, I remember talking about Debian’s testing migration
system and some ways I thought it could be improved. I don’t remember the
details of that conversation any more and what I just deployed may well bear
very little resemblance to it, but it should transform the extent to which
our development release is continuously usable.</p>
<p>The next step is to hook in <a href="http://dep.debian.net/deps/dep8/">autopkgtest</a>
results. This will allow us to do a degree of automatic testing of
reverse-dependencies when we upgrade low-level libraries.</p>OpenSSH 6.0p12012-05-27T20:12:12+01:002012-05-27T20:12:12+01:00Colin Watsontag:www.chiark.greenend.org.uk,2012-05-27:/~cjwatson/blog/openssh-6.0p1.html<p>OpenSSH 6.0p1 was <a href="http://www.openssh.com/txt/release-6.0">released</a> a
little while back; this weekend I belatedly got round to uploading packages
of it to Debian unstable and Ubuntu quantal.</p>
<p>I was a bit delayed by needing to put together an <a href="https://bugzilla.mindrot.org/show_bug.cgi?id=2011">improvement to privsep
sandbox selection</a> that
particularly matters in the context of distributions …</p><p>OpenSSH 6.0p1 was <a href="http://www.openssh.com/txt/release-6.0">released</a> a
little while back; this weekend I belatedly got round to uploading packages
of it to Debian unstable and Ubuntu quantal.</p>
<p>I was a bit delayed by needing to put together an <a href="https://bugzilla.mindrot.org/show_bug.cgi?id=2011">improvement to privsep
sandbox selection</a> that
particularly matters in the context of distributions. One of the experts on
<code>seccomp_filter</code> has commented favourably on it, but I haven’t yet had a
comment from upstream themselves, so I may need to refine this depending on
what they say.</p>
<p>(This is a good example of how it matters that software is often not built
on the system that it’s going to run on, and in particular that the kernel
version is rather likely to be different. Where possible it’s always best
to detect kernel capabilities at run-time rather than at build-time.)</p>
<p>I didn’t make it very clear in the changelog, but using the new
<code>seccomp_filter</code> sandbox currently requires <code>UsePrivilegeSeparation sandbox</code>
in <code>sshd_config</code> as well as a capable kernel. I won’t change the default
here in advance of upstream, who still consider privsep sandboxing experimental.</p>libpipeline 1.2.1 released2012-03-02T21:49:10+00:002012-03-02T21:49:10+00:00Colin Watsontag:www.chiark.greenend.org.uk,2012-03-02:/~cjwatson/blog/libpipeline-1.2.1-released.html<p>I’ve released <a href="http://libpipeline.nongnu.org/">libpipeline 1.2.1</a>, and
uploaded it to Debian unstable. This is a bug-fix release:</p>
<ul>
<li>Retry reads and writes on <code>EINTR</code>.</li>
<li>Fix opening of output files requested by <code>pipeline_want_outfile</code>; these
are now created if they do not already exist, and truncated if they do.</li>
<li><code><pipeline.h></code> is …</li></ul><p>I’ve released <a href="http://libpipeline.nongnu.org/">libpipeline 1.2.1</a>, and
uploaded it to Debian unstable. This is a bug-fix release:</p>
<ul>
<li>Retry reads and writes on <code>EINTR</code>.</li>
<li>Fix opening of output files requested by <code>pipeline_want_outfile</code>; these
are now created if they do not already exist, and truncated if they do.</li>
<li><code><pipeline.h></code> is now wrapped in <code>extern "C"</code> when used in a C++
compilation unit.</li>
</ul>APT resolver bugs2012-01-30T10:54:25+00:002012-01-30T10:54:25+00:00Colin Watsontag:www.chiark.greenend.org.uk,2012-01-30:/~cjwatson/blog/apt-resolver-bugs.html<p>I’ve managed to go for eleven years working on Debian and nearly eight on
Ubuntu without ever needing to teach myself how <span class="caps">APT</span>’s resolver works. I get
the impression that there’s a certain mystique about it in general
(alternatively, I’m just the last person to figure …</p><p>I’ve managed to go for eleven years working on Debian and nearly eight on
Ubuntu without ever needing to teach myself how <span class="caps">APT</span>’s resolver works. I get
the impression that there’s a certain mystique about it in general
(alternatively, I’m just the last person to figure this out). Recently,
though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be
bugs in the resolver, and I thought it might be interesting to walk through
the process of fixing them based on the <code>Debug::pkgProblemResolver=true</code> log files.</p>
<h2>Breakage with Breaks</h2>
<p>The first was <a href="https://bugs.launchpad.net/bugs/922485">Ubuntu bug #922485</a>
(<a href="https://launchpadlibrarian.net/91187038/apt.log">apt.log</a>). To understand
the log, you first need to know that <span class="caps">APT</span> makes up to ten passes of the
resolver to attempt to fix broken dependencies by upgrading, removing, or
holding back packages; if there are still broken packages after this point,
it’s generally because it’s got itself stuck in some kind of loop, and it
bails out rather than carrying on forever. The current pass number is shown
in each “Investigating” log entry, so they start with “Investigating (0)”
and carry on up to at most “Investigating (9)”. Any packages that you see
still being investigated on the tenth pass are probably something to do with
whatever’s going wrong.</p>
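<p>Since the interesting entries are precisely the ones still being investigated
on the final pass, a few lines of Python are enough to pull them out of a log
like the ones linked above (a rough helper based on the log format shown below,
not an official <span class="caps">APT</span> tool):</p>
<div class="highlight"><pre><code># Report packages still under investigation on the last resolver pass.
import re
import sys

investigating = re.compile(r"Investigating \((\d+)\) (\S+)")
last_pass = {}

with open(sys.argv[1]) as log:
    for line in log:
        match = investigating.match(line.strip())
        if match:
            last_pass[match.group(2)] = int(match.group(1))

for pkg, passno in sorted(last_pass.items()):
    if passno >= 9:
        print("%s: still broken on pass %d" % (pkg, passno))
</code></pre></div>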
<p>In this case, most packages have been resolved by the end of the fourth
pass, but <code>xserver-xorg-core</code> is causing some trouble. (Not a particular
surprise, as it’s an important package with lots of relationships.) We can
see that each breakage is:</p>
<div class="highlight"><pre><span></span><code>Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )
</code></pre></div>
<p>This is a
<a href="http://www.debian.org/doc/debian-policy/ch-relationships.html#s-breaks"><code>Breaks</code></a>
(a relatively new package relationship type introduced a few years ago as a
sort of weaker form of <code>Conflicts</code>) on a virtual package, which means that
in order to unpack <code>xserver-xorg-core</code> each package that provides
<code>xserver-xorg-video-6</code> must be deconfigured. Much like <code>Conflicts</code>, <span class="caps">APT</span>
responds to this by upgrading providing packages to versions that don’t
provide the offending virtual package if it can, and otherwise removing
them. We can see it doing just that in the log (some lines omitted):</p>
<div class="highlight"><pre><span></span><code><span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.7.6</span><span class="o">-</span><span class="mi">2u</span><span class="n">buntu7</span><span class="mf">.10</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.11.3</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu8</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">x11</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">Fixing</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">via</span><span class="w"> </span><span class="n">remove</span><span class="w"> </span><span class="kr">of</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">video</span><span class="o">-</span><span class="n">tseng</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.7.6</span><span class="o">-</span><span class="mi">2u</span><span class="n">buntu7</span><span class="mf">.10</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.11.3</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu8</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">x11</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">Fixing</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">via</span><span class="w"> </span><span class="n">remove</span><span class="w"> </span><span class="kr">of</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">video</span><span class="o">-</span><span class="n">i740</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.7.6</span><span class="o">-</span><span class="mi">2u</span><span class="n">buntu7</span><span class="mf">.10</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mi">2</span><span class="o">:</span><span class="mf">1.11.3</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu8</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">x11</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">Fixing</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">core</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">via</span><span class="w"> </span><span class="n">remove</span><span class="w"> </span><span class="kr">of</span><span class="w"> </span><span class="n">xserver</span><span class="o">-</span><span class="n">xorg</span><span class="o">-</span><span class="n">video</span><span class="o">-</span><span class="n">nv</span><span class="o">:</span><span class="n">i386</span>
</code></pre></div>
<p><span class="caps">OK</span>, so that makes sense - presumably upgrading those packages didn’t help at
the time. But look at the pass numbers. Rather than just fixing all the
packages that provide <code>xserver-xorg-video-6</code> in a single pass, which it
would be perfectly able to do, it only fixes one per pass. This means that
if a package <code>Breaks</code> a virtual package which is provided by more than ten
installed packages, the resolver will fail to handle that situation. On
inspection of the code, this was being handled correctly for <code>Conflicts</code> by
carrying on through the list of possible targets for the dependency relation
in that case, but apparently when <code>Breaks</code> support was implemented in <span class="caps">APT</span>
this case was overlooked. The fix is to carry on through the list of
possible targets for any “negative” dependency relation, not just
<code>Conflicts</code>, and I’ve filed a patch as <a href="http://bugs.debian.org/657695">Debian
bug #657695</a>.</p>
<h2>My cup overfloweth</h2>
<p>The second bug I looked at was <a href="https://bugs.launchpad.net/bugs/917173">Ubuntu
bug #917173</a>
(<a href="https://launchpadlibrarian.net/90202820/apt.log">apt.log</a>). Just as in
the previous case, we can see the resolver “running out of time” by reaching
the end of the tenth pass with some dependencies still broken. This one is
a lot less obvious, though. The last few entries clearly indicate that the
resolver is stuck in a loop:</p>
<div class="highlight"><pre><span></span><code><span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="n">dpkg</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">admin</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Breaks</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">utils</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.8</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">29</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">7205</span>
<span class="w"> </span><span class="n">Upgrading</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">due</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">Breaks</span><span class="w"> </span><span class="n">field</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">utils</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Depends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">none</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">=</span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">12</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">29</span>
<span class="w"> </span><span class="n">Holding</span><span class="w"> </span><span class="n">Back</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">rather</span><span class="w"> </span><span class="n">than</span><span class="w"> </span><span class="n">change</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">9</span><span class="p">)</span><span class="w"> </span><span class="n">dpkg</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">admin</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Breaks</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">utils</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.8</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">29</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">7205</span>
<span class="w"> </span><span class="n">Upgrading</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">due</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">Breaks</span><span class="w"> </span><span class="n">field</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="n">dpkg</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">9</span><span class="p">)</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">1.15.5.6</span><span class="n">ubuntu4</span><span class="mf">.5</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">utils</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Depends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">none</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">=</span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">12</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">29</span>
<span class="w"> </span><span class="n">Holding</span><span class="w"> </span><span class="n">Back</span><span class="w"> </span><span class="n">dpkg</span><span class="o">-</span><span class="n">dev</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">rather</span><span class="w"> </span><span class="n">than</span><span class="w"> </span><span class="n">change</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span>
</code></pre></div>
<p>The new version of <code>dpkg</code> requires upgrading <code>dpkg-dev</code>, but it can’t
because of something wrong with <code>libdpkg-perl</code>. Following the breadcrumb
trail back through the log, we find:</p>
<div class="highlight"><pre><span></span><code><span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">none</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">1.16.1.2</span><span class="n">ubuntu5</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Depends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">5.10.1</span><span class="o">-</span><span class="mi">8u</span><span class="n">buntu2</span><span class="mf">.1</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">5.14.2</span><span class="o">-</span><span class="mi">6u</span><span class="n">buntu1</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">1472</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">12</span>
<span class="w"> </span><span class="n">Holding</span><span class="w"> </span><span class="n">Back</span><span class="w"> </span><span class="n">libdpkg</span><span class="o">-</span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">rather</span><span class="w"> </span><span class="n">than</span><span class="w"> </span><span class="n">change</span><span class="w"> </span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">5.10.1</span><span class="o">-</span><span class="mi">8u</span><span class="n">buntu2</span><span class="mf">.1</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">5.14.2</span><span class="o">-</span><span class="mi">6u</span><span class="n">buntu1</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Depends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">5.10.1</span><span class="o">-</span><span class="mi">8u</span><span class="n">buntu2</span><span class="mf">.1</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">5.14.2</span><span class="o">-</span><span class="mi">6u</span><span class="n">buntu1</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">=</span><span class="w"> </span><span class="mf">5.14.2</span><span class="o">-</span><span class="mi">6u</span><span class="n">buntu1</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">5806</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">1472</span>
<span class="w"> </span><span class="n">Removing</span><span class="w"> </span><span class="n">perl</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">rather</span><span class="w"> </span><span class="n">than</span><span class="w"> </span><span class="n">change</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="o">:</span><span class="n">i386</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">5.10.1</span><span class="o">-</span><span class="mi">8u</span><span class="n">buntu2</span><span class="mf">.1</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">5.14.2</span><span class="o">-</span><span class="mi">6u</span><span class="n">buntu1</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">perl</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">PreDepends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">libc6</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">2.11.1</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu7</span><span class="mf">.8</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">2.13</span><span class="o">-</span><span class="mi">24u</span><span class="n">buntu2</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">libs</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">>=</span><span class="w"> </span><span class="mf">2.11</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">libc6</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="o">-</span><span class="mi">17473</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">perl</span><span class="o">-</span><span class="n">base</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">5806</span>
<span class="w"> </span><span class="n">Added</span><span class="w"> </span><span class="n">libc6</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">remove</span><span class="w"> </span><span class="n">list</span>
<span class="n">Investigating</span><span class="w"> </span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="n">libc6</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">2.11.1</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu7</span><span class="mf">.8</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">2.13</span><span class="o">-</span><span class="mi">24u</span><span class="n">buntu2</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">libs</span><span class="w"> </span><span class="p">)</span>
<span class="n">Broken</span><span class="w"> </span><span class="n">libc6</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">Depends</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="n">libc</span><span class="o">-</span><span class="n">bin</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">i386</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">2.11.1</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu7</span><span class="mf">.8</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="mf">2.13</span><span class="o">-</span><span class="mi">24u</span><span class="n">buntu2</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">libs</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">=</span><span class="w"> </span><span class="mf">2.11.1</span><span class="o">-</span><span class="mi">0u</span><span class="n">buntu7</span><span class="mf">.8</span><span class="p">)</span>
<span class="w"> </span><span class="n">Considering</span><span class="w"> </span><span class="n">libc</span><span class="o">-</span><span class="n">bin</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="mi">10358</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">solution</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">libc6</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="o">-</span><span class="mi">17473</span>
<span class="w"> </span><span class="n">Removing</span><span class="w"> </span><span class="n">libc6</span><span class="o">:</span><span class="n">i386</span><span class="w"> </span><span class="n">rather</span><span class="w"> </span><span class="n">than</span><span class="w"> </span><span class="n">change</span><span class="w"> </span><span class="n">libc</span><span class="o">-</span><span class="n">bin</span><span class="o">:</span><span class="n">i386</span>
</code></pre></div>
<p>So ultimately the problem is something to do with libc6; but what? <a href="https://bugs.launchpad.net/ubuntu/+source/apt/+bug/917173/comments/10">As
Steve Langasek said in the
bug</a>,
libc6’s dependencies have been very carefully structured, and surely we
would have seen some hint of it elsewhere if they were wrong. At this point
ideally I wanted to break out <span class="caps">GDB</span> or at the very least experiment a bit with
<code>apt-get</code>, but due to some tedious local problems I hadn’t been able to
restore the <code>apt-clone</code> state file for this bug onto my system so that I
could attack it directly. So I fell back on the last refuge of the
frustrated debugger and sat and thought about it for a bit.</p>
<p>Eventually I noticed something. The numbers after the package names in the
third line of each of these log entries are “scores”: roughly, the more
important a package is, the higher its score should be. The function that
calculates these is <code>pkgProblemResolver::MakeScores()</code> in
<a href="http://anonscm.debian.org/cgit/apt/apt.git/tree/apt-pkg/algorithms.cc?id=f23e1e940214c7abbf87c28bc71a5d37d117aa57">apt-pkg/algorithms.cc</a>.
Reading this, I noticed that the various values added up to make each score
are almost all provably positive, for example:</p>
<div class="highlight"><pre><span></span><code><span class="n">Scores</span><span class="p">[</span><span class="n">I</span><span class="o">-></span><span class="n">ID</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">abs</span><span class="p">(</span><span class="n">OldScores</span><span class="p">[</span><span class="n">D</span><span class="p">.</span><span class="n">ParentPkg</span><span class="p">()</span><span class="o">-></span><span class="n">ID</span><span class="p">]);</span>
</code></pre></div>
<p>The only exceptions are an initial -1 or -2 points for <code>Priority: optional</code>
or <code>Priority: extra</code> packages respectively, or some values that could
theoretically be configured to be negative but weren’t in this case. <span class="caps">OK</span>.
So how come <code>libc6</code> has such a huge negative score of -17473, when one would
normally expect it to be an extremely powerful package with a large positive score?</p>
<p>Oh. This is computer programming, not mathematics … and each score is
stored in a <code>signed short</code>, so in a sufficiently large upgrade all those
bonus points add up to something larger than 32767 and everything goes
haywire. Bingo. Make it an <code>int</code> instead - the number of installed
packages is going to be on the order of tens of thousands at most, so it’s
not as though it’ll make a substantial difference to the amount of memory
used - and chances are everything will be fine. I’ve filed a patch as
<a href="http://bugs.debian.org/657732">Debian bug #657732</a>.</p>
<p>I’d expected this to be a pretty challenging pair of bugs. While I
certainly haven’t lost any respect for the <span class="caps">APT</span> maintainers for dealing with
this stuff regularly, it wasn’t as bad as I thought. I’d expected to have
to figure out how to retune some slightly out-of-balance heuristics and not
really know whether I’d broken anything else in the process; but in the end
both patches were very straightforward.</p>Quality in Ubuntu 12.04 LTS2011-10-24T14:57:41+01:002011-10-24T14:57:41+01:00Colin Watsontag:www.chiark.greenend.org.uk,2011-10-24:/~cjwatson/blog/quality-in-12-04.html<p>As is natural for an <span class="caps">LTS</span> cycle, lots of people are thinking and talking
about work focused on quality rather than features. With Canonical
<a href="http://www.canonical.com/content/ubuntu-1204-feature-extended-support-period-desktop-users">extending <span class="caps">LTS</span>
support</a>
to five years on the desktop for 12.04, much of this is quite rightly
focused on the desktop. I’m really not …</p><p>As is natural for an <span class="caps">LTS</span> cycle, lots of people are thinking and talking
about work focused on quality rather than features. With Canonical
<a href="http://www.canonical.com/content/ubuntu-1204-feature-extended-support-period-desktop-users">extending <span class="caps">LTS</span>
support</a>
to five years on the desktop for 12.04, much of this is quite rightly
focused on the desktop. I’m really not a desktop hacker in any way, shape,
or form, though. I spent my first few years in Ubuntu working mainly on the
installer - I still do, although I do some other things now too - and I used
to say only half-jokingly that my job was done once X started. Of course
there are plenty of bugs I can fix, but I wanted to see if I could do
something with a bit more structure, so I got to thinking about projects we
could work on at the foundations level that would make a big difference.</p>
<h2>Image build pipeline</h2>
<p>One difficulty we have is that quite a few of our bugs - especially
installer bugs, although this goes for some other things too - are only
really caught when people are doing coordinated image testing just before a
milestone release. Now, it takes a while to do all the builds and then it
takes a while to test them. The excellent work of the <span class="caps">QA</span> team has meant
that testing is much quicker now than it used to be, and a certain amount of
smoke-testing is automated (particularly for server images). On the other
hand, the build phase has only got longer as we’ve added more flavours and
architectures, particularly as some parts of the process are still
serialised per architecture or subarchitecture so <span class="caps">ARM</span> builds in particular
take a very long time indeed. Exact timings are a bit difficult to get for
various reasons, but I think the minimum time between a developer uploading
a fix and us having a full set of candidate images on all architectures
including that fix is currently somewhere north of eight hours, and that’s
with people cutting corners and pulling strings, which is a suboptimal thing
to have to do around release time. This obviously makes us reluctant to
respin for anything short of showstopper bugs. If we could get things down
to something closer to two hours, respins would be a much less horrible
proposition and so we might be able to fix a few bugs that are serious but
not showstoppers, not to mention that the release team would feel less
burned out.</p>
<p>We discussed this problem at the release sprint, and came up with a <a href="https://blueprints.launchpad.net/ubuntu/+spec/foundations-p-image-build-pipeline">laundry
list of
improvements</a>;
I’ve scheduled this for discussion at <span class="caps">UDS</span> in case we can think of any more.
Please come along if you’re interested!</p>
<p>One thing in particular that I’m working on is refactoring
<a href="https://launchpad.net/germinate">Germinate</a>, a tool which dates right back
to our first meeting before Ubuntu was even called Ubuntu and whose job is
to expand dependencies starting from our lists of “seed” packages; we use
this, among other things, to generate <code>Task</code> fields in the archive and to
decide which packages to copy into our images. This was acceptably quick in
2004, but now that we run it forty times (eight flavours multiplied by five
architectures) at the end of every publisher run it’s actually become rather
a serious performance problem: <code>cron.germinate</code> takes about ten minutes,
which is over a third of the typical publisher runtime. It parses Packages
files eight times as often as it needs to, Sources files forty times as
often as it needs to, and recalculates the dependency tree of the base
system five times as often as it needs to. I am confident that we can
significantly reduce the runtime here, and I think there’s some hope that we
might be able to move the publisher back to a 30-minute cycle, which would
increase the velocity of Ubuntu development in general.</p>
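<p>The fix here is mostly about sharing work rather than anything clever. As a
hedged sketch of the idea (in Python, and very much not Germinate’s actual code;
the function name and structure below are invented), memoising each parsed
Packages file already removes the eight-fold duplication for that part:</p>
<div class="highlight"><pre><span></span><code># Illustrative only: not Germinate's real parser, just the caching idea.
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_packages(path):
    """Parse an uncompressed Packages file into {name: {field: value}}.

    Continuation lines are skipped for brevity; the point is that however
    many flavours ask for this file, it is read and parsed exactly once.
    """
    packages, stanza = {}, {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if "Package" in stanza:
                    packages[stanza["Package"]] = stanza
                stanza = {}
            elif not line.startswith((" ", "\t")) and ":" in line:
                field, _, value = line.partition(":")
                stanza[field] = value.strip()
    if "Package" in stanza:
        packages[stanza["Package"]] = stanza
    return packages
</code></pre></div>
<p>The same kind of sharing applies to Sources files and to the recalculated
base dependency tree, which is where the larger multiples come from.</p>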
<h2>Maintaining the development release</h2>
<p>Our release cycle always starts with syncing and merging packages from
Debian unstable (or testing in the case of <span class="caps">LTS</span> cycles). The vast majority
of packages in Ubuntu arrive this way, and generally speaking if we didn’t
do this we would fall behind in ways that would be difficult to recover from
later. However, this does mean that we get a “big bang” of changes at the
start of the cycle, and it takes a while for the archive to be usable again.
Furthermore, even once we’ve taken care of this, we have a long-established
rhythm where the first part of the cycle is mainly about feature development
and the second part of the cycle is mainly about stabilisation. As a
result, we’ve got used to the archive being fairly broken for the first few
months, and we even tell people that they shouldn’t expect things to work
reliably until somewhere approaching beta.</p>
<p>This makes some kind of sense from the inside. But how are you supposed to
do feature development that relies on other things in the development release?</p>
<p>In the first few years of Ubuntu, this question didn’t matter very much.
Nearly all the people doing serious feature development were themselves
serious Ubuntu developers; they were capable of fixing problems in the
development release as they went along, and while it got in their way a
little bit it wasn’t all that big a deal. Now, though, we have people
focusing on things like Unity development, and we shouldn’t assume that just
because somebody is (say) an OpenGL expert or a window management expert
they should be able to recover from arbitrary failures in development
release upgrades. One of the best things we could do to help the 12.04
desktop be more stable is to have the entire system be less unstable as we
go along, so that developers further up the stack don’t have to be
distracted by things wobbling underneath them. Plus, it’s just good
software engineering to keep the basics working as you go along: it should
always build, it should always install, it should always upgrade. Ubuntu is
too big to do something like having everyone stop any time the build breaks,
the way you might do in a smaller project, but we shouldn’t let things slide
for months either.</p>
<p>I’ve been talking to <a href="http://theravingrick.blogspot.com/">Rick Spencer</a> and
the other Ubuntu engineering leads at Canonical about this. Canonical has a
system of “rotations”, where you can go off to another team for a while if
you’re in need of a change or want to branch out a bit; so I proposed that
we allow our engineers to spend a month or two at a time on what I’m calling
the <strong>+1 Maintenance Team</strong>, whose job is simply to keep the development
release buildable, installable, and upgradeable at all times. Rick has been
very receptive to this, and we’re going to be running this as a trial
throughout the 12.04 cycle, with probably about three people at a time. As
well as being professional archive gardeners, these people will also work on
developing infrastructure to help us keep better track of what we need to
do. For instance, we could deploy better tools from Debian <span class="caps">QA</span> to help us
track uninstallable packages, or we could enhance
<a href="http://people.canonical.com/~ubuntu-archive/nbs.html">some</a> of our
<a href="http://conflictchecker.ubuntu.com/possible-conflicts/oneiric/main.txt">many</a>
<a href="http://people.canonical.com/~ubuntu-archive/component-mismatches.txt">existing</a>
<a href="http://people.canonical.com/~ubuntu-archive/testing/precise_probs.html">reports</a>
to have bug links and/or comment facilities, or we could spruce up the
<a href="http://reports.qa.ubuntu.com/reports/ogasawara/weatherreport.html">weather
report</a>;
there are lots of things we could do to make our own lives easier.</p>
<p>By 12.04, I would like, in no particular order:</p>
<ul>
<li>Precise to have been more or less continuously usable from Alpha 1 onward
for people with reasonable general technical ability</li>
<li>Canonical engineering teams outside Ubuntu (<span class="caps">DX</span>, Ubuntu One, Launchpad,
etc.) to be comfortable with running the development release on at least
one system from Alpha 2 onward</li>
<li>Installability problems in daily image builds to be dealt with within one
working day, or preferably before they even make it to daily builds</li>
<li>The archive to be close to consistent as we start milestone preparation,
rather than the release team having to scramble to make it so</li>
<li>A very significant reduction in our long-term backlog of
automatically-detected problems</li>
</ul>
<p>Of course, this overlaps to a certain extent with the kinds of things that
the <span class="caps">MOTU</span> team have been doing for years, not to mention with what all
developers should be doing to keep their own houses in reasonable order, and
I’d like us to work together on this; we’re trying to provide some extra
hands here to make Ubuntu better for everyone, not take over! I would love
this to be an opportunity to re-energise <span class="caps">MOTU</span> and bring some new people on board.</p>
<p>I’ve registered a couple of blueprints
(<a href="https://blueprints.launchpad.net/ubuntu/+spec/other-p-plusonemaint-priorities">priorities</a>,
<a href="https://blueprints.launchpad.net/ubuntu/+spec/other-p-plusonemaint-infrastructure">infrastructure</a>)
for discussion at <span class="caps">UDS</span>. These are deliberately open-ended skeleton sessions,
and I’ll try to make sure they’re scheduled fairly early in the week, so
that we have time for break-out sessions later on. If you’re interested,
please come along and give your feedback!</p>Top ideas on Ubuntu Brainstorm (August 2011)2011-10-06T16:58:51+01:002011-10-06T16:58:51+01:00Colin Watsontag:www.chiark.greenend.org.uk,2011-10-06:/~cjwatson/blog/brainstorm-review.html<p>The Ubuntu Technical Board conducts a regular review of the most popular <a href="http://brainstorm.ubuntu.com/">Ubuntu Brainstorm</a> ideas (previous reviews conducted by <a href="http://mdzlog.alcor.net/2010/12/10/ubuntu-brainstorm-top-10-for-december-2010/">Matt Zimmerman</a> and <a href="http://www.piware.de/2011/04/top-ideas-on-ubuntu-brainstorm-march-2011/">Martin Pitt</a>). This time it was my turn. Apologies for the late arrival of this review.</p>
<h2>Contact lens in the Unity Dash (<a href="http://brainstorm.ubuntu.com/idea/27584/">#27584</a>)</h2>
<p>Unity supports <a href="https://wiki.ubuntu.com/Unity/Lenses">Lenses</a>, which provide …</p><p>The Ubuntu Technical Board conducts a regular review of the most popular <a href="http://brainstorm.ubuntu.com/">Ubuntu Brainstorm</a> ideas (previous reviews conducted by <a href="http://mdzlog.alcor.net/2010/12/10/ubuntu-brainstorm-top-10-for-december-2010/">Matt Zimmerman</a> and <a href="http://www.piware.de/2011/04/top-ideas-on-ubuntu-brainstorm-march-2011/">Martin Pitt</a>). This time it was my turn. Apologies for the late arrival of this review.</p>
<h2>Contact lens in the Unity Dash (<a href="http://brainstorm.ubuntu.com/idea/27584/">#27584</a>)</h2>
<p>Unity supports <a href="https://wiki.ubuntu.com/Unity/Lenses">Lenses</a>, which provide a consistent way for users to quickly search for information via the Dash. Current lenses include Applications, Files, and Music, but a number of people have asked for contacts to be accessible using the same interface.</p>
<p>While Canonical’s <span class="caps">DX</span> team isn’t currently working on this for Ubuntu 11.10 or 12.04, we’d love somebody who’s interested in this to get involved. Allison Randal <a href="http://allisonrandal.com/2011/09/27/contacts-lens/">explains how to get started</a>, including some skeleton example code and several useful links.</p>
<h2>Displaying Ubuntu version information (<a href="http://brainstorm.ubuntu.com/idea/27460/">#27460</a>)</h2>
<p>Several people have asked for it to be more obvious what Ubuntu version they’re running, as well as other general information about their system.</p>
<p>John Lea, user experience architect on the Unity team, responds that in Ubuntu 11.10 the new LightDM greeter shows the Ubuntu version number, making that basic information very easily visible. For more detail, System Settings -> System Info provides a simple summary.</p>
<h2>Volume adjustments for headphone use (<a href="http://brainstorm.ubuntu.com/idea/27275/">#27275</a>)</h2>
<p>People often find that they need to adjust their sound volume when plugging in or removing headphones. It seems as though the computer ought to be able to remember this kind of thing and do it automatically; after all, a major goal of Ubuntu is to make the desktop Just Work.</p>
<p>David Henningson, a member of Canonical’s <span class="caps">OEM</span> Services group and an Ubuntu audio developer, <a href="http://voices.canonical.com/david.henningsson/2011/09/29/independent-volume-for-headphones-and-speakers/">responds</a> on his blog with a summary of how PulseAudio jack detection has improved matters in Ubuntu 11.10, and what’s left to do:</p>
<blockquote>
<p>The good news: in the upcoming Ubuntu Oneiric (11.10), this is actually
working. The bad news: it isn’t working for everyone.</p>
</blockquote>
<h2>Making it easier to find software to handle a file (<a href="http://brainstorm.ubuntu.com/idea/28148/">#28148</a>)</h2>
<p>Ubuntu is not always as helpful as it could be when you don’t have the right software installed to handle a particular file.</p>
<p>Michael Vogt, one of the developers of the Ubuntu Software Center, responded to this. It seems that most of the pieces to make this work nicely are in place, but there are a few more bits of glue required:</p>
<blockquote>
<p>Thanks a lot for this suggestion. I like the idea and it’s something that
software-center itself supports now. In the coming version 5.0 we will
offer to “sort by top-rated” (based on the ratings&reviews data). It’s
also possible to search for an application based on its mime data. To
search for a mime-type, you can enter “mime:text/html” or “mime:audio/ogg”
into the search field. What is needed however is better integration into
the file manager nautilus. I will make sure this gets attention at the
next developer meeting and filed
<a href="https://launchpad.net/bugs/860536">bug #860536</a> about it.</p>
<p>In nautilus, there is now a button called “Find applications online”
available as an option when opening an unknown file or when the user
selects “open with…other application” in the context menu. But that
will not use the data from software-center.</p>
</blockquote>
<h2>Show pop-up alert on low battery (<a href="http://brainstorm.ubuntu.com/idea/28037/">#28037</a>)</h2>
<p>Some users have reported on Brainstorm that they are not alerted frequently enough when their laptop’s battery is low, as they clearly ought to be.</p>
<p>This is an odd one, because there are already several power alert levels and this has been working well for us for some time. Nevertheless, enough people have voted for this idea that there must be something behind it, perhaps a bug that only affects certain systems. Martin Pitt, technical lead of the Ubuntu desktop team, has <a href="http://brainstorm.ubuntu.com/idea/28037/">responded</a> directly to the Brainstorm idea with a description of the current system and how to file a bug when it does not work as intended.</p>man-db 2.6.02011-04-09T20:45:17+01:002011-04-09T20:45:17+01:00Colin Watsontag:www.chiark.greenend.org.uk,2011-04-09:/~cjwatson/blog/man-db-2.6.0.html<p>I’ve released man-db 2.6.0
(<a href="http://lists.nongnu.org/archive/html/man-db-announce/2011-04/msg00000.html">announcement</a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS?id=2.6.0"><span class="caps">NEWS</span></a>,
<a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/ChangeLog?id=2.6.0">ChangeLog</a>),
and uploaded it to Debian unstable. Ubuntu is rapidly approaching beta
freeze so I’m not going to try to cram this into 11.04; it’ll be in 11.10.</p>Wubi bug 6936712011-03-14T12:56:57+00:002011-03-15T10:12:09+00:00Colin Watsontag:www.chiark.greenend.org.uk,2011-03-14:/~cjwatson/blog/wubi-bug-693671.html<p>I spent most of last week working on <a href="https://bugs.launchpad.net/bugs/693671">Ubuntu bug
693671</a> (“wubi install will not boot
- phase 2 stops with: Try (hd0,0): <span class="caps">NTFS5</span>”), which was quite a challenge to
debug since it involved digging into parts of the Wubi boot process I’d
never really touched before. Since I …</p><p>I spent most of last week working on <a href="https://bugs.launchpad.net/bugs/693671">Ubuntu bug
693671</a> (“wubi install will not boot
- phase 2 stops with: Try (hd0,0): <span class="caps">NTFS5</span>”), which was quite a challenge to
debug since it involved digging into parts of the Wubi boot process I’d
never really touched before. Since I don’t think much of this is very
well-documented, I’d like to spend a bit of time explaining what was
involved, in the hope that it will help other developers in the future.</p>
<p><a href="http://en.wikipedia.org/wiki/Wubi_%28Ubuntu_installer%29">Wubi</a> is a system
for installing Ubuntu into a file in a Windows filesystem, so that it
doesn’t require separate partitions and can be uninstalled like any other
Windows application. The purpose of this is to make it easy for Windows
users to try out Ubuntu without the need to worry about repartitioning,
before they commit to a full installation. Wubi started out as an external
project, and initially patched the installer on the fly to do all the rather
unconventional things it needed to do; we integrated it into Ubuntu 8.04
<span class="caps">LTS</span>, which involved turning these patches into proper installer facilities
that could be accessed using preseeding, so that Wubi only needs to handle
the Windows user interface and other Windows-specific tasks.</p>
<p>Anyone familiar with a <span class="caps">GNU</span>/Linux system’s boot process will immediately see
that this isn’t as simple as it sounds. Of course,
<a href="http://www.tuxera.com/community/ntfs-3g-download/">ntfs-3g</a> is a pretty
solid piece of software so we can handle the Windows filesystem without too
much trouble, and loopback mounts are well-understood so we can just have
the initramfs loop-mount the root filesystem. Where are you going to get
the kernel and initramfs from, though? Well, we used to copy them out to
the <span class="caps">NTFS</span> filesystem so that <span class="caps">GRUB</span> could read them, but this was overly
complicated and error-prone. When we switched to <span class="caps">GRUB</span> 2, we could instead
use its built-in loopback facilities, and we were able to simplify this. So
all was more or less well, except for the elephant in the room. How are you
going to load <span class="caps">GRUB</span>?</p>
<p>In a Wubi installation, <span class="caps">NTLDR</span> (or <span class="caps">BOOTMGR</span> in Windows Vista and newer) still
owns the boot process. Ubuntu is added as a boot menu option using BCDEdit.
You might then think that you can just have the Windows boot loader
chain-load <span class="caps">GRUB</span>. Unfortunately, <span class="caps">NTLDR</span> only loads 16 sectors - 8192 bytes -
from disk. <span class="caps">GRUB</span> won’t fit in that: the smallest core.img you can generate
at the moment is over 18 kilobytes. Thus, you need something that is small
enough to be loaded by <span class="caps">NTLDR</span>, but that is intelligent enough to understand
<span class="caps">NTFS</span> to the point where it can find a particular file in the root directory
of a filesystem, load boot loader code from it, and jump to that. The
answer for this was <a href="http://gna.org/projects/grub4dos/"><span class="caps">GRUB4DOS</span></a>. Most of
<span class="caps">GRUB4DOS</span> is based on <span class="caps">GRUB</span> Legacy, which is not of much interest to us any
more, but it includes an assembly-language program called <span class="caps">GRLDR</span> that
supports doing this very thing for <span class="caps">FAT</span>, <span class="caps">NTFS</span>, and ext2. In Wubi, we build
<span class="caps">GRLDR</span> as <code>wubildr.mbr</code>, and build a specially-configured <span class="caps">GRUB</span> core image as
<code>wubildr</code>.</p>
<p>Now, the messages shown in the bug report suggested a failure either within
<span class="caps">GRLDR</span> or very early in <span class="caps">GRUB</span>. The first thing I did was to remember that
<span class="caps">GRLDR</span> has been integrated into the grub-extras <code>ntldr-img</code> module suitable
for use with <span class="caps">GRUB</span> 2, so I tried building <code>wubildr.mbr</code> from that; no change,
but this gave me a modern baseline to work on. <span class="caps">OK</span>; now to try <span class="caps">QEMU</span> (you can
use tricks like <code>qemu -hda /dev/sda</code> if you’re very careful not to do
anything that might involve writing to the host filesystem from within the
guest, such as recursively booting your host <span class="caps">OS</span> … [<strong>update:</strong> Tollef Fog
Heen and Zygmunt Krynicki both point out that you can use the <code>-snapshot</code>
option to make this safer]). No go; it hung somewhere in the middle of
<span class="caps">NTLDR</span>. Still, I could at least insert debug statements, copy the built
<code>wubildr.mbr</code> over to my test machine, and reboot for each test, although it
would be slow and tedious. Couldn’t I?</p>
<p>Well, yes, I mostly could, but that 8192-byte limit came back to bite me,
along with an internal 2048-byte limit that <span class="caps">GRLDR</span> allocates for its <span class="caps">NTFS</span>
bootstrap code. There were only a few spare bytes. Something like this
would more or less fit, to print a single mark character at various points
so that I could see how far it was getting:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nf">pushal</span>
<span class="w"> </span><span class="nf">xorw</span><span class="w"> </span><span class="nv">%bx</span><span class="p">,</span><span class="w"> </span><span class="nv">%bx</span><span class="w"> </span><span class="cm">/* video page 0 */</span>
<span class="w"> </span><span class="nf">movw</span><span class="w"> </span><span class="no">$0x0e4d</span><span class="p">,</span><span class="w"> </span><span class="nv">%ax</span><span class="w"> </span><span class="cm">/* print 'M' */</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="no">$0x10</span>
<span class="w"> </span><span class="nf">popal</span>
</code></pre></div>
<p>In a few places, if I removed some code I didn’t need on my test machine
(say, <span class="caps">CHS</span> compatibility), I could even fit in cheap and nasty code to print
a single register in hex (as long as you didn’t mind ‘A’ to ‘F’ actually
being ‘:’ to ‘?’ in <span class="caps">ASCII</span>; and note that this is real-mode code, so the loop
counter is <code>%cx</code> not <code>%ecx</code>):</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="cm">/* print %edx in dumbed-down hex */</span>
<span class="w"> </span><span class="nf">pushal</span>
<span class="w"> </span><span class="nf">xorw</span><span class="w"> </span><span class="nv">%bx</span><span class="p">,</span><span class="w"> </span><span class="nv">%bx</span>
<span class="w"> </span><span class="nf">movb</span><span class="w"> </span><span class="no">$0xe</span><span class="p">,</span><span class="w"> </span><span class="nv">%ah</span>
<span class="w"> </span><span class="nf">movw</span><span class="w"> </span><span class="no">$8</span><span class="p">,</span><span class="w"> </span><span class="nv">%cx</span>
<span class="err">1:</span>
<span class="w"> </span><span class="nf">roll</span><span class="w"> </span><span class="no">$4</span><span class="p">,</span><span class="w"> </span><span class="nv">%edx</span>
<span class="w"> </span><span class="nf">movb</span><span class="w"> </span><span class="nv">%dl</span><span class="p">,</span><span class="w"> </span><span class="nv">%al</span>
<span class="w"> </span><span class="nf">andb</span><span class="w"> </span><span class="no">$0xf</span><span class="p">,</span><span class="w"> </span><span class="nv">%al</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="no">$0x10</span>
<span class="w"> </span><span class="nf">loop</span><span class="w"> </span><span class="mi">1</span><span class="no">b</span>
<span class="w"> </span><span class="nf">popal</span>
</code></pre></div>
<p>After a considerable amount of work tracking down problems by bisection like
this, I also observed that <span class="caps">GRLDR</span>’s <span class="caps">NTFS</span> code bears quite a bit of
resemblance in its logical flow to <span class="caps">GRUB</span> 2’s <span class="caps">NTFS</span> module, and indeed the same
person wrote much of both. Since I knew that the latter worked, I could use
it to relieve my brain of trying to understand assembly code logic directly,
and could compare the two to look for discrepancies. I did find a few of
these, and corrected a simple one. Testing at this point suggested that the
boot process was getting as far as <span class="caps">GRUB</span> but still wasn’t printing anything.
I removed some Ubuntu patches which quieten down <span class="caps">GRUB</span>’s startup: still
nothing - so I switched my attentions to
<a href="http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/kern/i386/pc/startup.S;hb=HEAD">grub-core/kern/i386/pc/startup.S</a>,
which contains the first code executed from <span class="caps">GRUB</span>’s core image. Code before
the first call to <code>real_to_prot</code> (which switches the processor into
protected mode) succeeded, while code after that point failed. Even more
mysteriously, code added to <code>real_to_prot</code> <em>before</em> the actual switch to
protected mode failed too. Now I was clearly getting somewhere interesting,
but what was going on? What I really wanted was to be able to single-step,
or at least see what was at the memory location it was supposed to be
jumping to.</p>
<p>Around this point I was venting on <span class="caps">IRC</span>, and somebody asked if it was
reproducible in <span class="caps">QEMU</span>. Although I’d tried that already, I went back and
tried again. Ubuntu’s <code>qemu</code> is actually built from qemu-kvm, and if I used
<code>qemu -no-kvm</code> then it worked much better. Excellent! Now I could use <span class="caps">GDB</span>:</p>
<div class="highlight"><pre><span></span><code>(gdb) target remote | qemu -gdb stdio -no-kvm -hda /dev/sda
</code></pre></div>
<p>This let me run until the point when <span class="caps">NTLDR</span> was about to hand over control,
then interrupt and set a breakpoint at <code>0x8200</code> (the entry point of
<code>startup.S</code>). This revealed that the address that should have been
<code>real_to_prot</code> was in fact garbage. I set a breakpoint at <code>0x7c00</code> (<span class="caps">GRLDR</span>’s
entry point) and stepped all the way through to ensure it was doing the
right thing. In the process it was helpful to know that <a href="http://sourceware.org/ml/gdb/2009-01/msg00008.html"><span class="caps">GDB</span> and <span class="caps">QEMU</span> don’t
handle real mode very well between
them</a>. Useful tricks
here were:</p>
<ul>
<li>Use <code>set architecture i8086</code> before disassembling real-mode code (and
<code>set architecture i386</code> to switch back).</li>
<li><span class="caps">GDB</span> prints addresses relative to the current segment base, but if you
want to enter an address then you need to calculate a linear address
yourself. For example, breakpoints must be set at <code>(CS << 4) + IP</code>,
rather than just at <code>IP</code>.</li>
</ul>
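<p>Purely as a worked example of that last rule (ordinary arithmetic, nothing
GDB-specific):</p>
<div class="highlight"><pre><span></span><code># Real-mode CS:IP to linear address, for setting GDB breakpoints by hand.
def linear(cs, ip):
    return (cs << 4) + ip

print(hex(linear(0x0000, 0x7C00)))   # 0x7c00: GRLDR's entry point
print(hex(linear(0x0000, 0x8200)))   # 0x8200: the start of GRUB's startup.S
print(hex(linear(0x07C0, 0x0000)))   # 0x7c00 again: same byte, different CS:IP
</code></pre></div>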
<p>Single-stepping showed that <span class="caps">GRLDR</span> was loading the entirety of <code>wubildr</code>
correctly and jumping to it. The first instruction it jumped to wasn’t in
<code>startup.S</code>, though, and then I remembered that we prefix the core image
with
<a href="http://git.savannah.gnu.org/gitweb/?p=grub.git;a=blob;f=grub-core/boot/i386/pc/lnxboot.S;hb=edde54e656a3219a6ad5e7118e0212d50af01697">grub-core/boot/i386/pc/lnxboot.S</a>.
Stepping through this required a clear head since it copies itself around
and changes segment registers a few times. The interesting part was at
<code>real_code_2</code>, where it copies a sector of the kernel to the target load
address, and then checks a known offset to find out whether the “kernel” is
in fact <span class="caps">GRUB</span> rather than a Linux kernel. I checked that offset by hand, and
there was the smoking gun. <span class="caps">GRUB</span> recently acquired Reed-Solomon error
correction on its core image, to allow it to recover from other software
writing over sectors in the boot track. This moved the magic number
<code>lnxboot.S</code> was checking somewhat further into the core image, after the
first sector. <code>lnxboot.S</code> couldn’t find it because it hadn’t copied it yet!
A bit of
<a href="http://git.savannah.gnu.org/gitweb/?p=grub.git;a=commitdiff;h=9b43bf396a61b60a0ee4b8a1591634b1120b8906">adjustment</a>
and all was well again.</p>
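<p>The shape of that bug is simple enough to sketch. The following is not
<code>lnxboot.S</code> (it’s Python rather than assembly, and the marker value is
invented), but it shows why a magic number that has drifted past the first
sector is invisible at the point where the check runs:</p>
<div class="highlight"><pre><span></span><code># Simplified model of the failure: the check runs when only the first
# sector has been copied, so a marker beyond byte 511 cannot be seen yet.
SECTOR = 512
MARKER = b"GRUB"   # stand-in; the real check looks at a numeric field

def marker_visible_after_one_sector(image, offset):
    copied = image[:SECTOR]                  # only one sector copied so far
    return copied[offset:offset + len(MARKER)] == MARKER

core = bytearray(4096)
core[40:44] = MARKER
print(marker_visible_after_one_sector(bytes(core), 40))    # True: old layout
core = bytearray(4096)
core[600:604] = MARKER   # error-correction data has pushed the marker along
print(marker_visible_after_one_sector(bytes(core), 600))   # False: new layout
</code></pre></div>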
<p>The lesson for me from all of this has been to try hard to get an
interactive debugger working. Really hard. It’s worth quite a bit of
up-front effort if it saves you from killing neurons stepping through pages
of code by hand. I think the real-mode debugging tricks I picked up should
be useful for working on <span class="caps">GRUB</span> in the future.</p>libpipeline 1.1.0 released2010-12-11T15:47:39+00:002010-12-11T15:50:23+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-12-11:/~cjwatson/blog/libpipeline-1.1.0-released.html<p>I’ve released <a href="http://libpipeline.nongnu.org/">libpipeline 1.1.0</a>, and
uploaded it to Debian unstable. The changes are mostly just to add a few
occasionally useful interfaces:</p>
<ul>
<li>Add <code>pipecmd_exec</code> to execute a single command, replacing the current
process; this is analogous to <code>execvp</code>.</li>
<li>Add <code>pipecmd_clearenv</code> to clear a command’s environment; this …</li></ul><p>I’ve released <a href="http://libpipeline.nongnu.org/">libpipeline 1.1.0</a>, and
uploaded it to Debian unstable. The changes are mostly just to add a few
occasionally useful interfaces:</p>
<ul>
<li>Add <code>pipecmd_exec</code> to execute a single command, replacing the current
process; this is analogous to <code>execvp</code>.</li>
<li>Add <code>pipecmd_clearenv</code> to clear a command’s environment; this is
analogous to <code>clearenv</code>.</li>
<li>Add <code>pipecmd_get_nargs</code> to get the number of arguments to a command.</li>
</ul>
<p>The shared library actually ends up being a few kilobytes smaller on Debian
than 1.0.0, probably because I tweaked the set of Gnulib modules I’m using.</p>NTP synchronisation problems2010-12-06T12:58:29+00:002010-12-06T13:05:02+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-12-06:/~cjwatson/blog/ntp-synchronisation-problems.html<p>The Ubuntu Technical Board is currently conducting a review of the top ten
Brainstorm issues users have raised about Ubuntu, and Matt asked me to
investigate and respond to <a href="http://brainstorm.ubuntu.com/idea/25301/">Idea #25301: Keeping the time accurate over the
Internet by default</a>.</p>
<p>My first reaction was “hey, that’s odd - I thought …</p><p>The Ubuntu Technical Board is currently conducting a review of the top ten
Brainstorm issues users have raised about Ubuntu, and Matt asked me to
investigate and respond to <a href="http://brainstorm.ubuntu.com/idea/25301/">Idea #25301: Keeping the time accurate over the
Internet by default</a>.</p>
<p>My first reaction was “hey, that’s odd - I thought we already did that?”.
We install the <code>ntpdate</code> package by default (it’s <a href="http://www.eecis.udel.edu/~mills/ntp/html/ntpdate.html">deprecated
upstream</a> in favour
of other tools, but that shouldn’t be important here). <code>ntpdate</code> is run
from <code>/etc/network/if-up.d/ntpdate</code>, in other words every time you connect
to a network, which should be acceptably frequent for most people, so it
really ought to Just Work by default. But this is one of the top ten
problems where users have gone to the trouble of proposing solutions on
Brainstorm, so it couldn’t be that simple. What was going on?</p>
<p>I brought up a clean virtual machine with a development version of Natty
(the current Ubuntu development version, which will eventually become
11.04), and had a look in its logs: it was indeed synchronising its time
from <code>ntp.ubuntu.com</code>, and I didn’t think anything in that area had changed
recently. On the other hand, I had occasionally noticed that my own laptop
wasn’t always synchronising its time quite right, but I’d put it down to
local weirdness as my network isn’t always very stable. Maybe this wasn’t
so local after all?</p>
<p>So, I started tracing through the scripts to figure out what was going on.
It turned out that I had an empty <code>/etc/ntp.conf</code> file on my laptop. The
<code>/usr/sbin/ntpdate-debian</code> script assumed that that meant I had a full <span class="caps">NTP</span>
server installed (I don’t), and fetched the list of servers from it; since
the file was empty, it ended up synchronising time from no servers, that is,
not synchronising at all. I removed the file and all was well.</p>
<p>That left the question of where that file came from. It didn’t seem to be
owned by any package; I was pretty sure I hadn’t created it by hand either.
I had a look through some bug reports, and soon found <a href="https://bugs.launchpad.net/bugs/83604">ntpdate
1:4.2.2.p4+dfsg-1ubuntu2 has a flawed configuration
file</a>. It turns out that
<code>time-admin</code> (System -> Administration -> Time and Date) creates an empty
<code>/etc/ntp.conf</code> file if you press the reload button (tooltip: “Synchronise
now”), as part of an attempt to update <span class="caps">NTP</span> configuration. Aha!</p>
<p>Once I knew where the problems were, it was easy to fix them. I’ve uploaded
the following changes, which will be in the 11.04 release:</p>
<ul>
<li>Disregard empty <code>ntp.conf</code> files in <code>ntpdate-debian</code>.</li>
<li>Remove an empty <code>/etc/ntp.conf</code> file on fresh installation of the <code>ntp</code>
package, so that it doesn’t interfere with creating the normal
configuration file.</li>
<li>Don’t create the <span class="caps">NTP</span> configuration file in the <code>time-admin</code> backend if it
doesn’t exist already.</li>
</ul>
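<p>The first of those changes amounts to little more than a guard on the file’s size. A minimal sketch of the idea (not the exact code; the real <code>ntpdate-debian</code> script has rather more to deal with):</p>
<pre><code># Only take the server list from /etc/ntp.conf if the file is actually non-empty.
if [ -s /etc/ntp.conf ]; then
        NTPSERVERS=$(awk '$1 == "server" { print $2 }' /etc/ntp.conf)
else
        . /etc/default/ntpdate      # fall back to the packaged default (ntp.ubuntu.com)
fi
</code></pre>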
<p>I’ve also sent these changes to
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=606107">Debian</a> and
<a href="https://bugzilla.gnome.org/show_bug.cgi?id=449267"><span class="caps">GNOME</span></a> as appropriate.</p>
<p>There are still a few problems. The “Synchronise now” button doesn’t work
quite right in general
(<a href="https://bugs.launchpad.net/bugs/90524">bug #90524</a>), and if your network
doesn’t allow time synchronisation from <code>ntp.ubuntu.com</code> then you’ll have to
change the value of <code>NTPSERVERS</code> in <code>/etc/default/ntpdate</code>. Furthermore,
the <code>time-admin</code> interface is confusing and makes it seem as though the
default is not to synchronise the time automatically; this interface is
being <a href="https://wiki.ubuntu.com/TimeAndDate">redesigned</a> at the moment, which
should be a good opportunity to make it less confusing, and I will contact
the designers to mention this problem. On the whole, though, I think that
many fewer people should have this kind of problem in Ubuntu 11.04.</p>
<p>It’s always possible that I missed some other problem that breaks automatic
time synchronisation for people. Please do file a bug report if it still
doesn’t work for you in 11.04, or contact me directly (cjwatson at ubuntu.com).</p>man-db on Fedora2010-12-02T14:06:58+00:002010-12-02T14:09:22+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-12-02:/~cjwatson/blog/man-db-on-fedora.html<p>I just found out by chance that <a href="http://fedoraproject.org/">Fedora</a> 14
switched from their old man package to <a href="http://man-db.nongnu.org/">man-db</a>.
This is great news: it should now be the beginning of the end of the
divergence of man implementations that happened way back in the mid-1990s,
when two different people took John W …</p><p>I just found out by chance that <a href="http://fedoraproject.org/">Fedora</a> 14
switched from their old man package to <a href="http://man-db.nongnu.org/">man-db</a>.
This is great news: it should now be the beginning of the end of the
divergence of man implementations that happened way back in the mid-1990s,
when two different people took John W. Eaton’s man package and developed it
in different directions without being aware of each other’s existence. For
a while it looked as though man-db was stuck on just the Debian family and
openSUSE, but a number of distributions have switched over in the last few
years. As of now, the only remaining major distribution not using man-db is
Gentoo, and they have a <a href="http://bugs.gentoo.org/show_bug.cgi?id=284822">bug for
switching</a> which I think
should be unblocked fairly soon.</p>
<p>In some ways man-db’s package name didn’t help it; people thought that the
main difference was that man-db had a database backend stuck around apropos.
These days, the database is one of the least important parts of man-db as
far as I’m concerned. Other ways in which it’s very significantly superior
to anything man could do without years of equivalent effort include correct
encoding support, robust child process handling, and use of more modern
development facilities (dear catgets: you belong to a previous millennium,
so please go away). I’m glad that Fedora has recognised this.</p>libpipeline 1.0.0 released2010-10-29T21:23:26+01:002010-10-29T21:23:26+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-10-29:/~cjwatson/blog/libpipeline-released.html<p>In my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/pipeline-library.html">previous post</a>, I described the
pipeline library from man-db and asked whether people were interested in a
standalone release of it. Several people expressed interest, and so I’ve
now released <a href="http://libpipeline.nongnu.org/">libpipeline</a> version 1.0.0.
It’s in the Debian <span class="caps">NEW</span> queue, and <a href="https://launchpad.net/~cjwatson/+archive/ppa">my
<span class="caps">PPA</span></a> contains packages …</p><p>In my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/pipeline-library.html">previous post</a>, I described the
pipeline library from man-db and asked whether people were interested in a
standalone release of it. Several people expressed interest, and so I’ve
now released <a href="http://libpipeline.nongnu.org/">libpipeline</a> version 1.0.0.
It’s in the Debian <span class="caps">NEW</span> queue, and <a href="https://launchpad.net/~cjwatson/+archive/ppa">my
<span class="caps">PPA</span></a> contains packages of it
for Ubuntu lucid and maverick.</p>
<p>I gave a lightning talk on this at <span class="caps">UDS</span> in Orlando, and my
<a href="http://libpipeline.nongnu.org/libpipeline-lightning-talk.odp">slides</a> are
available. I hope there’ll be a video at some point which I can link to.</p>
<p>Thanks to Scott James Remnant for code review (some time back), Ian Jackson
for an extensive design review, and Kees Cook and Matthias Klose for helpful conversations.</p>Pipeline library2010-10-03T22:59:11+01:002010-10-03T22:59:11+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-10-03:/~cjwatson/blog/pipeline-library.html<p>When I took over <a href="http://man-db.nongnu.org/">man-db</a> in 2001, one of the
major problems that became evident after maintaining it for a while was the
way it handled subprocesses. The nature of man and friends means that it
spends a lot of time calling sequences of programs such as <code>zsoelim <
input-file | tbl …</code></p><p>When I took over <a href="http://man-db.nongnu.org/">man-db</a> in 2001, one of the
major problems that became evident after maintaining it for a while was the
way it handled subprocesses. The nature of man and friends means that it
spends a lot of time calling sequences of programs such as <code>zsoelim <
input-file | tbl | nroff -mandoc -Tutf8</code>. Back then, it was using C library
facilities such as <code>system</code> and <code>popen</code> for all this, and I had to deal with
several bugs where those functions were being called with untrusted input as
arguments without properly escaping metacharacters. Of course it was
possible to chase around every such call inserting appropriate escaping
functions, but this was always bound to be error-prone and one of the tasks
that rapidly became important to me was arranging to start subprocesses in a
way that was fundamentally immune to this kind of bug.</p>
<p>In higher-level languages, there are usually standard constructs which are
safer than just passing a command line to the shell. For example, in Perl
you can use <code>system($command, $arg1, $arg2, ...)</code> to invoke a program with
arguments without the interference of the shell, and <code>perlipc(1)</code> describes
various facilities for connecting them together. In Python, the
<a href="http://docs.python.org/library/subprocess.html">subprocess</a> module allows
you to create pipelines easily and safely (as long as you remember the
<a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/python-sigpipe.html"><span class="caps">SIGPIPE</span> gotcha</a>). C has the <code>fork</code> and
<code>execve</code> primitives, but assembling these to construct full-blown pipelines
correctly is difficult and error-prone, so many programmers don’t bother and
use the simple but unsafe library facilities instead.</p>
<p>I wrote a couple of thousand lines of library code in man-db to address this
problem, loosely and now quite distantly based on code in
<a href="http://www.gnu.org/software/groff/">groff</a>. In the following examples,
function names starting with <code>command_</code>, <code>pipeline_</code>, or <code>decompress_</code> are
real functions in the library, while any other function names are pseudocode.</p>
<p>Constructing the simplified example pipeline from my first paragraph using
this library looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">pipeline</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">;</span>
<span class="kt">int</span><span class="w"> </span><span class="n">status</span><span class="p">;</span>
<span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pipeline_new</span><span class="w"> </span><span class="p">();</span>
<span class="n">p</span><span class="o">-></span><span class="n">want_infile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"input-file"</span><span class="p">;</span>
<span class="n">pipeline_command_args</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="s">"zsoelim"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">pipeline_command_args</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="s">"tbl"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">pipeline_command_args</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="s">"nroff"</span><span class="p">,</span><span class="w"> </span><span class="s">"-mandoc"</span><span class="p">,</span><span class="w"> </span><span class="s">"-Tutf8"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">pipeline_start</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pipeline_wait</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="n">pipeline_free</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
</code></pre></div>
<p>You might want to construct a command more dynamically:</p>
<div class="highlight"><pre><span></span><code><span class="n">command</span><span class="w"> </span><span class="o">*</span><span class="n">manconv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">command_new_args</span><span class="w"> </span><span class="p">(</span><span class="s">"manconv"</span><span class="p">,</span><span class="w"> </span><span class="s">"-f"</span><span class="p">,</span><span class="w"> </span><span class="n">from_code</span><span class="p">,</span>
<span class="w"> </span><span class="s">"-t"</span><span class="p">,</span><span class="w"> </span><span class="s">"UTF-8"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">quiet</span><span class="p">)</span>
<span class="w"> </span><span class="n">command_arg</span><span class="w"> </span><span class="p">(</span><span class="n">manconv</span><span class="p">,</span><span class="w"> </span><span class="s">"-q"</span><span class="p">);</span>
<span class="n">pipeline_command</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="n">manconv</span><span class="p">);</span>
</code></pre></div>
<p>Perhaps you want an environment variable set only while running a certain command:</p>
<div class="highlight"><pre><span></span><code><span class="n">command</span><span class="w"> </span><span class="o">*</span><span class="n">less</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">command_new</span><span class="w"> </span><span class="p">(</span><span class="s">"less"</span><span class="p">);</span>
<span class="n">command_setenv</span><span class="w"> </span><span class="p">(</span><span class="n">less</span><span class="p">,</span><span class="w"> </span><span class="s">"LESSCHARSET"</span><span class="p">,</span><span class="w"> </span><span class="n">lesscharset</span><span class="p">);</span>
</code></pre></div>
<p>You might find yourself needing to pass the output of one pipeline to
several other pipelines, in a “tee” arrangement:</p>
<div class="highlight"><pre><span></span><code><span class="n">pipeline</span><span class="w"> </span><span class="o">*</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">sink1</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">sink2</span><span class="p">;</span>
<span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_source</span><span class="w"> </span><span class="p">();</span>
<span class="n">sink1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_sink1</span><span class="w"> </span><span class="p">();</span>
<span class="n">sink2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_sink2</span><span class="w"> </span><span class="p">();</span>
<span class="n">pipeline_connect</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">sink1</span><span class="p">,</span><span class="w"> </span><span class="n">sink2</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="cm">/* Pump data among these pipelines until there's nothing left. */</span>
<span class="n">pipeline_pump</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">sink1</span><span class="p">,</span><span class="w"> </span><span class="n">sink2</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">pipeline_free</span><span class="w"> </span><span class="p">(</span><span class="n">sink2</span><span class="p">);</span>
<span class="n">pipeline_free</span><span class="w"> </span><span class="p">(</span><span class="n">sink1</span><span class="p">);</span>
<span class="n">pipeline_free</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">);</span>
</code></pre></div>
<p>Maybe one of your commands is actually an in-process function, rather than
an external program:</p>
<div class="highlight"><pre><span></span><code><span class="n">command</span><span class="w"> </span><span class="o">*</span><span class="n">inproc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">command_new_function</span><span class="w"> </span><span class="p">(</span><span class="s">"in-process"</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">func</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">pipeline_command</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="n">inproc</span><span class="p">);</span>
</code></pre></div>
<p>Sometimes your program needs to consume the output of a pipeline, rather
than sending it all to some other subprocess:</p>
<div class="highlight"><pre><span></span><code><span class="n">pipeline</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_pipeline</span><span class="w"> </span><span class="p">();</span>
<span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">line</span><span class="p">;</span>
<span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pipeline_peekline</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">strstr</span><span class="w"> </span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="s">"coding: UTF-8"</span><span class="p">))</span>
<span class="w"> </span><span class="n">printf</span><span class="w"> </span><span class="p">(</span><span class="s">"Unicode text follows:</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pipeline_readline</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">))</span>
<span class="w"> </span><span class="n">printf</span><span class="w"> </span><span class="p">(</span><span class="s">" %s"</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="p">);</span>
<span class="n">pipeline_free</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
</code></pre></div>
<p>man-db deals with compressed files a lot, so I wrote an add-on library for
opening compressed files (which is somewhat man-db-specific, but the
implementation wasn’t difficult given the underlying library):</p>
<div class="highlight"><pre><span></span><code><span class="n">pipeline</span><span class="w"> </span><span class="o">*</span><span class="n">decomp_file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">decompress_open</span><span class="w"> </span><span class="p">(</span><span class="n">compressed_filename</span><span class="p">);</span>
<span class="n">pipeline</span><span class="w"> </span><span class="o">*</span><span class="n">decomp_stdin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">decompress_fdopen</span><span class="w"> </span><span class="p">(</span><span class="n">fileno</span><span class="w"> </span><span class="p">(</span><span class="n">stdin</span><span class="p">));</span>
</code></pre></div>
<p>This library has been in production in man-db for over five years now. The
very careful signal handling code has been reviewed independently and the
whole thing has been run through multiple static analysis tools, although I
would always welcome more review; in particular I have no idea what it would
take to make it safe for use in threaded programs since I generally avoid
threading wherever possible. There have been a handful of bugs, which I’ve
fixed promptly, and I’ve added various new features to support particular
requirements of man-db (though in as general a way as possible). Every so
often I see somebody asking about subprocess handling in C, and I wonder if
I should split this library out into a standalone package so that it can be
used elsewhere. Web searches for things like “pipeline library” and
“libpipeline” don’t reveal anything that’s a particularly close match for
what I have. The licensing would be GPLv2 or later; this isn’t likely to be
negotiable since some of the original code wasn’t mine and in any case I
don’t feel particularly bad about <a href="http://www.gnu.org/licenses/why-not-lgpl.html">giving an advantage to GPLed
programs</a>. For more details
on the interface, the <a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/lib/pipeline.h?id=017a4c1e639d20e85d92ed11786a728913104953">header
file</a>
is well-commented.</p>
<p>Is there enough interest in this to make the effort of producing a separate
library package worthwhile? As well as the general effort of creating a new
package, I’d need to do some work to disentangle it from a few bits and
pieces specific to man-db. If you maintain a specific package that could
use this and you’re interested, please contact me with details, mentioning
any extensions you think you’d need. I intentionally haven’t enabled
comments on my blog for various reasons, but you can e-mail me at cjwatson
at debian.org or man-db-devel at nongnu.org.</p>Windows applications making GRUB 2 unbootable2010-08-28T00:47:21+01:002010-08-28T00:47:21+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-08-28:/~cjwatson/blog/windows-applications-making-grub2-unbootable.html<p>If you find that running Windows makes a <span class="caps">GRUB</span> 2-based system unbootable
(<a href="http://bugs.debian.org/550702">Debian bug</a>, <a href="https://bugs.launchpad.net/bugs/441941">Ubuntu
bug</a>), then I’d like to hear from
you. This is a bug in which some proprietary Windows-based software
overwrites particular sectors in the gap between the master boot record and
the first partition, sometimes …</p><p>If you find that running Windows makes a <span class="caps">GRUB</span> 2-based system unbootable
(<a href="http://bugs.debian.org/550702">Debian bug</a>, <a href="https://bugs.launchpad.net/bugs/441941">Ubuntu
bug</a>), then I’d like to hear from
you. This is a bug in which some proprietary Windows-based software
overwrites particular sectors in the gap between the master boot record and
the first partition, sometimes called the “embedding area”. <span class="caps">GRUB</span> Legacy and
<span class="caps">GRUB</span> 2 both normally use this part of the disk to store one of their key
components: <span class="caps">GRUB</span> Legacy calls this component Stage 1.5, while <span class="caps">GRUB</span> 2 calls
it the core image
(<a href="http://www.gnu.org/software/grub/manual/grub.html#Images">comparison</a>).
However, Stage 1.5 is less useful than the core image (for example, the
latter provides a rescue shell which can be used to recover from some
problems), and is therefore rather smaller: somewhere around <span class="caps">10KB</span> vs. <span class="caps">24KB</span>
for the common case of ext[234] on plain block devices. It seems that the
Windows-based software writes to a sector which is after the end of Stage
1.5, but before the end of the core image. This is why the problem appears
to be new with <span class="caps">GRUB</span> 2.</p>
<p>At least some occurrences of this are with software which writes a signature
to the embedding area which hangs around even after uninstallation (even
with one of those tools that tracks everything the installation process did
and reverses it, I gather), so that you cannot uninstall and reinstall the
application to defeat a trial period. This seems like a fine example of an
<a href="http://wiki.mako.cc/Antifeatures">antifeature</a>, especially given its
destructive consequences for free software, and is in general a poor piece
of engineering; what happens if multiple such programs want to use the same
sector, I wonder? They clearly aren’t doing much checking that the sector
is unused, not that that’s really possible anyway. While I do not normally
think that <span class="caps">GRUB</span> should go to any great lengths to accommodate proprietary
software, this is a case where we need to defend ourselves against the
predatory practices of some companies making us look bad: a relatively small
number of people do enough detective work to realise that it’s the fault of
a particular Windows application, but many more simply blame our operating
system because it won’t start any more.</p>
<p>I believe that it may be possible to assemble a collection of signatures of
such software, and arrange to avoid the disk sectors they have stolen.
Indeed, I have a first draft of the necessary code. This is not a
particularly pleasant solution, but it seems to be the most practical way
around the problem; I’m hoping that several of the programs at fault are
using common “licence manager” code or something like that, so that we can
address most of the problems with a relatively small number of signatures.
In order to do this, I need to hear from as many people as possible who are
affected by this problem.</p>
<p>If you suffer from this problem, then please do the following:</p>
<ul>
<li>Save the output of <code>fdisk -lu</code> to a file. In this output, take note of
the start sector of the first partition (usually 63, but might also be
2048 on recent installations, or occasionally something else). If this
is something other than 63, then replace 63 in the following items with
your number.</li>
<li>Save the contents of the embedding area to a file (replace <code>/dev/sda</code>
with your disk device if it’s something else): <code>dd if=/dev/sda of=sda.1
count=63</code></li>
<li>Do whatever you do to make <span class="caps">GRUB</span> unbootable (presumably starting Windows),
then boot into a recovery environment. Before you reinstall <span class="caps">GRUB</span>, save
the new contents of the embedding area to a different file: <code>dd
if=/dev/sda of=sda.2 count=63</code></li>
<li>Follow up to either the Debian or the Ubuntu bug with these three files
(the output of <code>fdisk -lu</code>, and the embedding area before and after
making <span class="caps">GRUB</span> unbootable).</li>
</ul>
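<p>If you’re curious which sectors got clobbered, comparing the two dumps with standard tools is straightforward (this isn’t needed for the bug report; it’s just interesting). Assuming the files were saved as above:</p>
<pre><code># Print the 512-byte sector numbers at which the two dumps differ
# (cmp -l reports 1-based byte offsets; sector 0 is the MBR itself).
cmp -l sda.1 sda.2 | awk '{ print int(($1 - 1) / 512) }' | sort -un
</code></pre>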
<p>I hope that this will help me to assemble enough information to fix this bug
at least for most people, and of course if you provide this information then
I can make sure to fix your particular version of this problem. Thanks in advance!</p>debhelper statistics, redux2010-07-10T23:40:20+01:002010-07-10T23:42:45+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-07-10:/~cjwatson/blog/debhelper-statistics-redux.html<p>Apropos of <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/debhelper-statistics.html">my previous post</a>, I see
that dh has now overtaken <span class="caps">CDBS</span> as the most popular rules helper system of
its kind in Debian unstable, and shows no particular sign of slowing its
rate of uptake any time soon. The resolution of the graph is such that you
can …</p><p>Apropos of <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/debhelper-statistics.html">my previous post</a>, I see
that dh has now overtaken <span class="caps">CDBS</span> as the most popular rules helper system of
its kind in Debian unstable, and shows no particular sign of slowing its
rate of uptake any time soon. The resolution of the graph is such that you
can’t see it yet, but dh drew dead level with <span class="caps">CDBS</span> on Thursday, and today
3836 packages are using dh as opposed to 3823 using <span class="caps">CDBS</span>.</p>
<p><img alt="debhelper statistics" src="http://people.debian.org/~cjwatson/dhstats.png"></p>GRUB 2: With luck …2010-07-02T22:27:35+01:002010-07-02T22:27:35+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-07-02:/~cjwatson/blog/grub2-with-luck.html<p>… this version, or something not too far away from it, might actually
stand a chance of getting into testing.</p>
<p>I’ve just uploaded grub2 1.98+20100702-1. The most significant set of
changes in this release is that it switches <code>/boot/grub/device.map</code> and the
<code>grub-pc/install_devices</code> debconf question …</p><p>… this version, or something not too far away from it, might actually
stand a chance of getting into testing.</p>
<p>I’ve just uploaded grub2 1.98+20100702-1. The most significant set of
changes in this release is that it switches <code>/boot/grub/device.map</code> and the
<code>grub-pc/install_devices</code> debconf question over to stable device names under
<code>/dev/disk/by-id</code> (on Linux kernels). The code implementing this is
reasonably careful, and it should make it quite difficult for people to
accidentally fail to upgrade their installed <span class="caps">GRUB</span> core image; I explained
the problems that tends to cause in the <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/grub2-boot-problems.html">previous post in this
series</a>. There will probably be a few
small glitches I need to clear up, but I’ve given this much more extensive
testing than usual so I hope I won’t break too many people’s computers (again).</p>
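<p>If you’re wondering what those stable names look like, something like this shows how they map onto the traditional kernel names on a given machine (the identifiers themselves will of course vary):</p>
<pre><code># Whole-disk entries only; each symlink points back at a kernel name such as sda.
ls -l /dev/disk/by-id/ | grep -v -- -part
</code></pre>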
<p>I did this work first in Ubuntu as one of my major goals for 10.04 <span class="caps">LTS</span>,
which exposed a few problems that I wanted to fix before inflicting it on
Debian as well (fixes for those are now under testing for 10.04.1). Most
significantly, I felt it was necessary to start offering partitions in the
select list for <code>grub-pc/install_devices</code>, but I went a bit overboard and
offered all partitions in a giant list. This seemed like a good idea at the
time, but it tended to confuse people into just selecting everything in the
list, which in particular tended to make Windows unbootable! So I dialled
that back a bit, and in the version I just merged it will only offer the
partitions mounted on <code>/</code>, <code>/boot</code>, and <code>/boot/grub</code> (de-duplicating if
necessary). This seems like a reasonable compromise between confusing
people too much and forcing them to install only to MBRs.</p>
<p>My next priority will be making whatever fixes are necessary to get this
version into testing, since the problems with <code>/dev/mapper</code> symlinks in
testing aren’t getting any less urgent, and this is finally a version that
shouldn’t break for most people due to the kernel’s switch to libata. I
expect that I’ll try to get mdadm 1.x metadata sorted out immediately after that.</p>
<p>Other improvements since my last entry have included:</p>
<ul>
<li>Further documentation work. Thanks to Vladimir Serbinenko (and to Jordan
Uggla for hosting it temporarily), there’s now an <a href="http://www.gnu.org/software/grub/manual/"><span class="caps">HTML</span> version of the
<span class="caps">GRUB</span> manual from trunk</a> online,
which includes new sections on embedded configuration files, the various
<span class="caps">GRUB</span> image files, <code>device.map</code>, and (shortly) a summary of changes from
<span class="caps">GRUB</span> Legacy.</li>
<li>Video improvements: among other things, <span class="caps">UEFI</span> systems whose firmware uses
the Graphics Output Protocol should now work rather better, and <span class="caps">GRUB</span> now
includes specific support for some cards often used with minimal firmware
support under emulation.</li>
<li>A fix to handle large memory maps exposed by some <span class="caps">UEFI</span> firmware.</li>
<li>Automatic configuration support for Fedora 13. You may need <a href="http://packages.qa.debian.org/o/os-prober/news/20100628T171748Z.html">os-prober
1.39</a>
from unstable as well.</li>
<li>Automatic configuration support for Linux on Xen.</li>
<li>Skip <span class="caps">LVM</span> snapshots rather than failing when they’re present.</li>
</ul>GRUB 2 boot problems2010-06-21T11:52:10+01:002010-06-21T11:52:10+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-06-21:/~cjwatson/blog/grub2-boot-problems.html<p>(This is partly a repost of material I’ve posted to bug reports and to
debian-release, put together with some more detail for a wider audience.)</p>
<p>You could be forgiven for looking at the <span class="caps">RC</span> bug activity on
<a href="http://bugs.debian.org/src:grub2">grub2</a> over the last couple of days and
thinking that it’s …</p><p>(This is partly a repost of material I’ve posted to bug reports and to
debian-release, put together with some more detail for a wider audience.)</p>
<p>You could be forgiven for looking at the <span class="caps">RC</span> bug activity on
<a href="http://bugs.debian.org/src:grub2">grub2</a> over the last couple of days and
thinking that it’s all gone to hell in a handbasket with recent uploads. In
fact, aside from an interesting case which turned out to be due to botched
handling of the <span class="caps">GRUB</span> Legacy to <span class="caps">GRUB</span> 2 chainloading setup (which prompted me
to fix three other <span class="caps">RC</span> bugs along the way), all the recent problems people
have been having have been duplicates of one of these bugs which have
existed essentially forever:</p>
<ul>
<li><a href="http://bugs.debian.org/554790">#554790 - grub-pc/install_devices uses unstable device names</a></li>
<li><a href="http://bugs.debian.org/583271">#583271 - device.map uses unstable device names</a></li>
</ul>
<p>When <span class="caps">GRUB</span> boots, its boot sector first loads its “core image”, which is
usually embedded in the gap between the boot sector and the first partition
on the same disk as the boot sector. This core image then figures out where
to find /boot/grub, and loads grub.cfg from it as well as more <span class="caps">GRUB</span> modules.</p>
<p>The thing that tends to go wrong here is that the core image must be from
the same version of <span class="caps">GRUB</span> as any modules it loads. <code>/boot/grub/*.mod</code> are
updated only by grub-install, so this normally works <span class="caps">OK</span>. However, for
various reasons (deliberate or accidental) some people install <span class="caps">GRUB</span> to
multiple disks. In this case, grub-install might update <code>/boot/grub/*.mod</code>
along with the core image on one disk, but your <span class="caps">BIOS</span> might actually be
booting from a different disk. The effect of this will be that you’ll have
an old core image and new modules, which will probably blow up in any number
of possible ways. Quite often, this problem lies dormant for a while
because <span class="caps">GRUB</span> happens not to change in a way that causes incompatibility
between the core image and modules, but then we get massive spikes of bug
reports any time the interface does change. Since these bugs sometimes bite
people upgrading from testing to unstable, they get interpreted as
regressions from the version in testing even though that isn’t strictly true
(but it tends not to be very productive to argue this line; after all,
people’s computers suddenly don’t boot!). Any problem that causes the core
image to be installed to a disk other than the one actually being booted
from, or not to be installed at all, will show up this way sooner or later.</p>
<p>On 2010-06-10, there was a substantial upstream change to the handling of
list iterators (to reduce core image size and make code clearer and faster)
which introduced an incompatibility between old core images and newer
modules. This caused a bunch of dormant problems to flare up again, and so
there was a flood of reports of booting problems with 1.98+20100614-1 and
newer, often described as “the unaligned pointer bug” due to how it happened
to manifest this time round. In previous cases, <span class="caps">GRUB</span> reported undefined
symbols on boot, but it’s all essentially the same problem even though there
are different symptoms.</p>
<p>The confusing bit when handling bug reports is that not only are there
different symptoms with the same cause, but there are also multiple causes
for the same symptom! This takes a certain amount of untangling, especially
when lots of people have thought “ooh, that bug looks a bit like mine” and
jumped in with their own comments. Working through this was a worthwhile
exercise, as it came up with an entirely new cause for a problem I thought
was fairly well-understood (thanks to debugging assistance from Sedat
Dilek). If you had set up <span class="caps">GRUB</span> 2 to be automatically chainloaded from <span class="caps">GRUB</span>
Legacy (which happens automatically on upgrade from the latter to the
former), never got round to running <code>upgrade-from-grub-legacy</code> once you
confirmed it worked, and then later ran <code>grub-install</code> by hand for one
reason or another, then the core image you installed by hand would never be
updated and would eventually <a href="http://bugs.debian.org/586143">fall over</a> the
next time the core/modules interface changed. Fixing future cases of this
was easy enough, but fixing existing cases involved figuring out how to
detect whether an installed <span class="caps">GRUB</span> boot sector came from <span class="caps">GRUB</span> Legacy or <span class="caps">GRUB</span>
2, which isn’t as easy as you might think. Fortunately, it turns out that
there are a limited number of jump offsets that have ever been used in the
second byte of the boot sector, and none of the <span class="caps">GRUB</span> 2 values clash with the
only value ever used in <span class="caps">GRUB</span> Legacy; so, if you still have
<code>/boot/grub/stage2</code> et al on upgrade, we scan all disks for a <span class="caps">GRUB</span> 2 boot
sector, and if we find one then we offer to complete the upgrade to <span class="caps">GRUB</span> 2.</p>
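<p>To give a flavour of the detection, it only needs a single byte from each disk. Roughly (a sketch; I haven’t reproduced the actual offset values we compare against here):</p>
<pre><code># Print the second byte of the boot sector (the x86 short-jump offset) in hex.
dd if=/dev/sda bs=1 skip=1 count=1 2>/dev/null | od -An -tx1
</code></pre>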
<p>Unless anything new shows up, that just leaves the problems that were
already understood. Today, I posted a <a href="http://lists.gnu.org/archive/html/grub-devel/2010-06/msg00118.html">patch to generate stable device
names in device.map by
default</a>.
If this is accepted, then we can do something or other to fix up device.map
on upgrade, switch over to <code>/dev/disk/by-id</code> names in
<code>grub-pc/install_devices</code> at the same time, and that should take care of the
vast majority of this kind of upgrade bug. I think at that point it should
be feasible to get a new version into testing, and we should be down from 18
<span class="caps">RC</span> bugs towards the end of last month to around 6. We can then start
attacking things like the lack of support for mdadm 1.x metadata.</p>
<p>Since my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/hacking-on-grub2.html">last blog entry on <span class="caps">GRUB</span> 2</a>,
improvements have included:</p>
<ul>
<li>Substantial work on <code>info grub</code>, with, among other things, new sections
on <code>/etc/default/grub</code> and on configuring authentication.</li>
<li>A workaround for <span class="caps">GRUB</span>’s inability to probe dm-crypt devices, thanks to
Marc Haber.</li>
<li>Several build fixes for architectures I wasn’t testing, and a fix for
broken nested partition handling on Debian <span class="caps">GNU</span>/kFreeBSD. I’m now testing
<span class="caps">GNU</span>/kFreeBSD locally.</li>
<li>Rather less cruft in <code>fs.lst</code>, <code>partmap.lst</code>, and <code>video.lst</code>, which
should speed up booting a bit by e.g. avoiding unnecessary filesystem probing.</li>
<li><code>upgrade-from-grub-legacy</code> actually now installs <span class="caps">GRUB</span> 2 to the boot
sector (!).</li>
<li>Ask for confirmation if <code>grub-pc/install_devices</code> is left empty.</li>
</ul>
<p>The next upstream snapshot will bring several improvements to <span class="caps">EFI</span> video
support, mainly thanks to Vladimir Serbinenko. I’ve been working on making
<code>grub-install</code> actually work on <span class="caps">UEFI</span> systems as one of my goals for the next
Ubuntu release, and I hope to get this landed in the not-too-distant future.</p>Hacking on grub22010-06-04T22:57:07+01:002010-06-04T23:00:54+01:00Colin Watsontag:www.chiark.greenend.org.uk,2010-06-04:/~cjwatson/blog/hacking-on-grub2.html<p>Various people observed in a <a href="http://lists.debian.org/debian-devel/2010/05/msg00769.html">long thread on
debian-devel</a>
that the grub2 package was in a bit of a mess in terms of its
release-critical bug count, and <a href="http://oskuro.net/blog">Jordi</a> and
<a href="http://upsilon.cc/~zack/blog/planet-debian/">Stefano</a> both got in touch
with me directly to gently point out that I probably ought to be doing
something …</p><p>Various people observed in a <a href="http://lists.debian.org/debian-devel/2010/05/msg00769.html">long thread on
debian-devel</a>
that the grub2 package was in a bit of a mess in terms of its
release-critical bug count, and <a href="http://oskuro.net/blog">Jordi</a> and
<a href="http://upsilon.cc/~zack/blog/planet-debian/">Stefano</a> both got in touch
with me directly to gently point out that I probably ought to be doing
something about it as one of the co-maintainers.</p>
<p>Actually, I don’t think grub2 was in quite as bad a state as its 18 <span class="caps">RC</span> bugs
suggested. Of course every boot loader failure is critical to the person
affected by it, not to mention that <span class="caps">GRUB</span> 2 offers more complex functionality
than any other boot loader (e.g. <span class="caps">LVM</span> and <span class="caps">RAID</span>), and so it tends to
accumulate <span class="caps">RC</span> bugs at rather a high rate. That said, we’d been neglecting
its bug list for some time; <a href="http://robertmh.wordpress.com/">Robert</a> and
Felix have both been taking some time off, Jordi mostly only cared about
PowerPC and can’t do that any more due to hardware failure, and I hadn’t
been able to pick up the slack.</p>
<p>Most of my projects at <a href="http://www.ubuntu.com/">work</a> for the next while
involve <span class="caps">GRUB</span> in one way or another, so I decided it was a perfectly
reasonable use of work time to do something about this; I was going to need
fully up-to-date snapshots anyway, and practically all the Debian grub2 bugs
affect Ubuntu too. Thus, with the exception of some other little things
like releasing the first Maverick alpha, I’ve spent pretty much the last
week and a half solidly trying to get the grub2 package back into shape,
with four uploads so far.</p>
<p>The <span class="caps">RC</span> issues that remain are:</p>
<ul>
<li>
<p><code>upgrade-from-grub-legacy</code> problems
(<a href="http://bugs.debian.org/547944">#547944</a>,
<a href="http://bugs.debian.org/550477">#550477</a>):</p>
<p>I think this has just been traditionally undertested. I’m setting up a
<span class="caps">KVM</span> image now with <span class="caps">GRUB</span> Legacy which I can snapshot just before and
after running <code>upgrade-from-grub-legacy</code>, and I should be able to unpick
the bugs this way.</p>
</li>
<li>
<p><span class="caps">LVM</span> snapshots break <span class="caps">GRUB</span>’s <span class="caps">LVM</span> module
(<a href="http://bugs.debian.org/574863">#574863</a>):</p>
<p><a href="http://www.seanius.net/feeds/planet-debian/">Sean</a> has been working on
this and seems to be nearly there. Yay.</p>
</li>
<li>
<p><span class="caps">RAID</span> metadata version 1.x not supported
(<a href="http://bugs.debian.org/492897">#492897</a>):</p>
<p>This became rather more of an issue recently since <code>mdadm</code> switched its
default from the old 0.90 format which <span class="caps">GRUB</span> understood. Felix put
together a branch implementing the hard parts of this a while back, and
I’ve been trying to finish it off. The hard bit is dealing with device
naming, especially as the new-format and rather more useful names under
<code>/dev/md/</code> don’t show up during
<a href="http://www.debian.org/devel/debian-installer">d-i</a> after creating <span class="caps">RAID</span>
volumes; I think this is because we always create them as <code>/dev/md0</code>
etc. It’s looking tractable, though.</p>
</li>
<li>
<p>Another odd problem probing <span class="caps">RAID</span>
(<a href="http://bugs.debian.org/548648">#548648</a>):</p>
<p>Not sure about this one, and I’ll need to work with Josip on it as soon
as I get a chance.</p>
</li>
<li>
<p>Stable device naming (<a href="http://bugs.debian.org/554790">#554790</a>) and
consequential problems due to <code>grub-install</code> not being properly run
(<a href="http://bugs.debian.org/557425">#557425</a> and many other sub-<span class="caps">RC</span> bugs):</p>
<p>Ubuntu’s been carrying a patch to rearrange device presentation in the
postinst, which Robert OKed in principle ages ago and so I’ve been
intending to merge it for a while, but there are a few known problems
with it that I need to fix first. One known unfixable problem is that
it will have to ask some people which devices they want <span class="caps">GRUB</span> to be
installed on, even if they’d answered that question before: this will be
one-time, and it’s because it recorded the answer using unstable device
names and so has in some sense forgotten. Simple cases (e.g.
single-disk) can be handled without needing to ask again, though.</p>
</li>
<li>
<p>Alignment errors on <span class="caps">SPARC</span> (<a href="http://bugs.debian.org/560823">#560823</a>):</p>
<p>I have no idea what’s going on here, I’m afraid. I’ll try to trace it,
but may have to downgrade it at some point since after all we don’t
install <span class="caps">GRUB</span> by default on <span class="caps">SPARC</span> yet.</p>
</li>
<li>
<p>Fonts not shown in gfxmenu (<a href="http://bugs.debian.org/564844">#564844</a>):</p>
<p>Apparently fixed upstream, but I couldn’t find the responsible commit so
I want to make sure I can get gfxmenu working before closing this.</p>
</li>
<li>
<p>Sensitivity to out-of-date <code>device.map</code> files
(<a href="http://bugs.debian.org/575076">#575076</a> and other sub-<span class="caps">RC</span> bugs):</p>
<p>We’re trying to get rid of <code>device.map</code> in general. It was fine in the
1990s but it’s hopeless now. Unfortunately there are still a small
number of problems with running entirely without one, and one of my
patches to help is controversial upstream, so we probably won’t get to
that for squeeze. In the meantime we’ll probably just need some extra
sanity-checking and robustness in the event that there’s an incorrect or
out-of-date <code>device.map</code> lying around, which we may just be able to do
in the maintainer scripts or something if necessary.</p>
</li>
<li>
<p>Seriously weird failures to load initramfs
(<a href="http://bugs.debian.org/582342">#582342</a>):</p>
<p>If anyone can produce a reproduction recipe for this, that would really
help me out. There are too many reports to discount as user error, but
I haven’t seen this myself yet.</p>
</li>
<li>
<p>Build failure on sparc (unfiled):</p>
<p>We’ve been discussing this upstream, but for the time being I’m just
going to stop building <code>grub-emu</code> on sparc as a workaround.</p>
</li>
</ul>
<p>If we can fix that lot, or even just the ones that are reasonably
well-understood, I think we’ll be in reasonable shape. I’d also like to
make <code>grub-mkconfig</code> a bit more robust in the event that the root filesystem
isn’t one that <span class="caps">GRUB</span> understands (<a href="http://bugs.debian.org/561855">#561855</a>,
<a href="http://bugs.debian.org/562672">#562672</a>), and I’d quite like to write some
more documentation.</p>
<p>On the upside, progress has been good. We have multiple terminal support
thanks to a new upstream snapshot
(<a href="http://bugs.debian.org/506707">#506707</a>), <code>update-grub</code> runs much faster
(<a href="http://bugs.debian.org/508834">#508834</a>,
<a href="http://bugs.debian.org/574088">#574088</a>), we have <span class="caps">DM</span>-<span class="caps">RAID</span> support with a
following wind (<a href="http://bugs.debian.org/579919">#579919</a>), the new scheme
with symlinks under <code>/dev/mapper/</code> works
(<a href="http://bugs.debian.org/550704">#550704</a>), we have basic support for btrfs
<code>/</code> as long as you have something <span class="caps">GRUB</span> understands properly on <code>/boot</code>
(<a href="http://bugs.debian.org/540786">#540786</a>), we have full info documentation
covering all the user-adjustable settings in <code>/etc/default/grub</code>, and a host
of other smaller fixes. I’m hoping we can keep this up.</p>
<p>If you’d like to help, contact me, especially if there’s something
particular that isn’t being handled that you think you could work on. <span class="caps">GRUB</span>
2 is actually quite a pleasant codebase to work on once you get used to its
layout; it’s certainly much easier to fix bugs in than <span class="caps">GRUB</span> Legacy ever was,
as far as I’m concerned. Thanks to tools like <code>grub-probe</code> and
<code>grub-fstest</code>, it’s very often possible to fix problems without needing to
reboot for anything other than a final sanity check (although <span class="caps">KVM</span> certainly
helps), and you can often debug very substantial bits of the boot loader -
the bits that actually go wrong - using standard tools such as <code>strace</code> and
<code>gdb</code>. Upstream is helpful and I’ve been able to get many of the problems
above fixed directly there. If you have a sound knowledge of C and a decent
level of understanding of the environment a boot loader needs to operate in
- or for that matter specialist knowledge of interesting device types - then
you should be able to find something to do.</p>OpenSSH 5.5p1 for Lucid2010-05-10T10:29:51+02:002010-05-10T10:29:51+02:00Colin Watsontag:www.chiark.greenend.org.uk,2010-05-10:/~cjwatson/blog/openssh-5.5p1-for-lucid.html<p>For various reasons, I chose to leave Ubuntu 10.04 <span class="caps">LTS</span> using OpenSSH 5.3p1.
The <a href="http://www.openssh.org/txt/release-5.4">new features in 5.4p1</a> such as
certificate authentication, the new smartcard handling, netcat mode, and
tab-completion in sftp are great, but unfortunately it was available just a
little bit too late for me …</p><p>For various reasons, I chose to leave Ubuntu 10.04 <span class="caps">LTS</span> using OpenSSH 5.3p1.
The <a href="http://www.openssh.org/txt/release-5.4">new features in 5.4p1</a> such as
certificate authentication, the new smartcard handling, netcat mode, and
tab-completion in sftp are great, but unfortunately it was available just a
little bit too late for me to be able to land it for 10.04 <span class="caps">LTS</span>. I realise
that many Lucid users want to make use of these features for one reason or
another, though, so as a compromise here’s a <span class="caps">PPA</span> containing <a href="https://launchpad.net/~cjwatson/+archive/openssh">OpenSSH 5.5p1
for Lucid</a>.</p>
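<p>The usual <span class="caps">PPA</span> dance applies if you want it: roughly the following, though the <code>ppa:</code> short form here is my guess at the archive’s name, so check the <span class="caps">PPA</span> page if it doesn’t resolve:</p>
<pre><code>sudo add-apt-repository ppa:cjwatson/openssh   # guessed short form for the archive linked above
sudo apt-get update
sudo apt-get install openssh-client openssh-server
</code></pre>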
<p>I intend to keep this up to date for as long as I reasonably can, and I’m
happy to accept bug reports on it in the <a href="https://bugs.launchpad.net/ubuntu/+source/openssh">usual
place</a>.</p>Thoughts on 3.0 (quilt) format2010-03-25T23:45:28+00:002010-03-26T01:00:06+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-03-25:/~cjwatson/blog/thoughts-on-3.0-quilt-format.html<p>Note: I wrote most of this before <a href="http://www.linux.codehelp.co.uk/serendipity/index.php?/archives/201-lintian,-source-format-3.0-and-blog-comments.html">Neil Williams’ recent comments on the 3.0
family of
formats</a>,
so despite the timing this isn’t really a reaction to that although I do
have a couple of responses. On the whole I think I agree that the Lintian
message is …</p><p>Note: I wrote most of this before <a href="http://www.linux.codehelp.co.uk/serendipity/index.php?/archives/201-lintian,-source-format-3.0-and-blog-comments.html">Neil Williams’ recent comments on the 3.0
family of
formats</a>,
so despite the timing this isn’t really a reaction to that although I do
have a couple of responses. On the whole I think I agree that the Lintian
message is a bit heavy-handed and I’m not sure I’m thrilled about the idea
of the default source format being changed (though I can see why the dpkg
maintainers are interested in that). That said, as far as I personally am
concerned, there is a vast cognitive benefit to me in having as much as
possible be common to all my packages. Once I have more than a couple of
packages that require patching and benefit from the <code>3.0 (quilt)</code> format as
a result, I find it in my interest to use it for all my non-native packages
even if they’re patchless right now, so that for instance if they need
patches in the future I can handle them the same way. It’s not unheard of
for me to apply temporary patches even to packages I actively maintain
upstream, so I don’t discount those either. I haven’t decided what to do
with my native packages yet; unless they’re big enough for bzip2 compression
to be worthwhile, there doesn’t seem to be much immediate advantage to <code>3.0
(native)</code>.</p>
<p>Anyway, on to the main body of this post:</p>
<p>I’ve been one of the holdouts resisting use of patch systems for a long
time, on the basis that I felt strongly that <code>dpkg-source -x</code> ought to give
you the source that’s actually built, rather than having to mess around with
<code>debian/rules</code> targets in order to see it. Now that the <code>3.0 (quilt)</code>
format is available to fix this bug, I felt that I ought to revisit my
resistance and start trying to use it. Migrating to it from monolithic
diffs is of course a bit more work than migrating to it from other patch
systems, so it’s taken me a little while to get round to it. I’d been
thinking about holding off until there was better integration with revision
control (e.g. bzr looms), as I feel that patch files really ought to be an
export format, but I eventually decided that I shouldn’t let the perfect be
the enemy of the good. I have enough experience with co-maintaining
packages that use build-time patch systems to be able to compare my reactions.</p>
<p>After experimenting with a couple of small packages, I moved over to the
deep end and <a href="http://packages.qa.debian.org/o/openssh/news/20100228T035004Z.html">converted
openssh</a>
a few weekends ago, since quite a few people have requested over the years
that the Debian changes to openssh be easier to audit. This was a
substantial job - over 6000 lines of upstream patches - but not actually as
much work as I expected. I took a fairly simplistic approach: first, I
unapplied all the upstream patches from my tree; then I ran <code>bzr di |
interdiff -q /dev/stdin /dev/null >x</code>, reduced it to a single
logically-discrete patch, applied it to a new quilt patch using <code>quilt
fold</code>, and repeated until <code>x</code> was empty. This was maybe an hour or two of
work, and then I went through and tagged all the patches according to
<a href="http://dep.debian.net/deps/dep3/"><span class="caps">DEP</span>-3</a>, which took another few hours.
After the first pass, I ended up with 38 patches and a much clearer idea of
what has been forwarded upstream and what hasn’t; I currently have 5 patches
to forward or eliminate, down from 18.</p>
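<p>Spelled out as commands, one round of that loop looked roughly like this (from memory; <code>chunk.patch</code> stands for whatever single logically-discrete change I’d carved out of <code>x</code>):</p>
<pre><code>bzr di | interdiff -q /dev/stdin /dev/null >x
# carve one logically-discrete change out of x, saving it as chunk.patch
quilt new some-feature.patch     # open a fresh patch at the top of the stack
quilt fold &lt;chunk.patch          # apply it and record it in some-feature.patch
# repeat from the top until x comes out empty
</code></pre>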
<p>Good things:</p>
<ul>
<li>I don’t lose any of my history. Since all the patches remain applied to
the tree in revision control (this is what <code>dpkg-source -x</code> gives you, so
it’s the natural representation in revision control too), <code>bzr blame</code>
works just as you’d expect and displays both upstream and Debian changes
at once. I rely on tools like blame a lot, and I really hate the way
build-time patch systems make it hard to use revision control when the
tree is in a built state, so this was a hard requirement for me.</li>
<li>I’ve used patch tagging before, so I was expecting some benefits, but
viscerally I feel much more in <em>control</em>. It’s so much less laborious
now to see what I need to do by way of forwarding. I don’t regret
waiting for 3.0 (quilt) to become available, but I hadn’t realised quite
how much I was being held back beforehand.</li>
<li>Adding new patches is pretty natural, much more so than with build-time
patch systems. You can create and apply the patch, test-build, and
commit when it works. I much prefer this over having to clean the tree
before committing (or commit just part of the tree, which is
error-prone). The more that committing to a Debian package feels like
committing to an upstream project, the better.</li>
<li>There’s definitely something to be said for
<a href="http://patch-tracker.debian.org/package/openssh">patch-tracker</a> being
more useful. It deals with <span class="caps">DEP</span>-3 to the extent of linkifying URLs,
although it might be nice if patch descriptions were displayed on the
overview page for each version.</li>
</ul>
<p>Bad things:</p>
<ul>
<li>It’s a bit awkward to set things up when checking out from revision
control; I didn’t really want to check in the <code>.pc</code> directory, and the
tree checks out in the patched state (as it should), so I needed some way
for developers to get quilt working easily after a checkout. This is
sort of the reverse of the previous problem, where users had to do
something special after <code>dpkg-source -x</code>, and I consider it less serious
so I’m willing to put up with it. I ended up with <a href="http://bugs.debian.org/572204">a rune in
debian/rules that ought to live somewhere more
common</a>.</li>
<li>Everything ends up represented twice in revision control: the patch
files, plus the changes to the patched files themselves. I’m <span class="caps">OK</span> with
this although it is a little inelegant.</li>
<li>Although I haven’t had to do it yet, I expect that merging new upstream
releases will be a bit harder. bzr will deal with resolving conflicts in
the patched files themselves, and that’s why I use a revision control
system after all, but then I’ll have to go and refresh all the patches
and will probably end up doing some of the same conflict resolution a
second time. I think the best answer right now is to <code>quilt pop -a</code>,
force a merge despite the modified working tree, and then <code>quilt push &&
quilt refresh -pab</code> until I get back to the top of the stack, modulo
slight fiddliness when a patch disappears entirely; thus effectively
using quilt’s conflict resolution rather than bzr’s (sketched just after this list). I suppose this will
serve as additional incentive to reduce my patch count. I know that
people have been working on making this work nicely with topgit, although
I’m certainly not going to put up with the rest of git due to that; I’m
happy to wait for looms to become usable and integrated. :-)</li>
<li>It would be nice if there were some standard <span class="caps">DEP</span>-3 way to note that a
patch has been accepted or rejected upstream, beyond just putting it in
the description. In particular, it seems to me that listing patches
accepted upstream could be used to speed up the process of merging new
upstream releases.</li>
</ul>
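<p>To spell out that merge approach: it would go roughly as follows, though I
haven’t had to do it yet, so treat it as an untested sketch (the upstream
branch location is just a placeholder):</p>
<div class="highlight"><pre><span></span><code>quilt pop -a                   # unapply the whole patch stack
bzr merge --force ../upstream  # merge the new upstream despite the modified tree
while quilt push; do           # walk back up the stack, refreshing each
    quilt refresh -pab         #   patch against the new upstream source
done
# if a push stops on a conflict, fix it up with "quilt push -f", refresh,
# and carry on
</code></pre></div>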
<p>On the whole I’m satisfied with this, and the benefits definitely outweigh
the costs. Thanks to the dpkg team for all their work on this!</p>parted 2.2 transition2010-03-22T01:12:48+00:002010-03-22T01:12:48+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-03-22:/~cjwatson/blog/parted-2.2-transition.html<p>I’ve started the <a href="http://lists.debian.org/debian-release/2010/03/msg00121.html">transition of parted 2.2 to
unstable</a>.
This is a major update needed for sensible support of newer hard disks with
alignment requirements different from the archaic cylinder alignment
tradition. I posted to debian-boot with a <a href="http://lists.debian.org/debian-boot/2010/03/msg00420.html">summary of the partman changes
involved</a>.</p>debhelper statistics2010-03-03T00:22:19+00:002010-03-03T00:22:19+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-03-03:/~cjwatson/blog/debhelper-statistics.html<p>I don’t know if anyone else has been tracking this recently, but a while
back I got curious about the relative proportions of dh(1) and <span class="caps">CDBS</span> in the
archive, and started running some daily analysis on the Lintian lab.
Apologies for my poor graphing abilities, but the graph …</p><p>I don’t know if anyone else has been tracking this recently, but a while
back I got curious about the relative proportions of dh(1) and <span class="caps">CDBS</span> in the
archive, and started running some daily analysis on the Lintian lab.
Apologies for my poor graphing abilities, but the graph is here
(occasionally updated):</p>
<p><img alt="debhelper statistics" src="http://people.debian.org/~cjwatson/dhstats.png"></p>
<p>Although dh is still a bit behind <span class="caps">CDBS</span>, the steady upward trend is quite
striking - it looks set to break 20% soon, up from under 13% in September -
compared with <span class="caps">CDBS</span> which has been sitting within half a percentage point of
25% the whole time.</p>
<p>Incidentally, was that an ftpmaster trying to sign his name in the graph
over Christmas or something? :-)</p>Catching up2010-02-21T20:04:55+00:002010-02-21T20:04:55+00:00Colin Watsontag:www.chiark.greenend.org.uk,2010-02-21:/~cjwatson/blog/catching-up.html<p>I did a bit of catching up on my Debian backlog over the last week or so.
Among the things I got round to:</p>
<ul>
<li>I released man-db 2.5.7. This was mostly an “I’ve been meaning to do
this for ages” kind of thing to reduce the bug …</li></ul><p>I did a bit of catching up on my Debian backlog over the last week or so.
Among the things I got round to:</p>
<ul>
<li>I released man-db 2.5.7. This was mostly an “I’ve been meaning to do
this for ages” kind of thing to reduce the bug list a bit, closing ten
Debian bugs, but there were a few interesting things in there as well,
such as always saving cat pages in <span class="caps">UTF</span>-8 and recoding to the user’s
locale at display time (long overdue), adjusting the search order for
localised manual pages by request of quite a few non-native English
speakers to prefer a page in the right section over a page in the right
language, and a cute gimmick to make things like <code>man /usr/bin/time</code>
display the appropriate manual page rather than the text of the
executable. See the <a href="http://git.savannah.gnu.org/cgit/man-db.git/tree/NEWS"><span class="caps">NEWS</span>
file</a> for more details.</li>
<li>binfmt-support now <a href="http://bugs.debian.org/565109">installs cleanly on non-Linux
systems</a>, even if it doesn’t do anything
useful yet.</li>
<li>I fixed a couple of <a href="http://bugs.debian.org/256226">shell</a>
<a href="http://bugs.debian.org/547750">bugs</a> in groff.</li>
<li>halibut now <a href="http://bugs.debian.org/464821">complies with the Debian Vim
policy</a>, even though I can’t say I
entirely agree with it in this case.</li>
<li>I fixed a <a href="http://lists.debian.org/debian-devel-changes/2010/02/msg02219.html">really odd build failure in
troffcvt</a>.
Yay imake, or something.</li>
<li>All Debian patches to putty are now upstream, or will be once I upload a
new snapshot. Thanks to Simon Tatham and Jacob Nevins.</li>
<li>I did a few bits and pieces of packaging cleanup with an eye on my
<a href="http://qa.debian.org/developer.php"><span class="caps">DDPO</span></a> list, and added some watch
files where they were missing.</li>
<li>Responded to an offer to take over icoutils maintenance.</li>
</ul>
<p>So nothing really earth-shaking, and as ever <a href="http://lists.debian.org/debian-ssh/2010/01/msg00017.html">openssh could use some
attention</a>, but I
feel a bit better about my backlog now. I do still have a <a href="http://bugs.debian.org/564559">critical bug in
makepasswd</a> to fix, and a sponsored upload of
parrot; those are the next two things on my to-do list.</p>Tissue of lies2009-11-13T17:37:36+00:002009-11-13T19:56:01+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-11-13:/~cjwatson/blog/tissue-of-lies.html<p>In case it isn’t obvious, in <a href="http://ubuman.wordpress.com/2009/11/13/ubuntu-9-10-sp1-coming-in-spring-2010/">“Ubuntu 9.10 <span class="caps">SP1</span> coming in spring
2010”</a>,
“Ubuman” is blatantly lying in attributing a number of statements to me.
None of the text there was written by me, and if you thought any of it was
true then you should probably make …</p><p>In case it isn’t obvious, in <a href="http://ubuman.wordpress.com/2009/11/13/ubuntu-9-10-sp1-coming-in-spring-2010/">“Ubuntu 9.10 <span class="caps">SP1</span> coming in spring
2010”</a>,
“Ubuman” is blatantly lying in attributing a number of statements to me.
None of the text there was written by me, and if you thought any of it was
true then you should probably make sure your troll radar is working
properly. Nice joke, but try harder next time - it doesn’t even look like
my writing style.</p>
<p>(I wouldn’t normally bother to respond, since I’m probably just giving it
more publicity, but apparently one or two people may already have been taken
in by it. One person was sensible enough to write to me and check the facts.)</p>Keysigning bits2009-07-31T11:31:44+00:002009-07-31T11:31:44+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-07-31:/~cjwatson/blog/keysigning-bits.html<p>If you’re generating one of these shiny new <span class="caps">RSA</span> keys, do please remember to
<a href="http://ekaia.org/blog/2009/05/10/creating-new-gpgkey/">generate an encryption subkey
too</a> if you expect
people to sign it - at least your more obscure UIDs. I’m not going to mail
unencrypted signatures around unless I have some out-of-band knowledge that
the …</p><p>If you’re generating one of these shiny new <span class="caps">RSA</span> keys, do please remember to
<a href="http://ekaia.org/blog/2009/05/10/creating-new-gpgkey/">generate an encryption subkey
too</a> if you expect
people to sign it - at least your more obscure UIDs. I’m not going to mail
unencrypted signatures around unless I have some out-of-band knowledge that
the e-mail address actually belongs to the person I met.</p>
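<p>If you’re not sure how to add one to an existing key, it goes roughly like
this (the key ID is a placeholder; gpg prompts for the subkey type and size,
and you want one of the “encrypt only” choices):</p>
<div class="highlight"><pre><span></span><code>gpg --edit-key 0xDEADBEEF
gpg> addkey
gpg> save
</code></pre></div>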
<p>I generated a new 4096-bit <span class="caps">RSA</span> key myself at DebConf (baa!), and have just
published a <a href="https://www.chiark.greenend.org.uk/~cjwatson/key-transition">key transition
document</a>.
Please consider signing my new key if you signed my old one.</p>man-db: ‘man -K’2009-07-14T15:36:45+00:002009-07-14T15:36:45+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-07-14:/~cjwatson/blog/man-db-K.html<p>I recently implemented <code>man -K</code> (full-text search over all manual pages) in
<a href="http://man-db.nongnu.org/">man-db</a>. This was inspired by a similar feature
in Federico Lucifredi’s <a href="http://primates.ximian.com/~flucifredi/man/">man</a>
package (formerly maintained by Andries Brouwer). I think I did a much
better job of it, though. The man package just forks grep for every …</p><p>I recently implemented <code>man -K</code> (full-text search over all manual pages) in
<a href="http://man-db.nongnu.org/">man-db</a>. This was inspired by a similar feature
in Federico Lucifredi’s <a href="http://primates.ximian.com/~flucifredi/man/">man</a>
package (formerly maintained by Andries Brouwer). I think I did a much
better job of it, though. The man package just forks grep for every manual
page; man-db takes advantage of the pipeline library I wrote for it a while
back and does it entirely in-process (decompression requires a fork but no
exec, while the man package has to exec gunzip as well).</p>
<p>The upshot is that, with a hot cache, man-db takes around 40 seconds to
search all manual pages on my laptop; the man package (also with a hot
cache) takes around five minutes, and interactive performance goes down the
drain while it’s doing it since it’s spawning subprocesses like crazy. If I
limit to a single section, the disparity is closer to 3x than 10x, but it’s
still very noticeable. It’s interesting how much good libraries can do to
help guide efficient approaches to problems.</p>
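<p>For the record, usage is as simple as this (the search strings are
arbitrary examples; restricting the search to a section, as in the second
command, is the easy way to keep the run time down):</p>
<div class="highlight"><pre><span></span><code>man -K 'pipeline'        # search the text of every manual page
man -s 1 -K 'pipeline'   # restrict the search to section 1
</code></pre></div>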
<p>Of course, a proper full-text search engine would be much better still, but
that’s a project for some other time …</p>Python SIGPIPE handling2009-07-02T08:14:26+00:002009-07-02T08:14:26+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-07-02:/~cjwatson/blog/python-sigpipe.html<p><a href="http://www.enricozini.org/2009/debian/python-pipes/">Enrico</a> writes about
creating pipelines with Python’s <code>subprocess</code> module, and notes that you
need to take care to close stdout in non-final subprocesses so that
subprocesses get <code>SIGPIPE</code> correctly. This is correct as far as it goes
(and true in any language, although there’s a <a href="http://bugs.python.org/issue1615376">Python bug report …</a></p><p><a href="http://www.enricozini.org/2009/debian/python-pipes/">Enrico</a> writes about
creating pipelines with Python’s <code>subprocess</code> module, and notes that you
need to take care to close stdout in non-final subprocesses so that
subprocesses get <code>SIGPIPE</code> correctly. This is correct as far as it goes
(and true in any language, although there’s a <a href="http://bugs.python.org/issue1615376">Python bug report requesting
that <code>subprocess</code> be able to do this
itself</a>), but there’s an additional
gotcha with Python that you missed.</p>
<p>Python ignores <code>SIGPIPE</code> on startup, because it prefers to check every write
and raise an <code>IOError</code> exception rather than taking the signal. This is all
well and good for Python itself, but most Unix subprocesses don’t expect to
work this way. Thus, when you are creating subprocesses from Python, it is
<strong>very important</strong> to set <code>SIGPIPE</code> back to the default action. Before I
realised this was necessary, I wrote code that caused serious data loss due
to a child process carrying on out of control after its parent process died!</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">signal</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="k">def</span> <span class="nf">subprocess_setup</span><span class="p">():</span>
<span class="c1"># Python installs a SIGPIPE handler by default. This is usually not what</span>
<span class="c1"># non-Python subprocesses expect.</span>
<span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGPIPE</span><span class="p">,</span> <span class="n">signal</span><span class="o">.</span><span class="n">SIG_DFL</span><span class="p">)</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="n">preexec_fn</span><span class="o">=</span><span class="n">subprocess_setup</span><span class="p">)</span>
</code></pre></div>
<p>I filed a <a href="http://bugs.python.org/issue1652">patch</a> a while back to add a
<code>restore_sigpipe</code> option to <code>subprocess.Popen</code>, which would take care of
this. As I say in that bug report, in a future release I think this ought
to be made the default, as it’s very easy to get things dangerously wrong
right now.</p>code_swarm video of Ubuntu uploads2009-05-28T20:29:55+00:002009-05-28T20:32:12+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-05-28:/~cjwatson/blog/code_swarm.html<p>Joey Hess posted a
<a href="http://lists.debian.org/debian-boot/2009/05/msg00265.html">draft</a> of a
<a href="http://code.google.com/p/codeswarm/">code_swarm</a> video for d-i a couple of
weeks ago, which reminded me that I’ve been meaning to do something similar
for Ubuntu for a while now as it’s just about our archive’s fifth birthday.
I have a more or less …</p><p>Joey Hess posted a
<a href="http://lists.debian.org/debian-boot/2009/05/msg00265.html">draft</a> of a
<a href="http://code.google.com/p/codeswarm/">code_swarm</a> video for d-i a couple of
weeks ago, which reminded me that I’ve been meaning to do something similar
for Ubuntu for a while now as it’s just about our archive’s fifth birthday.
I have a more or less complete archive of all our -changes mailing lists
locally (I think I’m missing some of the very early ones, before the end of
July 2004; let me know if you were one of the very early Canonical employees
and have a record of these), and with the aid of
<a href="https://help.launchpad.net/API/launchpadlib">launchpadlib</a> it’s fairly easy
to map all the e-mail addresses into Launchpad user names, massage out some
of the more obvious duplicates, and then treat the stream of uploads as if
it were a stream of commits.</p>
<p>If you haven’t seen code_swarm before, each dot represents an upload, and
the dots “swarm” around their corresponding committers’ names; more active
committers have larger swarms of dots and brighter names. I assigned a
colour to each of our archive components (uploads aren’t really at the C
code vs. Python code vs. translations vs. whatever kind of granularity that
you see in other code_swarm videos), which mostly means that people who
predominantly upload to main are in roughly an Ubuntu tan colour, people who
predominantly upload to universe are coloured bluish, and people with a good
mixture tend to come out coloured green. If I get a bit more time I may try
to figure out enough about video editing software to add some captions.</p>
<p>Here’s the <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/images/ubuntu-uploads.ogv">video</a> (194 <span class="caps">MB</span>).</p>Bug triage, redux2009-03-05T11:04:02+00:002009-03-05T11:04:02+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-03-05:/~cjwatson/blog/bug-triage-redux.html<p>I’ve been a bit surprised by the strong positive response to my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/bug-triage-rants.html">previous
post</a>.
People generally seemed to think it was quite non-ranty; maybe I should
clean the rust off my flamethrower. :-) My hope was that I’d be able to
persuade people to change some practices, so I …</p><p>I’ve been a bit surprised by the strong positive response to my <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/bug-triage-rants.html">previous
post</a>.
People generally seemed to think it was quite non-ranty; maybe I should
clean the rust off my flamethrower. :-) My hope was that I’d be able to
persuade people to change some practices, so I guess that’s a good thing.</p>
<p>Of course, there are many very smart people doing bug triage very well, and
I don’t want to impugn their fine work. Like its medical namesake, bug
triage is a skilled discipline. While it’s often repetitive, and there are
lots of people showing up with similar symptoms, a triage nurse can really
make a difference by spotting urgent cases, cleaning up some of the initial
blood, and referring the patient quickly to a doctor for attention. Or, if
a pattern of cases suddenly appears, a triage nurse might be able to warn of
an incipient epidemic. [Note: I have no medical experience, so please
excuse me if I’m talking crap here. :-)] The bug triagers who do this well
are an absolute godsend; especially when they respond to repetitive tasks
with tremendously useful pieces of automation like
<a href="https://launchpad.net/bughelper">bughelper</a>. The cases I have trouble with
are more like somebody showing up untrained, going through everyone in the
waiting room, and telling each of them that they just need to go home, get
some rest, and stop complaining so much. Sometimes of course they’ll be
right, but without taking the time to understand the problem they’re
probably going to do more harm than good.</p>
<p>Ian Jackson reminded me that it’s worth mentioning the purpose of bug
reports on free software: namely, <strong>to improve the software</strong>. The <span class="caps">GNU</span>
Project has some <a href="http://www.gnu.org/prep/maintain/maintain.html#Mail">advice to
maintainers</a> on this.
I think sometimes we stray into regarding bug reports more like support
tickets. In that case it would be appropriate to focus on resolving each
case as quickly as possible, if necessary by means of a workaround rather
than by a software change, and only bother the developers when necessary.
This is the wrong way to look at bug reports, though. The reason that we
needed to set up a bug triage community in Ubuntu was that we had a
relatively low developer-to-package ratio and a very high user-to-developer
ratio, and we were getting a lot of bug reports that weren’t fleshed out
enough for a developer to investigate them without spending a lot of time in
back-and-forth with the reporter, so a number of people volunteered to take
care of the initial back-and-forth so that good clear bug reports could be
handed over to developers. This is all well and good, and indeed I
encouraged it because I was personally finding myself unable to keep up with
incoming bugs and actually fix anything at the same time. Somewhere along
the way, though, some people got the impression that what we wanted was a
first-line support firewall to try to defend developers from users, which of
course naturally leads to ideas such as closing wishlist bugs containing
ideas because obviously those important developers wouldn’t want to be
bothered by them, and closing old bugs because clearly they must just be
getting in developers’ way. Let me be clear about this now: I absolutely
appreciate help getting bug reports into a state where I can deal with them
efficiently, but <strong>I do not want to be defended from my users</strong>! I don’t
have a basis from which to state that all developers feel the same way, but
my guess is that most do.</p>
<p><a href="http://antti-juhani.kaijanaho.fi/newblog/archives/471">Antti-Juhani
Kaijanaho</a> said he’d
experienced most of these problems in Debian. I hadn’t actually intended my
post to go to Planet Debian - I’d forgotten that the “ubuntu” category on my
blog goes there too, which generally I see as a feature, but if I’d
remembered that I would have been a little clearer that I was talking about
Ubuntu bug triage. If I had been talking about Debian bug triage I’d
probably have emphasised different things. Nevertheless, it’s interesting
that at least one Debian (and non-Ubuntu) developer had experienced similar problems.</p>
<p><a href="http://jldugger.livejournal.com/25994.html">Justin Dugger</a> mentions a
practice of marking duplicate bugs invalid that he has problems with. I
agree that this is suboptimal and try not to do it myself. That said, this
is not something I object to to the same extent. Given that the purpose of
bugs is to improve the software, the real goal is to be able to spend more
time fixing bugs, not to get bugs into the ideal state when the underlying
problem has already been solved. If it’s a choice between somebody having
to spend time tracking down the exact duplicate bug number versus fixing
another bug, I know which I’d take. Obviously, when doing this, it’s worth
apologising that you weren’t able to find the original bug number, and
explaining what the user can do if they believe that you’re mistaken
(particularly if it’s a bug that’s believed to be fixed); the stock text
people often use for this doesn’t seem informative enough to me.</p>
<p>Sebastien Bacher commented that preferred bug triage practices differ among
teams: for instance, the Ubuntu desktop team deals with packages that are
very much to the forefront of users’ attention and so get a lot of duplicate
bugs. Indeed - and bug triagers who are working closely with the desktop
team on this are almost certainly doing things the way the developers on the
desktop team prefer, so I have no problem with that. The best advice I can
give bug triagers is that their ultimate aim is to help developers, and so
they should figure out which developers they need to work with and <strong>go and
talk to them</strong>! That way, rather than duplicating work or being
counterproductive, they can tailor their work to be most effective.
Everybody wins.</p>Bug triage rants2009-03-02T14:51:37+00:002009-03-02T14:51:37+00:00Colin Watsontag:www.chiark.greenend.org.uk,2009-03-02:/~cjwatson/blog/bug-triage-rants.html<p>I hate to say this, but often when somebody does lots of bug triage on a
package I work on, I find it to be a net loss for me. I end up having to go
through all the things that were changed, correct a bunch of them,
occasionally pacify …</p><p>I hate to say this, but often when somebody does lots of bug triage on a
package I work on, I find it to be a net loss for me. I end up having to go
through all the things that were changed, correct a bunch of them,
occasionally pacify angry bug submitters, and all the rest of it, and often
the benefits are minimal at best.</p>
<p>I would very much like this not to be the case. Bug triage is supposed to
help developers be more efficient, and I think most people who do bug triage
are generally well-intentioned and eager to help. Accordingly, here is a
series of mini-rants intended to have educational value.</p>
<ul>
<li>
<p><strong>Bugs are not like fruit.</strong></p>
<p>Fruit goes bad if you leave it too long. By and large, bugs don’t,
especially if they’re on software that doesn’t change very much. There
is no reason why a bug filed against a package in Ubuntu 4.10 where the
relevant code hasn’t changed much since shouldn’t still be perfectly
valid. Even if it isn’t, it deserves proper consideration.</p>
<p>My biggest single annoyance with bug triage is people coming around and
asking if bugs are still valid when they haven’t put any effort into
reproducing them themselves. This annoys bug submitters too; every so
often somebody replies and says “didn’t you even bother to check?”.
This gives a very bad impression of us as a project - wouldn’t it be
better if we looked as if we knew what we were talking about? There is
a good reason to do this kind of check, of course: random undiagnosed
crash reports and the like may well go away due to related changes, and
it is occasionally worth checking. But if the bug is already
well-understood and/or well-described, you should just go and check
whether it’s still there rather than asking.</p>
<p>As I understand it, the intended workflow is that people file bugs, then
if they aren’t clear enough bug triagers work with the submitter to
gather information until they are, then they’re passed to developers for
further work. We seem to have added an extra step wherein submitters
must periodically give their bug a health-check, and if they don’t then
it gets closed as being out of date. In a small minority of cases this
is useful; in most cases, frankly, it makes us look a bit clueless. Can
we please stop doing this? The more we waste people’s time doing this,
the less likely it is that they’ll bother to respond to us, and this
might help our statistics but doesn’t help the project as a whole.</p>
<p>I know that there’s a problem with bug count. I think every project of
non-trivial size has that problem. But, honestly, the right answer is
to <em>fix more bugs</em> - and, personally, I would be able to spend more time
doing that if I weren’t often running around trying to make sure that
bugs I care about aren’t getting overenthusiastically closed just
because somebody thinks they’ve been lying around too long.</p>
<p>There is a good way to expire bugs like this, of course. It goes
something like this: “I’ve read through your bug and tried to reproduce
it with a current release, but I’m afraid I can’t do so. Are you still
experiencing it? If not, then I think it might have been fixed by [this
change I found in the package’s history that seems to be related].” You
can’t do this <em>en masse</em>, but you’ll get a much better response from
submitters, you’ll learn more doing it, and in the process of doing the
necessary investigation of each bug you’ll find that there are many
cases you don’t have to ask about at all.</p>
</li>
<li>
<p><strong>Wishlist bugs are not intrinsically bad.</strong></p>
<p>There are certainly cases where something is far too broad or vague for
a bug report; but there are also plenty of cases, probably far more,
where the wish in question is a relatively small change to the program,
or doesn’t need any more sophisticated tracking, and a wishlist bug is
just right. If you don’t know the program very well, it may be
difficult to tell whether a wishlist bug is appropriate or not; in that
case, just leave the bug alone.</p>
<p>Please, for the love of all that’s holy, don’t close wishlist bugs
saying that people should use Brainstorm or write a specification
instead! If you don’t want to see wishlist bugs in your statistics,
just filter them out; it’s quite easy to do. Even worse, don’t tell
people that something probably isn’t a good idea when you aren’t
familiar with the software; people who have gone to the effort of
writing up their idea for us deserve a response from somebody who knows
the software well. I’ve encountered cases where friends of mine
submitted a bug report (sometimes even at my request) and then a triager
told them it was a bad idea and closed their bug. This sort of thing
puts people off Ubuntu.</p>
<p>Specifications are software design documents. As such, they are best
written by software designers. People who tell other people to go and
write a specification may not realise that as a result of doing this for
three years it’s now essentially impossible to find anything in the
specification system! The intent was never that every user of Ubuntu
would need to write a specification to get anything changed;
specifications are used by developers to document the results of
discussions and write up plans. They are not a straightforward
alternative to wishlist bugs, nor do they turn out to work very well as
what many formal processes call “requirements documents”; the process of
refining the latter in the context of Ubuntu might involve wishlist
bugs, mailing list threads, wiki pages, private discussions with
developers, or things of that nature, and probably shouldn’t involve
creating a specification until the requirements-gathering process is
well underway.</p>
</li>
<li>
<p><strong>Closing a bug is taking an item off somebody’s to-do list.</strong></p>
<p>You wouldn’t go up to a colleague’s whiteboard and take an eraser to it
unless you were sure that was <span class="caps">OK</span>, would you? Yet people seem to do that
all the time with bugs. It’s <span class="caps">OK</span> when the bug is really just like a
support request - “help, it crashed, what do I do?” - and either you’re
pretty sure it’s user error or there’s just no way to get enough
information to fix it. But once the initial triage process is done, now
it’s on somebody’s to-do list.</p>
<p>This is closely related to …</p>
</li>
<li>
<p><strong>If a developer has accepted it, leave it alone.</strong></p>
<p>Every so often I find that there’s a bug that I have accepted by way of
a bug comment or setting to Triaged or whatever, or even a bug that I
filed on a package I work on as a reminder to myself, and somebody comes
along and asks for more information or asks if we can still reproduce it
or something. The hit rate on this kind of thing is extraordinarily
low. There’s a good chance that the developer went and verified the bug
against the code, and in that case it certainly doesn’t need more
information (or they would have asked for it) and it probably isn’t
going to go away without anyone noticing.</p>
<p>In most other free software projects, developers file bug reports
themselves as a reminder about things that need to be done, and people
leave them alone unless they’re intending to help with the fix. In
Ubuntu, developers also have to spend time making sure that those to-do
items don’t get expired. Nobody is helped by this.</p>
<p><a href="https://launchpad.net/launchpad-gm-scripts">launchpad-gm-scripts</a>
includes a Greasemonkey script called <code>lp_karma_suffix</code>, which can help
you to identify developers without having to spend lots of time clicking around.</p>
</li>
<li>
<p><strong>Check whether the package is being actively worked on.</strong></p>
<p>Some packages are actively worked on in Ubuntu; some aren’t (e.g. we
just sync packages from Debian, or they’re basically orphaned, or
whatever). It’s worth checking which is which before doing any kind of
extensive triage work. If it’s being actively worked on, why not go and
talk to the developer(s) in question first? It’s only polite, and it
will probably help you to do a better job.</p>
</li>
</ul>Re: Perl is strange2008-06-23T15:58:08+00:002008-06-23T15:59:12+00:00Colin Watsontag:www.chiark.greenend.org.uk,2008-06-23:/~cjwatson/blog/reply-perl-is-strange.html<p><a href="http://www.df7cb.de/blog/2008/Perl_is_strange.html">Christoph</a>: That’s
because <code>=~</code> binds more tightly than <code>+</code>. This does what you meant:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>perl<span class="w"> </span>-le<span class="w"> </span><span class="s1">'print "yoo" if (1 + 1) =~ /3/'</span>
</code></pre></div>
<p><code>perlop(1)</code> has a useful table of precedence.</p>Don’t use sshkeygen.com to generate keys!2008-06-23T09:31:57+00:002008-06-23T09:54:53+00:00Colin Watsontag:www.chiark.greenend.org.uk,2008-06-23:/~cjwatson/blog/ssh-keygen.html<p>To my horror, I recently saw <a href="http://www.sshkeygen.com/">this online <span class="caps">SSH</span> key
generator</a>.</p>
<p>I hope nobody reading this needs to be told why this is a bad idea.
However, in case you do, here are a few reasons:</p>
<ul>
<li>Every <span class="caps">SSH</span> implementation I know of - certainly all the major ones - that
support public …</li></ul><p>To my horror, I recently saw <a href="http://www.sshkeygen.com/">this online <span class="caps">SSH</span> key
generator</a>.</p>
<p>I hope nobody reading this needs to be told why this is a bad idea.
However, in case you do, here are a few reasons:</p>
<ul>
<li>Every <span class="caps">SSH</span> implementation I know of - certainly all the major ones - that
support public key authentication also provide a key generation utility.
Even aside from all the good reasons not to, there is simply no reason
why you should need to use a web-based tool in the first place.</li>
<li>How can you trust the person running this site? Without implying that I
know he or she is untrustworthy (I don’t), and with the best will in the
world, it’s a big Internet with a lot of nasty people on it. Do you
really want somebody you don’t know in a position to keep a copy of all
your private keys?</li>
<li>Even if the person is trustworthy, the server running sshkeygen.com is
now a giant blinking target. If lots of people use it, there is every
incentive in the world for the bad guys to try to take control of it so
that they can keep a copy of all your private keys. (Or, as we know from
recent bitter experience, they can just give out keys from a limited set
and it will probably take a couple of years before anyone notices …)</li>
<li>The front page of sshkeygen.com says that the keys are escrowed. The
plain English meaning of this would be that the operator of that site
keeps a copy of the private key, to be held in trust in case (presumably)
you lose it and need to retrieve it. Normally this sort of thing depends
on a legal trust relationship, perhaps linked to a contract. What does
it mean here? Is it just a buzzword? If it isn’t, then this just makes
sshkeygen.com even more of a target.</li>
<li>sshkeygen.com delivers keys to you over unencrypted <span class="caps">HTTP</span>. Yes, this is
on its <a href="http://www.sshkeygen.com/about.php">to-do list</a>. That isn’t
really an excuse.</li>
<li>Even if keys were delivered over <span class="caps">HTTPS</span>, that still relies on people
diligently checking the authenticity of the certificate. A
self-signature (as suggested as an alternative in the to-do list) would
be impossible to check with any reliability; and will people who have
trouble with non-web-based key generation software really be able or
inclined to confirm the signature chain? Browsers typically don’t
enforce this very strictly, or if they do they provide fairly simple ways
to bypass the enforcement, simply because so many sites have broken or
poorly-signed <span class="caps">SSL</span> certificates, and keeping up with all the CAs is pretty
hard work too.</li>
<li>Furthermore, delivering private keys over <span class="caps">HTTPS</span> makes that <span class="caps">SSL</span>
certificate a single giant blinking target. Might it be compromised?
How would you tell? What servers would need to be compromised in order
to get a copy of the private <span class="caps">SSL</span> key?</li>
<li>Sure, Debian is in an awkward position here given the recent OpenSSL
random number generation vulnerability. However, how do you know that
sshkeygen.com is running on a system that doesn’t suffer from this? (As
it happens, I have checked, and it doesn’t appear to suffer from this
vulnerability - but most people won’t check and won’t know how to check.)</li>
</ul>
<p>I <em>think</em> this is probably being done in innocent seriousness (although I
kind of hope it’s a joke in poor taste), and have e-mailed the contact
address offering to explain why it’s a bad idea.</p>Vim omni completion for Launchpad bugs2008-01-31T11:17:58+00:002008-01-31T11:19:27+00:00Colin Watsontag:www.chiark.greenend.org.uk,2008-01-31:/~cjwatson/blog/vim-lpbug-omnicomplete.html<p>I hacked together a little timesaver for developers this morning: omni
completion for Launchpad bugs in Vim’s debchangelog mode. To use it,
install vim 7.1-138+1ubuntu3 once it hits the mirrors, open up a
<code>debian/changelog</code> file, type “<span class="caps">LP</span>: #”, and hit Ctrl-X Ctrl-O. It’ll think
for a …</p><p>I hacked together a little timesaver for developers this morning: omni
completion for Launchpad bugs in Vim’s debchangelog mode. To use it,
install vim 7.1-138+1ubuntu3 once it hits the mirrors, open up a
<code>debian/changelog</code> file, type “<span class="caps">LP</span>: #”, and hit Ctrl-X Ctrl-O. It’ll think
for a while and then give you a list of all the bugs open in Launchpad
against the package in question, from which you can select to insert the bug
number into your changelog.</p>
<p>Here’s a screenshot to make it clearer:</p>
<p><img alt="screenshot" src="https://www.chiark.greenend.org.uk/~cjwatson/blog/images/lp-omnicomplete.png"></p>
<p>Thanks to Stefano Zacchiroli for doing the same for Debian bugs back in July.</p>UTF-8 manual pages2008-01-29T01:57:51+00:002008-01-29T01:57:51+00:00Colin Watsontag:www.chiark.greenend.org.uk,2008-01-29:/~cjwatson/blog/utf-8-manual-pages.html<p>See <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/man-db-encodings.html">Encodings in man-db</a> for context.</p>
<p>Yesterday, I uploaded <a href="http://lists.debian.org/debian-devel-changes/2008/01/msg02665.html">man-db
2.5.1-1</a>
to unstable. With this version, not only is it possible to install manual
pages in <span class="caps">UTF</span>-8 (as with 2.5.0, although with fewer bugs), but it’s also
possible to ask man to produce a …</p><p>See <a href="https://www.chiark.greenend.org.uk/~cjwatson/blog/man-db-encodings.html">Encodings in man-db</a> for context.</p>
<p>Yesterday, I uploaded <a href="http://lists.debian.org/debian-devel-changes/2008/01/msg02665.html">man-db
2.5.1-1</a>
to unstable. With this version, not only is it possible to install manual
pages in <span class="caps">UTF</span>-8 (as with 2.5.0, although with fewer bugs), but it’s also
possible to ask man to produce a version of an arbitrary page in the
encoding of your choice, and have it guess the source encoding for you
fairly reliably. This finally provides enough support to have debhelper
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462937">automatically recode manual pages to
<span class="caps">UTF</span>-8</a>.</p>
<p>It’ll probably take a little while to shake out the corner-case bugs, but
I’m generally pretty happy with this. Once the new man-db and debhelper
land in testing, I’ll send a note to debian-devel-announce and push harder
on my <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420">policy
amendment</a>.</p>
<p>Considering the historical state of man-db when it comes to localisation,
and all of the dependencies and general yak-shaving that had to be tackled
to get here, this represents the end of probably several hundred hours of
work, so I’m pretty happy that this is out the door. The only remaining
step is to add <span class="caps">UTF</span>-8 input support to groff, which fortunately Brian M.
Carlson is
<a href="http://lists.gnu.org/archive/html/groff/2007-11/msg00018.html">working</a>
<a href="http://lists.gnu.org/archive/html/groff/2008-01/msg00004.html">on</a>. After
that, we can reasonably claim to have dragged manual pages kicking and
screaming into the 21st century.</p>aptitude safe-upgrade2007-11-29T20:51:23+00:002007-11-29T20:51:23+00:00Colin Watsontag:www.chiark.greenend.org.uk,2007-11-29:/~cjwatson/blog/safe-upgrade.html<p><a href="http://blog.drinsama.de/erich/en/linux/debian/2007112801-dist-upgrade-hints.html">Erich</a>:
I do sometimes wonder why we don’t relax the definition of “safe” upgrades
to include installing new packages but still not removing old ones. I know
that many of my uses of dist-upgrade are just for when something grows a new
dependency that I didn’t previously have …</p><p><a href="http://blog.drinsama.de/erich/en/linux/debian/2007112801-dist-upgrade-hints.html">Erich</a>:
I do sometimes wonder why we don’t relax the definition of “safe” upgrades
to include installing new packages but still not removing old ones. I know
that many of my uses of dist-upgrade are just for when something grows a new
dependency that I didn’t previously have installed.</p>
<p>(Of course this wouldn’t always help as it wouldn’t account for a new
dependency that conflicted with an old dependency, but never mind. It would
certainly do wonders for the metapackage case.)</p>Encodings in man-db2007-09-17T07:28:20+00:002007-09-17T07:28:20+00:00Colin Watsontag:www.chiark.greenend.org.uk,2007-09-17:/~cjwatson/blog/man-db-encodings.html<p>I’ve spent some quality upstream time lately with man-db. Specifically,
I’ve been upgrading its locale support. I recently published a pre-release,
<a href="http://people.debian.org/~cjwatson/man-db/man-db-2.5.0-pre2.tar.gz">man-db
2.5.0-pre2</a>
mainly for translators, but other people may be interested in having a look
at it as well. I hope to release 2.5 …</p><p>I’ve spent some quality upstream time lately with man-db. Specifically,
I’ve been upgrading its locale support. I recently published a pre-release,
<a href="http://people.debian.org/~cjwatson/man-db/man-db-2.5.0-pre2.tar.gz">man-db
2.5.0-pre2</a>
mainly for translators, but other people may be interested in having a look
at it as well. I hope to release 2.5.0 quite soon so that all of this can
land in Debian.</p>
<p>Firstly, man-db now supports creating and using databases for per-locale
hierarchies of manual pages, not just English. This means that <a href="http://bugs.debian.org/29448">apropos and
whatis can now display information about localised manual
pages</a>.</p>
<p>Secondly, I’ve been working on the transition to <span class="caps">UTF</span>-8 manual pages. Now,
modulo some hacks, groff can’t yet deal with Unicode input; some possible
input characters are reserved for its internal use which makes full 32-bit
input difficult to do properly until that’s fixed. However, with a few
exceptions, manual pages generally just need the subset of Unicode that
corresponds to their language’s usual legacy character set, so for now it’s
good enough to just recode on the fly from <span class="caps">UTF</span>-8 to some appropriate 8-bit
character set and use groff’s support for that.</p>
<p>man-db has actually supported doing this kind of thing for a while, but it’s
been difficult to use since it only applies to <code>/usr/share/man/ll_CC.UTF-8/</code>
directories, while manual pages usually aren’t country-specific. So, man-db
2.5.0 supports using <code>/usr/share/man/ll.UTF-8/</code> instead, which is a bit more
appropriate. Also, following a <a href="http://lists.debian.org/debian-mentors/2007/09/msg00245.html">discussion with Adam
Borowski</a>,
man-db can now try decoding manual pages as <span class="caps">UTF</span>-8 and fall back to 8-bit
encodings even in directories without an explicit encoding tag; if this
fails for some reason, you can put a <code>'\" -*- coding: UTF-8 -*-</code> line at the
top of the page.</p>
<p>I’m still debating whether Debian policy should recommend installing <span class="caps">UTF</span>-8
manual pages in <code>/usr/share/man/ll.UTF-8/</code> or just in <code>/usr/share/man/ll/</code>.
Initially I was very strongly in favour of an encoding declaration, but now
that man-db can do a pretty good job of guesswork I’m coming round to Adam
Borowski’s position that people should be able to forget about character
sets with <span class="caps">UTF</span>-8. Opinions here would be welcome. One thing I haven’t moved
on is that any design that assumes that the encoding of manual pages on the
filesystem has anything to do with the user’s locale is demonstrably
incorrect and broken; I’m not going to use <code>LC_CTYPE</code> for anything except
output. However, maybe “<span class="caps">UTF</span>-8 or the usual legacy encoding provided that
the latter is not typically confused for the former” is a good enough
specification, and that still has the desirable property of not requiring a
flag day. I’ll try to come down from the fence before unleashing this code
on the world.</p>Keysigning public service announcement2007-07-04T17:45:39+00:002007-07-04T17:45:39+00:00Colin Watsontag:www.chiark.greenend.org.uk,2007-07-04:/~cjwatson/blog/keysigning-psa.html<p>If your key has so many UIDs and such a combinatorially exploded number of
signatures on it that it takes <code>gpg</code> minutes just to start up in
<code>--edit-key</code> mode, then I probably won’t bother signing it. <span class="caps">HTH</span>, <span class="caps">HAND</span>.</p>Moving conffiles between packages, redux2006-12-23T23:37:08+00:002006-12-23T23:37:08+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-12-23:/~cjwatson/blog/moving-conffiles.html<p>I spent far too much of today cleaning up an upgrade bug to do with
conffiles, which I suspect also affects other packages that have attempted
to work around dpkg conffile prompts when moving conffiles between packages.
If you maintain such a package, please review your code to make sure …</p><p>I spent far too much of today cleaning up an upgrade bug to do with
conffiles, which I suspect also affects other packages that have attempted
to work around dpkg conffile prompts when moving conffiles between packages.
If you maintain such a package, please review your code to make sure that it
works properly when upgrading both with sarge’s dpkg and with etch’s dpkg.
See <a href="http://lists.debian.org/debian-devel/2006/12/msg00647.html">my debian-devel post</a></p>
<blockquote>
<p>for a full description.</p>
</blockquote>Google Summer of Code project started (Debian)2006-05-26T17:23:00+00:002006-05-26T20:11:52+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-05-26:/~cjwatson/blog/gsoc-d-i-hurd-started.html<p>I’m mentoring <a href="http://xsunblog.blogspot.com/">Matheus Morais</a> in the <a href="http://code.google.com/soc/">Google
Summer of Code</a>, porting d-i to the Hurd. We’ve
exchanged a few mails and he has in hand all the preliminary (but not yet
functional; wouldn’t want to make it too easy :-)) patches I’ve put together
in the past …</p><p>I’m mentoring <a href="http://xsunblog.blogspot.com/">Matheus Morais</a> in the <a href="http://code.google.com/soc/">Google
Summer of Code</a>, porting d-i to the Hurd. We’ve
exchanged a few mails and he has in hand all the preliminary (but not yet
functional; wouldn’t want to make it too easy :-)) patches I’ve put together
in the past. I think I should be reasonably well-placed to judge his progress.</p>
<p>Best of luck, Matheus!</p>Unix tools: sponge2006-02-06T20:45:38+00:002006-02-06T20:45:38+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-02-06:/~cjwatson/blog/sponge.html<p>Joey
<a href="http://kitenet.net/~joey/blog/entry/unix_tools_vidir-2006-02-05-21-33.html">writes</a>
about the lack of new tools that fit into the Unix philosophy. My favourite
of such things I’ve written is
<a href="http://riva.ucam.org/svn/cjwatson/bin/sponge">sponge</a>. It addresses the
problem of editing files in-place with Unix tools, namely that if you just
redirect output to the file you’re trying to edit …</p><p>Joey
<a href="http://kitenet.net/~joey/blog/entry/unix_tools_vidir-2006-02-05-21-33.html">writes</a>
about the lack of new tools that fit into the Unix philosophy. My favourite
of such things I’ve written is
<a href="http://riva.ucam.org/svn/cjwatson/bin/sponge">sponge</a>. It addresses the
problem of editing files in-place with Unix tools, namely that if you just
redirect output to the file you’re trying to edit then the redirection takes
effect (clobbering the contents of the file) before the first command in the
pipeline gets round to reading from the file. Switches like <code>sed -i</code> and
<code>perl -i</code> work around this, but not every command you might want to use in a
pipeline has such an option, and you can’t use that approach with
multiple-command pipelines anyway.</p>
<p>I normally use sponge a bit like this:</p>
<div class="highlight"><pre><span></span><code>sed '...' file | grep '...' | sponge file
</code></pre></div>
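<p>For illustration, a minimal sponge along these lines would do the job (just
a sketch; a proper version should be more careful about errors and about where
the temporary file lives):</p>
<div class="highlight"><pre><span></span><code>#!/bin/sh
# sponge: soak up all of standard input, then write it to the named file.
# Buffering everything first is what makes it safe to use the same file as
# both input and output in a pipeline.
tmp="$(mktemp)" || exit 1
trap 'rm -f "$tmp"' EXIT
cat >"$tmp" || exit 1
cat "$tmp" >"$1"
</code></pre></div>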
<p>Since it’s so trivial I imagine lots of other people have written something
similar (another common name for it seems to be inplace; my name indicates
soaking up all the input and then squeezing it all out again); but I do keep
meaning to try to get a rewritten version into coreutils at some point.</p>debconf/cdebconf coinstallability2006-01-27T02:55:06+00:002006-01-27T02:55:06+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-01-27:/~cjwatson/blog/debconf-cdebconf-coinstallable.html<p><a href="http://kitenet.net/~joey/blog/">Joey</a> has been
<a href="http://lists.debian.org/debian-devel/2005/08/msg00136.html">campaigning</a>
for a while to get everything in the archive changed to depend on <code>debconf |
debconf-2.0</code> or similar rather than just <code>debconf</code>, in order that we can
start rolling out <code>cdebconf</code> as its replacement. Like most jobs that
involve touching the bulk of the archive, this …</p><p><a href="http://kitenet.net/~joey/blog/">Joey</a> has been
<a href="http://lists.debian.org/debian-devel/2005/08/msg00136.html">campaigning</a>
for a while to get everything in the archive changed to depend on <code>debconf |
debconf-2.0</code> or similar rather than just <code>debconf</code>, in order that we can
start rolling out <code>cdebconf</code> as its replacement. Like most jobs that
involve touching the bulk of the archive, this looks set to take quite a
while, as the <a href="http://bugs.debian.org/328498">list of bugs</a> should indicate.</p></p>
<p>Recently it occurred to me that we didn’t necessarily have to do it that way
round. In a bout of late-night hacking while staying awake to look after a
sick child (he seems mostly <span class="caps">OK</span> now, although the rushed trip to the hospital
earlier was a bit on the nerve-wracking side), I’ve shuffled things around
in the cdebconf package so that it no longer has any file conflicts with
debconf or debconf-doc, and changed the debconf confmodule to fire up the
cdebconf frontend rather than its own if the <code>DEBCONF_USE_CDEBCONF</code>
environment variable is non-empty. (The details of this may change before
it actually gets uploaded, as I’d like to get Joey to look it over and
approve it first.) This allows you to install cdebconf, set that
environment variable, and play around with cdebconf with relative ease; when
we come to switch to cdebconf for real, instead of a huge conflicting mess
that apt will probably have trouble resolving, it’ll just be a matter of
changing a couple of lines in <code>/usr/share/debconf/confmodule</code>.</p>
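<p>Trying it out should then be as simple as something like the following, run
as root (the package name is arbitrary, and as I said the details may still
change before upload):</p>
<div class="highlight"><pre><span></span><code>apt-get install cdebconf
DEBCONF_USE_CDEBCONF=1 dpkg-reconfigure tzdata
</code></pre></div>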
<p>Of course, don’t expect cdebconf to be a complete working replacement for
debconf just yet; if you try using it for a dist-upgrade run it’ll fall
over. Due to its d-i heritage, it doesn’t yet load templates automatically;
that has to be done by hand. Frontend names differ from debconf’s, which
will need some migration code. At the moment it can only handle <span class="caps">UTF</span>-8
templates, which are mandated in the installer but only optional in the rest
of the system. It doesn’t have all of debconf’s rich array of database
modules. I haven’t adapted the Perl or Python confmodules yet. The list
goes on. However, I think we at least stand a chance of getting a handle on
the problem now.</p>
<p>(I’ll post this article to debian-devel once the changes have been reviewed
and uploaded.)</p>Killer apps: bzr shelve2006-01-09T16:47:43+00:002006-01-09T16:47:43+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-01-09:/~cjwatson/blog/bzr-shelve.html<p>Working on free software has made me fairly revision control
system-agnostic; I can’t afford to get too wedded to any one system because
as soon as I do somebody will invent something new and I’ll have to convert
again, so I just work with whatever other people on …</p><p>Working on free software has made me fairly revision control
system-agnostic; I can’t afford to get too wedded to any one system because
as soon as I do somebody will invent something new and I’ll have to convert
again, so I just work with whatever other people on the same project are
using. Even <span class="caps">CVS</span> doesn’t make a lot of difference to the way I work as long
as I’m working online and have cvsps handy. And of course I usually don’t
bother with revision control if I’m just tweaking somebody else’s Debian
source package a bit (in which case I just use debdiff for paranoia).</p>
<p>Using bzr at work, though, I think I just found my killer app in Michael
Ellerman’s <a href="http://wiki.bazaar.canonical.com/BzrShelveExample">shelve</a>
plugin. My working style generally involves alternating between doing lots
and lots of stuff in the one working copy and (after testing) going through
and committing it in logical chunks. This is fine if everything’s in
separate files (most revision control systems let you commit just some
files), but if several of the chunks are in the one file then I’m reduced to
saving diffs and manually editing out the bits I don’t want to commit yet,
which is obviously pretty tedious and error-prone.</p>
<p><code>bzr shelve</code> presents each diff hunk in your working copy to you in turn and
asks you whether you want to keep it. If you say no, that hunk gets
unapplied and goes into a “shelf”, where <code>bzr unshelve</code> can later reapply
it. In the meantime commits act as though the shelved hunks didn’t exist.
This doesn’t help if you want to defer only one of two immediately adjacent
changes that end up in the same hunk, of course, but it vastly reduces the
scale of the problem.</p>
<p>I suppose it would be easy enough to write a shelve-a-like for any other
system; it’s just that I haven’t seen it for any other system yet. If
working with systems that lack it really starts to annoy me, I may have to
rip out the guts of shelve and figure out how to make it generic.</p>Single-stage installer2006-01-03T15:32:27+00:002006-01-03T15:32:27+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-01-03:/~cjwatson/blog/single-stage-installer.html<p>Hot on the heels of <a href="http://kitenet.net/~joey/blog/entry/all_this_for_a_progress_bar-2005-12-27-20-32.html">Joey’s tale of getting rid of
base-config</a>
(the second stage of the installer) in Debian, we’ve now pretty much got rid
of it in Ubuntu Dapper too. The upshot of this is that rather than asking a
bunch of questions, installing the base …</p><p>Hot on the heels of <a href="http://kitenet.net/~joey/blog/entry/all_this_for_a_progress_bar-2005-12-27-20-32.html">Joey’s tale of getting rid of
base-config</a>
(the second stage of the installer) in Debian, we’ve now pretty much got rid
of it in Ubuntu Dapper too. The upshot of this is that rather than asking a
bunch of questions, installing the base system, and rebooting to install
everything else, we now just install everything in one go and reboot into a
completed system.</p>
<p>This does mean that, if your system doesn’t boot, you don’t get to find out
about it for a bit longer. However, it has lots of advantages in terms of
speed (the much-maligned archive-copier mostly goes away), reducing code
duplication (base-config had a bunch of infrastructure of its own which was
done better in the core installer anyway), comprehensibility, and killing
off some annoying bugs like <a href="https://bugzilla.ubuntu.com/show_bug.cgi?id=13561">#13561 (duplicate mirror questions in netboot
installs)</a>, <a href="https://bugzilla.ubuntu.com/show_bug.cgi?id=15213">#15213
(second stage hangs if you skip archive-copier in the first
stage)</a>, and <a href="https://bugzilla.ubuntu.com/show_bug.cgi?id=19571">#19571
(kernel messages scribble over base-config’s
<span class="caps">UI</span>)</a>.</p></p>
<p>To go with Joey’s Debian timeline, the Ubuntu history looks a bit like this:</p>
<ul>
<li>2004 (Jul): First base-config modifications for Ubuntu; we need to
install the default desktop rather than dropping into tasksel.</li>
<li>2004 (Aug): Mark phones me up and asks if I can make the installer not
need the <span class="caps">CD</span> in the second stage by copying all the packages across
beforehand. Although it’s a bit awkward, I can see the <span class="caps">UI</span> advantages in
that, so I write archive-copier at the Canonical conference in Oxford.</li>
<li>2004 (Sep): Mark asks me if we can ask the timezone, user/password, and
apt configuration questions before the first reboot. With less than a
month to go until our first release, I have a
<a href="http://lists.ubuntu.com/archives/ubuntu-devel/2004-September/000103.html">heart-attack</a>
at how much needs to be done, and it eventually gets deferred to
<a href="https://wiki.ubuntu.com/HoaryGoals">Hoary</a>.</li>
<li>2005 (Jan): Matt fixes up debconf’s passthrough frontend for use on the
live <span class="caps">CD</span>, and we realise that this is an obvious way to run bits of
base-config before the first reboot. It’s rather messy and takes until
March or so before it really works right, but we get there in the end.</li>
<li>2005 (Apr): I get “put a progress bar in front of the dpkg output in the
second stage” as a
<a href="https://wiki.ubuntu.com/UbuntuDownUnder/BOFs/InstallerStage2Progress">goal</a>
for Breezy. Naïvely, I think it’s a simple matter of programming, since
I’d already done something similar for debootstrap and base-installer the
previous year.</li>
<li>2005 (May): I hack progress bar support into debconf. Nothing actually
uses it for anything yet, except as a convenient passthrough stub.</li>
<li>2005 (Jul/Aug): I actually try to implement the second-stage progress bar
and realise that it’s about an order of magnitude harder than I thought,
requiring a whole load of extra infrastructure in apt. Fortunately
Michael Vogt saves the day here by writing lots of working code, and the
progress bar works by early August.</li>
<li>2005 (Sep-Dec): Upstream d-i development ramps back up again, with
tzsetup, clock-setup, apt-setup, and user-setup all being cranked out in
short order and the corresponding pieces removed from base-config. I
merge these as they mature, and manage to get
<a href="http://lists.debian.org/debian-boot/2005/10/msg01407.html">agreement</a> on
including the Ubuntu debconf template changes in upstream apt-setup,
which helps the diff size a lot.</li>
<li>2005 (Nov/Dec): Joey and I
<a href="http://lists.debian.org/debian-boot/2005/11/msg01381.html">chat</a> one
evening about the Ubuntu second-stage progress bar work, and we end up
designing and writing debconf-apt-progress based on its ideas, after
which Joey knocks up pkgsel in no time flat.</li>
<li>2006 (Jan): The rest of the pieces land in Ubuntu, and we drop
base-config out of the installer. To my surprise, nearly everything
still just works.</li>
</ul>
<p>Although it caused some friction, I’m glad that we did the first cuts of
many of these things outside Debian and got to try things out before landing
version-2-quality code in Debian. The end result is much nicer than the
intermediate ones ever were.</p>Forwarding bugs to the IETF2006-01-03T13:16:06+00:002006-01-03T13:16:06+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-01-03:/~cjwatson/blog/openssh-iutf8.html<p>Sometimes following up on a bug takes you a lot further than you expected.
<a href="http://bugs.debian.org/337041">Debian bug #337041</a> looked like it was going
to be fairly straightforward once I upgraded coreutils to figure out what
the <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod">new <span class="caps">IUTF8</span> flag</a>
actually did, since the <span class="caps">SSH2</span> protocol already supports transferring termios
flags around …</p><p>Sometimes following up on a bug takes you a lot further than you expected.
<a href="http://bugs.debian.org/337041">Debian bug #337041</a> looked like it was going
to be fairly straightforward once I upgraded coreutils to figure out what
the <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod">new <span class="caps">IUTF8</span> flag</a>
actually did, since the <span class="caps">SSH2</span> protocol already supports transferring termios
flags around.</p>
<p>Unfortunately, since <span class="caps">IUTF8</span> is relatively new, it doesn’t have a number
assigned in the <a href="http://www.ietf.org/internet-drafts/draft-ietf-secsh-connect-25.txt">draft connection
protocol</a>.
Moreover, that Internet-Draft is in the last stages before becoming an <span class="caps">RFC</span>
and can’t be modified any more, and it doesn’t include any facility for
private-use extensions. D’oh. To add further complication, since <span class="caps">IUTF8</span> is
Linux-specific, it’s not hard to imagine that some other <span class="caps">OS</span> might introduce
something with the same name but subtly different semantics, and so the <span class="caps">SSH</span>
protocols can’t just defer to <span class="caps">POSIX</span> for the definition but instead have to
spell out exactly what they mean.</p>
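<p>A rough way to see the effect from a shell (the host name is just a
placeholder, and the exact output depends on your coreutils, kernel, and ssh
versions; whether the flag is set locally also depends on your terminal and
locale):</p>
<pre><code># locally, in a UTF-8 locale, a recent stty will usually report the flag as set:
$ stty -a | tr ' ' '\n' | grep iutf8
iutf8
# over ssh with a forced pty, there is no opcode to carry the flag across,
# so the remote terminal typically comes up with it cleared:
$ ssh -t remotehost stty -a | tr ' ' '\n' | grep iutf8
-iutf8
</code></pre>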
<p>As a result of all of this, it looks like the best way to make progress
might be for me to write an I-D myself that creates a channel extension to
set or clear <span class="caps">IUTF8</span>, and attempt to enlist support from some upstream
implementors. I didn’t expect bug triage to lead me into the Internet
standardisation process quite so quickly!</p>Hello!2006-01-03T13:11:31+00:002006-01-03T13:11:31+00:00Colin Watsontag:www.chiark.greenend.org.uk,2006-01-03:/~cjwatson/blog/hello.html<p>New year, new blog. I’ve had a
<a href="http://www.livejournal.com/users/cjwatson/">LiveJournal</a> for a while, but
don’t write very much in it, and many of its readers wouldn’t be interested
in me talking about Debian and such anyway. I think the best solution is
for me to keep technical posts here …</p><p>New year, new blog. I’ve had a
<a href="http://www.livejournal.com/users/cjwatson/">LiveJournal</a> for a while, but
don’t write very much in it, and many of its readers wouldn’t be interested
in me talking about Debian and such anyway. I think the best solution is
for me to keep technical posts here.</p>