Re-signing PPAs

Julian has written about their efforts to strengthen security in APT, and shortly before that notified us that Launchpad’s signatures on PPAs use weak SHA-1 digests. Unfortunately we hadn’t noticed that before; it’s a shame that GnuPG’s defaults tend to result in weak digests unless carefully tweaked.

I started on the necessary fixes for this immediately after we heard of the problem, but it’s taken a little while to get everything in place, and I thought I’d explain why, since some of the problems uncovered are interesting in their own right.

Firstly, there was the relatively trivial matter of using SHA-512 digests on new signatures. This was mostly a matter of adjusting our configuration, although writing the test was a bit tricky since PyGPGME isn’t as helpful as it could be. (Simpler repository implementations that call gpg from the command line should probably just add the --digest-algo SHA512 option instead of imitating this.)
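
For repositories signed by shelling out to gpg, the equivalent change really is just that extra option. As a rough sketch (this is not Launchpad’s code; the key ID and paths are placeholders), signing a suite’s Release file with SHA-512 digests looks something like this:

import os
import subprocess

def sign_release(suite_dir, key_id):
    release = os.path.join(suite_dir, 'Release')
    # Detached, ASCII-armoured signature (Release.gpg), forcing SHA-512.
    subprocess.check_call([
        'gpg', '--batch', '--yes', '--local-user', key_id,
        '--digest-algo', 'SHA512', '--armor', '--detach-sign',
        '--output', os.path.join(suite_dir, 'Release.gpg'), release])
    # Clearsigned variant (InRelease), using the same digest algorithm.
    subprocess.check_call([
        'gpg', '--batch', '--yes', '--local-user', key_id,
        '--digest-algo', 'SHA512', '--clearsign',
        '--output', os.path.join(suite_dir, 'InRelease'), release])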

After getting that in place, any change to a suite in a PPA will result in it being re-signed with SHA-512, which is good as far as it goes, but we also want to re-sign PPAs that haven’t been modified. Launchpad hosts more than 50000 active PPAs, though, a significant percentage of which include packages for sufficiently recent Ubuntu releases that we’d want to re-sign them for this. We can’t expect everyone to push new uploads, and we need to run this through at least some part of our usual publication machinery rather than just writing a hacky shell script to do the job (which would have no idea which keys to sign with, to start with); but forcing full reprocessing of all those PPAs would take a prohibitively long time, and at the moment we need to interrupt normal PPA publication to do this kind of work. I therefore had to spend some quality time working out how to make things go fast enough.

The first couple of changes (1, 2) were to add options to our publisher script to let us run just the one step we need in “careful” mode: that is, to forcibly re-run the Release file processing step even if the publisher thinks nothing has changed, and to entirely disable the other steps such as generating Packages and Sources files (the sketch just after this paragraph shows roughly the shape these options take). Then last week I finally got around to timing things on one of our staging systems so that we could estimate how long a full run would take. It was taking a little over two seconds per archive, which meant that re-signing all published PPAs would take more than 33 hours! Obviously this wasn’t viable; even just re-signing xenial would be prohibitively slow.
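
As mentioned above, the new options look roughly like the following; the flag names here are illustrative placeholders rather than necessarily the exact ones that landed:

from optparse import OptionParser

def add_publisher_options(parser):
    # Hypothetical flag names, for illustration only.
    parser.add_option(
        '--careful-release', action='store_true', default=False,
        help='Re-process Release files even if nothing seems to have '
             'changed.')
    parser.add_option(
        '--disable-publishing', action='store_true', default=False,
        help='Skip the other publication steps, such as generating '
             'Packages and Sources files.')
    return parser

parser = add_publisher_options(OptionParser())
options, args = parser.parse_args()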

The next question was where all that time was going. I thought perhaps that the actual signing might be slow for some reason, but it was taking about half a second per archive: not great, but not enough to account for most of the slowness. The main part of the delay was in fact in committing the database transaction after processing each archive; not in the actual PostgreSQL commit, though, but in the ORM invalidate method called to prepare for a commit.
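
Telling the commit itself apart from the bookkeeping around it is mostly a matter of profiling a single per-archive commit rather than the whole run. A minimal sketch of the kind of measurement involved (txn here stands in for whatever transaction manager the script actually uses):

import cProfile
import pstats

def profile_commit(txn):
    # Profile one commit so that ORM bookkeeping (such as invalidating
    # cached objects) shows up separately from the PostgreSQL round-trip.
    profiler = cProfile.Profile()
    profiler.enable()
    txn.commit()
    profiler.disable()
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)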

Launchpad uses the excellent Storm for all of its database interactions. One property of this ORM (and possibly of others; I’ll cheerfully admit to not having spent much time with other ORMs) is that it uses a WeakValueDictionary to keep track of the objects it has populated with database results. Before it commits a transaction, it iterates over all those “alive” objects to mark that, if they’re used in future, their information needs to be reloaded from the database first. Usually this is a very good thing: it saves us from having to think too hard about data consistency at the application layer. But in this case, one of the things we did at the start of the publisher script was:

def getPPAs(self, distribution):
    """Find private package archives for the selected distribution."""
    if (self.isCareful(self.options.careful_publishing) or
            self.options.include_non_pending):
        return distribution.getAllPPAs()
    else:
        return distribution.getPendingPublicationPPAs()

def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return filter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return filter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]

That innocuous-looking filter means that we do all the public/private filtering of PPAs up-front and return a list of all the PPAs we intend to operate on. This means that all those objects are alive as far as Storm is concerned and need to be considered for invalidation on every commit, and the time required for that stacks up when many thousands of objects are involved: this is essentially accidentally quadratic behaviour, because all archives are considered when committing changes to each archive in turn. Normally this isn’t too bad because only a few hundred PPAs need to be processed in any given run; but if we’re running in a mode where we’re processing all PPAs rather than just ones that are pending publication, then suddenly this balloons to the point where it takes a couple of seconds per commit. The fix is very simple, using an iterator instead so that we don’t need to keep all the objects alive:

from itertools import ifilter

def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return ifilter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return ifilter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]
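
The underlying effect is easy to reproduce without Storm at all: anything that holds strong references to the whole result set keeps every object “alive” in the tracking dictionary, so the pre-commit sweep scales with the total number of objects rather than with the handful actually being worked on. A standard-library illustration (this isn’t Storm’s real bookkeeping, just the same mechanism in miniature):

import weakref

class Archive(object):
    pass

alive = weakref.WeakValueDictionary()

def load(n):
    # Simulates the ORM populating objects from query results and
    # remembering them for later invalidation.
    for i in range(n):
        obj = Archive()
        alive[i] = obj
        yield obj

# Keeping a list pins every object, so the sweep before each commit has
# to visit all of them, once per archive processed.
archives = list(load(50000))
print(len(alive))   # 50000 objects to sweep on every commit

# With a plain iterator, objects we've finished with can be collected,
# so the sweep only ever sees the few still in use.
alive.clear()
for archive in load(50000):
    pass            # process and drop each archive in turn
print(len(alive))   # only the most recent object is still alive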

After that, I turned to that half a second for signing. A good chunk of that was accounted for by the signContent method taking a fingerprint rather than a key, despite the fact that we normally already had the key in hand; this caused us to have to ask GPGME to reload the key, which requires two subprocess calls. Converting this to take a key rather than a fingerprint gets the per-archive time down to about a quarter of a second on our staging system, about eight times faster than where we started.
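
The shape of that change is roughly as follows. This is only a sketch rather than Launchpad’s actual signContent, whose surrounding plumbing is more involved; the point is that the caller passes in the gpgme key object it already holds instead of a fingerprint for GPGME to go and look up all over again:

import gpgme
from io import BytesIO

def sign_content(content, key):
    # Detached-sign content with an already-loaded gpgme key.  Taking the
    # key object directly avoids a round trip through
    # Context.get_key(fingerprint), which spawns gpg again just to reload
    # key material we already have in hand.
    ctx = gpgme.Context()
    ctx.armor = True
    ctx.signers = [key]
    plaintext = BytesIO(content)
    signature = BytesIO()
    ctx.sign(plaintext, signature, gpgme.SIG_MODE_DETACH)
    return signature.getvalue()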

Using this, we’ve now re-signed all xenial Release files in PPAs using SHA-512 digests. On production, this took about 80 minutes to iterate over around 70000 archives, of which 1761 were modified. Most of the time appears to have been spent skipping over unmodified archives; even a few hundredths of a second per archive adds up quickly there. The remaining time comes out to around 0.4 seconds per modified archive. There’s certainly still room for speeding this up a bit.

We wouldn’t want to run this procedure every day, but it’s acceptable for occasional tasks like this. I expect that we’ll re-sign wily, vivid, and trusty Release files in the same way soon.