From: Colin Watson
Date: Wed, 30 Mar 2016 09:20:53 +0000 (+0100)
Subject: Re-signing PPAs

Title: Re-signing PPAs
Slug: re-signing-ppas
Date: 2016-03-30 10:20:32 +01:00
Category: launchpad
Tags: launchpad, ubuntu, planet-debian, planet-ubuntu

Julian has
[written](https://juliank.wordpress.com/2016/03/14/dropping-sha-1-support-in-apt/)
about their efforts to strengthen security in APT, and shortly before that
[notified](https://bugs.launchpad.net/bugs/1556666) us that Launchpad's
signatures on PPAs use weak SHA-1 digests. Unfortunately we hadn't noticed
that before; GnuPG's defaults tend to result in weak digests unless
carefully tweaked, which is a shame.

I started on the necessary fixes as soon as we heard of the problem, but
it's taken a little while to get everything in place, and I thought I'd
explain why, since some of the problems uncovered are interesting in their
own right.

Firstly, there was the relatively trivial matter of [using SHA-512 digests
on new
signatures](https://code.launchpad.net/~cjwatson/launchpad/digest-algo-sha512/+merge/289052).
This was mostly a matter of adjusting our configuration, although writing
the test was a bit tricky, since
[PyGPGME](https://pypi.python.org/pypi/pygpgme) isn't as helpful as it
could be. (Simpler repository implementations that call `gpg` from the
command line should probably just add the `--digest-algo SHA512` option
instead of imitating this.)

After getting that in place, any change to a suite in a PPA will result in
it being re-signed with SHA-512, which is good as far as it goes, but we
also want to re-sign PPAs that haven't been modified. Launchpad hosts more
than 50,000 PPAs, though, and a significant percentage of them include
packages for Ubuntu releases recent enough that we'd want to re-sign them
for this. We can't expect everyone to push new uploads, and we need to run
this through at least some part of our usual publication machinery rather
than just writing a hacky shell script to do the job (which would have no
idea which keys to sign with, to start with); but forcing full
reprocessing of all those PPAs would take prohibitively long, and at the
moment we need to interrupt normal PPA publication to do this kind of
work. I therefore had to spend some quality time working out how to make
things go fast enough.

The first couple of changes
([1](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-careful-release/+merge/289401),
[2](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-disable-steps/+merge/289658))
were to add options to our publisher script to let us run just the one
step we need in "careful" mode: that is, forcibly re-run the `Release`
file processing step even if it thinks nothing has changed, and entirely
disable the other steps such as generating `Packages` and `Sources` files.
Then last week I finally got around to timing things on one of our staging
systems so that we could estimate how long a full run would take. It was
taking a little over two seconds per archive, which meant that re-signing
all published PPAs would take more than 33 hours!
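Spelled out (taking "a little over two seconds" as roughly 2.4 seconds,
which is my assumption rather than a measured figure):

    :::python
    # Back-of-the-envelope estimate only; 2.4s stands in for "a little
    # over two seconds" per archive, across all ~50,000 hosted PPAs.
    archives = 50000
    seconds_per_archive = 2.4
    print(archives * seconds_per_archive / 3600.0)  # => 33.3... hours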
Obviously this wasn't viable; even just re-signing xenial would be
prohibitively slow.

The next question was where all that time was going. I thought perhaps
that the actual signing might be slow for some reason, but it was taking
about half a second per archive: not great, but not enough to account for
most of the slowness. The main part of the delay was in fact in committing
the database transaction after processing each archive; not in the actual
PostgreSQL commit, but in the ORM's `invalidate` method, called to prepare
for a commit.

Launchpad uses the excellent [Storm](https://storm.canonical.com/) for all
of its database interactions. One property of this ORM (and possibly of
others; I'll cheerfully admit to not having spent much time with other
ORMs) is that it uses a
[WeakValueDictionary](https://docs.python.org/2/library/weakref.html#weakref.WeakValueDictionary)
to keep track of the objects it has populated with database results.
Before it commits a transaction, it iterates over all those "alive"
objects to note that, if they're used in future, their information needs
to be reloaded from the database first. Usually this is a very good thing:
it saves us from having to think too hard about data consistency at the
application layer. But in this case, one of the things we did at the start
of the publisher script was:

    :::python
    def getPPAs(self, distribution):
        """Find personal package archives for the selected distribution."""
        if (self.isCareful(self.options.careful_publishing) or
                self.options.include_non_pending):
            return distribution.getAllPPAs()
        else:
            return distribution.getPendingPublicationPPAs()

    def getTargetArchives(self, distribution):
        """Find the archive(s) selected by the script's options."""
        if self.options.partner:
            return [distribution.getArchiveByComponent('partner')]
        elif self.options.ppa:
            return filter(is_ppa_public, self.getPPAs(distribution))
        elif self.options.private_ppa:
            return filter(is_ppa_private, self.getPPAs(distribution))
        elif self.options.copy_archive:
            return self.getCopyArchives(distribution)
        else:
            return [distribution.main_archive]

That innocuous-looking `filter` means that we do all the public/private
filtering of PPAs up-front and return a list of all the PPAs we intend to
operate on (in Python 2, `filter` returns a list, not an iterator). This
means that all those objects are alive as far as Storm is concerned and
need to be considered for invalidation on every commit, and the time
required for that stacks up when many thousands of objects are involved:
this is essentially [accidentally
quadratic](http://accidentallyquadratic.tumblr.com/) behaviour, because
all archives are considered when committing changes to each archive in
turn. Normally this isn't too bad, because only a few hundred PPAs need to
be processed in any given run; but if we're running in a mode where we're
processing all PPAs rather than just those pending publication, then
suddenly this balloons to the point where each commit takes a couple of
seconds.
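To make the shape of that concrete, here's a toy model of the effect. This
is not Storm's real API, just a minimal sketch of a cache that marks every
alive object for reload on each commit; the second loop shows the shape of
the fix that follows:

    :::python
    import weakref

    class Row(object):
        """Stand-in for an ORM-managed object."""
        def __init__(self, ident):
            self.ident = ident
            self.needs_reload = False

    class ToyStore(object):
        """Sketch of Storm-style invalidation; not the real Storm API."""
        def __init__(self):
            self._alive = weakref.WeakValueDictionary()

        def get(self, ident):
            obj = Row(ident)
            self._alive[ident] = obj
            return obj

        def commit(self):
            # Every object still strongly referenced anywhere must be
            # marked for reloading from the database on next use.
            for obj in self._alive.values():
                obj.needs_reload = True

    store = ToyStore()

    # Up-front list: all N objects stay alive, so every commit walks all
    # of them -- O(N) per commit, O(N^2) over the whole run.
    archives = [store.get(i) for i in range(2000)]
    for archive in archives:
        store.commit()

    # Iterator: only the current object is strongly referenced, so each
    # commit walks O(1) objects.
    for archive in (store.get(i) for i in range(2000)):
        store.commit()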
The
[fix](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-many-ppas/+merge/289925)
is very simple, using an
[iterator](https://docs.python.org/2/library/stdtypes.html#typeiter)
instead so that we don't need to keep all the objects alive:

    :::python
    from itertools import ifilter

    def getTargetArchives(self, distribution):
        """Find the archive(s) selected by the script's options."""
        if self.options.partner:
            return [distribution.getArchiveByComponent('partner')]
        elif self.options.ppa:
            return ifilter(is_ppa_public, self.getPPAs(distribution))
        elif self.options.private_ppa:
            return ifilter(is_ppa_private, self.getPPAs(distribution))
        elif self.options.copy_archive:
            return self.getCopyArchives(distribution)
        else:
            return [distribution.main_archive]

After that, I turned to that half a second for signing. A good chunk of it
was accounted for by the `signContent` method taking a fingerprint rather
than a key, despite the fact that we normally already had the key in hand;
this forced us to ask GPGME to reload the key, which requires two
subprocess calls. Converting this to [take a key rather than a
fingerprint](https://code.launchpad.net/~cjwatson/launchpad/faster-gpg-operations/+merge/289950)
gets the per-archive time down to about a quarter of a second on our
staging system, about eight times faster than where we started (a rough
sketch of the pattern appears at the end of this post).

With this in place, we've now re-signed all xenial `Release` files in PPAs
with SHA-512 digests. On production, this took about 80 minutes for 1761
affected archives. That works out at over two seconds per modified
archive, but in practice most of the time appears to have been spent
skipping over unmodified archives; even a few hundredths of a second per
archive adds up quickly there. There's certainly still room for speeding
this up a bit.

We wouldn't want to run this procedure every day, but it's acceptable for
occasional tasks like this. I expect that we'll re-sign wily, vivid, and
trusty `Release` files in the same way soon.
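As promised above, here's roughly what the cheaper signing pattern looks
like with PyGPGME. This is a sketch rather than Launchpad's actual
`signContent` code: the fingerprint, the on-disk layout, and the
`sign_release` helper are all invented for illustration. The point is only
that the key lookup, which is what costs the subprocess round trips,
happens once rather than once per archive:

    :::python
    import gpgme
    from io import BytesIO

    ctx = gpgme.Context()
    ctx.armor = True

    # Look the key up once, outside the per-archive loop; repeating
    # get_key() for every archive is the expensive part we want to avoid.
    key = ctx.get_key('0000000000000000000000000000000000000000')  # hypothetical
    ctx.signers = [key]

    def sign_release(path):
        """Write a detached armored signature next to a Release file."""
        with open(path, 'rb') as release:
            plaintext = BytesIO(release.read())
        signature = BytesIO()
        ctx.sign(plaintext, signature, gpgme.SIG_MODE_DETACH)
        with open(path + '.gpg', 'wb') as out:
            out.write(signature.getvalue())

    # Example use, assuming a hypothetical on-disk layout:
    sign_release('dists/xenial/Release')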