--- /dev/null
+Title: Re-signing PPAs
+Slug: re-signing-ppas
+Date: 2016-03-30 10:20:32 +01:00
+Category: launchpad
+Tags: launchpad, ubuntu, planet-debian, planet-ubuntu
+
+Julian has
+[written](https://juliank.wordpress.com/2016/03/14/dropping-sha-1-support-in-apt/)
+about their efforts to strengthen security in APT, and shortly before that
+[notified](https://bugs.launchpad.net/bugs/1556666) us that Launchpad's
+signatures on <acronym title="Personal Package Archives">PPAs</acronym> use
+weak SHA-1 digests. Unfortunately we hadn't noticed that before; GnuPG's
+defaults tend to result in weak digests unless carefully tweaked, which is a
+shame.
+
+I started on the necessary fixes as soon as we heard of the problem, but
+it's taken a little while to get everything in place, and I thought I'd
+explain why, since some of the problems uncovered are interesting in their
+own right.
+
+Firstly, there was the relatively trivial matter of [using SHA-512 digests
+on new
+signatures](https://code.launchpad.net/~cjwatson/launchpad/digest-algo-sha512/+merge/289052).
+This was mostly a matter of adjusting our configuration, although writing
+the test was a bit tricky since
+[PyGPGME](https://pypi.python.org/pypi/pygpgme) isn't as helpful as it could
+be. (Simpler repository implementations that call `gpg` from the command
+line should probably just add the `--digest-algo SHA512` option instead of
+imitating this.)
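+
+In subprocess terms, that one-option change looks something like this
+minimal sketch (the `release_path` and `fingerprint` parameters are
+hypothetical):
+
+    :::python
+    import subprocess
+
+    def sign_release(release_path, fingerprint):
+        # Produce a detached, ASCII-armoured Release.gpg alongside the
+        # Release file, forcing a SHA-512 digest instead of relying on
+        # GnuPG's defaults.
+        subprocess.check_call([
+            'gpg', '--batch', '--yes',
+            '--local-user', fingerprint,
+            '--digest-algo', 'SHA512',
+            '--armor', '--detach-sign',
+            '--output', release_path + '.gpg',
+            release_path,
+        ])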
+
+After getting that in place, any change to a suite in a PPA will result in
+it being re-signed with SHA-512, which is good as far as it goes, but we
+also want to re-sign PPAs that haven't been modified. Launchpad hosts more
+than 50000 PPAs, though, a significant percentage of which include packages
+for Ubuntu releases recent enough that we'd want to re-sign them for this.
+We can't expect everyone to push new uploads, and we need to run this
+through at least some part of our usual publication machinery rather than
+just writing a hacky shell script to do the job (which would have no idea
+which keys to sign with, to start with); but forcing full reprocessing of
+all those PPAs would take a prohibitively long time, and at the moment we
+need to interrupt normal PPA publication to do this kind of work. I
+therefore had to spend some quality time working out how to make things go
+fast enough.
+
+The first couple of changes
+([1](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-careful-release/+merge/289401),
+[2](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-disable-steps/+merge/289658))
+were to add options to our publisher script to let us run just the one step
+we need in "careful" mode: that is, forcibly re-run the `Release` file
+processing step even if it thinks nothing has changed, and entirely disable
+the other steps such as generating `Packages` and `Sources` files. Then
+last week I finally got around to timing things on one of our staging
+systems so that we could estimate how long a full run would take. It was
+taking a little over two seconds per archive, which meant that if we were to
+re-sign all published PPAs then that would take more than 33 hours!
+Obviously this wasn't viable; even just re-signing xenial would be
+prohibitively slow.
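+
+To give an idea of the shape of those publisher options, here's a rough
+sketch of the step gating they control; the option and method names here
+are illustrative rather than Launchpad's exact ones:
+
+    :::python
+    def isCareful(self, option):
+        # A per-step careful flag, or a global --careful option, forces a
+        # step to re-run even when nothing seems to have changed.
+        return option or self.options.careful
+
+    def runSteps(self):
+        # Each step can be disabled independently, so a run can do nothing
+        # but rewrite and re-sign Release files.
+        if not self.options.disable_publishing:
+            self.publishArchives()
+        if not self.options.disable_apt:
+            self.generateIndexes()  # Packages and Sources files
+        if not self.options.disable_release:
+            self.writeReleaseFiles(
+                careful=self.isCareful(self.options.careful_release))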
+
+The next question was where all that time was going. I thought perhaps that
+the actual signing might be slow for some reason, but it was taking about
+half a second per archive: not great, but not enough to account for most of
+the slowness. The main part of the delay was in fact in committing the
+database transaction after processing each archive: not in the actual
+PostgreSQL commit itself, but in the <acronym title="object-relational
+mapper">ORM</acronym> `invalidate` method called to prepare for a commit.
+
+Launchpad uses the excellent [Storm](https://storm.canonical.com/) for all
+of its database interactions. One property of this ORM (and possibly of
+others; I'll cheerfully admit to not having spent much time with other ORMs)
+is that it uses a
+[WeakValueDictionary](https://docs.python.org/2/library/weakref.html#weakref.WeakValueDictionary)
+to keep track of the objects it's populated with database results. Before
+it commits a transaction, it iterates over all those "alive" objects to note
+that if they're used in future then information needs to be reloaded from
+the database first. Usually this is a very good thing: it saves us from
+having to think too hard about data consistency at the application layer.
+But in this case, one of the things we did at the start of the publisher
+script was:
+
+ :::python
+ def getPPAs(self, distribution):
+ """Find private package archives for the selected distribution."""
+ if (self.isCareful(self.options.careful_publishing) or
+                self.options.include_non_pending):
+ return distribution.getAllPPAs()
+ else:
+ return distribution.getPendingPublicationPPAs()
+
+ def getTargetArchives(self, distribution):
+ """Find the archive(s) selected by the script's options."""
+ if self.options.partner:
+ return [distribution.getArchiveByComponent('partner')]
+ elif self.options.ppa:
+ return filter(is_ppa_public, self.getPPAs(distribution))
+ elif self.options.private_ppa:
+ return filter(is_ppa_private, self.getPPAs(distribution))
+ elif self.options.copy_archive:
+ return self.getCopyArchives(distribution)
+ else:
+ return [distribution.main_archive]
+
+That innocuous-looking `filter` means that we do all the public/private
+filtering of PPAs up-front and return a list of all the PPAs we intend to
+operate on. This means that all those objects are alive as far as Storm is
+concerned and need to be considered for invalidation on every commit, and
+the time required for that stacks up when many thousands of objects are
+involved: this is essentially [accidentally
+quadratic](http://accidentallyquadratic.tumblr.com/) behaviour, because all
+archives are considered when committing changes to each archive in turn.
+Normally this isn't too bad because only a few hundred PPAs need to be
+processed in any given run; but if we're running in a mode where we're
+processing all PPAs rather than just ones that are pending publication, then
+suddenly this balloons to the point where it takes a couple of seconds. The
+[fix](https://code.launchpad.net/~cjwatson/launchpad/publish-distro-many-ppas/+merge/289925)
+is very simple, using an
+[iterator](https://docs.python.org/2/library/stdtypes.html#typeiter) instead
+so that we don't need to keep all the objects alive:
+
+ :::python
+ from itertools import ifilter
+
+ def getTargetArchives(self, distribution):
+ """Find the archive(s) selected by the script's options."""
+ if self.options.partner:
+ return [distribution.getArchiveByComponent('partner')]
+ elif self.options.ppa:
+ return ifilter(is_ppa_public, self.getPPAs(distribution))
+ elif self.options.private_ppa:
+ return ifilter(is_ppa_private, self.getPPAs(distribution))
+ elif self.options.copy_archive:
+ return self.getCopyArchives(distribution)
+ else:
+ return [distribution.main_archive]
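+
+The effect is easy to demonstrate outside Storm with a toy version of the
+same pattern: holding the results in a list keeps every object alive in a
+`WeakValueDictionary`, while consuming them lazily does not. A minimal
+sketch:
+
+    :::python
+    from weakref import WeakValueDictionary
+
+    class Obj(object):
+        pass
+
+    tracked = WeakValueDictionary()
+
+    def load(n):
+        # Stand-in for the ORM materialising rows as tracked objects.
+        for i in range(n):
+            obj = Obj()
+            tracked[i] = obj
+            yield obj
+
+    eager = list(load(10000))
+    print(len(tracked))  # 10000: all alive, all scanned at commit time
+
+    tracked.clear()
+    for obj in load(10000):
+        pass  # each object becomes collectable as soon as we move on
+    print(len(tracked))  # at most 1: only the loop's last object survives
+
+(In Python 3 the built-in `filter` is already lazy, so this particular trap
+only bites on Python 2.)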
+
+After that, I turned to that half a second for signing. A good chunk of
+that was accounted for by the `signContent` method taking a fingerprint
+rather than a key, despite the fact that we normally already had the key in
+hand; this meant asking GPGME to reload the key, which requires two
+subprocess calls. Converting this to [take a key rather than a
+fingerprint](https://code.launchpad.net/~cjwatson/launchpad/faster-gpg-operations/+merge/289950)
+gets the per-archive time down to about a quarter of a second on our staging
+system, about eight times faster than where we started.
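+
+The shape of that change, sketched with PyGPGME (an illustration rather
+than Launchpad's actual `signContent`): the caller passes the
+already-loaded key object instead of a fingerprint, so no fresh
+`ctx.get_key()` lookup is needed per signature:
+
+    :::python
+    from StringIO import StringIO
+
+    import gpgme
+
+    def sign_detached(key, content):
+        # "key" is a gpgme key object the caller already holds; loading
+        # one from a fingerprint costs two gpg subprocess calls.
+        ctx = gpgme.Context()
+        ctx.armor = True
+        ctx.signers = [key]
+        signature = StringIO()
+        ctx.sign(StringIO(content), signature, gpgme.SIG_MODE_DETACH)
+        return signature.getvalue()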
+
+With all this in place, we've now re-signed all xenial `Release` files in
+PPAs using SHA-512 digests. On production, this took about 80 minutes for
+1761 affected archives. That's over two seconds per modified archive, but in
+practice most of the time appears to have been spent skipping over
+unmodified archives; even a few hundredths of a second per archive adds up
+quickly there. There's certainly still room for speeding this up a bit.
+
+We wouldn't want to run this procedure every day, but it's acceptable for
+occasional tasks like this. I expect that we'll re-sign wily, vivid, and
+trusty `Release` files in the same way soon.