From: Colin Watson
Date: Fri, 25 Sep 2020 11:02:00 +0000 (+0100)
Subject: Porting Launchpad to Python 3: progress report
X-Git-Url: https://www.chiark.greenend.org.uk/ucgi/~cjwatson/git?a=commitdiff_plain;h=4853a9df2b495d06420aed1d0691b97cd3197311;p=blog.git

Porting Launchpad to Python 3: progress report
---

diff --git a/content/lp-python3-progress.md b/content/lp-python3-progress.md
new file mode 100644
index 00000000..b7541887
--- /dev/null
+++ b/content/lp-python3-progress.md
@@ -0,0 +1,118 @@
+Title: Porting Launchpad to Python 3: progress report
+Slug: lp-python3-progress
+Date: 2020-09-25 12:01:40 +01:00
+Category: launchpad
+Tags: launchpad, planet-debian, planet-ubuntu
+
+[Launchpad](https://launchpad.net/) still requires Python 2, which in 2020
+is [a bit of a problem](https://www.python.org/doc/sunset-python-2/).
+Unlike a lot of the rest of 2020, though, there's good reason to be
+optimistic about progress.
+
+I've been porting Python 2 code to Python 3 on and off for a long time, from
+back when I was on the Ubuntu Foundations team and maintaining things like
+the [Ubiquity installer](https://launchpad.net/ubiquity). When I moved to
+Launchpad in 2015 it was certainly on my mind that this was a large body of
+code still stuck on Python 2. One option would have been to just accept
+that and leave it as it is, maybe doing more backporting work over time as
+support for Python 2 fades away. I've long been of the opinion that this
+would doom Launchpad to being unmaintainable in the long run, and since I
+genuinely love working on Launchpad - I find it an incredibly rewarding
+project - this wasn't something I was willing to accept. We're already
+seeing some of our important dependencies dropping support for Python 2,
+which is perfectly reasonable on their terms but which is starting to become
+a genuine obstacle to delivering important features when we need new
+features from newer versions of those dependencies. It also looks as though
+it may be difficult for us to run on Ubuntu 20.04 LTS (we're currently on
+16.04, with an upgrade to 18.04 in progress) as long as we still require
+Python 2, since we have some system dependencies that 20.04 no longer
+provides. And then there are exciting new features like [type
+hints](https://docs.python.org/3/library/typing.html) and
+[async/await](https://docs.python.org/3/library/asyncio.html) that we'd like
+to be able to use.
+
+However, until last year there were so many blockers that even considering a
+port was barely conceivable. What changed in 2019 was sorting out a
+trifecta of core dependencies. We [ported]({filename}/storm-py3.md) our
+database layer, [Storm](https://storm.canonical.com/). We
+[upgraded](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/376781)
+to modern versions of our [Zope](https://www.zope.org/) Toolkit dependencies
+(after contributing various fixes upstream, including some substantial
+changes to Zope's [test runner](https://pypi.org/project/zope.testrunner/)
+that we'd carried as local patches for some years). And we
+[ported](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/373805)
+our Bazaar code hosting infrastructure to
+[Breezy](https://www.breezy-vcs.org/). With all that in place, a port
+seemed more of a realistic possibility.
+
+Still, even with this, it was never going to be a matter of just following
+some [standard porting advice](http://python3porting.com/) and calling it
+good. Launchpad has almost a million lines of Python code in its [main git
+tree](https://git.launchpad.net/launchpad), and around 250 dependencies of
+which a number are quite Launchpad-specific. In a project that size, not
+only is following standard porting advice an extremely time-consuming task
+in its own right, but just about every strange corner case is going to show
+up somewhere. (Did you know that `StringIO.StringIO(None)` and
+`io.StringIO(None)` do different things even after you account for the
+native string vs. Unicode text difference? How about [the behaviour of
+`.union()` on a subclass of
+`frozenset`](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/385711)?)
+Launchpad's test suite is fortunately extremely thorough, but even just
+starting up the test suite involves importing most of the data model code,
+so before you can start taking advantage of it you have to make a large
+fraction of the codebase be at least syntactically-correct Python 3 code and
+use only modules that exist in Python 3 while still working in Python 2; in
+a project this size that turns out to be a large effort on its own, and can
+be quite
+[risky](https://blog.launchpad.net/general/login-regression-for-users-with-non-ascii-names)
+in places.
+
+Canonical's product engineering teams work on a six-month cycle, but it just
+isn't possible to cram this sort of thing into six months unless you do
+literally nothing else, and "please can we put all feature development on
+hold while we run to stand still" is a pretty tough sell to even the most
+understanding management. Fortunately, we've been able to grow the
+[Launchpad team](https://launchpad.net/~launchpad) in the last year or so,
+and so it's been possible to put "Python 3" on our roadmap in the
+understanding that we aren't going to get all the way there in one cycle,
+while still being able to do other substantial feature development work as
+well.
+
+So, with all that preamble, what have we done this cycle? We've taken a
+two-pronged approach. From one end, we identified 147 classes that needed
+to be ported away from some compatibility code in our database layer that
+was substantially less friendly to Python 3: we've ported 38 of those, so
+there's clearly a fair bit more to do, but we were able to distribute this
+work out among the team quite effectively. From the other end, it was clear
+that it would be very inefficient to do general porting work when any
+attempt to even run the test suite would run straight into the same crashes
+in the same order, so I set myself a target of getting the test suite to
+start up, and started hacking on an enormous git branch that I never
+expected to try to land directly: instead, I felt free to commit just about
+anything that looked reasonable and moved things forward even if it was very
+rough, and every so often went back to tidy things up and cherry-pick
+individual commits into a form that included some kind of explanation and
+passed existing tests so that I could propose them for review.
+
+This strategy has been dramatically more successful than anything I've tried
+before at this scale. So far this cycle, considering only Launchpad's main
+git tree, we've landed 137 Python-3-relevant merge proposals for a total of
+39552 lines of `git diff` output, keeping our existing tests passing along
+the way and deploying incrementally to production. We have about 27000 more
+lines of patch at varying degrees of quality to tidy up and merge. Our main
+development branch is only perhaps 10 or 20 more patches away from the test
+suite being able to start up, at which point we'll be able to get a buildbot
+running so that multiple developers can work on this much more easily and
+see the effect of their work. With the full unlanded patch stack, about 75%
+of the test suite passes on Python 3! This still leaves a long tail of
+several thousand tests to figure out and fix, but it's a much more
+incrementally-tractable kind of problem than where we started.
+
+Finally: the funniest (to me) bug I've encountered in this effort was the
+one I encountered in the test runner and fixed in
+[zopefoundation/zope.testrunner#106](https://github.com/zopefoundation/zope.testrunner/pull/106):
+IDs of failing tests were written to a pipe, so if you have a test suite
+that's large enough and broken enough then eventually that pipe would reach
+its capacity and your test runner would just give up and hang. Pretty
+annoying when it meant an overnight test run didn't give useful results, but
+also eloquent commentary of sorts.
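
Editorial note (not part of the patch above): the pipe hang described in the final paragraph can be illustrated with a small sketch. This is not zope.testrunner's code, and it says nothing about how the linked pull request fixed the problem. The helper name `measure_pipe_capacity` is hypothetical, and the sketch assumes a POSIX system and Python 3.5 or later (for `os.set_blocking`). It fills a pipe that nobody reads from and reports how many bytes were accepted before the kernel refused more. A writer in the default blocking mode would stall at that same point, which is presumably what the test runner ran into.

```python
import os


def measure_pipe_capacity(chunk_size=4096):
    """Return how many bytes an unread pipe accepts before a write would block.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Switch the write end to non-blocking mode so that a full pipe
        # raises BlockingIOError instead of hanging this process.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                # os.write returns the number of bytes actually written,
                # which may be fewer than requested when the pipe is nearly full.
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full and nothing is draining the read end.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    capacity = measure_pipe_capacity()
    print(f"pipe accepted {capacity} bytes before a write would block")
```

The exact figure depends on the operating system and its configuration, so the sketch reports it rather than assuming a value. Any finite capacity gives the same result: once enough failing test IDs have been written without being read, the next blocking write never returns.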