From f0672c283133d688dbbcceb1de9cbc3a73d51ce5 Mon Sep 17 00:00:00 2001 From: Colin Watson Date: Mon, 2 Aug 2021 11:36:08 +0100 Subject: [PATCH] Launchpad now runs on Python 3! --- content/lp-python3.md | 525 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 525 insertions(+) create mode 100644 content/lp-python3.md diff --git a/content/lp-python3.md b/content/lp-python3.md new file mode 100644 index 00000000..7512d333 --- /dev/null +++ b/content/lp-python3.md @@ -0,0 +1,525 @@ +Title: Launchpad now runs on Python 3! +Slug: lp-python3 +Date: 2021-08-02 11:34:29 +01:00 +Category: launchpad +Tags: launchpad, planet-debian, planet-ubuntu + +After a [very long porting journey]({filename}/lp-python3-progress.md), +[Launchpad](https://launchpad.net/) is finally running on Python 3 across +all of our systems. + +I wanted to take a bit of time to reflect on why my emotional responses to +this port differ so much from those of some others who've done large ports, +such as the [Mercurial +maintainers](https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/). +It's hard to deny that we've had to burn a lot of time on this, which I'm +sure has had an opportunity cost, and from one point of view it's +essentially running to stand still: there is no single compelling feature +that we get solely by porting to Python 3, although it's clearly a +prerequisite for tidying up old compatibility code and being able to use +modern language facilities in the future. And yet, on the whole, I found +this a rewarding project and enjoyed doing it. + +Some of this may be because by inclination I'm a maintenance programmer and +actually enjoy this sort of thing. 
My default view tends to be that +software version upgrades may be a pain but it's much better to get that +pain over with as soon as you can rather than trying to hold back the tide; +you can certainly get involved and try to shape where things end up, but +rightly or wrongly I can't think of many cases when a righteously indignant +user base managed to arrange for the old version to be maintained in +perpetuity so that they never had to deal with the new thing (OK, maybe Perl +5 counts here). + +I think a more compelling difference between Launchpad and Mercurial, +though, may be that very few other people really had a vested interest in +what Python version Launchpad happened to be running, because it's all +server-side code (aside from some client libraries such as +[`launchpadlib`](https://pypi.org/project/launchpadlib), which were ported +years ago). As such, we weren't trying to do this with the internet having +Strong Opinions at us. We were doing this because it was obviously the only +long-term-maintainable path forward, and in more recent times because some +of our library dependencies were starting to drop support for Python 2 and +so it was obviously going to become a practical problem for us sooner or +later; but if we'd just stayed on Python 2 forever then fundamentally hardly +anyone else would really have cared directly, only maybe about some indirect +consequences of that. I don't follow Mercurial development so I may be +entirely off-base, but if other people were yelling at me about how late my +project was to finish its port, that *in itself* would make me feel more +negatively about the project even if I thought it was a good idea. Having +most of the pressure come from ourselves rather than from outside meant that +wasn't an issue for us. + +I'm somewhat inclined to think of the process as an extreme version of +paying down technical debt. 
Moving from Python 2.7 to 3.5, as we just did, +means skipping over multiple language versions in one go, and if similar +changes had been made more gradually it would probably have felt a lot more +like the typical dependency update treadmill. I appreciate why not everyone +might want to think of it this way: maybe this is just my own +rationalization. + +## Reflections on porting to Python 3 + +I'm not going to defend the Python 3 migration process; it was pretty rough +in a lot of ways. Nor am I going to spend much effort relitigating it here, +as it's already been done to death elsewhere, and as I understand it the +core Python developers have got the message loud and clear by now. At a +bare minimum, a lot of valuable time was lost early in Python 3's lifetime +hanging on to flag-day-type porting strategies that were impractical for +large projects, when it should have been providing for "bilingual" +strategies (code that runs in both Python 2 and 3 for a transitional period) +which is where most libraries and most large migrations ended up in +practice. For instance, the early advice to library maintainers to maintain +two parallel versions or perhaps translate dynamically with `2to3` was +entirely impractical in most non-trivial cases and wasn't what most people +ended up doing, and yet the idea that `2to3` is all you need still floats +around Stack Overflow and the like as a result. (These days, I would +probably point people towards something more like [Eevee's porting +FAQ](https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/) +as somewhere to start.) + +There are various fairly straightforward things that people often suggest +could have been done to smooth the path, and I largely agree: not removing +the `u''` string prefix only to put it back in 3.3, fewer gratuitous +compatibility breaks in the name of tidiness, and so on. 
But if I had a +time machine, the number one thing I would ask to have been done differently +would be introducing type annotations in Python 2 before Python 3 branched +off. It's true that it's [technically +possible](https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code) +to do type annotations in Python 2, but the fact that it's a different +syntax that would have to be fixed later is offputting, and in practice it +wasn't widely used in Python 2 code. To make a significant difference to +the ease of porting, annotations would need to have been introduced early +enough that lots of Python 2 library code used them so that porting code +didn't have to be quite so much of an exercise of manually figuring out the +exact nature of string types from context. + +Launchpad is a complex piece of software that interacts with multiple +domains: for example, it deals with a database, HTTP, web page rendering, +Debian-format archive publishing, and multiple revision control systems, and +there's often overlap between domains. Each of these tends to imply +different kinds of string handling. Web page rendering is normally done +mainly in Unicode, converting to bytes as late as possible; revision control +systems normally want to spend most of their time working with bytes, +although the exact details vary; HTTP is of course bytes on the wire, but +Python's WSGI interface has some [string type +subtleties](https://www.python.org/dev/peps/pep-3333/#a-note-on-string-types). +In practice I found myself thinking about at least four string-like "types" +(that is, things that in a language with a stricter type system I might well +want to define as distinct types and restrict conversion between them): +bytes, text, "ordinary" native strings (`str` in either language, encoded to +UTF-8 in Python 2), and native strings with WSGI's encoding rules. 
Some of +these are emergent properties of writing in the intersection of Python 2 and +3, which is effectively a specialized language of its own without coherent +official documentation whose users must intuit its behaviour by comparing +multiple sources of information, or by referring to unofficial porting +guides: not a very satisfactory situation. Fortunately much of the +complexity collapses once it becomes possible to write solely in Python 3. + +Some of the difficulties we ran into are not ones that are typically thought +of as Python 2-to-3 porting issues, because they were changed later in +Python 3's development process. For instance, the `email` module was +substantially improved in around the 3.2/3.3 timeframe to handle Python 3's +bytes/text model more correctly, and since Launchpad sends quite a few +different kinds of email messages and has some quite picky tests for exactly +what it emits, this entailed a lot of work in our email sending code and in +our test suite to account for that. (It took me a while to work out whether +we should be treating raw email messages as bytes or as text; bytes turned +out to work best.) 3.4 made some tweaks to the implementation of +quoted-printable encoding that broke a number of our tests in ways that took +some effort to fix, because the tests needed to work on both 2.7 and 3.5. +The list goes on. I got quite proficient at digging through Python's git +history to figure out when and why some particular bit of behaviour had +changed. + +One of the thorniest problems was parsing HTTP form data. We mainly rely on +[`zope.publisher`](https://pypi.org/project/zope.publisher) for this, which +in turn relied on +[`cgi.FieldStorage`](https://docs.python.org/3/library/cgi.html); but +`cgi.FieldStorage` is [badly broken in some +situations](https://bugs.python.org/issue27777) on Python 3. 
Even if that +bug were fixed in a more recent version of Python, we can't easily use +anything newer than 3.5 for the first stage of our port due to the version +of the base OS we're currently running, so it wouldn't help much. In the +end I fixed some minor issues in the +[`multipart`](https://pypi.org/project/multipart) module (and was kindly +given co-maintenance of it) and [converted `zope.publisher` to use +it](https://github.com/zopefoundation/zope.publisher/pull/55). Although +this took a while to sort out, it seems to have gone very well. + +A couple of other interesting late-arriving issues were around +[`pickle`](https://docs.python.org/3/library/pickle.html). For most things +we normally prefer safer formats such as JSON, but there are a few cases +where we use pickle, particularly for our session databases. One of my +colleagues pointed out that I needed to remember to tell `pickle` to [stick +to protocol +2](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398534), +so that we'd be able to switch back and forward between Python 2 and 3 for a +while; quite right, and we later ran into a similar problem with +[`marshal`](https://docs.python.org/3/library/marshal.html) too. A more +surprising problem was that `datetime.datetime` objects pickled on Python 2 +[require special care](https://bugs.python.org/issue22005) when unpickling +on Python 3; rather than the approach that ended up being implemented and +[documented](https://docs.python.org/3/library/pickle.html#pickle.Unpickler) +for Python 3.6, though, I preferred a [custom +unpickler](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/399133), +both so that things would work on Python 3.5 and so that I wouldn't have to +risk affecting the decoding of other pickled strings in the session +database. 
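
The protocol-pinning idea above can be sketched roughly as follows. This is a minimal illustration rather than Launchpad's actual session code: the helper names (`dump_session`, `load_session`, `dump_cache`) and the two constants are hypothetical, and the only assumption about the libraries is that `pickle.dumps` accepts a `protocol` argument and `marshal.dumps` accepts a format version.

```python
import datetime
import marshal
import pickle

# Protocol 2 is the newest pickle protocol that Python 2 can read; Python 3
# defaults to a newer one that Python 2 cannot load.
PICKLE_PROTOCOL = 2
# Likewise, marshal format version 2 is the newest that Python 2 understands.
MARSHAL_VERSION = 2


def dump_session(data):
    """Pickle session data (hypothetical helper) in a Python 2-readable form."""
    return pickle.dumps(data, protocol=PICKLE_PROTOCOL)


def load_session(blob):
    """Load session data; the protocol is detected from the pickle itself."""
    return pickle.loads(blob)


def dump_cache(value):
    """Marshal a cache entry (hypothetical helper) using the pinned version."""
    return marshal.dumps(value, MARSHAL_VERSION)


session = {"user": "example", "last_seen": datetime.datetime(2021, 8, 2, 11, 34)}
blob = dump_session(session)
restored = load_session(blob)
cache_blob = dump_cache([1, 2, 3])
```

Pinning the protocol only covers data written by Python 3 and read by Python 2. Reading `datetime.datetime` objects that were pickled on Python 2 still needs the separate handling described above.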
+ +## General lessons + +Writing this over a year after Python 2's end-of-life date, and certainly +nowhere near the leading edge of Python 3 porting work, it's perhaps more +useful to look at this in terms of the lessons it has for other large +technical debt projects. + +I mentioned in my [previous article]({filename}/lp-python3-progress.md) that +I used the approach of an enormous and frequently-rebased git branch as a +working area for the port, committing often and sometimes combining and +extracting commits for review once they seemed to be ready. A port of this +scale would have been entirely intractable without a tool of similar power +to `git rebase`, so I'm very glad that we finished migrating to git in 2019. +I relied on this right up to the end of the port, and it also allowed for +quick assessments of how much more there was to land. [git +worktree](https://git-scm.com/docs/git-worktree) was also helpful, in that I +could easily maintain working trees built for each of Python 2 and 3 for +comparison. + +As is usual for most multi-developer projects, all changes to Launchpad need +to go through code review, although we sometimes make exceptions for very +simple and obvious changes that can be self-reviewed. Since I knew from the +outset that this was going to generate a lot of changes for review, I +structured my work to try to make it as easy as +possible for my colleagues to review it. This generally involved keeping +most changes to a somewhat manageable size of 800 lines or less (although +this wasn't always possible), and arranging commits mainly according to the +kind of change they made rather than their location. 
For example, when I +needed to fix issues with `/` in Python 3 being true division rather than +floor division, I did so in [one +commit](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/396326) +across the various places where it mattered and took care not to mix it with +other unrelated changes. This is good practice for nearly any kind of +development, but it was especially important here since it allowed reviewers +to consider a clear explanation of what I was doing in the commit message +and then skim-read the rest of it much more quickly. + +It was vital to keep the codebase in a working state at all times, and +deploy to production reasonably often: this way if something went wrong the +amount of code we had to debug to figure out what had happened was always +tractable. (Although I can't seem to find it now to link to it, I saw an +account a while back of a company that had taken a flag-day approach instead +with a large codebase. It seemed to work for them, but I'm certain we +couldn't have made it work for Launchpad.) + +I can't speak too highly of Launchpad's test suite, much of which originated +before my time. Without a great deal of extensive coverage of all sorts of +interesting edge cases at both the unit and functional level, and a +corresponding culture of maintaining that test suite well when making new +changes, it would have been impossible to be anything like as confident of +the port as we were. + +As part of the porting work, we split out a couple of substantial chunks of +the Launchpad codebase that could easily be decoupled from the core: its +[Mailman integration](https://launchpad.net/lp-mailman) and its [code import +worker](https://launchpad.net/lp-codeimport). Both of these had substantial +dependencies with complex requirements for porting to Python 3, and +arranging to be able to do these separately on their own schedule was +absolutely worth it. 
Like disentangling balls of wool, any opportunity you +can take to make things less tightly-coupled is probably going to make it +easier to disentangle the rest. (I can see a tractable way forward to +porting the code import worker, so we may well get that done soon. Our +Mailman integration will need to be rewritten, though, since it currently +depends on the Python-2-only Mailman 2, and Mailman 3 has a different +architecture.) + +## Python lessons + +Our [database layer]({filename}/storm-py3.md) was already in pretty good +shape for a port, since at least the modern bits of its table modelling +interface were already strict about using Unicode for text columns. If you +have any kind of pervasive low-level framework like this, then making it be +pedantic at you in advance of a Python 3 port will probably incur much less +swearing in the long run, as you won't be trying to deal with quite so many +bytes/text issues at the same time as everything else. + +Early in our port, we established a standard set of +[`__future__`](https://docs.python.org/3/library/__future__.html) imports +and started incrementally converting files over to them, mainly because we +weren't yet sure what else to do and it seemed likely to be helpful. +`absolute_import` was definitely reasonable (and not often a problem in our +code), and `print_function` was annoying but necessary. In hindsight I'm +not sure about `unicode_literals`, though. For files that only deal with +bytes and text it was reasonable enough, but as I mentioned above there were +also a number of cases where we needed literals of the language's native +`str` type, i.e. bytes in Python 2 and text in Python 3: this was +particularly noticeable in WSGI contexts, but also cropped up in [some other +surprising +places](https://github.com/zopefoundation/zope.configuration/pull/19). 
We +generally either omitted `unicode_literals` or used `six.ensure_str` in such +cases, but it was definitely a bit awkward and maybe I should have listened +more to people telling me it might be a bad idea. + +A lot of Launchpad's early tests used +[doctest](https://docs.python.org/3/library/doctest.html), mainly in the +[style](https://docs.python.org/3/library/doctest.html#simple-usage-checking-examples-in-a-text-file) +where you have text files that interleave narrative commentary with +examples. The development team later reached consensus that this was best +avoided in most cases, but by then there were far too many doctests to +conveniently rewrite in some other form. Porting doctests to Python 3 is +really annoying. You run into all the little changes in how objects are +represented as text (particularly `u'...'` versus `'...'`, but plenty of +other cases as well); you have next to no tools to do anything useful like +skipping individual bits of a doctest that don't apply; using `__future__` +imports requires the rather obscure approach of adding the relevant names to +the doctest's globals in the relevant `DocFileSuite` or `DocTestSuite`; +dealing with many exception tracebacks requires something like +[`zope.testing.renormalizing`](https://github.com/zopefoundation/zope.testing/blob/master/src/zope/testing/renormalizing.py); +and whatever code refactoring tools you're using probably don't work +properly. Basically, don't have done that. It did all turn out to be +tractable for us in the end, and I managed to avoid using much in the way of +fragile doctest extensions aside from the aforementioned +`zope.testing.renormalizing`, but it was not an enjoyable experience. + +## Regressions + +I know of nine regressions that reached Launchpad's production systems as a +result of this porting work; of course there were various other regressions +caught by CI or in manual testing. 
(Considering the size of this project, I +count it as a resounding success that there were only nine production +issues, and that for the most part we were able to fix them quickly.) + +### Equality testing of removed database objects + +One of the things we had to do while porting to Python 3 was to +[implement](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398087) +the `__eq__`, `__ne__`, and `__hash__` special methods for all our database +objects. This was quite conceptually fiddly, because doing this requires +knowing each object's primary key, and that may not yet be available if +we've created an object in Python but not yet flushed the actual `INSERT` +statement to the database (most of our primary keys are auto-incrementing +sequences). We thus had to take care to flush pending SQL statements in +such cases in order to ensure that we know the primary keys. + +However, it's possible to have a problem at the other end of the object +lifecycle: that is, a Python object might still be reachable in memory even +though the underlying row has been `DELETE`d from the database. In most +cases we don't keep removed objects around for obvious reasons, but it can +happen in caching code, and buildd-manager +[crashed](https://bugs.launchpad.net/launchpad/+bug/1916522) as a result (in +fact while it was still running on Python 2). We had to [take extra +care](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398498) +to avoid this problem. + +### Debian imports crashed on non-UTF-8 filenames + +Python 2 has some [unfortunate +behaviour](https://bugs.launchpad.net/launchpad/+bug/1917449) around passing +bytes or Unicode strings (depending on the platform) to `shutil.rmtree`, and +the combination of some [porting +work](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398367) +and a particular source package in Debian that contained a non-UTF-8 file +name caused us to run into this. 
The +[fix](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398971) +was to ensure that the argument passed to `shutil.rmtree` is a `str` +regardless of Python version. + +We'd actually run into [something +similar](https://code.launchpad.net/~cjwatson/turnip/+git/turnip/+merge/359051) +before: it's a subtle porting gotcha, since it's quite easy to end up +passing Unicode strings to `shutil.rmtree` if you're in the process of +porting your code to Python 3, and you might easily not notice if the file +names in your tests are all encoded using UTF-8. + +### lazr.restful ETags + +We eventually got far enough along that we could switch one of our four +appserver machines (we have quite a number of other machines too, but the +appservers handle web and API requests) to Python 3 and see what happened. +By this point our extensive test suite had shaken out the vast majority of +the things that could go wrong, but there was always going to be room for +some interesting edge cases. + +One of the Ubuntu kernel team reported that they were seeing an increase in +[412 Precondition +Failed](https://httpstatusdogs.com/412-precondition-failed) errors in some +of their scripts that use our webservice API. These can happen when you're +trying to modify an existing resource: the underlying protocol involves +sending an `If-Match` header with the `ETag` that the client thinks the +resource has, and if this doesn't match the `ETag` that the server calculates +for the resource then the client has to refresh its copy of the resource and +try again. We initially thought that this might be legitimate since it can +happen in normal operation if you collide with another client making changes +to the same resource, but it soon became clear that something stranger was +going on: we were getting inconsistent `ETag`s for the same object even when +it was unchanged. Since we'd recently switched a quarter of our appservers +to Python 3, that was a natural suspect. 
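
The precondition check described above can be sketched as follows. This is a simplified illustration, not `lazr.restful`'s implementation: `make_etag` and `precondition_status` are hypothetical helpers, SHA-1 is an arbitrary choice of hash, and the two byte strings stand in for what two disagreeing serializers might produce for the same unchanged object.

```python
import hashlib


def make_etag(canonical_form: bytes) -> str:
    """Hash a canonical serialization into a quoted ETag value."""
    return '"%s"' % hashlib.sha1(canonical_form).hexdigest()


def precondition_status(if_match: str, current_etag: str) -> int:
    """Return 200 if the client's If-Match header matches, else 412."""
    if if_match.strip() == "*":
        # "*" matches any existing representation of the resource.
        return 200
    candidates = [tag.strip() for tag in if_match.split(",")]
    return 200 if current_etag in candidates else 412


# Hypothetical canonical forms of the same unchanged object, as produced by
# two appservers whose serializers disagree.
etag_from_server_a = make_etag(b"[u'foo', u'bar', u'baz']")
etag_from_server_b = make_etag(b"['foo', 'bar', 'baz']")

# The client read the resource from server A, then sent its write either
# back to server A or to server B.
status_same_server = precondition_status(etag_from_server_a, etag_from_server_a)
status_other_server = precondition_status(etag_from_server_a, etag_from_server_b)
```

In this sketch the write succeeds when it reaches the server that produced the `ETag`, and fails with 412 when it reaches the other one, even though the object never changed. The failure is therefore intermittent from the client's point of view, depending on which appserver handles each request.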
+ +Our `lazr.restful` package provides the framework for our webservice API, +and roughly speaking it generates `ETag`s by serializing objects into some +kind of canonical form and hashing the result. Unfortunately the +serialization was dependent on the Python version in a few ways, and in +particular it serialized lists of strings such as lists of bug tags +differently: Python 2 used `[u'foo', u'bar', u'baz']` where Python 3 used +`['foo', 'bar', 'baz']`. In `lazr.restful` 1.0.3 we [switched to using +JSON](https://code.launchpad.net/~cjwatson/lazr.restful/etag-json/+merge/402920) +for this, removing the Python version dependency and ensuring consistent +behaviour between appservers. + +### Memory leaks + +This problem took the longest to solve. We noticed fairly quickly from our +graphs that the appserver machine we'd switched to Python 3 had a serious +memory leak. Our appservers had always been a bit leaky, but now it wasn't +so much "a small hole that we can bail occasionally" as "the boat is sinking +rapidly": + +![A serious memory leak]({static}/images/chaenomeles-leak.png) + +(Yes, this got in the way of working out what was going on with `ETag`s for +a while.) + +I spent ages messing around with various attempts to fix this. Since only +a quarter of our appservers were affected, and we could get by on 75% +capacity for a while, it wasn't urgent but it was definitely annoying. +After spending some quality time with +[objgraph](https://mg.pov.lt/objgraph/), for +some time I thought [traceback reference +cycles](https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/) +might be at fault, and I sent a number of fixes to various upstream projects +for those (e.g. +[zope.pagetemplate](https://github.com/zopefoundation/zope.pagetemplate/pull/27)). 
+Those didn't help the leaks much though, and after a while it became clear +to me that this couldn't be the sole problem: Python has a cyclic garbage +collector that will eventually collect reference cycles as long as there are +no strong references to any objects in them, although it might not happen +very quickly. Something else must be going on. + +Debugging reference leaks in any non-trivial and long-running Python program +is extremely arduous, especially with ORMs that naturally tend to end up +with lots of cycles and caches. After a while I formed a hypothesis that +[zope.server](https://pypi.org/project/zope.server) might be keeping a +strong reference to something, although I never managed to nail it down more +firmly than that. This was an attractive theory as we were already in the +process of migrating to [Gunicorn](https://docs.gunicorn.org/en/stable/) for +other reasons anyway, and Gunicorn also has a convenient +[`max_requests`](https://docs.gunicorn.org/en/stable/settings.html#max-requests) +setting that's good at mitigating memory leaks. Getting this all in place +took some time, but once we did we found that everything was much more +stable: + +![A rather flat memory graph]({static}/images/chaenomeles-stable.png) + +This isn't completely satisfying as we never quite got to the bottom of the +leak itself, and it's entirely possible that we've only papered over it +using `max_requests`: I expect we'll gradually back off on how frequently we +restart workers over time to try to track this down. However, +pragmatically, it's no longer an operational concern. + +### Mirror prober HTTPS proxy handling + +After we switched our script servers to Python 3, we had several reports of +[mirror probing +failures](https://bugs.launchpad.net/launchpad/+bug/1935999). (Launchpad +keeps lists of Ubuntu archive and image mirrors, and probes them every so +often to check that they're reasonably complete and up to date.) 
This only +affected HTTPS mirrors when probed via a proxy server, support for which is +a relatively recent feature in Launchpad and involved some code that we +never managed to unit-test properly: of course this is exactly the code that +went wrong. Sadly I wasn't able to sort out that gap, but at least the +[fix](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/405688) +was simple. + +### Non-MIME-encoded email headers + +As I mentioned above, there were substantial changes in the `email` package +between Python 2 and 3, and indeed between minor versions of Python 3. Our +test coverage here is pretty good, but it's an area where it's very easy to +have gaps. We noticed that a script that processes incoming email was +crashing on messages with headers that were non-ASCII but not +[MIME-encoded](https://datatracker.ietf.org/doc/html/rfc2047.html) (and +indeed then crashing again when it tried to send a notification of the +crash!). The only examples of these I looked at were spam, but we still +didn't want to crash on them. + +The +[fix](https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/405924) +involved being somewhat more careful about both the handling of headers +returned by Python's email parser and the building of outgoing email +notifications. This seems to be working well so far, although I wouldn't be +surprised to find the odd other incorrect detail in this sort of area. + +### Failure to handle non-ISO-8859-1 URL-encoded form input + +Remember how I said that parsing HTTP form data was thorny? After we +finished upgrading all our appservers to Python 3, people started reporting +that they [couldn't post Unicode comments to +bugs](https://bugs.launchpad.net/launchpad/+bug/1937345), which turned out +to be only if the attempt was made using JavaScript, and was because I +hadn't quite managed to get URL-encoded form data working properly with +`zope.publisher` and `multipart`. 
The current standard describes the +URL-encoded format for form data as ["in many ways an aberrant +monstrosity"](https://url.spec.whatwg.org/#application/x-www-form-urlencoded), +so this was no great surprise. + +Part of the problem was some [very strange +choices](https://github.com/zopefoundation/zope.publisher/issues/65) in +`zope.publisher` dating back to 2004 or earlier, which I attempted to [clean +up and simplify](https://github.com/zopefoundation/zope.publisher/pull/66). +The rest was that Python 2's `urlparse.parse_qs` unconditionally decodes +percent-encoded sequences as ISO-8859-1 if they're passed in as part of a +Unicode string, so `multipart` needs to [work around +this](https://github.com/defnull/multipart/pull/36) on Python 2. + +I'm still not completely confident that this is correct in all situations, +but at least now that we're on Python 3 everywhere the matrix of cases we +need to care about is smaller. + +### Inconsistent marshalling of Loggerhead's disk cache + +We use [Loggerhead](https://pypi.org/project/loggerhead) for providing web +browsing of Bazaar branches. When we upgraded one of its two servers to +Python 3, we immediately noticed that the one still on Python 2 was failing +to read back its revision information cache, which it stores in a database +on disk. (We noticed this because it caused a deployment to fail: when we +tried to roll out new code to the instance still on Python 2, Nagios checks +had already caused an incompatible cache to be written for one branch from +the Python 3 instance.) + +This turned out to be a similar problem to the `pickle` issue mentioned +above, except this one was with `marshal`, which I didn't think to look for +because it's a relatively obscure module mostly used for internal purposes +by Python itself; I'm not sure that Loggerhead should really be using it in +the first place. 
The fix was
+[relatively](https://code.launchpad.net/~cjwatson/loggerhead/marshal-version/+merge/406291)
+[straightforward](https://code.launchpad.net/~cjwatson/loggerhead/fix-marshal-version/+merge/406308),
+complicated mainly by now needing to cope with throwing away unreadable
+cache data.
+
+Ironically, if we'd just gone ahead and taken the nominally riskier path of
+upgrading both servers at the same time, we might never have had a problem
+here.
+
+### Intermittent bzr failures
+
+Finally, after we upgraded one of our two Bazaar codehosting servers to
+Python 3, we had a
+[report](https://bugs.launchpad.net/launchpad/+bug/1938335) of intermittent
+`bzr branch` hangs. After some digging I found this in our logs:
+
+    :::pytb
+    Traceback (most recent call last):
+      ...
+      File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
+        self.startWriting()
+      File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
+        resumeProducing()
+      File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
+        for p in self.pipes.itervalues():
+    builtins.AttributeError: 'dict' object has no attribute 'itervalues'
+
+I'd seen this before in our git hosting service: it was a bug in Twisted's
+Python 3 port, [fixed](https://github.com/twisted/twisted/pull/1478) after
+20.3.0 but unfortunately after the last release that supported Python 2, so
+we had to backport that patch. Using the same backport dealt with this.
+
+## [Onwards!](https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/)
-- 
2.30.2