Porting Storm to Python 3

We released Storm 0.21 on Friday (the release announcement seems to be stuck in moderation, but you can look at the NEWS file directly). For me, the biggest part of this release was adding Python 3 support.

Storm is a really nice and lightweight ORM (object-relational mapper) for Python, developed by Canonical. We use it for some major products (Launchpad and Landscape are the ones I know of), and it’s also free software and used by some other folks as well. Other popular ORMs for Python include SQLObject, SQLAlchemy and the Django ORM; we use those in various places too depending on the context, but personally I’ve always preferred Storm for the readability of code that uses it and for how easy it is to debug and extend it.

It’s been a problem for a while that Storm only worked with Python 2. It’s one of a handful of major blockers to getting Launchpad running on Python 3, which we definitely want to do; stoq ended up with a local fork of Storm to cope with this; and Storm was recently removed from Debian for this and other reasons. None of that was great. So, with significant assistance from a large patch contributed by Thiago Bellini, and with patient code review from Simon Poirier and some of my other colleagues, we finally managed to get that sorted out in this release.

In many ways, Storm was in fairly good shape already for a project that hadn’t yet been ported to Python 3. Its internal idea of which strings were bytes and which were text required quite a bit of untangling, in the way that Python 2 code usually does. However, its normal class for text database columns was already Unicode, which only accepted text input (unicode in Python 2), so it could have been a lot worse; this also means that applications that use Storm tend to get at least this part right even in Python 2. Aside from the bytes/text thing, many of the required changes were just the usual largely-mechanical ones that anyone who’s done 2-to-3 porting will be familiar with. But there were some areas that required non-trivial thought, and I’d like to talk about some of those here.

Exception types

Concrete database implementations such as psycopg2 raise implementation-specific exception types. The inheritance hierarchy for these is defined by the Python Database API (DB-API), but the actual exception classes aren’t in a common place; rather, you might get an instance of psycopg2.errors.IntegrityError when using PostgreSQL but an instance of sqlite3.IntegrityError when using SQLite. To make things easier for applications that don’t have a strict requirement for a particular database backend, Storm arranged to inject its own virtual exception types as additional base classes of these concrete exceptions by patching their __bases__ attribute, so for example, you could import IntegrityError from storm.exceptions and catch that rather than having to catch each backend-specific possibility.
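To make the old mechanism concrete, here is a minimal sketch of the __bases__ trick using plain Python classes. BackendError, BackendIntegrityError and StormIntegrityError are stand-ins invented for illustration, not the real psycopg2 or Storm classes; the real code patched exception classes belonging to a DB-API module rather than ones defined alongside it.

```python
# Hypothetical stand-ins for a DB-API module's exception hierarchy.
class BackendError(Exception):
    pass


class BackendIntegrityError(BackendError):
    pass


# Hypothetical stand-in for a backend-neutral exception type.
class StormIntegrityError(Exception):
    pass


# Inject the neutral type as an additional base class of the concrete one.
BackendIntegrityError.__bases__ += (StormIntegrityError,)

# Application code can now catch the neutral type without knowing
# which backend raised the error.
try:
    raise BackendIntegrityError("duplicate key")
except StormIntegrityError as e:
    caught = e
```

This only works for as long as the concrete class lets its __bases__ be reassigned.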

Although this was always a bit of a cheat, it worked well in practice for a while. The first sign of trouble came even before porting to Python 3, with psycopg2 2.5. This release started implementing its DB-API exception types in a C extension, which meant that it was no longer possible to patch __bases__. To get around that, a few years ago I landed a patch to Storm to use abc.ABCMeta.register instead to register the DB-API exceptions as virtual subclasses of Storm’s exceptions, which solved the problem for Python 2. However, even at the time I landed that, I knew that it would be a porting obstacle due to Python issue 12029: on Python 3, an except clause doesn’t match exceptions against virtual subclasses registered with an ABC. Django ran into that as well.
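The Python 3 behaviour is easy to demonstrate in isolation. The following sketch uses invented class names (StormError here is a simplified stand-in, and BackendError is hypothetical); it shows that issubclass honours the registration while the except clause does not.

```python
import abc


# Simplified stand-in for a Storm exception that allows virtual subclasses.
class StormError(Exception, metaclass=abc.ABCMeta):
    pass


# Hypothetical stand-in for a DB-API exception implemented elsewhere.
class BackendError(Exception):
    pass


# Register the backend exception as a virtual subclass.
StormError.register(BackendError)

# issubclass() consults the ABC machinery, so this is True.
registered = issubclass(BackendError, StormError)

# On Python 3, exception matching only looks at real inheritance, so the
# inner except clause is skipped and the outer one handles the error.
try:
    try:
        raise BackendError("boom")
    except StormError:
        caught_by = "virtual"
except BackendError:
    caught_by = "concrete"
```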

In the end, I opted to refactor how Storm handles exceptions. It now wraps cursor and connection objects so as to catch DB-API exceptions raised by their methods and properties, and re-raises them using wrapper exception types that inherit from both the appropriate subclass of StormError and the original DB-API exception type. With some care I even managed to avoid this being painfully repetitive. Out-of-tree database backends will need to make some minor adjustments: removing install_exceptions, adding an _exception_module property to their Database subclass, adjusting the raw_connect method of their Database subclass to do exception wrapping, and possibly implementing _make_combined_exception_type and/or _wrap_exception if they need to add extra attributes to the wrapper exceptions. Applications that follow the usual Storm idiom of catching StormError or any of its subclasses should continue to work without needing any changes.
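As a rough illustration of the idea (not Storm’s actual implementation), a combined exception type can be built with ordinary multiple inheritance. In this sketch, StormError and IntegrityError are simplified stand-ins, and combine_exception_types and execute_wrapped are hypothetical helpers; sqlite3 is used only because it ships with Python.

```python
import sqlite3


# Simplified stand-ins for Storm's backend-neutral exception types.
class StormError(Exception):
    pass


class IntegrityError(StormError):
    pass


def combine_exception_types(storm_type, dbapi_type):
    """Hypothetical helper: build a type inheriting from both arguments."""
    return type(dbapi_type.__name__, (storm_type, dbapi_type), {})


CombinedIntegrityError = combine_exception_types(
    IntegrityError, sqlite3.IntegrityError)


def execute_wrapped(cursor, statement, params=()):
    """Hypothetical wrapper: re-raise DB-API errors as the combined type."""
    try:
        return cursor.execute(statement, params)
    except sqlite3.IntegrityError as e:
        raise CombinedIntegrityError(*e.args) from e


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
execute_wrapped(cursor, "INSERT INTO person (name) VALUES (?)", ("alice",))

# The second insert violates the UNIQUE constraint.
caught = None
try:
    execute_wrapped(
        cursor, "INSERT INTO person (name) VALUES (?)", ("alice",))
except IntegrityError as e:
    caught = e
```

Because the wrapper type really does inherit from both classes, code that catches either the backend-neutral type or the original DB-API type keeps working, with no reliance on virtual subclasses.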

SQLObject compatibility

Storm includes some API compatibility with SQLObject; this was from before my time, but I believe it was mainly because Launchpad and possibly Landscape previously used SQLObject and this made the port to Storm very much easier. It still works fine for the parts of Launchpad that haven’t been ported to Storm, but I wouldn’t be surprised if there were newer features of SQLObject that it doesn’t support.

The main question here was what to do with StringCol and its associated AutoUnicodeVariable. I opted to make these explicitly only accept text on Python 3, since the main reason for them to accept bytes was to allow using them with Python 2 native strings (i.e. str), and on Python 3 str is already text so there’s much less need for the porting affordance in that case.

Since releasing 0.21, I’ve realised that the StringCol implementation in SQLObject itself in fact accepts both bytes and text even on Python 3, so it’s possible that we’ll need to change this in the future, although we haven’t yet found any real code using Storm’s SQLObject compatibility layer that might rely on this. Still, it’s much easier for Storm to start out on the stricter side and perhaps become more lenient than it is to go the other way round.

inspect.getargspec

Storm had some fairly complicated use of inspect.getargspec on Python 2 as part of its test mocking arrangements. This didn’t work in Python 3 due to some subtleties relating to bound methods. I switched to the modern inspect.signature API in Python 3 to fix this, which in any case is rather simpler with the exception of a wrinkle in how method descriptors work.
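For readers who haven’t used inspect.signature, here is a small sketch of the bound-method difference and of checking a call against a signature. It is not taken from Storm’s mocking code; Greeter and accepts_call are made-up names for illustration.

```python
import inspect


class Greeter:
    def greet(self, name, punctuation="!"):
        return "Hello, " + name + punctuation


# Looked up on the class, the function still lists "self".
unbound_params = list(inspect.signature(Greeter.greet).parameters)

# Looked up on an instance, the bound method's signature omits "self".
bound_params = list(inspect.signature(Greeter().greet).parameters)


def accepts_call(func, *args, **kwargs):
    """Hypothetical helper: would func accept these arguments?"""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True
```

A mocking layer can use a check like accepts_call to reject recorded calls that the real method would never accept.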

(It’s possible that these mocking arrangements could be simplified nowadays by using some more off-the-shelf mocking library; I haven’t looked into that in any detail.)

What’s next?

I’m working on getting Storm back into Debian now, which will be with Python 3 support only since Debian is in the process of gradually removing Python 2 module support. Other than that I don’t really have any particular plans for Storm at the moment (although of course I’m not the only person with an interest in it), aside from ideally avoiding leaving six years between releases again. I expect we can go back into bug-fixing mode there for a while.

From the Launchpad side, I’ve recently made progress on one of the other major Python 3 blockers (porting Bazaar code hosting to Breezy, coming soon). There are still some other significant blockers, the largest being migrating to Mailman 3, subvertpy fixes so that we can port code importing to Breezy as well, and porting the lazr.restful stack; but we may soon be able to reach the point where it’s possible to start running interesting subsets of the test suite using Python 3 and categorising the failures, at which point we’ll be able to get a much better idea of how far we still have to go. Porting a project with the best part of a million lines of code and around three hundred dependencies is always going to take a while, but I’m happy to be making progress there, both due to Python 2’s impending end of upstream support and so that eventually we can start using new language facilities.