Porting Storm to Python 3
We released Storm 0.21 on Friday (the release announcement seems to be stuck in moderation, but you can look at the NEWS file directly). For me, the biggest part of this release was adding Python 3 support.
Storm is a really nice and lightweight ORM (object-relational mapper) for Python, developed by Canonical. We use it for some major products (Launchpad and Landscape are the ones I know of), and it’s also free software and used by some other folks as well. Other popular ORMs for Python include SQLObject, SQLAlchemy and the Django ORM; we use those in various places too depending on the context, but personally I’ve always preferred Storm for the readability of code that uses it and for how easy it is to debug and extend it.
It’s been a problem for a while that Storm only worked with Python 2. It’s one of a handful of major blockers to getting Launchpad running on Python 3, which we definitely want to do; stoq ended up with a local fork of Storm to cope with this; and it was recently removed from Debian for this and other reasons. None of that was great. So, with significant assistance from a large patch contributed by Thiago Bellini, and with patient code review from Simon Poirier and some of my other colleagues, we finally managed to get that sorted out in this release.
In many ways, Storm was in fairly good shape already for a project that hadn’t yet been ported to Python 3. Its internal idea of which strings were bytes and which were text required quite a bit of untangling, in the way that Python 2 code usually does, but its normal class used for text database columns was already Unicode, which only accepted text input (unicode in Python 2), so it could have been a lot worse; this also means that applications that use Storm tend to get at least this part right even in Python 2. Aside from the bytes/text thing, many of the required changes were just the usual largely-mechanical ones that anyone who’s done 2-to-3 porting will be familiar with. But there were some areas that required non-trivial thought, and I’d like to talk about some of those here.
Exception types
Concrete database implementations such as psycopg2 raise implementation-specific exception types. The inheritance hierarchy for these is defined by the Python Database API (DB-API), but the actual exception classes aren’t in a common place; rather, you might get an instance of psycopg2.errors.IntegrityError when using PostgreSQL but an instance of sqlite3.IntegrityError when using SQLite. To make things easier for applications that don’t have a strict requirement for a particular database backend, Storm arranged to inject its own virtual exception types as additional base classes of these concrete exceptions by patching their __bases__ attribute, so for example, you could import IntegrityError from storm.exceptions and catch that rather than having to catch each backend-specific possibility.
Although this was always a bit of a cheat, it worked well in practice for a while, but the first sign of trouble even before porting to Python 3 was with psycopg2 2.5. This release started implementing its DB-API exception types in a C extension, which meant that it was no longer possible to patch __bases__. To get around that, a few years ago I landed a patch to Storm to use abc.ABCMeta.register instead to register the DB-API exceptions as virtual subclasses of Storm’s exceptions, which solved the problem for Python 2. However, even at the time I landed that, I knew that it would be a porting obstacle due to Python issue 12029; Django ran into that as well.
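The obstacle can be reproduced without any database driver. The following is a minimal sketch rather than Storm’s code: VirtualIntegrityError and BackendIntegrityError are hypothetical stand-ins for a Storm exception and a backend’s DB-API exception. It shows a class registered with abc.ABCMeta.register passing an issubclass check while an except clause still fails to match it, which is the behaviour tracked in Python issue 12029.

```python
import abc


class VirtualIntegrityError(Exception, metaclass=abc.ABCMeta):
    """Hypothetical stand-in for an ORM-level exception type."""


class BackendIntegrityError(Exception):
    """Hypothetical stand-in for a backend's DB-API exception type."""


# Register the backend exception as a virtual subclass of the ORM-level one.
VirtualIntegrityError.register(BackendIntegrityError)

# issubclass() consults the ABC machinery, so the registration is visible here.
is_virtual_subclass = issubclass(BackendIntegrityError, VirtualIntegrityError)

# An except clause on Python 3 only follows real inheritance, so the first
# handler is skipped and the second one runs.
try:
    raise BackendIntegrityError("duplicate key")
except VirtualIntegrityError:
    matched_by = "virtual"
except BackendIntegrityError:
    matched_by = "concrete"
```

On CPython 3 releases affected by that issue, matched_by ends up as "concrete" even though is_virtual_subclass is True, so application code catching only the virtual type would let the exception escape.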
In the end, I opted to refactor how Storm handles exceptions. It now wraps cursor and connection objects so as to catch DB-API exceptions raised by their methods and properties, and re-raises them using wrapper exception types that inherit from both the appropriate subclass of StormError and the original DB-API exception type. With some care I even managed to avoid this being painfully repetitive. Out-of-tree database backends will need to make some minor adjustments: removing install_exceptions, adding an _exception_module property to their Database subclass, adjusting the raw_connect method of their Database subclass to do exception wrapping, and possibly implementing _make_combined_exception_type and/or _wrap_exception if they need to add extra attributes to the wrapper exceptions. Applications that follow the usual Storm idiom of catching StormError or any of its subclasses should continue to work without needing any changes.
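To make the idea of a wrapper type that inherits from both hierarchies concrete, here is a rough sketch using only the standard library’s sqlite3 module. It is not Storm’s implementation: SketchError, SketchIntegrityError, combined_type and wrap_exceptions are hypothetical names, and a context manager is used in place of Storm’s wrapping of cursor and connection objects to keep the example short.

```python
import sqlite3
from contextlib import contextmanager


class SketchError(Exception):
    """Hypothetical stand-in for an ORM-level base exception."""


class SketchIntegrityError(SketchError):
    """Hypothetical stand-in for an ORM-level integrity error."""


# DB-API exception name -> ORM-level base class, most specific first.
_WRAPPER_BASES = [
    ("IntegrityError", SketchIntegrityError),
    ("Error", SketchError),
]
_combined_types = {}


def combined_type(dbapi_module, exc):
    """Build (and cache) a type inheriting from both hierarchies."""
    for name, wrapper_base in _WRAPPER_BASES:
        if isinstance(exc, getattr(dbapi_module, name)):
            key = (wrapper_base, type(exc))
            if key not in _combined_types:
                _combined_types[key] = type(
                    type(exc).__name__, (wrapper_base, type(exc)), {})
            return _combined_types[key]
    raise TypeError("not a DB-API exception: %r" % (exc,))


@contextmanager
def wrap_exceptions(dbapi_module):
    """Re-raise DB-API errors from the block as combined exception types."""
    try:
        yield
    except dbapi_module.Error as exc:
        raise combined_type(dbapi_module, exc)(*exc.args) from exc


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
connection.execute("INSERT INTO t VALUES (1)")
try:
    with wrap_exceptions(sqlite3):
        connection.execute("INSERT INTO t VALUES (1)")  # duplicate key
except SketchIntegrityError as exc:
    caught = exc
```

Because the combined type really does inherit from both classes, caught matches an except clause for either SketchIntegrityError or sqlite3.IntegrityError, with no reliance on virtual subclassing.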
SQLObject compatibility
Storm includes some API compatibility with SQLObject; this was from before my time, but I believe it was mainly because Launchpad and possibly Landscape previously used SQLObject and this made the port to Storm very much easier. It still works fine for the parts of Launchpad that haven’t been ported to Storm, but I wouldn’t be surprised if there were newer features of SQLObject that it doesn’t support.
The main question here was what to do with StringCol and its associated AutoUnicodeVariable. I opted to make these explicitly only accept text on Python 3, since the main reason for them to accept bytes was to allow using them with Python 2 native strings (i.e. str), and on Python 3 str is already text so there’s much less need for the porting affordance in that case.
Since releasing 0.21 I realised that the StringCol implementation in SQLObject itself in fact accepts both bytes and text even on Python 3, so it’s possible that we’ll need to change this in the future, although we haven’t yet found any real code using Storm’s SQLObject compatibility layer that might rely on this. Still, it’s much easier for Storm to start out on the stricter side and perhaps become more lenient than it is to go the other way round.
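The difference between the two policies can be shown with a pair of small functions. This is only an illustration of the trade-off: strict_text and lenient_text are hypothetical helpers, not Storm or SQLObject code, and the UTF-8 default in the lenient version is an assumption made for the example.

```python
def strict_text(value):
    """Accept only text, as in the stricter policy."""
    if not isinstance(value, str):
        raise TypeError("expected str, got %s" % type(value).__name__)
    return value


def lenient_text(value, encoding="utf-8"):
    """Accept text or bytes; the UTF-8 default is an assumption."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return strict_text(value)


accepted = strict_text("café")

# The strict policy rejects bytes outright.
try:
    strict_text(b"caf\xc3\xa9")
    rejected = False
except TypeError:
    rejected = True

# The lenient policy decodes them instead.
decoded = lenient_text(b"caf\xc3\xa9")
```

Moving from the strict function to the lenient one only makes more calls succeed, whereas going the other way would start raising TypeError in code that used to work.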
inspect.getargspec
Storm had some fairly complicated use of inspect.getargspec on Python 2 as part of its test mocking arrangements. This didn’t work in Python 3 due to some subtleties relating to bound methods. I switched to the modern inspect.signature API in Python 3 to fix this, which in any case is rather simpler with the exception of a wrinkle in how method descriptors work.
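As a small illustration of why inspect.signature is convenient for this sort of checking, the sketch below compares the signature of a function looked up on a class with that of the bound method looked up on an instance, and then uses Signature.bind to test whether a call would be accepted. The Store class and the call_matches helper are hypothetical and are not taken from Storm’s mocking code.

```python
import inspect


class Store:
    """Hypothetical class; only the method's shape matters here."""

    def find(self, cls_spec, *args, **kwargs):
        return (cls_spec, args, kwargs)


# Plain function accessed on the class: "self" is part of the signature.
function_params = list(inspect.signature(Store.find).parameters)

# Bound method accessed on an instance: "self" is already filled in.
bound_signature = inspect.signature(Store().find)
bound_params = list(bound_signature.parameters)


def call_matches(signature, *args, **kwargs):
    """Return True if the arguments would be accepted by the signature."""
    try:
        signature.bind(*args, **kwargs)
    except TypeError:
        return False
    return True


ok_call = call_matches(bound_signature, "Person", 1, name="x")
bad_call = call_matches(bound_signature)  # missing cls_spec
```

Here function_params includes "self" while bound_params does not, so a mock that validates recorded calls against a bound method does not need to special-case the first argument.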
(It’s possible that these mocking arrangements could be simplified nowadays by using some more off-the-shelf mocking library; I haven’t looked into that in any detail.)
What’s next?
I’m working on getting Storm back into Debian now, which will be with Python 3 support only since Debian is in the process of gradually removing Python 2 module support. Other than that I don’t really have any particular plans for Storm at the moment (although of course I’m not the only person with an interest in it), aside from ideally avoiding leaving six years between releases again. I expect we can go back into bug-fixing mode there for a while.
From the Launchpad side, I’ve recently made progress on one of the other major Python 3 blockers (porting Bazaar code hosting to Breezy, coming soon). There are still some other significant blockers, the largest being migrating to Mailman 3, subvertpy fixes so that we can port code importing to Breezy as well, and porting the lazr.restful stack; but we may soon be able to reach the point where it’s possible to start running interesting subsets of the test suite using Python 3 and categorising the failures, at which point we’ll be able to get a much better idea of how far we still have to go. Porting a project with the best part of a million lines of code and around three hundred dependencies is always going to take a while, but I’m happy to be making progress there, both due to Python 2’s impending end of upstream support and so that eventually we can start using new language facilities.