An odd test failure

Weird test failures are great at teaching you things that you didn’t realise you might need to know.

As previously mentioned, I’ve been working on converting Launchpad from Buildout to virtualenv and pip, and I finally landed that change on our development branch today. The final landing was mostly quite smooth, except for one test failure on our buildbot that I hadn’t seen before:

ERROR: lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked
worker ID: unknown worker (bug in our subunit output?)
----------------------------------------------------------------------
Traceback (most recent call last):
_StringException: log: {{{
36.384  creating repository in file:///tmp/testbzr-6CwSLV.tmp/lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked/work/stacked-on/.bzr/.
36.388  creating branch <bzrlib.branch.BzrBranchFormat7 object at 0xeb85b36c> in file:///tmp/testbzr-6CwSLV.tmp/lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_stacked/work/stacked-on/
}}}

Traceback (most recent call last):
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/lib/lp/codehosting/codeimport/tests/test_worker.py", line 1108, in test_stacked
    stacked_on.fetch(Branch.open(source_details.url))
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/branch.py", line 186, in open
    possible_transports=possible_transports, _unsupported=_unsupported)
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py", line 689, in open
    _unsupported=_unsupported)
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py", line 718, in open_from_transport
    find_format, transport, redirected)
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/transport/__init__.py", line 1719, in do_catching_redirections
    return action(transport)
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py", line 706, in find_format
    probers=probers)
  File "/srv/buildbot/lpbuildbot/lp-devel-xenial/build/env/local/lib/python2.7/site-packages/bzrlib/controldir.py", line 1155, in find_format
    raise errors.NotBranchError(path=transport.base)
NotBranchError: Not a branch: "/tmp/tmpdwqrc6/trunk/".

When I investigated this locally, I found that I could reproduce it if I ran just that test on its own, but not if I ran it together with the other tests in the same class. That’s certainly my favourite way round for test isolation failures to present themselves (it’s more usual to find state from one test leaking out and causing another one to fail, which can make for a very time-consuming exercise of trying to find the critical combination), but it’s still pretty odd.

I stepped through the Branch.open call in each case in the hope of some enlightenment. The interesting difference was that the custom probers installed by the bzr-svn plugin weren’t installed when I ran that one test on its own, so it was trying to open a branch as a Bazaar branch rather than using the foreign-branch logic for Subversion, and this presumably depended on some configuration that only some tests put in place. I was on the verge of just explicitly setting up that plugin in the test suite’s setUp method, but I was still curious about exactly what was breaking this.

Launchpad installs several Bazaar plugins, and lib/lp/codehosting/__init__.py is responsible for putting most of these in place: anything in Launchpad itself that uses Bazaar is generally supposed to do something like import lp.codehosting to set everything up. I therefore put a breakpoint at the top of lp.codehosting and stepped through it to see whether anything was going wrong in the initial setup. Sure enough, I found that bzrlib.plugins.svn was failing to import due to an exception raised by bzrlib.i18n.load_plugin_translations, which was being swallowed silently but meant that its custom probers weren’t being installed. Here’s what that function looks like:

def load_plugin_translations(domain):
    """Load the translations for a specific plugin.

    :param domain: Gettext domain name (usually 'bzr-PLUGINNAME')
    """
    locale_base = os.path.dirname(
        unicode(__file__, sys.getfilesystemencoding()))
    translation = install_translations(domain=domain,
        locale_base=locale_base)
    add_fallback(translation)
    return translation

In this case, sys.getfilesystemencoding was returning None, which isn’t a valid encoding argument to unicode. But why would that be? It gave me a sensible result when I ran it from a Python shell in this environment. A bit of head-scratching later and it occurred to me to look at a backtrace:

(Pdb) bt
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(703)<module>()
-> main()
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(694)main()
-> execsitecustomize()
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/site.py(548)execsitecustomize()
-> import sitecustomize
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/env/lib/python2.7/sitecustomize.py(7)<module>()
-> lp_sitecustomize.main()
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp_sitecustomize.py(193)main()
-> dont_wrap_bzr_branch_classes()
  /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp_sitecustomize.py(139)dont_wrap_bzr_branch_classes()
-> import lp.codehosting
> /home/cjwatson/src/canonical/launchpad/lp-branches/testfix/lib/lp/codehosting/__init__.py(54)<module>()
-> load_plugins([_get_bzr_plugins_path()])

I wonder if there’s something interesting about being imported from a sitecustomize hook? Sure enough, when I went to look at Python for where sys.getfilesystemencoding is set up, I found this in Py_InitializeEx:

    if (!Py_NoSiteFlag)
        initsite(); /* Module site */
    ...
#if defined(Py_USING_UNICODE) && defined(HAVE_LANGINFO_H) && defined(CODESET)
    /* On Unix, set the file system encoding according to the
       user's preference, if the CODESET names a well-known
       Python codec, and Py_FileSystemDefaultEncoding isn't
       initialized by other means. Also set the encoding of
       stdin and stdout if these are terminals, unless overridden.  */

    if (!overridden || !Py_FileSystemDefaultEncoding) {
        ...
    }

I moved this out of sitecustomize, and it’s working better now. But did you know that a sitecustomize hook can’t safely use anything that depends on sys.getfilesystemencoding? I certainly didn’t, until it bit me.