--- /dev/null
+Encodings in man-db
+
+<p>I've spent some quality upstream time lately with man-db. Specifically,
+I've been upgrading its locale support. I recently published a pre-release,
+<a href="http://people.debian.org/~cjwatson/man-db/man-db-2.5.0-pre2.tar.gz">
+man-db 2.5.0-pre2</a>, mainly for translators, but other people may be
+interested in having a look at it as well. I hope to release 2.5.0 quite
+soon so that all of this can land in Debian.</p>
+
+<p>Firstly, man-db now supports creating and using databases for per-locale
+hierarchies of manual pages, not just English. This means that
+<a href="http://bugs.debian.org/29448">apropos and whatis can now display
+information about localised manual pages</a>.</p>
+
+<p>Secondly, I've been working on the transition to UTF-8 manual pages. Now,
+modulo some hacks, groff can't yet deal with Unicode input: some possible
+input characters are reserved for its internal use, which makes full 32-bit
+input difficult to do properly until that's fixed. However, with a few
+exceptions, manual pages generally just need the subset of Unicode that
+corresponds to their language's usual legacy character set, so for now it's
+good enough to recode on the fly from UTF-8 to some appropriate 8-bit
+character set and use groff's support for that.</p>
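+
+<p>As a rough illustration of that recoding step, here is a minimal Python
+sketch. It is not man-db's code (man-db is written in C); the
+<code>LEGACY_CHARSETS</code> table and the <code>recode_for_groff</code>
+function are hypothetical names invented for this example, and the table
+entries are only plausible guesses at per-language legacy character sets.</p>
+
```python
# Hypothetical language -> legacy 8-bit charset table. Illustrative only;
# this is an assumption, not man-db's real mapping.
LEGACY_CHARSETS = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "ru": "koi8-r",
}


def recode_for_groff(page_bytes: bytes, lang: str) -> bytes:
    """Recode a UTF-8 manual page to the legacy charset assumed for lang."""
    # Fall back to ISO-8859-1 for languages missing from the table.
    charset = LEGACY_CHARSETS.get(lang, "iso-8859-1")
    text = page_bytes.decode("utf-8")
    # Characters outside the 8-bit charset cannot be represented, so they
    # are replaced with "?" rather than aborting the whole page.
    return text.encode(charset, errors="replace")
```
+
+<p>A page that sticks to its language's usual repertoire survives the round
+trip intact; anything outside it degrades to a question mark, which is the
+"with a few exceptions" case above.</p>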
+
+<p>man-db has actually supported this kind of recoding for a while, but
+it's been difficult to use since it only applies to pages in
+<code>/usr/share/man/ll_CC.UTF-8/</code> directories, while manual pages
+usually aren't country-specific. So, man-db 2.5.0 supports using
+<code>/usr/share/man/ll.UTF-8/</code> instead, which is a bit more
+appropriate. Also, following a
+<a href="http://lists.debian.org/debian-mentors/2007/09/msg00245.html">
+discussion with Adam Borowski</a>, man-db can now try decoding manual pages
+as UTF-8 and fall back to 8-bit encodings even in directories without an
+explicit encoding tag; if this fails for some reason, you can put a
+<kbd>'\" -*- coding: UTF-8 -*-</kbd> line at the top of the page.</p>
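+
+<p>The guesswork described above can be sketched roughly as follows. This is
+a Python approximation under stated assumptions, not the actual man-db
+implementation: <code>guess_page_encoding</code>, its <code>legacy</code>
+parameter, and the <code>CODING_TAG</code> pattern are names made up for
+illustration, and the real tag parsing may well differ in detail.</p>
+
```python
import re

# Loosely matches an Emacs-style tag such as:  '\" -*- coding: UTF-8 -*-
# (an assumption about the tag's shape, for illustration only).
CODING_TAG = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def guess_page_encoding(page_bytes: bytes, legacy: str = "iso-8859-1") -> str:
    """Return the encoding a manual page appears to be written in."""
    first_line = page_bytes.split(b"\n", 1)[0]
    # An explicit declaration in a leading roff comment wins outright.
    if first_line.startswith(b"'\\\""):
        match = CODING_TAG.search(first_line)
        if match:
            return match.group(1).decode("ascii")
    # Otherwise try strict UTF-8; typical legacy 8-bit text is rarely
    # valid UTF-8, so a clean decode is a strong hint.
    try:
        page_bytes.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return legacy
```
+
+<p>Note that a pure-ASCII page is reported as UTF-8 here, which is harmless
+since ASCII is a subset of both. The explicit tag only matters for the rare
+legacy page whose bytes happen to form valid UTF-8.</p>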
+
+<p>I'm still debating whether Debian policy should recommend installing
+UTF-8 manual pages in <code>/usr/share/man/ll.UTF-8/</code> or just in
+<code>/usr/share/man/ll/</code>. Initially I was very strongly in favour of
+an encoding declaration, but now that man-db can do a pretty good job of
+guesswork I'm coming round to Adam Borowski's position that people should be
+able to forget about character sets with UTF-8. Opinions here would be
+welcome. One thing I haven't moved on is that any design that assumes that
+the encoding of manual pages on the filesystem has anything to do with the
+user's locale is demonstrably incorrect and broken; I'm not going to use
+<code>LC_CTYPE</code> for anything except output. However, maybe "UTF-8 or
+the usual legacy encoding provided that the latter is not typically confused
+for the former" is a good enough specification. I'll try to come down from
+the fence before unleashing this code on the world.</p>