From: cjwatson
Date: Mon, 17 Sep 2007 07:28:20 +0000

Encodings in man-db

I've spent some quality upstream time lately with man-db. Specifically, I've been upgrading its locale support. I recently published a pre-release, man-db 2.5.0-pre2, mainly for translators, but other people may be interested in having a look at it as well. I hope to release 2.5.0 quite soon so that all of this can land in Debian.

Firstly, man-db now supports creating and using databases for per-locale hierarchies of manual pages, not just the English one. This means that apropos and whatis can now display information about localised manual pages.

Secondly, I've been working on the transition to UTF-8 manual pages. Modulo some hacks, groff can't yet deal with Unicode input: some possible input characters are reserved for its internal use, which makes full 32-bit input difficult to do properly until that's fixed. However, with a few exceptions, manual pages generally need only the subset of Unicode that corresponds to their language's usual legacy character set, so for now it's good enough to recode on the fly from UTF-8 to an appropriate 8-bit character set and use groff's support for that.
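
As a rough illustration of that recoding step, here is a minimal Python sketch. It is not man-db's code (man-db is written in C); the `recode_for_groff` name and the `LEGACY_CHARSET` table are hypothetical, and the table simply assumes ISO-8859-1 for German and French, ISO-8859-2 for Polish and KOI8-R for Russian. A page that needs characters outside its language's legacy set, one of the "few exceptions", raises an error rather than being silently mangled.

```python
# Hypothetical language -> legacy 8-bit charset table, for illustration only;
# it is an assumption, not man-db's real mapping.
LEGACY_CHARSET = {
    "de": "iso8859-1",
    "fr": "iso8859-1",
    "pl": "iso8859-2",
    "ru": "koi8-r",
}


def recode_for_groff(page: bytes, lang: str) -> bytes:
    """Recode a UTF-8 manual page into its language's legacy 8-bit charset.

    Raises UnicodeDecodeError if the page is not valid UTF-8, and
    UnicodeEncodeError if it uses characters the legacy charset lacks.
    """
    charset = LEGACY_CHARSET.get(lang, "iso8859-1")  # assumed default
    text = page.decode("utf-8")   # interpret the source as UTF-8
    return text.encode(charset)   # emit bytes an 8-bit groff setup can read


if __name__ == "__main__":
    src = "Schlüssel".encode("utf-8")
    print(recode_for_groff(src, "de"))  # b'Schl\xfcssel'
```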

man-db has actually supported this kind of on-the-fly recoding for a while, but it has been difficult to use, since it only applies to /usr/share/man/ll_CC.UTF-8/ directories, while manual pages usually aren't country-specific. So man-db 2.5.0 supports using /usr/share/man/ll.UTF-8/ instead, which is a bit more appropriate. Also, following a discussion with Adam Borowski, man-db can now try decoding manual pages as UTF-8 and fall back to 8-bit encodings even in directories without an explicit encoding tag; if this fails for some reason, you can put a '\" -*- coding: UTF-8 -*- line (a roff comment) at the top of the page.
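
The checks described above might be sketched like this in Python. This is a simplification, not man-db's implementation: the function name `page_encoding`, the choice to let a first-line declaration override a directory tag, and the fallback table are all assumptions made for illustration.

```python
import re

# Hypothetical language -> legacy 8-bit charset table (illustrative only).
LEGACY_CHARSET = {"de": "iso8859-1", "fr": "iso8859-1", "pl": "iso8859-2", "ru": "koi8-r"}

# Matches an Emacs-style declaration such as: '\" -*- coding: UTF-8 -*-
CODING_RE = re.compile(rb"-\*-\s*coding:\s*([-\w.]+)\s*-\*-")


def page_encoding(locale_dir: str, page: bytes) -> str:
    """Pick an encoding for one page, checking in this (assumed) order:

    1. an explicit coding declaration on the page's first line;
    2. an encoding suffix on the locale directory (ll.UTF-8 or ll_CC.UTF-8);
    3. UTF-8 if the bytes decode cleanly, else the language's legacy charset.

    `locale_dir` is the last path component, e.g. "de" for /usr/share/man/de/.
    """
    first_line = page.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    if match:
        return match.group(1).decode("ascii")

    # "de.UTF-8" -> ("de", "UTF-8"); "de_DE" -> ("de_DE", "") with no tag
    locale, _, tag = locale_dir.partition(".")
    if tag:
        return tag
    lang = locale.split("_", 1)[0]

    try:
        page.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return LEGACY_CHARSET.get(lang, "iso8859-1")  # assumed default
```

An all-ASCII page is reported as UTF-8 here, which is harmless because ASCII bytes mean the same thing in either encoding.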

I'm still debating whether Debian policy should recommend installing UTF-8 manual pages in /usr/share/man/ll.UTF-8/ or just in /usr/share/man/ll/. Initially I was very strongly in favour of an explicit encoding declaration, but now that man-db can do a pretty good job of guessing, I'm coming round to Adam Borowski's position that with UTF-8 people should be able to forget about character sets. Opinions here would be welcome. One point I haven't moved on: any design that assumes the encoding of manual pages on the filesystem has anything to do with the user's locale is demonstrably incorrect and broken, so I'm not going to use LC_CTYPE for anything except output. However, "UTF-8, or the usual legacy encoding provided that the latter is not typically confused with the former" may be a good enough specification. I'll try to come down from the fence before unleashing this code on the world.
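
One way to probe that proposed specification is to ask whether text stored in a legacy encoding would also pass as valid UTF-8 and so be misread by a UTF-8-first guess. The hypothetical helper `mistaken_for_utf8` below does that in Python; the sample words are illustrative, not taken from real manual pages.

```python
def mistaken_for_utf8(text: str, legacy: str) -> bool:
    """Return True if `text`, stored in the `legacy` charset, would decode
    as valid UTF-8 to a *different* string, i.e. a UTF-8-first guess
    would silently misread it."""
    raw = text.encode(legacy)
    try:
        as_utf8 = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # the guess fails and correctly falls back to `legacy`
    return as_utf8 != text  # valid UTF-8 but different text means a misread


if __name__ == "__main__":
    samples = [("Schlüssel", "iso8859-1"), ("zażółć", "iso8859-2"), ("Ã©", "iso8859-1")]
    for word, charset in samples:
        print(word, charset, mistaken_for_utf8(word, charset))
```

Ordinary accented words such as "Schlüssel" in ISO-8859-1 fail UTF-8 decoding, so the guess falls back correctly, whereas a contrived ISO-8859-1 pair like "Ã©" happens to form the valid UTF-8 sequence for "é" and would be misread. Pure ASCII is never flagged, since it decodes to the same text either way.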