From d31ce64ae5139a4e36496f75fb93ca9b52517a1a Mon Sep 17 00:00:00 2001
From: cjwatson <>
Date: Mon, 17 Sep 2007 07:28:20 +0000
Subject: [PATCH] Encodings in man-db

Encodings in man-db

I've spent some quality upstream time lately with man-db. Specifically, I've been upgrading its locale support. I recently published a pre-release, man-db 2.5.0-pre2, mainly for translators, but other people may be interested in having a look at it as well. I hope to release 2.5.0 quite soon so that all of this can land in Debian.

Firstly, man-db now supports creating and using databases for per-locale hierarchies of manual pages, not just English. This means that apropos and whatis can now display information about localised manual pages.
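
To make "per-locale hierarchies" concrete, here is a minimal Python sketch of how a locale name might be expanded into candidate hierarchy names under /usr/share/man, most specific first; each such hierarchy would get its own database. The helper candidate_locale_dirs and its search order are hypothetical illustrations, not man-db's actual code (man-db itself is written in C).

```python
def candidate_locale_dirs(locale: str) -> list[str]:
    """Hypothetical helper: locale subdirectories to search, most specific first.

    For example "fr_FR.UTF-8" -> fr_FR.UTF-8, fr_FR, fr.UTF-8, fr.
    The search order is an illustrative assumption, not man-db's exact rule.
    """
    # Ignore any @modifier, e.g. "de_DE@euro" -> "de_DE" (a simplification).
    base = locale.split("@", 1)[0]
    lang_country, _, charset = base.partition(".")
    # The C/POSIX locale means untranslated pages only.
    if lang_country in ("", "C", "POSIX"):
        return []
    lang = lang_country.split("_", 1)[0]
    candidates = []
    for name in (lang_country, lang):
        if charset:
            candidates.append(f"{name}.{charset}")
        candidates.append(name)
    # Remove duplicates (e.g. for a bare "fr") while preserving order.
    return list(dict.fromkeys(candidates))
```

In this sketch a user running under fr_FR.UTF-8 would be offered fr_FR.UTF-8, fr_FR, fr.UTF-8 and fr, with the untranslated pages as the final fallback.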


Secondly, I've been working on the transition to UTF-8 manual pages. Modulo some hacks, groff can't yet deal with Unicode input: some possible input characters are reserved for its internal use, which makes full 32-bit input difficult to support properly until that's fixed. However, with a few exceptions, manual pages generally need only the subset of Unicode that corresponds to their language's usual legacy character set, so for now it's good enough to recode on the fly from UTF-8 to an appropriate 8-bit character set and rely on groff's existing support for that.
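
The on-the-fly recoding amounts to decoding the page as UTF-8 and re-encoding it in the legacy charset that groff is then fed. The sketch below assumes a hypothetical LEGACY_CHARSETS table, an assumed ISO-8859-1 default, and a hypothetical function name recode_for_groff; man-db's real language-to-charset mapping may differ.

```python
# Hypothetical language -> legacy 8-bit charset table (illustrative only).
LEGACY_CHARSETS = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "ru": "koi8-r",
}


def recode_for_groff(page: bytes, lang: str) -> bytes:
    """Recode a UTF-8 manual page into its language's legacy 8-bit charset."""
    # Assumed fallback when the language is not in the table.
    charset = LEGACY_CHARSETS.get(lang, "iso-8859-1")
    text = page.decode("utf-8")  # strict: invalid UTF-8 raises UnicodeDecodeError
    # Characters outside the legacy set raise UnicodeEncodeError.
    return text.encode(charset)
```

A page that uses a character outside its language's legacy set, such as an arrow (U+2192) in a German page, raises UnicodeEncodeError here; those are the "few exceptions" that have to wait for proper Unicode input support in groff.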


man-db has actually supported this kind of recoding for a while, but it has been difficult to use because it only applies to /usr/share/man/ll_CC.UTF-8/ directories, whereas manual pages usually aren't country-specific. So man-db 2.5.0 supports using /usr/share/man/ll.UTF-8/ instead, which is a bit more appropriate. Also, following a discussion with Adam Borowski, man-db can now try decoding manual pages as UTF-8 and fall back to an 8-bit encoding even in directories without an explicit encoding tag; if this guess goes wrong for some reason, you can put a '\" -*- coding: UTF-8 -*- line at the top of the page.
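
The decision just described can be sketched as a single function. The name guess_page_encoding, the regular expressions, and especially the order of the checks (first-line declaration, then directory tag, then a strict UTF-8 trial decode with a legacy fallback) are assumptions made for this illustration; man-db's C implementation may order or refine these steps differently. The sketch also assumes the usual <root>/<locale>/man<N>/<page> layout.

```python
import re
from pathlib import PurePosixPath

# Emacs-style first-line declaration, e.g.  '\" -*- coding: UTF-8 -*-
CODING_RE = re.compile(rb"-\*-.*?\bcoding:\s*([A-Za-z0-9_.:-]+).*?-\*-")

# Locale directory carrying an explicit charset tag, in either the
# ll.charset form ("fr.UTF-8") or the ll_CC.charset form ("pl_PL.ISO-8859-2").
TAGGED_DIR_RE = re.compile(r"^[a-z]{2,3}(?:_[A-Z]{2})?\.(.+)$")


def guess_page_encoding(path: str, data: bytes, legacy: str) -> str:
    """Hypothetical sketch: choose the charset used to decode a manual page."""
    # 1. An explicit coding declaration on the first line wins.
    first_line = data.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    if match:
        return match.group(1).decode("ascii")

    # 2. Otherwise use the charset tag on the locale directory, if any.
    #    Assumes the layout <root>/<locale>/man<N>/<page>.
    match = TAGGED_DIR_RE.match(PurePosixPath(path).parent.parent.name)
    if match:
        return match.group(1)

    # 3. Untagged directory: accept the page as UTF-8 if it decodes strictly,
    #    otherwise fall back to the language's usual legacy charset.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return legacy
    return "UTF-8"
```

For instance, a Latin-1 page in /usr/share/man/de/man1/ falls through to the legacy charset, because its accented bytes do not form valid UTF-8.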


I'm still debating whether Debian policy should recommend installing UTF-8 manual pages in /usr/share/man/ll.UTF-8/ or just in /usr/share/man/ll/. Initially I was very strongly in favour of an explicit encoding declaration, but now that man-db can do a pretty good job of guessing, I'm coming round to Adam Borowski's position that with UTF-8 people should be able to forget about character sets. Opinions here would be welcome. One thing I haven't moved on is this: any design that assumes the encoding of manual pages on the filesystem has anything to do with the user's locale is demonstrably broken, so I'm not going to use LC_CTYPE for anything except output. Still, "UTF-8, or the usual legacy encoding provided that the latter is not typically confused for the former" may be a good enough specification. I'll try to come down off the fence before unleashing this code on the world.
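
The proviso "not typically confused for the former" can be checked mechanically. The hypothetical helper below asks whether a string, once stored in a legacy charset, would also decode cleanly as UTF-8 to a different string; if so, a "try UTF-8 first" guess would misread it. It is a sketch of the criterion only, not man-db code.

```python
def confusable_with_utf8(text: str, legacy: str) -> bool:
    """Hypothetical check: would `text`, stored in `legacy`, pass as UTF-8?

    Returns True when the legacy bytes also decode cleanly as UTF-8 but to a
    different string, i.e. a "try UTF-8 first" guess would misread them.
    """
    raw = text.encode(legacy)
    try:
        return raw.decode("utf-8") != text
    except UnicodeDecodeError:
        # Not valid UTF-8, so the fallback to the legacy charset kicks in.
        return False
```

Ordinary Latin-1 text such as "café" is not confusable, since a lone byte 0xE9 is not valid UTF-8, whereas the unusual pair "Ã©" encodes to 0xC3 0xA9, which is exactly the UTF-8 encoding of "é". Sequences like that are rare in real manual pages, which is why the guess works well for Latin-1; a legacy encoding whose ordinary text often looks like valid UTF-8 would fail the proviso.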

-- 2.30.2