Chapter 3: Halibut input format

This chapter describes the format in which you should write documents to be processed by Halibut.

3.1 The basics

Halibut's input files mostly look like ordinary ASCII text files; you can edit them with any text editor you like.

Writing paragraphs of ordinary text is very simple: you just write ordinary text in the ordinary way. You can wrap a paragraph across more than one line using line breaks in the text file, and Halibut will ignore this when it rewraps the paragraph for each output format. To separate paragraphs, use a blank line (i.e. two consecutive line breaks). For example, a fragment of Halibut input looking like this:

This is a line of text.
This is another line of text.

This line is separated from the previous one by a blank line.

will produce two paragraphs looking like this:

This is a line of text. This is another line of text.

This line is separated from the previous one by a blank line.

The first two lines of the input have been merged together into a single paragraph, and the line break in the input file was treated identically to the spaces between the individual words.
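
If it helps to see that rule spelled out, here is a minimal Python sketch of it. This is only an illustration, not Halibut's actual parser: the function name split_paragraphs is made up for this example, and the collapsing of repeated spaces into one is an assumption on my part.

```python
import re


def split_paragraphs(source: str) -> list[str]:
    """Split input text into paragraphs, following the rule described above.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    paragraphs = []
    # A blank line (two or more consecutive line breaks) separates paragraphs.
    for block in re.split(r"\n{2,}", source.strip()):
        # Within a paragraph, a line break counts the same as a space.
        # Assumption: runs of whitespace collapse to a single space.
        paragraphs.append(" ".join(block.split()))
    return paragraphs


source = """This is a line of text.
This is another line of text.

This line is separated from the previous one by a blank line.
"""

for paragraph in split_paragraphs(source):
    print(paragraph)
```

Run on the fragment above, this prints the same two paragraphs, each on a single line.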

Halibut is designed to have very few special characters. The only printable characters in Halibut input which will not be treated exactly literally in the output are the backslash (\) and the braces ({ and }). If you do not use these characters, everything else you might type in normal ASCII text is perfectly safe. If you do need to use any of those three characters in your document, you will have to precede each one with a backslash. Hence, for example, you could write

This \\ is a backslash, and these are \{braces\}.

and Halibut would generate the text

This \ is a backslash, and these are {braces}.
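
If you generate Halibut input from a script, you may want to apply this escaping automatically. The following Python sketch shows one way to do it; the function name halibut_escape is hypothetical, and it is intended only for ordinary paragraph text, not for contexts with their own quoting rules.

```python
def halibut_escape(text: str) -> str:
    """Precede each backslash or brace with a backslash.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    return "".join("\\" + ch if ch in "\\{}" else ch for ch in text)


# The desired output text, written as a raw string so that Python
# leaves the backslash alone.
plain = r"This \ is a backslash, and these are {braces}."

print(halibut_escape(plain))
# This \\ is a backslash, and these are \{braces\}.
```

The printed line is exactly the input shown in the example above.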

If you want to write your input file in a character set other than ASCII, you can do so by using the \cfg{input-charset} command. See section 3.6 for details of this.

3.2 Simple inline formatting commands

Halibut formatting commands all begin with a backslash, followed by a word or character identifying the command. Some of them then use braces to surround one or more pieces of text acted on by the command. (In fact, the \\, \{ and \} sequences you met in section 3.1 are themselves formatting commands.)

This section describes some simple formatting commands you can use in Halibut documents. The commands in this section are inline commands, which means you can use them in the middle of a paragraph. Section 3.3 describes some paragraph commands, which affect a whole paragraph at a time.

Many of these commands are followed by a pair of braces surrounding some text. In all cases, it is perfectly safe to have a line break (in the input file) within those braces; Halibut will treat that exactly the same as a space. For example, these two paragraphs will be treated identically:

Here is some \e{emphasised
text}.

Here is some \e{emphasised text}.

3.2.1 `\e` and `\s`: Emphasising text

Possibly the most obvious piece of formatting you might want to use in a document is emphasis. To emphasise text, you use the \e command, and follow it up with the text to be emphasised in braces. For example, the first sentence in this paragraph was generated using the Halibut input

Possibly the most obvious piece of formatting you might want
to use in a document is \e{emphasis}.

A second form of emphasis is supported, called strong text. You can use the \s command for this type of emphasis. Typically, in output formats, \e will give italics, and \s will give bold.

3.2.2 `\c` and `\cw`: Displaying computer code inline

Halibut was primarily designed to produce software manuals. It can be used for other types of document as well, but software manuals are its speciality.

In software manuals, you often want to format text in a way that indicates that it is something you might see displayed verbatim on a computer screen. In printed manuals, this is typically done by setting that text in a font which is obviously fixed-width. This provides a visual cue that the text being displayed is code, and it also ensures that punctuation marks are clearly separated and shown individually (so that a user can copy the text accurately and conveniently).

Halibut provides two commands for this, which are subtly different. The names of those commands are \c (‘code’) and \cw (‘weak code’). You use them just like \e and \s, by following them with some text in braces. For example, this...

This sentence contains some \c{code} and some \cw{weak code}.

... produces this:

This sentence contains some code and some weak code.

The distinction between code and weak code is mainly important when producing plain text output. Plain text output is typically viewed in a fixed-width font, so there is no need (and no way) to change font in order to make the order of punctuation marks clear. However, marking text as code is also sometimes done to provide a visual distinction between it and the text around it, so that the reader knows where the literal computer text starts and stops; and in plain text, this cannot be done by changing font, so there needs to be an alternative way.

So in the plain text output format, things marked as code (\c) will be surrounded by quote marks, so that it's obvious where they start and finish. Things marked as weak code (\cw) will not look any different from normal text.

I recommend using weak code for any application where it is obvious that the text is literal computer input or output. For example, if the text is capitalised, that's usually good enough. If I talk about the Pentium's EAX and EDX registers, for example, you don't need quotes to notice that those are special; so I would write that in Halibut as ‘the Pentium's \cw{EAX} and \cw{EDX} registers’. But if I'm talking about the Unix command man, which is an ordinary English word in its own right, a reader might be slightly confused if it appeared in the middle of a sentence undecorated; so I would write that as ‘the Unix command \c{man}’.

In summary:

\c means ‘this text must be visually distinct from the text around it’. Halibut's various output formats will do this by changing the font if possible, or by using quotes if not.
\cw means ‘it would be nice to display this text in a fixed-width font if possible, but it's not essential’.

In really extreme cases, you might want Halibut to use quotation marks even in output formats which can change font. In section 3.2.5, for example, I mention the special formatting command ‘\.’. If that appeared at the end of a sentence without the quotes, then the two adjacent full stops would look pretty strange even if they were obviously in different fonts.

For this, Halibut supports the \cq command, which is exactly equivalent to using \q to provide quotes and then using \cw inside the quotes. So in the paragraph above, for example, I wrote

the special formatting command \cq{\\.}.

and I could equivalently have written

the special formatting command \q{\cw{\\.}}.

There is a separate mechanism for displaying computer code in an entire paragraph; see section 3.3.1 for that one.

3.2.3 `\q`: Quotation marks

Halibut's various output formats don't all use the same conventions for displaying text in ordinary quotation marks (‘like these’). Some output formats have access to proper matched quote characters, whereas others are restricted to using plain ASCII. Therefore, it is not ideal to use the ordinary ASCII double quote character in your document (although you can if you like).

Halibut provides the formatting command \q to indicate quoted text. If you write

Here is some \q{text in quotes}.

then Halibut will print

Here is some ‘text in quotes’.

and in every output format Halibut generates, it will choose the best quote characters available to it in that format. (The quote characters to use can be configured with the \cfg command.)

You can still use the ordinary quote characters of your choice if you prefer; or you could even use the \u command (see section 3.2.7) to generate Unicode matched quotes (single or double) in a way which will automatically fall back to the normal ASCII ones if they aren't available. But I recommend using the built-in \q command in most cases, because it's simple and does the best it can everywhere.

If you're using the \c or \cw commands to display literal computer code, you will probably want to use literal ASCII quote characters, because it is likely to matter precisely which quote character you use. In fact, Halibut actually disallows the use of \q within either of \c and \cw, since this simplifies some of the output formats.

3.2.4 `\-` and `\_`: Non-breaking hyphens and spaces

If you use an ordinary hyphen in the middle of a word (such as ‘built-in’), Halibut's output formats will feel free to break a line after that hyphen when wrapping paragraphs. This is fine for a word like ‘built-in’, but if you were displaying some literal computer code such as the Emacs command M-x psychoanalyze-pinhead, you might prefer to see the whole hyphenated word treated as an unbreakable block. In some cases, you might even want to prevent the space in that command from becoming a line break.

For these purposes, Halibut provides the commands \- and \_, which generate a non-breaking hyphen and a non-breaking space respectively. So the above Emacs command might be written as

the Emacs command \c{M\-x\_psychoanalyze\-pinhead}

Unfortunately, some of Halibut's output formats do not support non-breaking hyphens, and others don't support breaking hyphens! So Halibut cannot promise to honour these commands in all situations. All it can do is make a best effort.

3.2.5 `\date`: Automatic date generation

Sometimes you might want your document to give an up-to-date indication of the date on which it was run through Halibut.

Halibut supplies the \date command to do this. In its simplest form, you simply say

This document was generated on \date.

and Halibut generates something like

This document was generated on Mon 15 May 2017 08:50:01 BST.

You can follow the \date command directly with punctuation (as in this example, where it is immediately followed by a full stop), but if you try to follow it with an alphabetic or numeric character (such as writing \dateZ) then Halibut will assume you are trying to invoke the name of a macro command you have defined yourself, and will complain if no such command exists. To get round this you can use the special ‘\.’ do-nothing command. See section 3.7 for more about general Halibut command syntax and ‘\.’.

If you would prefer the date to be generated in a specific format, you can follow the \date command with a format specification in braces. The format specification will be run through the standard C function strftime, so any format acceptable to that function is acceptable here as well. I won't document the format here, because the details vary from computer to computer (although there is a standard core which should be supported everywhere). You should look at your local system's manual for strftime for details.

Here's an example which generates the date in the international standard ISO 8601 format:

This document was generated on \date{%Y-%m-%d %H:%M:%S}.

And here's some sample output from that command:

This document was generated on 2017-05-15 08:50:01.
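
If you want to try out a format specification before putting it in your document, one convenient way is a scripting language whose strftime hands the format to the C library. The Python sketch below does that for the format used above. The fixed timestamp is chosen purely to match the sample output, and specifiers outside the standard core may behave differently on your system.

```python
from datetime import datetime

# A fixed timestamp, chosen to match the sample output above.
stamp = datetime(2017, 5, 15, 8, 50, 1)

# The same format specification as in the \date example.
formatted = stamp.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2017-05-15 08:50:01
```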

3.2.6 `\W`: WWW hyperlinks

Since one of Halibut's output formats is HTML, it's obviously useful to be able to provide links to arbitrary web sites in a Halibut document.

This is done using the \W command. \W expects to be followed by two sets of braces. In the first set of braces you put a URL; in the second set you put the text which should be a hyperlink. For example, you might write

Try searching on \W{http://www.google.com/}{Google}.

and Halibut would generate

Try searching on Google.

Note that hyperlinks, like the non-breaking commands discussed in section 3.2.4, are discretionary: if an output format does not support them then they will just be left out completely. So unless you're only intending to use the HTML output format, you should avoid storing vital content in the URL part of a \W command. The Google example above is reasonable (because most users are likely to be able to find Google for themselves even without a convenient hyperlink leading straight there), but if you really need to direct users to a specific web site, you will need to give the URL in actual displayed text (probably displayed as code as well). However, there's nothing to stop you making it a hyperlink as well for the convenience of HTML readers.

The \W command supports a piece of extra syntax to make this convenient for you. You can specify \c or \cw between the first and second pairs of braces. For example, you might write

Google is at \W{http://www.google.com/}\cw{www.google.com}.

and Halibut would produce

Google is at www.google.com.

If you want the link text to be an index term as well, you can also specify \i or \ii; this has to come before \c or \cw if both are present. (See section 3.5 for more about indexing.)

3.2.7 `\u`: Specifying arbitrary Unicode characters

Halibut has extensive support for Unicode and character set conversion. You can specify any (reasonably well known) character set for your input document, and Halibut will convert it all to Unicode as it reads it in. See section 3.6 for more details of this.

If you need to specify a Unicode character in your input document which is not supported by the input character set you have chosen, you can use the \u command to do this. \u expects to be followed by a sequence of hex digits; so that \u0041, for example, denotes the Unicode character 0x0041, which is the capital letter A.

If a Unicode character specified in this way is not supported in a particular output format, you probably don't just want it to be omitted. So you can put a pair of braces after the \u command containing fallback text. For example, to specify an amount of money in euros, you might write this:

This is likely to cost \u20AC{EUR\_}2500 at least.

Halibut will render that as a Euro sign if available, and the text ‘EUR ’ if not. In the output format you're currently reading in, the above input generates this:

This is likely to cost €2500 at least.

If you read it in other formats, you may see different results.
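
The fallback idea can be sketched in a few lines of Python. This is an illustration of the principle only: the function render_unicode is hypothetical, and it assumes that "supported" simply means "can be encoded in the output character set", which is a simplification of whatever each Halibut output format really does. The fallback here uses an ordinary space rather than a non-breaking one.

```python
def render_unicode(hex_digits: str, fallback: str, charset: str) -> str:
    """Return the character with the given hex code point, or the fallback
    text if that character cannot be encoded in the given character set.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    ch = chr(int(hex_digits, 16))
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return fallback
    return ch


print(render_unicode("0041", "?", "ascii") + ".")
print("This is likely to cost " + render_unicode("20AC", "EUR ", "utf-8") + "2500 at least.")
print("This is likely to cost " + render_unicode("20AC", "EUR ", "ascii") + "2500 at least.")
```

The second call yields the Euro sign, while the third falls back to the text in braces because ASCII has no Euro sign.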

3.2.8 `\k` and `\K`: Cross-references to other sections

Section 1.2 mentions that Halibut numbers the sections of your document automatically, and can generate cross-references to them on request. \k and \K are the commands used to generate those cross-references.

To use one of these commands, you simply follow it with a pair of braces containing the keyword for the section in question. For example, you might write something like

\K{input-xref} expands on \k{intro-features}.

and Halibut would generate something like

Section 3.2.8 expands on section 1.2.

The keywords input-xref and intro-features are section keywords used in this manual itself. In your own document, you would have supplied a keyword for each one of your own sections, and you would provide your own keywords for the \k command to work on.

The difference between \k and \K is simply that \K starts the cross-reference text with a capital letter; so you would use \K at the beginning of a sentence, and \k everywhere else.

In output formats which permit it, cross-references act as hyperlinks, so that clicking the mouse on a cross-reference takes you straight to the referenced section.

The \k commands are also used for referring to entries in a bibliography (see section 3.4 for more about bibliographies), and can also be used for referring to an element of a numbered list by its number (see section 3.3.2.2 for more about numbered lists).

See section 3.3.5 for more about chapters and sections.

3.2.9 `\#`: Inline comments

If you want to include comments in your Halibut input, to be seen when reading it directly but not copied into the output text, then you can use \# to do this. If you follow \# with text in braces, that text will be ignored by Halibut.

For example, you might write

The typical behaviour of an antelope \#{do I mean
gazelle?} is...

and Halibut will simply leave out the aside about gazelles, and will generate nothing but

The typical behaviour of an antelope is...

This command will respect nested braces, so you can use it to comment out sections of Halibut markup:

This function is \#{very, \e{very}} important.

In this example, the comment lasts until the final closing brace (so that the whole ‘very, very’ section is commented out).
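
To make the brace-matching rule concrete, here is a small Python sketch that strips inline comments from a piece of text by counting nested braces. It is not how Halibut itself is implemented; the function name strip_inline_comments is invented for this example, and the treatment of escaped braces inside a comment is my assumption.

```python
def strip_inline_comments(text: str) -> str:
    """Remove \\#{...} comments, honouring nested braces.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\#{", i):
            # Skip to the brace that closes the comment, counting nesting.
            depth = 1
            i += 3
            while i < len(text) and depth > 0:
                if text[i] == "\\" and i + 1 < len(text):
                    # Assumption: an escaped character is never a brace.
                    i += 2
                    continue
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                i += 1
        elif text[i] == "\\" and i + 1 < len(text):
            # Copy any other backslash sequence through untouched.
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


stripped = strip_inline_comments(r"This function is \#{very, \e{very}} important.")
# Tidy the doubled space left where the comment used to be.
print(" ".join(stripped.split()))
```

The inner \e{very} raises the nesting depth to two, so the comment ends only at the second closing brace.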

The \# command can also be used to produce a whole-paragraph comment; see section 3.3.7 for details of that.

3.3 Paragraph-level commands

This section describes Halibut commands which affect an entire paragraph, or sometimes even more than one paragraph, at a time.

3.3.1 `\c`: Displaying whole paragraphs of computer code

Section 3.2.2 describes a mechanism for displaying computer code in the middle of a paragraph, a few words at a time.

However, this is often not enough. Often, in a computer manual, you really want to show several lines of code in a display paragraph.

This is also done using the \c command, in a slightly different way. Instead of using it in the middle of a paragraph followed by braces, you can use it at the start of each line of a paragraph. For example, you could write

\c #include <stdio.h>
\c
\c int main(int argc, char **argv) {
\c     printf("hello, world\n");
\c     return 0;
\c }

and Halibut would generate

#include <stdio.h>

int main(int argc, char **argv) {
    printf("hello, world\n");
    return 0;
}

Note that the above paragraph makes use of a backslash and a pair of braces, and does not need to escape them in the way described in section 3.1. This is because code paragraphs formatted in this way are a special case; the intention is that you can just copy and paste a lump of code out of your program, put ‘\c ’ at the start of every line, and simply not have to worry about the details - you don't have to go through the whole block looking for characters to escape.

Since a backslash inside a code paragraph generates a literal backslash, this means you cannot use any other Halibut formatting commands inside a code paragraph. In particular, if you want to emphasise or strengthen a particular word in the paragraph, you can't do that using \e or \s (section 3.2.1) in the normal way.

Therefore, Halibut provides an alternative means of emphasis in code paragraphs. Each line beginning with \c can optionally be followed by a single line beginning with \e, indicating the emphasis in that line. The emphasis line contains the letters b and i (for ‘bold’ and ‘italic’, although some output formats might render i as underlining instead of italics), positioned to line up under the parts of the text that you want emphasised.

For example, if you wanted to do syntax highlighting on the above C code by highlighting the preprocessor command in italic and the keywords in bold, you might do it like this:

\c #include <stdio.h>
\e iiiiiiiiiiiiiiiiii
\c
\c int main(int argc, char **argv) {
\e bbb      bbb       bbbb
\c     printf("hello, world\n");
\c     return 0;
\e     bbbbbb
\c }

and Halibut would generate:

#include <stdio.h>

int main(int argc, char **argv) {
    printf("hello, world\n");
    return 0;
}

Note that not every \c line has to be followed by a \e line; they're optional.

Also, note that highlighting within a code paragraph is discretionary. Not all of Halibut's output formats can support it (plain text, in particular, has no sensible way to do it). Unless you know you are using a restricted range of output formats, you should use highlighting in code paragraphs only as a visual aid, and not rely on it to convey any vital semantic content.

3.3.2 `\b`, `\n`, `\dt`, `\dd`, `\lcont`: Lists

Halibut supports bulleted lists, numbered lists and description lists.

3.3.2.1 `\b`: Bulleted lists

To create a bulleted list, you simply prefix each paragraph describing a bullet point with the command \b. For example, this Halibut input:

Here's a list:

\b One.

\b Two.

\b Three.

would produce this Halibut output:

Here's a list:

• One.

• Two.

• Three.

3.3.2.2 `\n`: Numbered lists

Numbered lists are just as simple: instead of \b, you use \n, and Halibut takes care of getting the numbering right for you. For example:

Here's a list:

\n One.

\n Two.

\n Three.

This produces the Halibut output:

Here's a list:

1. One.

2. Two.

3. Three.

The disadvantage of having Halibut sort out the list numbering for you is that if you need to refer to a list item by its number, you can't reliably know the number in advance (because if you later add another item at the start of the list, the numbers will all change). To get round this, Halibut allows an optional keyword in braces after the \n command. This keyword can then be referenced using the \k or \K command (see section 3.2.8) to provide the number of the list item. For example:

Here's a list:

\n One.

\n{this-one} Two.

\n Three.

\n Now go back to step \k{this-one}.

This produces the following output:

Here's a list:

1. One.

2. Two.

3. Three.

4. Now go back to step 2.

The keyword you supply after \n is allowed to contain escaped special characters (\\, \{ and \}), but should not contain any other Halibut markup. It is intended to be a word or two of ordinary text. (This also applies to keywords used in other commands, such as \B and \C.)

3.3.2.3 `\dt` and `\dd`: Description lists

To write a description list, you prefix alternate paragraphs with the \dt (‘described thing’) and \dd (description) commands. For example:

\dt Pelican

\dd This is a large bird with a big beak.

\dt Panda

\dd This isn't.

This produces the following output:

Pelican

This is a large bird with a big beak.

Panda

This isn't.

If you really want to, you are allowed to use \dt and \dd without strictly interleaving them (multiple consecutive \dts or consecutive \dds, or a description list starting with \dd or ending with \dt). This is probably most useful if you are listing a sequence of things with \dt, but only some of them actually need \dd descriptions. You should not use multiple consecutive \dds to provide a multi-paragraph definition of something; that's what \lcont is for, as explained in section 3.3.2.4.

3.3.2.4 Continuing list items into further paragraphs

All three of the above list types assume that each list item is a single paragraph. For a short, snappy list in which each item is likely to be only one or two words, this is perfectly sufficient; but occasionally you will find you want to include several paragraphs in a single list item, or even to nest other types of paragraph (such as code paragraphs, or other lists) inside a list item.

To do this, you use the \lcont command. This is a command which can span multiple paragraphs.

After the first paragraph of a list item, include the text \lcont{. This indicates that the subsequent paragraph(s) are a continuation of the list item that has just been seen. So you can include further paragraphs, and eventually include a closing brace } to finish the list continuation. After that, you can either continue adding other items to the original list, or stop immediately and return to writing normal paragraphs of text.

Here's a (long) example.

Here's a list:

\n One. This item is followed by a code paragraph:

\lcont{

\c code
\c paragraph

}

\n Two. Now when I say \q{two}, I mean:

\lcont{

\n Two, part one.

\n Two, part two.

\n Two, part three.

}

\n Three.

The output produced by this fragment is:

Here's a list:

1. One. This item is followed by a code paragraph:

   code
   paragraph

2. Two. Now when I say ‘two’, I mean:

   1. Two, part one.

   2. Two, part two.

   3. Two, part three.

3. Three.

This syntax might seem a little bit inconvenient, and perhaps counter-intuitive: you might expect the enclosing braces to have to go around the whole list item, rather than everything except the first paragraph.

\lcont is a recent addition to the Halibut input language; previously, all lists were required to use no more than one paragraph per list item. So it's certainly true that this feature looks like an afterthought because it is an afterthought, and it's possible that if I'd been designing the language from scratch with multiple-paragraph list items in mind, I would have made it look different.

However, the advantage of doing it this way is that no enclosing braces are required in the common case: simple lists with only one paragraph per item are really, really easy to write. So I'm not too unhappy with the way it turned out; it obeys the doctrine of making simple things simple, and difficult things possible.

Note that \lcont can only be used on \b, \n and \dd paragraphs; it cannot be used on \dt.

3.3.3 `\rule`: Horizontal rules

The command \rule, appearing on its own as a paragraph, will cause a horizontal rule to be drawn, like this:

Some text.

\rule

Some more text.

This produces the following output:

Some text.

------------------------------------------------------------

Some more text.

3.3.4 `\quote`: Indenting multiple paragraphs as a long quotation

Quoting verbatim text using a code paragraph (section 3.3.1) is not always sufficient for your quoting needs. Sometimes you need to quote some normally formatted text, possibly in multiple paragraphs. This is similar to HTML's <BLOCKQUOTE> command.

To do this, you can use the \quote command. Like \lcont, this is a command which expects to enclose at least one paragraph and possibly more. Simply write \quote{ at the beginning of your quoted section, and } at the end, and the paragraphs in between will be formatted to indicate that they are a quotation.

(This very manual, in fact, uses this feature a lot: all of the examples of Halibut input followed by Halibut output have the output quoted using \quote.)

Here's some example Halibut input:

In \q{Through the Looking Glass}, Lewis Carroll wrote:

\quote{

\q{The question is,} said Alice, \q{whether you \e{can} make
words mean so many different things.}

\q{The question is,} said Humpty Dumpty, \q{who is to be
master - that's all.}

}

So now you know.

The output generated by this is:

In ‘Through the Looking Glass’, Lewis Carroll wrote:

‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’

‘The question is,’ said Humpty Dumpty, ‘who is to be master - that's all.’

So now you know.

3.3.5 `\C`, `\H`, `\S`, `\A`, `\U`: Chapter and section headings

Section 1.2 mentions that Halibut numbers the sections of your document automatically, and can generate cross-references to them on request; section 3.2.8 describes the \k and \K commands used to generate the cross-references. This section describes the commands used to set up the sections in the first place.

A paragraph beginning with the \C command defines a chapter heading. The \C command expects to be followed by a pair of braces containing a keyword for the chapter; this keyword can then be used with the \k and \K commands to generate cross-references to the chapter. After the closing brace, the rest of the paragraph is used as the displayed chapter title. So the heading for the current chapter of this manual, for example, is written as

\C{input} Halibut input format

and this allows me to use the command \k{input} to generate a cross-reference to that chapter somewhere else.

The keyword you supply after one of these commands is allowed to contain escaped special characters (\\, \{ and \}), but should not contain any other Halibut markup. It is intended to be a word or two of ordinary text. (This also applies to keywords used in other commands, such as \B and \n.)

The next level down from \C is \H, for ‘heading’. This is used in exactly the same way as \C, but section headings defined with \H are considered to be part of a containing chapter, and will be numbered with a pair of numbers. After \H comes \S, and if necessary you can then move on to \S2, \S3 and so on.

For example, here's a sequence of heading commands. Normally these commands would be separated at least by blank lines (because each is a separate paragraph), and probably also by body text; but for the sake of brevity, both of those have been left out in this example.

\C{foo} Using Foo
\H{foo-intro} Introduction to Foo
\H{foo-running} Running the Foo program
\S{foo-inter} Running Foo interactively
\S{foo-batch} Running Foo in batch mode
\H{foo-trouble} Troubleshooting Foo
\C{bar} Using Bar instead of Foo

This would define two chapters with keywords foo and bar, which would end up being called Chapter 1 and Chapter 2 (unless there were other chapters before them). The sections foo-intro, foo-running and foo-trouble would be referred to as Section 1.1, Section 1.2 and Section 1.3 respectively; the subsections foo-inter and foo-batch would be Section 1.2.1 and Section 1.2.2. If there had been a \S2 command within one of those, it would have been something like Section 1.2.1.1.
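
The numbering scheme in this example can be reproduced with a short Python sketch. It is only an illustration of the rules just described: the function names are made up, appendices and unnumbered chapters are left out, and padding with zero when a level is skipped is my assumption rather than documented behaviour.

```python
def heading_depth(command: str) -> int:
    """Map a heading command to a nesting depth: C=0, H=1, S=2, S2=3, ..."""
    if command == "C":
        return 0
    if command == "H":
        return 1
    if command == "S":
        return 2
    if command.startswith("S") and command[1:].isdigit():
        # S0 is a synonym for H and S1 for S, so Sn sits at depth n + 1.
        return int(command[1:]) + 1
    raise ValueError(f"unknown heading command: {command}")


def number_headings(headings):
    """Return a dict mapping each keyword to its cross-reference text.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    counters = []  # counters[d] is the current number at depth d
    numbered = {}
    for command, keyword in headings:
        depth = heading_depth(command)
        # Forget the numbering of anything nested deeper than this heading.
        del counters[depth + 1:]
        # Assumption: a skipped level is padded with zero.
        while len(counters) <= depth:
            counters.append(0)
        counters[depth] += 1
        kind = "Chapter" if depth == 0 else "Section"
        numbered[keyword] = kind + " " + ".".join(str(n) for n in counters)
    return numbered


headings = [
    ("C", "foo"), ("H", "foo-intro"), ("H", "foo-running"),
    ("S", "foo-inter"), ("S", "foo-batch"), ("H", "foo-trouble"),
    ("C", "bar"),
]
for keyword, name in number_headings(headings).items():
    print(keyword, "->", name)
```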

If you don't like the switch from \H to \S, you can use \S1 as a synonym for \S and \S0 as a synonym for \H. Chapters are still designated with \C, because they need to be distinguished from other types of chapter such as appendices. (Personally, I like the \C,\H,\S notation because it encourages me to think of my document as a hard disk :-)

You can define an appendix by using \A in place of \C. This is no different from a chapter except that it's given a letter instead of a number, and cross-references to it will say ‘Appendix A’ instead of ‘Chapter 9’. Subsections of an appendix will be numbered ‘A.1’, ‘A.2’, ‘A.2.1’ and so on.

If you want a particular section to be referred to as something other than a ‘chapter’, ‘section’ or ‘appendix’, you can include a second pair of braces after the keyword. For example, if you're writing a FAQ chapter and you want cross-references between questions to refer to ‘question 1.2.3’ instead of ‘section 1.2.3’, you can write each section heading as

\S{question-about-fish}{Question} What about fish?

(The word ‘Question’ should be given with an initial capital letter. Halibut will lower-case it when you refer to it using \k, and will leave it alone if you use \K.)

This technique allows you to change the designation of particular sections. To make an overall change in what every section is called, see section 3.6.

Finally, the \U command defines an unnumbered chapter. These sometimes occur in books, for specialist purposes such as ‘Bibliography’ or ‘Acknowledgements’. \U does not expect a keyword argument, because there is no sensible way to generate an automatic cross-reference to such a chapter anyway.

3.3.6 `\copyright`, `\title`, `\versionid`: Miscellaneous blurb commands

These three commands define a variety of special paragraph types. They are all used in the same way: you put the command at the start of a paragraph, and then just follow it with normal text, like this:

\title My First Manual

The three special paragraph types are:

\title: This defines the overall title of the entire document. This title is treated specially in some output formats (for example, it's used in a <TITLE> tag in the HTML output), so it needs a special paragraph type to point it out.
\copyright: This command indicates that the paragraph attached to it contains a copyright statement for the document. This text is displayed inline where it appears, exactly like a normal paragraph; but in some output formats it is given additional special treatment. For example, Windows Help files have a standard slot in which to store a copyright notice, so that other software can display it prominently.
\versionid: This command indicates that the paragraph contains a version identifier, such as those produced by CVS (of the form $Id: thingy.but,v 1.6 2004/01/01 16:47:48 simon Exp $ ). This text will be tucked away somewhere unobtrusive, so that anyone wanting to (for example) report errors to the document's author can pick out the version IDs and send them as part of the report, so that the author can tell at a glance which revision of the document is being discussed.

3.3.7 `\#`: Whole-paragraph comments

Section 3.2.9 describes the use of the \# command to put a short comment in the middle of a paragraph.

If you need to use a long comment, Halibut also allows you to use \# without braces, to indicate that an entire paragraph is a comment, like this:

Here's a (fairly short) paragraph which will be displayed.

\# Here's a comment paragraph which will not be displayed, no
matter how long it goes on. All I needed to indicate this was
the single \# at the start of the paragraph; I don't need one
on every line or anything like that.

Here's another displayed paragraph.

When run through Halibut, this produces the following output:

Here's a (fairly short) paragraph which will be displayed.

Here's another displayed paragraph.

3.4 Creating a bibliography

If you need your document to refer to other documents (research papers, books, websites, whatever), you might find a bibliography feature useful.

You can define a bibliography entry using the \B command. This looks very like the \C command and friends: it expects a keyword in braces, followed by some text describing the document being referred to. For example:

\B{freds-book} \q{The Taming Of The Mongoose}, by Fred Bloggs.
Published by Paperjam & Notoner, 1993.

If this bibliography entry appears in the finished document, it will look something like this:

[1] ‘The Taming Of The Mongoose’, by Fred Bloggs. Published by Paperjam & Notoner, 1993.

I say ‘if’ above because not all bibliography entries defined using the \B command will necessarily appear in the finished document. They only appear if they are referred to by a \k command (see section 3.2.8). This allows you to (for example) maintain a single Halibut source file with a centralised database of all the references you have ever needed in any of your writings, include that file in every document you feed to Halibut, and have it only produce the bibliography entries you actually need for each particular document. (In fact, you might even want this centralised source file to be created automatically by, say, a Perl script from BibTeX input, so that you can share the same bibliography with users of other formatting software.)

If you really want a bibliography entry to appear in the document even though no text explicitly refers to it, you can do that using the \nocite command:

\nocite{freds-book}

Normally, each bibliography entry will be referred to (in citations and in the bibliography itself) by a simple reference number, such as [1]. If you would rather use an alternative reference notation, such as [Fred1993], you can use the \BR (‘Bibliography Rewrite’) command to specify your own reference format for a particular book:

\BR{freds-book} [Fred1993]

The keyword you supply after \B is allowed to contain escaped special characters (\\, \{ and \}), but should not contain any other Halibut markup. It is intended to be a word or two of ordinary text. (This also applies to keywords used in other commands, such as \n and \C.)
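
The selection rule described in this section can be sketched as a small Python model. This is only an illustration of the behaviour described above, not Halibut's own code: the function name, its arguments and the idea of numbering entries in definition order are all assumptions made for the example.

```python
def bibliography(entries, cited, nocite=(), rewrites=None):
    r"""Return (label, text) pairs for the entries that would be printed.

    entries  -- list of (keyword, text) pairs in definition order (\B)
    cited    -- keywords referred to by \k somewhere in the document
    nocite   -- keywords forced in with \nocite
    rewrites -- optional {keyword: label} mapping, as set up by \BR
    """
    rewrites = rewrites or {}
    wanted = set(cited) | set(nocite)
    result = []
    number = 0
    for keyword, text in entries:
        if keyword not in wanted:
            continue  # defined but never referenced: left out
        number += 1  # assumption: numbering follows definition order
        label = rewrites.get(keyword, "[%d]" % number)
        result.append((label, text))
    return result
```

With a shared database of entries, a document that never cites freds-book gets an empty bibliography from this model, while citing it (or naming it in nocite) brings the entry back, labelled either [1] or whatever rewrites supplies.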

3.5 Creating an index

Halibut contains a comprehensive indexing mechanism, which attempts to be reasonably easy to use in the common case in spite of its power.

3.5.1 Simple indexing

In normal usage, you should be able to add index terms to your document simply by using the \i command to wrap one or two words at a time. For example, if you write

The \i{hippopotamus} is a particularly large animal.

then the index will contain an entry under ‘hippopotamus’, pointing to that sentence (or as close to that sentence as the output format sensibly permits).

You can wrap more than one word in \i as well:

We recommend using a \i{torque wrench} for this job.

3.5.2 Special cases of indexing

If you need to index a computer-related term, you can use the special case \i\c (or \i\cw if you prefer):

The \i\c{grep} command is what you want here.

This will cause the word ‘grep’ to appear in code style, as if the \i were not present and the input just said \c{grep}; the word will also appear in code style in the actual index.

If you want to simultaneously index and emphasise a word, there's another special case \i\e (and similarly \i\s):

This is what we call a \i\e{paper jam}.

This will cause the words ‘paper jam’ to be emphasised in the document, but (unlike the behaviour of \i\c) they will not be emphasised in the index. This different behaviour is based on an expectation that most people indexing a word of computer code will still want it to look like code in the index, whereas most people indexing an emphasised word will not want it emphasised in the index.

(In fact, no emphasis in the text inside \i will be preserved in the index. If you really want a term in the index to appear emphasised, you must say so explicitly using \IM; see section 3.5.3.)

Sometimes you might want to index a term which is not explicitly mentioned in the text but is highly relevant to it, so that somebody looking up that term in the index would find it useful to be directed here. To do this you can use the \I command, to create an invisible index tag:

If your printer runs out of toner, \I{replacing toner
cartridge}here is what to do:

This input will produce only the output ‘If your printer runs out of toner, here is what to do’; but an index entry will show up under ‘replacing toner cartridge’, so that if a user thinks the obvious place to start in the index is under R for ‘replacing’, they will find their way here with a minimum of fuss.

(It's worth noting that there is no functional difference between \i{foo} and \I{foo}foo. The simple \i case is only a shorthand for the latter.)

Finally, if you want to index a word at the start of a sentence, you might very well not want it to show up with a capital letter in the index. For this, Halibut provides the \ii command, for ‘index (case-)insensitively’. You use it like this:

\ii{Lions} are at the top of the food chain in this area.

This is equivalent to \I{lions}Lions; in other words, the text will say ‘Lions’, but it will show up in the index as ‘lions’. The text inside \ii is converted entirely into lower case before being added to the index data.
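
The relationship between \i, \I and \ii can be summarised in a few lines of Python. This is a sketch for illustration only; the function name and the (visible text, index term) return shape are invented here and say nothing about how Halibut is implemented.

```python
def index_command(command, argument):
    r"""Return (visible_text, index_term) for \i, \I or \ii."""
    if command == "i":
        # \i{foo} shows the text and indexes it unchanged.
        return argument, argument
    if command == "I":
        # \I{foo} indexes the text but shows nothing.
        return "", argument
    if command == "ii":
        # \ii{Foo} shows the text as written, indexes it in lower case.
        return argument, argument.lower()
    raise ValueError("not an indexing command: " + command)
```

So index_command("ii", "Lions") yields the visible text 'Lions' with the index term 'lions', matching the \I{lions}Lions equivalence given above.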

3.5.3 Fine-tuning the index

Halibut's index mechanism as described so far still has a few problems left:

In a reasonably large index, it's often difficult to predict which of several words a user will think of first when trying to look something up. For example, if they want to know how to replace a toner cartridge, they might look up ‘replacing’ or they might look up ‘toner cartridge’. You probably don't really want to have to try to figure out which of those is more likely; instead, what you'd like is to be able to effortlessly index the same set of document locations under both terms.
Also, you may find you've indexed the same concept under multiple different index terms; for example, there might be several instances of \i{frog} and several of \i{frogs}, so that you'd end up with two separate index entries for what really ought to be the same concept.
You might well not want the word ‘grep’ to appear in the index without explanation; you might prefer it to say something more verbose such as ‘grep command’, so that a user encountering it in the index has some idea of what it is without having to follow up the reference. However, you certainly don't want to have to write \I{\cw{grep} command}\c{grep} every time you want to add an index term for this! You would rather write \i\c{grep} as shown in the previous section, and tidy it all up afterwards.

All of these problems can be cleaned up by the \IM (for ‘Index Modification’) command. \IM expects to be followed by one or more pairs of braces containing index terms as seen in the document, and then a piece of text (not in braces) describing how it should be shown in the index.

So to rewrite the grep example above, you might do this:

\IM{grep} \cw{grep} command

This will arrange that the set of places in the document where you asked Halibut to index ‘grep’ will be listed under ‘grep command’ rather than just under ‘grep’.

You can specify more than one index term in a \IM command; so to merge the index terms ‘frog’ and ‘frogs’ into a single term, you might do this:

\IM{frog}{frogs} frog

This will arrange that the single index entry ‘frog’ will list all the places in the document where you asked Halibut to index either ‘frog’ or ‘frogs’.

You can use multiple \IM commands to replicate the same set of document locations in more than one index entry. For example:

\IM{replacing toner cartridge} replacing toner cartridge
\IM{replacing toner cartridge} toner cartridge, replacing

This will arrange that every place in the document where you have indexed ‘replacing toner cartridge’ will be listed both there and under ‘toner cartridge, replacing’, so that no matter whether the user looks under R or under T they will still find their way to the same parts of the document.

In this example, note that although the first \IM command looks as if it's a tautology, it is still necessary, because otherwise those document locations will only be indexed under ‘toner cartridge, replacing’. If you have no explicit \IM commands for a particular index term, then Halibut will assume a default one (typically \IM{foo} foo, although it might be \IM{foo} \c{foo} if you originally indexed using \i\c); but as soon as you specify an explicit \IM, Halibut discards its default implicit one, and you must then specify that one explicitly as well if you want to keep it.
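
The interaction between the implicit default and explicit \IM commands is easier to see in a small model. The following Python sketch is only an illustration of the rules in this section: build_index and its data shapes are invented for the example, and locations are represented by arbitrary numbers.

```python
def build_index(occurrences, modifications):
    r"""Map each displayed index entry to the locations listed under it.

    occurrences   -- list of (tag, location) pairs, one per \i / \I use
    modifications -- list of (tags, display_text) pairs, one per \IM
    """
    # Collect the explicit display texts requested for each tag.
    explicit = {}
    for tags, display in modifications:
        for tag in tags:
            explicit.setdefault(tag, []).append(display)

    index = {}
    for tag, location in occurrences:
        # With no explicit \IM, fall back to the implicit default of
        # listing the tag under its own text.
        for display in explicit.get(tag, [tag]):
            index.setdefault(display, []).append(location)
    return index
```

In this model a tag with a single explicit \IM appears only under the new text, which is why the apparently tautological \IM in the toner cartridge example is needed to keep the original entry as well.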

3.5.4 Indexing terms that differ only in case

The tags you use to define an index term (that is, the text in the braces after \i, \I and \IM) are treated case-insensitively by Halibut. So if, as in this manual itself, you need two index terms that differ only in case, doing this will not work:

The \i\c{\\c} command defines computer code.

The \i\c{\\C} command defines a chapter.

Halibut will treat these terms as the same, and will fold the two sets of references into one combined list (although it will warn you that it is doing this). The idea is to ensure that people who forget to use \ii find out about it rather than Halibut silently generating a bad index; checking an index for errors is very hard work, so Halibut tries to avoid errors in the first place as much as it can.

If you do come across this situation, you will need to define two distinguishable index terms. What I did in this manual was something like this:

The \i\c{\\c} command defines computer code.

The \I{\\C-upper}\c{\\C} command defines a chapter.

\IM{\\C-upper} \c{\\C}

The effect of this will be two separate index entries, one reading \c and the other reading \C, pointing to the right places.
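
The folding behaviour, and the reason the \C-upper workaround avoids it, can be modelled roughly as follows. This Python sketch is an assumption-laden illustration: fold_tags is a made-up name, the warning text is invented, and plain lower-casing stands in for whatever comparison Halibut really performs.

```python
def fold_tags(tags):
    """Group index tags that differ only in case, collecting warnings."""
    groups = {}
    warnings = []
    for tag in tags:
        key = tag.lower()  # assumption: simple lower-casing as the fold
        if key in groups and tag not in groups[key]:
            # Two differently-cased spellings collide: merge and warn.
            warnings.append("%r folded into %r" % (tag, groups[key][0]))
        groups.setdefault(key, []).append(tag)
    return groups, warnings
```

Feeding this the tags \c and \C produces one merged group and one warning, whereas \c and \C-upper stay in separate groups with no warning.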

3.6 Configuring Halibut

Halibut uses the \cfg command to allow you to configure various aspects of its functionality.

The \cfg command expects to be followed by at least one pair of braces, and usually more after that. The first pair of braces contains a keyword indicating what aspect of Halibut you want to configure, and the meaning of the one(s) after that depends on the first keyword.

Each output format supports a range of configuration options of its own, and some configuration is shared between similar output formats: the PDF and PostScript formats, for example, share most of their configuration, as described in section 4.7. The configuration keywords for each output format are listed in the manual section for that format; see chapter 4.

There are also a small number of configuration options which apply across all output formats:

\cfg{chapter}{new chapter name}

This tells Halibut that you don't want to call a chapter a chapter any more. For example, if you give the command \cfg{chapter}{Book}, then any chapter defined with the \C command will be labelled ‘Book’ rather than ‘Chapter’, both in the section headings and in cross-references. This is probably most useful if your document is not written in English.

Your replacement name should be given with a capital letter. Halibut will leave it alone if it appears at the start of a sentence (in a chapter title, or when \K is used), and will lower-case it otherwise (when \k is used).

\cfg{section}{new section name}

Exactly like chapter, but changes the name given to subsections of a chapter.

\cfg{appendix}{new appendix name}

Exactly like chapter, but changes the name given to appendices.

\cfg{contents}{new contents name}

This changes the name given to the contents section (by default ‘Contents’) in back ends which generate one.

\cfg{index}{new index name}

This changes the name given to the index section (by default ‘Index’) in back ends which generate one.

\cfg{input-charset}{character set name}

This tells Halibut what character set you are writing your input file in. By default, it is assumed to be US-ASCII (meaning only plain ASCII, with no accented characters at all).

You can specify any well-known name for any supported character set. For example, iso-8859-1, iso8859-1 and iso_8859-1 are all recognised, GB2312 and EUC-CN both work, and so on. (You can list the character sets known to Halibut by invoking it with the --list-charsets option; see section 2.1.)

This directive takes effect immediately after the \cfg command. All text after that until the end of the input file is expected to be in the new character set. You can even change character set several times within a file if you really want to.

When Halibut reads the input file, everything you type will be converted into Unicode from the character set you specify here, will be processed as Unicode by Halibut internally, and will be written to the various output formats in whatever character sets they deem appropriate.

\cfg{quotes}{open-quote}{close-quote}[{open-quote}{close-quote...}]

This specifies the quote characters which should be used. You should separately specify the open and close quote marks; each quote mark can be one character (\cfg{quotes}{`}{'}), or more than one (\cfg{quotes}{<<}{>>}).

\cfg{quotes} can be overridden by configuration directives for each individual backend (see chapter 4); it is a convenient way of setting quote characters for all backends at once.

All backends use these characters in response to the \q command (see section 3.2.3). Some (such as the text backend) use them for other purposes too.

You can specify multiple fallback options in this command (a pair of open and close quotes, each in their own braces, then another pair, then another if you like), and Halibut will choose the first pair which the output character set supports. Halibut will always use a matching pair. This allows you to configure quote characters once, generate output in several different character sets, and have Halibut adapt each time to make the best use of the current encoding. For example, you might write

\cfg{quotes}{\u201c}{\u201d}{"}{"}

and Halibut would use the Unicode matched double quote characters if possible, and fall back to ASCII double quotes otherwise. If the output character set were to contain U+201C but not U+201D, then Halibut would fall back to using the ASCII double quote character as both open and close quotes. (No known character set is that silly; I mention it only as an example.)

\cfg{quotes} (and the backend-specific versions) apply to the entire output; it's not possible to change quote characters partway through the output.
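
The fallback rule for \cfg{quotes} can be sketched in Python as follows. This is a model of the behaviour described above rather than Halibut's own code: the function names are invented, Python's built-in codecs stand in for Halibut's character set support, and returning None when nothing fits is an assumption made only for the example.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def choose_quotes(pairs, charset):
    """Return the first (open, close) pair fully supported by charset."""
    for open_quote, close_quote in pairs:
        # Both halves must be representable, so the pair always matches.
        if can_encode(open_quote, charset) and can_encode(close_quote, charset):
            return open_quote, close_quote
    return None  # assumption: the caller supplies its own last resort
```

Given the pairs from the example above, this model picks the curly quotes for a UTF-8 output and the ASCII double quotes for an ASCII or ISO 8859-1 output, since neither of those contains U+201C or U+201D.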

In addition to these configuration commands, there are also configuration commands provided by each individual output format. These configuration commands are discussed along with each output format, in chapter 4.

The default settings for the above options are:

\cfg{chapter}{Chapter}
\cfg{section}{Section}
\cfg{appendix}{Appendix}
\cfg{contents}{Contents}
\cfg{index}{Index}
\cfg{input-charset}{ASCII}

The default for \cfg{input-charset} can be changed with the --input-charset option; see section 2.1. The default settings for \cfg{quotes} are backend-specific; see chapter 4.

3.7 Defining macros

If there's a complicated piece of Halibut source which you think you're going to use a lot, you can define your own Halibut command to produce that piece of source.

In section 3.2.7, there is a sample piece of code which prints a Euro sign, or replaces it with ‘EUR’ if the Euro sign is not available:

This is likely to cost \u20AC{EUR\_}2500 at least.

If your document quotes a lot of prices in Euros, you might not want to spend all your time typing that out. So you could define a macro, using the \define command:

\define{eur} \u20AC{EUR\_}

Your macro names may include Roman alphabetic characters (a-z, A-Z) and ordinary Arabic numerals (0-9), but nothing else. (This is general syntax for all of Halibut's commands, except for a few special ones such as \_ and \- which consist of a single punctuation character only.)

Then you can just write ...

This is likely to cost \eur 2500 at least.

... except that that's not terribly good, because you end up with a space between the Euro sign and the number. (If you had written \eur2500, Halibut would have tried to interpret it as a macro command called eur2500, which you didn't define.) In this case, it's helpful to use the special \. command, which is defined to do nothing at all, but which acts as a separator between your macro and the next character:

This is likely to cost \eur\.2500 at least.

This way, you will see no space between the Euro sign and the number (although, of course, there will be space between ‘EUR’ and the number if the Euro sign is not available, because the macro definition specifically asked for it).
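
The way a command name ends, and the role of \. as a separator, can be illustrated with a toy expander in Python. This is a deliberately simplified sketch, not Halibut's parser: it knows only about user macros and \., it treats every other command as undefined, and it assumes the macro expands to a plain Euro sign (that is, that the output character set has one).

```python
import re

# A command is a backslash followed by either a run of letters and
# digits or a single punctuation character such as "." or "_".
COMMAND = re.compile(r"\\([A-Za-z0-9]+|[^A-Za-z0-9\s])")


def expand(text, macros):
    r"""Expand user macros and the do-nothing \. command in text."""
    def replace(match):
        name = match.group(1)
        if name == ".":
            return ""  # \. produces nothing; it only ends the name
        if name in macros:
            return macros[name]
        raise KeyError("undefined command: \\" + name)
    return COMMAND.sub(replace, text)
```

With macros set to {"eur": "\u20ac"}, the input \eur 2500 keeps its space, \eur\.2500 loses it, and \eur2500 fails because the whole of eur2500 is read as the command name.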

Comments to anakin@pobox.com
[Halibut version 1.2]
