RJKDOC

Richard Kettlewell, <rjk@greenend.org.uk>



1. Introduction

RJKDOC is an SGML DTD for program documentation, plus some tools for generating other formats from it. The tools are written in C++ and use James Clark's SP toolkit. The software may be copied under the terms of the GNU General Public License.

The motivation for RJKDOC is to produce both full manuals and reference "man pages" from the same source, removing the need to duplicate information between the two. Currently the output formats supported are plain text, HTML and man pages.

If you want something more flexible, detailed or powerful than RJKDOC then DocBook is probably a good place to look. If you just want something cheap and cheerful (or should that be "quick and dirty"?), then RJKDOC might be the right thing.

1.1 SGML

To write or modify RJKDOC documents you need at least some familiarity with SGML. Some introductory material can be found on the web at http://www.oasis-open.org/cover/general.html. Use of an SGML-aware editor is recommended.

1.2 Reporting Bugs

Please send any bug reports or suggestions for improvements to Richard Kettlewell <richard+rjkdoc@sfere.greenend.org.uk>. Please give as much relevant detail as possible. Check http://www.greenend.org.uk/rjk/ for announcements of new versions, etc.


2. Writing RJKDOC Documents

2.1 SGML

RJKDOC documents are SGML documents. If you're familiar with HTML you'll have some of the basics already. A full explanation of SGML is beyond the scope of this manual but we can cover the basics quickly.

As far as we are concerned an SGML document consists of a doctype declaration, saying what type of thing it is (for instance, to distinguish between HTML and RJKDOC documents), and then a top-level element. A doctype declaration looks like this:

<!doctype rjkdoc public "-//greenend.org.uk//DTD RJKDOC 0.1//EN">

An element consists of an open tag, possibly a body, and a close tag. Sometimes the close tag can be left implicit, and the software will work out where it goes automatically. The examples in this manual often rely on this.

The simplest kind of open tag looks like <NAME>. For instance the top level element of an RJKDOC document is started with <rjkdoc>, and finished with </rjkdoc>.

You can also have attributes inside the tag to provide additional information, for instance <NAME ATTR-NAME=ATTR-VALUE>. The attribute values might need quoting: if in doubt, do quote, by putting double quotes around the whole value. Double quotes and ampersands that appear inside the value need escaping; use &quot; and &amp; respectively. For instance to link to the URL http://www.example.com/?a=b&c=d you might write <url name="http://www.example.com/?a=b&amp;c=d">.

The body consists of plain text and other elements. The meaning of the text depends on the element it appears within (and this is the whole point - you put the tags in to indicate to the software what each bit of the document means, so that it can be processed in the proper way.) If you want to put an angle bracket into the plain text you must use &lt;, and the ampersand must be escaped as above.
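The escaping rules above are mechanical, so they are easy to apply in a script that generates RJKDOC source. The following Python sketch is illustrative only and is not part of the RJKDOC package; the function names are made up for this example. It uses &quot; for the double quote, which is the entity name listed under Special Characters in rjkdoc(5).

```python
def escape_text(text: str) -> str:
    """Escape plain text for use in an element body."""
    # Ampersands go first, otherwise the & of &lt; would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def quote_attribute(value: str) -> str:
    """Escape an attribute value and wrap it in double quotes."""
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return '"' + escaped + '"'


if __name__ == "__main__":
    print("<url name=" + quote_attribute("http://www.example.com/?a=b&c=d") + ">")
    print("<p>" + escape_text("if a < b && b < c"))
```

The first line printed is the <url> example given in the text.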

2.2 Overview of RJKDOC

An RJKDOC document consists of a title page, which has various bits of administrative information in it, and a body, consisting of chapters and appendices. A contents list is automatically inserted between the title page and the body.

The whole lot is wrapped up in an <rjkdoc> element, and it's also necessary to specify the document type. For example:

<!doctype rjkdoc public "-//greenend.org.uk//DTD RJKDOC 0.1//EN">
<rjkdoc>
  <titlepage>
    ...
  <body>
    ...
</rjkdoc>

The version number in the doctype line will be changed if any new features are added to the DTD.

2.3 The Title Page

A title page consists of a title, one or more author credits, an abstract and a copyright notice. The abstract and copyright notice are optional.

Example:

<titlepage>
  <title>A sample document

  <author>Fred Bloggs

  <abstract>
    <p>This is an example document.

Note that the element names (<title>, etc) can be in upper case or lower case.

The title and author elements contain inline text (see section 2.6). The copyright and abstract elements contain structured text (see section 2.5).

2.4 Sections

The body consists of a sequence of chapters, followed by a sequence of appendices. These are both really types of section; the full list of kinds of section is as follows:

<chapter>
<appendix>

"Top level" sections

<section>

Second level sections (but can appear as a third-level section within a manpage)

<s1>
<s2>
<s3>

Third, fourth and fifth level sections. If your document needs more depth than this then perhaps it is too complicated!

<manpage>

Second level sections (but with special structural properties) - see section 2.7 for more information.

<name>
<synopsis>
<description>

Third level sections within manpages.

Apart from the manpage sections, which are described below, all kinds of section consist of a heading (which is mandatory) followed by (optionally) structured text (see section 2.5) and then (again, optionally) subsections of the appropriate kind.

All of the section types except <name> and <synopsis> can take an id attribute, which allows you to provide a symbolic name which can be used in an <xref> element. See section 2.6.2.

The <heading> element contains inline text (see section 2.6).

Examples:

<body>
  <chapter>
    <heading>The first chapter

    <p>The chapter text goes here...

  <chapter>
    <heading>The second chapter
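If documents are generated by a script, the overall layout described in sections 2.2 to 2.4 can be produced with a few lines of code. The Python sketch below is an illustration under stated assumptions, not something shipped with RJKDOC: the function name skeleton is hypothetical, and it relies on implicit close tags in the same way as the examples in this manual.

```python
from html import escape

DOCTYPE = '<!doctype rjkdoc public "-//greenend.org.uk//DTD RJKDOC 0.1//EN">'


def skeleton(title, authors, chapters):
    """Return RJKDOC source for a title page plus one chapter per heading."""
    if not authors:
        raise ValueError("a title page needs at least one author")

    def text(s):
        # Escapes &, < and > as &amp;, &lt; and &gt;.
        return escape(s, quote=False)

    lines = [DOCTYPE, "<rjkdoc>", "  <titlepage>", f"    <title>{text(title)}"]
    lines += [f"    <author>{text(a)}" for a in authors]
    lines.append("  <body>")
    for heading in chapters:
        lines.append("    <chapter>")
        lines.append(f"      <heading>{text(heading)}")
    lines.append("</rjkdoc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(skeleton("A sample document", ["Fred Bloggs"],
                   ["The first chapter", "The second chapter"]), end="")
```

The chapters produced this way contain only a heading; the structured text still has to be added by hand or by further code.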

2.5 Structured Text

Structured text comes in three kinds: paragraphs, preformatted text and lists. A block of structured text can be made up of any combination of these kinds of element.

Each paragraph is contained within a <p> element, and is reformatted by the processing software to fit the output medium. The content of a paragraph is called inline text (see section 2.6). Example:

<p>See <file>rjkdoc.sgml</file> for further examples.

...which formats to:

See rjkdoc.sgml for further examples.

Preformatted text is contained within the <pre> element. No reformatting is done, and it is output in a monospaced font (where there is any choice about fonts). Example:

<pre>+--------------------+
|      Boxed in      |
+--------------------+</pre>

...which formats to:

+--------------------+
|      Boxed in      |
+--------------------+

(In fact, pre contains inline text just as p does.)

Note that the effect of using tab characters within <pre> is undefined, and will probably not do what you want.

Two kinds of list are supported at the moment. The first kind is the tagged list, which is contained in a <taglist> element and consists of a sequence of tags and items. Each item must be preceded by one or more tags, and you can't have tags without a corresponding item.

Plain lists use the <list> element and are the same as tagged lists, except that there are no tags. The processing software might add bullet points or similar.

Items contain structured text. Tags, however, can only contain inline text - in effect they act like a single paragraph of their own.

Example of a tagged list:

<taglist>
  <tag><lit>file</lit>
  <item><p>The name of a file

  <tag><lit>prog</lit>
  <item><p>The name of a program

</taglist>

...which formats to:

file

The name of a file

prog

The name of a program
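Tagged lists are a natural target for generated markup, for instance when option descriptions are kept in a table elsewhere. The following Python sketch (the function name taglist is hypothetical, and wrapping every tag in <lit> is just a choice made for this example) enforces the rule that each item has at least one tag:

```python
from html import escape


def taglist(entries):
    """Build a <taglist> from (tags, description) pairs.

    tags is a list of literal strings; description is plain text that
    becomes a single paragraph inside the item.
    """
    lines = ["<taglist>"]
    for tags, description in entries:
        if not tags:
            raise ValueError("each item needs at least one tag")
        for tag in tags:
            lines.append(f"  <tag><lit>{escape(tag, quote=False)}</lit>")
        lines.append(f"  <item><p>{escape(description, quote=False)}")
        lines.append("")
    lines.append("</taglist>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(taglist([(["file"], "The name of a file"),
                   (["prog"], "The name of a program")]))
```

Run as a script, this prints the same markup as the tagged list example above.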

Example of an untagged list:

<list>
  <item><p>The first item

    <p>Second paragraph in the first item

  <item><p>The second item

</list>

...which formats to:

- The first item

  Second paragraph in the first item

- The second item

The basic structured text elements can all have id attributes. See section 2.6.2.

2.6 Inline Text

2.6.1 Plain Text Markup

The simplest kind of inline text is just plain old text. However various markup elements are available to mark filenames, program names, etc:

<file>

The name of a file. Example: rjkdoc.sgml

<prog>

The name of a program. Example: rjkdoc-html

<proc>

The name of a function or procedure in a programming language. Example: TextDocument::format

<var>

A syntactic variable, i.e. a name which stands for something the user supplies. Example: filename

<lit>

Any piece of text which is to be taken literally, e.g. something to be typed as-is into a computer; effectively the opposite of <var>. Example: <rjkdoc>

<em>

Emphasized text. Example: important

<email>

An email address. Example: <user@example.com>

<option>

An option, e.g. to a program. Example: -debug

<optional>

Optional text, e.g. in the syntax of a program's command line. Example: [-debug]

All of these elements may be nested, though not all combinations are useful.

2.6.2 Links

There are also some special empty elements that you can use:

<url>

This has an attribute, name, which gives the URL referenced. The element is replaced with the value and, where possible, made into a link.

<xref>

This has an attribute, ref, which identifies another element in the same document. The element identified is the one whose id attribute has the same value as the ref attribute of this element.

The elements which can have id attributes are all the section types except for <name> and <synopsis>, plus the basic types of structured text.

The <xref> element is replaced with text describing the referenced element (e.g. "section 2.2.3") and made into a link if possible.

<manref>

This has two attributes, name and section. The element is replaced with text describing the referenced man page (usually name(section)) and made into a link where possible.
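To make the replacement rules for <xref> and <manref> concrete, here is a rough Python sketch of the substitution. It is not how the RJKDOC tools work internally (they use a real SGML parser); the regular expressions, the fixed attribute order and the targets table mapping ids to descriptive text are all assumptions made for the illustration.

```python
import re

# Simplified patterns: attribute values may be quoted or unquoted, and the
# attributes must appear in the order shown.
XREF = re.compile(r'<xref\s+ref="?([^">\s]+)"?\s*>', re.IGNORECASE)
MANREF = re.compile(
    r'<manref\s+name="?([^">\s]+)"?\s+section="?([^">\s]+)"?\s*>',
    re.IGNORECASE,
)


def expand_links(text, targets):
    """Replace <xref> and <manref> elements with descriptive text.

    targets maps id values to text such as "section 2.6.2".  An unknown
    id raises KeyError.
    """
    text = XREF.sub(lambda m: targets[m.group(1)], text)
    return MANREF.sub(lambda m: f"{m.group(1)}({m.group(2)})", text)


if __name__ == "__main__":
    source = 'See <xref ref="links"> and <manref name=rjkdoc section=5>.'
    print(expand_links(source, {"links": "section 2.6.2"}))
```

In HTML output the replacement text would additionally be wrapped in a link; that step is left out here.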

2.7 Man Pages

Sections introduced with <manpage> have a special structure. First, the element itself has a couple of attributes:

name

The name of the man page

section

The section number of the man page

Secondly, there is no <heading>; it is deduced from these attributes.

Thirdly, the man page doesn't have any initial text but always starts with a <name> element. This contains structured text, which by convention is always of the form "name - description", where name is the name of the program (or whatever) described and description is a brief summary.

Following this is the (optional) <synopsis> element. This contains structured text. For programs it is a summary of the command line syntax; the <option> and <optional> elements are intended to be used here. For procedures and functions in a programming language, it lists the headers required and describes the arguments of the procedures and functions documented, in whatever manner is usual for the language. (For example in C, a function declaration is used, with explanatory names for the arguments.)

After this comes the <description> element, which is mandatory. This is the same as <section> except that it has no <heading>. It contains structured text followed, optionally, by <s1> subsections.

Any further sections required are contained in ordinary <section> elements.

Example:

<manpage name=someprog section=1>
  <name>someprog - do something or other

  <synopsis>
    <p><prog>someprog</prog> <var>filename</var>

  <description>
    <p>The description text goes here...
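The fixed order of a man page section (name, optional synopsis, mandatory description) can be captured in a small generator. This Python sketch is only an illustration; the function name manpage and its parameters are hypothetical. Unlike the example above it always quotes the attribute values, following the "if in doubt, do quote" advice in section 2.1.

```python
from html import escape


def manpage(name, section, summary, description, synopsis=None):
    """Return RJKDOC source for a simple <manpage> section.

    synopsis is inline markup and is passed through unchanged; the other
    arguments are plain text and are escaped.
    """
    def attr(value):
        escaped = str(value).replace("&", "&amp;").replace('"', "&quot;")
        return '"' + escaped + '"'

    def text(value):
        return escape(value, quote=False)

    lines = [
        f"<manpage name={attr(name)} section={attr(section)}>",
        f"  <name>{text(name)} - {text(summary)}",
        "",
    ]
    if synopsis is not None:  # the synopsis is optional
        lines += ["  <synopsis>", f"    <p>{synopsis}", ""]
    lines += ["  <description>", f"    <p>{text(description)}"]  # mandatory
    return "\n".join(lines)


if __name__ == "__main__":
    print(manpage("someprog", 1, "do something or other",
                  "The description text goes here...",
                  synopsis="<prog>someprog</prog> <var>filename</var>"))
```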

A. Reference

This appendix contains a reference to the DTD itself and to the tools from the RJKDOC package. The latter may be used to translate RJKDOC documents into HTML, plain text or man pages.

A.1 rjkdoc(5)

Name

rjkdoc - the RJKDOC DTD

Description

This man page provides a quick reference to the RJKDOC DTD.

Doctype

The doctype declaration for RJKDOC is as follows:

<!doctype rjkdoc public "-//greenend.org.uk//DTD RJKDOC 0.1//EN">

Elements

The following elements are supported by RJKDOC:

abstract

This element defines a summary of the document. Usually it would be a single paragraph of structured text, but may be more.

appendix

This element contains top level sections, and consists of a heading followed by structured text, followed by any number of sections and manual pages.

author

This element defines an author (either of the document or whatever the document describes).

body

This element encompasses the main body of the document. It consists of zero or more chapters followed by zero or more appendices.

chapter

This element contains top level sections, and consists of a heading followed by structured text, followed by any number of sections and manual pages.

copyright

This element contains the copyright notice for the document, which should be structured text.

description

This element contains the description section of a man page, and consists of structured text and <s1> subsections.

em

This element is used in inline text to delimit emphasized text. Note that there is no way of explicitly specifying bold, italic, etc.

email

This element is used in inline text to delimit email addresses.

file

The file tag is used in inline text to delimit filenames.

heading

This element defines the heading for a section.

item

This element contains an item, as found in a list of some kind. Its content is structured text.

list

This element contains a list, which is just a sequence of one or more items. Lists are an instance of structured text.

lit

This element is used in inline text to delimit literal text, e.g. a string that would be typed unaltered into a program.

manpage

This defines a section of the document which stands as a man page. It contains a name, an optional synopsis, a description and any number of ordinary sections.

The name of the man page should be given in the name attribute and the section in the section attribute.

manref

This empty element expands to a man page reference. The name attribute should be the name of the target man page and the section attribute should give its section. Normally this would expand to the name followed by the section in parentheses.

name

This defines the name section of a man page. It would normally be a single paragraph of structured text.

option

This element is used in inline text to delimit the name of an option, e.g. to a UNIX command line program. Usually the formatting would be the same as for <lit>.

optional

This element is used in inline text to delimit optional text, e.g. in the description of the syntax of a command. Normally this would expand to square brackets.

p

This element contains a paragraph. It is an instance of structured text, and contains inline text.

pre

This element contains preformatted inline text.

proc

This element is used in inline text to delimit the names of functions and procedures, for example as found within a program's source code.

prog

This element is used in inline text to delimit the names of programs.

rjkdoc

This element surrounds the entire document. It consists of a title page followed by a body.

s1
s2
s3

These elements represent increasingly deeply nested subsections. Each consists of a heading, structured text and (except for s3) subsections of the next level down.

section

This defines a top-level section of a chapter, appendix or man page. It consists of a heading, structured text and <s1> subsections.

synopsis

This defines the synopsis section of a man page. It is constructed of structured text and would normally give a summary of the syntax of the command or function documented, or whatever.

tag

This element contains a tag as found in a tagged list. Its content is inline text.

taglist

This element contains a tagged list, a primitive kind of table. It consists of a sequence of one or more tag-item groups; each such group consists of one or more tags and a single item. Tagged lists are an instance of structured text.

title

This element contains the title of the document.

titlepage

This element defines the title page of a document. Typically this would appear at the start of a formatted version, though it doesn't have to.

A title page consists of a title, one or more authors, an optional abstract and an optional copyright notice.

url

This empty element defines a reference to a URL, which is specified using the name attribute. In HTML for example the URL would be formatted as a link (as well as the literal text).

var

This element is used in inline text to delimit the names of metasyntactic variables, i.e. strings in documentation that would stand for something else in a real example.

xref

This empty element defines a reference to some other point in the document. The ref attribute should have the name of the other point in the document; the element will be replaced in processing by some appropriate text.

The following tags all support id attributes, which contain strings used as the target of ref attributes in the <xref> element: <chapter>, <appendix>, <manpage>, <description>, <section>, <s1>, <s2>, <s3>, <p>, <pre>, <taglist> and <list>.

Special Characters

The following entities are defined.

&AElig;

Æ

&Aacute;

Á

&Acirc;

Â

&Agrave;

À

&Aring;

Å

&Atilde;

Ã

&Ccedil;

Ç

&Delta;

Delta

&Eacute;

É

&Egrave;

È

&Gamma;

Gamma

&Iacute;

Í

&Icirc;

Î

&Igrave;

Ì

&Ntilde;

Ñ

&Oacute;

Ó

&Ocirc;

Ô

&Ograve;

Ò

&Omega;

Omega

&Oslash;

Ø

&Otilde;

Õ

&Phi;

Phi

&Pi;

Pi

&Prime;

''

&Prod;

prod

&Psi;

Psi

&Sigma;

Sigma

&Sum;

sum

&Theta;

Theta

&Uacute;

Ú

&Ugrave;

Ù

&Upsi;

Upsi

&Xi;

Xi

&Yacute;

Ý

&aacute;

á

&acirc;

â

&aelig;

æ

&agrave;

à

&aleph;

aleph

&alpha;

alpha

&amp;

&

&and;

and

&ang;

ang

&ap;

~

&aring;

å

&ast;

*

&atilde;

ã

&beta;

beta

&bottom;

bottom

&bsol;

\\

&bull;

-

&cap;

cap

&ccedil;

ç

&cent;

¢

&chi;

chi

&cir;

o

&circ;

^

&clubs;

clubs

&colon;

:

&comma;

,

&commat;

@

&congr;

~=

&copy;

©

&cup;

cup

&dArr;

dArr

&darr;

darr

&delta;

delta

&diams;

diams

&divide;

÷

&dollar;

$

&dot;

·

&eacute;

é

&ecirc;

ê

&egrave;

è

&empty;

{}

&emsp;

&ensp;

&epsi;

epsi

&equals;

=

&equiv;

==

&ero;

&

&eta;

eta

&etago;

</

&exist;

exist

&forall;

forall

&gamma;

gamma

&ge;

>=

&grave;

\'

&gt;

>

&hArr;

<=>

&harr;

<->

&hearts;

hearts

&hellip;

...

&hyphen;

-

&iacute;

í

&icirc;

î

&iexcl;

¡

&igrave;

ì

&image;

image

&infin;

infin

&int;

int

&iota;

iota

&iquest;

¿

&isin;

isin

&kappa;

kappa

&lArr;

<=

&lambda;

lambda

&lang;

(

&larr;

<-

&lcub;

{

&le;

<=

&lowbar;

_

&lpar;

(

&lsqb;

[

&lt;

<

&mdash;

---

&mid;

|

&minus;

-

&mu;

µ

&nabla;

nabla

&nbsp;

&ndash;

--

&ne;

=/=

&nequiv;

<>

&not;

¬

&notin;

notin

&nsub;

nsub

&nsube;

nsube

&nsup;

nsup

&nsupe;

nsupe

&ntilde;

ñ

&nu;

nu

&num;

#

&nvDash;

nvDash

&nvdash;

nvdash

&oacute;

ó

&ocirc;

ô

&ograve;

ò

&omega;

omega

&oplus;

oplus

&or;

or

&oslash;

ø

&otilde;

õ

&otimes;

otimes

&para;

&part;

part

&percnt;

%

&phis;

phis

&pi;

pi

&plus;

+

&plusmn;

±

&pound;

£

&prime;

'

&prop;

prop

&psi;

psi

&quot;

'

&rArr;

=>

&rang;

)

&rarr;

->

&rcub;

}

&real;

real

&refnam;

&refname;

&rho;

rho

&rpar;

)

&rsqb;

]

&sect;

§

&semi;

;

&setmn;

\

&sigma;

sigma

&sigmav;

sigmav

&spades;

spades

&square;

square

&sub;

sub

&sube;

sube

&sup;

sup

&supe;

supe

&szlig;

ß

&tau;

tau

&thetas;

theta

&thinsp;

&tilde;

~

&times;

×

&tm;

[TM]

&uArr;

uArr

&uacute;

ú

&uarr;

uarr

&ucirc;

û

&ugrave;

ù

&upsi;

upsi

&urlnam;

&urlname;

&vDash;

|=

&vdash;

|-

&verbar;

|

&xi;

xi

&yacute;

ý

&yuml;

y

&zeta;

zeta
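The expansions listed above appear to be those used where only plain text is available. As a rough illustration of how such a table might be applied, here is a Python sketch using a handful of entries copied from the list. The names TEXT_EXPANSIONS and expand_entities are hypothetical, and leaving unknown entities untouched is an assumption of the sketch rather than documented behaviour.

```python
import re

# A few expansions copied from the table above; the real table is much longer.
TEXT_EXPANSIONS = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "hellip": "...",
    "mdash": "---",
    "ndash": "--",
    "rarr": "->",
    "copy": "\u00a9",
}

ENTITY = re.compile(r"&([A-Za-z]+);")


def expand_entities(text):
    """Replace known entity references in a single left-to-right pass."""
    # Unknown names are left untouched (an assumption of this sketch).
    return ENTITY.sub(
        lambda m: TEXT_EXPANSIONS.get(m.group(1), m.group(0)), text
    )


if __name__ == "__main__":
    print(expand_entities("a &lt; b &mdash; c &rarr; d"))
```

Because the substitution is done in one pass, text such as &amp;lt; expands to &lt; and is not expanded a second time.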

See Also

rjkdoc-text(1), rjkdoc-html(1), rjkdoc-man(1)

A.2 rjkdoc-text(1)

Name

rjkdoc-text - convert rjkdoc sgml to plain text

Synopsis

rjkdoc-text [--debug] [--raw|--overstrike|--no-overstrike] filename

rjkdoc-text --help

rjkdoc-text --version

Description

rjkdoc-text converts the file listed on the command line to a plain text file, and writes it to standard output.

Normally the immediate output of the processor is automatically piped through strike-text(1) to convert it to plain text (possibly with overstrikes). However it is also possible to get the raw output.

If you generate output with overstrikes, then programs such as less(1) can be used to read it. Also your printer may or may not be able to handle it directly.
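The description above suggests that the normal behaviour can also be reproduced by hand, by asking for raw output and piping it through strike-text(1) yourself. The Python sketch below shows one way to do that using only the options documented in these man pages. The exact command line that rjkdoc-text uses internally is not documented here, so treat the pipeline as an assumption; the function names are hypothetical.

```python
import subprocess


def pipeline_commands(filename, overstrike=False):
    """Return the two argument lists for a manual rjkdoc-text pipeline."""
    mode = "--overstrike" if overstrike else "--no-overstrike"
    return ["rjkdoc-text", "--raw", filename], ["strike-text", mode]


def render(filename, overstrike=False):
    """Run the pipeline and return the formatted text as bytes."""
    producer_cmd, filter_cmd = pipeline_commands(filename, overstrike)
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    try:
        # strike-text reads standard input and writes standard output.
        result = subprocess.run(filter_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE, check=True)
    finally:
        producer.stdout.close()
        producer.wait()
    if producer.returncode != 0:
        raise subprocess.CalledProcessError(producer.returncode, producer_cmd)
    return result.stdout
```

Calling render("manual.sgml", overstrike=True) requires both tools to be installed and on the search path; manual.sgml is a placeholder filename.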

Options

--debug

Debug mode

--raw

Raw mode; the output will contain escapes intended for processing by strike-text(1).

--overstrike

Overstrike mode; the output will use backspaces and overprinting to get bold and underlined text.

--no-overstrike

Non-overstrike mode (the default).

--help

Display usage summary

--version

Display version number

See Also

rjkdoc-man(1), rjkdoc-html(1), strike-text(1), rjkdoc(5)

A.3 rjkdoc-html(1)

Name

rjkdoc-html - convert rjkdoc sgml to HTML

Synopsis

rjkdoc-html [--debug] [--style filename] filename

rjkdoc-html --help

rjkdoc-html --version

Description

rjkdoc-html converts the file listed on the command line to a single HTML file, and writes it to standard output.

Options

--debug

Debug mode

--style

Sets the name of the file to include in the document header instead of the embedded stylesheet. This can be an embedded stylesheet itself, or a collection of links to stylesheets, or something else.

--help

Display usage summary

--version

Display version number

Notes

The HTML file includes an embedded stylesheet, which may be overridden. It defines the following classes:

file
prog
proc
lit
var
option

These are all used for the RJKDOC element of the same name.

title

Used for the <title> element

author

Used for <author> elements

abstract

Used for the abstract

copyright

Used for the copyright notice

Future Directions

Currently the output goes to a single big HTML file. Particularly for larger projects it would be useful to be able to split it up by chapter. This shouldn't be too hard to implement.

See Also

rjkdoc-text(1), rjkdoc-man(1), rjkdoc(5)

A.4 rjkdoc-man(1)

Name

rjkdoc-man - extract man pages from rjkdoc sgml

Synopsis

rjkdoc-man [--directory directory] [--debug] filename

rjkdoc-man --help

rjkdoc-man --version

Description

rjkdoc-man extracts the manpage sections from the file listed on the command line and writes them to files named name.section in the specified directory (or the current directory).

The date field in the .TH macro at the top is set from the last update time on the input file, and is formatted as year-month-day. The title field is taken from the title element of the input.
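The naming and dating rules can be sketched in a few lines of Python. This is not the tool's own code (rjkdoc-man is written in C++); the function names are hypothetical, and the zero-padded %Y-%m-%d layout and the use of local time are assumptions, since the text only says "year-month-day".

```python
import os
import time


def man_output_path(name, section, directory="."):
    """Path of the file written for a manpage: name.section in directory."""
    return os.path.join(directory, f"{name}.{section}")


def th_date(input_path):
    """Date for the .TH macro, taken from the input's last update time."""
    mtime = os.path.getmtime(input_path)
    return time.strftime("%Y-%m-%d", time.localtime(mtime))


if __name__ == "__main__":
    print(man_output_path("rjkdoc-man", 1, "man"))
    print(th_date("."))
```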

Options

--debug

Debug mode

--directory

Specifies the directory where the output files should be written. The default is the current directory.

--help

Display usage summary

--version

Display version number

See Also

rjkdoc-text(1), rjkdoc-html(1), rjkdoc(5)

A.5 strike-text(1)

Name

strike-text - convert marked text to plain or overstrike text

Synopsis

strike-text [--overstrike|--no-overstrike]

strike-text --help

strike-text --version

Description

strike-text is a filter used to convert the intermediate "raw" form generated by rjkdoc-text(1) to either plain text, or plain text using overstrikes to achieve the effects of bold type and underlining. It always reads from standard input and writes to standard output; it is not possible to specify a filename on the command line. It is only really intended to be used from rjkdoc-text(1).

In the input, the following sequences have special meaning:

!B

Start of bold text

!U

Start of underlined text

!R

Return to plain text

!!

A literal !
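The behaviour of the filter can be sketched in Python as follows. This is an illustration, not the real implementation. Several details are assumptions: bold is produced as character, backspace, character and underlining as underscore, backspace, character (the usual overstrike convention understood by less(1)); whitespace is never overstruck; a new !B or !U simply replaces the current style; and unrecognised sequences are passed through unchanged.

```python
def strike_text(raw, overstrike=True):
    """Convert text containing !B, !U, !R and !! sequences."""
    out = []
    mode = "R"  # R = plain, B = bold, U = underlined
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "!" and i + 1 < len(raw):
            code = raw[i + 1]
            i += 2
            if code in "BUR":
                mode = code
                continue
            if code != "!":
                # Unknown sequence: pass it through (assumption).
                out.append("!" + code)
                continue
            ch = "!"  # "!!" stands for a literal "!"
        else:
            i += 1
        if not overstrike or mode == "R" or ch.isspace():
            out.append(ch)
        elif mode == "B":
            out.append(ch + "\b" + ch)
        else:
            out.append("_\b" + ch)
    return "".join(out)


if __name__ == "__main__":
    print(strike_text("!BName!R: !Urjkdoc!R!!", overstrike=False))
```

With overstrike=False the markers are simply dropped, which corresponds to the --no-overstrike option.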

Options

--overstrike

Generate output using overstrikes (the default)

--no-overstrike

Generate plain text output, without overstrikes

--help

Display usage summary

--version

Display version number

See Also

rjkdoc-text(1)