Notes on coding style

* General

** Layout

Lines are 77 characters at most, except for strange special effects.
Don't ask.  This is not negotiable, though.  Don't try to tell me that
your monitor is very wide so you can read longer lines.  My monitor is
likely at least as wide.  On the other hand, most lines are easily short
enough to fit in my narrow columns, so the right hand side of a wide
window would be mostly blank.  This seems wasteful to me, when I could
fill that space with more code.

Horizontal whitespace for layout purposes -- i.e., indentation and
alignment, rather than just separating words -- consists of as many tabs
as possible, followed by as many spaces as necessary to reach the target
column.  Tab stops occur every eight columns.  You can tell this
because when you cat a file to your terminal, that's how the tabs
appear.  Editors which disagree about this are simply wrong.
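Here's a minimal C sketch of the rule: to move from one column to another, emit a tab whenever the next tab stop doesn't overshoot the target, then pad with spaces.  Columns are counted from zero.  The function name ~indent_ws~ is hypothetical, not part of Sod, and the caller is assumed to supply a big enough buffer.

```c
#include <stddef.h>

/* Tab stops fall at every eighth column: 0, 8, 16, ... */
#define TABSTOP 8

/* Write the whitespace needed to move from column `from' to column `to'
 * into `buf', which must have room for the result and a terminating null:
 * as many tabs as possible, followed by as many spaces as necessary.
 * Returns the number of characters written, excluding the null.
 */
size_t indent_ws(char *buf, unsigned from, unsigned to)
{
  size_t n = 0;
  unsigned next;

  /* Emit a tab whenever the next tab stop doesn't overshoot the target. */
  for (;;) {
    next = (from/TABSTOP + 1)*TABSTOP;
    if (next > to) break;
    buf[n++] = '\t'; from = next;
  }

  /* Make up the remainder with spaces. */
  while (from < to) { buf[n++] = ' '; from++; }
  buf[n] = 0;
  return (n);
}
```

For example, moving from column 0 to column 29 produces three tabs followed by five spaces.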

My indentation quantum is usually two columns.  It seems that some
modern editors are deeply confused, and think that tab width and
indentation quantum are the same thing, but they aren't.  Such broken
editors will make a hopeless mess of my code.  If you have the
misfortune to use such an editor, maybe you could contribute patches to
fix it.


* Lisp style

** Language subset and extensions

None of ANSI Common Lisp is off-limits.

I think my Lisp style is rather more imperative in flavour than that of
most modern Lisp programmers.  It's probably closer to historical Lisp
practice in that regard, even though I wasn't writing Lisp back then.  A
lot of this is because I don't assume that the Lisp implementation
handles tail calls properly: Common Lisp is not Scheme.

I make extensive use of CLOS, and macros.  On a couple of occasions I've
made macros which use CLOS generic function dispatch to compute their
expansions.  The parser language is probably the best example of this in
the codebase.

I like hairy ~format~ strings.  I've intentionally opted to leave them
as challenges to the reader rather than explain them.

I've avoided hairy ~loop~ for the most part, not because I dislike it
strongly but because others do and I don't find that it wins big enough
for the fight to be worthwhile.

I only use ~&aux~ lambda-list parameters in ~defstruct~ BOA
constructors, for special effects.

I use ~car~, not ~first~, and ~cdr~, not ~rest~.  Similarly, I use
~cadr~, not ~second~, and I'm not afraid to use ~cddr~ or ~cadar~.

Likewise, I've not used ~elt~, preferring to know what kind of sequence
I'm dealing with, or using the built-in sequence functions.

I'm happy to use ~1+~, and I like the brevity of ~1-~ enough to use it
despite its terrible name.

There are no reader syntax extensions in the code.  This is because I
couldn't think of any way they'd be especially helpful, and not because
I'm in any way opposed to them.

The main translator, in the ~SOD~ package, tries to assume very little
beyond ANSI Common Lisp and what's included in just about every serious
implementation: specifically, MOP introspection, and Gray streams.
There's intentionally no MOP intercession.

The frontend additionally makes use of ~cl-launch~, but the dependency
is actually quite weak, and it could be replaced with a different, maybe
implementation-specific, mechanism fairly easily.  I'm keen to take
patches which improve frontend portability.

I'm more tolerant of extensions and external dependencies in the test
suite, which makes additional use of ~xlunit~.  Running the test suite
isn't essential to getting the translator built, so this isn't as much
of a problem.

** Layout

I pretty much let Emacs indent my code for me, based on information
collected by SLIME.  Some exceptions:

  + DSLs (e.g., the parser language) have their own space of macros
    which Emacs doesn't understand and for the most part I haven't
    bothered to teach it.

  + Emacs sometimes does a bad job with hairy ~loop~ and requires manual
    fixing.  Since I don't use hairy ~loop~ much, this isn't a major
    problem.

  + Emacs indents lambda lists really badly.  I often prefer to put the
    entire lambda list on its own line rather than split it.  If I have to
    split a simple lambda list, without lambda-list keywords, I just
    align the start of each subsequent line with the start of the first
    argument.  I break hairy lambda lists before lambda-list keywords,
    and the start of a subsequent line aligns with the first argument
    name following the lambda-list keyword which begins the group, so
    that the lambda-list keyword stands out.

    : (defun many-arguments (first second third
    : 			     fourth fifth)
    :   ...)

    : (defun hairy-arguments (first second third
    : 			      &optional fourth fifth
    : 					sixth
    : 			      &rest others)
    :   ...)

    I don't know what I'd do if I had a hairy lambda list with so many
    mandatory positional arguments that I had to split them.  So far,
    this situation hasn't come up.

Lisp code does have a tendency to march across to the right quite
rapidly given a chance.  I have a number of strategies for dealing with
this.

  + Break a long nested calculation into pieces, giving names to the
    intermediate results, in a ~let*~ form.

  + Hoist deeply nested complex computations out into ~flet~ or
    ~labels~, and then invoke them from inside whatever complicated
    conditional mess was needed to decide what to do.

  + Shrug my shoulders and let code dribble down the right hand side for
    a bit.

** Packages and exporting

A package collects symbols which are given meanings in one or more
source files.  If a package's code is all in one file, then the package
definition can be put in that file too; otherwise I put it in its own
file.

I don't put ~:export~ in package definitions.  Instead, I scatter calls
to the ~export~ function throughout the code, right next to where the
relevant symbol is defined.  This has three important advantages.

  + You can tell, when you're reading the code which defines ~foo~,
    whether ~foo~ is exported and therefore a defined part of the
    package interface.

  + When you know that you're writing a thing which will form part of
    the package interface, you don't have to go off and edit some other
    file to export it.

  + A master list of exported symbols becomes a merge hazard: if two
    different branches add symbols to nearby pieces of the master list
    then you get a merge conflict for no especially good reason.

There's an apparent disadvantage: there's no immediately visible master
list of exported symbols.  But that's not a big problem:

: (loop for s being the external-symbols of pkg collect s)

See ~doc/list-symbols.lisp~ for more sophisticated reporting.  (In
particular, this identifies what kind of thing(s) each external symbol
names.)

** Comments and file structuring

A file starts with a big ~;;;~ comment bearing the Emacs ~-*-lisp-*-~
marker, a quick description, and copyright and licensing boilerplate.  I
don't use four-semicolon comments, and I only use ~#|~ ... ~|#~ for
special effects.

Then there's package stuff.  There may be a ~cl:defpackage~ form (with
explicit package qualifier) if the relevant package doesn't have its own
package definition file.  I use gensyms to name packages: strings don't
seem right, and symbols would leak into some unrelated package.

Then there's ~cl:in-package~.  Like ~defpackage~, I use a gensym to name
the package.  I can't think offhand of a good reason to have a file with
sections `in' more than one package.  So, the ~in-package~ form goes at
the top of the file, before the first section header.  If sections are
going to end up in separate packages, I think I'd put a ~cl:in-package~
at the top of each section in case I wanted to reorder them.

The rest of the file consists of Lisp code.  I don't use page boundaries
~^L~ to split files up.  Instead, I use big banner comments for this:

: ;;;--------------------------------------------------------------------------
: ;;; Section title.

Sections don't usually have internal comments, but if they did they'd
also be ~;;;~ comments.

Almost all definitions get documentation strings.  I've tried to be
consistent about formatting.

  + Docstring lines are 77 characters or less.

  + The first line gives a summary of what the thing does.  The summary,
    together with the SLIME-generated synopsis, is likely enough to
    remind you what the thing does.

  + The rest of the lines are indented by three spaces, and explain
    carefully what the thing does and what all the parameters mean.

Smallish functions and macros don't usually need any further
commentary.  Big functions often need to be split into bitesize pieces
with their own internal ~;;~ comments.  The idea is that these comments
should explain the code's overall strategy to the reader, and help them
figure out how a piece fits into that strategy.

Winged, single ~;~ comments are very rare.

Files end, as a result of long tradition, with a comment

: ;;;----- That's all, folks --------------------------------------------------

** Macro style

I don't mind complicated macros if they're doing something worthwhile.
They need to have good documentation strings, though.

That said, where possible I've tried to factor macros into an actual
macro providing the syntactic sugar, and a function which receives the
parameters and $\eta$-expanded forms, and does the actual work.

It's extremely bad taste for a macro to evaluate its evaluable
parameters in any order other than strictly left to right, or to
evaluate them more than once.

** Data structures

I've tended to be happy with plain lists for homogeneous-ish
collections.  Strongly heterogeneous collections (other than input
syntax, destructured using ~defmacro~ or ~destructuring-bind~) I've
tended to make a proper data type for.

My first instinct when defining a new structure is to use ~defclass~.
While it's annoyingly verbose, it has the immense benefit over
~defstruct~ that it's safe to redefine CLOS classes in a running image
without the world breaking, and I usually find it necessary to add or
change slots while I'm working on new code.  Once a piece of code has
settled down and I have a good feel for what my structure is actually
doing, I might switch the ~defclass~ for a ~defstruct~.  Several
questions influence my decision.

  + Do slot accesses need to be really fast?  My usual Lisp
    implementations aggressively optimize ~defstruct~ accessor
    functions.

  + Have I subclassed my class?  While I can move over a
    single-inheritance tree using ~:include~, it seems wrong to do this
    most of the time.  Also, I'd be precluding subclasses from multiple
    inheritance, and I'd either have to prohibit subclassing by
    extensions or have to commit to ~defstruct~ in the documentation.
    In general, I'm much happier committing to ~defclass~.

  + Are there methods specialized on my class?  Again, structure classes
    make fine method specializers, but it doesn't seem right.

Apart from being hard to redefine, ~defstruct~ does a pretty good job of
making a new structure type.  I tend to tidy up a few rough edges.

  + The default predicate always has ~-p~ appended.  If the class name
    is a single word, then I'll explicitly name the predicate with a
    simple ~p~ suffix.  For example, ~ship~ would have the predicate
    ~shipp~, rather than ~ship-p~.

  + If there are slots I can't default then I'll usually provide a BOA
    constructor which sets them from required parameters; other slots
    I'll set from optional or keyword parameters according to my taste
    and judgement.

  + Slots mustn't be given names which are external in any package.
    Unfortunately, slot names are used in constructing accessor names,
    and sometimes the right accessor name involves a prohibited symbol.
    I've mostly addressed this by naming the slot ~%foo~, and then
    providing inline reader and writer functions.  (CLOS class
    definitions don't have this problem because you get to set the
    accessor function names independently of the slot names.)

  + BOA constructors are strange.  You can set the initial slots based
    on an arbitrary computation on the provided parameters, but you have
    to roll up your sleeves and mess with ~&aux~ parameters to pull it
    off.

** Naming

I'm a traditionalist in some ways, and one of the reasons I like Lisp is
the richness of its history and tradition.

In other languages, I tend to use single- or two-letter names for
variables and structure slots; not so much in Lisp.  Other languages
express more using punctuation, so the names stand out easily; I find
that short names can be lost more easily in Lisp.

I've also tended to go for fairly prosaic names, taking my inspiration
from the CLOS MOP.  While I mourn the loss of whimsical names like
~haulong~ and ~haipart~, I've tried to avoid inventing more of them.

There's a convention, which I think comes from ML, of using ~_~ where a
binding occurrence of a variable name is expected, to signify that the
corresponding value is to be discarded.  Common Lisp, alas, doesn't
have such a convention.  Instead, there's a sequence of silly names used
with the same intention, and the bindings are then explicitly ignored
with a declaration.  The names begin ~hunoz~, ~hukairz~, and (I think)
~huaskt~.

** Declarations

The code is light on declarations, other than ~ignore~ and similar used
to muffle warnings.  The macros try to do sensible things with
declarations, and I think they succeed fairly well, but there might be
bugs and rough edges.  I know that some are just broken because, for
actual correctness, declarations provided by the caller need to be split
up into a number of different parts of the expansion, which in turn
requires figuring out what the declarations mean and which bindings
they're referring to.  That's not completely impossible, assuming that
there aren't implementation-specific declarations with crazy syntax
mixed in there, but it's more work than seems worthwhile.


* C style

** Language subset and extensions

I'm trying to support C89 still.  There are few really worthwhile
features in C99 and later, though there are some.  For now, I want Sod
to continue working if built with a C89 compiler, even if some things --
most notably, the macro sugar for varargs messages -- are
unavailable.

Similarly, I'll use compiler-specific features if they don't adversely
affect portability.  For example, I'll use GCC attributes to improve
compiler diagnostics, but they're wrapped up in preprocessor hacking so
that they won't be noticed by compilers which don't understand them.
I'm generally happy to accept contributions which make similar
improvements for other compilers.

Sod is supposed to have minimal dependencies.  It should be able to work
in what the ISO C standard names a `freestanding environment', without
most of the standard C library.  The keyword-argument library is
carefully split into a piece which is fully portable and a piece which
depends on features which are only available in hosted environments,
like being able to print stuff to ~stderr~, so that users targeting
embedded systems have an easy porting job.

** Naming

I usually give local variables, arguments, and structure members very
short names, just one or two characters long.  I find that longer names
are harder to distinguish, and take up horizontal space.  Besides,
mathematicians have been using single-letter variable names quite
successfully for hundreds of years.

I usually choose variable names to match their types in an informal way.
Loop counters are often called ~i~, ~j~, ~k~; generic pointers, and
pointers to bytes or characters, are usually ~p~ or ~q~; a character is
often ~ch~; a ~FILE~ pointer is ~fp~ following long tradition; sizes of
things, in bytes, are ~sz~, while lengths of vectors, in elements, are
~n~.  I often name values of, or pointers to, structures or custom types
with the first letter of the type.  If I have two things of the same
kind, I'll often double the name of one of them; e.g., if I have two
pointers to ~whatsit~ structures, I might call them ~w~ and ~ww~.

I don't (any more) give ~typedef~ names to structures or unions.  This
makes it possible to have a variable with the same name as the structure
tag without serious trouble.
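A small sketch of these naming habits, using the hypothetical ~whatsit~ structure from above: ~w~ and ~ww~ point to ~whatsit~s, ~n~ counts elements, and ~i~ is a loop counter.  Because C keeps structure tags in a separate name space from ordinary identifiers, and there's no ~typedef~, the local variable can simply be called ~whatsit~.  The function ~biggest~ is invented for the example.

```c
#include <stddef.h>

struct whatsit { int v; };

/* Return a copy of the whatsit with the largest value among the `n'
 * elements of the vector `w', which must be nonempty.
 */
struct whatsit biggest(const struct whatsit *w, size_t n)
{
  struct whatsit whatsit;               /* tag and variable don't clash */
  const struct whatsit *ww;
  size_t i;

  whatsit = w[0];
  for (i = 1; i < n; i++) {
    ww = &w[i];
    if (ww->v > whatsit.v) whatsit = *ww;
  }
  return (whatsit);
}
```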

In variable names, I tend to just squash pieces of words together; in
longer names, sometimes I'll put in underscores to split things up a
bit.  Camel case is bletcherous.

File-scope names with /internal/ linkage -- i.e., things marked ~static~
-- generally deserve somewhat longer names.  I don't give them any other
kind of marking; e.g., I'd probably name the pointer to the head of a
list of ~foo~ things something like ~foohead~.

Names with /external/ linkage want more care because they're playing in
a shared global namespace.

** Layout

The C indent quantum is two columns.

Declarations go at the top of functions.  I don't put declarations in
inner blocks, and I certainly don't scatter declarations throughout a
block.  I find that having the declarations all in one place makes it
easier for me to keep track of what things the function is going to be
thinking about.

If I can't set a variable to its proper value immediately, I'll leave it
uninitialized until I can.  That way, the compiler will warn me if I
forget.

Most of my style is an attempt to get as much interesting code on the
screen at a time, and still be able to read it.  The short variable
names keep things distinct while keeping statements short; short
statements don't need to be split across multiple lines.  And keeping
the overall line length limit low means I can fit more /columns/ of code
on my screen.

If there are several related variables with the same declaration
specifiers, I'll usually write a single declaration for all of them --
even if they have different actual types.  For example,

: struct foo f, *fp = &f;

Note that a ~*~ declarator operator has a space to its left, but never
to its right.  (Stroustrup's style horribly misrepresents the underlying
syntax.)

I will often write multiple statements on a single line, usually to
indicate that these things are part of the same thought, and they
shouldn't be separated.  For example, if I'm working through an array of
things, with a pointer ~p~ to the element I'm hacking on and a count ~n~
of things left to hack, then I'll write a loop

: while (n) {
:   /* hack on *p */
:   p++; n--;
: }

so that the two updates don't get separated.
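Filled out into a complete function, the loop might look like the sketch below.  The function ~sumvec~, which totals an array of ~int~s, is hypothetical; the point is that the pointer and count updates stay together on one line.

```c
#include <stddef.h>

/* Return the total of the `n' values in the vector starting at `p'. */
long sumvec(const int *p, size_t n)
{
  long t = 0;

  while (n) {
    t += *p;                            /* hack on *p */
    p++; n--;
  }
  return (t);
}
```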

I don't wrap braces around individual statements that fit on a single
line.  For example, I'll write

: while (*p == ' ') p++;

On the other hand, if a single substatement is going to take more than
one line then it gets wrapped in braces.

I don't write blocks which aren't part of a larger statement, e.g., ~if~
or ~while~.  I'll write a compound statement on a single line
if I can; but I'll split ~if~ with an ~else~ over two lines.  For
example,

: if (a == 1) x = 0;
: else if (b == 3) { y = 2; z = 1; }
: else w = 15;

On the other hand, if I can't write all of the branches of an
~if~\relax/\relax ~else if~ ladder like this, then /all/ of the
substatements get their own lines.  (I write ~do~\relax/\relax ~while~
loops in the same way, but this comes up much less frequently.)

If I can't write a block on the same line, then the opening brace goes
on the same line as the statement head, and the closing brace gets its
own line.  A trailing ~else~ or ~while~ goes on the same line as the
previous closing brace, if there is one.
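Here's a sketch showing these layout rules together: a one-line ~while~, an ~if~ and ~else if~ ladder whose branches all fit on single lines, and a ~do~ loop whose body doesn't fit, so it gets braces with the trailing ~while~ on the closing-brace line.  Both functions, ~classify~ and ~parsenum~, are made up for illustration; ~classify~ assumes the lowercase letters are contiguous, as they are in ASCII.

```c
/* Skip leading spaces in the string `p', then classify what follows:
 * 0 at end of string, 1 for a digit, 2 for a lowercase letter (assuming
 * ASCII), and 3 for anything else.
 */
int classify(const char *p)
{
  int ch;

  while (*p == ' ') p++;
  ch = *p;
  if (!ch) return (0);
  else if (ch >= '0' && ch <= '9') return (1);
  else if (ch >= 'a' && ch <= 'z') return (2);
  else return (3);
}

/* Parse an unsigned decimal number at `*pp', store it in `*n_out', and
 * advance `*pp' past it.  Returns zero on success, or -1 if there are no
 * digits.
 */
int parsenum(const char **pp, unsigned long *n_out)
{
  const char *p = *pp;
  unsigned long n = 0;

  if (*p < '0' || *p > '9') return (-1);
  do {
    /* Overflow isn't checked: keep the numbers small. */
    n = 10*n + (*p - '0'); p++;
  } while (*p >= '0' && *p <= '9');
  *pp = p; *n_out = n;
  return (0);
}
```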

I don't write spaces inside parentheses or square brackets, or between
unary operators and their operands.  I always write ~sizeof~ as if it
were a function, even though I know it isn't.  I write a single space
either side of non-multiplicative binary operators -- i.e., other than
~*~, ~/~, ~%~, and ~&~; I don't write spaces around multiplicative
operators any more.  The comma operator is special, and gets a space
after, but not before.

If I'm breaking a long line at a binary operator, the break comes
/after/ the operator, not before.

** Common conventions

A /predicate/ is a function which answers a yes/no question -- and has
no side-effects.  I don't use ~bool~ or similar; predicates return
~int~, such that zero is false and nonzero is true.  Predicates usually
have names ending in ~p~ or ~_p~.  (Note that function names beginning
with ~is~ followed by a lowercase letter are reserved for future
~<ctype.h>~ functions.)

On the other hand, an /operation/ is a function whose main purpose is to
have an effect -- maybe create a thing, or update some state.  In the
absence of better ideas, operations also return ~int~, but zero
indicates success, and nonzero -- usually $-1$ -- indicates failure.
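A sketch of the two conventions, using a hypothetical fixed-size stack: ~stack_emptyp~ is a predicate (no side effects, nonzero for true), while ~stack_push~ is an operation returning zero on success and $-1$ on failure.  The names and the capacity of 16 are assumptions for the example.

```c
#include <stddef.h>

struct stack { int v[16]; size_t n; };

/* Predicate: is the stack empty?  Nonzero means yes. */
int stack_emptyp(const struct stack *s) { return (!s->n); }

/* Operation: push `x' onto the stack.  Returns zero on success, or -1 if
 * the stack is already full.
 */
int stack_push(struct stack *s, int x)
{
  if (s->n >= sizeof(s->v)/sizeof(*s->v)) return (-1);
  s->v[s->n++] = x;
  return (0);
}
```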

** Error handling and resource management

I've tried many techniques.  I think the following is the best approach
so far.

I try to arrange that every type which represents some resource which
might need releasing has an easily recognizable `inert' value which
indicates that the resource has not been acquired.  At the top of a
function, I initialize all of the variables which might hold onto
resources to their inert values.  At the end of the function, I place a
label, ~end~ or ~fail~.  An ~end~ label is for common cleanup; a ~fail~
label is for cleanup that's only needed on unsuccessful completion.
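Here's a minimal sketch of the pattern under stated assumptions: the hypothetical function ~kv_parse~ splits a string of the form ~KEY=VALUE~ into two freshly allocated strings.  The resource variables start out as null pointers, which serve as their inert values, and since ~free(0)~ does nothing they can be released without checking whether they were ever acquired.  The scratch copy ~t~ is released at ~end~ on every path; the results ~k~ and ~v~ are released at ~fail~ only if something goes wrong.

```c
#include <stdlib.h>
#include <string.h>

struct kv { char *k, *v; };

/* Return a fresh copy of the string `s', or a null pointer if allocation
 * fails.
 */
static char *copystr(const char *s)
{
  size_t sz = strlen(s) + 1;
  char *p = malloc(sz);

  if (p) memcpy(p, s, sz);
  return (p);
}

/* Split the string `s', of the form `KEY=VALUE', into freshly allocated
 * copies of its two halves, stored in `*kv'.  Returns zero on success, or
 * -1 on failure (no `=', or out of memory), in which case nothing is left
 * allocated and `*kv' is untouched.
 */
int kv_parse(struct kv *kv, const char *s)
{
  char *t = 0, *k = 0, *v = 0;          /* inert: nothing acquired yet */
  char *eq;
  int rc;

  t = copystr(s); if (!t) goto fail;    /* scratch copy we can modify */
  eq = strchr(t, '='); if (!eq) goto fail;
  *eq = 0;
  k = copystr(t); if (!k) goto fail;
  v = copystr(eq + 1); if (!v) goto fail;
  kv->k = k; kv->v = v;
  rc = 0; goto end;

fail:
  free(k); free(v);                     /* only wanted if we failed */
  rc = -1;
end:
  free(t);                              /* always wanted */
  return (rc);
}
```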


** Miscellaneous style issues

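I write ~0~, not ~NULL~.  Doing this helps avoid a common error in
null-terminated variable-length argument lists, e.g., ~execlp~, where
~NULL~ is actually an integer ~0~ in disguise and ends up being an ~int~
where a pointer was wanted.  A plain ~0~ is obviously an integer, so it's
harder to forget the cast, e.g., ~(char *)0~, that such contexts need.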
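To illustrate the hazard, here's a sketch of a hypothetical function, ~countstrs~, which walks a null-terminated list of strings the way ~execlp~ does.  Callers must terminate the list with ~(char *)0~: a bare ~0~, or a ~NULL~ which expands to ~0~, is passed as an ~int~, which need not have the same size or representation as a pointer.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the strings in an argument list terminated by a null pointer.
 * The first argument is a string too, and may itself be the terminator.
 */
size_t countstrs(const char *p, ...)
{
  va_list ap;
  size_t n = 0;

  va_start(ap, p);
  while (p) { n++; p = va_arg(ap, char *); }
  va_end(ap);
  return (n);
}
```

So a call like ~countstrs("a", "b", (char *)0)~ counts two strings.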

I don't usually write redundant comparisons against ~0~, or ~NULL~, or
well-known return codes indicating success.  Again, this helps with
compression.  I'll write

: rc = do_something(foo, bar); if (rc) goto end;

(yes, one line) rather than comparing ~rc~ against some ~STATUS_SUCCESS~
code or similar.  Exception: I still haven't decided whether I prefer
leaving the explicit relational in ~strcmp~ and similar tests.

I always write parentheses around the expression in a ~return~
statement.

In declarations, storage classes come first (e.g., ~static~, ~extern~,
~typedef~), followed by qualifiers (~const~, ~volatile~; I never use
~restrict~), and then the type specifiers, signedness indicators first
where they aren't redundant (so maybe ~signed char~ for special effects,
but never ~signed int~), then length indicators, then the base type.  I
omit ~int~ if there are other type specifiers, so ~unsigned~ or ~long~,
rather than ~unsigned int~ or ~long int~.

The full declarator syntax for function pointer is pretty ugly.  I often
simplify it by defining a ~typedef~ for the /function/ type, not the
function pointer type.  For example

: typedef int callbackfn(struct thing */*t*/, void */*p*/);

I'd then use variables (structure members, arguments, etc.) of type
~callbackfn *~.

In header files, I comment out argument names to prevent problems with
macros defined by client translation units.  Also, I explicitly mark
function declarations as being ~extern~.
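Putting these together, a callback interface might be declared as in the sketch below, with argument names commented out in the prototype and ~extern~ written explicitly.  The structure member ~v~, the function ~each_thing~, and the example callback ~addup~ are hypothetical; only the ~callbackfn~ typedef comes from the text above.  The prototype would normally live in a header and the definition in a source file; they're in one block here so that it stands alone.

```c
#include <stddef.h>

struct thing { int v; };

/* The function type, not the function pointer type. */
typedef int callbackfn(struct thing */*t*/, void */*p*/);

/* As it might appear in a header: names commented out, explicit `extern'.
 * Call `fn' on each of the `n' things in the vector `t', passing `p' too;
 * stop early and return the callback's result if it's nonzero.
 */
extern int each_thing(struct thing */*t*/, size_t /*n*/,
		      callbackfn */*fn*/, void */*p*/);

int each_thing(struct thing *t, size_t n, callbackfn *fn, void *p)
{
  int rc;

  while (n) {
    rc = fn(t, p); if (rc) return (rc);
    t++; n--;
  }
  return (0);
}

/* An example callback: add each thing's value to the total at `p'. */
int addup(struct thing *t, void *p)
{
  long *total = p;

  *total += t->v;
  return (0);
}
```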

** Comments and file structuring

I never use C++-style ~//~ comments except for temporary special
effects.

If a comment fits on one line, then its closing ~*/~ is on the same
line; otherwise, the ending ~*/~ is on a line by itself, and there's a
spine of ~*~ characters in a column on the left.

A file starts with a big comment bearing the Emacs ~-*-c-*-~ marker, a
quick description, and copyright and licensing boilerplate.

Header files are wrapped up with multiple-inclusion and C++ guards, with

: #ifndef HEADER_H
: #define HEADER_H
:
: #ifdef __cplusplus
:   extern "C" {
: #endif

at the top.

The rest of the file consists of C code.  I don't use page boundaries
~^L~ to split files up.  Instead, I use big banner comments for this:

: /*----- Section title -----------------------------------------------------*/

Following long tradition, functions and macros are documented in a
preceding comment which looks like this.

: /* --- @name@ --- *
:  *
:  * Arguments:	@type fmm@ = a five-minute argument
:  *		@type fhh@ = the full half-hour
:  *
:  * Returns:	A return value.
:  *
:  * Use:	It does a thing.  Otherwise I wouldn't have bothered.
:  */

Sometimes (rarely) the description of the return value explains
sufficiently what the thing does.  If so, the `Use' part can be omitted.
Fragments of C code in this comment are surrounded by ~@~ characters.
There can also be \LaTeX\ maths in here, in ~%$~...\relax ~$%~.
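For instance, a hypothetical ~fact~ function documented in this style might look like the sketch below, with tabs lining up the descriptions.  There's no overflow check, so it's only good for small arguments.

```c
/* --- @fact@ --- *
 *
 * Arguments:	@unsigned n@ = a small integer
 *
 * Returns:	The factorial %$n!$% of @n@.
 *
 * Use:		Computes a factorial by repeated multiplication.  There's
 *		no overflow check, so @n@ had better be small.
 */
unsigned long fact(unsigned n)
{
  unsigned long r = 1;

  while (n > 1) { r *= n; n--; }
  return (r);
}
```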

Files end, as a result of long tradition, with a comment

: /*----- That's all, folks -------------------------------------------------*/

The closing ~#endif~ of a header file comes after this final comment.


* COMMENT Emacs cruft

#+LATEX_CLASS: strayman

## LocalWords:  CLOS ish destructure destructured accessor specializers
## LocalWords:  accessors DSLs gensym gensyms bletcherous Stroustrup
## LocalWords:  Stroustrup's signedness

## Local variables:
## mode: org
## End: