%%% (c) 2015 Straylight/Edgeware

%%%----- Licensing notice ---------------------------------------------------
%%% This file is part of the Sensible Object Design, an object system for C.
%%% SOD is free software; you can redistribute it and/or modify
%%% it under the terms of the GNU General Public License as published by
%%% the Free Software Foundation; either version 2 of the License, or
%%% (at your option) any later version.
%%% SOD is distributed in the hope that it will be useful,
%%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%%% GNU General Public License for more details.
%%% You should have received a copy of the GNU General Public License
%%% along with SOD; if not, write to the Free Software Foundation,
%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

\chapter{Module syntax} \label{ch:syntax}

%%%--------------------------------------------------------------------------

Fortunately, Sod is syntactically quite simple. The grammar notation used
here is slightly unusual, in order to make the presentation shorter and
easier to read.

Anywhere a simple nonterminal name $x$ may appear in the grammar, an
\emph{indexed} nonterminal $x[a_1, \ldots, a_n]$ may also appear. On the
left-hand side of a production rule, the indices $a_1$, \ldots, $a_n$ are
variables which range over all nonterminal and terminal symbols, and the
variables may also appear on the right-hand side in place of a nonterminal.
Such a rule stands for a family of rules, in which each variable is replaced
by each possible simple nonterminal or terminal symbol.

The letter $\epsilon$ denotes the empty nonterminal:
\begin{quote}
\syntax{$\epsilon$ ::=}
\end{quote}

The following indexed productions are used throughout the grammar, some often
enough that they deserve special notation.
\begin{itemize}
\item @[$x$@] abbreviates @<optional>$[x]$, denoting an optional occurrence
of $x$:
\begin{quote}
\syntax{@[$x$@] ::= <optional>$[x]$ ::= $\epsilon$ @! $x$}
\end{quote}
\item $x^*$ abbreviates @<zero-or-more>$[x]$, denoting a sequence of zero or
more occurrences of $x$:
\begin{quote}
\syntax{$x^*$ ::= <zero-or-more>$[x]$ ::=
$\epsilon$ @! <zero-or-more>$[x]$ $x$}
\end{quote}
\item $x^+$ abbreviates @<one-or-more>$[x]$, denoting a sequence of one or
more occurrences of $x$:
\begin{quote}
\syntax{$x^+$ ::= <one-or-more>$[x]$ ::= <zero-or-more>$[x]$ $x$}
\end{quote}
\item @<list>$[x]$ denotes a sequence of one or more occurrences of $x$
separated by commas:
\begin{quote}
\syntax{<list>$[x]$ ::= $x$ @! <list>$[x]$ "," $x$}
\end{quote}
\end{itemize}

\subsection{Lexical syntax}
\label{sec:syntax.lex}

Whitespace and comments are discarded. The remaining characters are
collected into tokens according to the following syntax.

\begin{grammar}
<token> ::= <identifier>
\alt <string-literal>
\alt <char-literal>
\alt <integer-literal>
\alt <punctuation>
\end{grammar}

This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
munch} rule: at each stage we take the longest sequence of characters which
could be a token.

\subsubsection{Identifiers} \label{sec:syntax.lex.id}

\begin{grammar}
<identifier> ::= <id-start-char> @<id-body-char>^*

<id-start-char> ::= <alpha-char> | "_"

<id-body-char> ::= <id-start-char> @! <digit-char>

<alpha-char> ::= "A" | "B" | \dots\ | "Z"
\alt "a" | "b" | \dots\ | "z"
\alt <extended-alpha-char>

<digit-char> ::= "0" | <nonzero-digit-char>

<nonzero-digit-char> ::= "1" | "2" $| \cdots |$ "9"
\end{grammar}

The precise definition of @<alpha-char> (in particular, which characters
count as @<extended-alpha-char>s) is left to the function
\textsf{alpha-char-p} in the host Lisp system. For portability,
programmers are encouraged to limit themselves to the standard ASCII letters.
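
The identifier rules and the maximal-munch rule can be illustrated with a
small C sketch. It returns the length of the longest identifier at the start
of its input. This is not Sod's own code, since the translator is written in
Lisp. The function names are hypothetical, and the sketch assumes that only
the ASCII letters count as @<alpha-char>s, as recommended above.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical helpers: the ASCII subset of <alpha-char>, and
   <digit-char>. */
static int sod_is_alpha(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int sod_is_digit(char c)
{
  return c >= '0' && c <= '9';
}

/* Return the length of the longest <identifier> at the start of S, or 0 if
   S does not begin with one.  Always taking the longest match is the
   maximal-munch rule at work. */
size_t sod_scan_identifier(const char *s)
{
  size_t n;

  /* <id-start-char> ::= <alpha-char> | "_" */
  if (!sod_is_alpha(s[0]) && s[0] != '_')
    return 0;

  /* <id-body-char>^*, where <id-body-char> is an <id-start-char> or a
     <digit-char> */
  n = 1;
  while (sod_is_alpha(s[n]) || s[n] == '_' || sod_is_digit(s[n]))
    n++;
  return n;
}
\end{verbatim}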

There are no reserved words at the lexical level, but the higher-level syntax
recognizes certain identifiers as \emph{keywords} in some contexts. There is
also an ambiguity (inherited from C) in the declaration syntax which is
settled by distinguishing type names from other identifiers at a lexical
level.

\subsubsection{String and character literals} \label{sec:syntax.lex.string}

\begin{grammar}
<string-literal> ::= "\"" @<string-literal-char>^* "\""

<char-literal> ::= "'" <char-literal-char> "'"

<string-literal-char> ::= any character other than "\\" or "\""
\alt "\\" <char>

<char-literal-char> ::= any character other than "\\" or "'"
\alt "\\" <char>

<char> ::= any single character
\end{grammar}

The syntax for string and character literals differs from~C. In particular,
escape sequences such as @`\textbackslash n' are not recognized. The use
of string and character literals in Sod, outside of C~fragments, is limited,
and the simple syntax seems adequate. For the sake of future compatibility,
the use of character sequences which resemble C escape sequences is
discouraged.
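
The following C sketch decodes a Sod string literal. It assumes that a
backslash simply makes the following character part of the string, so that
@`\textbackslash n' yields a plain @`n'. That is an assumption of the sketch,
not a documented rule of the translator. The function name is hypothetical.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical sketch: decode the <string-literal> at the start of SRC,
   ASSUMING that a backslash makes the following character literal (so a
   backslash followed by 'n' yields a plain 'n').  The decoded text is
   written to OUT, which needs room for strlen(SRC) bytes, and is
   NUL-terminated.  Returns the number of source characters consumed, or 0
   if SRC does not start with a complete string literal. */
size_t sod_read_string(const char *src, char *out)
{
  size_t i = 1, o = 0;

  if (src[0] != '"')
    return 0;
  for (;;) {
    char c = src[i];

    if (c == '\0')
      return 0;                 /* unterminated literal */
    if (c == '"')
      break;                    /* closing quote */
    if (c == '\\') {
      c = src[++i];             /* take the next character as it stands */
      if (c == '\0')
        return 0;
    }
    out[o++] = c;
    i++;
  }
  out[o] = '\0';
  return i + 1;                 /* count the closing quote too */
}
\end{verbatim}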

\subsubsection{Integer literals} \label{sec:syntax.lex.int}

\begin{grammar}
<integer-literal> ::= <decimal-integer>
\alt <binary-integer>
\alt <octal-integer>
\alt <hex-integer>

<decimal-integer> ::= "0" | <nonzero-digit-char> @<digit-char>^*

<binary-integer> ::= "0" @("b"|"B"@) @<binary-digit-char>^+

<binary-digit-char> ::= "0" | "1"

<octal-integer> ::= "0" @["o"|"O"@] @<octal-digit-char>^+

<octal-digit-char> ::= "0" | "1" $| \cdots |$ "7"

<hex-integer> ::= "0" @("x"|"X"@) @<hex-digit-char>^+

<hex-digit-char> ::= <digit-char>
\alt "A" | "B" | "C" | "D" | "E" | "F"
\alt "a" | "b" | "c" | "d" | "e" | "f"
\end{grammar}

Sod understands only integers, not floating-point numbers. Its integer syntax
goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for
binary. Because the @`o' is optional, a literal with a leading zero, such as
@`017', is octal, as in C. However, length and signedness suffixes, such as
C's @`L' and @`U', are not permitted.
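
As an illustration of the integer grammar, here is a hypothetical C scanner.
It returns the length and value of the longest integer literal at the start
of its input. A leading @`0' followed by an octal digit is read as octal. Under
maximal munch, @`09' scans as just @`0', because neither the decimal nor the
octal rule can consume the @`9'. The sketch does not check for overflow.

\begin{verbatim}
#include <stddef.h>

/* Value of digit C in base RADIX, or -1 if C is not such a digit. */
static int sod_digit_value(char c, int radix)
{
  int v;

  if (c >= '0' && c <= '9') v = c - '0';
  else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
  else return -1;
  return v < radix ? v : -1;
}

/* Hypothetical sketch of the <integer-literal> grammar: scan the longest
   literal at the start of S, store its value in *VALUE, and return the
   number of characters consumed, or 0 if S does not start with a literal.
   Overflow is not detected. */
size_t sod_scan_integer(const char *s, unsigned long long *value)
{
  size_t i;
  int radix, d;
  unsigned long long v = 0;

  if (s[0] == '0') {
    switch (s[1]) {
      case 'b': case 'B': radix = 2; i = 2; break;
      case 'o': case 'O': radix = 8; i = 2; break;
      case 'x': case 'X': radix = 16; i = 2; break;
      default: radix = 8; i = 1; break;   /* the 'o' is optional */
    }
    if (sod_digit_value(s[i], radix) < 0) {
      *value = 0;             /* no digits follow: the token is just "0" */
      return 1;
    }
  } else if (s[0] >= '1' && s[0] <= '9') {
    radix = 10;
    i = 0;
  } else
    return 0;

  while ((d = sod_digit_value(s[i], radix)) >= 0) {
    v = v * (unsigned long long)radix + (unsigned long long)d;
    i++;
  }
  *value = v;
  return i;
}
\end{verbatim}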

\subsubsection{Punctuation} \label{sec:syntax.lex.punct}

\begin{grammar}
<punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"
\end{grammar}

\subsubsection{Comments} \label{sec:lex-comment}

\begin{grammar}
<comment> ::= <block-comment>
\alt <line-comment>

<block-comment> ::=
"/*"
@<not-star>^* @(@<star>^+ <not-star-or-slash> @<not-star>^*@)^*
@<star>^+ "/"

<star> ::= "*"

<not-star> ::= any character other than "*"

<not-star-or-slash> ::= any character other than "*" or "/"

<line-comment> ::= "//" @<not-newline>^* <newline>

<newline> ::= a newline character

<not-newline> ::= any character other than newline
\end{grammar}

Comments are exactly as in C99: both traditional block comments `\texttt{/*}
\dots\ \texttt{*/}' and \Cplusplus-style `\texttt{//} \dots' comments are
permitted and ignored. Block comments do not nest.
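
A hypothetical C sketch of comment removal follows. It copies string and
character literals unchanged and replaces each block comment with a single
space, so that the comment still separates tokens. It drops line comments
but keeps their terminating newline. The text above says only that comments
are discarded, so whether the Sod lexer behaves in exactly this way is an
assumption.

\begin{verbatim}
/* Hypothetical sketch: copy SRC to DST with C99-style comments removed.
   String and character literals are copied verbatim; a block comment
   becomes one space; a line comment is dropped but its newline is kept.
   DST needs room for strlen(SRC) + 1 bytes.  Returns 0, or -1 if a block
   comment is unterminated. */
int sod_strip_comments(const char *src, char *dst)
{
  const char *p = src;
  char *q = dst;

  while (*p) {
    if (p[0] == '/' && p[1] == '*') {           /* block comment */
      p += 2;
      while (*p && !(p[0] == '*' && p[1] == '/'))
        p++;
      if (!*p) { *q = '\0'; return -1; }
      p += 2;
      *q++ = ' ';
    } else if (p[0] == '/' && p[1] == '/') {    /* line comment */
      while (*p && *p != '\n')
        p++;
    } else if (*p == '"' || *p == '\'') {       /* literal: copy as-is */
      char quote = *p;

      *q++ = *p++;
      while (*p && *p != quote) {
        if (*p == '\\' && p[1])
          *q++ = *p++;          /* copy the backslash ... */
        *q++ = *p++;            /* ... and the character it quotes */
      }
      if (*p)
        *q++ = *p++;            /* closing quote */
    } else
      *q++ = *p++;
  }
  *q = '\0';
  return 0;
}
\end{verbatim}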

\subsection{Special nonterminals}
\label{sec:special-nonterminals}

Aside from the lexical syntax presented above (\xref{sec:syntax.lex}),
two special nonterminals occur in the module syntax.

\subsubsection{S-expressions} \label{sec:syntax-sexp}

\begin{grammar}
<s-expression> ::= an S-expression, as parsed by the Lisp reader
\end{grammar}

When an S-expression is expected, the Sod parser simply calls the host Lisp
system's \textsf{read} function. Sod modules are permitted to modify the
readtable to extend the S-expression syntax.

S-expressions are self-delimiting, so no end-marker is needed.

\subsubsection{C fragments} \label{sec:syntax.lex.cfrag}

\begin{grammar}
<c-fragment> ::= a sequence of C tokens, with matching brackets
\end{grammar}

Sequences of C code are simply stored and written to the output unchanged
during translation. They are read using a simple scanner which nonetheless
understands C comments and string and character literals.

A C fragment is terminated by one of a small number of delimiter characters
determined by the immediately surrounding context, usually a closing brace
or bracket. The first such delimiter character which is not enclosed in
brackets, braces or parentheses ends the fragment.
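
The delimiting rule can be sketched in C as follows. The function name is
hypothetical. It returns the offset of the first delimiter character that is
not nested inside parentheses, brackets or braces. Like the scanner described
above, it skips over comments and over string and character literals. It
assumes that the fragment is well bracketed and does not report mismatches.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the offset of the first character of S that
   belongs to DELIMS and is not nested inside (), [] or {}, skipping C
   comments and string or character literals.  Returns (size_t)-1 if no
   such delimiter is found.  Bracket matching is not validated. */
size_t sod_cfrag_end(const char *s, const char *delims)
{
  int depth = 0;
  size_t i = 0;

  while (s[i]) {
    char c = s[i];

    if (depth == 0 && strchr(delims, c))
      return i;
    if (c == '/' && s[i + 1] == '*') {          /* block comment */
      const char *e = strstr(s + i + 2, "*/");
      if (!e)
        return (size_t)-1;
      i = (size_t)(e - s) + 2;
      continue;
    }
    if (c == '/' && s[i + 1] == '/') {          /* line comment */
      while (s[i] && s[i] != '\n')
        i++;
      continue;
    }
    if (c == '"' || c == '\'') {                /* string or char literal */
      for (i++; s[i] && s[i] != c; i++)
        if (s[i] == '\\' && s[i + 1])
          i++;
      if (s[i])
        i++;
      continue;
    }
    if (c == '(' || c == '[' || c == '{')
      depth++;
    else if (c == ')' || c == ']' || c == '}')
      depth--;
    i++;
  }
  return (size_t)-1;
}
\end{verbatim}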

\subsection{Module syntax} \label{sec:syntax-module}

\begin{grammar}
<module> ::= @<definition>^*

<definition> ::= <import-definition>
\alt <load-definition>
\alt <lisp-definition>
\alt <code-definition>
\alt <typename-definition>
\alt <class-definition>
\end{grammar}

A module is the top-level syntactic item. A module consists of a sequence of
definitions.

\subsection{Simple definitions} \label{sec:syntax.defs}

\subsubsection{Importing modules} \label{sec:syntax.defs.import}

\begin{grammar}
<import-definition> ::= "import" <string> ";"
\end{grammar}

The module named @<string> is processed and its definitions made available.

A search is made for a module source file as follows.
\begin{itemize}
\item The module name @<string> is converted into a filename by appending
@`.sod', if it has no extension already.\footnote{%
Technically, what happens is \textsf{(merge-pathnames name (make-pathname
:type "SOD" :case :common))}, so exactly what this means varies
according to the host system.} %
\item The file is looked for relative to the directory containing the
importing module.
\item If that fails, then the file is looked for in each directory on the
module search path in turn.
\item If the file still isn't found, an error is reported and the import
fails.
\end{itemize}
At this point, if the file has previously been imported, nothing further
happens.\footnote{%
This check is done using \textsf{truename}, so it should see through simple
tricks like symbolic links. However, it may be confused by fancy things
like bind mounts and so on.} %

Recursive imports, either direct or indirect, are an error.
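
The first step of the search, supplying a default extension, might look like
this on a POSIX-style system. This is a hypothetical C sketch, not the
translator's behaviour. That behaviour is defined by
\textsf{merge-pathnames} and so varies between hosts. In particular,
treating a name such as @`.hidden' as having no extension is an assumption
of the sketch.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the first search step on a POSIX-style system:
   write NAME to OUT (of size N), appending ".sod" unless the last path
   component already has an extension.  Returns 0 on success, or -1 if OUT
   is too small. */
int sod_module_filename(const char *name, char *out, size_t n)
{
  const char *base = strrchr(name, '/');
  const char *dot;
  int has_ext, len;

  base = base ? base + 1 : name;
  dot = strrchr(base, '.');
  has_ext = dot != NULL && dot != base;   /* ".hidden" has no extension */
  len = snprintf(out, n, "%s%s", name, has_ext ? "" : ".sod");
  return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
\end{verbatim}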

\subsubsection{Loading extensions} \label{sec:syntax.defs.load}

\begin{grammar}
<load-definition> ::= "load" <string> ";"
\end{grammar}

The Lisp file named @<string> is loaded and evaluated.

A search is made for a Lisp source file as follows.
\begin{itemize}
\item The name @<string> is converted into a filename by appending @`.lisp',
if it has no extension already.\footnote{%
Technically, what happens is \textsf{(merge-pathnames name (make-pathname
:type "LISP" :case :common))}, so exactly what this means varies
according to the host system.} %
\item A search is then made in the same manner as for module imports
(\xref{sec:syntax.defs.import}).
\end{itemize}
If the file is found, it is loaded using the host Lisp's \textsf{load}
function.

Note that Sod doesn't attempt to compile Lisp files, or even to look for
existing compiled files. The right way to package a substantial extension to
the Sod translator is to provide the extension as a standard ASDF system (or
similar) and leave a dropping @"foo-extension.lisp" in the module path saying
\begin{quote}
\textsf{(asdf:load-system :foo-extension)}
\end{quote}
which will arrange for the extension to be compiled if necessary.

(This approach means that the language doesn't need to depend on any
particular system definition facility. It's bad enough already that it
depends on Common Lisp.)

\subsubsection{Lisp escapes} \label{sec:syntax.defs.lisp}

\begin{grammar}
<lisp-definition> ::= "lisp" <s-expression> ";"
\end{grammar}

The @<s-expression> is evaluated immediately. It can do anything it likes.

\begin{boxy}[Warning!]
This means that hostile Sod modules are a security hazard. Lisp code can
read and write files, start other programs, and make network connections.
Don't install Sod modules from sources that you don't trust.\footnote{%
Presumably you were going to run the corresponding code at some point, so
this isn't as unusually scary as it sounds. But please be careful.} %
\end{boxy}

\subsubsection{Declaring type names} \label{sec:syntax.defs.typename}

\begin{grammar}
<typename-definition> ::=
"typename" <list>@[<identifier>@] ";"
\end{grammar}

Each @<identifier> is declared as naming a C type. This is important because
the C type syntax, which Sod uses, is ambiguous, and disambiguation is
done by distinguishing type names from other identifiers. For example,
\texttt{(T) * p} casts \texttt{*p} to type \texttt{T} if \texttt{T} names a
type, but multiplies \texttt{T} by \texttt{p} otherwise.

Don't declare class names using @"typename"; use @"class" forward
declarations instead.

\subsection{Literal code} \label{sec:syntax-code}

\begin{grammar}
<code-definition> ::=
"code" <identifier> ":" <identifier> @[<constraints>@]
"{" <c-fragment> "}"

<constraints> ::= "[" <list>@[<constraint>@] "]"

<constraint> ::= @<identifier>^+
\end{grammar}

The @<c-fragment> will be output unchanged to one of the output files.

The first @<identifier> is the symbolic name of an output file. Predefined
output file names are @"c" and @"h", which are the implementation code and
header file respectively; other output files can be defined by extensions.

The second @<identifier> provides a name for the output item. Several C
fragments can have the same name: they will be concatenated together in the
order in which they were encountered.

The @<constraints> provide a means for specifying where in the output file
the output item should appear. (Note the two kinds of square brackets shown
in the syntax: the literal square brackets must appear around the
constraints if they are present, but the constraints themselves may be
omitted entirely.) Each comma-separated @<constraint> is a sequence of
identifiers naming output items, and indicates that the output items must
appear in the order given, though the translator is free to insert
additional items in between them. (The particular output items needn't be
defined already; indeed, they needn't ever be defined.)
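
The meaning of a single constraint can be pinned down with a small checking
function. The translator has to choose an order that satisfies every
constraint. The hypothetical C sketch below only checks whether a proposed
order satisfies one constraint. It ignores named items that are never
defined, and it allows other items to appear in between.

\begin{verbatim}
#include <stdbool.h>
#include <string.h>

/* Position of NAME in ORDER[0..N), or -1 if the item is absent. */
static int item_position(const char *const order[], int n, const char *name)
{
  for (int i = 0; i < n; i++)
    if (strcmp(order[i], name) == 0)
      return i;
  return -1;
}

/* Hypothetical check: does the proposed output ORDER (N item names) respect
   the constraint C (M item names, which must appear in that order)?  Items
   named by the constraint but absent from ORDER are ignored, since they
   need never be defined; other items may appear in between. */
bool sod_order_respects(const char *const order[], int n,
                        const char *const c[], int m)
{
  int last = -1;

  for (int j = 0; j < m; j++) {
    int pos = item_position(order, n, c[j]);

    if (pos < 0)
      continue;
    if (pos <= last)
      return false;
    last = pos;
  }
  return true;
}
\end{verbatim}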

There is a predefined output item @"includes" in both the @"c" and @"h"
output files which is a suitable place for inserting @"\#include"
preprocessor directives in order to declare types and functions for use
elsewhere in the generated output files.

\subsection{Property sets} \label{sec:syntax.propset}

\begin{grammar}
<properties> ::= "[" <list>@[<property>@] "]"

<property> ::= <identifier> "=" <expression>
\end{grammar}

Property sets are a means for associating miscellaneous information with
classes and related items. By using property sets, additional information
can be passed to extensions without the need to introduce idiosyncratic
syntax.

A property has a name, given as an @<identifier>, and a value computed by
evaluating an @<expression>. The value can be one of a number of types,
though the only operators currently defined act on integer values.

\subsubsection{The expression evaluator} \label{sec:syntax.propset.expr}

\begin{grammar}
<expression> ::= <term> | <expression> "+" <term> | <expression> "-" <term>

<term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor>

<factor> ::= <primary> | "+" <factor> | "-" <factor>

<primary> ::=
<integer-literal> | <string-literal> | <char-literal> | <identifier>
\alt "?" <s-expression>
\alt "(" <expression> ")"
\end{grammar}

The arithmetic expression syntax is simple and standard. Multiplication and
division bind more tightly than addition and subtraction, and all the binary
operators associate to the left. There are currently no bitwise, logical, or
comparison operators.
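
Here is a hypothetical recursive-descent evaluator in C. It shows how the
grammar fixes precedence and associativity for the purely arithmetic part:
decimal integer literals, the four binary operators, unary signs and
parentheses. The left-recursive rules become loops, which keeps the operators
left-associative. Strings, identifiers, @"?" escapes and prefixed literals are
left out. Division is assumed to truncate toward zero, as in C. The
translator's own evaluator may behave differently.

\begin{verbatim}
#include <stdbool.h>

/* Hypothetical evaluator for the integer subset of <expression>. */
struct sod_parser { const char *p; bool ok; };

static void skip_ws(struct sod_parser *ps)
{
  while (*ps->p == ' ' || *ps->p == '\t' || *ps->p == '\n')
    ps->p++;
}

static long parse_expression(struct sod_parser *ps);

/* <factor> ::= <primary> | "+" <factor> | "-" <factor> */
static long parse_factor(struct sod_parser *ps)
{
  skip_ws(ps);
  if (*ps->p == '+') { ps->p++; return parse_factor(ps); }
  if (*ps->p == '-') { ps->p++; return -parse_factor(ps); }
  if (*ps->p == '(') {
    ps->p++;
    long v = parse_expression(ps);
    skip_ws(ps);
    if (*ps->p == ')') ps->p++; else ps->ok = false;
    return v;
  }
  if (*ps->p >= '0' && *ps->p <= '9') {
    const char *start = ps->p;
    long v = 0;

    while (*ps->p >= '0' && *ps->p <= '9')
      v = v * 10 + (*ps->p++ - '0');
    if (*start == '0' && ps->p - start > 1)
      ps->ok = false;           /* octal and prefixed forms not handled */
    return v;
  }
  ps->ok = false;
  return 0;
}

/* <term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor> */
static long parse_term(struct sod_parser *ps)
{
  long v = parse_factor(ps);

  for (;;) {
    skip_ws(ps);
    char op = *ps->p;
    if (op != '*' && op != '/') return v;
    ps->p++;
    long r = parse_factor(ps);
    if (op == '*') v *= r;
    else if (r == 0) { ps->ok = false; return 0; }
    else v /= r;                /* ASSUMED: C-style truncating division */
  }
}

/* <expression> ::= <term> | <expression> "+" <term> | ... "-" <term> */
static long parse_expression(struct sod_parser *ps)
{
  long v = parse_term(ps);

  for (;;) {
    skip_ws(ps);
    char op = *ps->p;
    if (op != '+' && op != '-') return v;
    ps->p++;
    long r = parse_term(ps);
    v = (op == '+') ? v + r : v - r;
  }
}

/* Evaluate SRC; returns true and sets *OUT on success. */
bool sod_eval(const char *src, long *out)
{
  struct sod_parser ps = { src, true };
  long v = parse_expression(&ps);

  skip_ws(&ps);
  if (!ps.ok || *ps.p != '\0') return false;
  *out = v;
  return true;
}
\end{verbatim}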

A @<primary> expression may be a literal or an identifier. Note that
identifiers stand for themselves: they \emph{do not} denote values. For
fancier expressions, the syntax
\begin{quote}
\syntax{"?" <s-expression>}
\end{quote}
causes the @<s-expression> to be evaluated using the Lisp \textsf{eval}
function.
%%% FIXME crossref to extension docs

\subsection{C types} \label{sec:syntax.c-types}

Sod's syntax for C types closely mirrors the standard C syntax. A C type has
two parts: a sequence of @<declaration-specifier>s and a @<declarator>. In
Sod, a type must contain at least one @<declaration-specifier> (i.e.,
`implicit @"int"' is forbidden), and storage-class specifiers are not
recognized.

\subsubsection{Declaration specifiers} \label{sec:syntax.c-types.declspec}

\begin{grammar}
<declaration-specifier> ::= <type-name>
\alt "struct" <identifier> | "union" <identifier> | "enum" <identifier>
\alt "void" | "char" | "int" | "float" | "double"
\alt "short" | "long"
\alt "signed" | "unsigned"
\alt <qualifier>

<qualifier> ::= "const" | "volatile" | "restrict"

<type-name> ::= <identifier>
\end{grammar}

A @<type-name> is an identifier which has been declared as being a type name,
using the @"typename" or @"class" definitions.

Declaration specifiers may appear in any order. However, not all
combinations are permitted. A type's declaration specifiers must consist of
zero or more @<qualifier>s, and one of the following, up to reordering.
\begin{itemize}
\item @<type-name>
\item @"struct" @<identifier>, @"union" @<identifier>, @"enum" @<identifier>
\item @"void"
\item @"char", @"unsigned char", @"signed char"
\item @"short", @"unsigned short", @"signed short"
\item @"short int", @"unsigned short int", @"signed short int"
\item @"int", @"unsigned int", @"signed int", @"unsigned", @"signed"
\item @"long", @"unsigned long", @"signed long"
\item @"long int", @"unsigned long int", @"signed long int"
\item @"long long", @"unsigned long long", @"signed long long"
\item @"long long int", @"unsigned long long int", @"signed long long int"
\item @"float", @"double", @"long double"
\end{itemize}
All of these have their usual C meanings.
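
The order of specifiers does not matter, so the list of permitted
combinations can be checked mechanically by counting keywords. In this
hypothetical C sketch, a @<type-name> or a @"struct", @"union" or @"enum" tag
is passed as the single placeholder string \texttt{TYPE}. That convention is
an assumption made to keep the example short.

\begin{verbatim}
#include <stdbool.h>
#include <string.h>

/* Hypothetical checker: true if SPECS[0..N) form one of the permitted
   combinations, in any order.  Qualifiers may appear anywhere.  A type
   name or struct/union/enum tag is passed as the placeholder "TYPE" (an
   assumption of this sketch). */
bool sod_declspecs_ok(const char *const specs[], int n)
{
  int nvoid = 0, nchar = 0, nshort = 0, nint = 0, nlong = 0;
  int nsigned = 0, nunsigned = 0, nfloat = 0, ndouble = 0, ntype = 0;

  for (int i = 0; i < n; i++) {
    const char *s = specs[i];

    if (!strcmp(s, "const") || !strcmp(s, "volatile") ||
        !strcmp(s, "restrict"))
      continue;                          /* qualifiers go with anything */
    else if (!strcmp(s, "void")) nvoid++;
    else if (!strcmp(s, "char")) nchar++;
    else if (!strcmp(s, "short")) nshort++;
    else if (!strcmp(s, "int")) nint++;
    else if (!strcmp(s, "long")) nlong++;
    else if (!strcmp(s, "signed")) nsigned++;
    else if (!strcmp(s, "unsigned")) nunsigned++;
    else if (!strcmp(s, "float")) nfloat++;
    else if (!strcmp(s, "double")) ndouble++;
    else if (!strcmp(s, "TYPE")) ntype++;
    else return false;                   /* unknown specifier */
  }

  int nsign = nsigned + nunsigned;
  int total = nvoid + nchar + nshort + nint + nlong + nsign +
              nfloat + ndouble + ntype;

  if (ntype || nvoid || nfloat)          /* these must stand alone */
    return total == 1;
  if (ndouble)                           /* double or long double */
    return ndouble == 1 && nlong <= 1 && total == 1 + nlong;
  if (nchar)                             /* [signed|unsigned] char */
    return nchar == 1 && nsign <= 1 && total == 1 + nsign;
  /* [signed|unsigned] [short|long|long long] [int]; not empty, since
     implicit int is forbidden */
  return total > 0 && nsign <= 1 && nint <= 1 && nshort <= 1 &&
         nlong <= 2 && !(nshort && nlong);
}
\end{verbatim}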

\subsubsection{Declarators} \label{sec:syntax.c-types.declarator}

\begin{grammar}
<declarator>$[k]$ ::= @<pointer>^* <primary-declarator>$[k]$

<primary-declarator>$[k]$ ::= $k$
\alt "(" <primary-declarator>$[k]$ ")"
\alt <primary-declarator>$[k]$ @<declarator-suffix>

<pointer> ::= "*" @<qualifier>^*

<declarator-suffix> ::= "[" <c-fragment> "]"
\alt "(" <arguments> ")"

<arguments> ::= $\epsilon$ | "..."
\alt <list>@[<argument>@] @["," "..."@]

<argument> ::= @<declaration-specifier>^+ <argument-declarator>

<argument-declarator> ::= <declarator>@[<identifier> @! $\epsilon$@]

<simple-declarator> ::= <declarator>@[<identifier>@]

<dotted-name> ::= <identifier> "." <identifier>

<dotted-declarator> ::= <declarator>@[<dotted-name>@]
\end{grammar}

The declarator syntax is taken from C, but with some differences.
\begin{itemize}
\item Array dimensions are uninterpreted @<c-fragment>s, terminated by a
closing square bracket. This allows array dimensions to contain arbitrary
constant expressions.
\item A declarator may have either a single @<identifier> at its centre or a
pair of @<identifier>s separated by a @`.'; this is used to refer to
slots or messages defined in superclasses.
\end{itemize}
The remaining differences are (I hope) a matter of presentation rather than
substance.

\subsection{Defining classes} \label{sec:syntax.class}

\begin{grammar}
<class-definition> ::= <class-forward-declaration>
\alt <full-class-definition>
\end{grammar}

\subsubsection{Forward declarations} \label{sec:class.class.forward}

\begin{grammar}
<class-forward-declaration> ::= "class" <identifier> ";"
\end{grammar}

A @<class-forward-declaration> informs Sod that an @<identifier> will be used
to name a class which is currently undefined. Forward declarations are
necessary in order to resolve certain kinds of circularity. For example,
class Super : SodObject {

\subsubsection{Full class definitions} \label{sec:class.class.full}

\begin{grammar}
<full-class-definition> ::=
@[<properties>@]
"class" <identifier> ":" <list>@[<identifier>@]
"{" @<class-item>^* "}"

<class-item> ::= <slot-item> ";"
\alt <initializer-item> ";"
\alt <message-item>
\alt <method-item>
\end{grammar}

A full class definition provides a complete description of a class.

The first @<identifier> gives the name of the class. It is an error to
give the name of an existing class (other than a forward-referenced class),
or an existing type name. It is conventional to give classes `MixedCase'
names, to distinguish them from other kinds of identifiers.

The @<list>@[<identifier>@] names the direct superclasses of the new class.
It is an error if any of these @<identifier>s does not name a defined class.

The @<properties> provide additional information. The standard class
properties are as follows.
\begin{description}
\item[@"lisp_class"] The name of the Lisp class to use within the translator
to represent this class. The property value must be an identifier; the
default is @"sod_class". Extensions may define classes with additional
behaviour, and may recognize additional class properties.
\item[@"metaclass"] The name of the Sod metaclass for this class. In the
generated code, a class is itself an instance of another class: its
\emph{metaclass}. The metaclass defines which slots the class will have,
which messages it will respond to, and what its behaviour will be when it
receives them. The property value must be an identifier naming a defined
subclass of @"SodClass". The default metaclass is @"SodClass".
%%% FIXME xref to theory
\item[@"nick"] A nickname for the class, to be used to distinguish it from
other classes in various limited contexts. The property value must be an
identifier; the default is constructed by forcing the class name to
lower case.
\end{description}

The class body consists of a sequence of @<class-item>s enclosed in braces.
These items are discussed in the following sections.

\subsubsection{Slot items} \label{sec:sntax.class.slot}

\begin{grammar}
<slot-item> ::=
@<declaration-specifier>^+ <list>@[<init-declarator>@]

<init-declarator> ::= <simple-declarator> @["=" <initializer>@]
\end{grammar}

A @<slot-item> defines one or more slots. All instances of the class and any
subclass will contain these slots, with the names and types given by the
@<declaration-specifier>s and the @<declarator>s. Slot declarators may not
contain dotted names.

It is not possible to declare a slot with function type: such an item is
interpreted as being a @<message-item> or @<method-item>. Pointers to
functions are fine.

An @<initializer>, if present, is treated as if a separate
@<initializer-item> containing the slot name and initializer were present.
class Example : Super {
class Example : Super {

\subsubsection{Initializer items} \label{sec:syntax.class.init}

\begin{grammar}
<initializer-item> ::= @["class"@] <list>@[<slot-initializer>@]

<slot-initializer> ::= <dotted-name> "=" <initializer>

<initializer> ::= "{" <c-fragment> "}" | <c-fragment>
\end{grammar}

An @<initializer-item> provides an initial value for one or more slots. If
prefixed by @"class", then the initial values are for class slots (i.e.,
slots of the class object itself); otherwise they are for instance slots.

The first component of the @<dotted-name> must be the nickname of one of the
class's superclasses (including the class itself); the second must be the
name of a slot defined in that superclass.

The initializer has one of two forms.
\begin{itemize}
\item A @<c-fragment> enclosed in braces denotes an aggregate initializer.
This is suitable for initializing structure, union or array slots.
\item A @<c-fragment> \emph{not} beginning with an open brace is a `bare'
initializer, and continues until the next @`,' or @`;' which is not within
nested brackets. Bare initializers are suitable for initializing scalar
slots, such as pointers or integers, and strings.
\end{itemize}
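
The extent of an initializer can be found with a bracket-counting scan,
sketched below in C with a hypothetical function name. An aggregate
initializer runs to its matching close brace. A bare initializer stops at the
first @`,' or @`;' outside brackets and outside string or character literals,
so a string such as \texttt{"a, b"} is kept whole. Comments are not handled,
for brevity.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical sketch: S points at the first character of an <initializer>.
   Returns its length: for an aggregate initializer, up to and including the
   matching close brace; for a bare initializer, up to (not including) the
   first ',' or ';' outside nested brackets and outside string or character
   literals.  Returns 0 if the initializer is unterminated.  Comments are
   not handled. */
size_t sod_initializer_length(const char *s)
{
  int depth = 0;
  int aggregate = (s[0] == '{');
  size_t i = 0;

  while (s[i]) {
    char c = s[i];

    if (c == '"' || c == '\'') {               /* skip over a literal */
      for (i++; s[i] && s[i] != c; i++)
        if (s[i] == '\\' && s[i + 1])
          i++;
      if (!s[i])
        return 0;
      i++;
      continue;
    }
    if (c == '(' || c == '[' || c == '{')
      depth++;
    else if (c == ')' || c == ']' || c == '}') {
      depth--;
      if (aggregate && depth == 0)
        return i + 1;                           /* matching close brace */
    } else if (!aggregate && depth == 0 && (c == ',' || c == ';'))
      return i;
    i++;
  }
  return 0;
}
\end{verbatim}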

\subsubsection{Message items} \label{sec:syntax.class.message}

\begin{grammar}
<message-item> ::=
@<declaration-specifier>^+ <declarator> @[<method-body>@]
\end{grammar}

\subsubsection{Method items} \label{sec:syntax.class.method}

\begin{grammar}
<method-item> ::=
@<declaration-specifier>^+ <declarator> <method-body>

<method-body> ::= "{" <c-fragment> "}" | "extern" ";"
\end{grammar}

%%%----- That's all, folks --------------------------------------------------

%%% Local variables:
%%% TeX-master: "sod.tex"
%%% End: