5 %%% (c) 2015 Straylight/Edgeware
8 %%%----- Licensing notice ---------------------------------------------------
10 %%% This file is part of the Sensible Object Design, an object system for C.
12 %%% SOD is free software; you can redistribute it and/or modify
13 %%% it under the terms of the GNU General Public License as published by
14 %%% the Free Software Foundation; either version 2 of the License, or
15 %%% (at your option) any later version.
17 %%% SOD is distributed in the hope that it will be useful,
18 %%% but WITHOUT ANY WARRANTY; without even the implied warranty of
19 %%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20 %%% GNU General Public License for more details.
22 %%% You should have received a copy of the GNU General Public License
23 %%% along with SOD; if not, write to the Free Software Foundation,
24 %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
26 \chapter{Module syntax} \label{ch:syntax}
28 %%%--------------------------------------------------------------------------
29 \section{Notation} \label{sec:syntax.notation}
31 Fortunately, Sod is syntactically quite simple. The notation is slightly
32 unusual in order to make the presentation shorter and easier to read.
34 Anywhere a simple nonterminal name $x$ may appear in the grammar, an
35 \emph{indexed} nonterminal $x[a_1, \ldots, a_n]$ may also appear. On the
36 left-hand side of a production rule, the indices $a_1$, \ldots, $a_n$ are
37 variables which vary over all nonterminal and terminal symbols, and the
38 variables may also appear on the right-hand side in place of a nonterminal.
Such a rule stands for a family of rules, in which each variable is replaced
by each possible simple nonterminal or terminal symbol.
42 The letter $\epsilon$ denotes the empty nonterminal
44 \syntax{$\epsilon$ ::=}
47 The following indexed productions are used throughout the grammar, some often
48 enough that they deserve special notation.
50 \item @[$x$@] abbreviates @<optional>$[x]$, denoting an optional occurrence
53 \syntax{@[$x$@] ::= <optional>$[x]$ ::= $\epsilon$ @! $x$}
55 \item $x^*$ abbreviates @<zero-or-more>$[x]$, denoting a sequence of zero or
56 more occurrences of $x$:
58 \syntax{$x^*$ ::= <zero-or-more>$[x]$ ::=
59 $\epsilon$ @! <zero-or-more>$[x]$ $x$}
\item $x^+$ abbreviates @<one-or-more>$[x]$, denoting a sequence of one or
  more occurrences of $x$:
64 \syntax{$x^+$ ::= <one-or-more>$[x]$ ::= <zero-or-more>$[x]$ $x$}
66 \item @<list>$[x]$ denotes a sequence of one or more occurrences of $x$
69 \syntax{<list>$[x]$ ::= $x$ @! <list>$[x]$ "," $x$}
73 %%%--------------------------------------------------------------------------
74 \section{Lexical syntax} \label{sec:syntax.lex}
76 Whitespace and comments are discarded. The remaining characters are
77 collected into tokens according to the following syntax.
80 <token> ::= <identifier>
83 \alt <integer-literal>
87 This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
88 munch} rule: at each stage we take the longest sequence of characters which
92 \subsection{Identifiers} \label{sec:syntax.lex.id}
95 <identifier> ::= <id-start-char> @<id-body-char>^*
97 <id-start-char> ::= <alpha-char> | "_"
99 <id-body-char> ::= <id-start-char> @! <digit-char>
101 <alpha-char> ::= "A" | "B" | \dots\ | "Z"
102 \alt "a" | "b" | \dots\ | "z"
103 \alt <extended-alpha-char>
105 <digit-char> ::= "0" | <nonzero-digit-char>
107 <nonzero-digit-char> ::= "1" | "2" $| \cdots |$ "9"
110 The precise definition of @<alpha-char> is left to the function
111 \textsf{alpha-char-p} in the hosting Lisp system. For portability,
112 programmers are encouraged to limit themselves to the standard ASCII letters.
114 There are no reserved words at the lexical level, but the higher-level syntax
115 recognizes certain identifiers as \emph{keywords} in some contexts. There is
116 also an ambiguity (inherited from C) in the declaration syntax which is
117 settled by distinguishing type names from other identifiers at a lexical
121 \subsection{String and character literals} \label{sec:syntax.lex.string}
124 <string-literal> ::= "\"" @<string-literal-char>^* "\""
126 <char-literal> ::= "'" <char-literal-char> "'"
128 <string-literal-char> ::= any character other than "\\" or "\""
131 <char-literal-char> ::= any character other than "\\" or "'"
134 <char> ::= any single character
137 The syntax for string and character literals differs from~C. In particular,
138 escape sequences such as @`\textbackslash n' are not recognized. The use
139 of string and character literals in Sod, outside of C~fragments, is limited,
140 and the simple syntax seems adequate. For the sake of future compatibility,
141 the use of character sequences which resemble C escape sequences is
\subsection{Integer literals} \label{sec:syntax.lex.int}
147 <integer-literal> ::= <decimal-integer>
148 \alt <binary-integer>
152 <decimal-integer> ::= "0" | <nonzero-digit-char> @<digit-char>^*
154 <binary-integer> ::= "0" @("b"|"B"@) @<binary-digit-char>^+
156 <binary-digit-char> ::= "0" | "1"
158 <octal-integer> ::= "0" @["o"|"O"@] @<octal-digit-char>^+
160 <octal-digit-char> ::= "0" | "1" $| \cdots |$ "7"
162 <hex-integer> ::= "0" @("x"|"X"@) @<hex-digit-char>^+
164 <hex-digit-char> ::= <digit-char>
165 \alt "A" | "B" | "C" | "D" | "E" | "F"
166 \alt "a" | "b" | "c" | "d" | "e" | "f"
169 Sod understands only integers, not floating-point numbers; its integer syntax
170 goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for
171 binary. However, length and signedness indicators are not permitted.
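
The integer-literal rules above can be checked mechanically.  The following
Python sketch is illustrative only and is not taken from the Sod translator;
the name \texttt{parse\_integer\_literal} is hypothetical.  It tries the
prefixed forms first, so that a lone zero falls through to the decimal rule,
and it rejects anything the grammar does not cover, including C-style length
and signedness suffixes.

```python
import re

# One (pattern, base) pair per alternative of <integer-literal>.
# The capture group holds the digits without any prefix.
_INTEGER_FORMS = [
    (re.compile(r"0[xX]([0-9A-Fa-f]+)"), 16),   # <hex-integer>
    (re.compile(r"0[bB]([01]+)"), 2),           # <binary-integer>
    (re.compile(r"0[oO]?([0-7]+)"), 8),         # <octal-integer>, 'o' optional
    (re.compile(r"(0|[1-9][0-9]*)"), 10),       # <decimal-integer>
]


def parse_integer_literal(text):
    """Return the value of TEXT if it is a complete integer literal."""
    for pattern, base in _INTEGER_FORMS:
        match = pattern.fullmatch(text)
        if match:
            return int(match.group(1), base)
    raise ValueError("not an integer literal: %r" % (text,))
```

Note that, under these rules, a leading zero followed by further digits is
octal, as in C, so a literal such as \texttt{08} is simply invalid.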
174 \subsection{Punctuation} \label{sec:syntax.lex.punct}
177 <punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"
181 \subsection{Comments} \label{sec:syntax.lex.comment}
184 <comment> ::= <block-comment>
189 @<not-star>^* @(@<star>^+ <not-star-or-slash> @<not-star>^*@)^*
195 <not-star> ::= any character other than "*"
197 <not-star-or-slash> ::= any character other than "*" or "/"
199 <line-comment> ::= "//" @<not-newline>^* <newline>
201 <newline> ::= a newline character
203 <not-newline> ::= any character other than newline
206 Comments are exactly as in C99: both traditional block comments `\texttt{/*}
207 \dots\ \texttt{*/}' and \Cplusplus-style `\texttt{//} \dots' comments are
208 permitted and ignored.
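
As a cross-check on the comment grammar, it can be transcribed into regular
expressions.  The Python sketch below is an illustration, not the
translator's scanner; it assumes that block comments use the usual C
delimiters, and the names \texttt{BLOCK\_COMMENT}, \texttt{LINE\_COMMENT}
and \texttt{is\_comment} are hypothetical.

```python
import re

# Block comment: the opening delimiter, then the body rule from the grammar
# (@<not-star>^* followed by repeated star-runs that are not followed by a
# slash), then a final run of stars and the closing slash.
BLOCK_COMMENT = re.compile(r"/\*[^*]*(?:\*+[^*/][^*]*)*\*+/")

# Line comment: two slashes, any non-newline characters, then the newline.
LINE_COMMENT = re.compile(r"//[^\n]*\n")


def is_comment(text):
    """Return True if TEXT is exactly one block or line comment."""
    return bool(BLOCK_COMMENT.fullmatch(text) or LINE_COMMENT.fullmatch(text))
```

The block-comment pattern cannot run past the first closing delimiter, which
is why comments do not nest.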
211 \subsection{Special nonterminals} \label{sec:syntax.lex.special}
Aside from the lexical syntax presented above (\xref{sec:syntax.lex}),
214 two special nonterminals occur in the module syntax.
216 \subsubsection{S-expressions}
218 <s-expression> ::= an S-expression, as parsed by the Lisp reader
221 When an S-expression is expected, the Sod parser simply calls the host Lisp
222 system's @|read| function. Sod modules are permitted to modify the read
223 table to extend the S-expression syntax.
225 S-expressions are self-delimiting, so no end-marker is needed.
227 \subsubsection{C fragments}
229 <c-fragment> ::= a sequence of C tokens, with matching brackets
232 Sequences of C code are simply stored and written to the output unchanged
233 during translation. They are read using a simple scanner which nonetheless
234 understands C comments and string and character literals.
236 A C fragment is terminated by one of a small number of delimiter characters
237 determined by the immediately surrounding context -- usually a closing brace
238 or bracket. The first such delimiter character which is not enclosed in
brackets, braces or parentheses ends the fragment.
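
The termination rule can be sketched as follows.  This Python fragment is a
rough model of the behaviour described above, not the translator's actual
scanner; the names \texttt{find\_fragment\_end} and \texttt{\_skip\_literal}
are hypothetical, and error handling is an assumption.  It skips comments and
string and character literals, tracks bracket nesting, and reports the
position of the first delimiter found at nesting depth zero.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def _skip_literal(text, i):
    """Return the index just past the C string or character literal at I."""
    quote = text[i]
    j = i + 1
    while j < len(text):
        if text[j] == "\\":
            j += 2                      # skip the escaped character
        elif text[j] == quote:
            return j + 1
        else:
            j += 1
    raise ValueError("unterminated literal")


def find_fragment_end(text, delimiters, start=0):
    """Return the index of the first unnested delimiter character."""
    stack = []                          # closing brackets we still expect
    i = start
    while i < len(text):
        ch = text[i]
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end < 0:
                raise ValueError("unterminated block comment")
            i = end + 2
        elif text.startswith("//", i):
            end = text.find("\n", i + 2)
            i = len(text) if end < 0 else end + 1
        elif ch in "\"'":
            i = _skip_literal(text, i)
        elif not stack and ch in delimiters:
            return i
        elif ch in OPENERS:
            stack.append(OPENERS[ch])
            i += 1
        elif ch in CLOSERS:
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched bracket")
            i += 1
        else:
            i += 1
    raise ValueError("unterminated C fragment")
```

For instance, with delimiter set \texttt{",;"} the comma inside a braced
aggregate is skipped and only the following unnested comma ends the fragment.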
241 %%%--------------------------------------------------------------------------
242 \section{Module syntax} \label{sec:syntax.module}
245 <module> ::= @<definition>^*
247 <definition> ::= <import-definition>
248 \alt <load-definition>
249 \alt <lisp-definition>
250 \alt <code-definition>
251 \alt <typename-definition>
252 \alt <class-definition>
255 A @<module> is the top-level syntactic item. A module consists of a sequence
258 \subsection{Simple definitions} \label{sec:syntax.module.simple}
260 \subsubsection{Importing modules}
262 <import-definition> ::= "import" <string> ";"
265 The module named @<string> is processed and its definitions made available.
267 A search is made for a module source file as follows.
269 \item The module name @<string> is converted into a filename by appending
270 @`.sod', if it has no extension already.\footnote{%
271 Technically, what happens is \textsf{(merge-pathnames name (make-pathname
272 :type "SOD" :case :common))}, so exactly what this means varies
273 according to the host system.} %
274 \item The file is looked for relative to the directory containing the
276 \item If that fails, then the file is looked for in each directory on the
277 module search path in turn.
278 \item If the file still isn't found, an error is reported and the import
281 At this point, if the file has previously been imported, nothing further
283 This check is done using \textsf{truename}, so it should see through simple
284 tricks like symbolic links. However, it may be confused by fancy things
285 like bind mounts and so on.} %
287 Recursive imports, either direct or indirect, are an error.
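
The search procedure can be modelled roughly as below.  This Python sketch
only approximates what the Lisp implementation does with pathnames;
\texttt{find\_module}, \texttt{base\_dir} and \texttt{search\_path} are
hypothetical names, where \texttt{base\_dir} stands for the directory used by
the first search step.  Resolving the result plays the role that
\textsf{truename} plays in the duplicate-import check.

```python
from pathlib import Path


def find_module(name, base_dir, search_path, default_suffix=".sod"):
    """Locate the file for NAME, or raise FileNotFoundError."""
    relative = Path(name)
    if relative.suffix == "":
        # Append the default extension only if NAME has none already.
        relative = relative.with_suffix(default_suffix)

    # Try the base directory first, then each search-path entry in turn.
    candidates = [Path(base_dir) / relative]
    candidates += [Path(directory) / relative for directory in search_path]
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()  # canonical name, sees through symlinks
    raise FileNotFoundError("module not found: %s" % (name,))
```

A caller could keep a set of the resolved paths already processed and skip
any module whose resolved path is in that set.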
289 \subsubsection{Loading extensions}
291 <load-definition> ::= "load" <string> ";"
294 The Lisp file named @<string> is loaded and evaluated.
296 A search is made for a Lisp source file as follows.
298 \item The name @<string> is converted into a filename by appending @`.lisp',
299 if it has no extension already.\footnote{%
300 Technically, what happens is \textsf{(merge-pathnames name (make-pathname
301 :type "LISP" :case :common))}, so exactly what this means varies
302 according to the host system.} %
303 \item A search is then made in the same manner as for module imports
  (\xref{sec:syntax.module.simple}).
306 If the file is found, it is loaded using the host Lisp's \textsf{load}
309 Note that Sod doesn't attempt to compile Lisp files, or even to look for
310 existing compiled files. The right way to package a substantial extension to
311 the Sod translator is to provide the extension as a standard ASDF system (or
312 similar) and leave a dropping @"foo-extension.lisp" in the module path saying
315 \textsf{(asdf:load-system :foo-extension)}
317 which will arrange for the extension to be compiled if necessary.
319 (This approach means that the language doesn't need to depend on any
320 particular system definition facility. It's bad enough already that it
321 depends on Common Lisp.)
323 \subsubsection{Lisp escapes}
325 <lisp-definition> ::= "lisp" <s-expression> ";"
328 The @<s-expression> is evaluated immediately. It can do anything it likes.
330 \begin{boxy}[Warning!]
331 This means that hostile Sod modules are a security hazard. Lisp code can
332 read and write files, start other programs, and make network connections.
333 Don't install Sod modules from sources that you don't trust.\footnote{%
334 Presumably you were going to run the corresponding code at some point, so
335 this isn't as unusually scary as it sounds. But please be careful.} %
338 \subsubsection{Declaring type names}
340 <typename-definition> ::=
341 "typename" <list>$[\mbox{@<identifier>}]$ ";"
344 Each @<identifier> is declared as naming a C type. This is important because
345 the C type syntax -- which Sod uses -- is ambiguous, and disambiguation is
346 done by distinguishing type names from other identifiers.
348 Don't declare class names using @"typename"; use @"class" forward
349 declarations instead.
352 \subsection{Literal code} \label{sec:syntax.module.literal}
355 <code-definition> ::=
356 "code" <identifier> ":" <item-name> @[<constraints>@]
359 <constraints> ::= "[" <list>$[\mbox{@<constraint>}]$ "]"
361 <constraint> ::= @<item-name>^+
363 <item-name> ::= <identifier> @! "(" @<identifier>^+ ")"
366 The @<c-fragment> will be output unchanged to one of the output files.
368 The first @<identifier> is the symbolic name of an output file. Predefined
369 output file names are @"c" and @"h", which are the implementation code and
370 header file respectively; other output files can be defined by extensions.
372 Output items are named with a sequence of identifiers, separated by
373 whitespace, and enclosed in parentheses. As an abbreviation, a name
374 consisting of a single identifier may be written as just that identifier,
375 without the parentheses.
377 The @<constraints> provide a means for specifying where in the output file
the output item should appear.  (Note the two kinds of square brackets shown
in the syntax: literal square brackets must appear around the constraints if
they are present, but the constraints may be omitted entirely.)  Each comma-separated @<constraint>
381 is a sequence of names of output items, and indicates that the output items
382 must appear in the order given -- though the translator is free to insert
383 additional items in between them. (The particular output items needn't be
384 defined already -- indeed, they needn't be defined ever.)
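
One standard way to satisfy ordering constraints of this kind is a
topological sort.  The Python sketch below illustrates the idea and makes no
claim about how the translator actually orders its output;
\texttt{order\_items} is a hypothetical name, and item names may be any
hashable values, such as strings or tuples of identifiers.  Names that are
mentioned in constraints but never defined still take part in the ordering,
and are then dropped from the result.

```python
from graphlib import TopologicalSorter


def order_items(defined, constraints):
    """Return the DEFINED item names in an order obeying CONSTRAINTS.

    Each constraint is a sequence of names which must appear in that order.
    Raises graphlib.CycleError (a ValueError) if the constraints conflict.
    """
    defined_set = set(defined)
    sorter = TopologicalSorter()
    for name in defined:
        sorter.add(name)
    for constraint in constraints:
        # Each adjacent pair (before, after) becomes one ordering edge.
        for before, after in zip(constraint, constraint[1:]):
            sorter.add(after, before)
    return [name for name in sorter.static_order() if name in defined_set]
```

Other items may land between constrained ones; only the relative order of
the names in each constraint is guaranteed.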
386 There is a predefined output item @"includes" in both the @"c" and @"h"
387 output files which is a suitable place for inserting @"\#include"
388 preprocessor directives in order to declare types and functions for use
389 elsewhere in the generated output files.
392 \subsection{Property sets} \label{sec:syntax.module.properties}
394 <properties> ::= "[" <list>$[\mbox{@<property>}]$ "]"
396 <property> ::= <identifier> "=" <expression>
399 Property sets are a means for associating miscellaneous information with
400 classes and related items. By using property sets, additional information
401 can be passed to extensions without the need to introduce idiosyncratic
404 A property has a name, given as an @<identifier>, and a value computed by
405 evaluating an @<expression>. The value can be one of a number of types,
though the operators currently defined act on integer values only.
408 \subsubsection{The expression evaluator}
410 <expression> ::= <term> | <expression> "+" <term> | <expression> "-" <term>
412 <term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
414 <factor> ::= <primary> | "+" <factor> | "-" <factor>
417 <integer-literal> | <string-literal> | <char-literal> | <identifier>
418 \alt "?" <s-expression>
419 \alt "(" <expression> ")"
422 The arithmetic expression syntax is simple and standard; there are currently
423 no bitwise, logical, or comparison operators.
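
The grammar's left-recursive rules translate directly into loops in a
recursive-descent evaluator.  The Python sketch below is an illustration
under several assumptions, not the translator's evaluator: it handles only
decimal integers, identifiers and parentheses (no string or character
literals, and no @`?' escapes), it treats @`/' as Python floor division
because the division semantics are not stated here, and the names
\texttt{Parser} and \texttt{evaluate} are hypothetical.

```python
import re

_TOKEN = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z_0-9]*|\S")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def _int(value):
    """Operators are only defined on integers."""
    if not isinstance(value, int):
        raise TypeError("operator applied to non-integer %r" % (value,))
    return value


class Parser:
    def __init__(self, text):
        self.tokens = _TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):               # <expression>: left-associative + -
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = _int(value) + _int(rhs) if op == "+" else _int(value) - _int(rhs)
        return value

    def term(self):                     # <term>: left-associative * /
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = _int(value) * _int(rhs) if op == "*" else _int(value) // _int(rhs)
        return value

    def factor(self):                   # <factor>: unary + and -
        if self.peek() == "+":
            self.take()
            return +_int(self.factor())
        if self.peek() == "-":
            self.take()
            return -_int(self.factor())
        return self.primary()

    def primary(self):                  # <primary>: literal, name, or ( ... )
        token = self.take()
        if token is None:
            raise ValueError("unexpected end of expression")
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        if _IDENT.fullmatch(token):
            return token                # identifiers stand for themselves
        raise ValueError("unexpected token %r" % (token,))


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return value
```

Because identifiers evaluate to themselves rather than to integers, applying
an arithmetic operator to one is rejected in this sketch.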
425 A @<primary> expression may be a literal or an identifier. Note that
identifiers stand for themselves: they \emph{do not} denote values.  For
fancier expressions, the syntax
431 causes the @<s-expression> to be evaluated using the Lisp \textsf{eval}
433 %%% FIXME crossref to extension docs
436 \subsection{C types} \label{sec:syntax.module.types}
438 Sod's syntax for C types closely mirrors the standard C syntax. A C type has
439 two parts: a sequence of @<declaration-specifier>s and a @<declarator>. In
440 Sod, a type must contain at least one @<declaration-specifier> (i.e.,
441 `implicit @"int"' is forbidden), and storage-class specifiers are not
444 \subsubsection{Declaration specifiers}
446 <declaration-specifier> ::= <type-name>
447 \alt "struct" <identifier> | "union" <identifier> | "enum" <identifier>
448 \alt "void" | "char" | "int" | "float" | "double"
449 \alt "short" | "long"
450 \alt "signed" | "unsigned"
451 \alt "bool" | "_Bool"
452 \alt "imaginary" | "_Imaginary" | "complex" | "_Complex"
455 <qualifier> ::= "const" | "volatile" | "restrict"
457 <type-name> ::= <identifier>
460 A @<type-name> is an identifier which has been declared as being a type name,
461 using the @"typename" or @"class" definitions. The following type names are
462 defined in the built-in module.
470 Declaration specifiers may appear in any order. However, not all
combinations are permitted.  A sequence of declaration specifiers must
consist of zero or more @<qualifier>s, and one of the following, up to reordering.
475 \item @"struct" @<identifier>, @"union" @<identifier>, @"enum" @<identifier>
477 \item @"_Bool", @"bool"
478 \item @"char", @"unsigned char", @"signed char"
479 \item @"short", @"unsigned short", @"signed short"
480 \item @"short int", @"unsigned short int", @"signed short int"
481 \item @"int", @"unsigned int", @"signed int", @"unsigned", @"signed"
482 \item @"long", @"unsigned long", @"signed long"
483 \item @"long int", @"unsigned long int", @"signed long int"
484 \item @"long long", @"unsigned long long", @"signed long long"
485 \item @"long long int", @"unsigned long long int", @"signed long long int"
486 \item @"float", @"double", @"long double"
487 \item @"float _Imaginary", @"double _Imaginary", @"long double _Imaginary"
488 \item @"float imaginary", @"double imaginary", @"long double imaginary"
489 \item @"float _Complex", @"double _Complex", @"long double _Complex"
490 \item @"float complex", @"double complex", @"long double complex"
492 All of these have their usual C meanings.
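
The phrase `up to reordering' suggests comparing specifier lists as
multisets.  The Python sketch below illustrates that idea for the keyword
combinations listed above only; it does not handle @"struct", @"union",
@"enum" or type names, and the names \texttt{ALLOWED} and
\texttt{is\_permitted} are hypothetical.  Qualifiers are discarded, and the
remaining keywords are sorted to give a canonical key.

```python
from itertools import product

QUALIFIERS = {"const", "volatile", "restrict"}


def _key(words):
    """Canonical form of a specifier list: order is irrelevant."""
    return tuple(sorted(words))


def _allowed_combinations():
    combos = [["_Bool"], ["bool"], ["unsigned"], ["signed"]]
    integer_bases = ["char", "short", "short int", "int",
                     "long", "long int", "long long", "long long int"]
    for sign, base in product(["", "unsigned", "signed"], integer_bases):
        combos.append((sign + " " + base).split())
    float_bases = ["float", "double", "long double"]
    kinds = ["", "_Imaginary", "imaginary", "_Complex", "complex"]
    for base, kind in product(float_bases, kinds):
        combos.append((base + " " + kind).split())
    return {_key(combo) for combo in combos}


ALLOWED = _allowed_combinations()


def is_permitted(specifiers):
    """Return True if SPECIFIERS is a listed combination plus qualifiers."""
    core = [word for word in specifiers if word not in QUALIFIERS]
    return _key(core) in ALLOWED
```

Sorting keeps duplicates, so @"long long" and @"long" remain distinct keys.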
494 \subsubsection{Declarators}
496 <declarator>$[k]$ ::= @<pointer>^* <primary-declarator>$[k]$
498 <primary-declarator>$[k]$ ::= $k$
499 \alt "(" <primary-declarator>$[k]$ ")"
500 \alt <primary-declarator>$[k]$ @<declarator-suffix>
502 <pointer> ::= "*" @<qualifier>^*
504 <declarator-suffix> ::= "[" <c-fragment> "]"
\alt "(" <argument-list> ")"
507 <argument-list> ::= $\epsilon$ | "..."
508 \alt <list>$[\mbox{@<argument>}]$ @["," "..."@]
510 <argument> ::= @<declaration-specifier>^+ <argument-declarator>
512 <argument-declarator> ::= <declarator>$[\mbox{@<identifier> @! $\epsilon$}]$
514 <simple-declarator> ::= <declarator>$[\mbox{@<identifier>}]$
516 <dotted-name> ::= <identifier> "." <identifier>
519 The declarator syntax is taken from C, but with some differences.
\item Array dimensions are uninterpreted @<c-fragment>s, terminated by a
522 closing square bracket. This allows array dimensions to contain arbitrary
523 constant expressions.
524 \item A declarator may have either a single @<identifier> at its centre or a
525 pair of @<identifier>s separated by a @`.'; this is used to refer to
526 slots or messages defined in superclasses.
528 The remaining differences are (I hope) a matter of presentation rather than
532 \subsection{Class definitions} \label{sec:syntax.module.class}
535 <class-definition> ::= <class-forward-declaration>
536 \alt <full-class-definition>
539 \subsubsection{Forward declarations}
541 <class-forward-declaration> ::= "class" <identifier> ";"
544 A @<class-forward-declaration> informs Sod that an @<identifier> will be used
545 to name a class which is currently undefined. Forward declarations are
546 necessary in order to resolve certain kinds of circularity. For example,
550 class Super : SodObject {
559 \subsubsection{Full class definitions}
561 <full-class-definition> ::=
563 "class" <identifier> ":" <list>$[\mbox{@<identifier>}]$
564 "{" @<properties-class-item>^* "}"
566 <properties-class-item> ::= @[<properties>@] <class-item>
568 <class-item> ::= <slot-item>
569 \alt <initializer-item>
574 A full class definition provides a complete description of a class.
576 The first @<identifier> gives the name of the class. It is an error to
577 give the name of an existing class (other than a forward-referenced class),
578 or an existing type name. It is conventional to give classes `MixedCase'
579 names, to distinguish them from other kinds of identifiers.
581 The @<list>$[\mbox{@<identifier>}]$ names the direct superclasses for the new
582 class. It is an error if any of these @<identifier>s does not name a defined
585 The @<properties> provide additional information. The standard class
586 properties are as follows.
588 \item[@"lisp_class"] The name of the Lisp class to use within the translator
589 to represent this class. The property value must be an identifier; the
590 default is @"sod_class". Extensions may define classes with additional
591 behaviour, and may recognize additional class properties.
592 \item[@"metaclass"] The name of the Sod metaclass for this class. In the
593 generated code, a class is itself an instance of another class -- its
594 \emph{metaclass}. The metaclass defines which slots the class will have,
595 which messages it will respond to, and what its behaviour will be when it
596 receives them. The property value must be an identifier naming a defined
597 subclass of @"SodClass". The default metaclass is @"SodClass".
598 %%% FIXME xref to theory
599 \item[@"nick"] A nickname for the class, to be used to distinguish it from
600 other classes in various limited contexts. The property value must be an
601 identifier; the default is constructed by forcing the class name to
605 The class body consists of a sequence of @<class-item>s enclosed in braces.
These items are discussed in the following sections.
608 \subsubsection{Slot items}
611 @<declaration-specifier>^+ <list>$[\mbox{@<init-declarator>}]$ ";"
613 <init-declarator> ::= <simple-declarator> @["=" <initializer>@]
616 A @<slot-item> defines one or more slots. All instances of the class and any
subclass will contain these slots, with the names and types given by the
@<declaration-specifier>s and the @<declarator>s.  Slot declarators may not
619 contain dotted names.
621 It is not possible to declare a slot with function type: such an item is
622 interpreted as being a @<message-item> or @<method-item>. Pointers to
625 An @<initializer>, if present, is treated as if a separate
626 @<initializer-item> containing the slot name and initializer were present.
630 class Example : Super {
637 class Example : Super {
643 \subsubsection{Initializer items}
645 <initializer-item> ::= @["class"@] <list>$[\mbox{@<slot-initializer>}]$ ";"
647 <slot-initializer> ::= <dotted-name> "=" <initializer>
<initializer> ::= "{" <c-fragment> "}" | <c-fragment>
652 An @<initializer-item> provides an initial value for one or more slots. If
653 prefixed by @"class", then the initial values are for class slots (i.e.,
654 slots of the class object itself); otherwise they are for instance slots.
656 The first component of the @<dotted-name> must be the nickname of one of the
657 class's superclasses (including itself); the second must be the name of a
658 slot defined in that superclass.
660 The initializer has one of two forms.
662 \item A @<c-fragment> enclosed in braces denotes an aggregate initializer.
663 This is suitable for initializing structure, union or array slots.
664 \item A @<c-fragment> \emph{not} beginning with an open brace is a `bare'
665 initializer, and continues until the next @`,' or @`;' which is not within
666 nested brackets. Bare initializers are suitable for initializing scalar
667 slots, such as pointers or integers, and strings.
670 \subsubsection{Message items}
673 @<declaration-specifier>^+
674 <keyword-declarator>$[\mbox{@<identifier>}]$
678 \subsubsection{Method items}
681 @<declaration-specifier>^+
682 <keyword-declarator>$[\mbox{@<dotted-name>}]$
685 <method-body> ::= "{" <c-fragment> "}" | "extern" ";"
688 %%%----- That's all, folks --------------------------------------------------
692 %%% TeX-master: "sod.tex"