%%% (c) 2015 Straylight/Edgeware
%%%
%%%----- Licensing notice ---------------------------------------------------
%%%
%%% This file is part of the Sensible Object Design, an object system for C.
%%%
%%% SOD is free software; you can redistribute it and/or modify
%%% it under the terms of the GNU General Public License as published by
%%% the Free Software Foundation; either version 2 of the License, or
%%% (at your option) any later version.
%%%
%%% SOD is distributed in the hope that it will be useful,
%%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%%% GNU General Public License for more details.
%%%
%%% You should have received a copy of the GNU General Public License
%%% along with SOD; if not, write to the Free Software Foundation,
%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

\chapter{Module syntax} \label{ch:syntax}

%%%--------------------------------------------------------------------------
\section{Lexical syntax} \label{sec:syntax.lex}

Whitespace and comments are discarded. The remaining characters are
collected into tokens according to the following syntax.

<token> ::= <identifier>
\alt <integer-literal>

This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
munch} rule: at each stage we take the longest sequence of characters which

\subsection{Identifiers} \label{sec:syntax.lex.id}

<identifier> ::= <id-start-char> @<id-body-char>^*

<id-start-char> ::= <alpha-char> | "_"

<id-body-char> ::= <id-start-char> @! <digit-char>

<alpha-char> ::= "A" | "B" | \dots\ | "Z"
\alt "a" | "b" | \dots\ | "z"
\alt <extended-alpha-char>

<digit-char> ::= "0" | <nonzero-digit-char>

<nonzero-digit-char> ::= "1" | "2" $| \cdots |$ "9"

The precise definition of @<alpha-char> is left to the function
\textsf{alpha-char-p} in the hosting Lisp system. For portability,
programmers are encouraged to limit themselves to the standard ASCII letters.

There are no reserved words at the lexical level, but the higher-level syntax
recognizes certain identifiers as \emph{keywords} in some contexts. There is
also an ambiguity (inherited from C) in the declaration syntax which is
settled by distinguishing type names from other identifiers at a lexical

\subsection{String and character literals} \label{sec:syntax.lex.string}

<string-literal> ::= "\"" @<string-literal-char>^* "\""

<char-literal> ::= "'" <char-literal-char> "'"

<string-literal-char> ::= any character other than "\\" or "\""

<char-literal-char> ::= any character other than "\\" or "'"

<char> ::= any single character

The syntax for string and character literals differs from~C. In particular,
escape sequences such as @`\textbackslash n' are not recognized. The use
of string and character literals in Sod, outside of C~fragments, is limited,
and the simple syntax seems adequate. For the sake of future compatibility,
the use of character sequences which resemble C escape sequences is

\subsection{Integer literals} \label{sec:syntax.lex.int}

<integer-literal> ::= <decimal-integer>
\alt <binary-integer>

<decimal-integer> ::= "0" | <nonzero-digit-char> @<digit-char>^*

<binary-integer> ::= "0" @("b"|"B"@) @<binary-digit-char>^+

<binary-digit-char> ::= "0" | "1"

<octal-integer> ::= "0" @["o"|"O"@] @<octal-digit-char>^+

<octal-digit-char> ::= "0" | "1" $| \cdots |$ "7"

<hex-integer> ::= "0" @("x"|"X"@) @<hex-digit-char>^+

<hex-digit-char> ::= <digit-char>
\alt "A" | "B" | "C" | "D" | "E" | "F"
\alt "a" | "b" | "c" | "d" | "e" | "f"

Sod understands only integers, not floating-point numbers; its integer syntax
goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for
binary. However, length and signedness indicators are not permitted.
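
As an illustrative sketch (these particular values are not taken from the Sod
sources), each of the following lines is a single @<integer-literal> under
the grammar above, and all five denote the same value, forty-two.

\begin{verbatim}
42
0b101010
0o52
052
0x2A
\end{verbatim}

By contrast, C spellings with a length or signedness suffix, such as
\texttt{42u} or \texttt{42L}, are not Sod integer literals.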

\subsection{Punctuation} \label{sec:syntax.lex.punct}

<punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"

\subsection{Comments} \label{sec:syntax.lex.comment}

<comment> ::= <block-comment>
  @<not-star>^* @(@<star>^+ <not-star-or-slash> @<not-star>^*@)^*

<not-star> ::= any character other than "*"

<not-star-or-slash> ::= any character other than "*" or "/"

<line-comment> ::= "/\,/" @<not-newline>^* <newline>

<newline> ::= a newline character

<not-newline> ::= any character other than newline

Comments are exactly as in C99: both traditional block comments `@|/*| \dots\
@|*/|' and \Cplusplus-style `@|/\,/| \dots' comments are permitted and

\subsection{Special nonterminals} \label{sec:syntax.lex.special}

Aside from the lexical syntax presented above (\xref{sec:syntax.lex}),
two special nonterminals occur in the module syntax.

\subsubsection{S-expressions}

<s-expression> ::= an S-expression, as parsed by the Lisp reader

When an S-expression is expected, the Sod parser simply calls the host Lisp
system's @|read| function. Sod modules are permitted to modify the read
table to extend the S-expression syntax.

S-expressions are self-delimiting, so no end-marker is needed.

\subsubsection{C fragments}

<c-fragment> ::= a sequence of C tokens, with matching brackets

Sequences of C code are simply stored and written to the output unchanged
during translation. They are read using a simple scanner which nonetheless
understands C comments and string and character literals.

A C fragment is terminated by one of a small number of delimiter characters
determined by the immediately surrounding context -- usually a closing brace
or bracket. The first such delimiter character which is not enclosed in
brackets, braces or parentheses ends the fragment.
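
As a sketch of how this plays out, consider an array dimension in a
declarator, which is one place where a C fragment appears. The names
\texttt{buf} and \texttt{N} below are made up for illustration.

\begin{verbatim}
char buf[N * sizeof(int[4])];
\end{verbatim}

Here the delimiter is a closing square bracket. The \texttt{]} following
\texttt{4} is enclosed in both parentheses and brackets, so it does not end
the fragment; the fragment is \texttt{N * sizeof(int[4])}, and it is the
final \texttt{]} which terminates it.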

%%%--------------------------------------------------------------------------
\section{Module syntax} \label{sec:syntax.module}

<module> ::= @<definition>^*

<definition> ::= <import-definition>
\alt <load-definition>
\alt <lisp-definition>
\alt <code-definition>
\alt <typename-definition>
\alt <class-definition>

A @<module> is the top-level syntactic item. A module consists of a sequence

\subsection{Simple definitions} \label{sec:syntax.module.simple}

\subsubsection{Importing modules}

<import-definition> ::= "import" <string> ";"

The module named @<string> is processed and its definitions made available.

A search is made for a module source file as follows.

\item The module name @<string> is converted into a filename by appending
  @`.sod', if it has no extension already.\footnote{%
    Technically, what happens is \textsf{(merge-pathnames name (make-pathname
    :type "SOD" :case :common))}, so exactly what this means varies
    according to the host system.} %
\item The file is looked for relative to the directory containing the
\item If that fails, then the file is looked for in each directory on the
  module search path in turn.
\item If the file still isn't found, an error is reported and the import

At this point, if the file has previously been imported, nothing further
  This check is done using \textsf{truename}, so it should see through simple
  tricks like symbolic links. However, it may be confused by fancy things
  like bind mounts and so on.} %

Recursive imports, either direct or indirect, are an error.
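
A minimal sketch of an import, assuming a hypothetical module file
@"base.sod" which the search described above is able to find:

\begin{verbatim}
import "base";
\end{verbatim}

Because the name @"base" has no extension, the search is made for
@"base.sod" (subject to the host system's pathname conventions, as the
footnote above explains).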

\subsubsection{Loading extensions}

<load-definition> ::= "load" <string> ";"

The Lisp file named @<string> is loaded and evaluated.

A search is made for a Lisp source file as follows.

\item The name @<string> is converted into a filename by appending @`.lisp',
  if it has no extension already.\footnote{%
    Technically, what happens is \textsf{(merge-pathnames name (make-pathname
    :type "LISP" :case :common))}, so exactly what this means varies
    according to the host system.} %
\item A search is then made in the same manner as for module imports
  (\xref{sec:syntax.module.simple}).

If the file is found, it is loaded using the host Lisp's \textsf{load}

Note that Sod doesn't attempt to compile Lisp files, or even to look for
existing compiled files. The right way to package a substantial extension to
the Sod translator is to provide the extension as a standard ASDF system (or
similar) and leave a dropping @"foo-extension.lisp" in the module path saying

\textsf{(asdf:load-system :foo-extension)}

which will arrange for the extension to be compiled if necessary.

(This approach means that the language doesn't need to depend on any
particular system definition facility. It's bad enough already that it
depends on Common Lisp.)
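
Continuing that example, and still assuming that the hypothetical
@"foo-extension.lisp" file is somewhere on the module path, a module would
pull the extension in with a line such as the following.

\begin{verbatim}
load "foo-extension";
\end{verbatim}

The name has no extension, so the search is made for @"foo-extension.lisp";
loading that file then asks ASDF to compile and load the real system.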

\subsubsection{Lisp escapes}

<lisp-definition> ::= "lisp" <s-expression> ";"

The @<s-expression> is evaluated immediately. It can do anything it likes.
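
As a small and harmless sketch (the message text is made up), the following
definition prints a line at the point where the translator reads it.

\begin{verbatim}
lisp (format t "reading the example module~%");
\end{verbatim}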

\begin{boxy}[Warning!]
  This means that hostile Sod modules are a security hazard. Lisp code can
  read and write files, start other programs, and make network connections.
  Don't install Sod modules from sources that you don't trust.\footnote{%
    Presumably you were going to run the corresponding code at some point, so
    this isn't as unusually scary as it sounds. But please be careful.} %
\end{boxy}

\subsubsection{Declaring type names}

<typename-definition> ::=
  "typename" <list>$[\mbox{@<identifier>}]$ ";"

Each @<identifier> is declared as naming a C type. This is important because
the C type syntax -- which Sod uses -- is ambiguous, and disambiguation is
done by distinguishing type names from other identifiers.

Don't declare class names using @"typename"; use @"class" forward
declarations instead.
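
For example, a module which uses types from C headers might declare them as
follows. This is only a sketch: @|FILE| is the familiar C library type,
@|myhandle| is a made-up name, and it is assumed that neither has been
declared as a type name already.

\begin{verbatim}
typename FILE, myhandle;
\end{verbatim}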

\subsection{Literal code} \label{sec:syntax.module.literal}

<code-definition> ::=
  "code" <identifier> ":" <item-name> @[<constraints>@]

<constraints> ::= "[" <list>$[\mbox{@<constraint>}]$ "]"

<constraint> ::= @<item-name>^+

<item-name> ::= <identifier> @! "(" @<identifier>^+ ")"

The @<c-fragment> will be output unchanged to one of the output files.

The first @<identifier> is the symbolic name of an output file. Predefined
output file names are @"c" and @"h", which are the implementation code and
header file respectively; other output files can be defined by extensions.

Output items are named with a sequence of identifiers, separated by
whitespace, and enclosed in parentheses. As an abbreviation, a name
consisting of a single identifier may be written as just that identifier,
without the parentheses.

The @<constraints> provide a means for specifying where in the output file
the output item should appear. (Note the two kinds of square brackets shown
in the syntax: literal square brackets must appear around the constraints if
they are present, but the constraints may be omitted altogether.) Each
comma-separated @<constraint> is a sequence of names of output items, and
indicates that the output items must appear in the order given -- though the
translator is free to insert additional items in between them. (The
particular output items needn't be defined already -- indeed, they needn't be
defined ever.)

There is a predefined output item @"includes" in both the @"c" and @"h"
output files which is a suitable place for inserting @"\#include"
preprocessor directives in order to declare types and functions for use
elsewhere in the generated output files.
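
The following is a sketch of two literal-code definitions. It assumes that
the C fragment is written in braces after the optional constraints; the item
name @"helpers" and the C code itself are invented for illustration.

\begin{verbatim}
code h: includes {
#include <stddef.h>
}

code c: helpers [includes helpers] {
static size_t twice(size_t n) { return 2*n; }
}
\end{verbatim}

The first definition adds a directive to the predefined @"includes" item of
the header file. The second defines an item named @"helpers" in the
implementation file; its single constraint asks for @"includes" to be written
somewhere before @"helpers".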

\subsection{Property sets} \label{sec:syntax.module.properties}

<properties> ::= "[" <list>$[\mbox{@<property>}]$ "]"

<property> ::= <identifier> "=" <expression>

Property sets are a means for associating miscellaneous information with
classes and related items. By using property sets, additional information
can be passed to extensions without the need to introduce idiosyncratic

A property has a name, given as an @<identifier>, and a value computed by
evaluating an @<expression>. The value can be one of a number of types,
though the operators currently defined act on integer values only.

\subsubsection{The expression evaluator}

<expression> ::= <term> | <expression> "+" <term> | <expression> "--" <term>

<term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor>

<factor> ::= <primary> | "+" <factor> | "--" <factor>

  <integer-literal> | <string-literal> | <char-literal> | <identifier>
\alt "<" <plain-type> ">"
\alt "?" <s-expression>
\alt "(" <expression> ")"

The arithmetic expression syntax is simple and standard; there are currently
no bitwise, logical, or comparison operators.
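
For instance, a property set might use arithmetic as in the sketch below.
The property name @"size" is only a placeholder: whether any item recognizes
such a property is an assumption made for the sake of the example.

\begin{verbatim}
[nick = ex, size = 4*(2 + 6)]
\end{verbatim}

With the usual precedence and parentheses, the value of the second property
is the integer 32.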

A @<primary> expression may be a literal or an identifier. Note that
identifiers stand for themselves: they \emph{do not} denote values. For more
fancy expressions, the syntax

causes the @<s-expression> to be evaluated using the Lisp \textsf{eval}
%%% FIXME crossref to extension docs

\subsection{C types} \label{sec:syntax.module.types}

Sod's syntax for C types closely mirrors the standard C syntax. A C type has
two parts: a sequence of @<declaration-specifier>s and a @<declarator>. In
Sod, a type must contain at least one @<declaration-specifier> (i.e.,
`implicit @"int"' is forbidden), and storage-class specifiers are not

\subsubsection{Declaration specifiers}

<declaration-specifier> ::= <type-name>
\alt "struct" <identifier> | "union" <identifier> | "enum" <identifier>
\alt "void" | "char" | "int" | "float" | "double"
\alt "short" | "long"
\alt "signed" | "unsigned"
\alt "bool" | "_Bool"
\alt "imaginary" | "_Imaginary" | "complex" | "_Complex"
\alt <storage-specifier>

<qualifier> ::= <atomic> | "const" | "volatile" | "restrict"

<plain-type> ::= @<declaration-specifier>^+ <abstract-declarator>

  <atomic> "(" <plain-type> ")"

<atomic> ::= "atomic" | "_Atomic"

<storage-specifier> ::= <alignas> "(" <c-fragment> ")"

<alignas> ::= "alignas" | "_Alignas"

<type-name> ::= <identifier>

A @<type-name> is an identifier which has been declared as being a type name,
using the @"typename" or @"class" definitions. The following type names are
defined in the built-in module.

Declaration specifiers may appear in any order. However, not all
combinations are permitted. A declaration specifier must consist of zero or
more @<qualifier>s, zero or more @<storage-specifier>s, and one of the
following, up to reordering.

\item @"struct" @<identifier>, @"union" @<identifier>, @"enum" @<identifier>
\item @"_Bool", @"bool"
\item @"char", @"unsigned char", @"signed char"
\item @"short", @"unsigned short", @"signed short"
\item @"short int", @"unsigned short int", @"signed short int"
\item @"int", @"unsigned int", @"signed int", @"unsigned", @"signed"
\item @"long", @"unsigned long", @"signed long"
\item @"long int", @"unsigned long int", @"signed long int"
\item @"long long", @"unsigned long long", @"signed long long"
\item @"long long int", @"unsigned long long int", @"signed long long int"
\item @"float", @"double", @"long double"
\item @"float _Imaginary", @"double _Imaginary", @"long double _Imaginary"
\item @"float imaginary", @"double imaginary", @"long double imaginary"
\item @"float _Complex", @"double _Complex", @"long double _Complex"
\item @"float complex", @"double complex", @"long double complex"

All of these have their usual C meanings.

\subsubsection{Declarators}

<declarator>$[k, a]$ ::= @<pointer>^* <primary-declarator>$[k, a]$

<primary-declarator>$[k, a]$ ::= $k$
\alt "(" <primary-declarator>$[k, a]$ ")"
\alt <primary-declarator>$[k, a]$ @<declarator-suffix>$[a]$

<pointer> ::= "*" @<qualifier>^*

<declarator-suffix>$[a]$ ::= "[" <c-fragment> "]"

<argument-list> ::= $\epsilon$ | "\dots"
\alt <list>$[\mbox{@<argument>}]$ @["," "\dots"@]

<argument> ::= @<declaration-specifier>^+ <argument-declarator>

<abstract-declarator> ::= <declarator>$[\epsilon, \mbox{@<argument-list>}]$

<argument-declarator> ::=
  <declarator>$[\mbox{@<identifier> @! $\epsilon$}, \mbox{@<argument-list>}]$

<simple-declarator> ::=
  <declarator>$[\mbox{@<identifier>}, \mbox{@<argument-list>}]$

The declarator syntax is taken from C, but with some differences.

\item Array dimensions are uninterpreted @<c-fragment>s, terminated by a
  closing square bracket. This allows array dimensions to contain arbitrary
  constant expressions.
\item A declarator may have either a single @<identifier> at its centre or a
  pair of @<identifier>s separated by a @`.'; this is used to refer to
  slots or messages defined in superclasses.

The remaining differences are (I hope) a matter of presentation rather than

There is additional syntax to support messages and methods which accept

<keyword-argument> ::= <argument> @["=" <c-fragment>@]

<keyword-argument-list> ::=
  @[<list>$[\mbox{@<argument>}]$@]
  "?" @[<list>$[\mbox{@<keyword-argument>}]$@]

<method-argument-list> ::= <argument-list> @! <keyword-argument-list>

<dotted-name> ::= <identifier> "." <identifier>

<keyword-declarator>$[k]$ ::=
  <declarator>$[k, \mbox{@<method-argument-list>}]$

\subsection{Class definitions} \label{sec:syntax.module.class}

<class-definition> ::= <class-forward-declaration>
\alt <full-class-definition>

\subsubsection{Forward declarations}

<class-forward-declaration> ::= "class" <identifier> ";"

A @<class-forward-declaration> informs Sod that an @<identifier> will be used
to name a class which is currently undefined. Forward declarations are
necessary in order to resolve certain kinds of circularity. For example,

  class Super: SodObject \{ \\ \ind
  class Sub: Super \{ \\ \ind

\subsubsection{Full class definitions}

<full-class-definition> ::=
  "class" <identifier> ":" <list>$[\mbox{@<identifier>}]$
  "{" @<properties-class-item>^* "}"

<properties-class-item> ::= @[<properties>@] <class-item>

<class-item> ::= <slot-item>
\alt <initializer-item>

A full class definition provides a complete description of a class.

The first @<identifier> gives the name of the class. It is an error to
give the name of an existing class (other than a forward-referenced class),
or an existing type name. It is conventional to give classes `MixedCase'
names, to distinguish them from other kinds of identifiers.

The @<list>$[\mbox{@<identifier>}]$ names the direct superclasses for the new
class. It is an error if any of these @<identifier>s does not name a defined
class. The superclass list is required, and must not be empty; listing
@|SodObject| as your class's superclass is a good choice if nothing else
seems suitable. It's not possible to define a \emph{root class} in the Sod
language: you must use Lisp to do this, and it's quite involved.

The @<properties> provide additional information. The standard class
properties are as follows.

\item[@"lisp_class"] The name of the Lisp class to use within the translator
  to represent this class. The property value must be an identifier; the
  default is @"sod_class". Extensions may define classes with additional
  behaviour, and may recognize additional class properties.
\item[@"metaclass"] The name of the Sod metaclass for this class. In the
  generated code, a class is itself an instance of another class -- its
  \emph{metaclass}. The metaclass defines which slots the class will have,
  which messages it will respond to, and what its behaviour will be when it
  receives them. The property value must be an identifier naming a defined
  subclass of @"SodClass". The default metaclass is @"SodClass".
  %%% FIXME xref to theory
\item[@"nick"] A nickname for the class, to be used to distinguish it from
  other classes in various limited contexts. The property value must be an
  identifier; the default is constructed by forcing the class name to

The class body consists of a sequence of @<class-item>s enclosed in braces.
These items are discussed in the following sections.

\subsubsection{Slot items}

  @<declaration-specifier>^+ <list>$[\mbox{@<init-declarator>}]$ ";"

<init-declarator> ::= <simple-declarator> @["=" <initializer>@]

A @<slot-item> defines one or more slots. All instances of the class and any
subclass will contain these slots, with the names and types given by the
@<declaration-specifier>s and the @<declarator>s. Slot declarators may not
contain dotted names.

It is not possible to declare a slot with function type: such an item is
interpreted as being a @<message-item> or @<method-item>. Pointers to

An @<initializer>, if present, is treated as if a separate
@<initializer-item> containing the slot name and initializer were present.

  class Example: Super \{ \\ \ind
  class Example: Super \{ \\ \ind

\subsubsection{Initializer items}

<initializer-item> ::= @["class"@] <list>$[\mbox{@<slot-initializer>}]$ ";"

<slot-initializer> ::= <dotted-name> @["=" <initializer>@]

<initializer> ::= <c-fragment>

An @<initializer-item> provides an initial value for one or more slots. If
prefixed by @"class", then the initial values are for class slots (i.e.,
slots of the class object itself); otherwise they are for instance slots.

The first component of the @<dotted-name> must be the nickname of one of the
class's superclasses (including itself); the second must be the name of a
slot defined in that superclass.

An @|initarg| property may be set on an instance slot initializer (or a
direct slot definition). See \xref{sec:concepts.lifecycle.birth} for the
details. An initializer item must have either an @|initarg| property, or an
initializer expression, or both.

Each class may define at most one initializer item with an explicit
initializer expression for a given slot.
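
The sketch below shows an initializer item in context. It assumes that the
class properties are written before the @"class" keyword; the class
@|Example|, its nickname @"ex" and its slot @|count| are all invented for
illustration.

\begin{verbatim}
[nick = ex]
class Example: SodObject {
  int count;
  ex.count = 0;
}
\end{verbatim}

The dotted name @"ex.count" pairs the nickname of @|Example| itself with the
name of a slot defined there, and the initializer gives that instance slot
the initial value 0. A second initializer item in the same class with an
explicit expression for @"ex.count" would not be allowed.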

\subsubsection{Initarg items}

  @<declaration-specifier>^+
  <list>$[\mbox{@<init-declarator>}]$ ";"

\subsubsection{Fragment items}

<fragment-item> ::= <fragment-kind> "{" <c-fragment> "}"

<fragment-kind> ::= "init" | "teardown"

\subsubsection{Message items}

  @<declaration-specifier>^+
  <keyword-declarator>$[\mbox{@<identifier>}]$

\subsubsection{Method items}

  @<declaration-specifier>^+
  <keyword-declarator>$[\mbox{@<dotted-name>}]$

<method-body> ::= "{" <c-fragment> "}" | "extern" ";"
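
As a sketch of how the message and method productions fit together, consider
the following class. The class @|Greeter|, its nickname @"gr" and the message
@|greet| are invented, and the placement of the class property before the
@"class" keyword is an assumption.

\begin{verbatim}
[nick = gr]
class Greeter: SodObject {
  void greet(const char *name);
  void gr.greet(const char *name) {
    /* C code to greet `name' goes here. */
  }
}
\end{verbatim}

The first item has a plain @<identifier> at the centre of its declarator, as
in the message-item syntax. The second has a @<dotted-name> there and
supplies a @<method-body>, in this case a C fragment enclosed in braces.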

%%%----- That's all, folks --------------------------------------------------

%%% TeX-master: "sod.tex"