diff --git a/doc/concepts.tex b/doc/concepts.tex

index b8cdfe9ee6c6e208e9b23820055ac22ccc78b8eb..b4e80ca908b33ab162478b35d190d08ba724acd7 100644 (file)
--- a/doc/concepts.tex
+++ b/doc/concepts.tex
@@ -195,7 +195,7 @@ It works as follows.
    earliest position in these candidate merges at which they disagree.  The
    \emph{candidate classes} at this position are the classes appearing at this
    position in the candidate merges.  Each candidate class must be a
-  superclass of exactly one of $C$'s direct superclasses, since otherwise the
+  superclass of distinct direct superclasses of $C$, since otherwise the
    candidates would be ordered by their common subclass's class precedence
    list.  The class precedence list contains, at this position, that candidate
    class whose subclass appears earliest in $C$'s local precedence order.
@@ -208,8 +208,8 @@ a link superclass, and the link superclass of a class $C$, if it exists, need
  not be a direct superclass of $C$.
  
  Superclass links must obey the following rule: if $C$ is a class, then there
-must be no three superclasses $X$, $Y$ and~$Z$ of $C$ such that both $Z$ is
-the link superclass of both $X$ and $Y$.  As a consequence of this rule, the
+must be no three superclasses $X$, $Y$ and~$Z$ of $C$ such that $Z$ is the
+link superclass of both $X$ and $Y$.  As a consequence of this rule, the
  superclasses of $C$ can be partitioned into linear \emph{chains}, such that
  superclasses $A$ and $B$ are in the same chain if and only if one can trace a
  path from $A$ to $B$ by following superclass links, or \emph{vice versa}.
@@ -246,12 +246,12 @@ qualified by the defining class's nickname.
  As well as defining slot names and types, a class can also associate an
  \emph{initial value} with each slot defined by itself or one of its
  subclasses.  A class $C$ provides an \emph{initialization function} (see
-\xref{sec:concepts.classes.c}, and \xref{sec:structures.root.sodclass}) which
-sets the slots of a \emph{direct} instance of the class to the correct
+\xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass})
+which sets the slots of a \emph{direct} instance of the class to the correct
  initial values.  If several of $C$'s superclasses define initializers for the
  same slot then the initializer from the most specific such class is used.  If
  none of $C$'s superclasses define an initializer for some slot then that slot
-will not be initialized.
+will be left uninitialized.
  
  The initializer for a slot with scalar type may be any C expression.  The
  initializer for a slot with aggregate type must contain only constant
@@ -259,6 +259,17 @@ expressions if the generated code is expected to be processed by a
  implementation of C89.  Initializers will be evaluated once each time an
  instance is initialized.
  
+Slots are initialized in reverse-precedence order of their defining classes;
+i.e., slots defined by a less specific superclass are initialized earlier
+than slots defined by a more specific superclass.  Slots defined by the same
+class are initialized in the order in which they appear in the class
+definition.
+
+The initializer for a slot may refer to other slots in the same object, via
+the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me|
+has type `pointer to $C$'.  (Note that the type of @|me| depends only on the
+class which defined the slot, not the class which defined the initializer.)
+
  
  \subsection{C language integration} \label{sec:concepts.classes.c}
  
@@ -307,89 +318,6 @@ functions for working with that class's instances.  (The @|SodClass| class
  doesn't define any messages, so it doesn't have any methods.  In Sod, a class
  slot containing a function pointer is not at all the same thing as a method.)
  
-\subsubsection{Instance allocation, imprinting, and initialization}
-It is in general not sufficient to declare (or @|malloc|) an object of the
-appropriate class type and fill it in, since the class type only describes an
-instance's layout from the point of view of a single superclass chain.  The
-correct type to allocate, to store a direct instance of some class is a
-structure whose tag is the class name suffixed with `@|__ilayout|'; e.g., the
-correct layout structure for a direct instance of @|MyClass| would be
-@|struct MyClass__ilayout|.
-
-Instance layouts may be declared as objects with automatic storage duration
-(colloquially, `allocated on the stack') or allocated dynamically, e.g.,
-using @|malloc|.  Sod's runtime system doesn't retain addresses of instances,
-so, for example, Sod doesn't make using a fancy allocator which sometimes
-moves objects around in memory any more difficult than it needs to be.
-
-Once storage for an instance has been allocated, it must be \emph{imprinted}
-before it can be used.  Imprinting an instance stores some metadata about its
-direct class in the instance structure, so that the rest of the program (and
-Sod's runtime library) can tell what sort of object it is, and how to use
-it.\footnote{%
-  Specifically, imprinting an instance's storage involves storing the
-  appropriate vtable pointers in the right places in it.} %
-A class object's @|imprint| slot points to a function which will correctly
-imprint storage for one of that class's instances.
-
-Once an instance's storage has been imprinted, it is possible to send the
-instance messages; however, the instance's slots are uninitialized at this
-point, so most methods are unlikely to do much of any use.  So, usually, you
-don't just want to imprint instance storage, but to \emph{initialize} an
-instance.  Initialization includes imprinting, but also sets the new
-instance's slots to their initial values, as defined by the class.  If
-neither the class nor any of its superclasses defines an initializer for a
-slot then it will not be initialized.
-
-There is currently no facility for providing parameters to the instance
-initialization process (e.g., for use by slot initializer expressions).
-Instance initialization is a complicated matter and for now I want to
-experiment with various approaches before committing to one.  My current
-interim approach is to specify slot initializers where appropriate and send
-class-specific messages for more complicated parametrized initialization.
-
-Automatic-duration instances can be conveniently constructed and initialized
-using the \descref{SOD_DECL}[macro]{mac}.  No special support is currently
-provided for dynamically allocated instances.  A simple function using
-@|malloc| might work as follows.
-\begin{prog}
-  void *new_instance(const SodClass *c) \\
-  \{ \\ \ind
-    void *p = malloc(c@->cls.initsz); \\
-    if (!p) return (0); \\
-    c@->cls.init(p); \\
-    return (p); \- \\
-  \}
-\end{prog}
-
-\subsubsection{Instance finalization and deallocation}
-There is currently no provided assistance for finalization or deallocation.
-It is the programmer's responsibility to decide and implement an appropriate
-protocol.  Note that to free an instance allocated from the heap, one must
-correctly find its base address: the \descref{SOD_INSTBASE}[macro]{mac} will
-do this for you.
-
-The following simple mixin class is suggested.
-\begin{prog}
-  [nick = disposable] \\
-  class DisposableObject : SodObject \{ \\- \ind
-    void release() \{ ; \} \\
-    \quad /* Release resources held by the receiver. */ \- \\-
-  \}
-  \\+
-  code c : user \{ \\- \ind
-    /\=\+* Free object p's instance storage.  If p is a DisposableObject \\
-       {}* then release its resources beforehand. \\
-       {}*/ \- \\
-    void free_instance(void *p) \\
-    \{ \\ \ind
-      DisposableObject *d = SOD_CONVERT(DisposableObject, p); \\
-      if (d) DisposableObject_release(d); \\
-      free(d); \- \\
-    \} \- \\
-  \}
-\end{prog}
-
  \subsubsection{Conversions}
  Suppose one has a value of type pointer to class type of some class~$C$, and
  wants to convert it to a pointer to class type of some other class~$B$.
@@ -413,7 +341,8 @@ There are three main cases to distinguish.
    conversion can fail: the object in question might not be an instance of~$B$
    at all.  The macro \descref{SOD_CONVERT}{mac} and the function
    \descref{sod_convert}{fun} perform general conversions.  They return a null
-  pointer if the conversion fails.
+  pointer if the conversion fails.  (They are therefore the analogue of the
+  \Cplusplus @|dynamic_cast<>| operator.)
  \end{itemize}
  The Sod translator generates macros for performing both in-chain and
  cross-chain upcasts.  For each class~$C$, and each proper superclass~$B$
@@ -470,7 +399,8 @@ Keyword arguments can be provided in three ways.
  
  Keyword arguments are provided as a general feature for C functions.
  However, Sod has special support for messages which accept keyword arguments
-(\xref{sec:concepts.methods.keywords}).
+(\xref{sec:concepts.methods.keywords}); and they play an essential role in
+the instance construction protocol (\xref{sec:concepts.lifecycle.birth}).
  
  %%%--------------------------------------------------------------------------
  \section{Messages and methods} \label{sec:concepts.methods}
@@ -576,10 +506,13 @@ follows.
    returns; otherwise the behaviour of @|next_method| is to invoke the before
    methods (if any), followed by the most specific primary method, followed by
    the @|around| methods (if any), and to return whichever value was returned
-  by the most specific primary method.  That is, the behaviour of the least
-  specific @|around| method's @|next_method| function is exactly the
-  behaviour that the effective method would have if there were no @|around|
-  methods.
+  by the most specific primary method, as described in the following items.
+  That is, the behaviour of the least specific @|around| method's
+  @|next_method| function is exactly the behaviour that the effective method
+  would have if there were no @|around| methods.  Note that if the
+  least-specific @|around| method calls its @|next_method| more than once
+  then the whole sequence of @|before|, primary, and @|after| methods occurs
+  multiple times.
  
    The value returned by the most specific @|around| method is the value
    returned by the effective method.
@@ -634,6 +567,11 @@ arguments.  If the method body has overwritten its formal arguments, then
  @|CALL_NEXT_METHOD| will pass along the updated values, rather than the
  original ones.
  
+A primary or @|around| method which invokes its @|next_method| function is
+said to \emph{extend} the message behaviour; a method which does not invoke
+its @|next_method| is said to \emph{override} the behaviour.  Note that a
+method may make a decision to override or extend at runtime.
+
  \subsubsection{Aggregating method combinations}
  A number of other method combinations are provided.  They are called
  `aggregating' method combinations because, instead of invoking just the most
@@ -712,6 +650,251 @@ value; otherwise @|suppliedp.$k$| is zero, and $k$ contains the default value
  from the direct method definition if there was one, or an unspecified value
  otherwise.
  
+%%%--------------------------------------------------------------------------
+\section{The object lifecycle} \label{sec:concepts.lifecycle}
+
+\subsection{Creation} \label{sec:concepts.lifecycle.birth}
+
+Construction of a new instance of a class involves three steps.
+\begin{enumerate}
+\item \emph{Allocation} arranges for there to be storage space for the
+  instance's slots and associated metadata.
+\item \emph{Imprinting} fills in the instance's metadata, associating the
+  instance with its class.
+\item \emph{Initialization} stores appropriate initial values in the
+  instance's slots, and maybe links it into any external data structures as
+  necessary.
+\end{enumerate}
+The \descref{SOD_DECL}[macro]{mac} handles constructing instances with
+automatic storage duration (`on the stack').  Similarly, the
+\descref{SOD_MAKE}[macro]{mac} and the \descref{sod_make}{fun} and
+\descref{sod_makev}{fun} functions construct instances allocated from the
+standard @|malloc| heap.  Programmers can add support for other allocation
+strategies by using the \descref{SOD_INIT}[macro]{mac} and the
+\descref{sod_init}{fun} and \descref{sod_initv}{fun} functions, which package
+up imprinting and initialization.
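
The three construction steps can be modelled in plain C.  This is only a toy sketch: @|struct toy_class|, @|struct counter| and @|toy_make| are hypothetical names invented for illustration, and the layout shown is not Sod's real @|SodClass| or instance layout.  It assumes a class object that records an instance size together with imprinting and initialization functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a class object.  This is NOT Sod's real layout. */
struct toy_class {
  size_t initsz;                        /* bytes needed for one instance */
  void (*imprint)(void *p);             /* store metadata in raw storage */
  void (*init)(void *p);                /* give the slots initial values */
};

/* Toy instance: one metadata pointer and one slot. */
struct counter {
  const struct toy_class *cls;          /* filled in by imprinting */
  int count;                            /* filled in by initialization */
};

void counter_imprint(void *p);
void counter_init(void *p);

const struct toy_class counter_class =
  { sizeof(struct counter), counter_imprint, counter_init };

/* Imprinting: associate the storage with its class. */
void counter_imprint(void *p)
  { struct counter *c = p; c->cls = &counter_class; }

/* Initialization: store initial values in the slots. */
void counter_init(void *p)
  { struct counter *c = p; c->count = 0; }

/* Construct an instance on the heap: allocate, imprint, initialize. */
void *toy_make(const struct toy_class *cls)
{
  void *p;

  assert(cls != NULL);
  p = malloc(cls->initsz);              /* 1. allocation */
  if (!p) return (0);
  cls->imprint(p);                      /* 2. imprinting */
  cls->init(p);                         /* 3. initialization */
  return (p);
}
```

Keeping the steps separate is what lets a caller swap @|malloc| for some other allocator while reusing the imprinting and initialization steps unchanged.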
+
+\subsubsection{Allocation}
+Instances of most classes (specifically including those classes defined by
+Sod itself) can be held in any storage of sufficient size.  The in-memory
+layout of an instance of some class~$C$ is described by the type @|struct
+$C$__ilayout|, and if the relevant class is known at compile time then the
+best way to discover the layout size is with the @|sizeof| operator.  Failing
+that, the size required to hold an instance of $C$ is available in a slot in
+$C$'s class object, as @|$C$__class@->cls.initsz|.
+
+It is not in general sufficient to declare, or otherwise allocate, an object
+of the class type $C$.  The class type only describes a single chain of the
+object's layout.  It is nearly always an error to use the class type as if it
+were a \emph{complete type}, e.g., to declare objects or arrays of the class
+type, or to enquire about its size or alignment requirements.
+
+Instance layouts may be declared as objects with automatic storage duration
+(colloquially, `allocated on the stack') or allocated dynamically, e.g.,
+using @|malloc|.  They may be included as members of structures or unions, or
+elements of arrays.  Sod's runtime system doesn't retain addresses of
+instances, so, for example, Sod doesn't make using fancy allocators which
+sometimes move objects around in memory any more difficult than it needs to
+be.
+
+There isn't any way to discover the alignment required for a particular
+class's instances at runtime; it's best to be conservative and assume that
+the platform's strictest alignment requirement applies.
+
+The following simple function correctly allocates and returns space for an
+instance of a class given a pointer to its class object @<cls>.
+\begin{prog}
+  void *allocate_instance(const SodClass *cls)                  \\ \ind
+    \{ return malloc(cls@->cls.initsz); \}
+\end{prog}
+
+\subsubsection{Imprinting}
+Once storage has been allocated, it must be \emph{imprinted} before it can be
+used as an instance of a class, e.g., before any messages can be sent to it.
+
+Imprinting an instance stores some metadata about its direct class in the
+instance structure, so that the rest of the program (and Sod's runtime
+library) can tell what sort of object it is, and how to use it.\footnote{%
+  Specifically, imprinting an instance's storage involves storing the
+  appropriate vtable pointers in the right places in it.} %
+A class object's @|imprint| slot points to a function which will correctly
+imprint storage for one of that class's instances.
+
+Once an instance's storage has been imprinted, it is technically possible to
+send messages to the instance; however, because the instance's slots are
+still uninitialized at this point, the applicable methods are unlikely to do
+anything useful unless they've been written specifically for the purpose.
+
+The following simple function imprints storage at address @<p> as an instance
+of a class, given a pointer to its class object @<cls>.
+\begin{prog}
+  void imprint_instance(const SodClass *cls, void *p)           \\ \ind
+    \{ cls@->cls.imprint(p); \}
+\end{prog}
+
+\subsubsection{Initialization}
+The final step for constructing a new instance is to \emph{initialize} it, to
+establish the necessary invariants for the instance itself and the
+environment in which it operates.
+
+Details of initialization are necessarily class-specific, but typically it
+involves setting the instance's slots to appropriate values, and possibly
+linking it into some larger data structure to keep track of it.
+
+Initialization is performed by sending the imprinted instance an @|init|
+message, defined by the @|SodObject| class.  This message uses a nonstandard
+method combination which works like the standard combination, except that the
+\emph{default behaviour}, if there is no overriding method, is to initialize
+the instance's slots, as described below, and to invoke each superclass's
+initialization fragments.  This default behaviour may be invoked multiple
+times if some method calls on its @|next_method| more than once, unless some
+other method takes steps to prevent this.
+
+Slots are initialized in a well-defined order.
+\begin{itemize}
+\item Slots defined by a more specific superclass are initialized after
+  slots defined by a less specific superclass.
+\item Slots defined by the same class are initialized in the order in which
+  their definitions appear.
+\end{itemize}
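
The ordering rule can be sketched as a walk over a class precedence list from its least specific end.  This is a minimal model, assuming the list is given most specific class first; @|struct toy_classdesc| and @|toy_init_order| are hypothetical names, not part of Sod.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one class and the slots it defines. */
struct toy_classdesc {
  const char *name;                     /* class name, for reference only */
  size_t nslots;                        /* number of slots defined here */
  const char *const *slots;             /* slot names, in definition order */
};

/* Write slot names to OUT in the order they would be initialized.
 * CPL holds NCLS classes, most specific first.  At most MAX names are
 * written; the number actually written is returned.
 */
size_t toy_init_order(const struct toy_classdesc *const *cpl, size_t ncls,
                      const char **out, size_t max)
{
  size_t i, j, n = 0;

  assert(cpl != NULL && out != NULL);
  for (i = ncls; i-- > 0; )             /* least specific class first */
    for (j = 0; j < cpl[i]->nslots && n < max; j++)
      out[n++] = cpl[i]->slots[j];      /* definition order within a class */
  return (n);
}
```

With a two-class list in which the more specific class defines one slot and its superclass defines two, the superclass's slots come out first, in definition order, followed by the more specific class's slot.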
+
+A class can define \emph{initialization fragments}: pieces of literal code to
+be executed to set up a new instance.  Each superclass's initialization
+fragments are executed with @|me| bound to an instance pointer of the
+appropriate superclass type, immediately after that superclass's slots (if
+any) have been initialized; therefore, fragments defined by a more specific
+superclass are executed after fragments defined by a less specific
+superclass.  A class may define more than one initialization fragment: the
+fragments are executed in the order in which they appear in the class
+definition.  It is possible for an initialization fragment to use @|return|
+or @|goto| for special control-flow effects, but this is not likely to be a
+good idea.
+
+The @|init| message accepts keyword arguments
+(\xref{sec:concepts.methods.keywords}).  The set of acceptable keywords is
+determined by the applicable methods as usual, but also by the
+\emph{initargs} defined by the receiving instance's class and its
+superclasses, which are made available to slot initializers and
+initialization fragments.
+
+There are two kinds of initarg definitions.  \emph{User initargs} are defined
+by an explicit @|initarg| item appearing in a class definition: the item
+defines a name, type, and (optionally) a default value for the initarg.
+\emph{Slot initargs} are defined by attaching an @|initarg| property to a
+slot or slot initializer item: the property's value determines the initarg's
+name, while the type is taken from the underlying slot type; slot initargs do
+not have default values.  Both kinds define a \emph{direct initarg} for the
+containing class.
+
+Initargs are inherited.  The \emph{applicable} direct initargs for an @|init|
+effective method are those defined by the receiving object's class, and all
+of its superclasses.  Applicable direct initargs with the same name are
+merged to form \emph{effective initargs}.  An error is reported if two
+applicable direct initargs have the same name but different types.  The
+default value of an effective initarg is taken from the most specific
+applicable direct initarg which specifies a default value; if no applicable
+direct initarg specifies a default value then the effective initarg has no
+default.
+
+All initarg values are made available at runtime to user code --
+initialization fragments and slot initializer expressions -- through local
+variables and a @|suppliedp| structure, as in a direct method
+(\xref{sec:concepts.methods.keywords}).  Furthermore, slot initarg
+definitions influence the initialization of slots.
+
+The process for deciding how to initialize a particular slot works as
+follows.
+\begin{enumerate}
+\item If there are any slot initargs defined on the slot, or any of its slot
+  initializers, \emph{and} the sender supplied a value for one or more of the
+  corresponding effective initargs, then the value of the most specific slot
+  initarg is stored in the slot.
+\item Otherwise, if there are any slot initializers defined which include an
+  initializer expression, then the initializer expression from the most
+  specific such slot initializer is evaluated and its value stored in the
+  slot.
+\item Otherwise, the slot is left uninitialized.
+\end{enumerate}
+Note that the default values (if any) of effective initargs do \emph{not}
+affect this procedure.
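
The decision procedure above can be written out as a small C function.  This is a simplified sketch: it assumes that the most specific supplied slot initarg and the most specific initializer expression have already been selected, and it handles only @|int| slots.  The names @|struct toy_slot_inputs| and @|toy_init_slot| are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Where a slot's value came from. */
enum toy_source {
  TOY_FROM_INITARG,                     /* the sender supplied an initarg */
  TOY_FROM_INITIALIZER,                 /* an initializer expression */
  TOY_UNINITIALIZED                     /* the slot was left alone */
};

/* Inputs for one slot, already reduced to the most specific candidates. */
struct toy_slot_inputs {
  int initarg_supplied;                 /* nonzero if the sender gave a value */
  int initarg_value;                    /* that value, if supplied */
  int has_initializer;                  /* nonzero if an initializer exists */
  int initializer_value;                /* its value, if it exists */
};

/* Apply the three rules in order, storing into *SLOT where appropriate.
 * Initarg default values deliberately play no part here.
 */
enum toy_source toy_init_slot(const struct toy_slot_inputs *in, int *slot)
{
  assert(in != NULL && slot != NULL);
  if (in->initarg_supplied)
    { *slot = in->initarg_value; return (TOY_FROM_INITARG); }
  if (in->has_initializer)
    { *slot = in->initializer_value; return (TOY_FROM_INITIALIZER); }
  return (TOY_UNINITIALIZED);
}
```

Note that a supplied initarg wins even when an initializer expression is also available, and that in the third case the slot's storage is not touched at all.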
+
+
+\subsection{Destruction}
+\label{sec:concepts.lifecycle.death}
+
+Destruction of an instance, when it is no longer required, consists of two
+steps.
+\begin{enumerate}
+\item \emph{Teardown} releases any resources held by the instance and
+  disentangles it from any external data structures.
+\item \emph{Deallocation} releases the memory used to store the instance so
+  that it can be reused.
+\end{enumerate}
+Teardown alone, for objects which require special deallocation, or for which
+deallocation occurs automatically (e.g., instances with automatic storage
+duration, or instances whose storage will be garbage-collected), is performed
+using the \descref{sod_teardown}[function]{fun}.  Destruction of instances
+allocated from the standard @|malloc| heap is done using the
+\descref{sod_destroy}[function]{fun}.
+
+\subsubsection{Teardown}
+Details of teardown are necessarily class-specific, but typically it
+involves releasing any resources held by the instance, and disentangling it
+from any external data structures it has been linked into.
+
+Teardown is performed by sending the instance the @|teardown| message,
+defined by the @|SodObject| class.  The message returns an integer, used as a
+boolean flag.  If the message returns zero, then the instance's storage
+should be deallocated.  If the message returns nonzero, then it is safe for
+the caller to forget about the instance, but the caller should not
+deallocate its storage.
+This is \emph{not} an error return: if some teardown method fails then the
+program may be in an inconsistent state and should not continue.
+
+This simple protocol can be used, for example, to implement a reference
+counting system, as follows.
+\begin{prog}
+  [nick = ref]                                                  \\
+  class ReferenceCountedObject \{                               \\ \ind
+    unsigned nref = 1;                                          \\-
+    void inc() \{ me@->ref.nref++; \}                           \\-
+    [role = around]                                             \\
+    int obj.teardown()                                          \\
+    \{                                                          \\ \ind
+      if (--\,--me@->ref.nref) return (1);                      \\
+      else return (CALL_NEXT_METHOD);                         \-\\
+    \}                                                        \-\\
+  \}
+\end{prog}
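
The caller's side of this protocol can be modelled in plain C, without any Sod machinery.  In this hedged sketch, @|struct toy_refobj| and the @|toy_ref_|\dots\ functions are hypothetical: @|toy_ref_teardown| plays the part of the @|teardown| message, and @|toy_ref_destroy| frees the storage only when teardown returns zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted object.  Not generated by Sod. */
struct toy_refobj {
  unsigned nref;                        /* outstanding references */
  int torn_down;                        /* nonzero once resources released */
};

/* Allocate an object holding one reference, for its creator. */
struct toy_refobj *toy_ref_new(void)
{
  struct toy_refobj *o = malloc(sizeof(*o));

  if (!o) return (0);
  o->nref = 1; o->torn_down = 0;
  return (o);
}

/* Take an additional reference. */
void toy_ref_inc(struct toy_refobj *o) { assert(o != NULL); o->nref++; }

/* Drop one reference.  Return nonzero if references remain, meaning that
 * the caller must not deallocate the storage; return zero once the last
 * reference has gone and the object has really been torn down.
 */
int toy_ref_teardown(struct toy_refobj *o)
{
  assert(o != NULL);
  if (--o->nref) return (1);
  o->torn_down = 1;
  return (0);
}

/* Tear down, then free the storage only if teardown says so.  Returns
 * the teardown result so that the caller can see what happened.
 */
int toy_ref_destroy(struct toy_refobj *o)
{
  int keep = toy_ref_teardown(o);

  if (!keep) free(o);
  return (keep);
}
```

A caller holding two references would see the first @|toy_ref_destroy| return nonzero and leave the object intact; only the second call releases the storage.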
+
+This message uses a nonstandard method combination which works like the
+standard combination, except that the \emph{default behaviour}, if there is
+no overriding method, is to execute each superclass's teardown fragments, and
+to return zero.  This default behaviour may be invoked multiple times if some
+method calls on its @|next_method| more than once, unless some other method
+takes steps to prevent this.
+
+A class can define \emph{teardown fragments}: pieces of literal code to be
+executed to shut down an instance.  Each superclass's teardown fragments are
+executed with @|me| bound to an instance pointer of the appropriate
+superclass type; fragments defined by a more specific superclass are executed
+before fragments defined by a less specific superclass.  A class may define
+more than one teardown fragment: the fragments are executed in the order in
+which they appear in the class definition.  It is possible for a teardown
+fragment to use @|return| or @|goto| for special control-flow
+effects, but this is not likely to be a good idea.  Similarly, it's probably
+a better idea to use an @|around| method to influence the return value than
+to write an explicit @|return| statement in a teardown fragment.
+
+\subsubsection{Deallocation}
+The details of instance deallocation are obviously specific to the allocation
+strategy used by the instance, and this is often orthogonal to the object's
+class.
+
+The code which makes the decision to destroy an object may often not be aware
+of the object's direct class.  Low-level details of deallocation often
+require the proper base address of the instance's storage, which can be
+determined using the \descref{SOD_INSTBASE}[macro]{mac}.
+
  %%%--------------------------------------------------------------------------
  \section{Metaclasses} \label{sec:concepts.metaclasses}