X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~mdw/git/sod/blobdiff_plain/d24d47f5dbe064439e35033cadd9d6c463ee1c87..98da9322baa008239dd78cd2aff71637017592c7:/doc/concepts.tex

diff --git a/doc/concepts.tex b/doc/concepts.tex
index 74e5902..7702f13 100644
--- a/doc/concepts.tex
+++ b/doc/concepts.tex
@@ -25,37 +25,6 @@
 
 \chapter{Concepts} \label{ch:concepts}
 
-%%%--------------------------------------------------------------------------
-\section{Operational model} \label{sec:concepts.model}
-
-The Sod translator runs as a preprocessor, similar in nature to the
-traditional Unix \man{lex}{1} and \man{yacc}{1} tools.  The translator reads
-a \emph{module} file containing class definitions and other information, and
-writes C~source and header files.  The source files contain function
-definitions and static tables which are fed directly to a C~compiler; the
-header files contain declarations for functions and data structures, and are
-included by source files -- whether hand-written or generated by Sod -- which
-makes use of the classes defined in the module.
-
-Sod is not like \Cplusplus: it makes no attempt to `enhance' the C language
-itself.  Sod module files describe classes, messages, methods, slots, and
-other kinds of object-system things, and some of these descriptions need to
-contain C code fragments, but this code is entirely uninterpreted by the Sod
-translator.\footnote{%
-  As long as a code fragment broadly follows C's lexical rules, and properly
-  matches parentheses, brackets, and braces, the Sod translator will copy it
-  into its output unchanged.  It might, in fact, be some other kind of C-like
-  language, such as Objective~C or \Cplusplus.  Or maybe even
-  Objective~\Cplusplus, because if having an object system is good, then
-  having three must be really awesome.} %
-
-The Sod translator is not a closed system.  It is written in Common Lisp, and
-can load extension modules which add new input syntax, output formats, or
-altered behaviour.  The interface for writing such extensions is described in
-\xref{p:lisp}.  Extensions can change almost all details of the Sod object
-system, so the material in this manual must be read with this in mind: this
-manual describes the base system as provided in the distribution.
-
 %%%--------------------------------------------------------------------------
 \section{Modules} \label{sec:concepts.modules}
 
@@ -245,13 +214,13 @@ qualified by the defining class's nickname.
 \subsubsection{Slot initializers}
 As well as defining slot names and types, a class can also associate an
 \emph{initial value} with each slot defined by itself or one of its
-subclasses.  A class $C$ provides an \emph{initialization function} (see
+superclasses.  A class $C$ provides an \emph{initialization message} (see
 \xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass})
-which sets the slots of a \emph{direct} instance of the class to the correct
-initial values.  If several of $C$'s superclasses define initializers for the
-same slot then the initializer from the most specific such class is used.  If
-none of $C$'s superclasses define an initializer for some slot then that slot
-will be left uninitialized.
+whose methods set the slots of a \emph{direct} instance of the class to the
+correct initial values.  If several of $C$'s superclasses define initializers
+for the same slot then the initializer from the most specific such class is
+used.  If none of $C$'s superclasses define an initializer for some slot then
+that slot will be left uninitialized.
 
 The initializer for a slot with scalar type may be any C expression.  The
 initializer for a slot with aggregate type must contain only constant
@@ -259,6 +228,17 @@ expressions if the generated code is expected to be processed by a
 implementation of C89.  Initializers will be evaluated once each time an
 instance is initialized.
 
+Slots are initialized in reverse-precedence order of their defining classes;
+i.e., slots defined by a less specific superclass are initialized earlier
+than slots defined by a more specific superclass.  Slots defined by the same
+class are initialized in the order in which they appear in the class
+definition.
+
+The initializer for a slot may refer to other slots in the same object, via
+the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me|
+has type `pointer to $C$'.  (Note that the type of @|me| depends only on the
+class which defined the slot, not the class which defined the initializer.)
+
 
 \subsection{C language integration} \label{sec:concepts.classes.c}
 
@@ -290,7 +270,10 @@ slots.  If you want to hide implementation details, the best approach is to
 stash them in a dynamically allocated private structure, and leave a pointer
 to it in a slot.  (This will also help preserve binary compatibility, because
 the private structure can grow more members as needed.  See
-\xref{sec:fixme.compatibility} for more details.
+\xref{sec:fixme.compatibility} for more details.)
+
+\subsubsection{Vtables}
+
 
 \subsubsection{Class objects}
 In Sod's object system, classes are objects too.  Therefore classes are
@@ -308,8 +291,8 @@ doesn't define any messages, so it doesn't have any methods.  In Sod, a class
 slot containing a function pointer is not at all the same thing as a method.)
 
 \subsubsection{Conversions}
-Suppose one has a value of type pointer to class type of some class~$C$, and
-wants to convert it to a pointer to class type of some other class~$B$.
+Suppose one has a value of type pointer-to-class-type for some class~$C$, and
+wants to convert it to a pointer-to-class-type for some other class~$B$.
 There are three main cases to distinguish.
 \begin{itemize}
 \item If $B$ is a superclass of~$C$, in the same chain, then the conversion
@@ -325,13 +308,13 @@ There are three main cases to distinguish.
   pointer.  The conversion can be performed using the appropriate generated
   upcast macro (see below); the general case is handled by the macro
   \descref{SOD_XCHAIN}{mac}.
-\item If $B$ is a subclass of~$C$ then the conversion is an \emph{upcast};
+\item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast};
   otherwise the conversion is a~\emph{cross-cast}.  In either case, the
   conversion can fail: the object in question might not be an instance of~$B$
-  at all.  The macro \descref{SOD_CONVERT}{mac} and the function
+  after all.  The macro \descref{SOD_CONVERT}{mac} and the function
   \descref{sod_convert}{fun} perform general conversions.  They return a null
   pointer if the conversion fails.  (They are therefore your analogue to the
-  \Cplusplus @|dynamic_cast<>| operator.)
+  \Cplusplus\ @|dynamic_cast<>| operator.)
 \end{itemize}
 The Sod translator generates macros for performing both in-chain and
 cross-chain upcasts.  For each class~$C$, and each proper superclass~$B$
@@ -388,7 +371,8 @@ Keyword arguments can be provided in three ways.
 
 Keyword arguments are provided as a general feature for C functions.
 However, Sod has special support for messages which accept keyword arguments
-(\xref{sec:concepts.methods.keywords}).
+(\xref{sec:concepts.methods.keywords}); and they play an essential role in
+the instance construction protocol (\xref{sec:concepts.lifecycle.birth}).
 
 %%%--------------------------------------------------------------------------
 \section{Messages and methods} \label{sec:concepts.methods}
@@ -604,6 +588,36 @@ There is also a @|custom| aggregating method combination, which is described
 in \xref{sec:fixme.custom-aggregating-method-combination}.
 
 
+\subsection{Sending messages in C} \label{sec:concepts.methods.c}
+
+Each instance is associated with its direct class [FIXME]
+
+The effective methods for each class are determined at translation time, by
+the Sod translator.  For each effective method, one or more \emph{method
+entry functions} are constructed.  A method entry function has three
+responsibilities.
+\begin{itemize}
+\item It converts the receiver pointer to the correct type.  Method entry
+  functions can perform these conversions extremely efficiently: there are
+  separate method entries for each chain of each class which can receive a
+  message, so method entry functions are in the privileged situation of
+  knowing the \emph{exact} class of the receiving object.
+\item If the message accepts a variable-length argument tail, then two method
+  entry functions are created for each chain of each class: one receives a
+  variable-length argument tail, as intended, and captures it in a @|va_list|
+  object; the other accepts an argument of type @|va_list| in place of the
+  variable-length tail and arranges for it to be passed along to the direct
+  methods.
+\item It invokes the effective method with the appropriate arguments.  There
+  might or might not be an actual function corresponding to the effective
+  method itself: the translator may instead open-code the effective method's
+  behaviour into each method entry function; and the machinery for handling
+  `delegation chains', such as is used for @|around| methods and primary
+  methods in the standard method combination, is necessarily scattered among
+  a number of small functions.
+\end{itemize}
+
+
 \subsection{Messages with keyword arguments}
 \label{sec:concepts.methods.keywords}
 
@@ -654,8 +668,13 @@ Construction of a new instance of a class involves three steps.
   necessary.
 \end{enumerate}
 The \descref{SOD_DECL}[macro]{mac} handles constructing instances with
-automatic storage duration (`on the stack').  Currently, there is no built-in
-support for constructing dynamically-allocated instances.
+automatic storage duration (`on the stack').  Similarly, the
+\descref{SOD_MAKE}[macro]{mac} and the \descref{sod_make}{fun} and
+\descref{sod_makev}{fun} functions construct instances allocated from the
+standard @|malloc| heap.  Programmers can add support for other allocation
+strategies by using the \descref{SOD_INIT}[macro]{mac} and the
+\descref{sod_init}{fun} and \descref{sod_initv}{fun} functions, which package
+up imprinting and initialization.
 
 \subsubsection{Allocation}
 Instances of most classes (specifically including those classes defined by
@@ -687,7 +706,7 @@ the platform's strictest alignment requirement applies.
 The following simple function correctly allocates and returns space for an
 instance of a class given a pointer to its class object @<cls>.
 \begin{prog}
-  void *allocate_instance(const SodClass *cls) \\ \ind
+  void *allocate_instance(const SodClass *cls)                  \\ \ind
     \{ return malloc(cls@->cls.initsz); \}
 \end{prog}
 
@@ -711,7 +730,7 @@ of any use unless they've been written specifically for the purpose.
 The following simple function imprints storage at address @<p> as an instance
 of a class, given a pointer to its class object @<cls>.
 \begin{prog}
-  void imprint_instance(const SodClass *cls, void *p) \\ \ind
+  void imprint_instance(const SodClass *cls, void *p)           \\ \ind
     \{ cls@->cls.imprint(p); \}
 \end{prog}
 
@@ -724,31 +743,82 @@ Details of initialization are necessarily class-specific, but typically it
 involves setting the instance's slots to appropriate values, and possibly
 linking it into some larger data structure to keep track of it.
 
-Classes can declare initial values for their slots.  A class object's @|init|
-slot points to a function which will establish the appropriate initial values
-for a new instance's slots.  Slots are not initialized in any particularly
-useful order.  The @|init| function also imprints the instance storage.
+Initialization is performed by sending the imprinted instance an @|init|
+message, defined by the @|SodObject| class.  This message uses a nonstandard
+method combination which works like the standard combination, except that the
+\emph{default behaviour}, if there is no overriding method, is to initialize
+the instance's slots, as described below, and to invoke each superclass's
+initialization fragments.  This default behaviour may be invoked multiple
+times if some method calls on its @|next_method| more than once, unless some
+other method takes steps to prevent this.
 
-The provided initialization protocol is extremely simplistic; most notably,
-it's not possible to pass parameters into the initialization process.
-Classes which have more complex requirements will need to define and
-implement their own additional (or alternative) protocols.
+Slots are initialized in a well-defined order.
+\begin{itemize}
+\item Slots defined by a more specific superclass are initialized after
+  slots defined by a less specific superclass.
+\item Slots defined by the same class are initialized in the order in which
+  their definitions appear.
+\end{itemize}
 
-\subsubsection{Example}
-The following is a simple function, with syntactic-sugar macro, which
-allocate storage for an instance of a class, imprints and initializes it, and
-returns a pointer to the new instance.
-\begin{prog}
-  void *make_instance(const SodClass *c) \\
-  \{ \\ \ind
-    void *p = malloc(c@->cls.initsz); \\
-    if (!p) return (0); \\
-    c@->cls.init(p); \\
-    return (p); \- \\
-  \}
-  \\+
-  \#define MAKE(cls) (cls *)make_instance(cls\#\#__class)
-\end{prog}
+A class can define \emph{initialization fragments}: pieces of literal code to
+be executed to set up a new instance.  Each superclass's initialization
+fragments are executed with @|me| bound to an instance pointer of the
+appropriate superclass type, immediately after that superclass's slots (if
+any) have been initialized; therefore, fragments defined by a more specific
+superclass are executed after fragments defined by a less specific
+superclass.  A class may define more than one initialization fragment: the
+fragments are executed in the order in which they appear in the class
+definition.  It is possible for an initialization fragment to use @|return|
+or @|goto| for special control-flow effects, but this is not likely to be a
+good idea.
+
+The @|init| message accepts keyword arguments
+(\xref{sec:concepts.methods.keywords}).  The set of acceptable keywords is
+determined by the applicable methods as usual, but also by the
+\emph{initargs} defined by the receiving instance's class and its
+superclasses, which are made available to slot initializers and
+initialization fragments.
+
+There are two kinds of initarg definitions.  \emph{User initargs} are defined
+by an explicit @|initarg| item appearing in a class definition: the item
+defines a name, type, and (optionally) a default value for the initarg.
+\emph{Slot initargs} are defined by attaching an @|initarg| property to a
+slot or slot initializer item: the property's value determines the initarg's
+name, while the type is taken from the underlying slot type; slot initargs do
+not have default values.  Both kinds define a \emph{direct initarg} for the
+containing class.
+
+Initargs are inherited.  The \emph{applicable} direct initargs for an @|init|
+effective method are those defined by the receiving object's class, and all
+of its superclasses.  Applicable direct initargs with the same name are
+merged to form \emph{effective initargs}.  An error is reported if two
+applicable direct initargs have the same name but different types.  The
+default value of an effective initarg is taken from the most specific
+applicable direct initarg which specifies a default value; if no applicable
+direct initarg specifies a default value then the effective initarg has no
+default.
+
+All initarg values are made available at runtime to user code --
+initialization fragments and slot initializer expressions -- through local
+variables and a @|suppliedp| structure, as in a direct method
+(\xref{sec:concepts.methods.keywords}).  Furthermore, slot initarg
+definitions influence the initialization of slots.
+
+The process for deciding how to initialize a particular slot works as
+follows.
+\begin{enumerate}
+\item If there are any slot initargs defined on the slot, or any of its slot
+  initializers, \emph{and} the sender supplied a value for one or more of the
+  corresponding effective initargs, then the value of the most specific slot
+  initarg is stored in the slot.
+\item Otherwise, if there are any slot initializers defined which include an
+  initializer expression, then the initializer expression from the most
+  specific such slot initializer is evaluated and its value stored in the
+  slot.
+\item Otherwise, the slot is left uninitialized.
+\end{enumerate}
+Note that the default values (if any) of effective initargs do \emph{not}
+affect this procedure.
 
 
 \subsection{Destruction}
@@ -762,32 +832,61 @@ steps.
 \item \emph{Deallocation} releases the memory used to store the instance so
   that it can be reused.
 \end{enumerate}
+Teardown alone, for objects which require special deallocation, or for which
+deallocation occurs automatically (e.g., instances with automatic storage
+duration, or instances whose storage will be garbage-collected), is performed
+using the \descref{sod_teardown}[function]{fun}.  Destruction of instances
+allocated from the standard @|malloc| heap is done using the
+\descref{sod_destroy}[function]{fun}.
 
 \subsubsection{Teardown}
-Details of teardown are class-specific, but typically it involves releasing
-resources held by the instance, and possibly unlinking it from some larger
-data structure which used to keep track of it.
+Details of teardown are necessarily class-specific, but typically it
+involves releasing resources held by the instance, and possibly unlinking it
+from some larger data structure which used to keep track of it.
 
-There is no provided protocol for teardown: classes whose instances require
-teardown behaviour must define and implement an appropriate protocol of their
-own.  The following class may serve for simple cases.
+Teardown is performed by sending the instance the @|teardown| message,
+defined by the @|SodObject| class.  The message returns an integer, used as a
+boolean flag.  If the message returns zero, then the instance's storage
+should be deallocated.  If the message returns nonzero, then it is safe for
+the caller to forget about the instance, but not to deallocate its storage.
+This is \emph{not} an error return: if some teardown method fails then the
+program may be in an inconsistent state and should not continue.
+
+This simple protocol can be used, for example, to implement a reference
+counting system, as follows.
 \begin{prog}
-  [nick = disposable] \\
-  class DisposableObject : SodObject \{ \\- \ind
-    void release() \{ ; \} \\
-    \quad /* Release resources held by the receiver. */ \- \\-
-  \}
-  \\+
-  code c : user \{ \\- \ind
-    /* If p is a a DisposableObject then release its resources. */ \\
-    void maybe_dispose(void *p) \\
-    \{ \\ \ind
-      DisposableObject *d = SOD_CONVERT(DisposableObject, p); \\
-      if (d) DisposableObject_release(d); \- \\
-    \} \- \\
+  [nick = ref]                                                  \\
+  class ReferenceCountedObject: SodObject \{                    \\ \ind
+    unsigned nref = 1;                                          \\-
+    void inc() \{ me@->ref.nref++; \}                           \\-
+    [role = around]                                             \\
+    int obj.teardown()                                          \\
+    \{                                                          \\ \ind
+      if (--\,--me@->ref.nref) return (1);                      \\
+      else return (CALL_NEXT_METHOD);                         \-\\
+    \}                                                        \-\\
   \}
 \end{prog}
 
+This message uses a nonstandard method combination which works like the
+standard combination, except that the \emph{default behaviour}, if there is
+no overriding method, is to execute each superclass's teardown fragments, and
+to return zero.  This default behaviour may be invoked multiple times if some
+method calls on its @|next_method| more than once, unless some other method
+takes steps to prevent this.
+
+A class can define \emph{teardown fragments}: pieces of literal code to be
+executed to shut down an instance.  Each superclass's teardown fragments are
+executed with @|me| bound to an instance pointer of the appropriate
+superclass type; fragments defined by a more specific superclass are executed
+before fragments defined by a less specific superclass.  A class may define
+more than one teardown fragment: the fragments are executed in the order in
+which they appear in the class definition.  It is possible for a teardown
+fragment to use @|return| or @|goto| for special control-flow effects, but
+this is not likely to be a good idea.  Similarly, it's probably a better idea
+to use an @|around| method to influence the return value than to write an
+explicit @|return| statement in a teardown fragment.
+
 \subsubsection{Deallocation}
 The details of instance deallocation are obviously specific to the allocation
 strategy used by the instance, and this is often orthogonal to the object's
@@ -798,22 +897,25 @@ of the object's direct class.  Low-level details of deallocation often
 require the proper base address of the instance's storage, which can be
 determined using the \descref{SOD_INSTBASE}[macro]{mac}.
 
-\subsubsection{Example}
-The following is a counterpart to the @|new_instance| function
-(\xref{sec:concepts.lifecycle.birth}), which tears down and deallocates an
-instance allocated using @|malloc|.
-\begin{prog}
-  void free_instance(void *p) \\
-  \{ \\ \ind
-    SodObject *obj = p; \\
-    maybe_dispose(p); \\
-    free(SOD_INSTBASE(obj)); \- \\
-  \}
-\end{prog}
-
 %%%--------------------------------------------------------------------------
 \section{Metaclasses} \label{sec:concepts.metaclasses}
 
+%%%--------------------------------------------------------------------------
+\section{Compatibility considerations} \label{sec:concepts.compatibility}
+
+Sod doesn't make source-level compatibility especially difficult.  As long as
+classes, slots, and messages don't change names or disappear, and slots and
+messages retain their approximate types, everything will be fine.
+
+Binary compatibility is much more difficult.  Unfortunately, Sod classes have
+rather fragile binary interfaces.\footnote{%
+  Research suggestion: investigate alternative instance and vtable layouts
+  which improve binary compatibility, probably at the expense of instance
+  compactness, and efficiency of slot access and message sending.  There may
+  be interesting trade-offs to be made.} %
+
+If instances are allocated [FIXME]
+
 %%%----- That's all, folks --------------------------------------------------
 
 %%% Local variables: