doc/intro.tex: Fix erroneous `\manpage' to correct `\man'.

[sod] / doc / concepts.tex
diff --git a/doc/concepts.tex b/doc/concepts.tex

index 8151014cdd0ed369688f4cf1a664a034bf4bb5c0..14ae4e0170c37dd3a65ff3738e51f33c6434a6b5 100644 (file)
--- a/doc/concepts.tex
+++ b/doc/concepts.tex
@@ -25,37 +25,6 @@
  
  \chapter{Concepts} \label{ch:concepts}
  
-%%%--------------------------------------------------------------------------
-\section{Operational model} \label{sec:concepts.model}
-
-The Sod translator runs as a preprocessor, similar in nature to the
-traditional Unix \man{lex}{1} and \man{yacc}{1} tools.  The translator reads
-a \emph{module} file containing class definitions and other information, and
-writes C~source and header files.  The source files contain function
-definitions and static tables which are fed directly to a C~compiler; the
-header files contain declarations for functions and data structures, and are
-included by source files -- whether hand-written or generated by Sod -- which
-makes use of the classes defined in the module.
-
-Sod is not like \Cplusplus: it makes no attempt to `enhance' the C language
-itself.  Sod module files describe classes, messages, methods, slots, and
-other kinds of object-system things, and some of these descriptions need to
-contain C code fragments, but this code is entirely uninterpreted by the Sod
-translator.\footnote{%
-  As long as a code fragment broadly follows C's lexical rules, and properly
-  matches parentheses, brackets, and braces, the Sod translator will copy it
-  into its output unchanged.  It might, in fact, be some other kind of C-like
-  language, such as Objective~C or \Cplusplus.  Or maybe even
-  Objective~\Cplusplus, because if having an object system is good, then
-  having three must be really awesome.} %
-
-The Sod translator is not a closed system.  It is written in Common Lisp, and
-can load extension modules which add new input syntax, output formats, or
-altered behaviour.  The interface for writing such extensions is described in
-\xref{p:lisp}.  Extensions can change almost all details of the Sod object
-system, so the material in this manual must be read with this in mind: this
-manual describes the base system as provided in the distribution.
-
  %%%--------------------------------------------------------------------------
  \section{Modules} \label{sec:concepts.modules}
  
@@ -245,13 +214,13 @@ qualified by the defining class's nickname.
  \subsubsection{Slot initializers}
  As well as defining slot names and types, a class can also associate an
  \emph{initial value} with each slot defined by itself or one of its
-subclasses.  A class $C$ provides an \emph{initialization function} (see
+subclasses.  A class $C$ provides an \emph{initialization message} (see
  \xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass})
-which sets the slots of a \emph{direct} instance of the class to the correct
-initial values.  If several of $C$'s superclasses define initializers for the
-same slot then the initializer from the most specific such class is used.  If
-none of $C$'s superclasses define an initializer for some slot then that slot
-will be left uninitialized.
+whose methods set the slots of a \emph{direct} instance of the class to the
+correct initial values.  If several of $C$'s superclasses define initializers
+for the same slot then the initializer from the most specific such class is
+used.  If none of $C$'s superclasses define an initializer for some slot then
+that slot will be left uninitialized.
  
  The initializer for a slot with scalar type may be any C expression.  The
  initializer for a slot with aggregate type must contain only constant
@@ -301,7 +270,10 @@ slots.  If you want to hide implementation details, the best approach is to
  stash them in a dynamically allocated private structure, and leave a pointer
  to it in a slot.  (This will also help preserve binary compatibility, because
  the private structure can grow more members as needed.  See
-\xref{sec:fixme.compatibility} for more details.
+\xref{sec:fixme.compatibility} for more details.)
+
+\subsubsection{Vtables}
+
  
  \subsubsection{Class objects}
  In Sod's object system, classes are objects too.  Therefore classes are
@@ -319,8 +291,8 @@ doesn't define any messages, so it doesn't have any methods.  In Sod, a class
  slot containing a function pointer is not at all the same thing as a method.)
  
  \subsubsection{Conversions}
-Suppose one has a value of type pointer to class type of some class~$C$, and
-wants to convert it to a pointer to class type of some other class~$B$.
+Suppose one has a value of type pointer-to-class-type for some class~$C$, and
+wants to convert it to a pointer-to-class-type for some other class~$B$.
  There are three main cases to distinguish.
  \begin{itemize}
  \item If $B$ is a superclass of~$C$, in the same chain, then the conversion
@@ -336,13 +308,13 @@ There are three main cases to distinguish.
    pointer.  The conversion can be performed using the appropriate generated
    upcast macro (see below); the general case is handled by the macro
    \descref{SOD_XCHAIN}{mac}.
-\item If $B$ is a subclass of~$C$ then the conversion is an \emph{upcast};
+\item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast};
    otherwise the conversion is a~\emph{cross-cast}.  In either case, the
    conversion can fail: the object in question might not be an instance of~$B$
-  at all.  The macro \descref{SOD_CONVERT}{mac} and the function
+  after all.  The macro \descref{SOD_CONVERT}{mac} and the function
    \descref{sod_convert}{fun} perform general conversions.  They return a null
    pointer if the conversion fails.  (There are therefore your analogue to the
-  \Cplusplus @|dynamic_cast<>| operator.)
+  \Cplusplus\ @|dynamic_cast<>| operator.)
  \end{itemize}
  The Sod translator generates macros for performing both in-chain and
  cross-chain upcasts.  For each class~$C$, and each proper superclass~$B$
@@ -489,8 +461,96 @@ If there are no applicable primary methods then no effective method is
  constructed: the vtables contain null pointers in place of pointers to method
  entry functions.
  
+\begin{figure}
+  \begin{tikzpicture}
+    [>=stealth, thick,
+     order/.append style={color=green!70!black},
+     code/.append style={font=\sffamily},
+     action/.append style={font=\itshape},
+     method/.append style={rectangle, draw=black, thin, fill=blue!30,
+                           text height=\ht\strutbox, text depth=\dp\strutbox,
+                           minimum width=40mm}]
+
+    \def\delgstack#1#2#3{
+      \node (#10) [method, #2] {#3};
+      \node (#11) [method, above=6mm of #10] {#3};
+      \draw [->] ($(#10.north)!.5!(#10.north west) + (0mm, 1mm)$) --
+                 ++(0mm, 4mm)
+        node [code, left=4pt, midway] {next_method};
+      \draw [<-] ($(#10.north)!.5!(#10.north east) + (0mm, 1mm)$) --
+                 ++(0mm, 4mm)
+        node [action, right=4pt, midway] {return};
+      \draw [->] ($(#11.north)!.5!(#11.north west) + (0mm, 1mm)$) --
+                 ++(0mm, 4mm)
+        node [code, left=4pt, midway] {next_method}
+        node (ld) [above] {$\smash\vdots\mathstrut$};
+      \draw [<-] ($(#11.north)!.5!(#11.north east) + (0mm, 1mm)$) --
+                 ++(0mm, 4mm)
+        node [action, right=4pt, midway] {return}
+        node (rd) [above] {$\smash\vdots\mathstrut$};
+      \draw [->] ($(ld.north) + (0mm, 1mm)$) -- ++(0mm, 4mm)
+        node [code, left=4pt, midway] {next_method};
+      \draw [<-] ($(rd.north) + (0mm, 1mm)$) -- ++(0mm, 4mm)
+        node [action, right=4pt, midway] {return};
+      \node (p) at ($(ld.north)!.5!(rd.north)$) {};
+      \node (#1n) [method, above=5mm of p] {#3};
+      \draw [->, order] ($(#10.south east) + (4mm, 1mm)$) --
+                          ($(#1n.north east) + (4mm, -1mm)$)
+        node [midway, right, align=left]
+        {Most to \\ least \\ specific};}
+
+    \delgstack{a}{}{Around method}
+    \draw [<-] ($(a0.south)!.5!(a0.south west) - (0mm, 1mm)$) --
+               ++(0mm, -4mm);
+    \draw [->] ($(a0.south)!.5!(a0.south east) - (0mm, 1mm)$) --
+               ++(0mm, -4mm)
+      node [action, right=4pt, midway] {return};
+
+    \draw [->] ($(an.north)!.6!(an.north west) + (0mm, 1mm)$) --
+               ++(-8mm, 8mm)
+      node [code, midway, left=3mm] {next_method}
+      node (b0) [method, above left = 1mm + 4mm and -6mm - 4mm] {};
+    \node (b1) [method] at ($(b0) - (2mm, 2mm)$) {};
+    \node (bn) [method] at ($(b1) - (2mm, 2mm)$) {Before method};
+    \draw [->, order] ($(bn.west) - (6mm, 0mm)$) -- ++(12mm, 12mm)
+      node [midway, above left, align=center] {Most to \\ least \\ specific};
+    \draw [->] ($(b0.north east) + (-10mm, 1mm)$) -- ++(8mm, 8mm)
+      node (p) {};
+
+    \delgstack{m}{above right=1mm and 0mm of an.west |- p}{Primary method}
+    \draw [->] ($(mn.north)!.5!(mn.north west) + (0mm, 1mm)$) -- ++(0mm, 4mm)
+      node [code, left=4pt, midway] {next_method}
+      node [above right = 0mm and -8mm]
+      {$\vcenter{\hbox{\Huge\textcolor{red}{!}}}
+        \vcenter{\hbox{\begin{tabular}[c]{l}
+                         \textsf{next_method} \\
+                         pointer is null
+                       \end{tabular}}}$};
+
+    \draw [->, color=blue, dotted]
+        ($(m0.south)!.2!(m0.south east) - (0mm, 1mm)$) --
+        ($(an.north)!.2!(an.north east) + (0mm, 1mm)$)
+      node [midway, sloped, below] {Return value};
+
+    \draw [<-] ($(an.north)!.6!(an.north east) + (0mm, 1mm)$) --
+               ++(8mm, 8mm)
+      node [action, midway, right=3mm] {return}
+      node (f0) [method, above right = 1mm and -6mm] {};
+    \node (f1) [method] at ($(f0) + (-2mm, 2mm)$) {};
+    \node (fn) [method] at ($(f1) + (-2mm, 2mm)$) {After method};
+    \draw [<-, order] ($(f0.east) + (6mm, 0mm)$) -- ++(-12mm, 12mm)
+      node [midway, above right, align=center]
+      {Least to \\ most \\ specific};
+    \draw [<-] ($(fn.north west) + (6mm, 1mm)$) -- ++(-8mm, 8mm);
+
+  \end{tikzpicture}
+
+  \caption{The standard method combination}
+  \label{fig:concepts.methods.stdmeth}
+\end{figure}
+
  The effective method for a message with standard method combination works as
-follows.
+follows (see also~\xref{fig:concepts.methods.stdmeth}).
  \begin{enumerate}
  
  \item If any applicable methods have the @|around| role, then the most
@@ -616,6 +676,36 @@ There is also a @|custom| aggregating method combination, which is described
  in \xref{sec:fixme.custom-aggregating-method-combination}.
  
  
+\subsection{Sending messages in C} \label{sec:concepts.methods.c}
+
+Each instance is associated with its direct class [FIXME]
+
+The effective methods for each class are determined at translation time, by
+the Sod translator.  For each effective method, one or more \emph{method
+entry functions} are constructed.  A method entry function has three
+responsibilities.
+\begin{itemize}
+\item It converts the receiver pointer to the correct type.  Method entry
+  functions can perform these conversions extremely efficiently: there are
+  separate method entries for each chain of each class which can receive a
+  message, so method entry functions are in the privileged situation of
+  knowing the \emph{exact} class of the receiving object.
+\item If the message accepts a variable-length argument tail, then two method
+  entry functions are created for each chain of each class: one receives a
+  variable-length argument tail, as intended, and captures it in a @|va_list|
+  object; the other accepts an argument of type @|va_list| in place of the
+  variable-length tail and arranges for it to be passed along to the direct
+  methods.
+\item It invokes the effective method with the appropriate arguments.  There
+  might or might not be an actual function corresponding to the effective
+  method itself: the translator may instead open-code the effective method's
+  behaviour into each method entry function; and the machinery for handling
+  `delegation chains', such as is used for @|around| methods and primary
+  methods in the standard method combination, is necessarily scattered among
+  a number of small functions.
+\end{itemize}
+
+
  \subsection{Messages with keyword arguments}
  \label{sec:concepts.methods.keywords}
  
@@ -704,7 +794,7 @@ the platform's strictest alignment requirement applies.
  The following simple function correctly allocates and returns space for an
  instance of a class given a pointer to its class object @<cls>.
  \begin{prog}
-  void *allocate_instance(const SodClass *cls) \\ \ind
+  void *allocate_instance(const SodClass *cls)                  \\ \ind
      \{ return malloc(cls@->cls.initsz); \}
  \end{prog}
  
@@ -728,7 +818,7 @@ of any use unless they've been written specifically for the purpose.
  The following simple function imprints storage at address @<p> as an instance
  of a class, given a pointer to its class object @<cls>.
  \begin{prog}
-  void imprint_instance(const SodClass *cls, void *p) \\ \ind
+  void imprint_instance(const SodClass *cls, void *p)           \\ \ind
      \{ cls@->cls.imprint(p); \}
  \end{prog}
  
@@ -739,17 +829,20 @@ environment in which it operates.
  
  Details of initialization are necessarily class-specific, but typically it
  involves setting the instance's slots to appropriate values, and possibly
-linking it into some larger data structure to keep track of it.
+linking it into some larger data structure to keep track of it.  It is
+possible for initialization methods to attempt to allocate resources, but
+this must be done carefully: there is currently no way to report an error
+from object initialization, so the object must be marked as incompletely
+initialized, and left in a state where it will be safe to tear down later.
  
  Initialization is performed by sending the imprinted instance an @|init|
  message, defined by the @|SodObject| class.  This message uses a nonstandard
  method combination which works like the standard combination, except that the
  \emph{default behaviour}, if there is no overriding method, is to initialize
-the instance's slots using the initializers defined in the class and its
-superclasses, and to invoke each superclass's initialization fragments.  This
-default behaviour may be invoked multiple times if some method calls on its
-@|next_method| more than once, unless some other method takes steps to
-prevent this.
+the instance's slots, as described below, and to invoke each superclass's
+initialization fragments.  This default behaviour may be invoked multiple
+times if some method calls on its @|next_method| more than once, unless some
+other method takes steps to prevent this.
  
  Slots are initialized in a well-defined order.
  \begin{itemize}
@@ -771,19 +864,53 @@ definition.  It is possible for an initialization fragment to use @|return|
  or @|goto| for special control-flow effects, but this is not likely to be a
  good idea.
  
-Note that an initialization fragment defined in a class is copied literally
-into each subclass's initialization method.  This is fine for simple cases
-but wasteful if the initialization logic is complicated.  More complex
-initialization behaviour should be added either by having an initialization
-fragments call functions (necessarily with external linkage), or by defining
-@|after| methods on the @|init| message.  These will be run after the slot
-initializers have been applied, in reverse precedence order.
-
-Initialization is \emph{parametrized}, so the caller may select from a space
-of possible initial states for the new instance, or to inform the new
-instance about some other objects known to the caller.  Specifically, the
-@|init| message accepts keyword arguments (\xref{sec:concepts.keywords})
-which can be defined and used by methods defined on it.
+The @|init| message accepts keyword arguments
+(\xref{sec:concepts.methods.keywords}).  The set of acceptable keywords is
+determined by the applicable methods as usual, but also by the
+\emph{initargs} defined by the receiving instance's class and its
+superclasses, which are made available to slot initializers and
+initialization fragments.
+
+There are two kinds of initarg definitions.  \emph{User initargs} are defined
+by an explicit @|initarg| item appearing in a class definition: the item
+defines a name, type, and (optionally) a default value for the initarg.
+\emph{Slot initargs} are defined by attaching an @|initarg| property to a
+slot or slot initializer item: the property's determines the initarg's name,
+while the type is taken from the underlying slot type; slot initargs do not
+have default values.  Both kinds define a \emph{direct initarg} for the
+containing class.
+
+Initargs are inherited.  The \emph{applicable} direct initargs for an @|init|
+effective method are those defined by the receiving object's class, and all
+of its superclasses.  Applicable direct initargs with the same name are
+merged to form \emph{effective initargs}.  An error is reported if two
+applicable direct initargs have the same name but different types.  The
+default value of an effective initarg is taken from the most specific
+applicable direct initarg which specifies a defalt value; if no applicable
+direct initarg specifies a default value then the effective initarg has no
+default.
+
+All initarg values are made available at runtime to user code --
+initialization fragments and slot initializer expressions -- through local
+variables and a @|suppliedp| structure, as in a direct method
+(\xref{sec:concepts.methods.keywords}).  Furthermore, slot initarg
+definitions influence the initialization of slots.
+
+The process for deciding how to initialize a particular slot works as
+follows.
+\begin{enumerate}
+\item If there are any slot initargs defined on the slot, or any of its slot
+  initializers, \emph{and} the sender supplied a value for one or more of the
+  corresponding effective initargs, then the value of the most specific slot
+  initarg is stored in the slot.
+\item Otherwise, if there are any slot initializers defined which include an
+  initializer expression, then the initializer expression from the most
+  specific such slot initializer is evaluated and its value stored in the
+  slot.
+\item Otherwise, the slot is left uninitialized.
+\end{enumerate}
+Note that the default values (if any) of effective initargs do \emph{not}
+affect this procedure.
  
  
  \subsection{Destruction}
@@ -805,9 +932,9 @@ allocated from the standard @|malloc| heap is done using the
  \descref{sod_destroy}[function]{fun}.
  
  \subsubsection{Teardown}
-Details of initialization are necessarily class-specific, but typically it
-involves setting the instance's slots to appropriate values, and possibly
-linking it into some larger data structure to keep track of it.
+Details of teardown are necessarily class-specific, but typically it
+involves releasing resources held by the instance, and disentangling it from
+any data structures it might be linked into.
  
  Teardown is performed by sending the instance the @|teardown| message,
  defined by the @|SodObject| class.  The message returns an integer, used as a
@@ -820,16 +947,16 @@ program may be in an inconsistent state and should not continue.
  This simple protocol can be used, for example, to implement a reference
  counting system, as follows.
  \begin{prog}
-  [nick = ref] \\
-  class ReferenceCountedObject \{ \\ \ind
-    unsigned nref = 1; \\-
-    void inc() \{ me@->ref.nref++; \} \\-
-    [role = around] \\
-    int obj.teardown() \\
-    \{ \\ \ind
-      if (--\,--me@->ref.nref) return (1); \\
-      else return (CALL_NEXT_METHOD); \- \\
-    \} \- \\
+  [nick = ref]                                                  \\
+  class ReferenceCountedObject: SodObject \{                    \\ \ind
+    unsigned nref = 1;                                          \\-
+    void inc() \{ me@->ref.nref++; \}                           \\-
+    [role = around]                                             \\
+    int obj.teardown()                                          \\
+    \{                                                          \\ \ind
+      if (--\,--me@->ref.nref) return (1);                      \\
+      else return (CALL_NEXT_METHOD);                         \-\\
+    \}                                                        \-\\
    \}
  \end{prog}
  
@@ -865,6 +992,22 @@ determined using the \descref{SOD_INSTBASE}[macro]{mac}.
  %%%--------------------------------------------------------------------------
  \section{Metaclasses} \label{sec:concepts.metaclasses}
  
+%%%--------------------------------------------------------------------------
+\section{Compatibility considerations} \label{sec:concepts.compatibility}
+
+Sod doesn't make source-level compatibility especially difficult.  As long as
+classes, slots, and messages don't change names or dissappear, and slots and
+messages retain their approximate types, everything will be fine.
+
+Binary compatibility is much more difficult.  Unfortunately, Sod classes have
+rather fragile binary interfaces.\footnote{%
+  Research suggestion: investigate alternative instance and vtable layouts
+  which improve binary compatibility, probably at the expense of instance
+  compactness, and efficiency of slot access and message sending.  There may
+  be interesting trade-offs to be made.} %
+
+If instances are allocated [FIXME]
+
  %%%----- That's all, folks --------------------------------------------------
  
  %%% Local variables: