X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~mdw/git/sod/blobdiff_plain/1818107e8198734df843841a51bca3713bd37596..7292d6e1bc85cca454f64353722ba78a9263f257:/doc/concepts.tex diff --git a/doc/concepts.tex b/doc/concepts.tex index d554b51..84e4062 100644 --- a/doc/concepts.tex +++ b/doc/concepts.tex @@ -7,7 +7,7 @@ %%%----- Licensing notice --------------------------------------------------- %%% -%%% This file is part of the Sensble Object Design, an object system for C. +%%% This file is part of the Sensible Object Design, an object system for C. %%% %%% SOD is free software; you can redistribute it and/or modify %%% it under the terms of the GNU General Public License as published by @@ -23,15 +23,849 @@ %%% along with SOD; if not, write to the Free Software Foundation, %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. -\chapter{Concepts} +\chapter{Concepts} \label{ch:concepts} -\section{Classes and slots} +%%%-------------------------------------------------------------------------- +\section{Modules} \label{sec:concepts.modules} -\section{Messages and methods} +A \emph{module} is the top-level syntactic unit of input to the Sod +translator. As described above, given an input module, the translator +generates C source and header files. -\section{Metaclasses} +A module can \emph{import} other modules. This makes the type names and +classes defined in those other modules available to class definitions in the +importing module. Sod's module system is intentionally very simple. There +are no private declarations or attempts to hide things. 
-\section{Modules} +As well as importing existing modules, a module can include a number of +different kinds of \emph{items}: +\begin{itemize} +\item \emph{class definitions} describe new classes, possibly in terms of + existing classes; +\item \emph{type name declarations} introduce new type names to Sod's + parser;\footnote{% + This is unfortunately necessary because C syntax, upon which Sod's input + language is based for obvious reasons, needs to treat type names + differently from other kinds of identifiers.} % + and +\item \emph{code fragments} contain literal C code to be dropped into an + appropriate place in an output file. +\end{itemize} +Each kind of item, and, indeed, a module as a whole, can have a collection of +\emph{properties} associated with it. A property has a \emph{name} and a +\emph{value}. Properties are an open-ended way of attaching additional +information to module items, so extensions can make use of them without +having to implement additional syntax. + +%%%-------------------------------------------------------------------------- +\section{Classes, instances, and slots} \label{sec:concepts.classes} + +For the most part, Sod takes a fairly traditional view of what it means to be +an object system. + +An \emph{object} maintains \emph{state} and exhibits \emph{behaviour}. An +object's state is maintained in named \emph{slots}, each of which can store a +C value of an appropriate (scalar or aggregate) type. An object's behaviour +is stimulated by sending it \emph{messages}. A message has a name, and may +carry a number of arguments, which are C values; sending a message may result +in the state of receiving object (or other objects) being changed, and a C +value being returned to the sender. + +Every object is a (direct) instance of some \emph{class}. The class +determines which slots its instances have, which messages its instances can +be sent, and which methods are invoked when those messages are received. 
The +Sod translator's main job is to read class definitions and convert them into +appropriate C declarations, tables, and functions. An object cannot +(usually) change its direct class, and the direct class of an object is not +affected by, for example, the static type of a pointer to it. + + +\subsection{Superclasses and inheritance} +\label{sec:concepts.classes.inherit} + +\subsubsection{Class relationships} +Each class has zero or more \emph{direct superclasses}. + +A class with no direct superclasses is called a \emph{root class}. The Sod +runtime library includes a root class named @|SodObject|; making new root +classes is somewhat tricky, and won't be discussed further here. + +Classes can have more than one direct superclass, i.e., Sod supports +\emph{multiple inheritance}. A Sod class definition for a class~$C$ lists +the direct superclasses of $C$ in a particular order. This order is called +the \emph{local precedence order} of $C$, and the list which consists of $C$ +followed by $C$'s direct superclasses in local precedence order is called +$C$'s \emph{local precedence list}. + +The multiple inheritance in Sod works similarly to multiple inheritance in +Lisp-like languages, such as Common Lisp, EuLisp, Dylan, and Python, which is +very different from how multiple inheritance works in \Cplusplus.\footnote{% + The latter can be summarized as `badly'. By default in \Cplusplus, an + instance receives an additional copy of a superclass's state for each path + through the class graph from the instance's direct class to that + superclass, though this behaviour can be overridden by declaring + superclasses to be @|virtual|. Also, \Cplusplus\ offers only trivial + method combination (\xref{sec:concepts.methods}), leaving programmers to + deal with delegation manually and (usually) statically.} % + +If $C$ is a class, then the \emph{superclasses} of $C$ are +\begin{itemize} +\item $C$ itself, and +\item the superclasses of each of $C$'s direct superclasses. 
+\end{itemize} +The \emph{proper superclasses} of a class $C$ are the superclasses of $C$ +except for $C$ itself. If a class $B$ is a (direct, proper) superclass of +$C$, then $C$ is a \emph{(direct, proper) subclass} of $B$. If $C$ is a root +class then the only superclass of $C$ is $C$ itself, and $C$ has no proper +superclasses. + +If an object is a direct instance of class~$C$ then the object is also an +(indirect) instance of every superclass of $C$. + +If $C$ has a proper superclass $B$, then $B$ is not allowed to have $C$ as a +direct superclass. In different terms, if we construct a graph, whose +vertices are classes, and draw an edge from each class to each of its direct +superclasses, then this graph must be acyclic. In yet other terms, the `is a +superclass of' relation is a partial order on classes. + +\subsubsection{The class precedence list} +This partial order is not quite sufficient for our purposes. For each class +$C$, we shall need to extend it into a total order on $C$'s superclasses. +This calculation is called \emph{superclass linearization}, and the result is +a \emph{class precedence list}, which lists each of $C$'s superclasses +exactly once. If a superclass $B$ precedes (resp.\ follows) some other +superclass $A$ in $C$'s class precedence list, then we say that $B$ is a more +(resp.\ less) \emph{specific} superclass of $C$ than $A$ is. + +The superclass linearization algorithm isn't fixed, and extensions to the +translator can introduce new linearizations for special effects, but the +following properties are expected to hold. +\begin{itemize} +\item The first class in $C$'s class precedence list is $C$ itself; i.e., + $C$ is always its own most specific superclass. +\item If $A$ and $B$ are both superclasses of $C$, and $A$ is a proper + superclass of $B$ then $A$ appears after $B$ in $C$'s class precedence + list, i.e., $B$ is a more specific superclass of $C$ than $A$ is. 
+\end{itemize} +The default linearization algorithm used in Sod is the \emph{C3} algorithm, +which has a number of good properties described in~\cite{FIXME:C3}. +It works as follows. +\begin{itemize} +\item A \emph{merge} of some number of input lists is a single list + containing each item that is in any of the input lists exactly once, and no + other items; if an item $x$ appears before an item $y$ in any input list, + then $x$ also appears before $y$ in the merge. If a collection of lists + have no merge then they are said to be \emph{inconsistent}. +\item The class precedence list of a class $C$ is a merge of the local + precedence list of $C$ together with the class precedence lists of each of + $C$'s direct superclasses. +\item If there are no such merges, then the definition of $C$ is invalid. +\item Suppose that there are multiple candidate merges. Consider the + earliest position in these candidate merges at which they disagree. The + \emph{candidate classes} at this position are the classes appearing at this + position in the candidate merges. Each candidate class must be a + superclass of distinct direct superclasses of $C$, since otherwise the + candidates would be ordered by their common subclass's class precedence + list. The class precedence list contains, at this position, that candidate + class whose subclass appears earliest in $C$'s local precedence order. +\end{itemize} + +\subsubsection{Class links and chains} +The definition for a class $C$ may distinguish one of its proper superclasses +as being the \emph{link superclass} for class $C$. Not every class need have +a link superclass, and the link superclass of a class $C$, if it exists, need +not be a direct superclass of $C$. + +Superclass links must obey the following rule: if $C$ is a class, then there +must be no three superclasses $X$, $Y$ and~$Z$ of $C$ such that $Z$ is the +link superclass of both $X$ and $Y$. 
As a consequence of this rule, the +superclasses of $C$ can be partitioned into linear \emph{chains}, such that +superclasses $A$ and $B$ are in the same chain if and only if one can trace a +path from $A$ to $B$ by following superclass links, or \emph{vice versa}. + +Since a class links only to one of its proper superclasses, the classes in a +chain are naturally ordered from most- to least-specific. The least specific +class in a chain is called the \emph{chain head}; the most specific class is +the \emph{chain tail}. Chains are often named after their chain head +classes. + +\subsection{Names} +\label{sec:concepts.classes.names} + +Classes have a number of other attributes: +\begin{itemize} +\item A \emph{name}, which is a C identifier. Class names must be globally + unique. The class name is used in the names of a number of associated + definitions, to be described later. +\item A \emph{nickname}, which is also a C identifier. Unlike names, + nicknames are not required to be globally unique. If $C$ is any class, + then all the superclasses of $C$ must have distinct nicknames. +\end{itemize} + + +\subsection{Slots} \label{sec:concepts.classes.slots} + +Each class defines a number of \emph{slots}. Much like a structure member, a +slot has a \emph{name}, which is a C identifier, and a \emph{type}. Unlike +many other object systems, different superclasses of a class $C$ can define +slots with the same name without ambiguity, since slot references are always +qualified by the defining class's nickname. + +\subsubsection{Slot initializers} +As well as defining slot names and types, a class can also associate an +\emph{initial value} with each slot defined by itself or one of its +superclasses. A class $C$ provides an \emph{initialization function} (see +\xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass}) +which sets the slots of a \emph{direct} instance of the class to the correct +initial values. 
If several of $C$'s superclasses define initializers for the +same slot then the initializer from the most specific such class is used. If +none of $C$'s superclasses define an initializer for some slot then that slot +will be left uninitialized. + +The initializer for a slot with scalar type may be any C expression. The +initializer for a slot with aggregate type must contain only constant +expressions if the generated code is expected to be processed by an +implementation of C89. Initializers will be evaluated once each time an +instance is initialized. + +Slots are initialized in reverse-precedence order of their defining classes; +i.e., slots defined by a less specific superclass are initialized earlier +than slots defined by a more specific superclass. Slots defined by the same +class are initialized in the order in which they appear in the class +definition. + +The initializer for a slot may refer to other slots in the same object, via +the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me| +has type `pointer to $C$'. (Note that the type of @|me| depends only on the +class which defined the slot, not the class which defined the initializer.) + + +\subsection{C language integration} \label{sec:concepts.classes.c} + +For each class~$C$, the Sod translator defines a C type, the \emph{class +type}, with the same name. This is the usual type used when considering an +object as an instance of class~$C$. No entire object will normally have a +class type,\footnote{% + In general, a class type only captures the structure of one of the + superclass chains of an instance. A full instance layout contains multiple + chains. See \xref{sec:structures.layout} for the full details.} % +so access to instances is almost always via pointers. + +\subsubsection{Access to slots} +The class type for a class~$C$ is actually a structure. It contains one +member for each class in $C$'s superclass chain, named with that class's +nickname. 
Each of these members is also a structure, containing the +corresponding class's slots, one member per slot. There's nothing special +about these slot members: C code can access them in the usual way. + +For example, if @|MyClass| has the nickname @|mine|, and defines a slot @|x| +of type @|int|, then the simple function +\begin{prog} + int get_x(MyClass *m) \{ return (m@->mine.x); \} +\end{prog} +will extract the value of @|x| from an instance of @|MyClass|. + +All of this means that there's no such thing as `private' or `protected' +slots. If you want to hide implementation details, the best approach is to +stash them in a dynamically allocated private structure, and leave a pointer +to it in a slot. (This will also help preserve binary compatibility, because +the private structure can grow more members as needed. See +\xref{sec:fixme.compatibility} for more details.) + +\subsubsection{Class objects} +In Sod's object system, classes are objects too. Therefore classes are +themselves instances; the class of a class is called a \emph{metaclass}. The +consequences of this are explored in \xref{sec:concepts.metaclasses}. The +\emph{class object} has the same name as the class, suffixed with +`@|__class|'\footnote{% + This is not quite true. @|$C$__class| is actually a macro. See + \xref{sec:structures.layout.additional} for the gory details.} % +and its type is usually @|SodClass|; @|SodClass|'s nickname is @|cls|. + +A class object's slots contain or point to useful information, tables and +functions for working with that class's instances. (The @|SodClass| class +doesn't define any messages, so it doesn't have any methods. In Sod, a class +slot containing a function pointer is not at all the same thing as a method.) + +\subsubsection{Conversions} +Suppose one has a value of type pointer to class type of some class~$C$, and +wants to convert it to a pointer to class type of some other class~$B$. +There are three main cases to distinguish. 
+\begin{itemize} +\item If $B$ is a superclass of~$C$, in the same chain, then the conversion + is an \emph{in-chain upcast}. The conversion can be performed using the + appropriate generated upcast macro (see below), or by simply casting the + pointer, using C's usual cast operator (or the \Cplusplus\ @|static_cast<>| + operator). +\item If $B$ is a superclass of~$C$, in a different chain, then the + conversion is a \emph{cross-chain upcast}. The conversion is more than a + simple type change: the pointer value must be adjusted. If the direct + class of the instance in question is not known, the conversion will require + a lookup at runtime to find the appropriate offset by which to adjust the + pointer. The conversion can be performed using the appropriate generated + upcast macro (see below); the general case is handled by the macro + \descref{SOD_XCHAIN}{mac}. +\item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast}; + otherwise the conversion is a~\emph{cross-cast}. In either case, the + conversion can fail: the object in question might not be an instance of~$B$ + at all. The macro \descref{SOD_CONVERT}{mac} and the function + \descref{sod_convert}{fun} perform general conversions. They return a null + pointer if the conversion fails. (They are therefore the analogue of the + \Cplusplus\ @|dynamic_cast<>| operator.) +\end{itemize} +The Sod translator generates macros for performing both in-chain and +cross-chain upcasts. For each class~$C$, and each proper superclass~$B$ +of~$C$, a macro is defined: given an argument of type pointer to class type +of~$C$, it returns a pointer to the same instance, only with type pointer to +class type of~$B$, adjusted as necessary in the case of a cross-chain +conversion. 
The macro is named by concatenating +\begin{itemize} +\item the name of class~$C$, in upper case, +\item the characters `@|__CONV_|', and +\item the nickname of class~$B$, in upper case; +\end{itemize} +e.g., if $C$ is named @|MyClass|, and $B$'s name is @|SuperClass| with +nickname @|super|, then the macro @|MYCLASS__CONV_SUPER| converts a +@|MyClass~*| to a @|SuperClass~*|. See +\xref{sec:structures.layout.additional} for the formal description. + +%%%-------------------------------------------------------------------------- +\section{Keyword arguments} \label{sec:concepts.keywords} + +In standard C, the actual arguments provided to a function are matched up +with the formal arguments given in the function definition according to their +ordering in a list. Unless the (rather cumbersome) machinery for dealing +with variable-length argument tails (@|<stdarg.h>|) is used, exactly the +correct number of arguments must be supplied, and in the correct order. + +A \emph{keyword argument} is matched by its distinctive \emph{name}, rather +than by its position in a list. Keyword arguments may be \emph{omitted}, +causing some default behaviour by the function. A function can detect +whether a particular keyword argument was supplied: so the default behaviour +need not be the same as that caused by any specific value of the argument. + +Keyword arguments can be provided in three ways. +\begin{enumerate} +\item Directly, as a variable-length argument tail, consisting (for the most + part) of alternating keyword names, as pointers to null-terminated strings, + and argument values, and terminated by a null pointer. This is somewhat + error-prone, and the support library defines some macros which help ensure + that keyword argument lists are well formed. +\item Indirectly, through a @|va_list| object capturing a variable-length + argument tail passed to some other function. Such indirect argument tails + have the same structure as the direct argument tails described above. 
+ Because @|va_list| objects are hard to copy, the keyword-argument support + library consistently passes @|va_list| objects \emph{by reference} + throughout its programming interface. +\item Indirectly, through a vector of @|struct kwval| objects, each of which + contains a keyword name, as a pointer to a null-terminated string, and the + \emph{address} of a corresponding argument value. (This indirection is + necessary so that the items in the vector can be of uniform size.) + Argument vectors are rather inconvenient to use, but are the only practical + way in which a caller can decide at runtime which arguments to include in a + call, which is useful when writing wrapper functions. +\end{enumerate} + +Keyword arguments are provided as a general feature for C functions. +However, Sod has special support for messages which accept keyword arguments +(\xref{sec:concepts.methods.keywords}); and they play an essential role in +the instance construction protocol (\xref{sec:concepts.lifecycle.birth}). + +%%%-------------------------------------------------------------------------- +\section{Messages and methods} \label{sec:concepts.methods} + +Objects can be sent \emph{messages}. A message has a \emph{name}, and +carries a number of \emph{arguments}. When an object is sent a message, a +function, determined by the receiving object's class, is invoked, passing it +the receiver and the message arguments. This function is called the +class's \emph{effective method} for the message. The effective method can do +anything a C function can do, including reading or updating program state or +object slots, sending more messages, calling other functions, issuing system +calls, or performing I/O; if it finishes, it may return a value, which is +returned in turn to the message sender. + +The set of messages an object can receive, characterized by their names, +argument types, and return type, is determined by the object's class. 
Each +class can define new messages, which can be received by any instance of that +class. The messages defined by a single class must have distinct names: +there is no `function overloading'. As with slots +(\xref{sec:concepts.classes.slots}), messages defined by distinct classes are +always distinct, even if they have the same names: references to messages are +always qualified by the defining class's name or nickname. + +Messages may take any number of arguments, of any non-array value type. +Since message sends are effectively function calls, arguments of array type +are implicitly converted to values of the corresponding pointer type. While +message definitions may ascribe an array type to an argument, the formal +argument will have pointer type, as is usual for C functions. A message may +accept a variable-length argument suffix, denoted @|\dots|. + +A class definition may include \emph{direct methods} for messages defined by +it or any of its superclasses. + +Like messages, direct methods define argument lists and return types, but +they may also have a \emph{body}, and a \emph{role}. + +A direct method need not have the same argument list or return type as its +message. The acceptable argument lists and return types for a method depend +on the message, in particular its method combination +(\xref{sec:concepts.methods.combination}), and the method's role. + +A direct method body is a block of C code, and the Sod translator usually +defines, for each direct method, a function with external linkage, whose body +contains a copy of the direct method body. Within the body of a direct +method defined for a class $C$, the variable @|me|, of type pointer to class +type of $C$, refers to the receiving object. 
+ + +\subsection{Effective methods and method combinations} +\label{sec:concepts.methods.combination} + +For each message a direct instance of a class might receive, there is a set +of \emph{applicable methods}, which are exactly the direct methods defined on +the object's class and its superclasses. These direct methods are combined +together to form the \emph{effective method} for that particular class and +message. Direct methods can be combined into an effective method in +different ways, according to the \emph{method combination} specified by the +message. The method combination determines which direct method roles are +acceptable, and, for each role, the appropriate argument lists and return +types. + +One direct method, $M$, is said to be more (resp.\ less) \emph{specific} than +another, $N$, with respect to a receiving class~$C$, if the class defining +$M$ is a more (resp.\ less) specific superclass of~$C$ than the class +defining $N$. + +\subsubsection{The standard method combination} +The default method combination is called the \emph{standard method +combination}; other method combinations are useful occasionally for special +effects. The standard method combination accepts four direct method roles, +called `primary' (the default), @|before|, @|after|, and @|around|. + +All direct methods subject to the standard method combination must have +argument lists which \emph{match} the message's argument list: +\begin{itemize} +\item the method's arguments must have the same types as the message, though + the arguments may have different names; and +\item if the message accepts a variable-length argument suffix then the + direct method must instead have a final argument of type @|va_list|. +\end{itemize} +Primary and @|around| methods must have the same return type as the message; +@|before| and @|after| methods must return @|void| regardless of the +message's return type. 
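The second matching rule above can be illustrated in plain C, independently of Sod. The following is a minimal sketch and not the code that the Sod translator generates: the names @|sum_message| and @|sum_method| are hypothetical. A variadic entry point captures its argument tail in a @|va_list| and passes it to a function whose final argument has type @|va_list|.

```c
#include <stdarg.h>

/* Hypothetical stand-in for a direct method: where the message has a
 * variable-length argument suffix, the method instead takes a final
 * argument of type `va_list'. */
static int sum_method(int n, va_list ap)
{
  int total = 0;

  /* Consume exactly N `int' arguments from the captured tail. */
  while (n-- > 0) total += va_arg(ap, int);
  return (total);
}

/* Hypothetical stand-in for the message entry point: it accepts the
 * variable-length suffix and captures it as a `va_list'. */
int sum_message(int n, ...)
{
  va_list ap;
  int total;

  va_start(ap, n);
  total = sum_method(n, ap);
  va_end(ap);
  return (total);
}
```

The caller sees an ordinary variadic function, while the function doing the work has a fixed argument list.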
+ +If there are no applicable primary methods then no effective method is +constructed: the vtables contain null pointers in place of pointers to method +entry functions. + +The effective method for a message with standard method combination works as +follows. +\begin{enumerate} + +\item If any applicable methods have the @|around| role, then the most + specific such method, with respect to the class of the receiving object, is + invoked. + + Within the body of an @|around| method, the variable @|next_method| is + defined, having pointer-to-function type. The method may call this + function, as described below, any number of times. + + If there are any remaining @|around| methods, then @|next_method| invokes the + next most specific such method, returning whichever value that method + returns; otherwise the behaviour of @|next_method| is to invoke the @|before| + methods (if any), followed by the most specific primary method, followed by + the @|after| methods (if any), and to return whichever value was returned + by the most specific primary method, as described in the following items. + That is, the behaviour of the least specific @|around| method's + @|next_method| function is exactly the behaviour that the effective method + would have if there were no @|around| methods. Note that if the + least-specific @|around| method calls its @|next_method| more than once + then the whole sequence of @|before|, primary, and @|after| methods occurs + multiple times. + + The value returned by the most specific @|around| method is the value + returned by the effective method. + +\item If any applicable methods have the @|before| role, then they are all + invoked, starting with the most specific. + +\item The most specific applicable primary method is invoked. + + Within the body of a primary method, the variable @|next_method| is + defined, having pointer-to-function type. If there are no remaining less + specific primary methods, then @|next_method| is a null pointer. 
+ Otherwise, the method may call the @|next_method| function any number of + times. + + The behaviour of the @|next_method| function, if it is not null, is to + invoke the next most specific applicable primary method, and to return + whichever value that method returns. + + If there are no applicable @|around| methods, then the value returned by + the most specific primary method is the value returned by the effective + method; otherwise the value returned by the most specific primary method is + returned to the least specific @|around| method, which called it via its + own @|next_method| function. + +\item If any applicable methods have the @|after| role, then they are all + invoked, starting with the \emph{least} specific. (Hence, the most + specific @|after| method is invoked with the most `afterness'.) + +\end{enumerate} + +A typical use for @|around| methods is to allow a base class to set up the +dynamic environment appropriately for the primary methods of its subclasses, +e.g., by claiming a lock, and restore it afterwards. + +The @|next_method| function provided to methods with the primary and +@|around| roles accepts the same arguments, and returns the same type, as the +message, except that one or two additional arguments are inserted at the +front of the argument list. The first additional argument is always the +receiving object, @|me|. If the message accepts a variable argument suffix, +then the second additional argument is a @|va_list|; otherwise there is no +second additional argument. In the former case, a variable +@|sod__master_ap| of type @|va_list| is defined, containing a separate copy +of the argument pointer (so the method body can process the variable argument +suffix itself, and still pass a fresh copy on to the next method). 
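The calling order laid down by the standard method combination can be sketched as a small plain-C simulation. This is an illustration only, not the code the Sod translator generates, and every name in it is hypothetical. It assumes that each @|around| method calls its @|next_method| exactly once, and that the most specific primary method overrides rather than extends. Methods are represented only by their names, which are appended to a trace string in the order in which they would be invoked.

```c
#include <stddef.h>
#include <string.h>

#define MAXM 4                          /* arbitrary limit for the sketch */

/* The applicable methods for one message, by role.  Every array is
 * ordered most specific first. */
struct combo {
  const char *around[MAXM];
  const char *before[MAXM];
  const char *primary[MAXM];
  const char *after[MAXM];
  size_t n_around, n_before, n_primary, n_after;
};

/* Append NAME to TRACE, which the caller must make large enough. */
static void note(char *trace, const char *name)
{
  if (*trace) strcat(trace, " ");
  strcat(trace, name);
}

/* What the least specific around method's `next_method' does: before
 * methods (most specific first), the most specific primary method, then
 * after methods (least specific first). */
static void run_inner(const struct combo *c, char *trace)
{
  size_t i;

  for (i = 0; i < c->n_before; i++) note(trace, c->before[i]);
  if (c->n_primary) note(trace, c->primary[0]);
  for (i = c->n_after; i > 0; i--) note(trace, c->after[i - 1]);
}

/* Invoke around method I; its `next_method' is the next around method,
 * or the inner sequence if none remain. */
static void run_around(const struct combo *c, size_t i, char *trace)
{
  if (i < c->n_around) {
    note(trace, c->around[i]);
    run_around(c, i + 1, trace);
  } else
    run_inner(c, trace);
}

/* Simulate sending the message, recording the invocation order. */
void effective_method(const struct combo *c, char *trace)
{
  trace[0] = 0;
  run_around(c, 0, trace);
}
```

With methods of all four roles defined by a class and by one of its superclasses, the trace lists the @|around| and @|before| methods most specific first, then the single most specific primary method, then the @|after| methods least specific first.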
+ +A method with the primary or @|around| role may use the convenience macro +@|CALL_NEXT_METHOD|, which takes no arguments itself, and simply calls +@|next_method| with appropriate arguments: the receiver @|me| pointer, the +argument pointer @|sod__master_ap| (if applicable), and the method's +arguments. If the method body has overwritten its formal arguments, then +@|CALL_NEXT_METHOD| will pass along the updated values, rather than the +original ones. + +A primary or @|around| method which invokes its @|next_method| function is +said to \emph{extend} the message behaviour; a method which does not invoke +its @|next_method| is said to \emph{override} the behaviour. Note that a +method may make a decision to override or extend at runtime. + +\subsubsection{Aggregating method combinations} +A number of other method combinations are provided. They are called +`aggregating' method combinations because, instead of invoking just the most +specific primary method, as the standard method combination does, they invoke +the applicable primary methods in turn and aggregate the return values from +each. + +The aggregating method combinations accept the same four roles as the +standard method combination, and @|around|, @|before|, and @|after| methods +work in the same way. + +The aggregating method combinations provided are as follows. +\begin{description} \let\makelabel\code +\item[progn] The message must return @|void|. The applicable primary methods + are simply invoked in turn, most specific first. +\item[sum] The message must return a numeric type.\footnote{% + The Sod translator does not check this, since it doesn't have enough + insight into @|typedef| names.} % + The applicable primary methods are invoked in turn, and their return values + added up. The final result is the sum of the individual values. +\item[product] The message must return a numeric type. The applicable + primary methods are invoked in turn, and their return values multiplied + together. 
The final result is the product of the individual values. +\item[min] The message must return a scalar type. The applicable primary + methods are invoked in turn. The final result is the smallest of the + individual values. +\item[max] The message must return a scalar type. The applicable primary + methods are invoked in turn. The final result is the largest of the + individual values. +\item[and] The message must return a scalar type. The applicable primary + methods are invoked in turn. If any method returns zero then the final + result is zero and no further methods are invoked. If all of the + applicable primary methods return nonzero, then the final result is the + result of the last primary method. +\item[or] The message must return a scalar type. The applicable primary + methods are invoked in turn. If any method returns nonzero then the final + result is that nonzero value and no further methods are invoked. If all of + the applicable primary methods return zero, then the final result is zero. +\end{description} + +There is also a @|custom| aggregating method combination, which is described +in \xref{sec:fixme.custom-aggregating-method-combination}. + + +\subsection{Messages with keyword arguments} +\label{sec:concepts.methods.keywords} + +A message or a direct method may declare that it accepts keyword arguments. +A message which accepts keyword arguments is called a \emph{keyword message}; +a direct method which accepts keyword arguments is called a \emph{keyword +method}. + +While method combinations may set their own rules, usually keyword methods +can only be defined on keyword messages, and all methods defined on a keyword +message must be keyword methods. The direct methods defined on a keyword +message may differ in the keywords they accept, both from each other, and +from the message. 
If two superclasses of some common class both define +keyword methods on the same message, and the methods both accept a keyword +argument with the same name, then these two keyword arguments must also have +the same type. Different applicable methods may declare keyword arguments +with the same name but different defaults; see below. + +The keyword arguments acceptable in a message sent to an object are the +keywords listed in the message definition, together with all of the keywords +accepted by any applicable method. There is no easy way to determine at +runtime whether a particular keyword is acceptable in a message to a given +instance. + +At runtime, a direct method which accepts one or more keyword arguments +receives an additional argument named @|suppliedp|. This argument is a small +structure. For each keyword argument named $k$ accepted by the direct +method, @|suppliedp| contains a one-bit-wide bitfield member of type +@|unsigned|, also named $k$. If a keyword argument named $k$ was passed in +the message, then @|suppliedp.$k$| is one, and $k$ contains the argument +value; otherwise @|suppliedp.$k$| is zero, and $k$ contains the default value +from the direct method definition if there was one, or an unspecified value +otherwise. + +%%%-------------------------------------------------------------------------- +\section{The object lifecycle} \label{sec:concepts.lifecycle} + +\subsection{Creation} \label{sec:concepts.lifecycle.birth} + +Construction of a new instance of a class involves three steps. +\begin{enumerate} +\item \emph{Allocation} arranges for there to be storage space for the + instance's slots and associated metadata. +\item \emph{Imprinting} fills in the instance's metadata, associating the + instance with its class. +\item \emph{Initialization} stores appropriate initial values in the + instance's slots, and maybe links it into any external data structures as + necessary. 
+\end{enumerate} +The \descref{SOD_DECL}[macro]{mac} handles constructing instances with +automatic storage duration (`on the stack'). Similarly, the +\descref{SOD_MAKE}[macro]{mac} and the \descref{sod_make}{fun} and +\descref{sod_makev}{fun} functions construct instances allocated from the +standard @|malloc| heap. Programmers can add support for other allocation +strategies by using the \descref{SOD_INIT}[macro]{mac} and the +\descref{sod_init}{fun} and \descref{sod_initv}{fun} functions, which package +up imprinting and initialization. + +\subsubsection{Allocation} +Instances of most classes (specifically including those classes defined by +Sod itself) can be held in any storage of sufficient size. The in-memory +layout of an instance of some class~$C$ is described by the type @|struct +$C$__ilayout|, and if the relevant class is known at compile time then the +best way to discover the layout size is with the @|sizeof| operator. Failing +that, the size required to hold an instance of $C$ is available in a slot in +$C$'s class object, as @|$C$__class@->cls.initsz|. + +It is not in general sufficient to declare, or otherwise allocate, an object +of the class type $C$. The class type only describes a single chain of the +object's layout. It is nearly always an error to use the class type as if it +is a \emph{complete type}, e.g., to declare objects or arrays of the class +type, or to enquire about its size or alignment requirements. + +Instance layouts may be declared as objects with automatic storage duration +(colloquially, `allocated on the stack') or allocated dynamically, e.g., +using @|malloc|. They may be included as members of structures or unions, or +elements of arrays. Sod's runtime system doesn't retain addresses of +instances, so, for example, Sod doesn't make using fancy allocators which +sometimes move objects around in memory any more difficult than it needs to +be. 
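As an illustration of a custom allocation strategy, the following plain C sketch carves instance storage out of a fixed arena. It is not part of Sod: the names @|arena_alloc|, @|round_up| and @|ARENA_SIZE| are hypothetical, and the @|initsz| parameter merely stands in for the size read from a class object. The sketch assumes C11, and rounds every request up to the platform's strictest fundamental alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator; an illustration only, not part of Sod. */
#define ARENA_SIZE 4096

/* The arena itself is aligned as strictly as any fundamental type needs. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t round_up(size_t n)
{
  size_t a = _Alignof(max_align_t);
  return (n + a - 1) / a * a;
}

/* Return storage for an instance needing `initsz' bytes, or NULL if the
 * arena is exhausted.  `initsz' stands in for the size taken from a class
 * object; the caller is still responsible for imprinting and initializing
 * the storage.
 */
void *arena_alloc(size_t initsz)
{
  size_t need = round_up(initsz);
  void *p;

  /* Reject requests that overflowed while rounding, or that don't fit. */
  if (need < initsz || need > ARENA_SIZE - arena_used) return NULL;
  p = arena + arena_used;
  arena_used += need;
  return p;
}
```

Because every block starts on a maximally aligned boundary, the caller does not need to know the alignment of the class whose instance will live there; the cost is some wasted padding between small instances.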
+ +There isn't any way to discover the alignment required for a particular +class's instances at runtime; it's best to be conservative and assume that +the platform's strictest alignment requirement applies. + +The following simple function correctly allocates and returns space for an +instance of a class given a pointer to its class object @<cls>. +\begin{prog} + void *allocate_instance(const SodClass *cls) \\ \ind + \{ return malloc(cls@->cls.initsz); \} +\end{prog} + +\subsubsection{Imprinting} +Once storage has been allocated, it must be \emph{imprinted} before it can be +used as an instance of a class, e.g., before any messages can be sent to it. + +Imprinting an instance stores some metadata about its direct class in the +instance structure, so that the rest of the program (and Sod's runtime +library) can tell what sort of object it is, and how to use it.\footnote{% + Specifically, imprinting an instance's storage involves storing the + appropriate vtable pointers in the right places in it.} % +A class object's @|imprint| slot points to a function which will correctly +imprint storage for one of that class's instances. + +Once an instance's storage has been imprinted, it is technically possible to +send messages to the instance; however, because the instance's slots are still +uninitialized at this point, the applicable methods are unlikely to do much +of any use unless they've been written specifically for the purpose. + +The following simple function imprints storage at address @<p> as an instance +of a class, given a pointer to its class object @<cls>. +\begin{prog} + void imprint_instance(const SodClass *cls, void *p) \\ \ind + \{ cls@->cls.imprint(p); \} +\end{prog} + +\subsubsection{Initialization} +The final step for constructing a new instance is to \emph{initialize} it, to +establish the necessary invariants for the instance itself and the +environment in which it operates. + +Details of initialization are necessarily class-specific, but typically it +involves setting the instance's slots to appropriate values, and possibly +linking it into some larger data structure to keep track of it. + +Initialization is performed by sending the imprinted instance an @|init| +message, defined by the @|SodObject| class. This message uses a nonstandard +method combination which works like the standard combination, except that the +\emph{default behaviour}, if there is no overriding method, is to initialize +the instance's slots, as described below, and to invoke each superclass's +initialization fragments. This default behaviour may be invoked multiple +times if some method calls on its @|next_method| more than once, unless some +other method takes steps to prevent this. + +Slots are initialized in a well-defined order. +\begin{itemize} +\item Slots defined by a more specific superclass are initialized after + slots defined by a less specific superclass. +\item Slots defined by the same class are initialized in the order in which + their definitions appear. +\end{itemize} + +A class can define \emph{initialization fragments}: pieces of literal code to +be executed to set up a new instance. Each superclass's initialization +fragments are executed with @|me| bound to an instance pointer of the +appropriate superclass type, immediately after that superclass's slots (if +any) have been initialized; therefore, fragments defined by a more specific +superclass are executed after fragments defined by a less specific +superclass.
A class may define more than one initialization fragment: the +fragments are executed in the order in which they appear in the class +definition. It is possible for an initialization fragment to use @|return| +or @|goto| for special control-flow effects, but this is not likely to be a +good idea. + +The @|init| message accepts keyword arguments +(\xref{sec:concepts.methods.keywords}). The set of acceptable keywords is +determined by the applicable methods as usual, but also by the +\emph{initargs} defined by the receiving instance's class and its +superclasses, which are made available to slot initializers and +initialization fragments. + +There are two kinds of initarg definitions. \emph{User initargs} are defined +by an explicit @|initarg| item appearing in a class definition: the item +defines a name, type, and (optionally) a default value for the initarg. +\emph{Slot initargs} are defined by attaching an @|initarg| property to a +slot or slot initializer item: the property's value determines the initarg's +name, while the type is taken from the underlying slot type; slot initargs do +not have default values. Both kinds define a \emph{direct initarg} for the +containing class. + +Initargs are inherited. The \emph{applicable} direct initargs for an @|init| +effective method are those defined by the receiving object's class, and all +of its superclasses. Applicable direct initargs with the same name are +merged to form \emph{effective initargs}. An error is reported if two +applicable direct initargs have the same name but different types. The +default value of an effective initarg is taken from the most specific +applicable direct initarg which specifies a default value; if no applicable +direct initarg specifies a default value then the effective initarg has no +default.
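The merging rule for initargs can be modelled in a few lines of plain C. This is only a sketch of the rule as stated above, not the translator's implementation: @|struct direct_initarg| and @|effective_default| are hypothetical names, types and defaults are represented as strings purely for illustration, and the array of applicable direct initargs is assumed to be ordered with the most specific class first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a direct initarg; not Sod's own data structure. */
struct direct_initarg {
  const char *name;   /* initarg name */
  const char *type;   /* type, spelled as a string for this sketch */
  const char *dflt;   /* default value expression, or NULL if none */
};

/* Merge the applicable direct initargs called `name'.  The array `args'
 * must be ordered most specific class first.  On success, return 0 and
 * store the effective default (or NULL if there is none) in `*dflt_out'.
 * Return -1 if two direct initargs with this name disagree about the type.
 */
int effective_default(const struct direct_initarg *args, size_t n,
                      const char *name, const char **dflt_out)
{
  const char *type = NULL;
  size_t i;

  *dflt_out = NULL;
  for (i = 0; i < n; i++) {
    if (strcmp(args[i].name, name) != 0) continue;
    if (!type) type = args[i].type;
    else if (strcmp(type, args[i].type) != 0) return -1;
    /* The first default seen comes from the most specific class. */
    if (!*dflt_out && args[i].dflt) *dflt_out = args[i].dflt;
  }
  return 0;
}
```

Note that a more specific direct initarg without a default does not hide a default supplied by a less specific one: the search simply continues until some applicable direct initarg provides a default.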
+ +All initarg values are made available at runtime to user code -- +initialization fragments and slot initializer expressions -- through local +variables and a @|suppliedp| structure, as in a direct method +(\xref{sec:concepts.methods.keywords}). Furthermore, slot initarg +definitions influence the initialization of slots. + +The process for deciding how to initialize a particular slot works as +follows. +\begin{enumerate} +\item If there are any slot initargs defined on the slot, or any of its slot + initializers, \emph{and} the sender supplied a value for one or more of the + corresponding effective initargs, then the value of the most specific slot + initarg is stored in the slot. +\item Otherwise, if there are any slot initializers defined which include an + initializer expression, then the initializer expression from the most + specific such slot initializer is evaluated and its value stored in the + slot. +\item Otherwise, the slot is left uninitialized. +\end{enumerate} +Note that the default values (if any) of effective initargs do \emph{not} +affect this procedure. + + +\subsection{Destruction} +\label{sec:concepts.lifecycle.death} + +Destruction of an instance, when it is no longer required, consists of two +steps. +\begin{enumerate} +\item \emph{Teardown} releases any resources held by the instance and + disentangles it from any external data structures. +\item \emph{Deallocation} releases the memory used to store the instance so + that it can be reused. +\end{enumerate} +Teardown alone, for objects which require special deallocation, or for which +deallocation occurs automatically (e.g., instances with automatic storage +duration, or instances whose storage will be garbage-collected), is performed +using the \descref{sod_teardown}[function]{fun}. Destruction of instances +allocated from the standard @|malloc| heap is done using the +\descref{sod_destroy}[function]{fun}. 
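The two steps of destruction can be modelled in plain C as follows. This is a sketch under stated assumptions rather than Sod's implementation: @|struct toy|, @|toy_make|, @|toy_teardown| and @|toy_destroy| are hypothetical stand-ins, and the convention that a nonzero teardown result means `keep the storage' follows the teardown protocol described in the next subsection.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an instance; not a Sod instance layout. */
struct toy {
  unsigned nref;   /* number of outstanding references */
  int *buf;        /* a resource owned by the instance */
};

/* Allocate and initialize a toy instance with one reference. */
struct toy *toy_make(void)
{
  struct toy *t = malloc(sizeof(*t));

  if (!t) return NULL;
  t->nref = 1;
  t->buf = malloc(16 * sizeof(int));
  if (!t->buf) { free(t); return NULL; }
  return t;
}

/* Teardown: drop one reference.  Return nonzero if the instance is still
 * referenced, so its storage must be kept; return zero once its resources
 * have been released and its storage may be deallocated.
 */
int toy_teardown(struct toy *t)
{
  if (--t->nref) return 1;
  free(t->buf); t->buf = NULL;
  return 0;
}

/* Destruction: teardown, followed by deallocation only if teardown
 * permits it.  Returns the teardown result.
 */
int toy_destroy(struct toy *t)
{
  int rc = toy_teardown(t);

  if (!rc) free(t);
  return rc;
}
```

Keeping teardown separate from deallocation is what lets the same teardown logic serve instances with automatic storage duration, where only @|toy_teardown| would be called.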
+ +\subsubsection{Teardown} +Details of teardown are necessarily class-specific, but typically it +involves releasing any resources held by the instance, and disentangling it +from any external data structures it has been linked into. + +Teardown is performed by sending the instance the @|teardown| message, +defined by the @|SodObject| class. The message returns an integer, used as a +boolean flag. If the message returns zero, then the instance's storage +should be deallocated. If the message returns nonzero, then it is safe for +the caller to forget about the instance, but it should not deallocate the +instance's storage. +This is \emph{not} an error return: if some teardown method fails then the +program may be in an inconsistent state and should not continue. + +This simple protocol can be used, for example, to implement a reference +counting system, as follows. +\begin{prog} + [nick = ref] \\ + class ReferenceCountedObject \{ \\ \ind + unsigned nref = 1; \\- + void inc() \{ me@->ref.nref++; \} \\- + [role = around] \\ + int obj.teardown() \\ + \{ \\ \ind + if (--\,--me@->ref.nref) return (1); \\ + else return (CALL_NEXT_METHOD); \-\\ + \} \-\\ + \} +\end{prog} + +The @|teardown| message uses a nonstandard method combination which works +like the +standard combination, except that the \emph{default behaviour}, if there is +no overriding method, is to execute each superclass's teardown fragments, and +to return zero. This default behaviour may be invoked multiple times if some +method calls on its @|next_method| more than once, unless some other method +takes steps to prevent this. + +A class can define \emph{teardown fragments}: pieces of literal code to be +executed to shut down an instance. Each superclass's teardown fragments are +executed with @|me| bound to an instance pointer of the appropriate +superclass type; fragments defined by a more specific superclass are executed +before fragments defined by a less specific superclass.
A class may define +more than one teardown fragment: the fragments are executed in the order in +which they appear in the class definition. It is possible for a +teardown fragment to use @|return| or @|goto| for special control-flow +effects, but this is not likely to be a good idea. Similarly, it's probably +a better idea to use an @|around| method to influence the return value than +to write an explicit @|return| statement in a teardown fragment. + +\subsubsection{Deallocation} +The details of instance deallocation are obviously specific to the allocation +strategy used by the instance, and this is often orthogonal to the object's +class. + +The code which makes the decision to destroy an object may often not be aware +of the object's direct class. Low-level details of deallocation often +require the proper base address of the instance's storage, which can be +determined using the \descref{SOD_INSTBASE}[macro]{mac}. + +%%%-------------------------------------------------------------------------- +\section{Metaclasses} \label{sec:concepts.metaclasses} %%%----- That's all, folks --------------------------------------------------