X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~mdw/git/sod/blobdiff_plain/d24d47f5dbe064439e35033cadd9d6c463ee1c87..13cb243aea114b87d4382b8aaf1d357c9495dc0e:/doc/concepts.tex diff --git a/doc/concepts.tex b/doc/concepts.tex index 74e5902..0f4b336 100644 --- a/doc/concepts.tex +++ b/doc/concepts.tex @@ -25,37 +25,6 @@ \chapter{Concepts} \label{ch:concepts} -%%%-------------------------------------------------------------------------- -\section{Operational model} \label{sec:concepts.model} - -The Sod translator runs as a preprocessor, similar in nature to the -traditional Unix \man{lex}{1} and \man{yacc}{1} tools. The translator reads -a \emph{module} file containing class definitions and other information, and -writes C~source and header files. The source files contain function -definitions and static tables which are fed directly to a C~compiler; the -header files contain declarations for functions and data structures, and are -included by source files -- whether hand-written or generated by Sod -- which -makes use of the classes defined in the module. - -Sod is not like \Cplusplus: it makes no attempt to `enhance' the C language -itself. Sod module files describe classes, messages, methods, slots, and -other kinds of object-system things, and some of these descriptions need to -contain C code fragments, but this code is entirely uninterpreted by the Sod -translator.\footnote{% - As long as a code fragment broadly follows C's lexical rules, and properly - matches parentheses, brackets, and braces, the Sod translator will copy it - into its output unchanged. It might, in fact, be some other kind of C-like - language, such as Objective~C or \Cplusplus. Or maybe even - Objective~\Cplusplus, because if having an object system is good, then - having three must be really awesome.} % - -The Sod translator is not a closed system. It is written in Common Lisp, and -can load extension modules which add new input syntax, output formats, or -altered behaviour. 
The interface for writing such extensions is described in -\xref{p:lisp}. Extensions can change almost all details of the Sod object -system, so the material in this manual must be read with this in mind: this -manual describes the base system as provided in the distribution. - %%%-------------------------------------------------------------------------- \section{Modules} \label{sec:concepts.modules} @@ -245,13 +214,13 @@ qualified by the defining class's nickname. \subsubsection{Slot initializers} As well as defining slot names and types, a class can also associate an \emph{initial value} with each slot defined by itself or one of its -subclasses. A class $C$ provides an \emph{initialization function} (see +superclasses. A class $C$ provides an \emph{initialization message} (see \xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass}) -which sets the slots of a \emph{direct} instance of the class to the correct -initial values. If several of $C$'s superclasses define initializers for the -same slot then the initializer from the most specific such class is used. If -none of $C$'s superclasses define an initializer for some slot then that slot -will be left uninitialized. +whose methods set the slots of a \emph{direct} instance of the class to the +correct initial values. If several of $C$'s superclasses define initializers +for the same slot then the initializer from the most specific such class is +used. If none of $C$'s superclasses define an initializer for some slot then +that slot will be left uninitialized. The initializer for a slot with scalar type may be any C expression. The initializer for a slot with aggregate type must contain only constant @@ -259,6 +228,17 @@ expressions if the generated code is expected to be processed by an implementation of C89. Initializers will be evaluated once each time an instance is initialized. 
+Slots are initialized in reverse-precedence order of their defining classes; +i.e., slots defined by a less specific superclass are initialized earlier +than slots defined by a more specific superclass. Slots defined by the same +class are initialized in the order in which they appear in the class +definition. + +The initializer for a slot may refer to other slots in the same object, via +the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me| +has type `pointer to $C$'. (Note that the type of @|me| depends only on the +class which defined the slot, not the class which defined the initializer.) + \subsection{C language integration} \label{sec:concepts.classes.c} @@ -290,7 +270,10 @@ slots. If you want to hide implementation details, the best approach is to stash them in a dynamically allocated private structure, and leave a pointer to it in a slot. (This will also help preserve binary compatibility, because the private structure can grow more members as needed. See -\xref{sec:fixme.compatibility} for more details. +\xref{sec:fixme.compatibility} for more details.) + +\subsubsection{Vtables} + \subsubsection{Class objects} In Sod's object system, classes are objects too. Therefore classes are @@ -308,8 +291,8 @@ doesn't define any messages, so it doesn't have any methods. In Sod, a class slot containing a function pointer is not at all the same thing as a method.) \subsubsection{Conversions} -Suppose one has a value of type pointer to class type of some class~$C$, and -wants to convert it to a pointer to class type of some other class~$B$. +Suppose one has a value of type pointer-to-class-type for some class~$C$, and +wants to convert it to a pointer-to-class-type for some other class~$B$. There are three main cases to distinguish. \begin{itemize} \item If $B$ is a superclass of~$C$, in the same chain, then the conversion @@ -325,13 +308,13 @@ There are three main cases to distinguish. pointer. 
The conversion can be performed using the appropriate generated upcast macro (see below); the general case is handled by the macro \descref{SOD_XCHAIN}{mac}. -\item If $B$ is a subclass of~$C$ then the conversion is an \emph{upcast}; +\item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast}; otherwise the conversion is a~\emph{cross-cast}. In either case, the conversion can fail: the object in question might not be an instance of~$B$ - at all. The macro \descref{SOD_CONVERT}{mac} and the function + after all. The macro \descref{SOD_CONVERT}{mac} and the function \descref{sod_convert}{fun} perform general conversions. They return a null pointer if the conversion fails. (They are therefore your analogue to the - \Cplusplus @|dynamic_cast<>| operator.) + \Cplusplus\ @|dynamic_cast<>| operator.) \end{itemize} The Sod translator generates macros for performing both in-chain and cross-chain upcasts. For each class~$C$, and each proper superclass~$B$ @@ -388,7 +371,8 @@ Keyword arguments can be provided in three ways. Keyword arguments are provided as a general feature for C functions. However, Sod has special support for messages which accept keyword arguments -(\xref{sec:concepts.methods.keywords}). +(\xref{sec:concepts.methods.keywords}); and they play an essential role in +the instance construction protocol (\xref{sec:concepts.lifecycle.birth}). %%%-------------------------------------------------------------------------- \section{Messages and methods} \label{sec:concepts.methods} @@ -477,8 +461,96 @@ If there are no applicable primary methods then no effective method is constructed: the vtables contain null pointers in place of pointers to method entry functions. 
+\begin{figure} + \begin{tikzpicture} + [>=stealth, thick, + order/.append style={color=green!70!black}, + code/.append style={font=\sffamily}, + action/.append style={font=\itshape}, + method/.append style={rectangle, draw=black, thin, fill=blue!30, + text height=\ht\strutbox, text depth=\dp\strutbox, + minimum width=40mm}] + + \def\delgstack#1#2#3{ + \node (#10) [method, #2] {#3}; + \node (#11) [method, above=6mm of #10] {#3}; + \draw [->] ($(#10.north)!.5!(#10.north west) + (0mm, 1mm)$) -- + ++(0mm, 4mm) + node [code, left=4pt, midway] {next_method}; + \draw [<-] ($(#10.north)!.5!(#10.north east) + (0mm, 1mm)$) -- + ++(0mm, 4mm) + node [action, right=4pt, midway] {return}; + \draw [->] ($(#11.north)!.5!(#11.north west) + (0mm, 1mm)$) -- + ++(0mm, 4mm) + node [code, left=4pt, midway] {next_method} + node (ld) [above] {$\smash\vdots\mathstrut$}; + \draw [<-] ($(#11.north)!.5!(#11.north east) + (0mm, 1mm)$) -- + ++(0mm, 4mm) + node [action, right=4pt, midway] {return} + node (rd) [above] {$\smash\vdots\mathstrut$}; + \draw [->] ($(ld.north) + (0mm, 1mm)$) -- ++(0mm, 4mm) + node [code, left=4pt, midway] {next_method}; + \draw [<-] ($(rd.north) + (0mm, 1mm)$) -- ++(0mm, 4mm) + node [action, right=4pt, midway] {return}; + \node (p) at ($(ld.north)!.5!(rd.north)$) {}; + \node (#1n) [method, above=5mm of p] {#3}; + \draw [->, order] ($(#10.south east) + (4mm, 1mm)$) -- + ($(#1n.north east) + (4mm, -1mm)$) + node [midway, right, align=left] + {Most to \\ least \\ specific};} + + \delgstack{a}{}{Around method} + \draw [<-] ($(a0.south)!.5!(a0.south west) - (0mm, 1mm)$) -- + ++(0mm, -4mm); + \draw [->] ($(a0.south)!.5!(a0.south east) - (0mm, 1mm)$) -- + ++(0mm, -4mm) + node [action, right=4pt, midway] {return}; + + \draw [->] ($(an.north)!.6!(an.north west) + (0mm, 1mm)$) -- + ++(-8mm, 8mm) + node [code, midway, left=3mm] {next_method} + node (b0) [method, above left = 1mm + 4mm and -6mm - 4mm] {}; + \node (b1) [method] at ($(b0) - (2mm, 2mm)$) {}; + \node (bn) [method] at 
($(b1) - (2mm, 2mm)$) {Before method}; + \draw [->, order] ($(bn.west) - (6mm, 0mm)$) -- ++(12mm, 12mm) + node [midway, above left, align=center] {Most to \\ least \\ specific}; + \draw [->] ($(b0.north east) + (-10mm, 1mm)$) -- ++(8mm, 8mm) + node (p) {}; + + \delgstack{m}{above right=1mm and 0mm of an.west |- p}{Primary method} + \draw [->] ($(mn.north)!.5!(mn.north west) + (0mm, 1mm)$) -- ++(0mm, 4mm) + node [code, left=4pt, midway] {next_method} + node [above right = 0mm and -8mm] + {$\vcenter{\hbox{\Huge\textcolor{red}{!}}} + \vcenter{\hbox{\begin{tabular}[c]{l} + \textsf{next_method} \\ + pointer is null + \end{tabular}}}$}; + + \draw [->, color=blue, dotted] + ($(m0.south)!.2!(m0.south east) - (0mm, 1mm)$) -- + ($(an.north)!.2!(an.north east) + (0mm, 1mm)$) + node [midway, sloped, below] {Return value}; + + \draw [<-] ($(an.north)!.6!(an.north east) + (0mm, 1mm)$) -- + ++(8mm, 8mm) + node [action, midway, right=3mm] {return} + node (f0) [method, above right = 1mm and -6mm] {}; + \node (f1) [method] at ($(f0) + (-2mm, 2mm)$) {}; + \node (fn) [method] at ($(f1) + (-2mm, 2mm)$) {After method}; + \draw [<-, order] ($(f0.east) + (6mm, 0mm)$) -- ++(-12mm, 12mm) + node [midway, above right, align=center] + {Least to \\ most \\ specific}; + \draw [<-] ($(fn.north west) + (6mm, 1mm)$) -- ++(-8mm, 8mm); + + \end{tikzpicture} + + \caption{The standard method combination} + \label{fig:concepts.methods.stdmeth} +\end{figure} + The effective method for a message with standard method combination works as -follows. +follows (see also~\xref{fig:concepts.methods.stdmeth}). \begin{enumerate} \item If any applicable methods have the @|around| role, then the most @@ -604,6 +676,36 @@ There is also a @|custom| aggregating method combination, which is described in \xref{sec:fixme.custom-aggregating-method-combination}. 
+\subsection{Sending messages in C} \label{sec:concepts.methods.c} + +Each instance is associated with its direct class [FIXME] + +The effective methods for each class are determined at translation time, by +the Sod translator. For each effective method, one or more \emph{method +entry functions} are constructed. A method entry function has three +responsibilities. +\begin{itemize} +\item It converts the receiver pointer to the correct type. Method entry + functions can perform these conversions extremely efficiently: there are + separate method entries for each chain of each class which can receive a + message, so method entry functions are in the privileged situation of + knowing the \emph{exact} class of the receiving object. +\item If the message accepts a variable-length argument tail, then two method + entry functions are created for each chain of each class: one receives a + variable-length argument tail, as intended, and captures it in a @|va_list| + object; the other accepts an argument of type @|va_list| in place of the + variable-length tail and arranges for it to be passed along to the direct + methods. +\item It invokes the effective method with the appropriate arguments. There + might or might not be an actual function corresponding to the effective + method itself: the translator may instead open-code the effective method's + behaviour into each method entry function; and the machinery for handling + `delegation chains', such as is used for @|around| methods and primary + methods in the standard method combination, is necessarily scattered among + a number of small functions. +\end{itemize} + + \subsection{Messages with keyword arguments} \label{sec:concepts.methods.keywords} @@ -654,8 +756,13 @@ Construction of a new instance of a class involves three steps. necessary. \end{enumerate} The \descref{SOD_DECL}[macro]{mac} handles constructing instances with -automatic storage duration (`on the stack'). 
Currently, there is no built-in -support for constructing dynamically-allocated instances. +automatic storage duration (`on the stack'). Similarly, the +\descref{SOD_MAKE}[macro]{mac} and the \descref{sod_make}{fun} and +\descref{sod_makev}{fun} functions construct instances allocated from the +standard @|malloc| heap. Programmers can add support for other allocation +strategies by using the \descref{SOD_INIT}[macro]{mac} and the +\descref{sod_init}{fun} and \descref{sod_initv}{fun} functions, which package +up imprinting and initialization. \subsubsection{Allocation} Instances of most classes (specifically including those classes defined by @@ -687,7 +794,7 @@ the platform's strictest alignment requirement applies. The following simple function correctly allocates and returns space for an instance of a class given a pointer to its class object @<cls>. \begin{prog} - void *allocate_instance(const SodClass *cls) \\ \ind + void *allocate_instance(const SodClass *cls) \\ \ind \{ return malloc(cls@->cls.initsz); \} \end{prog} @@ -711,7 +818,7 @@ of any use unless they've been written specifically for the purpose. The following simple function imprints storage at address @<p> as an instance of a class, given a pointer to its class object @<cls>. \begin{prog} - void imprint_instance(const SodClass *cls, void *p) \\ \ind + void imprint_instance(const SodClass *cls, void *p) \\ \ind \{ cls@->cls.imprint(p); \} \end{prog} @@ -722,33 +829,88 @@ environment in which it operates. Details of initialization are necessarily class-specific, but typically it involves setting the instance's slots to appropriate values, and possibly -linking it into some larger data structure to keep track of it. - -Classes can declare initial values for their slots. A class object's @|init| -slot points to a function which will establish the appropriate initial values -for a new instance's slots. Slots are not initialized in any particularly -useful order. The @|init| function also imprints the instance storage. - -The provided initialization protocol is extremely simplistic; most notably, -it's not possible to pass parameters into the initialization process. -Classes which have more complex requirements will need to define and -implement their own additional (or alternative) protocols. - -\subsubsection{Example} -The following is a simple function, with syntactic-sugar macro, which -allocate storage for an instance of a class, imprints and initializes it, and -returns a pointer to the new instance. -\begin{prog} - void *make_instance(const SodClass *c) \\ - \{ \\ \ind - void *p = malloc(c@->cls.initsz); \\ - if (!p) return (0); \\ - c@->cls.init(p); \\ - return (p); \- \\ - \} - \\+ - \#define MAKE(cls) (cls *)make_instance(cls\#\#__class) -\end{prog} +linking it into some larger data structure to keep track of it. It is +possible for initialization methods to attempt to allocate resources, but +this must be done carefully: there is currently no way to report an error +from object initialization, so the object must be marked as incompletely +initialized, and left in a state where it will be safe to tear down later. 
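The advice about marking an object as incompletely initialized can be illustrated with a small plain-C sketch. This is not Sod-generated code and does not use the Sod runtime: the structure @|struct resource|, its @|ready| flag, and the functions @|resource_init| and @|resource_teardown| are all hypothetical names invented for this example. Treating a zero-size request as a failure is likewise an assumption, made only so that the failure path is easy to exercise.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instance state for illustration; not part of Sod. */
struct resource {
  char *buf;    /* owned buffer, or null if nothing was allocated */
  int ready;    /* nonzero only if initialization completed */
};

/* Initialization has no way to report an error, so failure is recorded
 * in the object itself.  The safe state is established first, so that
 * teardown is valid no matter where initialization stops. */
void resource_init(struct resource *r, size_t sz)
{
  assert(r != NULL);
  r->buf = NULL;
  r->ready = 0;
  if (sz == 0) return;          /* assumed failure case for the sketch */
  r->buf = malloc(sz);
  if (r->buf) r->ready = 1;     /* mark complete only on success */
}

/* Teardown works whether or not initialization completed. */
void resource_teardown(struct resource *r)
{
  assert(r != NULL);
  free(r->buf);                 /* free(NULL) does nothing */
  r->buf = NULL;
  r->ready = 0;
}
```

Code using such an object would test @|ready| before relying on @|buf|; teardown itself never needs to ask.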
+ +Initialization is performed by sending the imprinted instance an @|init| +message, defined by the @|SodObject| class. This message uses a nonstandard +method combination which works like the standard combination, except that the +\emph{default behaviour}, if there is no overriding method, is to initialize +the instance's slots, as described below, and to invoke each superclass's +initialization fragments. This default behaviour may be invoked multiple +times if some method calls on its @|next_method| more than once, unless some +other method takes steps to prevent this. + +Slots are initialized in a well-defined order. +\begin{itemize} +\item Slots defined by a more specific superclass are initialized after + slots defined by a less specific superclass. +\item Slots defined by the same class are initialized in the order in which + their definitions appear. +\end{itemize} + +A class can define \emph{initialization fragments}: pieces of literal code to +be executed to set up a new instance. Each superclass's initialization +fragments are executed with @|me| bound to an instance pointer of the +appropriate superclass type, immediately after that superclass's slots (if +any) have been initialized; therefore, fragments defined by a more specific +superclass are executed after fragments defined by a less specific +superclass. A class may define more than one initialization fragment: the +fragments are executed in the order in which they appear in the class +definition. It is possible for an initialization fragment to use @|return| +or @|goto| for special control-flow effects, but this is not likely to be a +good idea. + +The @|init| message accepts keyword arguments +(\xref{sec:concepts.methods.keywords}). The set of acceptable keywords is +determined by the applicable methods as usual, but also by the +\emph{initargs} defined by the receiving instance's class and its +superclasses, which are made available to slot initializers and +initialization fragments. 
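The ordering rules for slots and initialization fragments given above can be modelled in a few lines of plain C. The sketch below is only a toy model, not what the Sod translator generates: @|struct model_class|, @|model_init|, and the log format are hypothetical, and the class precedence list is simply assumed to be supplied most specific first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a class; not Sod's real class-object layout. */
struct model_class {
  const char *name;
  int nfrags;                   /* number of initialization fragments */
};

/* Append s to a fixed-size log, truncating rather than overflowing. */
static void log_append(char *log, size_t sz, const char *s)
{
  size_t n = strlen(log);
  if (n + 1 < sz) strncat(log, s, sz - n - 1);
}

/* Walk the precedence list (most specific first) in reverse, so that
 * less specific classes are handled first.  For each class, record its
 * slot initialization and then its fragments in definition order. */
void model_init(const struct model_class *cpl, size_t n,
                char *log, size_t sz)
{
  size_t i;
  int j;
  char tmp[64];

  assert(log != NULL && sz > 0);
  log[0] = '\0';
  for (i = n; i-- > 0; ) {
    snprintf(tmp, sizeof(tmp), "%s.slots ", cpl[i].name);
    log_append(log, sz, tmp);
    for (j = 0; j < cpl[i].nfrags; j++) {
      snprintf(tmp, sizeof(tmp), "%s.frag%d ", cpl[i].name, j);
      log_append(log, sz, tmp);
    }
  }
}
```

In this model each class's fragments run immediately after that class's slots have been set, which is why a fragment may rely on slots defined by its own class and by less specific classes, but not on slots defined by more specific ones.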
+ +There are two kinds of initarg definitions. \emph{User initargs} are defined +by an explicit @|initarg| item appearing in a class definition: the item +defines a name, type, and (optionally) a default value for the initarg. +\emph{Slot initargs} are defined by attaching an @|initarg| property to a +slot or slot initializer item: the property's value determines the initarg's name, +while the type is taken from the underlying slot type; slot initargs do not +have default values. Both kinds define a \emph{direct initarg} for the +containing class. + +Initargs are inherited. The \emph{applicable} direct initargs for an @|init| +effective method are those defined by the receiving object's class, and all +of its superclasses. Applicable direct initargs with the same name are +merged to form \emph{effective initargs}. An error is reported if two +applicable direct initargs have the same name but different types. The +default value of an effective initarg is taken from the most specific +applicable direct initarg which specifies a default value; if no applicable +direct initarg specifies a default value then the effective initarg has no +default. + +All initarg values are made available at runtime to user code -- +initialization fragments and slot initializer expressions -- through local +variables and a @|suppliedp| structure, as in a direct method +(\xref{sec:concepts.methods.keywords}). Furthermore, slot initarg +definitions influence the initialization of slots. + +The process for deciding how to initialize a particular slot works as +follows. +\begin{enumerate} +\item If there are any slot initargs defined on the slot, or any of its slot + initializers, \emph{and} the sender supplied a value for one or more of the + corresponding effective initargs, then the value of the most specific slot + initarg is stored in the slot. 
+\item Otherwise, if there are any slot initializers defined which include an + initializer expression, then the initializer expression from the most + specific such slot initializer is evaluated and its value stored in the + slot. +\item Otherwise, the slot is left uninitialized. +\end{enumerate} +Note that the default values (if any) of effective initargs do \emph{not} +affect this procedure. \subsection{Destruction} @@ -762,32 +924,61 @@ steps. \item \emph{Deallocation} releases the memory used to store the instance so that it can be reused. \end{enumerate} +Teardown alone, for objects which require special deallocation, or for which +deallocation occurs automatically (e.g., instances with automatic storage +duration, or instances whose storage will be garbage-collected), is performed +using the \descref{sod_teardown}[function]{fun}. Destruction of instances +allocated from the standard @|malloc| heap is done using the +\descref{sod_destroy}[function]{fun}. \subsubsection{Teardown} -Details of teardown are class-specific, but typically it involves releasing -resources held by the instance, and possibly unlinking it from some larger -data structure which used to keep track of it. - -There is no provided protocol for teardown: classes whose instances require -teardown behaviour must define and implement an appropriate protocol of their -own. The following class may serve for simple cases. +Details of teardown are necessarily class-specific, but typically it +involves releasing resources held by the instance, and disentangling it from +any data structures it might be linked into. + +Teardown is performed by sending the instance the @|teardown| message, +defined by the @|SodObject| class. The message returns an integer, used as a +boolean flag. If the message returns zero, then the instance's storage +should be deallocated. If the message returns nonzero, then it is safe for +the caller to forget about the instance, but it should not deallocate its storage. 
+This is \emph{not} an error return: if some teardown method fails then the +program may be in an inconsistent state and should not continue. + +This simple protocol can be used, for example, to implement a reference +counting system, as follows. \begin{prog} - [nick = disposable] \\ - class DisposableObject : SodObject \{ \\- \ind - void release() \{ ; \} \\ - \quad /* Release resources held by the receiver. */ \- \\- - \} - \\+ - code c : user \{ \\- \ind - /* If p is a a DisposableObject then release its resources. */ \\ - void maybe_dispose(void *p) \\ - \{ \\ \ind - DisposableObject *d = SOD_CONVERT(DisposableObject, p); \\ - if (d) DisposableObject_release(d); \- \\ - \} \- \\ + [nick = ref] \\ + class ReferenceCountedObject: SodObject \{ \\ \ind + unsigned nref = 1; \\- + void inc() \{ me@->ref.nref++; \} \\- + [role = around] \\ + int obj.teardown() \\ + \{ \\ \ind + if (--\,--me@->ref.nref) return (1); \\ + else return (CALL_NEXT_METHOD); \-\\ + \} \-\\ \} \end{prog} +This message uses a nonstandard method combination which works like the +standard combination, except that the \emph{default behaviour}, if there is +no overriding method, is to execute the superclass's teardown fragments, and +to return zero. This default behaviour may be invoked multiple times if some +method calls on its @|next_method| more than once, unless some other method +takes steps to prevent this. + +A class can define \emph{teardown fragments}: pieces of literal code to be +executed to shut down an instance. Each superclass's teardown fragments are +executed with @|me| bound to an instance pointer of the appropriate +superclass type; fragments defined by a more specific superclass are executed +before fragments defined by a less specific superclass. A class may define +more than one teardown fragment: the fragments are executed in the order in +which they appear in the class definition. 
It is possible for a +teardown fragment to use @|return| or @|goto| for special control-flow +effects, but this is not likely to be a good idea. Similarly, it's probably +a better idea to use an @|around| method to influence the return value than +to write an explicit @|return| statement in a teardown fragment. + \subsubsection{Deallocation} The details of instance deallocation are obviously specific to the allocation strategy used by the instance, and this is often orthogonal to the object's @@ -798,22 +989,25 @@ of the object's direct class. Low-level details of deallocation often require the proper base address of the instance's storage, which can be determined using the \descref{SOD_INSTBASE}[macro]{mac}. -\subsubsection{Example} -The following is a counterpart to the @|new_instance| function -(\xref{sec:concepts.lifecycle.birth}), which tears down and deallocates an -instance allocated using @|malloc|. -\begin{prog} - void free_instance(void *p) \\ - \{ \\ \ind - SodObject *obj = p; \\ - maybe_dispose(p); \\ - free(SOD_INSTBASE(obj)); \- \\ - \} -\end{prog} - %%%-------------------------------------------------------------------------- \section{Metaclasses} \label{sec:concepts.metaclasses} +%%%-------------------------------------------------------------------------- +\section{Compatibility considerations} \label{sec:concepts.compatibility} + +Sod doesn't make source-level compatibility especially difficult. As long as +classes, slots, and messages don't change names or disappear, and slots and +messages retain their approximate types, everything will be fine. + +Binary compatibility is much more difficult. Unfortunately, Sod classes have +rather fragile binary interfaces.\footnote{% + Research suggestion: investigate alternative instance and vtable layouts + which improve binary compatibility, probably at the expense of instance + compactness, and efficiency of slot access and message sending. 
There may + be interesting trade-offs to be made.} % + +If instances are allocated [FIXME] + %%%----- That's all, folks -------------------------------------------------- %%% Local variables: