Heap Object Layout (SBCL Internals)

Objects stored in the heap are of two kinds: those with headers, and cons cells. If the first word of an object has a header widetag then the object has the type and layout associated with that widetag. Otherwise, the object is assumed to be a cons cell.

Some objects have “unboxed” words without any associated type information as well as the more usual “boxed” words with lowtags. Obvious cases include the specialized array types, some of the numeric types, system-area-pointers, and so on.

The primitive object layouts are specified in src/compiler/generic/objdef.lisp.

6.2.1 Header Values

As a widetag is only eight bits wide but a heap object header takes a full machine word, an extra 24 bits (on 32-bit platforms) or 56 bits (on 64-bit platforms) are available for unboxed data storage in each heap object. This space is called the “header value”, and is used for various purposes depending on the type of heap object in question.
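The split between widetag and header value can be sketched as follows. This is a minimal Python illustration rather than SBCL code: the helper names and the sample widetag value are hypothetical, and the widetag is assumed to occupy the low eight bits of the header word.

```python
WIDETAG_BITS = 8                        # widetag width, as stated above
WIDETAG_MASK = (1 << WIDETAG_BITS) - 1  # 0xFF

def make_header(widetag, header_value, word_bits=64):
    """Pack a widetag and a header value into one header word (hypothetical helper)."""
    if not 0 <= widetag <= WIDETAG_MASK:
        raise ValueError("widetag must fit in eight bits")
    if not 0 <= header_value < (1 << (word_bits - WIDETAG_BITS)):
        raise ValueError("header value does not fit in the remaining bits")
    return (header_value << WIDETAG_BITS) | widetag

def split_header(word, word_bits=64):
    """Split a header word into (widetag, header_value).

    Assumes the widetag is the low eight bits and the header value is
    everything above it: 24 bits for word_bits=32, 56 bits for word_bits=64.
    """
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in a machine word")
    return word & WIDETAG_MASK, word >> WIDETAG_BITS

# 0x2D is an arbitrary stand-in, not a real SBCL widetag.
header = make_header(0x2D, 5)
widetag, value = split_header(header)
```

On a 32-bit word the largest header value this sketch accepts is 2^24 - 1, matching the 24 spare bits described above.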

6.2.2 Symbols

In contrast to the simple model of symbols provided in the Common Lisp standard, symbol objects in SBCL do not have a function cell. Instead, the mapping from symbols to functions is done via the compiler globaldb.

There are two additional slots associated with symbols. One is a hash value for the symbol (based on the symbol name), which avoids having to recompute the hash from the name every time it is required.

The other additional slot, on threaded systems only, is the TLS index, which is either no-tls-value-marker-widetag or an unboxed byte offset within the TLS area to the TLS slot associated with the symbol. Because the unboxed offset is aligned to a word boundary, it appears as a fixnum when viewed as boxed data. It is not, in general, safe to increment this value as a fixnum, however, because n-fixnum-tag-bits is not guaranteed to equal word-shift⁸.
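The hazard can be illustrated with a little arithmetic. The following Python sketch is not SBCL code: the function names are hypothetical, a 64-bit port with a word-shift of 3 is assumed, and fixnums are assumed to be represented with all-zero tag bits in the low n-fixnum-tag-bits of the word.

```python
WORD_SHIFT = 3              # assumed: 8-byte words on a 64-bit port
WORD_BYTES = 1 << WORD_SHIFT

def looks_like_fixnum(raw_word, n_fixnum_tag_bits):
    """True if the low tag bits are all zero (assumed fixnum representation)."""
    return raw_word & ((1 << n_fixnum_tag_bits) - 1) == 0

def fixnum_increment(raw_word, n_fixnum_tag_bits):
    """Add 1 'as a fixnum': the raw word grows by 1 << n_fixnum_tag_bits."""
    return raw_word + (1 << n_fixnum_tag_bits)

tls_offset = 5 * WORD_BYTES  # byte offset of the sixth TLS slot: 40

# A word-aligned byte offset has zero low bits, so it passes as a fixnum
# whether n-fixnum-tag-bits is 3 or 1.
both_ok = looks_like_fixnum(tls_offset, 3) and looks_like_fixnum(tls_offset, 1)

# With n-fixnum-tag-bits equal to word-shift, a fixnum increment moves
# exactly one word forward, to the next TLS slot.
next_slot = fixnum_increment(tls_offset, 3)   # 48

# With n-fixnum-tag-bits equal to 1, the same increment adds only 2 bytes,
# leaving an offset that is no longer word-aligned.
misaligned = fixnum_increment(tls_offset, 1)  # 42
```

Under these assumptions, stepping to the next slot is only correct when the raw word is advanced by the word size in bytes, independent of the fixnum tagging scheme.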

6.2.3 The NIL-cons Hack

As an “optimization”, the symbol nil has list-pointer-lowtag rather than other-pointer-lowtag, and is aligned in memory so that the value and hash slots are the car and cdr of the cons, with both slots containing nil. This allows for car and cdr to simply do a lowtag test and slot access instead of having to explicitly test for nil, at the cost of requiring all symbol type tests and slot accesses to test for nil.
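A toy model may make the trade-off concrete. The Python sketch below simulates memory with a dictionary and is not SBCL code: the lowtag values, the symbol widetag, the addresses, and all helper names are assumptions chosen for illustration, and slots hold plain integers as stand-ins for Lisp objects.

```python
WORD = 8                     # assumed 64-bit words
LOWTAG_MASK = 0xF            # assumed four lowtag bits
LIST_POINTER_LOWTAG = 0x7    # assumed value
OTHER_POINTER_LOWTAG = 0xF   # assumed value
SYMBOL_WIDETAG = 0x2D        # arbitrary stand-in for the symbol widetag

memory = {}                  # address -> word

# NIL is tagged as a list pointer to its own value slot, so its value and
# hash slots sit where a cons would keep its car and cdr.
NIL_VALUE_SLOT = 0x1010
NIL = NIL_VALUE_SLOT | LIST_POINTER_LOWTAG
memory[NIL_VALUE_SLOT - WORD] = SYMBOL_WIDETAG   # symbol header word
memory[NIL_VALUE_SLOT] = NIL                     # value slot, doubles as car
memory[NIL_VALUE_SLOT + WORD] = NIL              # hash slot, doubles as cdr

def car(ptr):
    """Lowtag test plus slot access; no explicit test for NIL."""
    if ptr & LOWTAG_MASK != LIST_POINTER_LOWTAG:
        raise TypeError("not a list")
    return memory[ptr - LIST_POINTER_LOWTAG]

def cdr(ptr):
    if ptr & LOWTAG_MASK != LIST_POINTER_LOWTAG:
        raise TypeError("not a list")
    return memory[ptr - LIST_POINTER_LOWTAG + WORD]

def symbolp(ptr):
    """The cost of the hack: symbol tests must special-case NIL."""
    if ptr == NIL:
        return True
    if ptr & LOWTAG_MASK != OTHER_POINTER_LOWTAG:
        return False
    header = memory[ptr - OTHER_POINTER_LOWTAG]
    return header & 0xFF == SYMBOL_WIDETAG

# An ordinary cons (42 . NIL) and an ordinary symbol, for comparison.
memory[0x2000], memory[0x2008] = 42, NIL
cell = 0x2000 | LIST_POINTER_LOWTAG
memory[0x3000] = SYMBOL_WIDETAG
foo = 0x3000 | OTHER_POINTER_LOWTAG
```

In this model car and cdr treat NIL exactly like any other list pointer, while symbolp needs the extra comparison against NIL before it can look at lowtags and headers.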

6.2.4 Functions and Code Components

All compiled code resides in code-component objects. These objects consist of a header, some number of boxed literal values, a “data block” containing machine code and simple-fun headers, and a “trace table” which is currently unused⁹.

The simple-fun headers represent simple function objects (not funcallable-instances or closures), and each code-component will typically have one for the main entry point and one per closure entry point (as the function underlying the closure, not the closure object proper). In a compiler trace-file, the simple-fun headers are all listed as entries in the IR2 component.

The simple-fun headers are held in a linked list per code-component in order to allow the garbage collector to find them during relocation. In order to be able to find the start of a code-component from a simple-fun, the header value is the offset in words from the start of the code-component to the start of the simple-fun.
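The address arithmetic implied by this offset can be sketched as follows. This is a minimal Python illustration, not SBCL code: the function name, the sample widetag, and the addresses are hypothetical, addresses are treated as untagged, and the header value is assumed to occupy the bits above an eight-bit widetag.

```python
WIDETAG_BITS = 8

def code_component_start(simple_fun_addr, simple_fun_header, word_bytes=8):
    """Recover the start of the enclosing code-component from a simple-fun.

    The header value of the simple-fun is taken to be its offset, in words,
    from the start of the code-component, as described above.
    """
    offset_words = simple_fun_header >> WIDETAG_BITS
    return simple_fun_addr - offset_words * word_bytes

# A simple-fun located six words into a code-component that starts at 0x1000.
# 0x31 is an arbitrary stand-in for the simple-fun widetag.
header = (6 << WIDETAG_BITS) | 0x31
start = code_component_start(0x1000 + 6 * 8, header)  # 0x1000
```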

Footnotes

(8)

This is not as unlikely as it might seem at first. While historically n-fixnum-tag-bits has always been the same as word-shift, there is a branch where it is permitted to vary at build time from word-shift to as low as 1 on 64-bit ports, and a proposed scheme to allow the same on 32-bit ports.

(9)

Trace tables were originally used to support garbage collection using gengc in CMUCL. As there is still vestigial support for carrying them around at the end of code-components, they may end up being used for something else in the future.
