.\" -*-nroff-*- .de VS .sp 1 .RS .nf .ft B .. .de VE .ft R .fi .RE .sp 1 .. .TH sym 3 "8 May 1999" "Straylight/Edgeware" "mLib utilities library" .SH NAME sym \- symbol table manager .\" @sym_create .\" @sym_destroy .\" @sym_find .\" @sym_remove .\" @sym_mkiter .\" @sym_next .\" .\" @SYM_NAME .\" @SYM_LEN .\" @SYM_HASH .\" .SH SYNOPSIS .nf .B "#include " .BI "void sym_create(sym_table *" t ); .BI "void sym_destroy(sym_table *" t ); .BI "void *sym_find(sym_table *" t , .BI " const char *" n ", long " l , .BI " size_t " sz ", unsigned *" f ); .BI "void sym_remove(sym_table *" t ", void *" b ); .BI "const char *SYM_NAME(const void *" p ); .BI "size_t SYM_LEN(const void *" p ); .BI "uint32 SYM_HASH(const void *" p ); .BI "void sym_mkiter(sym_iter *" i ", sym_table *" t ); .BI "void *sym_next(sym_iter *" i ); .fi .SH "DESCRIPTION" The .B sym functions implement a data structure often described as a dictionary, a finite map, an associative array, or a symbol table. It associates .I values with .I keys such that the value corresponding to a given key can be found quickly. Additionally, all stored associations can be enumerated. .PP The interface provides an .I intrusive symbol table. The data objects stored in the table must include a small header used by the symbol table manager. This reduces the amount of pointer fiddling that needs to be done, and in practice doesn't seem to be much of a problem. It's also fairly easy to construct a non-intrusive interface if you really want one. .PP There are three main data structures involved in the interface: .TP .B sym_table Keeps track of the information associated with a particular table. .TP .B sym_base The header which must be attached to the front of all the value objects. .TP .B sym_iter An iterator object, used for enumerating all of the associations stored in a symbol table. .PP All of the above data structures should be considered .IR opaque : don't try looking inside. Representations have changed in the past, and they may change again in the future. .SS "Creation and destruction" The .B sym_table object itself needs to be allocated by the caller. It is initialized by passing it to the function .BR sym_create . After initialization, the table contains no entries. .PP Initializing a symbol table involves allocating some memory. If this allocation fails, an .B EXC_NOMEM exception is raised. .PP When a symbol table is no longer needed, the memory occupied by the values and other maintenance structures can be reclaimed by calling .BR sym_destroy . Any bits of user data attached to values should previously have been destroyed. .SS "Adding, searching and removing" Most of the actual work is done by the function .BR sym_find . It does both lookup and creation, depending on its arguments. To do its job, it needs to know the following bits of information: .TP .BI "sym_table *" t A pointer to a symbol table to manipulate. .TP .BI "const char *" n The address of the .I key to look up or create. Usually this will be a simple text string, although it can actually be any arbitrary binary data. .TP .BI "long " l The length of the key. If this is \-1, .B sym_find assumes that the key is a null-terminated string, and calculates its length itself. This is entirely equivalent to passing .BI strlen( n )\fR. .TP .BI "size_t " sz The size of the value block to allocate if the key could not be found. If this is zero, no value is allocated, and a null pointer is returned to indicate an unsuccessful lookup. .TP .BI "unsigned *" f The address of a `found' flag to set. This is an output parameter. On exit, .B sym_find will set the value of .BI * f to zero if the key could not be found, or nonzero if it was found. This can be used to tell whether the value returned has been newly allocated, or whether it was already in the table. .PP A terminating null byte is appended to the copy of the symbol's name in memory. This is not considered to be a part of the symbol's name, and does not contribute to the name's length as reported by the .B SYM_LEN macro. .PP A symbol can be removed from the table by calling .BR sym_remove , passing the symbol table itself, and the value block that needs removing. .SS "Enquiries about symbols" Three macros are provided to enable simple enquiries about a symbol. Given a pointer .I s to a symbol table entry, .BI SYM_LEN( s ) returns the length of the symbol's name (excluding any terminating null byte); .BI SYM_NAME( s ) returns a pointer to the symbol's name; and .BI SYM_HASH( s ) returns the symbol's hash value. .SS "Enumerating symbols" Enumerating the values in a symbol table is fairly simple. Allocate a .B sym_iter object from somewhere. Attach it to a symbol table by calling .BR sym_mkiter , and passing in the addresses of the iterator and the symbol table. Then, each call to .B sym_next will return a different value from the symbol table, until all of them have been enumerated, at which point, .B sym_next returns a null pointer. .PP It's safe to remove the symbol you've just been returned by .BR sym_next . However, it's not safe to remove any other symbol. So don't do that. .PP When you've finished with an iterator, it's safe to just throw it away. You don't need to call any functions beforehand. .SS "Use in practice" In normal use, the keys are simple strings (usually identifiers from some language), and the values are nontrivial structures providing information about types and values. .PP In this case, you'd define something like the following structure for your values: .VS typedef struct val { sym_base _base; /* Symbol header */ unsigned type; /* Type of this symbol */ int dispoff; /* Which display variable is in */ size_t frameoff; /* Offset of variable in frame */ } val; .VE Given a pointer .I v to a .BR val , you can find the variable's name by calling .BI SYM_NAME( v )\fR. .PP You can look up a name in the table by saying something like: .VS val *v = sym_find(t, name, -1, 0, 0); if (!v) error("unknown variable `%s'", name); .VE You can add in a new variable by saying something like .VS unsigned f; val *v = sym_find(t, name, -1, sizeof(val), &f); if (f) error("variable `%s' already exists", name); /* fill in v */ .VE You can examine all the variables in your symbol table by saying something like: .VS sym_iter i; val *v; for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) { /* ... */ } .VE That ought to be enough examples to be getting on with. .SS Implementation The symbol table is an extensible hashtable, using the universal hash function described in .BR unihash (3) and the global hashing key. The hash chains are kept very short (probably too short, actually). Every time a symbol is found, its block is promoted to the front of its bin chain so it gets found faster next time. .SH SEE ALSO .BR hash (3), .BR mLib (3). .SH AUTHOR Mark Wooding,