.\" -*-nroff-*-
.de VS
.sp 1
.RS
.nf
.ft B
..
.de VE
.ft R
.fi
.RE
.sp 1
..
.TH sym 3 "8 May 1999" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
sym \- symbol table manager
.\" @sym_create
.\" @sym_destroy
.\" @sym_find
.\" @sym_remove
.\" @sym_mkiter
.\" @sym_next
.\"
.\" @SYM_NAME
.\" @SYM_LEN
.\" @SYM_HASH
.\"
.SH SYNOPSIS
.nf
.B "#include <mLib/sym.h>"

.BI "void sym_create(sym_table *" t );
.BI "void sym_destroy(sym_table *" t );

.BI "void *sym_find(sym_table *" t ,
.BI "               const char *" n ", long " l ,
.BI "               size_t " sz ", unsigned *" f );
.BI "void sym_remove(sym_table *" t ", void *" b );

.BI "const char *SYM_NAME(const void *" p );
.BI "size_t SYM_LEN(const void *" p );
.BI "uint32 SYM_HASH(const void *" p );

.BI "void sym_mkiter(sym_iter *" i ", sym_table *" t );
.BI "void *sym_next(sym_iter *" i );
.fi
.SH "DESCRIPTION"
The
.B sym
functions implement a data structure often described as a dictionary, a
finite map, an associative array, or a symbol table.  It associates
.I values
with
.I keys
such that the value corresponding to a given key can be found quickly.
Additionally, all stored associations can be enumerated.
.PP
The interface provides an
.I intrusive
symbol table.  The data objects stored in the table must include a small
header used by the symbol table manager.  This reduces the amount of
pointer fiddling that needs to be done, and in practice doesn't seem to
be much of a problem.  It's also fairly easy to construct a
non-intrusive interface if you really want one.
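.PP
As a rough sketch of how such a non-intrusive layer might look, the
table can store a small wrapper block holding nothing but the header and
a pointer to the caller's own object.  The names
.B ptr_ent
and
.B ptr_lookup
are made up for this illustration and are not part of mLib.
.VS
#include <mLib/sym.h>

/* Hypothetical wrapper: the header, then a pointer to the real data */
typedef struct ptr_ent {
  sym_base _base;       /* Symbol header; must come first */
  void *p;              /* Caller's object, owned by the caller */
} ptr_ent;

/* Return the pointer slot for key k, creating an empty one if needed */
void **ptr_lookup(sym_table *t, const char *k)
{
  unsigned f;
  ptr_ent *e = sym_find(t, k, -1, sizeof(ptr_ent), &f);
  if (!f) e->p = 0;     /* Newly created: nothing attached yet */
  return (&e->p);
}
.VE
The caller can then store any pointer it likes through the returned
slot, at the cost of one extra indirection per lookup.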
.PP
There are three main data structures involved in the interface:
.TP
.B sym_table
Keeps track of the information associated with a particular table.
.TP
.B sym_base
The header which must be attached to the front of all the value
objects.
.TP
.B sym_iter
An iterator object, used for enumerating all of the associations stored
in a symbol table.
.PP
All of the above data structures should be considered
.IR opaque :
don't try looking inside.  Representations have changed in the past, and
they may change again in the future.
.SS "Creation and destruction"
The
.B sym_table
object itself needs to be allocated by the caller.  It is initialized by
passing it to the function
.BR sym_create .
After initialization, the table contains no entries.
.PP
Initializing a symbol table involves allocating some memory.  If this
allocation fails, an
.B EXC_NOMEM
exception is raised.
.PP
When a symbol table is no longer needed, the memory occupied by the
values and other maintenance structures can be reclaimed by calling
.BR sym_destroy .
Any user data attached to the values, such as separately allocated
memory they point to, must be released beforehand:
.B sym_destroy
knows nothing about it.
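.PP
A minimal sketch of a table's whole lifetime might look like this.  The
.B ent
structure and its
.B text
member are hypothetical, standing in for whatever user data your values
own.
.VS
#include <stdlib.h>
#include <mLib/sym.h>

typedef struct ent {
  sym_base _base;       /* Symbol header */
  char *text;           /* Hypothetical malloc'd data, or null */
} ent;

void table_lifetime(void)
{
  sym_table t;          /* Storage is provided by the caller */
  sym_iter i;
  ent *e;

  sym_create(&t);       /* May raise EXC_NOMEM */
  /* ... populate and use the table ... */

  /* Release the user data first; the table can't do this for us */
  for (sym_mkiter(&i, &t); (e = sym_next(&i)) != 0; )
    free(e->text);
  sym_destroy(&t);      /* Now reclaim the values and the table */
}
.VE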
.SS "Adding, searching and removing"
Most of the actual work is done by the function
.BR sym_find .
It does both lookup and creation, depending on its arguments.  To do its
job, it needs to know the following bits of information:
.TP
.BI "sym_table *" t
A pointer to a symbol table to manipulate.
.TP
.BI "const char *" n
The address of the
.I key
to look up or create.  Usually this will be a simple text string,
although it can actually be any arbitrary binary data.
.TP
.BI "long " l
The length of the key.  If this is \-1,
.B sym_find
assumes that the key is a null-terminated string, and calculates its
length itself.  This is entirely equivalent to passing
.BI strlen( n )\fR.
.TP
.BI "size_t " sz
The size of the value block to allocate if the key could not be found.
If this is zero, no value is allocated, and a null pointer is returned
to indicate an unsuccessful lookup.
.TP
.BI "unsigned *" f
The address of a `found' flag to set.  This is an output parameter.  On
exit,
.B sym_find
will set the value of
.BI * f
to zero if the key could not be found, or nonzero if it was found.  This
can be used to tell whether the value returned has been newly allocated,
or whether it was already in the table.  A null pointer may be passed
when the flag isn't wanted, as in a pure lookup.
.PP
A terminating null byte is appended to the copy of the symbol's name in
memory.  This is not considered to be a part of the symbol's name, and
does not contribute to the name's length as reported by the
.B SYM_LEN
macro.
.PP
A symbol can be removed from the table by calling
.BR sym_remove ,
passing the symbol table itself, and the value block that needs
removing.
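.PP
The following sketch puts these pieces together for keys which are
arbitrary binary data, so the length is passed explicitly rather than
as \-1.  The
.B count
structure and the functions
.B bump
and
.B forget
are invented for the example.
.VS
#include <mLib/sym.h>

typedef struct count {
  sym_base _base;       /* Symbol header */
  unsigned long n;      /* Number of times the key has been seen */
} count;

/* Count one occurrence of the sz-byte key at p */
void bump(sym_table *t, const void *p, size_t sz)
{
  unsigned f;
  count *c = sym_find(t, p, (long)sz, sizeof(count), &f);
  if (!f) c->n = 0;     /* Newly allocated, so initialize it */
  c->n++;
}

/* Forget a key; return nonzero if it was in the table */
int forget(sym_table *t, const void *p, size_t sz)
{
  count *c = sym_find(t, p, (long)sz, 0, 0);
  if (!c) return (0);   /* A zero size means lookup only */
  sym_remove(t, c);     /* Treat c as invalid from here on */
  return (1);
}
.VE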
.SS "Enquiries about symbols"
Three macros are provided to enable simple enquiries about a symbol.
Given a pointer
.I s
to a symbol table entry,
.BI SYM_LEN( s )
returns the length of the symbol's name (excluding any terminating null
byte);
.BI SYM_NAME( s )
returns a pointer to the symbol's name; and
.BI SYM_HASH( s )
returns the symbol's hash value.
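.PP
As an illustration, all three macros can be combined to compare the
names of two entries, possibly from different tables.  The function
.B same_name
is hypothetical.  It assumes that both entries were hashed with the same
key, and it uses the stored hashes only as a cheap early rejection
before comparing the bytes.
.VS
#include <string.h>
#include <mLib/sym.h>

/* Return nonzero if entries a and b have the same name */
int same_name(const void *a, const void *b)
{
  return (SYM_HASH(a) == SYM_HASH(b) &&
          SYM_LEN(a) == SYM_LEN(b) &&
          memcmp(SYM_NAME(a), SYM_NAME(b), SYM_LEN(a)) == 0);
}
.VE
The comparison uses the length rather than relying on the terminating
null byte, so it also works for binary names.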
.SS "Enumerating symbols"
Enumerating the values in a symbol table is fairly simple.  Allocate a
.B sym_iter
object from somewhere.  Attach it to a symbol table by calling
.BR sym_mkiter ,
and passing in the addresses of the iterator and the symbol table.
Then, each call to
.B sym_next
will return a different value from the symbol table, until all of them
have been enumerated, at which point
.B sym_next
returns a null pointer.
.PP
While an iteration is in progress, it's safe to remove the symbol most
recently returned by
.BR sym_next .
It's not safe to remove any other symbol, so don't do that.
.PP
When you've finished with an iterator, it's safe to just throw it away.
You don't need to call any functions beforehand.
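.PP
The rule about removal during iteration is enough to write a simple
filter.  In this sketch,
.B prune
and its caller-supplied
.B test
function are made-up names.
.VS
#include <mLib/sym.h>

/* Remove every symbol for which the caller's test returns nonzero */
void prune(sym_table *t, int (*test)(const void *))
{
  sym_iter i;
  void *p;

  for (sym_mkiter(&i, t); (p = sym_next(&i)) != 0; ) {
    if (test(p))
      sym_remove(t, p); /* Safe: p is the symbol just returned */
  }
}
.VE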
.SS "Use in practice"
In normal use, the keys are simple strings (usually identifiers from
some language), and the values are nontrivial structures providing
information about types and values.
.PP
In this case, you'd define something like the following structure for
your values:
.VS
typedef struct val {
  sym_base _base;       /* Symbol header */
  unsigned type;        /* Type of this symbol */
  int dispoff;          /* Which display the variable is in */
  size_t frameoff;      /* Offset of variable in frame */
} val;
.VE
Given a pointer
.I v
to a
.BR val ,
you can find the variable's name by calling
.BI SYM_NAME( v )\fR.
.PP
You can look up a name in the table by saying something like:
.VS
val *v = sym_find(t, name, -1, 0, 0);
if (!v)
  error("unknown variable `%s'", name);
.VE
You can add in a new variable by saying something like:
.VS
unsigned f;
val *v = sym_find(t, name, -1, sizeof(val), &f);
if (f)
  error("variable `%s' already exists", name);
/* fill in v */
.VE
You can examine all the variables in your symbol table by saying
something like:
.VS
sym_iter i;
val *v;

for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
  /* ... */
}
.VE
That ought to be enough examples to be getting on with.
.SS Implementation
The symbol table is an extensible hash table, using the universal hash
function described in
.BR unihash (3)
and the global hashing key.  The hash chains are kept very short
(probably too short, actually).  Every time a symbol is found, its block
is promoted to the front of its bin chain so it gets found faster next
time.
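.PP
The promotion step can be sketched in isolation as follows.  This is
not mLib's actual code: the
.B node
structure and
.B chain_find
are invented for the illustration, and the real implementation's
representation is opaque and may differ.
.VS
#include <stddef.h>
#include <string.h>

typedef struct node {
  struct node *next;    /* Next entry in this bin's chain */
  const char *name;     /* Key, null-terminated for simplicity */
} node;

/* Find name in the chain at *bin, promoting it to the front if found */
node *chain_find(node **bin, const char *name)
{
  node **pp, *p;

  for (pp = bin; (p = *pp) != 0; pp = &p->next) {
    if (strcmp(p->name, name) == 0) {
      *pp = p->next;    /* Unlink from its current position */
      p->next = *bin;   /* Relink at the head of the chain */
      *bin = p;
      return (p);
    }
  }
  return (0);           /* Not in this chain */
}
.VE
Frequently requested names therefore drift towards the heads of their
chains, while rarely used ones sink towards the tails.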
.SH SEE ALSO
.BR hash (3),
.BR unihash (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>