chiark - git - mdw - mLib/blob - man/sym.3

   1 .\" -*-nroff-*-
   2 .de VS
   3 .sp 1
   4 .RS 5
   5 .nf
   6 .ft B
   7 ..
   8 .de VE
   9 .ft R
  10 .fi
  11 .RE
  12 .sp 1
  13 ..
  14 .TH sym 3mLib "8 May 1999" mLib
  15 .SH NAME
  16 sym \- symbol table manager
  17 .SH SYNOPSIS
  18 .nf
  19 .B "#include <mLib/sym.h>"
  20
  21 .BI "void sym_create(sym_table *" t );
  22 .BI "void sym_destroy(sym_table *" t );
  23
  24 .BI "char *SYM_NAME(void *" p );
  25 .BI "void *sym_find(sym_table *" t ,
  26 .BI "               const char *" n ", long " l ,
  27 .BI "               size_t " sz ", unsigned *" f );
  28 .BI "void sym_remove(sym_table *" t ", void *" b );
  29
  30 .BI "void sym_mkiter(sym_iter *" i ", sym_table *" t );
  31 .BI "void *sym_next(sym_iter *" i );
  32 .fi
  33 .SH "HOW IT WORKS"
  34 The
  35 .B sym
  36 functions implement a data structure often described as a dictionary, a
  37 finite map, an associative array, or a symbol table.  It associates
  38 .I values
  39 with
  40 .I keys
  41 such that the value corresponding to a given key can be found quickly.
  42 Additionally, all stored associations can be enumerated.
  43 .PP
  44 The interface provides an
  45 .I intrusive
  46 symbol table.  The data objects stored in the table must include a small
  47 header used by the symbol table manager.  This reduces the amount of
  48 pointer fiddling that needs to be done, and in practice doesn't seem to
  49 be much of a problem.  It's also fairly easy to construct a
  50 non-intrusive interface if you really want one.
  51 .PP
  52 There are three main data structures involved in the interface:
  53 .TP
  54 .B sym_table
  55 Keeps track of the information associated with a particular table.
  56 .TP
  57 .B sym_base
  58 The header which must be attached to the front of all the value
  59 objects.
  60 .TP
  61 .B sym_iter
  62 An iterator object, used for enumerating all of the associations stored
  63 in a symbol table.
  64 .PP
  65 All of the above data structures should be considered
  66 .IR opaque :
  67 don't try looking inside.  Representations have changed in the past, and
  68 they may change again in the future.
  69 .SH "CREATION AND DESTRUCTION"
  70 The
  71 .B sym_table
  72 object itself needs to be allocated by the caller.  It is initialized by
  73 passing it to the function
  74 .BR sym_create .
  75 After initialization, the table contains no entries.
  76 .PP
  77 Initializing a symbol table involves allocating some memory.  If this
  78 allocation failes, an
  79 .B EXC_NOMEM
  80 exception is raised.
  81 .PP
  82 When a symbol table is no longer needed, the memory occupied by the
  83 values and other maintenance structures can be reclaimed by calling
  84 .BR sym_destroy .
  85 Any bits of user data attached to the symbol table values should have
  86 previously been destroyed.
  87 .SH "ADDING, SEARCHING AND REMOVING"
  88 Most of the actual work is done by the function
  89 .BR sym_find .
  90 It does both lookup and creation, depending on its arguments.  To do its
  91 job, it needs to know the following bits of information:
  92 .TP
  93 .I t
  94 A pointer to a symbol table to manipulate.
  95 .TP
  96 .I n
  97 The address of the
  98 .I key
  99 to look up or create.  Usually this will be a simple text string,
 100 although it can actually be any arbitrary binary data.
 101 .TP
 102 .I l
 103 The length of the key.  If this is \-1,
 104 .B sym_find
 105 assumes that the key is a null-terminated string, and calculates its
 106 length itself.
 107 .TP
 108 .I sz
 109 The size of the value block to allocate if the key could not be found.
 110 If this is zero, no value is allocated, and a null pointer is returned
 111 to indicate an unsuccessful lookup.
 112 .TP
 113 .I f
 114 The address of a `found' flag to set.  This is an output parameter.  On
 115 exit,
 116 .B sym_find
 117 will set the value of
 118 .BI * f
 119 to zero if the key could not be found, or nonzero if it was found.  This
 120 can be used to tell whether the value returned has been newly allocated,
 121 or whether it was already in the table.
 122 .PP
 123 The macro
 124 .B SYM_NAME
 125 will return the key associated with a given value block.  You must use
 126 this macro rather than fiddling inside the
 127 .B sym_base
 128 structure.
 129 .PP
 130 A symbol can be removed from the table by calling
 131 .BR sym_remove ,
 132 passing the symbol table itself, and the value block that needs
 133 removing.
 134 .SH ENUMERATION
 135 Enumerating the values in a symbol table is fairly simple.  Allocate a
 136 .B sym_iter
 137 object from somewhere.  Attach it to a symbol table by calling
 138 .BR sym_mkiter ,
 139 and passing in the addresses of the iterator and the symbol table.
 140 Then, each call to
 141 .B sym_next
 142 will return a different value from the symbol table, until all of them
 143 have been enumerated, at which point,
 144 .B sym_next
 145 returns a null pointer.
 146 .PP
 147 It's safe to remove the symbol you've just been returned by
 148 .BR sym_next .
 149 However, it's not safe to remove any other symbol.  So don't do that.
 150 .PP
 151 When you've finished with an iterator, it's safe to just throw it away.
 152 You don't need to call any functions beforehand.
 153 .SH "USE IN PRACTICE"
 154 In normal use, the keys are simple strings (usually identifiers from
 155 some language), and the values are nontrivial structures providing
 156 information about types and values.
 157 .PP
 158 In this case, you'd define something like the following structure for
 159 your values:
 160 .VS
 161 typedef struct val {
 162   sym_base _base;       /* Symbol header */
 163   unsigned type;        /* Type of this symbol */
 164   int dispoff;          /* Which display variable is in */
 165   size_t frameoff;      /* Offset of variable in frame */
 166 } val;
 167 .VE
 168 Given a pointer
 169 .I v
 170 to a
 171 .BR val ,
 172 you can find the variable's name by calling
 173 .BI SYM_NAME( v )\fR.
 174 .PP
 175 You can look up a name in the table by saying something like:
 176 .VS
 177 val *v = sym_find(t, name, -1, 0, 0);
 178 if (!v)
 179   error("unknown variable `%s'", name);
 180 .VE
 181 You can add in a new variable by saying something like
 182 .VS
 183 unsigned f;
 184 val *v = sym_find(t, name, -1, sizeof(val), &f);
 185 if (f)
 186   error("variable `%s' already exists", name);
 187 /* fill in v */
 188 .VE
 189 You can examine all the variables in your symbol table by saying
 190 something like:
 191 .VS
 192 sym_iter i;
 193 val *v;
 194
 195 for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
 196   /* ... */
 197 }
 198 .VE
 199 That ought to be enough examples to be getting on with.
 200 .SH CAVEATS
 201 The symbol table manager requires the suballocator (see
 202 .BR sub (3mLib)
 203 for details).  You must ensure that
 204 .B sub_init
 205 has been called before using any symbol tables in your program.
 206 .SH IMPLEMENTATION
 207 The symbol table is an extensible hashtable, using a 32-bit CRC as the
 208 hash function.  The hash chains are kept very short (probably too short,
 209 actually).  Every time a symbol is found, its block is promoted to the
 210 front of its bin chain so it gets found faster next time.
 211 .SH SEE ALSO
 212 .BR sub (3mLib).
 213 .SH AUTHOR
 214 Mark Wooding, <mdw@nsict.org>