chiark - git - mdw - mLib/blob - man/sym.3

   1 .\" -*-nroff-*-
   2 .de VS
   3 .sp 1
   4 .RS
   5 .nf
   6 .ft B
   7 ..
   8 .de VE
   9 .ft R
  10 .fi
  11 .RE
  12 .sp 1
  13 ..
  14 .TH sym 3 "8 May 1999" mLib
  15 .SH NAME
  16 sym \- symbol table manager
  17 .\" @sym_create
  18 .\" @sym_destroy
  19 .\" @sym_find
  20 .\" @sym_remove
  21 .\" @sym_mkiter
  22 .\" @sym_next
  23 .\"
  24 .\" @SYM_NAME
  25 .\"
  26 .SH SYNOPSIS
  27 .nf
  28 .B "#include <mLib/sym.h>"
  29
  30 .BI "void sym_create(sym_table *" t );
  31 .BI "void sym_destroy(sym_table *" t );
  32
  33 .BI "char *SYM_NAME(void *" p );
  34 .BI "void *sym_find(sym_table *" t ,
  35 .BI "               const char *" n ", long " l ,
  36 .BI "               size_t " sz ", unsigned *" f );
  37 .BI "void sym_remove(sym_table *" t ", void *" b );
  38
  39 .BI "void sym_mkiter(sym_iter *" i ", sym_table *" t );
  40 .BI "void *sym_next(sym_iter *" i );
  41 .fi
  42 .SH "HOW IT WORKS"
  43 The
  44 .B sym
  45 functions implement a data structure often described as a dictionary, a
  46 finite map, an associative array, or a symbol table.  It associates
  47 .I values
  48 with
  49 .I keys
  50 such that the value corresponding to a given key can be found quickly.
  51 Additionally, all stored associations can be enumerated.
  52 .PP
  53 The interface provides an
  54 .I intrusive
  55 symbol table.  The data objects stored in the table must include a small
  56 header used by the symbol table manager.  This reduces the amount of
  57 pointer fiddling that needs to be done, and in practice doesn't seem to
  58 be much of a problem.  It's also fairly easy to construct a
  59 non-intrusive interface if you really want one.
  60 .PP
  61 There are three main data structures involved in the interface:
  62 .TP
  63 .B sym_table
  64 Keeps track of the information associated with a particular table.
  65 .TP
  66 .B sym_base
  67 The header which must be attached to the front of all the value
  68 objects.
  69 .TP
  70 .B sym_iter
  71 An iterator object, used for enumerating all of the associations stored
  72 in a symbol table.
  73 .PP
  74 All of the above data structures should be considered
  75 .IR opaque :
  76 don't try looking inside.  Representations have changed in the past, and
  77 they may change again in the future.
  78 .SH "CREATION AND DESTRUCTION"
  79 The
  80 .B sym_table
  81 object itself needs to be allocated by the caller.  It is initialized by
  82 passing it to the function
  83 .BR sym_create .
  84 After initialization, the table contains no entries.
  85 .PP
  86 Initializing a symbol table involves allocating some memory.  If this
  87 allocation failes, an
  88 .B EXC_NOMEM
  89 exception is raised.
  90 .PP
  91 When a symbol table is no longer needed, the memory occupied by the
  92 values and other maintenance structures can be reclaimed by calling
  93 .BR sym_destroy .
  94 Any bits of user data attached to the symbol table values should have
  95 previously been destroyed.
  96 .SH "ADDING, SEARCHING AND REMOVING"
  97 Most of the actual work is done by the function
  98 .BR sym_find .
  99 It does both lookup and creation, depending on its arguments.  To do its
 100 job, it needs to know the following bits of information:
 101 .TP
 102 .I t
 103 A pointer to a symbol table to manipulate.
 104 .TP
 105 .I n
 106 The address of the
 107 .I key
 108 to look up or create.  Usually this will be a simple text string,
 109 although it can actually be any arbitrary binary data.
 110 .TP
 111 .I l
 112 The length of the key.  If this is \-1,
 113 .B sym_find
 114 assumes that the key is a null-terminated string, and calculates its
 115 length itself.
 116 .TP
 117 .I sz
 118 The size of the value block to allocate if the key could not be found.
 119 If this is zero, no value is allocated, and a null pointer is returned
 120 to indicate an unsuccessful lookup.
 121 .TP
 122 .I f
 123 The address of a `found' flag to set.  This is an output parameter.  On
 124 exit,
 125 .B sym_find
 126 will set the value of
 127 .BI * f
 128 to zero if the key could not be found, or nonzero if it was found.  This
 129 can be used to tell whether the value returned has been newly allocated,
 130 or whether it was already in the table.
 131 .PP
 132 The macro
 133 .B SYM_NAME
 134 will return the key associated with a given value block.  You must use
 135 this macro rather than fiddling inside the
 136 .B sym_base
 137 structure.
 138 .PP
 139 A symbol can be removed from the table by calling
 140 .BR sym_remove ,
 141 passing the symbol table itself, and the value block that needs
 142 removing.
 143 .SH ENUMERATION
 144 Enumerating the values in a symbol table is fairly simple.  Allocate a
 145 .B sym_iter
 146 object from somewhere.  Attach it to a symbol table by calling
 147 .BR sym_mkiter ,
 148 and passing in the addresses of the iterator and the symbol table.
 149 Then, each call to
 150 .B sym_next
 151 will return a different value from the symbol table, until all of them
 152 have been enumerated, at which point,
 153 .B sym_next
 154 returns a null pointer.
 155 .PP
 156 It's safe to remove the symbol you've just been returned by
 157 .BR sym_next .
 158 However, it's not safe to remove any other symbol.  So don't do that.
 159 .PP
 160 When you've finished with an iterator, it's safe to just throw it away.
 161 You don't need to call any functions beforehand.
 162 .SH "USE IN PRACTICE"
 163 In normal use, the keys are simple strings (usually identifiers from
 164 some language), and the values are nontrivial structures providing
 165 information about types and values.
 166 .PP
 167 In this case, you'd define something like the following structure for
 168 your values:
 169 .VS
 170 typedef struct val {
 171   sym_base _base;       /* Symbol header */
 172   unsigned type;        /* Type of this symbol */
 173   int dispoff;          /* Which display variable is in */
 174   size_t frameoff;      /* Offset of variable in frame */
 175 } val;
 176 .VE
 177 Given a pointer
 178 .I v
 179 to a
 180 .BR val ,
 181 you can find the variable's name by calling
 182 .BI SYM_NAME( v )\fR.
 183 .PP
 184 You can look up a name in the table by saying something like:
 185 .VS
 186 val *v = sym_find(t, name, -1, 0, 0);
 187 if (!v)
 188   error("unknown variable `%s'", name);
 189 .VE
 190 You can add in a new variable by saying something like
 191 .VS
 192 unsigned f;
 193 val *v = sym_find(t, name, -1, sizeof(val), &f);
 194 if (f)
 195   error("variable `%s' already exists", name);
 196 /* fill in v */
 197 .VE
 198 You can examine all the variables in your symbol table by saying
 199 something like:
 200 .VS
 201 sym_iter i;
 202 val *v;
 203
 204 for (sym_mkiter(&i, t); (v = sym_next(&i)) != 0; ) {
 205   /* ... */
 206 }
 207 .VE
 208 That ought to be enough examples to be getting on with.
 209 .SH CAVEATS
 210 The symbol table manager requires the suballocator (see
 211 .BR sub (3)
 212 for details).  You must ensure that
 213 .B sub_init
 214 has been called before using any symbol tables in your program.
 215 .SH IMPLEMENTATION
 216 The symbol table is an extensible hashtable, using a 32-bit CRC as the
 217 hash function.  The hash chains are kept very short (probably too short,
 218 actually).  Every time a symbol is found, its block is promoted to the
 219 front of its bin chain so it gets found faster next time.
 220 .SH SEE ALSO
 221 .BR sub (3),
 222 .BR mLib (3).
 223 .SH AUTHOR
 224 Mark Wooding, <mdw@nsict.org>