chiark / gitweb /
@@@ wip type definitions in manpage synopses
[mLib] / struct / hash.3
CommitLineData
03d53b73 1.\" -*-nroff-*-
2.de VS
3.sp 1
4.RS
5.nf
6.ft B
7..
8.de VE
9.ft R
10.fi
11.RE
12.sp 1
13..
14.de hP
15.IP
16.ft B
17\h'-\w'\\$1\ 'u'\\$1\ \c
18.ft P
19..
20.ie t .ds o \(bu
21.el .ds o o
fbf20b5b 22.TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library"
03d53b73 23.SH "NAME"
24hash \- low-level hashtable implementation
25.\" @hash_create
26.\" @hash_destroy
27.\" @hash_bin
28.\" @hash_extend
29.\" @hash_remove
30.\" @hash_mkiter
31.\" @hash_next
32.\"
33.\" @HASH_BIN
34.\" @HASH_MKITER
35.\" @HASH_NEXT
36.\"
37.SH "SYNOPSIS"
38.nf
39.B "#include <mLib/hash.h>"
40
4729aa69
MW
41.B "typedef struct {"
42.B "\h'4n'uint32 mask;"
43.B "\h'4n'hash_base **v;"
44.B "\h'4n'arena *a;"
45.B "} hash_table;"
46
47.B "typedef struct {"
48.B "\h'4n'hash_base *next;"
49.B "\h'4n'uint32 hash;"
50.B "} hash_base;"
51
52.B "typedef struct { ...\& } hash_iter;"
53
03d53b73 54.BI "void hash_create(hash_table *" t ", size_t " n );
55.BI "void hash_destroy(hash_table *" t );
56.BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash );
57.BI "int hash_extend(hash_table *" t );
db0e70a1 58.BI "void hash_remove(hash_table *" t ", hash_base *" b );
03d53b73 59.BI "void hash_mkiter(hash_iter *" i ", hash_table *" t );
60.BI "hash_base *hash_next(hash_iter *" i );
61
62.BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash );
63.BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t );
64.BI "void HASH_NEXT(hash_iter *" i ", " b );
65.fi
66.SH "OVERVIEW"
67The
68.B hash
69functions provide the basis for an extensible hashtable implementation.
70The implementation is not complete. Many decisions have been left to
71the user, including:
72.hP \*o
528c8b4d 73how keys should be represented, hashed and compared;
03d53b73 74.hP \*o
528c8b4d 75how objects contained within the table should be allocated; and
03d53b73 76.hP \*o
528c8b4d 77when the hashtable should be extended.
03d53b73 78.PP
79A complete hashtable implementation will need to take the above
80decisions. If you just want a prepackaged solution, see
81.BR sym (3)
82which provides one.
83.SH "IMPLEMENTATION DETAILS"
84Each item in the hashtable is assigned a 32-bit integer
85.IR hash :
86a number computed somehow from the item's data such that two items which
87are considered equal will yield the same hash. Ideally, different items
88will yield different hashes. It is important for this implementation
89that all bits of the hash are similarly good.
90.PP
91In order to look up an item, the high bits of the hash are masked off
92and the remainder used as an index into a vector of
93.IR "bin lists" .
94Each bin contains a list of items with identical low-order bits of their
95hashes.
96.PP
97A table expansion involves doubling the size of the bin vector. Each
98bin list is then split into two, items being placed into a new bin
99depending on the setting of the next lowest hash bit. By expanding the
100hashtable as needed, lookups remain constant-time.
101.PP
102A low-level hashtable is represented by a
103.B hash_table
104structure. It contains two members:
105.TP
ff76c38f 106.B "uint32 mask"
03d53b73 107The current bitmask to be applied to hashes. This is one less than the
108current number of bins in the hashtable, and is applied to hash values
109in order to decide which bin an item should be in.
110.TP
ff76c38f 111.B "hash_base **v"
03d53b73 112The bin vector. It is an array of pointers to hashtable items.
113.PP
114A hashtable item consists of a
115.B hash_base
116structure. A full hashtable implementation will need to extend this
117structure by adding keying information and other data; the
118.B hash_base
119only contains the bare minimum of information needed to maintain the
120hashtable at a low level. It contains the following members:
121.TP
ff76c38f 122.B "hash_base *next"
03d53b73 123Pointer to the next item in the bin list. The final item has a null
124.B next
125pointer. The entry in the bin vector is null if the bin list is empty.
126It is up to the high-level implementation to insert items into the list.
127.TP
ff76c38f 128.B "uint32 hash"
03d53b73 129The hash for this item. This must be the full 32-bit hash for the
130current item. It is used during hashtable expansion to determine which
131bin an item should be moved to.
132.SH "FUNCTIONALITY PROVIDED"
133This section describes the functions and macros provided for building
134hashtables. Code examples are given throughout. They assume the
135following definitions:
136.VS
137/* --- A table of items --- */
138
139typedef struct item_table {
140 hash_table t;
141 size_t load;
142};
143
144/* --- An item --- */
145
146typedef struct item {
147 hash_base b;
148 const char *k;
149 /* ... */
150} item;
151.VE
152The implementation presented here is simple but relatively bad. The
153source file
154.B sym.c
155presents a more realistic example, but is rather more complex.
156.SS "Initialization and finalization"
157An empty hashtable is initialized by calling
158.B hash_create
159with the address of a
160.B hash_table
161structure to be filled in and the initial number of hash bins to create.
162.PP
163For example, an item table might be initialized like this:
164.VS
165void item_createtab(item_table *t)
166{
167 hash_create(&t->t, ITEM_INITSZ);
168 t->load = ITEM_INITLOAD;
169}
170.VE
171A hashtable can be destroyed by calling
172.B hash_destroy
173with the address of the
174.B hash_table
175structure. This does not attempt to deallocate the individual items;
176that must be done beforehand.
177.PP
178The usual way to deallocate the individual hashtable items is using the
179iteration constructs described below.
180.VS
181void item_destroytab(item_table *t)
182{
183 hash_iter i;
184 hash_base *b;
185 for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) {
186 item *ii = (item *)b;
187 free(ii->k);
188 /* ... */
189 DESTROY(ii);
190 }
191 hash_destroy(&t->t);
192}
193.VE
194.sp -1
195.SS "Searching, adding and removing"
196Items must be searched for and added by hand.
197.PP
198The macro
199.B HASH_BIN
200returns the address of the bin list haed for a particular hashtable and
201hash value. The function
202.B hash_bin
203works the same way and provides the same result, but since the macro is
204very simple its use is preferred. However, it will evaluate its
205arguments multiple times.
206.PP
207Once the bin list has been found, it's fairly easy to search for an
208exact match. A simple search might look something like this:
209.VS
210item *lookup(item_table *t, const char *k)
211{
212 uint32 h = hash(k); /* Hash @k@ somehow */
213 hash_base **bin = HASH_BIN(&t->t, h);
214 hash_base *b;
215 for (b = *bin; b; b = b->next) {
216 item *i = (item *)b;
217 if (h == i->b.hash && strcmp(k, i->k) == 0)
218 return (i);
219 }
220 return (0);
221}
222.VE
223Insertion is also relatively trivial given the bin list head. Insertion
224may make the hashtable too large, so it might be necessary to extend
225it. Extension is performed by
226.BR hash_extend ,
227which is passed only the address of the hashtable. It returns nonzero
228if extension was successful.
229.VS
230item *add(item_table *t, const char *k, /* ... */)
231{
232 item *i;
233 uint32 h;
234 hash_base **bin;
235
236 /* --- See if the item is already there --- */
237
238 if ((i = = lookup(t, k)) != 0)
239 return (i);
240
241 /* --- Make a new hashtable item --- */
242
243 i = CREATE(item);
244 i->k = xstrdup(k);
245 /* ... */
246
247 /* --- Link it into the bin list --- */
248
249 h = i->b.hash = hash(k);
250 bin = HASH_BIN(&t->t, h);
251 i->b.next = *bin;
252 *bin = &i->b.next;
253
254 /* --- Maybe extend the hashtable --- */
255
256 if (t->load)
257 t->load--;
258 else if (hash_extend(&t->t))
259 t->load = recalc_load(t);
260
261 /* --- Done --- */
262
263 return (i);
264}
265.VE
266The
267.B sym
268implementation is rather more sophisticated in its approach but the idea
269is the same. In particular,
270.B sym
271provides a single interface for lookup and insertion which prevents the
272unnecessary rehashing performed by the above code.
273.PP
274Removal of items is more straightforward. The function
275.B hash_remove
276will unlink a given item from its bin list, after which point it is safe
277to remove.
278.SS "Iteration"
279Iteration allows code to be performed on all the items in a hashtable.
280This is done using an
281.I iterator
282object, represented by a
283.B hash_iter
284structure. An iterator is initialized by calling
285.BR hash_mkiter .
286Each call to
287.B hash_next
288yields a different item from the hashtable until there are none left, a
289condition signified by a null return value.
290.PP
291The macros
292.B HASH_MKITER
293and
294.B HASH_NEXT
295do the same jobs as the above functions. However,
296.B HASH_NEXT
297has a slightly bizarre argument passing convention: its second argument
298is an
299.I lvalue
300which is updated to contain the address of the next item.
301.PP
302The finalization code above contained an example of iteration.
303.SH "SEE ALSO"
304.BR sym (3),
305.BR mLib (3).
306.SH "AUTHOR"
9b5ac6ff 307Mark Wooding, <mdw@distorted.org.uk>