.\" -*-nroff-*- .de VS .sp 1 .RS .nf .ft B .. .de VE .ft R .fi .RE .sp 1 .. .de hP .IP .ft B \h'-\w'\\$1\ 'u'\\$1\ \c .ft P .. .ie t .ds o \(bu .el .ds o o .TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library" .SH "NAME" hash \- low-level hashtable implementation .\" @hash_create .\" @hash_destroy .\" @hash_bin .\" @hash_extend .\" @hash_remove .\" @hash_mkiter .\" @hash_next .\" .\" @HASH_BIN .\" @HASH_MKITER .\" @HASH_NEXT .\" .SH "SYNOPSIS" .nf .B "#include " .BI "void hash_create(hash_table *" t ", size_t " n ); .BI "void hash_destroy(hash_table *" t ); .BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash ); .BI "int hash_extend(hash_table *" t ); .BI "void hash_remove(hash_table *" t ", hash_base * " b ); .BI "void hash_mkiter(hash_iter *" i ", hash_table *" t ); .BI "hash_base *hash_next(hash_iter *" i ); .BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash ); .BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t ); .BI "void HASH_NEXT(hash_iter *" i ", " b ); .fi .SH "OVERVIEW" The .B hash functions provide the basis for an extensible hashtable implementation. The implementation is not complete. Many decisions have been left to the user, including: .hP \*o how keys should be represented, hashed and compared; .hP \*o how objects contained within the table should be allocated; and .hP \*o when the hashtable should be extended. .PP A complete hashtable implementation will need to take the above decisions. If you just want a prepackaged solution, see .BR sym (3) which provides one. .SH "IMPLEMENTATION DETAILS" Each item in the hashtable is assigned a 32-bit integer .IR hash : a number computed somehow from the item's data such that two items which are considered equal will yield the same hash. Ideally, different items will yield different hashes. It is important for this implementation that all bits of the hash are similarly good. .PP In order to look up an item, the high bits of the hash are masked off and the remainder used as an index into a vector of .IR "bin lists" . Each bin contains a list of items with identical low-order bits of their hashes. .PP A table expansion involves doubling the size of the bin vector. Each bin list is then split into two, items being placed into a new bin depending on the setting of the next lowest hash bit. By expanding the hashtable as needed, lookups remain constant-time. .PP A low-level hashtable is represented by a .B hash_table structure. It contains two members: .TP .B "uint32 mask" The current bitmask to be applied to hashes. This is one less than the current number of bins in the hashtable, and is applied to hash values in order to decide which bin an item should be in. .TP .B "hash_base **v" The bin vector. It is an array of pointers to hashtable items. .PP A hashtable item consists of a .B hash_base structure. A full hashtable implementation will need to extend this structure by adding keying information and other data; the .B hash_base only contains the bare minimum of information needed to maintain the hashtable at a low level. It contains the following members: .TP .B "hash_base *next" Pointer to the next item in the bin list. The final item has a null .B next pointer. The entry in the bin vector is null if the bin list is empty. It is up to the high-level implementation to insert items into the list. .TP .B "uint32 hash" The hash for this item. This must be the full 32-bit hash for the current item. It is used during hashtable expansion to determine which bin an item should be moved to. .SH "FUNCTIONALITY PROVIDED" This section describes the functions and macros provided for building hashtables. Code examples are given throughout. They assume the following definitions: .VS /* --- A table of items --- */ typedef struct item_table { hash_table t; size_t load; }; /* --- An item --- */ typedef struct item { hash_base b; const char *k; /* ... */ } item; .VE The implementation presented here is simple but relatively bad. The source file .B sym.c presents a more realistic example, but is rather more complex. .SS "Initialization and finalization" An empty hashtable is initialized by calling .B hash_create with the address of a .B hash_table structure to be filled in and the initial number of hash bins to create. .PP For example, an item table might be initialized like this: .VS void item_createtab(item_table *t) { hash_create(&t->t, ITEM_INITSZ); t->load = ITEM_INITLOAD; } .VE A hashtable can be destroyed by calling .B hash_destroy with the address of the .B hash_table structure. This does not attempt to deallocate the individual items; that must be done beforehand. .PP The usual way to deallocate the individual hashtable items is using the iteration constructs described below. .VS void item_destroytab(item_table *t) { hash_iter i; hash_base *b; for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) { item *ii = (item *)b; free(ii->k); /* ... */ DESTROY(ii); } hash_destroy(&t->t); } .VE .sp -1 .SS "Searching, adding and removing" Items must be searched for and added by hand. .PP The macro .B HASH_BIN returns the address of the bin list haed for a particular hashtable and hash value. The function .B hash_bin works the same way and provides the same result, but since the macro is very simple its use is preferred. However, it will evaluate its arguments multiple times. .PP Once the bin list has been found, it's fairly easy to search for an exact match. A simple search might look something like this: .VS item *lookup(item_table *t, const char *k) { uint32 h = hash(k); /* Hash @k@ somehow */ hash_base **bin = HASH_BIN(&t->t, h); hash_base *b; for (b = *bin; b; b = b->next) { item *i = (item *)b; if (h == i->b.hash && strcmp(k, i->k) == 0) return (i); } return (0); } .VE Insertion is also relatively trivial given the bin list head. Insertion may make the hashtable too large, so it might be necessary to extend it. Extension is performed by .BR hash_extend , which is passed only the address of the hashtable. It returns nonzero if extension was successful. .VS item *add(item_table *t, const char *k, /* ... */) { item *i; uint32 h; hash_base **bin; /* --- See if the item is already there --- */ if ((i = = lookup(t, k)) != 0) return (i); /* --- Make a new hashtable item --- */ i = CREATE(item); i->k = xstrdup(k); /* ... */ /* --- Link it into the bin list --- */ h = i->b.hash = hash(k); bin = HASH_BIN(&t->t, h); i->b.next = *bin; *bin = &i->b.next; /* --- Maybe extend the hashtable --- */ if (t->load) t->load--; else if (hash_extend(&t->t)) t->load = recalc_load(t); /* --- Done --- */ return (i); } .VE The .B sym implementation is rather more sophisticated in its approach but the idea is the same. In particular, .B sym provides a single interface for lookup and insertion which prevents the unnecessary rehashing performed by the above code. .PP Removal of items is more straightforward. The function .B hash_remove will unlink a given item from its bin list, after which point it is safe to remove. .SS "Iteration" Iteration allows code to be performed on all the items in a hashtable. This is done using an .I iterator object, represented by a .B hash_iter structure. An iterator is initialized by calling .BR hash_mkiter . Each call to .B hash_next yields a different item from the hashtable until there are none left, a condition signified by a null return value. .PP The macros .B HASH_MKITER and .B HASH_NEXT do the same jobs as the above functions. However, .B HASH_NEXT has a slightly bizarre argument passing convention: its second argument is an .I lvalue which is updated to contain the address of the next item. .PP The finalization code above contained an example of iteration. .SH "SEE ALSO" .BR sym (3), .BR mLib (3). .SH "AUTHOR" Mark Wooding,