.\" -*-nroff-*-
.de VS
.sp 1
.RS
.nf
.ft B
..
.de VE
.ft R
.fi
.RE
.sp 1
..
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t .ds o \(bu
.el .ds o o
.TH hash 3 "2 August 1999" "Straylight/Edgeware" "mLib utilities library"
.SH "NAME"
hash \- low-level hashtable implementation
.\" @hash_create
.\" @hash_destroy
.\" @hash_bin
.\" @hash_extend
.\" @hash_remove
.\" @hash_mkiter
.\" @hash_next
.\"
.\" @HASH_BIN
.\" @HASH_MKITER
.\" @HASH_NEXT
.\"
.SH "SYNOPSIS"
.nf
.B "#include <mLib/hash.h>"

.BI "void hash_create(hash_table *" t ", size_t " n );
.BI "void hash_destroy(hash_table *" t );
.BI "hash_base **hash_bin(hash_table *" t ", uint32 " hash );
.BI "int hash_extend(hash_table *" t );
.BI "void hash_remove(hash_table *" t ", hash_base * " b );
.BI "void hash_mkiter(hash_iter *" i ", hash_table *" t );
.BI "hash_base *hash_next(hash_iter *" i );

.BI "hash_base **HASH_BIN(hash_table *" t ", uint32 " hash );
.BI "void HASH_MKITER(hash_iter *" i ", hash_table *" t );
.BI "void HASH_NEXT(hash_iter *" i ", " b );
.fi
.SH "OVERVIEW"
The
.B hash
functions provide the basis for an extensible hashtable implementation.
The implementation is not complete.  Many decisions have been left to
the user, including:
.hP \*o
how keys should be represented, hashed and compared;
.hP \*o
how objects contained within the table should be allocated; and
.hP \*o
when the hashtable should be extended.
.PP
A complete hashtable implementation will need to take the above
decisions.  If you just want a prepackaged solution, see
.BR sym (3)
which provides one.
.SH "IMPLEMENTATION DETAILS"
Each item in the hashtable is assigned a 32-bit integer
.IR hash :
a number computed somehow from the item's data such that two items which
are considered equal will yield the same hash.  Ideally, different items
will yield different hashes.  It is important for this implementation
that all bits of the hash are similarly good.
.PP
In order to look up an item, the high bits of the hash are masked off
and the remainder used as an index into a vector of
.IR "bin lists" .
Each bin contains a list of items with identical low-order bits of their
hashes.
.PP
A table expansion involves doubling the size of the bin vector.  Each
bin list is then split into two, items being placed into a new bin
depending on the setting of the next lowest hash bit.  By expanding the
hashtable as needed, lookups remain constant-time.
.PP
A low-level hashtable is represented by a
.B hash_table
structure.  It contains two members:
.TP
.B "uint32 mask"
The current bitmask to be applied to hashes.  This is one less than the
current number of bins in the hashtable, and is applied to hash values
in order to decide which bin an item should be in.
.TP
.B "hash_base **v"
The bin vector.  It is an array of pointers to hashtable items.
.PP
A hashtable item consists of a
.B hash_base
structure.  A full hashtable implementation will need to extend this
structure by adding keying information and other data; the
.B hash_base
only contains the bare minimum of information needed to maintain the
hashtable at a low level.  It contains the following members:
.TP
.B "hash_base *next"
Pointer to the next item in the bin list.  The final item has a null
.B next
pointer.  The entry in the bin vector is null if the bin list is empty.
It is up to the high-level implementation to insert items into the list.
.TP
.B "uint32 hash"
The hash for this item.  This must be the full 32-bit hash for the
current item.  It is used during hashtable expansion to determine which
bin an item should be moved to.
.SH "FUNCTIONALITY PROVIDED"
This section describes the functions and macros provided for building
hashtables.  Code examples are given throughout.  They assume the
following definitions:
.VS
/* --- A table of items --- */

typedef struct item_table {
  hash_table t;
  size_t load;
};

/* --- An item --- */

typedef struct item {
  hash_base b;
  const char *k;
  /* ... */
} item;
.VE
The implementation presented here is simple but relatively bad.  The
source file
.B sym.c
presents a more realistic example, but is rather more complex.
.SS "Initialization and finalization"
An empty hashtable is initialized by calling
.B hash_create
with the address of a
.B hash_table
structure to be filled in and the initial number of hash bins to create.
.PP
For example, an item table might be initialized like this:
.VS
void item_createtab(item_table *t)
{
  hash_create(&t->t, ITEM_INITSZ);
  t->load = ITEM_INITLOAD;
}
.VE
A hashtable can be destroyed by calling
.B hash_destroy
with the address of the
.B hash_table
structure.  This does not attempt to deallocate the individual items;
that must be done beforehand.
.PP
The usual way to deallocate the individual hashtable items is using the
iteration constructs described below.
.VS
void item_destroytab(item_table *t)
{
  hash_iter i;
  hash_base *b;
  for (hash_mkiter(&i, &t->t); (b = hash_next(&i)) != 0; ) {
    item *ii = (item *)b;
    free(ii->k);
    /* ... */
    DESTROY(ii);
  }
  hash_destroy(&t->t);
}
.VE
.sp -1
.SS "Searching, adding and removing"
Items must be searched for and added by hand.
.PP
The macro
.B HASH_BIN
returns the address of the bin list haed for a particular hashtable and
hash value.  The function
.B hash_bin
works the same way and provides the same result, but since the macro is
very simple its use is preferred.  However, it will evaluate its
arguments multiple times.
.PP
Once the bin list has been found, it's fairly easy to search for an
exact match.  A simple search might look something like this:
.VS
item *lookup(item_table *t, const char *k)
{
  uint32 h = hash(k);		/* Hash @k@ somehow */
  hash_base **bin = HASH_BIN(&t->t, h);
  hash_base *b;
  for (b = *bin; b; b = b->next) {
    item *i = (item *)b;
    if (h == i->b.hash && strcmp(k, i->k) == 0)
      return (i);
  }
  return (0);
}
.VE
Insertion is also relatively trivial given the bin list head.  Insertion
may make the hashtable too large, so it might be necessary to extend
it.  Extension is performed by
.BR hash_extend ,
which is passed only the address of the hashtable.  It returns nonzero
if extension was successful.
.VS
item *add(item_table *t, const char *k, /* ... */)
{
  item *i;
  uint32 h;
  hash_base **bin;

  /* --- See if the item is already there --- */

  if ((i =  = lookup(t, k)) != 0)
    return (i);

  /* --- Make a new hashtable item --- */

  i = CREATE(item);
  i->k = xstrdup(k);
  /* ... */

  /* --- Link it into the bin list --- */

  h = i->b.hash = hash(k);
  bin = HASH_BIN(&t->t, h);
  i->b.next = *bin;
  *bin = &i->b.next;

  /* --- Maybe extend the hashtable --- */

  if (t->load)
    t->load--;
  else if (hash_extend(&t->t))
    t->load = recalc_load(t);

  /* --- Done --- */

  return (i);
}
.VE
The
.B sym
implementation is rather more sophisticated in its approach but the idea
is the same.  In particular,
.B sym
provides a single interface for lookup and insertion which prevents the
unnecessary rehashing performed by the above code.
.PP
Removal of items is more straightforward.  The function
.B hash_remove
will unlink a given item from its bin list, after which point it is safe
to remove.
.SS "Iteration"
Iteration allows code to be performed on all the items in a hashtable.
This is done using an
.I iterator
object, represented by a
.B hash_iter
structure.  An iterator is initialized by calling
.BR hash_mkiter .
Each call to
.B hash_next
yields a different item from the hashtable until there are none left, a
condition signified by a null return value.
.PP
The macros
.B HASH_MKITER
and
.B HASH_NEXT
do the same jobs as the above functions.  However,
.B HASH_NEXT
has a slightly bizarre argument passing convention: its second argument
is an
.I lvalue
which is updated to contain the address of the next item.
.PP
The finalization code above contained an example of iteration.
.SH "SEE ALSO"
.BR sym (3),
.BR mLib (3).
.SH "AUTHOR"
Mark Wooding, <mdw@distorted.org.uk>