chiark / gitweb /
@@@ man wip
[mLib] / hash / unihash.3
CommitLineData
8fe3c82b 1.\" -*-nroff-*-
2.de VS
3.sp 1
4.RS
5.nf
6.ft B
7..
8.de VE
9.ft R
10.fi
11.RE
12.sp 1
13..
14.de hP
15.IP
16.ft B
17\h'-\w'\\$1\ 'u'\\$1\ \c
18.ft P
19..
20.ie t \{\
21. ds ss \s8\u
22. ds se \d\s0
23. ds us \s8\d
24. ds ue \u\s0
25. ds *d \(*d
26. ds >= \(>=
27.\}
28.el \{\
29. ds ss ^
30. ds se
31. ds us _
32. ds ue
33. ds *d \fIdelta\fP
34. ds >= >=
35.\}
36.TH unihash 3 "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
37.SH NAME
38unihash \- simple and efficient universal hashing for hashtables
39.\" @unihash_setkey
40.\" @UNIHASH_INIT
41.\" @unihash_hash
42.\" @UNIHASH
43.\" @unihash
44.SH SYNOPSIS
45.nf
46.B "#include <mLib/unihash.h>"
d056fbdf 47.PP
4729aa69 48.B "typedef struct { ...\& } unihash_info;"
d056fbdf 49.PP
6f444bda 50.B "unihash_info unihash_global;"
d056fbdf 51.PP
8fe3c82b 52.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
53.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
adec5584 54.ta \w'\fBuint32 unihash_hash('u
3bc42912 55.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
adec5584 56.BI " const void *" p ", size_t " sz );
8fe3c82b 57.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
58.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
59.fi
60.SH DESCRIPTION
61The
62.B unihash
63system implements a simple and relatively efficient
64.IR "universal hashing family" .
65Using a such a universal hashing family means that it's provably
66difficult for an adversary to choose input data whose hashes collide,
67thus guaranteeing good average performance even on maliciously chosen
68data.
69.PP
70Unlike, say,
71.BR crc32 (3),
72the
73.B unihash
74function is
75.I keyed
76\- in addition to the data to be hashed, the function takes as input a
7732-bit key. This key should be chosen at random each time the program
78runs.
79.SS "Preprocessing a key"
80Before use, a key must be
81.I preprocessed
82into a large (16K) table which is used by the main hashing functions.
83The preprocessing is done by
84.BR unihash_setkey :
85pass it a pointer to a
86.B unihash_info
87structure and the 32-bit key you've chosen, and it stores the table in
88the structure.
89.PP
90Objects of type
91.B unihash_info
92don't contain any pointers to other data and are safe to free when
93you've finished with them; or you can just allocate them statically or
94on the stack if that's more convenient.
95.SS "Hashing data"
96The function
97.B unihash_hash
98takes as input:
99.TP
100.BI "const unihash_info *" i
101A pointer to the precomputed tables for a key.
102.TP
d4efbcd9 103.BI "uint32 " a
8fe3c82b 104An accumulator value. This should be
105.BI UNIHASH_INIT( i )
106for the first chunk of a multi-chunk input, or the result of the
107previous
108.B unihash_hash
109call for subsequent chunks.
110.TP
111.BI "const void *" p
112A pointer to the start of a buffer containing this chunk of data.
113.TP
114.BI "size_t " sz
115The length of the chunk.
116.PP
117The function returns a new accumulator value, which is also the hash of
118the data so far. So, to hash multiple chunks of data, do something like
119.VS
120uint32 a = UNIHASH_INIT(i);
121a = unihash_hash(i, a, p_0, sz_0);
122a = unihash_hash(i, a, p_1, sz_1);
adec5584 123/* ...\& */
8fe3c82b 124a = unihash_hash(i, a, p_n, sz_n);
125.VE
126The macro
127.B UNIHASH
128and function
129.B unihash
130are convenient interfaces to
131.B unihash_hash
132if you only wanted to hash one chunk.
6f444bda 133.SS "Global hash info table"
134There's no problem with using the same key for several purposes, as long
135as it's secret from all of your adversaries. Therefore, there is a
136global
137.B unihash_info
138table define, called
139.BR unihash_global .
140This initially contains information for a fixed key which the author
141chose at random, but if you need to you can set a different key into it
142.I before
143it gets used to hash any data (otherwise your hash tables will become
144messed up).
8fe3c82b 145.SS "Theoretical issues"
146The hash function implemented by
147.B unihash
148is
149.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
150XOR-universal, where
151.I l
152is the length (in bytes) of the longest string you hash. That means
153that, for any pair of strings
154.I x
155and
156.I y
157and any 32-bit value \*(*d, the probability taken over all choices of the
158key
159.I k
160that
161.IR H\*(usk\*(ue ( x )\ \c
162.BR xor \c
163.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
164is no greater than
165.RI ( l \ +\ 1)/2\*(ss32\*(se.
166.PP
167This fact is proven in the header file, but it requires more
168sophisticated typesetting than is available here.
169.PP
170The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
171coefficients are the bytes of the message and whose variable is the key.
172Details are given in the header file.
173.PP
174For best results, you should choose the key as a random 32-bit number
175each time your program starts. Choosing a different key for different
176hashtables isn't necessary. It's probably a good idea to avoid the keys
1770 and 1. This raises the collision bound to
178.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
179(which isn't a significant increase) but eliminates keys for which the
180hash's behaviour is particularly poor.
181.PP
182In tests,
183.B unihash
184actually performed better than
185.BR crc32 ,
186so if you want to just use it as a fast-ish hash with good statistical
187properties, choose some fixed key
188.IR k \ \*(>=\ 2.
189.PP
190We emphasize that the proof of this function's collision behaviour is
191.I not
192dependent on any unproven assumptions (unlike many `proofs' of
193cryptographic security, which actually reduce the security of some
194construction to the security of its components). It's just a fact.
195.SH SEE ALSO
6f444bda 196.BR unihash-mkstatic (3),
8fe3c82b 197.BR crc32 (3),
198.BR mLib (3).
199.SH AUTHOR
9b5ac6ff 200Mark Wooding (mdw@distorted.org.uk).