chiark / gitweb /
Various manual fixes.
[mLib] / man / unihash.3
CommitLineData
8fe3c82b 1.\" -*-nroff-*-
2.de VS
3.sp 1
4.RS
5.nf
6.ft B
7..
8.de VE
9.ft R
10.fi
11.RE
12.sp 1
13..
14.de hP
15.IP
16.ft B
17\h'-\w'\\$1\ 'u'\\$1\ \c
18.ft P
19..
20.ie t \{\
21. ds ss \s8\u
22. ds se \d\s0
23. ds us \s8\d
24. ds ue \u\s0
25. ds *d \(*d
26. ds >= \(>=
27.\}
28.el \{\
29. ds ss ^
30. ds se
31. ds us _
32. ds ue
33. ds *d \fIdelta\fP
34. ds >= >=
35.\}
36.TH unihash 3 "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
37.SH NAME
38unihash \- simple and efficient universal hashing for hashtables
39.\" @unihash_setkey
40.\" @UNIHASH_INIT
41.\" @unihash_hash
42.\" @UNIHASH
43.\" @unihash
44.SH SYNOPSIS
45.nf
46.B "#include <mLib/unihash.h>"
47
6f444bda 48.B "unihash_info unihash_global;"
49
8fe3c82b 50.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
51.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
3bc42912 52.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
53.BI " const void *" p ", size_t " sz );
8fe3c82b 54.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
55.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
56.fi
57.SH DESCRIPTION
58The
59.B unihash
60system implements a simple and relatively efficient
61.IR "universal hashing family" .
62Using a such a universal hashing family means that it's provably
63difficult for an adversary to choose input data whose hashes collide,
64thus guaranteeing good average performance even on maliciously chosen
65data.
66.PP
67Unlike, say,
68.BR crc32 (3),
69the
70.B unihash
71function is
72.I keyed
73\- in addition to the data to be hashed, the function takes as input a
7432-bit key. This key should be chosen at random each time the program
75runs.
76.SS "Preprocessing a key"
77Before use, a key must be
78.I preprocessed
79into a large (16K) table which is used by the main hashing functions.
80The preprocessing is done by
81.BR unihash_setkey :
82pass it a pointer to a
83.B unihash_info
84structure and the 32-bit key you've chosen, and it stores the table in
85the structure.
86.PP
87Objects of type
88.B unihash_info
89don't contain any pointers to other data and are safe to free when
90you've finished with them; or you can just allocate them statically or
91on the stack if that's more convenient.
92.SS "Hashing data"
93The function
94.B unihash_hash
95takes as input:
96.TP
97.BI "const unihash_info *" i
98A pointer to the precomputed tables for a key.
99.TP
100.BI "uint32 " a
101An accumulator value. This should be
102.BI UNIHASH_INIT( i )
103for the first chunk of a multi-chunk input, or the result of the
104previous
105.B unihash_hash
106call for subsequent chunks.
107.TP
108.BI "const void *" p
109A pointer to the start of a buffer containing this chunk of data.
110.TP
111.BI "size_t " sz
112The length of the chunk.
113.PP
114The function returns a new accumulator value, which is also the hash of
115the data so far. So, to hash multiple chunks of data, do something like
116.VS
117uint32 a = UNIHASH_INIT(i);
118a = unihash_hash(i, a, p_0, sz_0);
119a = unihash_hash(i, a, p_1, sz_1);
120/* ... */
121a = unihash_hash(i, a, p_n, sz_n);
122.VE
123The macro
124.B UNIHASH
125and function
126.B unihash
127are convenient interfaces to
128.B unihash_hash
129if you only wanted to hash one chunk.
6f444bda 130.SS "Global hash info table"
131There's no problem with using the same key for several purposes, as long
132as it's secret from all of your adversaries. Therefore, there is a
133global
134.B unihash_info
135table define, called
136.BR unihash_global .
137This initially contains information for a fixed key which the author
138chose at random, but if you need to you can set a different key into it
139.I before
140it gets used to hash any data (otherwise your hash tables will become
141messed up).
8fe3c82b 142.SS "Theoretical issues"
143The hash function implemented by
144.B unihash
145is
146.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
147XOR-universal, where
148.I l
149is the length (in bytes) of the longest string you hash. That means
150that, for any pair of strings
151.I x
152and
153.I y
154and any 32-bit value \*(*d, the probability taken over all choices of the
155key
156.I k
157that
158.IR H\*(usk\*(ue ( x )\ \c
159.BR xor \c
160.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
161is no greater than
162.RI ( l \ +\ 1)/2\*(ss32\*(se.
163.PP
164This fact is proven in the header file, but it requires more
165sophisticated typesetting than is available here.
166.PP
167The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
168coefficients are the bytes of the message and whose variable is the key.
169Details are given in the header file.
170.PP
171For best results, you should choose the key as a random 32-bit number
172each time your program starts. Choosing a different key for different
173hashtables isn't necessary. It's probably a good idea to avoid the keys
1740 and 1. This raises the collision bound to
175.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
176(which isn't a significant increase) but eliminates keys for which the
177hash's behaviour is particularly poor.
178.PP
179In tests,
180.B unihash
181actually performed better than
182.BR crc32 ,
183so if you want to just use it as a fast-ish hash with good statistical
184properties, choose some fixed key
185.IR k \ \*(>=\ 2.
186.PP
187We emphasize that the proof of this function's collision behaviour is
188.I not
189dependent on any unproven assumptions (unlike many `proofs' of
190cryptographic security, which actually reduce the security of some
191construction to the security of its components). It's just a fact.
192.SH SEE ALSO
6f444bda 193.BR unihash-mkstatic (3),
8fe3c82b 194.BR crc32 (3),
195.BR mLib (3).
196.SH AUTHOR
9b5ac6ff 197Mark Wooding (mdw@distorted.org.uk).