.\" -*-nroff-*-
.de VS
.sp 1
.RS
.nf
.ft B
..
.de VE
.ft R
.fi
.RE
.sp 1
..
.de hP
.IP
.ft B
\h'-\w'\\$1\ 'u'\\$1\ \c
.ft P
..
.ie t \{\
. ds ss \s8\u
. ds se \d\s0
. ds us \s8\d
. ds ue \u\s0
. ds *d \(*d
. ds >= \(>=
.\}
.el \{\
. ds ss ^
. ds se
. ds us _
. ds ue
. ds *d \fIdelta\fP
. ds >= >=
.\}
.TH unihash 3 "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
unihash \- simple and efficient universal hashing for hashtables
.\" @unihash_setkey
.\" @UNIHASH_INIT
.\" @unihash_hash
.\" @UNIHASH
.\" @unihash
.SH SYNOPSIS
.nf
.B "#include <mLib/unihash.h>"

.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
.BI "                    const void *" p ", size_t " sz );
.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
.fi
.SH DESCRIPTION
The
.B unihash
system implements a simple and relatively efficient
.IR "universal hashing family" .
Using such a universal hashing family means that it's provably
difficult for an adversary to choose input data whose hashes collide,
thus guaranteeing good average performance even on maliciously chosen
data.
.PP
Unlike, say,
.BR crc32 (3),
the
.B unihash
function is
.I keyed
\- in addition to the data to be hashed, the function takes as input a
32-bit key. This key should be chosen at random each time the program
runs.
.SS "Preprocessing a key"
Before use, a key must be
.I preprocessed
into a large (16K) table which is used by the main hashing functions.
The preprocessing is done by
.BR unihash_setkey :
pass it a pointer to a
.B unihash_info
structure and the 32-bit key you've chosen, and it stores the table in
the structure.
.PP
Objects of type
.B unihash_info
don't contain any pointers to other data and are safe to free when
you've finished with them; or you can just allocate them statically or
on the stack if that's more convenient.
.SS "Hashing data"
The function
.B unihash_hash
takes as input:
.TP
.BI "const unihash_info *" i
A pointer to the precomputed tables for a key.
.TP
.BI "uint32 " a
An accumulator value. This should be
.BI UNIHASH_INIT( i )
for the first chunk of a multi-chunk input, or the result of the
previous
.B unihash_hash
call for subsequent chunks.
.TP
.BI "const void *" p
A pointer to the start of a buffer containing this chunk of data.
.TP
.BI "size_t " sz
The length of the chunk, in bytes.
.PP
The function returns a new accumulator value, which is also the hash of
the data so far. So, to hash multiple chunks of data, do something like:
.VS
uint32 a = UNIHASH_INIT(i);
a = unihash_hash(i, a, p_0, sz_0);
a = unihash_hash(i, a, p_1, sz_1);
/* ... */
a = unihash_hash(i, a, p_n, sz_n);
.VE
The macro
.B UNIHASH
and function
.B unihash
are convenient interfaces to
.B unihash_hash
if you only want to hash one chunk.
.SS "Theoretical issues"
The hash function implemented by
.B unihash
is
.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
XOR-universal, where
.I l
is the length (in bytes) of the longest string you hash. That means
that, for any pair of distinct strings
.I x
and
.I y
and any 32-bit value \*(*d, the probability taken over all choices of the
key
.I k
that
.IR H\*(usk\*(ue ( x )\ \c
.BR xor \c
.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
is no greater than
.RI ( l \ +\ 1)/2\*(ss32\*(se.
.PP
This fact is proven in the header file, but it requires more
sophisticated typesetting than is available here.
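.PP
As a rough worked example of the bound: if the longest string you hash
is 1023 bytes long, then two given distinct strings collide with
probability at most 1024/2\*(ss32\*(se = 2\*(ss\-22\*(se, or about 2.4
in ten million. The arithmetic can be sketched as follows; the helper
.B collision_bound
is hypothetical and is not part of mLib.
.VS
#include <stddef.h>

/* Hypothetical helper, not part of mLib: the bound (l + 1)/2^32
 * for a longest input of l bytes. */
double collision_bound(size_t l)
{
  return ((double)l + 1.0) / 4294967296.0;  /* 4294967296 = 2^32 */
}
.VE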
.PP
The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
coefficients are the bytes of the message and whose variable is the key.
Details are given in the header file.
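.PP
To make the shape of such a computation concrete, here is a minimal
sketch of evaluating a polynomial of this kind by Horner's rule. It is
not the
.B unihash
implementation and will not reproduce its outputs: the names
.BR SKETCH_POLY ,
.B gf32_mul
and
.B poly_hash_sketch
are hypothetical, and the reduction polynomial, the order of the
coefficients and the handling of the initial accumulator are all
assumptions made for illustration. The header file remains the
authority.
.VS
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: reduce modulo the CRC32 generator polynomial,
 * taken here to be irreducible.  Chosen for illustration only;
 * see the header file for what unihash really does. */
#define SKETCH_POLY UINT32_C(0x04c11db7)

/* Multiply a by b in GF(2^32) by shift and add. */
static uint32_t gf32_mul(uint32_t a, uint32_t b)
{
  uint32_t r = 0;

  while (b) {
    if (b & 1) r ^= a;
    /* a = a * x, folding the x^32 term back in if it appears */
    a = (uint32_t)(a << 1) ^
        ((a & UINT32_C(0x80000000)) ? SKETCH_POLY : 0);
    b >>= 1;
  }
  return r;
}

/* Horner's rule: each byte is a coefficient, k is the variable,
 * and a is the accumulator carried over from earlier chunks. */
static uint32_t poly_hash_sketch(uint32_t k, uint32_t a,
                                 const void *p, size_t sz)
{
  const unsigned char *q = p;
  size_t i;

  for (i = 0; i < sz; i++) a = gf32_mul(a, k) ^ q[i];
  return a;
}
.VE
Because the accumulator is carried from call to call, hashing a buffer
in two chunks with this sketch gives the same result as hashing it in
one, which mirrors the accumulator interface described above.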
.PP
For best results, you should choose the key as a random 32-bit number
each time your program starts. Choosing a different key for different
hashtables isn't necessary. It's probably a good idea to avoid the keys
0 and 1. This raises the collision bound to
.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
(which isn't a significant increase) but eliminates keys for which the
hash's behaviour is particularly poor.
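.PP
One way to follow this advice is sketched below. It is only an
illustration: the helper
.B pick_key
is hypothetical rather than part of mLib, and it assumes a Unix-like
system which provides
.BR /dev/urandom .
.VS
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of mLib: draw a random 32 bit key,
 * rejecting 0 and 1.  ASSUMPTION: /dev/urandom is available.
 * Returns nonzero on success, zero if no randomness could be read. */
static int pick_key(uint32_t *k_out)
{
  FILE *fp = fopen("/dev/urandom", "rb");
  unsigned char b[4];
  uint32_t k;

  if (!fp) return 0;
  do {
    if (fread(b, 1, sizeof(b), fp) != sizeof(b)) {
      fclose(fp);
      return 0;
    }
    k = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
        ((uint32_t)b[2] << 8) | (uint32_t)b[3];
  } while (k < 2);              /* try again on the keys 0 and 1 */
  fclose(fp);
  *k_out = k;
  return 1;
}
.VE
The resulting value can then be passed to
.B unihash_setkey
as the key
.IR k .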
.PP
In tests,
.B unihash
actually performed better than
.BR crc32 ,
so if you want to just use it as a fast-ish hash with good statistical
properties, choose some fixed key
.IR k \ \*(>=\ 2.
.PP
We emphasize that the proof of this function's collision behaviour is
.I not
dependent on any unproven assumptions (unlike many `proofs' of
cryptographic security, which actually reduce the security of some
construction to the security of its components). It's just a fact.
.SH SEE ALSO
.BR crc32 (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding (mdw@nsict.org).