Commit | Line | Data |
---|---|---|
8fe3c82b | 1 | .\" -*-nroff-*- |
c4ccbbf9 MW |
2 | .\" |
3 | .\" Manual for universal hashing | |
4 | .\" | |
5 | .\" (c) 2003, 2005, 2007, 2009, 2023, 2024 Straylight/Edgeware | |
6 | .\" | |
7 | . | |
8 | .\"----- Licensing notice --------------------------------------------------- | |
9 | .\" | |
10 | .\" This file is part of the mLib utilities library. | |
11 | .\" | |
12 | .\" mLib is free software: you can redistribute it and/or modify it under | |
13 | .\" the terms of the GNU Library General Public License as published by | |
14 | .\" the Free Software Foundation; either version 2 of the License, or (at | |
15 | .\" your option) any later version. | |
16 | .\" | |
17 | .\" mLib is distributed in the hope that it will be useful, but WITHOUT | |
18 | .\" ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | |
19 | .\" FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public | |
20 | .\" License for more details. | |
21 | .\" | |
22 | .\" You should have received a copy of the GNU Library General Public | |
23 | .\" License along with mLib. If not, write to the Free Software | |
24 | .\" Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, | |
25 | .\" USA. | |
26 | . | |
27 | .\"-------------------------------------------------------------------------- | |
28 | .so ../defs.man \" @@@PRE@@@ | |
29 | . | |
30 | .\"-------------------------------------------------------------------------- | |
31 | .TH unihash 3mLib "5 July 2003" "Straylight/Edgeware" "mLib utilities library" | |
8fe3c82b | 32 | .\" @unihash_setkey |
33 | .\" @UNIHASH_INIT | |
34 | .\" @unihash_hash | |
35 | .\" @UNIHASH | |
36 | .\" @unihash | |
c4ccbbf9 MW |
37 | . |
38 | .\"-------------------------------------------------------------------------- | |
39 | .SH NAME | |
40 | unihash \- simple and efficient universal hashing for hashtables | |
41 | . | |
42 | .\"-------------------------------------------------------------------------- | |
8fe3c82b | 43 | .SH SYNOPSIS |
c4ccbbf9 | 44 | . |
8fe3c82b | 45 | .nf |
46 | .B "#include <mLib/unihash.h>" | |
d056fbdf | 47 | .PP |
4729aa69 | 48 | .B "typedef struct { ...\& } unihash_info;" |
d056fbdf | 49 | .PP |
6f444bda | 50 | .B "unihash_info unihash_global;" |
d056fbdf | 51 | .PP |
8fe3c82b | 52 | .BI "void unihash_setkey(unihash_info *" i ", uint32 " k ); |
53 | .BI "uint32 UNIHASH_INIT(const unihash_info *" i ); | |
adec5584 | 54 | .ta \w'\fBuint32 unihash_hash('u |
3bc42912 | 55 | .BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a , |
adec5584 | 56 | .BI " const void *" p ", size_t " sz ); |
8fe3c82b | 57 | .BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz ); |
58 | .BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz ); | |
59 | .fi | |
c4ccbbf9 MW |
60 | . |
61 | .\"-------------------------------------------------------------------------- | |
8fe3c82b | 62 | .SH DESCRIPTION |
63 | The | |
64 | .B unihash | |
65 | system implements a simple and relatively efficient | |
66 | .IR "universal hashing family" . | |
67 | Using a such a universal hashing family means that it's provably | |
68 | difficult for an adversary to choose input data whose hashes collide, | |
69 | thus guaranteeing good average performance even on maliciously chosen | |
70 | data. | |
71 | .PP | |
72 | Unlike, say, | |
73 | .BR crc32 (3), | |
74 | the | |
75 | .B unihash | |
76 | function is | |
77 | .I keyed | |
78 | \- in addition to the data to be hashed, the function takes as input a | |
79 | 32-bit key. This key should be chosen at random each time the program | |
80 | runs. | |
c4ccbbf9 | 81 | . |
8fe3c82b | 82 | .SS "Preprocessing a key" |
83 | Before use, a key must be | |
84 | .I preprocessed | |
85 | into a large (16K) table which is used by the main hashing functions. | |
86 | The preprocessing is done by | |
87 | .BR unihash_setkey : | |
88 | pass it a pointer to a | |
89 | .B unihash_info | |
90 | structure and the 32-bit key you've chosen, and it stores the table in | |
91 | the structure. | |
92 | .PP | |
93 | Objects of type | |
94 | .B unihash_info | |
95 | don't contain any pointers to other data and are safe to free when | |
96 | you've finished with them; or you can just allocate them statically or | |
97 | on the stack if that's more convenient. | |
c4ccbbf9 | 98 | . |
8fe3c82b | 99 | .SS "Hashing data" |
100 | The function | |
101 | .B unihash_hash | |
102 | takes as input: | |
103 | .TP | |
104 | .BI "const unihash_info *" i | |
105 | A pointer to the precomputed tables for a key. | |
106 | .TP | |
d4efbcd9 | 107 | .BI "uint32 " a |
8fe3c82b | 108 | An accumulator value. This should be |
109 | .BI UNIHASH_INIT( i ) | |
110 | for the first chunk of a multi-chunk input, or the result of the | |
111 | previous | |
112 | .B unihash_hash | |
113 | call for subsequent chunks. | |
114 | .TP | |
115 | .BI "const void *" p | |
116 | A pointer to the start of a buffer containing this chunk of data. | |
117 | .TP | |
118 | .BI "size_t " sz | |
119 | The length of the chunk. | |
120 | .PP | |
121 | The function returns a new accumulator value, which is also the hash of | |
122 | the data so far. So, to hash multiple chunks of data, do something like | |
123 | .VS | |
124 | uint32 a = UNIHASH_INIT(i); | |
125 | a = unihash_hash(i, a, p_0, sz_0); | |
126 | a = unihash_hash(i, a, p_1, sz_1); | |
adec5584 | 127 | /* ...\& */ |
8fe3c82b | 128 | a = unihash_hash(i, a, p_n, sz_n); |
129 | .VE | |
130 | The macro | |
131 | .B UNIHASH | |
132 | and function | |
133 | .B unihash | |
134 | are convenient interfaces to | |
135 | .B unihash_hash | |
136 | if you only wanted to hash one chunk. | |
c4ccbbf9 | 137 | . |
6f444bda | 138 | .SS "Global hash info table" |
139 | There's no problem with using the same key for several purposes, as long | |
140 | as it's secret from all of your adversaries. Therefore, there is a | |
141 | global | |
142 | .B unihash_info | |
143 | table define, called | |
144 | .BR unihash_global . | |
145 | This initially contains information for a fixed key which the author | |
146 | chose at random, but if you need to you can set a different key into it | |
147 | .I before | |
148 | it gets used to hash any data (otherwise your hash tables will become | |
149 | messed up). | |
c4ccbbf9 | 150 | . |
8fe3c82b | 151 | .SS "Theoretical issues" |
152 | The hash function implemented by | |
153 | .B unihash | |
154 | is | |
155 | .RI ( l \ +\ 1)/2\*(ss32\*(se-almost | |
156 | XOR-universal, where | |
157 | .I l | |
158 | is the length (in bytes) of the longest string you hash. That means | |
159 | that, for any pair of strings | |
160 | .I x | |
161 | and | |
162 | .I y | |
163 | and any 32-bit value \*(*d, the probability taken over all choices of the | |
164 | key | |
165 | .I k | |
166 | that | |
167 | .IR H\*(usk\*(ue ( x )\ \c | |
168 | .BR xor \c | |
169 | .RI \ H\*(usk\*(ue ( y )\ =\ \*(*d | |
170 | is no greater than | |
171 | .RI ( l \ +\ 1)/2\*(ss32\*(se. | |
172 | .PP | |
173 | This fact is proven in the header file, but it requires more | |
174 | sophisticated typesetting than is available here. | |
175 | .PP | |
176 | The function evaluates a polynomial over GF(2\*(ss32\*(se) whose | |
177 | coefficients are the bytes of the message and whose variable is the key. | |
178 | Details are given in the header file. | |
179 | .PP | |
180 | For best results, you should choose the key as a random 32-bit number | |
181 | each time your program starts. Choosing a different key for different | |
182 | hashtables isn't necessary. It's probably a good idea to avoid the keys | |
183 | 0 and 1. This raises the collision bound to | |
184 | .RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2) | |
185 | (which isn't a significant increase) but eliminates keys for which the | |
186 | hash's behaviour is particularly poor. | |
187 | .PP | |
188 | In tests, | |
189 | .B unihash | |
190 | actually performed better than | |
191 | .BR crc32 , | |
192 | so if you want to just use it as a fast-ish hash with good statistical | |
193 | properties, choose some fixed key | |
194 | .IR k \ \*(>=\ 2. | |
195 | .PP | |
196 | We emphasize that the proof of this function's collision behaviour is | |
197 | .I not | |
198 | dependent on any unproven assumptions (unlike many `proofs' of | |
199 | cryptographic security, which actually reduce the security of some | |
200 | construction to the security of its components). It's just a fact. | |
c4ccbbf9 | 201 | . |
b1a20bee MW |
202 | .SS "Practical issues" |
203 | The implementation of | |
204 | .B unihash | |
205 | uses a (fairly large) table precomputed from the key. | |
206 | When a message is hashed, | |
207 | some of the message data | |
208 | and internal state of the hashing operation | |
209 | are leaked to other processes on the same hardware | |
210 | through the processor cache | |
211 | and other stateful microarchitectural features. | |
212 | It's possible for an adversary to determine the hashing key | |
213 | by observing this leakage. | |
214 | This is unlikely to be a major concern | |
215 | since local processes have other, more effective ways to deny service; | |
216 | but if is then | |
217 | .BR siphash (3) | |
218 | may be more appropriate. | |
219 | See that manual page for a comparison of the two. | |
220 | . | |
c4ccbbf9 | 221 | .\"-------------------------------------------------------------------------- |
8fe3c82b | 222 | .SH SEE ALSO |
c4ccbbf9 | 223 | . |
b1a20bee | 224 | .BR unihash-mkstatic (1), |
8fe3c82b | 225 | .BR crc32 (3), |
b1a20bee | 226 | .BR siphash (3), |
8fe3c82b | 227 | .BR mLib (3). |
c4ccbbf9 MW |
228 | . |
229 | .\"-------------------------------------------------------------------------- | |
8fe3c82b | 230 | .SH AUTHOR |
c4ccbbf9 | 231 | . |
9b5ac6ff | 232 | Mark Wooding (mdw@distorted.org.uk). |
c4ccbbf9 MW |
233 | . |
234 | .\"----- That's all, folks -------------------------------------------------- |