chiark / gitweb /
@@@ fltfmt mess
[mLib] / hash / unihash.3.in
CommitLineData
8fe3c82b 1.\" -*-nroff-*-
c4ccbbf9
MW
2.\"
3.\" Manual for universal hashing
4.\"
5.\" (c) 2003, 2005, 2007, 2009, 2023, 2024 Straylight/Edgeware
6.\"
7.
8.\"----- Licensing notice ---------------------------------------------------
9.\"
10.\" This file is part of the mLib utilities library.
11.\"
12.\" mLib is free software: you can redistribute it and/or modify it under
13.\" the terms of the GNU Library General Public License as published by
14.\" the Free Software Foundation; either version 2 of the License, or (at
15.\" your option) any later version.
16.\"
17.\" mLib is distributed in the hope that it will be useful, but WITHOUT
18.\" ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
19.\" FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public
20.\" License for more details.
21.\"
22.\" You should have received a copy of the GNU Library General Public
23.\" License along with mLib. If not, write to the Free Software
24.\" Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
25.\" USA.
26.
27.\"--------------------------------------------------------------------------
28.so ../defs.man \" @@@PRE@@@
29.
30.\"--------------------------------------------------------------------------
31.TH unihash 3mLib "5 July 2003" "Straylight/Edgeware" "mLib utilities library"
8fe3c82b 32.\" @unihash_setkey
33.\" @UNIHASH_INIT
34.\" @unihash_hash
35.\" @UNIHASH
36.\" @unihash
c4ccbbf9
MW
37.
38.\"--------------------------------------------------------------------------
39.SH NAME
40unihash \- simple and efficient universal hashing for hashtables
41.
42.\"--------------------------------------------------------------------------
8fe3c82b 43.SH SYNOPSIS
c4ccbbf9 44.
8fe3c82b 45.nf
46.B "#include <mLib/unihash.h>"
d056fbdf 47.PP
4729aa69 48.B "typedef struct { ...\& } unihash_info;"
d056fbdf 49.PP
6f444bda 50.B "unihash_info unihash_global;"
d056fbdf 51.PP
8fe3c82b 52.BI "void unihash_setkey(unihash_info *" i ", uint32 " k );
53.BI "uint32 UNIHASH_INIT(const unihash_info *" i );
adec5584 54.ta \w'\fBuint32 unihash_hash('u
3bc42912 55.BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a ,
adec5584 56.BI " const void *" p ", size_t " sz );
8fe3c82b 57.BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz );
58.BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz );
59.fi
c4ccbbf9
MW
60.
61.\"--------------------------------------------------------------------------
8fe3c82b 62.SH DESCRIPTION
63The
64.B unihash
65system implements a simple and relatively efficient
66.IR "universal hashing family" .
67Using a such a universal hashing family means that it's provably
68difficult for an adversary to choose input data whose hashes collide,
69thus guaranteeing good average performance even on maliciously chosen
70data.
71.PP
72Unlike, say,
73.BR crc32 (3),
74the
75.B unihash
76function is
77.I keyed
78\- in addition to the data to be hashed, the function takes as input a
7932-bit key. This key should be chosen at random each time the program
80runs.
c4ccbbf9 81.
8fe3c82b 82.SS "Preprocessing a key"
83Before use, a key must be
84.I preprocessed
85into a large (16K) table which is used by the main hashing functions.
86The preprocessing is done by
87.BR unihash_setkey :
88pass it a pointer to a
89.B unihash_info
90structure and the 32-bit key you've chosen, and it stores the table in
91the structure.
92.PP
93Objects of type
94.B unihash_info
95don't contain any pointers to other data and are safe to free when
96you've finished with them; or you can just allocate them statically or
97on the stack if that's more convenient.
c4ccbbf9 98.
8fe3c82b 99.SS "Hashing data"
100The function
101.B unihash_hash
102takes as input:
103.TP
104.BI "const unihash_info *" i
105A pointer to the precomputed tables for a key.
106.TP
d4efbcd9 107.BI "uint32 " a
8fe3c82b 108An accumulator value. This should be
109.BI UNIHASH_INIT( i )
110for the first chunk of a multi-chunk input, or the result of the
111previous
112.B unihash_hash
113call for subsequent chunks.
114.TP
115.BI "const void *" p
116A pointer to the start of a buffer containing this chunk of data.
117.TP
118.BI "size_t " sz
119The length of the chunk.
120.PP
121The function returns a new accumulator value, which is also the hash of
122the data so far. So, to hash multiple chunks of data, do something like
123.VS
124uint32 a = UNIHASH_INIT(i);
125a = unihash_hash(i, a, p_0, sz_0);
126a = unihash_hash(i, a, p_1, sz_1);
adec5584 127/* ...\& */
8fe3c82b 128a = unihash_hash(i, a, p_n, sz_n);
129.VE
130The macro
131.B UNIHASH
132and function
133.B unihash
134are convenient interfaces to
135.B unihash_hash
136if you only wanted to hash one chunk.
c4ccbbf9 137.
6f444bda 138.SS "Global hash info table"
139There's no problem with using the same key for several purposes, as long
140as it's secret from all of your adversaries. Therefore, there is a
141global
142.B unihash_info
143table define, called
144.BR unihash_global .
145This initially contains information for a fixed key which the author
146chose at random, but if you need to you can set a different key into it
147.I before
148it gets used to hash any data (otherwise your hash tables will become
149messed up).
c4ccbbf9 150.
8fe3c82b 151.SS "Theoretical issues"
152The hash function implemented by
153.B unihash
154is
155.RI ( l \ +\ 1)/2\*(ss32\*(se-almost
156XOR-universal, where
157.I l
158is the length (in bytes) of the longest string you hash. That means
159that, for any pair of strings
160.I x
161and
162.I y
163and any 32-bit value \*(*d, the probability taken over all choices of the
164key
165.I k
166that
167.IR H\*(usk\*(ue ( x )\ \c
168.BR xor \c
169.RI \ H\*(usk\*(ue ( y )\ =\ \*(*d
170is no greater than
171.RI ( l \ +\ 1)/2\*(ss32\*(se.
172.PP
173This fact is proven in the header file, but it requires more
174sophisticated typesetting than is available here.
175.PP
176The function evaluates a polynomial over GF(2\*(ss32\*(se) whose
177coefficients are the bytes of the message and whose variable is the key.
178Details are given in the header file.
179.PP
180For best results, you should choose the key as a random 32-bit number
181each time your program starts. Choosing a different key for different
182hashtables isn't necessary. It's probably a good idea to avoid the keys
1830 and 1. This raises the collision bound to
184.RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2)
185(which isn't a significant increase) but eliminates keys for which the
186hash's behaviour is particularly poor.
187.PP
188In tests,
189.B unihash
190actually performed better than
191.BR crc32 ,
192so if you want to just use it as a fast-ish hash with good statistical
193properties, choose some fixed key
194.IR k \ \*(>=\ 2.
195.PP
196We emphasize that the proof of this function's collision behaviour is
197.I not
198dependent on any unproven assumptions (unlike many `proofs' of
199cryptographic security, which actually reduce the security of some
200construction to the security of its components). It's just a fact.
c4ccbbf9 201.
b1a20bee
MW
202.SS "Practical issues"
203The implementation of
204.B unihash
205uses a (fairly large) table precomputed from the key.
206When a message is hashed,
207some of the message data
208and internal state of the hashing operation
209are leaked to other processes on the same hardware
210through the processor cache
211and other stateful microarchitectural features.
212It's possible for an adversary to determine the hashing key
213by observing this leakage.
214This is unlikely to be a major concern
215since local processes have other, more effective ways to deny service;
216but if is then
217.BR siphash (3)
218may be more appropriate.
219See that manual page for a comparison of the two.
220.
c4ccbbf9 221.\"--------------------------------------------------------------------------
8fe3c82b 222.SH SEE ALSO
c4ccbbf9 223.
b1a20bee 224.BR unihash-mkstatic (1),
8fe3c82b 225.BR crc32 (3),
b1a20bee 226.BR siphash (3),
8fe3c82b 227.BR mLib (3).
c4ccbbf9
MW
228.
229.\"--------------------------------------------------------------------------
8fe3c82b 230.SH AUTHOR
c4ccbbf9 231.
9b5ac6ff 232Mark Wooding (mdw@distorted.org.uk).
c4ccbbf9
MW
233.
234.\"----- That's all, folks --------------------------------------------------