.\" -*-nroff-*- .\" .\" Manual for universal hashing .\" .\" (c) 2003, 2005, 2007, 2009, 2023, 2024 Straylight/Edgeware .\" . .\"----- Licensing notice --------------------------------------------------- .\" .\" This file is part of the mLib utilities library. .\" .\" mLib is free software: you can redistribute it and/or modify it under .\" the terms of the GNU Library General Public License as published by .\" the Free Software Foundation; either version 2 of the License, or (at .\" your option) any later version. .\" .\" mLib is distributed in the hope that it will be useful, but WITHOUT .\" ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or .\" FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public .\" License for more details. .\" .\" You should have received a copy of the GNU Library General Public .\" License along with mLib. If not, write to the Free Software .\" Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, .\" USA. . .\"-------------------------------------------------------------------------- .so ../defs.man \" @@@PRE@@@ . .\"-------------------------------------------------------------------------- .TH unihash 3mLib "5 July 2003" "Straylight/Edgeware" "mLib utilities library" .\" @unihash_setkey .\" @UNIHASH_INIT .\" @unihash_hash .\" @UNIHASH .\" @unihash . .\"-------------------------------------------------------------------------- .SH NAME unihash \- simple and efficient universal hashing for hashtables . .\"-------------------------------------------------------------------------- .SH SYNOPSIS . .nf .B "#include " .PP .B "typedef struct { ...\& } unihash_info;" .PP .B "unihash_info unihash_global;" .PP .BI "void unihash_setkey(unihash_info *" i ", uint32 " k ); .BI "uint32 UNIHASH_INIT(const unihash_info *" i ); .ta \w'\fBuint32 unihash_hash('u .BI "uint32 unihash_hash(const unihash_info *" i ", uint32 " a , .BI " const void *" p ", size_t " sz ); .BI "uint32 unihash(const unihash_info *" i ", const void *" p ", size_t " sz ); .BI "uint32 UNIHASH(const unihash_info *" i ", const void *" p ", size_t " sz ); .fi . .\"-------------------------------------------------------------------------- .SH DESCRIPTION The .B unihash system implements a simple and relatively efficient .IR "universal hashing family" . Using a such a universal hashing family means that it's provably difficult for an adversary to choose input data whose hashes collide, thus guaranteeing good average performance even on maliciously chosen data. .PP Unlike, say, .BR crc32 (3), the .B unihash function is .I keyed \- in addition to the data to be hashed, the function takes as input a 32-bit key. This key should be chosen at random each time the program runs. . .SS "Preprocessing a key" Before use, a key must be .I preprocessed into a large (16K) table which is used by the main hashing functions. The preprocessing is done by .BR unihash_setkey : pass it a pointer to a .B unihash_info structure and the 32-bit key you've chosen, and it stores the table in the structure. .PP Objects of type .B unihash_info don't contain any pointers to other data and are safe to free when you've finished with them; or you can just allocate them statically or on the stack if that's more convenient. . .SS "Hashing data" The function .B unihash_hash takes as input: .TP .BI "const unihash_info *" i A pointer to the precomputed tables for a key. .TP .BI "uint32 " a An accumulator value. This should be .BI UNIHASH_INIT( i ) for the first chunk of a multi-chunk input, or the result of the previous .B unihash_hash call for subsequent chunks. .TP .BI "const void *" p A pointer to the start of a buffer containing this chunk of data. .TP .BI "size_t " sz The length of the chunk. .PP The function returns a new accumulator value, which is also the hash of the data so far. So, to hash multiple chunks of data, do something like .VS uint32 a = UNIHASH_INIT(i); a = unihash_hash(i, a, p_0, sz_0); a = unihash_hash(i, a, p_1, sz_1); /* ...\& */ a = unihash_hash(i, a, p_n, sz_n); .VE The macro .B UNIHASH and function .B unihash are convenient interfaces to .B unihash_hash if you only wanted to hash one chunk. . .SS "Global hash info table" There's no problem with using the same key for several purposes, as long as it's secret from all of your adversaries. Therefore, there is a global .B unihash_info table define, called .BR unihash_global . This initially contains information for a fixed key which the author chose at random, but if you need to you can set a different key into it .I before it gets used to hash any data (otherwise your hash tables will become messed up). . .SS "Theoretical issues" The hash function implemented by .B unihash is .RI ( l \ +\ 1)/2\*(ss32\*(se-almost XOR-universal, where .I l is the length (in bytes) of the longest string you hash. That means that, for any pair of strings .I x and .I y and any 32-bit value \*(*d, the probability taken over all choices of the key .I k that .IR H\*(usk\*(ue ( x )\ \c .BR xor \c .RI \ H\*(usk\*(ue ( y )\ =\ \*(*d is no greater than .RI ( l \ +\ 1)/2\*(ss32\*(se. .PP This fact is proven in the header file, but it requires more sophisticated typesetting than is available here. .PP The function evaluates a polynomial over GF(2\*(ss32\*(se) whose coefficients are the bytes of the message and whose variable is the key. Details are given in the header file. .PP For best results, you should choose the key as a random 32-bit number each time your program starts. Choosing a different key for different hashtables isn't necessary. It's probably a good idea to avoid the keys 0 and 1. This raises the collision bound to .RI ( l \ +\ 1)/(2\*(ss32\*(se\ \-\ 2) (which isn't a significant increase) but eliminates keys for which the hash's behaviour is particularly poor. .PP In tests, .B unihash actually performed better than .BR crc32 , so if you want to just use it as a fast-ish hash with good statistical properties, choose some fixed key .IR k \ \*(>=\ 2. .PP We emphasize that the proof of this function's collision behaviour is .I not dependent on any unproven assumptions (unlike many `proofs' of cryptographic security, which actually reduce the security of some construction to the security of its components). It's just a fact. . .SS "Practical issues" The implementation of .B unihash uses a (fairly large) table precomputed from the key. When a message is hashed, some of the message data and internal state of the hashing operation are leaked to other processes on the same hardware through the processor cache and other stateful microarchitectural features. It's possible for an adversary to determine the hashing key by observing this leakage. This is unlikely to be a major concern since local processes have other, more effective ways to deny service; but if is then .BR siphash (3) may be more appropriate. See that manual page for a comparison of the two. . .\"-------------------------------------------------------------------------- .SH SEE ALSO . .BR unihash-mkstatic (1), .BR crc32 (3), .BR siphash (3), .BR mLib (3). . .\"-------------------------------------------------------------------------- .SH AUTHOR . Mark Wooding (mdw@distorted.org.uk). . .\"----- That's all, folks --------------------------------------------------