Universal hash function

In computer science, a universal hash function is a theoretical construct primarily used to show that an algorithm based on a hash function cannot be forced to have bad performance by an adversary. Bad performance in hashing comes from collisions, and a universal hash function guarantees that these cannot be forced to occur too often.

The formal definition involves a set of keys, K, a set of values, V, and a family of hash functions, H, mapping keys to values. Let |V| denote size of V, the number of possible values. Then for all pairs of distinct keys x and y in K, H is a 2-universal family of hash functions if

\Pr _{h\in H}[h(x)=h(y)]\leq {\frac {1}{|V|}}.

More stringently, H is a strongly 2-universal family if for all pairs of distinct keys x and y in K, and for all pairs x′ and y′ in V,

\Pr _{h\in H}[h(x)=x'{\mbox{ and }}h(y)=y']={\frac {1}{|V|^{2}}}.

Without qualification, the latter definition is probably intended.

Example

Let the keys, K, be {0,1,…,p−1} for some prime, p, and let the values, V, be {0,1,…n−1}. Define a 2-parameter family of functions, h_a,b, as

h_{a,b}(x)=((ax+b)\mod p)\mod n,\,\!

H=\{h_{a,b}:a\in \{1,\ldots ,p-1\},b\in \{0,\ldots ,p-1\}\}.\,\!

References

Knuth, Donald Ervin (1973–1998). The Art of Computer Programming, Vol. III: Sorting and Searching (2e ed.). Reading, Mass ; London: Addison-Wesley. ISBN 0-201-89685-0. {{cite book}}: Unknown parameter |notes= ignored (help)CS1 maint: date format (link)