Jump to content

Jenkins hash function

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jorge Stolfi (talk | contribs) at 05:45, 3 April 2009 (Created article from a chunk of hash table.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

The Jenkins hash functions are a collection of hash functions for multy-byte keys designed by Bob Jenkins. They can be used also as checksums to detect accidental data corruption or detect identical records in a database.


A mathematical byte-by-byte implementation that performs particularly well is the Jenkins One-at-a-time hash, adapted here from an article by Bob Jenkins, its creator.

uint32_t jenkins_one_at_a_time_hash(unsigned char *key, size_t key_len)
{
    uint32_t hash = 0;
    size_t i;

    for (i = 0; i < key_len; i++) {
        hash += key[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
Avalanche behavior of Jenkins One-at-a-time hash over 3-byte keys

The avalanche behavior of this hash is shown on the right. The image was made using Bret Mulvey's AvalancheTest in his Hash.cs toolset.

Each of the 24 rows corresponds to a single bit in the 3-byte input key, and each of the 32 columns corresponds to a bit in the output hash. Colors are chosen by how well the input key bit affects the given output hash bit: a green square indicates good mixing behavior, a yellow square weak mixing behavior, and red would indicate no mixing. Only a few bits in the last byte of the input key are weakly mixed to a minority of bits in the output hash, a performance vastly better than a number of widely used hash functions.

Many commonly used hash functions perform poorly when subjected to such rigorous avalanche testing. The widely favored FNV hash, for example, shows many bits with no mixing at all, especially for short keys. See the evaluation of FNV by Bret Mulvey for a more thorough analysis.

If speed is more important than simplicity, then the class of hash functions which consume multibyte chunks per iteration may be of interest. One of the most sophisticated is "lookup3" by Bob Jenkins, which consumes input in 12 byte (96 bit) chunks. Note, though, that any speed improvement from the use of this hash is only likely to be useful for large keys, and that the increased complexity may also have speed consequences such as preventing an optimizing compiler from inlining the hash function. Bret Mulvey analyzed an earlier version lookup2, and found it to have excellent avalanche behavior.

References

See also