Benutzer:Infinity ive/linear probing

Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location.^[1] This is accomplished using two values - one as a starting value and one as an interval between successive values in modular arithmetic. The second value, which is the same for all keys and known as the stepsize, is repeatedly added to the starting value until a free space is found, or the entire table is traversed.

   newLocation = (startingValue + stepSize) % arraySize

This algorithm, which is used in open-addressed hash tables, provides good memory caching (if stepsize is equal to one), through good locality of reference, but also results in clustering, an unfortunately high probability that where there has been one collision there will be more. The performance of linear probing is also more sensitive to input distribution when compared to double hashing.

Given an ordinary hash function H(x), a linear probing function would be:

H(x,i)=(H(x)+i){\pmod {n}}.\,

Here H(x) is the starting value, n the size of the hash table, and the stepsize is i in this case.

Dictionary operation in constant time

Using linear probing, dictionary operation can be implemented in constant time. In other words, insert, remove and find operations can be implemented in O(1). If the size of the hashtable fullfils certain conditions: Let n be the number of elements stored in hash table T with size m and m > 2en. Let k be the amount of consecutive a consecutive sequence of filled cells (one cell of the table stores exactly one element). k elements can be chosen from n in $n \choose k$ ways. Let's consider the following inequality: ${n \choose k}\leq ({\frac {en}{k}})^{k}$ .

Proof: ${n \choose k}={\frac {n^{\underline {k}}}{k!}}\leq {\frac {n^{\underline {k}}k^{k}}{k^{k}k!}}\leq {\frac {n^{k}}{k^{k}}}\sum _{l=0}^{\infty }{\frac {k^{l}}{l!}}=({\frac {en}{k}})^{k}$

The probability that k hash values are mapped to consecutive positions with length k is ${\frac {k^{\underline {k}}}{m^{\underline {k}}}}\leq ({\frac {k}{m}})^{k}$ . There are k different starting positions for a consecutive "run" of length k, so T[i] is in this range. $\Rightarrow p_{k}=P[T[i]]IsInRunOfLengthK]<k({\frac {k}{m}})^{k}<k({\frac {1}{2}})^{k}$

References

↑ Nell Dale: C++ Plus Data Structures. Jones and Bartlett Computer Science, Sudbury, MA 2003, ISBN 0-7637-0481-4.

External links

How Caching Affects Hashing by Gregory L. Heileman and Wenbin Luo 2005.

[1] Nell Dale: C++ Plus Data Structures. Jones and Bartlett Computer Science, Sudbury, MA 2003, ISBN 0-7637-0481-4.

[1]

Dictionary operation in constant time

See also

References

External links