Talk:Universal hashing
![]() | Mathematics C‑class Mid‑priority | |||||||||
|
![]() | Computer science C‑class Mid‑importance | ||||||||||||||||
|
The two articles on universal hashing should clearly be merged. Any volunteer? Pagh 11:23, 24 April 2007 (UTC)
I don't see two articles on universal hashing, just this one. One might merge this universal_hashing article into the main article hash_function, but since the present article is reasonably coherent and self-contained, I don't think that's urgent. I have removed the {{Unreferenced}} tag at the beginning of this universal_hashing article since, in its present form, the article does indeed cite the main basic references on universal hashing.CharlesHBennett (talk) 11:17, 23 September 2008 (UTC)
In the Examples section, the symbols H and h are used without having been defined or introduced. I think h is supposed to refer to the function called f in the definitions (a clue: f is defined but never used), and H would be the family of all such h. But I'm not editing the page because I don't know enough about the subject to be confident that my interpretation is correct. 217.109.185.32 (talk) 16:25, 7 December 2009 (UTC)
The section on avoiding modular arithmetic was heavily edited. The previous discussion suggested that the function h_a(x) had a collision probability of 1/m; it doesn't. It has a collision probability as high as 2/m. A reference was added to the multiply-add-shift scheme, which does have collision probability 1/m. Finally, the discussion of hashing with the uniform difference property was removed, since the same issue with the analysis of h_a(x) applies to this scheme. At best, it has some almost-uniform difference property. (Patmorin) Tue May 17 10:53:11 EDT 2011 —Preceding undated comment added 14:53, 17 May 2011 (UTC).
multiply-shift
Something doesn't seem right with one equation for "the multiply-shift scheme described by Dietzfelbinger et al. in 1997.", which currently states:
-
(unsigned) (a*x+b) >> (w-M)
- ... where ... is a random non-negative integer with .
The first step is to multiply , right? Let's define T(ax) as the top (w-M) bits of the bottom w bits of a*x -- i.e.,
-
(unsigned) (a*x) >> (w-M)
For example, I'm using a machine where the word size is w=32 bits, and I'm trying to hash to a table of 1024 entries so M=10. When I think I'm getting too many collisions, I pick some random unsigned integer b < 2^(32-10), or in other words, b < 2^(22), and rehash -- right?
Each time I add the 22-bit integer b to that intermediate value , it directly affects the bottom 22 bits of the result and may or may not cause a carry into the higher bits. But then the hash function immediately discards those bottom 22 bits, and so I am left with either or .
So with this equation, the value of b has very little effect on the final hash value.
I expected the hash value to depend more strongly on the value of b, such that given any particular value of a and x, which leads to some particular value of T(ax), the output hash value would ideally be any possible value 0 <= h < m.
Should that equation be something maybe more like the following?:
-
(unsigned) (a*(x+b)) >> (w-M)