Jump to content

Talk:Universal hashing

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Cebus (talk | contribs) at 08:42, 31 March 2016 (uniform distance property: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject iconMathematics C‑class Mid‑priority
WikiProject iconThis article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
CThis article has been rated as C-class on Wikipedia's content assessment scale.
MidThis article has been rated as Mid-priority on the project's priority scale.
WikiProject iconComputer science C‑class Mid‑importance
WikiProject iconThis article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
CThis article has been rated as C-class on Wikipedia's content assessment scale.
MidThis article has been rated as Mid-importance on the project's importance scale.
Things you can help WikiProject Computer science with:

The two articles on universal hashing should clearly be merged. Any volunteer? Pagh 11:23, 24 April 2007 (UTC)[reply]

I don't see two articles on universal hashing, just this one. One might merge this universal_hashing article into the main article hash_function, but since the present article is reasonably coherent and self-contained, I don't think that's urgent. I have removed the {{Unreferenced}} tag at the beginning of this universal_hashing article since, in its present form, the article does indeed cite the main basic references on universal hashing.CharlesHBennett (talk) 11:17, 23 September 2008 (UTC)[reply]

In the Examples section, the symbols H and h are used without having been defined or introduced. I think h is supposed to refer to the function called f in the definitions (a clue: f is defined but never used), and H would be the family of all such h. But I'm not editing the page because I don't know enough about the subject to be confident that my interpretation is correct. 217.109.185.32 (talk) 16:25, 7 December 2009 (UTC)[reply]

The section on avoiding modular arithmetic was heavily edited. The previous discussion suggested that the function h_a(x) had a collision probability of 1/m; it doesn't. It has a collision probability as high as 2/m. A reference was added to the multiply-add-shift scheme, which does have collision probability 1/m. Finally, the discussion of hashing with the uniform difference property was removed, since the same issue with the analysis of h_a(x) applies to this scheme. At best, it has some almost-uniform difference property. (Patmorin) Tue May 17 10:53:11 EDT 2011 —Preceding undated comment added 14:53, 17 May 2011 (UTC).[reply]

multiply-shift

Something doesn't seem right with one equation for "the multiply-shift scheme described by Dietzfelbinger et al. in 1997.", which currently states:

(unsigned) (a*x+b) >> (w-M)
... where ... is a random non-negative integer with .

The first step is to multiply , right? Let's define T(ax) as the top (w-M) bits of the bottom w bits of a*x -- i.e.,

(unsigned) (a*x) >> (w-M)

For example, I'm using a machine where the word size is w=32 bits, and I'm trying to hash to a table of 1024 entries so M=10. When I think I'm getting too many collisions, I pick some random unsigned integer b < 2^(32-10), or in other words, b < 2^(22), and rehash -- right?

Each time I add the 22-bit integer b to that intermediate value , it directly affects the bottom 22 bits of the result and may or may not cause a carry into the higher bits. But then the hash function immediately discards those bottom 22 bits, and so I am left with either or .

So with this equation, the value of b has very little effect on the final hash value.

I expected the hash value to depend more strongly on the value of b, such that given any particular value of a and x, which leads to some particular value of T(ax), the output hash value would ideally be any possible value 0 <= h < m.

Should that equation be something maybe more like the following?:

(unsigned) (a*(x+b)) >> (w-M)

--DavidCary (talk) 03:28, 26 March 2015 (UTC)[reply]

uniform distance property

The term uniform distance property is used, but not explained. If it is the same as uniform difference property, then it should be changed to match. Cebus (talk) 08:42, 31 March 2016 (UTC)[reply]