Jump to content

Dynamic perfect hashing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 99.246.173.77 (talk) at 06:36, 23 April 2011 (Added pseudo-code and additional details with expected values). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.[1][2][3] This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set, S, of elements.

Details

In this method, the entries that hash to the same slot of the table are organized as separate second-level hash table. If there are k entries in this set S, the second-level table is allocated with k2 slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function). Therefore, the look-up cost is guaranteed to be O(1) in the worst-case.[2]

function Locate(x) is
       j = h(x);
       if (position h(x) of subtable Tj contains x (not deleted))
          return (x is in S);
       end if
       else 
          return (x is not in S);
       end else
end

Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected O(n) space, since bucket sizes are small with high probability.[1]

If a collision occurs during the insertion of a new entry x at j, the bucket's second-level table Tj is rebuilt with a different randomly-selected hash function. Because the load factor of the second-level table is kept low (1/k), rebuilding is infrequent, and the amortized cost of insertions is O(1).[2] During an insertion operation the global operations counter, count, is incremented.

function Insert(x) is
       count = count + 1;
       if (count > M) 
          FullRehash(x);
       end if
       else
          j = h(x);
          if (Position hj(x) of subtable Tj contains x)
             if (x is marked deleted) 
                remove the delete marker;
             end if
          end if
          else
             bj = bj + 1;
             if (bj <= mj) 
                if position hj(x) of Tj is empty 
                   store x in position hj(x) of Tj;
                end if
                else
                   Put all unmarked elements of Tj in list Lj;
                   Append x to list Lj;
                   bj = length of Lj;
                   repeat 
                      hj = randomly chosen function in Hsj;
                   until hj is injective on the elements of Lj;
                   for all y on list Lj
                      store y in position hj(y) of Tj;
                   end for
                end else
             end if
             else
                mj = 2 * max{1, mj};
                sj = 2 * mj * (mj - 1);
                if (condition (**) is still satisfied) 
                   Allocate sj cells for Tj;
                   Put all unmarked elements of Tj in list Lj;
                   Append x to list Lj;
                   bj = length of Lj;
                   repeat 
                      hj = randomly chosen function in Hsj;
                   until hj is injective on the elements of Lj;
                   for all y on list Lj
                      store y in position hj(y) of Tj;
                   end for
                end if
                else
                   FullRehash(x);
                end else
             end else
          end else
       end else
end

The expected time for a full rebuild of the table of S with size n is O(n).[2]

function FullRehash(x) is
       Put all unmarked elements of T in list L;
       if (x is in U) 
          append x to L;
       end if
       count = length of list L;
       M = (1 + c) * max{count, 4};
       repeat 
          h = randomly chosen function in Hs(M);
          for all j < s(M) 
             form a list Lj for h(x) = j;
             bj = length of Lj; 
             mj = 2 * bj; 
             sj = 2 * mj * (mj - 1);
       until condition (**) is satisfied;
       for all j < s(M) 
          Allocate space sj for subtable Tj;
          repeat 
             hj = randomly chosen function in Hsj;
             until hj is injective on the elements of list Lj;
       end for
       for all x on list Lj 
          store x in position hj(x) of Tj;
       end for
end

Deletion of x simply flags x as deleted without removal and increments count. In the case of both insertions and deletions, if count reaches a threshold M the entire table is rebuilt, where M is some constant multiple of the size of S at the start of a new phase. Here phase refers to the time between full rebuilds. The amortized cost of delete is O(1).[2]

function Delete(x) is
       count = count + 1
       j = h(x);
       if position h(x) of subtable Tj contains x
          mark x as deleted
       end if
       else 
          return (x is not a member of S);
       end else
       if (count >= M)
          FullRehash(-1)
       end if
end

Note that here the -1 in "Delete(x)" refers to an element which is not in U.

References

  1. ^ a b Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544 http://portal.acm.org/citation.cfm?id=1884#
  2. ^ a b c d e Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#
  3. ^ Erik Demaine, Jeff Lind. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf