Dynamic perfect hashing

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.^[1]^[2]^[3] This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set, S, of elements.

Details

In this method, the entries that hash to the same slot of the first-level table are organized as a separate second-level hash table. If k entries hash to a given slot, its second-level table is allocated with k² slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function). Therefore, the look-up cost is guaranteed to be O(1) in the worst case.^[2]

function Locate(x) is
       j = h(x);
       if (position h_j(x) of subtable T_j contains x (not deleted))
          return (x is in S);
       end if
       else 
          return (x is not in S);
       end else
end
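
The two-level lookup can be illustrated with a minimal static sketch in Python. It is not the construction from the cited papers: it assumes distinct non-negative integer keys below a fixed prime P, uses the family ((a*x + b) mod P) mod m as the universal family, and omits the space condition and delete markers. The names P, make_hash, build_subtable, build and locate are hypothetical.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be ints in [0, P)

def make_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m at random."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_subtable(keys, rng):
    """Give k keys k*k cells and retry until the hash is collision-free."""
    size = max(1, len(keys) ** 2)
    while True:
        h_j = make_hash(size, rng)
        cells = [None] * size
        for k in keys:
            i = h_j(k)
            if cells[i] is not None:   # collision: draw a new function
                break
            cells[i] = k
        else:                          # no collision for any key
            return h_j, cells

def build(keys, rng):
    """First level: split the keys into len(keys) buckets."""
    n = max(1, len(keys))
    h = make_hash(n, rng)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h(k)].append(k)
    return h, [build_subtable(b, rng) for b in buckets]

def locate(table, x):
    """Two hash evaluations and one comparison, regardless of table size."""
    h, subtables = table
    h_j, cells = subtables[h(x)]
    return cells[h_j(x)] == x

table = build([3, 17, 42, 1000, 99999], random.Random(0))
print(locate(table, 42), locate(table, 5))
```

Because every subtable is collision-free, locate never probes more than one cell, which is where the worst-case O(1) bound comes from.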

Although each second-level table requires space quadratic in the number of keys it holds, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected O(n) space, since bucket sizes are small with high probability.^[1]
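
The space claim can be checked empirically with a short Python sketch. It hashes n keys into n buckets with one randomly drawn function and sums the squared bucket sizes; for a universal family the expectation of this sum is below 2n. The modulus P, the key range and the helper names bucket_sizes and quadratic_space are assumptions made for illustration.

```python
import random

P = (1 << 61) - 1  # prime modulus (an assumption; keys are ints in [0, P))

def bucket_sizes(keys, n_buckets, rng):
    """Hash the keys with one random function and count entries per bucket."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    sizes = [0] * n_buckets
    for x in keys:
        sizes[((a * x + b) % P) % n_buckets] += 1
    return sizes

def quadratic_space(sizes):
    """Total second-level cells if a bucket with k entries gets k*k cells."""
    return sum(k * k for k in sizes)

rng = random.Random(0)
n = 10_000
keys = rng.sample(range(10**9), n)      # n distinct keys
sizes = bucket_sizes(keys, n, rng)
print(n, quadratic_space(sizes))        # compare the second number with 2 * n
```

A single run can land above or below 2n; the bound is on the expectation over the choice of hash function.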

If a collision occurs during the insertion of a new entry x at j, the bucket's second-level table T_j is rebuilt with a different randomly selected hash function. Because the load factor of the second-level table is kept low (1/k), rebuilding is infrequent, and the amortized cost of insertions is O(1).^[2] Every insertion also increments the global operation counter, count.

function Insert(x) is
       count = count + 1;
       if (count > M) 
          FullRehash(x);
       end if
       else
          j = h(x);
          if (position h_j(x) of subtable T_j contains x)
             if (x is marked deleted) 
                remove the delete marker;
             end if
          end if
          else
             b_j = b_j + 1;
             if (b_j <= m_j) 
                if (position h_j(x) of T_j is empty)
                   store x in position h_j(x) of T_j;
                end if
                else
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end else
             end if
             else
                m_j = 2 * max{1, m_j};
                s_j = 2 * m_j * (m_j - 1);
                if (condition (**) is still satisfied) 
                   Allocate s_j cells for T_j;
                   Put all unmarked elements of T_j in list L_j;
                   Append x to list L_j;
                   b_j = length of L_j;
                   repeat 
                      h_j = randomly chosen function in H_sj;
                   until h_j is injective on the elements of L_j;
                   for all y on list L_j
                      store y in position h_j(y) of T_j;
                   end for
                end if
                else
                   FullRehash(x);
                end else
             end else
          end else
       end else
end
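
The per-bucket part of Insert can be sketched in Python as follows. This is a simplified illustration, not the full algorithm: it covers one subtable T_j only and leaves out the global counter, the delete markers and the check of condition (**). The class Bucket, its fields and the modulus P are hypothetical names; b, m and cells correspond to b_j, m_j and the s_j cells of T_j.

```python
import random

P = (1 << 61) - 1  # prime modulus; keys are assumed to be ints in [0, P)

class Bucket:
    """One second-level table T_j: b elements, capacity m, 2*m*(m-1) cells."""

    def __init__(self, rng):
        self.rng = rng
        self.b = 0
        self.m = 0
        self.cells = []
        self.a, self.c = 1, 0          # parameters of the current h_j

    def _pos(self, x):
        return ((self.a * x + self.c) % P) % len(self.cells)

    def _keys(self):
        return [k for k in self.cells if k is not None]

    def _rebuild(self, keys):
        """Redraw h_j until it is injective on keys, then store them."""
        s = len(self.cells)
        while True:
            self.a = self.rng.randrange(1, P)
            self.c = self.rng.randrange(P)
            cells = [None] * s
            for k in keys:
                i = self._pos(k)
                if cells[i] is not None:
                    break
                cells[i] = k
            else:
                self.cells = cells
                return

    def contains(self, x):
        return bool(self.cells) and self.cells[self._pos(x)] == x

    def insert(self, x):
        if self.contains(x):
            return
        self.b += 1
        if self.b <= self.m:
            i = self._pos(x)
            if self.cells[i] is None:
                self.cells[i] = x                   # free cell: store directly
            else:
                self._rebuild(self._keys() + [x])   # collision: new h_j, same size
        else:
            keys = self._keys() + [x]
            self.m = 2 * max(1, self.m)             # capacity exceeded: double m_j
            self.cells = [None] * (2 * self.m * (self.m - 1))
            self._rebuild(keys)

bucket = Bucket(random.Random(1))
for k in range(50):
    bucket.insert(k * 7919)
print(bucket.b, bucket.m, len(bucket.cells))
```

Doubling m_j means the subtable is reallocated only O(log b_j) times as the bucket grows, and the quadratic cell count keeps each search for an injective function short in expectation.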

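A full rebuild of the table for a set S of size n takes expected O(n) time.^[2] In the pseudocode, condition (**) refers to the space condition of the original analysis, which bounds the total size of the subtables by a linear function of M.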

function FullRehash(x) is
       Put all unmarked elements of T in list L;
       if (x is in U) 
          append x to L;
       end if
       count = length of list L;
       M = (1 + c) * max{count, 4};
       repeat 
          h = randomly chosen function in H_s(M);
          for all j < s(M) 
             form a list L_j of the elements y of L with h(y) = j;
             b_j = length of L_j; 
             m_j = 2 * b_j; 
             s_j = 2 * m_j * (m_j - 1);
          end for
       until condition (**) is satisfied;
       for all j < s(M) 
          Allocate space s_j for subtable T_j;
          repeat 
             h_j = randomly chosen function in H_sj;
          until h_j is injective on the elements of list L_j;
          for all y on list L_j 
             store y in position h_j(y) of T_j;
          end for
       end for
end
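
A Python sketch of the rebuild is given below. Two points are assumptions rather than facts taken from this article: condition (**) is taken to be the bound sum of s_j <= 32 * M² / s(M) + 4 * M, which should be checked against Dietzfelbinger et al., and s(M) is set to M for simplicity. The helper names random_hash, perfect_subtable, full_rehash and locate are hypothetical, and keys are assumed to be distinct integers in [0, P).

```python
import random

P = (1 << 61) - 1  # prime modulus; keys are assumed to be ints in [0, P)

def random_hash(size, rng):
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size

def perfect_subtable(keys, size, rng):
    """Retry until a random h_j is injective on keys; empty buckets get no cells."""
    if size == 0:
        return None, []
    while True:
        h_j = random_hash(size, rng)
        cells = [None] * size
        for k in keys:
            i = h_j(k)
            if cells[i] is not None:
                break
            cells[i] = k
        else:
            return h_j, cells

def full_rehash(keys, rng, c=1.0):
    count = len(keys)
    M = int((1 + c) * max(count, 4))
    s_M = M                                  # assumption: s(M) = M
    limit = 32 * M * M / s_M + 4 * M         # assumed form of condition (**)
    while True:
        h = random_hash(s_M, rng)
        lists = [[] for _ in range(s_M)]
        for x in keys:
            lists[h(x)].append(x)
        # m_j = 2 * b_j and s_j = 2 * m_j * (m_j - 1), as in the pseudocode
        sizes = [2 * (2 * len(l)) * (2 * len(l) - 1) for l in lists]
        if sum(sizes) <= limit:
            break
    subtables = [perfect_subtable(l, s, rng) for l, s in zip(lists, sizes)]
    return M, h, subtables

def locate(table, x):
    _, h, subtables = table
    h_j, cells = subtables[h(x)]
    return h_j is not None and cells[h_j(x)] == x

table = full_rehash(list(range(0, 3000, 3)), random.Random(7))
print(table[0], locate(table, 2997), locate(table, 1))
```

The outer loop redraws the first-level function until the subtables fit in linear space; only then are the second-level functions chosen, so a bad first-level draw never costs more than one pass over the keys.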

Deletion of x simply flags x as deleted without removal and increments count. In the case of both insertions and deletions, if count reaches a threshold M the entire table is rebuilt, where M is some constant multiple of the size of S at the start of a new phase. Here phase refers to the time between full rebuilds. The amortized cost of delete is O(1).^[2]

function Delete(x) is
       count = count + 1;
       j = h(x);
       if (position h_j(x) of subtable T_j contains x)
          mark x as deleted;
       end if
       else 
          return (x is not a member of S);
       end else
       if (count >= M)
          FullRehash(-1);
       end if
end

Here the argument -1 passed to FullRehash in Delete(x) stands for an element that is not in the universe U, so the rebuild adds no new element.
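
The interplay of delete markers, the operation counter and phases can be sketched in Python. To keep the example short, a plain dict stands in for the two-level table, so this illustrates only the bookkeeping and not the hashing itself. The class LazyDeleteSet and its fields are hypothetical; unlike the pseudocode, delete here checks the counter even when x is absent.

```python
class LazyDeleteSet:
    """Phase bookkeeping only: a dict stands in for the two-level table."""

    def __init__(self, keys=(), c=1.0):
        self.c = c
        self.rebuilds = 0
        self._full_rehash(list(keys))

    def _full_rehash(self, live):
        # Stand-in for FullRehash: keep unmarked elements, start a new phase.
        self.marked = {x: False for x in live}   # x -> delete marker
        self.count = len(live)
        self.M = int((1 + self.c) * max(self.count, 4))
        self.rebuilds += 1

    def _live(self):
        return [x for x, dead in self.marked.items() if not dead]

    def locate(self, x):
        return self.marked.get(x, True) is False

    def insert(self, x):
        self.count += 1
        if self.count > self.M:
            live = self._live()
            if not self.locate(x):
                live.append(x)
            self._full_rehash(live)
        else:
            self.marked[x] = False   # store x, or clear its delete marker

    def delete(self, x):
        self.count += 1
        found = self.locate(x)
        if found:
            self.marked[x] = True    # flag only; the entry stays in place
        if self.count >= self.M:
            self._full_rehash(self._live())
        return found

s = LazyDeleteSet([1, 2, 3, 4])      # count = 4, M = 8
for x in (1, 2, 3, 9):               # the fourth operation reaches M
    s.delete(x)
print(s.rebuilds, s.count, list(s.marked))
```

Flagged entries keep occupying their cells until the next rebuild, which is why deletions must advance the counter: otherwise a long run of deletes could leave the table far larger than the live set.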

References

[1] Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (Jun. 1984), 538-544. http://portal.acm.org/citation.cfm?id=1884#
[2] Dietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM J. Comput. 23, 4 (Aug. 1994), 738-761. http://portal.acm.org/citation.cfm?id=182370#
[3] Demaine, E., and Lind, J. 2003. 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Spring 2003. http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L2/lecture2.pdf
