
Partition refinement


In the design of algorithms, partition refinement is a technique for representing a partition of a set as a data structure that allows the partition to be refined by splitting its sets into a larger number of smaller sets. In that sense it is dual to the union-find data structure, which also maintains a partition into disjoint sets but in which the operations merge pairs of sets together. More specifically, a partition refinement algorithm maintains a family of disjoint sets Si; at the start of the algorithm, this is just a single set containing all the elements in the data structure. At each step of the algorithm, a set X is presented to the algorithm, and each set Si that contains members of X is replaced by two sets, the intersection Si ∩ X and the difference Si \ X. Partition refinement forms a key component of several efficient algorithms on graphs and finite automata.[1][2][3]

Data structure

A partition refinement algorithm may be implemented by maintaining an object for each set that stores a collection of its elements, in a form such as a doubly linked list that allows for rapid deletion, and an object for each element that points to the set containing it. Alternatively, an efficient data structure for representing the (still to be refined) partition was presented by Valmari and Lehtinen,[4] based on a suggestion of Knuutila:[5] the element identifiers are stored in a single array, ordered so that all identifiers of elements in the first set come first, then all identifiers of elements in the second set, and so on. Each set object stores the array indices at which its elements start and end.

In addition, each set object should have an instance variable that may point to a second set into which it is being split. To perform a refinement operation, loop through the elements of X. For each element x, find the set Si containing x, and check whether a second set for Si has already been formed. If not, create the second set and add Si to a list L of the sets that are split by the operation. Then, regardless of whether a new second set was formed, remove x from Si and add it to the second set.

Finally, after all elements of X have been processed in this way, loop through L, separating each current set Si from the second set that has been split from it, and report both of these sets as newly formed by the refinement operation. In the array-based data structure of Knuutila and Valmari, this is implemented by reordering the element identifiers within the array slice originally reserved for Si: the elements of the second set are moved towards the end of the slice and the remaining elements towards the beginning, after which the start and end indices of both sets are updated.
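The sketch below, in Python, follows this outline under some simplifying assumptions: the elements are the integers 0, …, n − 1 and the set X passed to a refinement operation contains no duplicates. It moves the split-off part of each set to the front of its array slice rather than to the end, which is an equivalent variation; it is meant as an illustration of the technique rather than a reproduction of the exact structure of Valmari and Lehtinen or Knuutila.

```python
class PartitionRefinement:
    """Array-based partition of the elements 0..n-1 (a sketch)."""

    def __init__(self, n):
        # elements[i]   : the element stored at array position i
        # position[x]   : the array position of element x
        # set_of[x]     : the index of the set currently containing x
        # bounds[s]     : [start, end) slice of the array belonging to set s
        # split_count[s]: number of elements of s moved out so far during
        #                 the current refine call (0 when s is not being split)
        self.elements = list(range(n))
        self.position = list(range(n))
        self.set_of = [0] * n
        self.bounds = [[0, n]]
        self.split_count = [0]

    def refine(self, X):
        """Split each set S meeting X into S ∩ X and S \\ X, in time O(|X|).

        Returns a list of (intersection_set, difference_set) index pairs,
        one for each set that was actually split."""
        touched = []                          # the list L from the text
        for x in X:
            s = self.set_of[x]
            if self.split_count[s] == 0:
                touched.append(s)
            # Swap x with the first element of s's slice that has not yet
            # been marked as belonging to S ∩ X.
            j = self.bounds[s][0] + self.split_count[s]
            i = self.position[x]
            y = self.elements[j]
            self.elements[i], self.elements[j] = y, x
            self.position[x], self.position[y] = j, i
            self.split_count[s] += 1

        result = []
        for s in touched:
            start, end = self.bounds[s]
            mid = start + self.split_count[s]
            self.split_count[s] = 0
            if mid == end:
                continue                      # every element of s was in X
            # The slice [start, mid) becomes a new set holding S ∩ X,
            # while s keeps [mid, end) holding S \ X.
            new_set = len(self.bounds)
            self.bounds.append([start, mid])
            self.split_count.append(0)
            self.bounds[s][0] = mid
            for i in range(start, mid):
                self.set_of[self.elements[i]] = new_set
            result.append((new_set, s))
        return result
```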

The time to perform a refinement operation in this way is O(|X|), independent of the number of elements or the total number of sets in the data structure.
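For example, with the hypothetical PartitionRefinement class sketched above, two successive refinement operations on the set {0, …, 7} behave as follows; each call reports the (intersection, difference) pairs for the sets it split.

```python
p = PartitionRefinement(8)
print(p.refine([2, 3, 5]))    # splits {0,...,7} into {2, 3, 5} and the rest
print(p.refine([3, 4]))       # splits both of the current sets again
for s, (start, end) in enumerate(p.bounds):
    print(s, [p.elements[i] for i in range(start, end)])   # final partition
```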

Applications

Possibly the first application of partition refinement was in an algorithm by Hopcroft (1971) for DFA minimization. In this problem, one is given as input a deterministic finite automaton, and must find an equivalent automaton with as few states as possible. The algorithm maintains a partition of the states of the input automaton into subsets, with the property that any two states in different subsets must be mapped to different states of the output automaton; initially, there are two subsets, one containing all the accepting states and one containing the remaining states. At each step, one of the subsets Si and one of the input symbols x of the automaton are chosen, and each subset of states is refined into the states whose transition labeled x leads into Si and the states whose x-transition leads somewhere else. When a set Si that has already been chosen is split by a refinement, only one of the two resulting sets (the smaller of the two) needs to be chosen again; in this way, each state participates in the sets X for O(s log n) refinement steps and the overall algorithm takes time O(ns log n), where n is the number of states of the input automaton and s is the size of the alphabet.[6]
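The refinement loop of this algorithm can be illustrated as below, using plain Python sets rather than the array-based structure described earlier; the sketch keeps the smaller-half rule but not the bookkeeping needed for the O(ns log n) bound, and the function and parameter names are assumptions made for this example rather than part of Hopcroft's formulation.

```python
from collections import defaultdict

def minimize_partition(states, symbols, transitions, accepting):
    """Coarsest partition of `states` into DFA-equivalence classes (a sketch).

    transitions[(state, symbol)] gives the target state;
    accepting is the set of accepting states."""
    # Inverse transitions: which states reach t on symbol c?
    inverse = defaultdict(set)
    for (s, c), t in transitions.items():
        inverse[(t, c)].add(s)

    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]
    worklist = list(partition)            # blocks still usable as splitters

    while worklist:
        splitter = worklist.pop()
        for c in symbols:
            # X = states whose c-transition leads into the splitter.
            X = set()
            for t in splitter:
                X |= inverse[(t, c)]
            refined = []
            for block in partition:
                inter, diff = block & X, block - X
                if inter and diff:
                    refined += [inter, diff]
                    if block in worklist:          # replace it by both halves
                        worklist.remove(block)
                        worklist += [inter, diff]
                    else:                          # smaller-half rule
                        worklist.append(min(inter, diff, key=len))
                else:
                    refined.append(block)
            partition = refined
    return partition
```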

Partition refinement was applied by Sethi (1976) in an efficient implementation of the Coffman–Graham algorithm for parallel scheduling. Sethi showed that it could be used to construct a lexicographically ordered topological sort of a given directed acyclic graph in linear time; this lexicographic topological ordering is one of the key steps of the Coffman–Graham algorithm. In this application, the elements of the disjoint sets are vertices of the input graph and the sets X used to refine the partition are sets of neighbors of vertices. Since the total number of neighbors of all vertices is just the number of edges in the graph, the algorithm takes time linear in the number of edges, and hence linear in its input size.[7]

Partition refinement also forms a key step in lexicographic breadth-first search, a graph search algorithm with applications in the recognition of chordal graphs and several other important classes of graphs. Again, the disjoint set elements are vertices and the sets X are sets of neighbors, so the algorithm takes linear time.[8][9]
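A compact sketch of this procedure is given below: it maintains an ordered sequence of sets and refines it by the neighborhood of each visited vertex. It uses plain Python sets for clarity and omits the constant-time bookkeeping that the linear-time bound requires; the function name and adjacency format are assumptions for the example.

```python
def lex_bfs(vertices, neighbors):
    """Return a lexicographic BFS ordering; neighbors[v] is the set of
    neighbors of vertex v (a sketch, not a linear-time implementation)."""
    order = []
    sequence = [set(vertices)]            # ordered sequence of nonempty sets
    while sequence:
        v = sequence[0].pop()             # visit any vertex of the first set
        if not sequence[0]:
            sequence.pop(0)
        order.append(v)
        # Refine: replace each set S by S ∩ N(v) followed by S \ N(v), so
        # neighbors of v move ahead of non-neighbors at the same level.
        refined = []
        for S in sequence:
            inter, diff = S & neighbors[v], S - neighbors[v]
            if inter:
                refined.append(inter)
            if diff:
                refined.append(diff)
        sequence = refined
    return order

# Example on the path a-b-c-d; ties are broken arbitrarily by set.pop().
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(lex_bfs(list(adj), adj))
```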


References

  1. ^ Paige, Robert; Tarjan, Robert E. (1987), "Three partition refinement algorithms", SIAM Journal on Computing, 16 (6): 973–989, doi:10.1137/0216062, MR 0917035.
  2. ^ Habib, Michel; Paul, Christophe; Viennot, Laurent (1999), "Partition refinement techniques: an interesting algorithmic tool kit", International Journal of Foundations of Computer Science, 10 (2): 147–170, doi:10.1142/S0129054199000125, MR 1759929.
  3. ^ Habib, Michel; Paul, Christophe; Viennot, Laurent (1998), "A synthesis on partition refinement: a useful routine for strings, graphs, Boolean matrices and automata", STACS 98 (Paris, 1998), Lecture Notes in Computer Science, vol. 1373, Springer-Verlag, pp. 25–38, doi:10.1007/BFb0028546, MR 1650757.
  4. ^ Valmari, Antti; Lehtinen, Petri (2008), "Efficient minimization of DFAs with partial transition functions", in Albers, Susanne; Weil, Pascal (eds.), 25th International Symposium on Theoretical Aspects of Computer Science (STACS 2008), Leibniz International Proceedings in Informatics (LIPIcs), vol. 1, Schloss Dagstuhl: Leibniz-Zentrum fuer Informatik, pp. 645–656, doi:10.4230/LIPIcs.STACS.2008.1328, ISBN 978-3-939897-06-4, ISSN 1868-8969.
  5. ^ Knuutila, Timo (2001), "Re-describing an algorithm by Hopcroft", Theoretical Computer Science, 250 (1–2): 333–363, doi:10.1016/S0304-3975(99)00150-4, ISSN 0304-3975.
  6. ^ Hopcroft, John (1971), "An n log n algorithm for minimizing states in a finite automaton", Theory of machines and computations (Proc. Internat. Sympos., Technion, Haifa, 1971), New York: Academic Press, pp. 189–196, MR 0403320.
  7. ^ Sethi, Ravi (1976), "Scheduling graphs on two processors", SIAM Journal on Computing, 5 (1): 73–82, doi:10.1137/0205005, MR 0398156.
  8. ^ Rose, D. J.; Tarjan, R. E.; Lueker, G. S. (1976), "Algorithmic aspects of vertex elimination on graphs", SIAM Journal on Computing, 5 (2): 266–283, doi:10.1137/0205021.
  9. ^ Corneil, Derek G. (2004), "Lexicographic breadth first search – a survey", Graph-Theoretic Methods in Computer Science, Lecture Notes in Computer Science, vol. 3353, Springer-Verlag, pp. 1–19.