Graph canonization

In graph theory, a branch of mathematics, graph canonization is finding a canonical form of a graph G, which is a graph Canon(G) isomorphic to G such that Canon(H) = Canon(G) if and only if H is isomorphic to G. The canonical form of a graph is an example of a complete graph invariant.^[1]^[2] Since the vertex sets of (finite) graphs are commonly identified with the intervals of integers 1,..., n, where n is the number of the vertices of a graph, a canonical form of a graph is commonly called canonical labeling of a graph.^[3] Graph canonization is also sometimes known as graph canonicalization.

A commonly known canonical form is the lexicographically smallest graph within the isomorphism class, which is the graph of the class with lexicographically smallest adjacency matrix considered as a linear string.

Computational complexity

Clearly, the graph canonization problem is at least as computationally hard as the graph isomorphism problem. In fact, Graph Isomorphism is even AC⁰-reducible to Graph Canonization. However it is still an open question whether the two problems are polynomial time equivalent.^[2]

While existence of (deterministic) polynomial algorithms for Graph Isomorphism is still an open problem in the computational complexity theory, in 1977 Laszlo Babai reported that a simple vertex classification algorithm after only two refinement steps produces a canonical labeling of an n-vertex random graph with probability 1 − exp(−O(n)). Small modifications and added depth-first search step produce canonical labeling of all graphs in linear average time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice.^[4] This was an important breakthrough in probabilistic complexity theory which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.

The computation of the lexicographically smallest graph is NP-Hard.^[1]

For trees, a concise polynomial canonization algorithm requiring O(n) space is presented in.^[5] Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x remove the leading 0 and trailing 1 from x's label; then sort x's label along with the labels of all adjacent leaves in lexicographic order. Concatenate these sorted labels, add back a leading 0 and trailing 1, make this the new label of x, and delete the adjacent leaves. If there are two vertices remaining, concatenate their labels in lexicographic order.

Applications

Graph canonization is the essence of many graph isomorphism algorithms.

A common application of graph canonization is in graphical data mining, in particular in chemical database applications.^[6]

A number of identifiers for chemical substances, such as SMILES and InChI, designed to provide a standard and human-readable way to encode molecular information and to facilitate the search for such information in databases and on the web, use canonization step in their computation, which is essentially the canonization of the graph which represents the molecule.

Example

An example of a canonical form associated with planar-graphs that is used in the study of graph-coloring is one where vertices associated with a planar graph is deformed into blocks and certain edges are represented as lines between such blocks..^[7] This is done in such a way that the one-to-one correspondence between the planar graph being canonized and the canonical-form associated is preserved.

References

^ ^a ^b "A Logspace Algorithm for Partial 2-Tree Canonization"
^ ^a ^b "the Space Complexity of k-Tree Isomorphism"
^ Laszlo Babai, Eugene Luks, "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing, 1983, pp. 171–183.
^
L. Babai, "On the Isomorphism Problem", unpublished manuscript, 1977
- L. Babai and L. Kucera. Canonical labeling of graphs in average linear time. Proc. 20th Annual IEEE Symposium on Foundations of Computer Science (1979), 39–46.
^ Read, Ronald C. (1972). "The Coding of Various Kinds of Unlabeled Trees". Graph Theory and Computing: 153–182.
^ "Mining Graph Data", by Diane J. Cook, Lawrence B. Holder (2007) ISBN 0-470-07303-9, pp. 120–122, section 6.2.1. "Canonical Labeling"
^ "Planar Graph Homologies and the 4-Color Theorem.", Pillay.J (2008) HAL, pp. 3–4, Definition 0.0

Yuri Gurevich, From Invariants to Canonization, The Bull. of Euro. Assoc. for Theor. Computer Sci., no. 63, 1997.[1]

[p2tree-1] "A Logspace Algorithm for Partial 2-Tree Canonization"

[ktree-2] "the Space Complexity of k-Tree Isomorphism"

[3] Laszlo Babai, Eugene Luks, "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing, 1983, pp. 171–183.

[4] L. Babai, "On the Isomorphism Problem", unpublished manuscript, 1977
L. Babai and L. Kucera. Canonical labeling of graphs in average linear time. Proc. 20th Annual IEEE Symposium on Foundations of Computer Science (1979), 39–46.

[5] L. Babai and L. Kucera. Canonical labeling of graphs in average linear time. Proc. 20th Annual IEEE Symposium on Foundations of Computer Science (1979), 39–46.

[5] Read, Ronald C. (1972). "The Coding of Various Kinds of Unlabeled Trees". Graph Theory and Computing: 153–182.

[6] "Mining Graph Data", by Diane J. Cook, Lawrence B. Holder (2007) ISBN 0-470-07303-9, pp. 120–122, section 6.2.1. "Canonical Labeling"

[7] "Planar Graph Homologies and the 4-Color Theorem.", Pillay.J (2008) HAL, pp. 3–4, Definition 0.0

[1]

[2]

[3]

[4]

[5]

[6]

[7]