Minimum spanning tree

Given a connected, undirected graph, a spanning tree of that graph is a subgraph which is a tree and connects all the vertices together. A single graph can have many different spanning trees. We can also assign a weight to each edge, which is a number representing how unfavorable it is, and use this to assign a weight to a spanning tree by computing the sum of the weights of the edges in that spanning tree. A minimum spanning tree (MST) or minimum weight spanning tree is then a spanning tree with weight less than or equal to the weight of every other spanning tree. More generally, any undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of minimum spanning trees for its connected components.
One example would be a cable TV company laying cable to a new neighborhood. If it is constrained to bury the cable only along certain paths, then there would be a graph representing which points are connected by those paths. Some of those paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects to every house. There might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost.
Properties
Possible multiplicity
There may be several minimum spanning trees of the same weight having a minimum number of edges; in particular, if all the edge weights of a given graph are the same, then every spanning tree of that graph is minimum. If there are n vertices in the graph, then each tree has n-1 edges.
Uniqueness
If each edge has a distinct weight then there will only be one, unique minimum spanning tree. This can be proved by induction or contradiction. This is true in many realistic situations, such as the cable TV company example above, where it's unlikely any two paths have exactly the same cost. This generalizes to spanning forests as well.
A proof of uniqueness by contradiction is as follows.[1] Say we have an algorithm that finds an MST (which we will call A) based on the structure of the graph and the order of the edges when ordered by weight. (Such algorithms do exist, see below.) Assume for the moment that this MST is not unique and that there is another spanning tree, B, with equal weight. Let e be an edge that is in A but not in B. Then B should include at least one edge e' that is not in A. Assume for the moment that the weight of e is less than that of e'. Recalling that B is a spanning tree, it is clear that the edge set {e} B must contain a cycle. Substituting e' with e in B yields the spanning tree {e} B - {e'} which has a smaller weight compared to B, thus leading to a contradiction since the latter was assumed to be a MST. If the weight of e is larger than that of e', a similar argument involving tree {e'} A - {e} also leads to a contradiction. Thus, we conclude that the assumption that there can be a second MST was false.
Minimum-cost subgraph
If the weights are non-negative, then a minimum spanning tree is in fact the minimum-cost subgraph connecting all vertices, since subgraphs containing cycles necessarily have more total weight.
Cycle property
For any cycle C in the graph, if the weight of an edge e of C is larger than the weights of other edges of C, then this edge cannot belong to an MST. Assuming the contrary, i.e. that e belongs to an MST T1, then deleting e will break T1 into two subtrees with the two ends of e in different subtrees. The remainder of C reconnects the subtrees, hence there is an edge f of C with ends in different subtrees, i.e., it reconnects the subtrees into a tree T2 with weight less than that of T1, because the weight of f is less than the weight of e.
Cut property
For any cut C in the graph, if the weight of an edge e of C is smaller than the weights of other edges of C, then this edge belongs to all MSTs of the graph. Indeed, assume the contrary, i.e., e does not belong to an MST T1. Then adding e to T1 will produce a cycle, which must have another edge e2 from T1 in the cut C. Replacing e2 with e would produce a tree T1 of smaller weight.
Minimum-cost edge
If the edge of a graph with the minimum cost e is unique, then this edge is included in any MST. Indeed, if e was not included in the MST, removing any of the (larger cost) edges in the cycle formed after adding e to the MST, would yield a spanning tree of smaller weight.
Algorithms
The first algorithm for finding a minimum spanning tree was developed by Czech scientist Otakar Borůvka in 1926 (see Borůvka's algorithm). Its purpose was an efficient electrical coverage of Moravia. There are now two algorithms commonly used, Prim's algorithm and Kruskal's algorithm. All three are greedy algorithms that run in polynomial time, so the problem of finding such trees is in FP, and related decision problems such as determining whether a particular edge is in the MST or determining if the minimum total weight exceeds a certain value are in P. Another greedy algorithm not as commonly used is the reverse-delete algorithm, which is the reverse of Kruskal's algorithm.
The fastest minimum spanning tree algorithm to date was developed by Bernard Chazelle, which is based on the soft heap, an approximate priority queue.[2][3] Its running time is O(m α(m,n)), where m is the number of edges, n is the number of vertices and α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time.
What is the fastest possible algorithm for this problem? That is one of the oldest open questions in computer science. There is clearly a linear lower bound, since we must at least examine all the weights. If the edge weights are integers with a bounded bit length, then deterministic algorithms are known with linear running time.[4] For general weights, there are randomized algorithms whose expected running time is linear.[5][6]
Whether there exists a deterministic algorithm with linear running time for general weights is still an open question. However, Seth Pettie and Vijaya Ramachandran have found a provably optimal deterministic minimum spanning tree algorithm, the computational complexity of which is unknown.[7]
More recently, research has focused on solving the minimum spanning tree problem in a highly parallelized manner. With a linear number of processors it is possible to solve the problem in time.[8][9] A 2003 paper "Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs" by David A. Bader and Guojing Cong demonstrates a pragmatic algorithm that can compute MSTs 5 times faster on 8 processors than an optimized sequential algorithm.[10] Typically, parallel algorithms are based on Boruvka's algorithm—Prim's and especially Kruskal's algorithm do not scale as well to additional processors.
Other specialized algorithms have been designed for computing minimum spanning trees of a graph so large that most of it must be stored on disk at all times. These external storage algorithms, for example as described in "Engineering an External Memory Minimum Spanning Tree Algorithm" by Roman Dementiev et al.,[11] can operate as little as 2 to 5 times slower than a traditional in-memory algorithm; they claim that "massive minimum spanning tree problems filling several hard disks can be solved overnight on a PC." They rely on efficient external storage sorting algorithms and on graph contraction techniques for reducing the graph's size efficiently.
The problem can also be approached in a distributed manner. If each node is considered a computer and no node knows anything except its own connected links, one can still calculate the distributed minimum spanning tree.
MST on complete graphs
Alan M. Frieze showed that given a complete graph on n vertices, with edge weights that are independent identically distributed random variables with distribution function satisfying , then as n approaches +∞ the expected weight of the MST approaches , where is the Riemann zeta function. Under the additional assumption of finite variance, Alan M. Frieze also proved convergence in probability. Subsequently, J. Michael Steele showed that the variance assumption could be dropped.
In later work, Svante Janson proved a central limit theorem for weight of the MST.
For uniform random weights in , the exact expected size of the minimum spanning tree has been computed for small complete graphs.
Vertices | Expected size | Approximative expected size |
---|---|---|
2 | 1 / 2 | 0.5 |
3 | 3 / 4 | 0.75 |
4 | 31 / 35 | 0.8857143 |
5 | 893 / 924 | 0.9664502 |
6 | 278 / 273 | 1.0183151 |
7 | 30739 / 29172 | 1.053716 |
8 | 199462271 / 184848378 | 1.0790588 |
9 | 126510063932 / 115228853025 | 1.0979027 |
Related problems
A related problem is the k-minimum spanning tree (k-MST) which is the tree that spans some subset of k vertices in the graph with minimum weight.
A set of k-smallest spanning trees is a subset of k spanning trees (out of all possible spanning trees) such that no spanning tree outside the subset has smaller weight.[12] (Note that this problem is unrelated to the k-minimum spanning tree.)
The Euclidean minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the Euclidean distance between vertices which are points in the plane (or space).
The rectilinear minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the rectilinear distance between vertices which are points in the plane (or space).
In the distributed model, where each node is considered a computer and no node knows anything except its own connected links, one can consider distributed minimum spanning tree. Mathematical definition of the problem is the same but has different approaches for solution.
Capacitated minimum spanning tree (CMST) is a tree that has a marked node (origin, or root) and each of the subtrees attached to the node contains no more than a c nodes. c is called a tree capacity. Solving CMST optimally requires exponential time, but good heuristics such as Esau-Williams and Sharma produce solutions close to optimal in polynomial time.
For directed graphs, the minimum spanning tree problem is called the Arborescence problem and can be solved in quadratic time using the Chu–Liu/Edmonds algorithm.
A maximum spanning tree is a spanning tree with weight greater than or equal to the weight of every other spanning tree. Such a tree can be found with algorithms such as Prim's or Kruskal's after multiplying the edge weights by -1 and solving the MST problem on the new graph.
The dynamic MST problem concerns the update of a previously computed MST after an edge weight change in the original graph or the insertion/deletion of a vertex.[13]
Minimum bottleneck spanning tree
A bottleneck edge is the highest weighted edge in a spanning tree.
A spanning tree is a minimum bottleneck spanning tree (or MBST) if the graph does not contain a spanning tree with a smaller bottleneck edge weight.
A MST is necessarily a MBST (provable by the cut property), but a MBST is not necessarily a MST. If the bottleneck edge in a MBST is a bridge in the graph, then all spanning trees are MBSTs.
See also
- Reverse-Delete algorithm
- Dijkstra's algorithm
- Spanning tree protocol, used in switched networks
- Minimum spanning tree-based image segmentation
- Edmonds's algorithm
- Distributed minimum spanning tree
- Prim's algorithm
- Kruskal's algorithm
- Steiner tree
References
- ^ Gallager, R.G., Humblet, P.A., and Spira, P.M. 1983. A Distributed Algorithm for Minimum-Weight Spanning Trees. ACM Trans. Program. Lang. Syst. 5, 1 (Jan. 1983), 66-77.
- ^ Chazelle, B. 2000. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM 47, 6 (Nov. 2000), 1028–1047.
- ^ Chazelle, B. 2000. The soft heap: an approximate priority queue with optimal error rate. J. ACM 47, 6 (Nov. 2000), 1012–1027.
- ^ Fredman, M. L. and Willard, D. E. 1994. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci. 48, 3 (Jun. 1994), 533–551.
- ^ Karger, D. R., Klein, P. N., and Tarjan, R. E. 1995. A randomized linear-time algorithm to find minimum spanning trees. J. ACM 42, 2 (Mar. 1995), 321–328.
- ^ Pettie, S. and Ramachandran, V. 2002. Minimizing randomness in minimum spanning tree, parallel connectivity, and set maxima algorithms. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, California, January 06 - 08, 2002). Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 713–722.
- ^ Pettie, S. and Ramachandran, V. 2002. An optimal minimum spanning tree algorithm. J. ACM 49, 1 (Jan. 2002), 16–34.
- ^ Chong, K. W., Han, Y., and Lam, T. W. 2001. Concurrent threads and optimal parallel minimum spanning trees algorithm. J. ACM 48, 2 (Mar. 2001), 297–323.
- ^ Pettie, S. and Ramachandran, V. 2002. A Randomized Time-Work Optimal Parallel Algorithm for Finding a Minimum Spanning Forest. SIAM J. Comput. 31, 6 (Jun. 2002), 1879–1895.
- ^ Bader, D. A. and Cong, G. 2006. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Parallel Distrib. Comput. 66, 11 (Nov. 2006), 1366–1378.
- ^ R. Dementiev, P. Sanders, D. Schultes, and J. Sibeyn. Engineering an external memory minimum spanning tree algorithm. In Proc. 3rd IFIP Intl. Conf. on Theoretical Computer Science, pages 195--208, 2004.
- ^ Gabow, H.N., Two algorithms for generating weighted spanning trees in order. SIAM Journal on Computing, Vol. 6, No. 1. (1977), pp. 139–150.
- ^ Spira, P.M. and Pan, A. 1975. On Finding and Updating Spanning Trees and Shortest Paths. SIAM Journal on Computing, vol. 4(3), pp. 375–380.
- Graham, R. L. and Hell, P. 1985. On the History of the Minimum Spanning Tree Problem. IEEE Ann. Hist. Comput. 7, 1 (Jan. 1985), 43–57.
- Otakar Boruvka on Minimum Spanning Tree Problem (translation of the both 1926 papers, comments, history) (2000) Jaroslav Nesetril, Eva Milková, Helena Nesetrilová. (Section 7 gives his algorithm, which looks like a cross between Prim's and Kruskal's.)
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 23: Minimum Spanning Trees, pp. 561–579.
- Eisner, Jason (1997). State-of-the-art algorithms for minimum spanning trees: A tutorial discussion. Manuscript, University of Pennsylvania, April. 78 pp.