Distributed minimum spanning tree

The distributed minimum spanning tree (MST) problem involves the construction of a minimum spanning tree by a distributed algorithm, in a network where nodes communicate by message passing. It is radically different from the classical sequential problem, although the most basic approach resembles Borůvka's algorithm. Distributed minimum spanning tree is mainly used for broadcast In particular, if the cost for a message to pass through an edge in a graph is significant, a MST can minimize the total cost for source process to communicate with all the other processes in the network.

The problem was first suggested and solved in $O(V\log V)$ time in 1983 by Gallager et. al.,^[1] where $V$ is the number of vertices in the graph. Later the solution was improved to $O(V)$ ^[2] and finally^[3] ^[4] $O({\sqrt {V}}\log ^{*}V+D)$ where D is the network, or graph diameter. Lower bound on the time complexity of the solution has been eventually shown to be^[5] $\Omega \left({\frac {\sqrt {V}}{\log V}}\right).$

Overview

The input graph $G(V,E)$ is considered to be a network, where vertices $V$ are independent computing nodes and edges $E$ are communication links. Links are weighted as in the classical problem.

At the beginning of the algorithm, nodes know only the weights of the links which are connected to them. (It is possible to consider models in which they know more, for example their neighbors' links.)

As the output of the algorithm, every node knows which of its links belong to the Minimum Spanning Tree and which do not.

MST in message-passing model

Message-passing model is one of the common used models in distributed computing. In this model, each process is modeled as a node of a graph. The communication channel between two processes is an edge of the graph.

Two common used minimum spanning tree algorithms are Prim’s algorithm and Kruskal’s algorithm. However, it is difficult to apply these two algorithms in the message-passing model. The mainly challenges are:

Both Prim’s algorithm and Kruskal’s algorithm executes sequentially. Therefore, when a process A is executing, other processes have to wait, which obviously degrades the performance.
Both Prim’s algorithm and Kruskal’s algorithm require processes to know the state of the whole graph, which is very difficult to discover in message-passing model.

Due to those difficulties, traditional MST algorithms are not suitable in message-passing model. Therefore, new category of MST algorithms was introduced for constructing a MST in message-passing model.

GHS algorithm

GHS algorithm is one of the best-known algorithms in distributed computing theory. This algorithm can construct the MST in Message-passing model.

Precondition

The algorithm should run on connected undirected graph.^[1]
The graph should have distinct finite weight assigned to each edge.^[1]
Each node initially knows the weight for each edge incident to that node.^[1]
Initially, each node is in a quiescent state and it either spontaneously awakens or awakened by receipt of any message from another node.^[1]
Messages can be transmitted independently in both directions on an edge and arrive after an unpredictable but finite delay, without error.^[1]
Each edge satisfies FIFO property.^[1]

Properties of MST

Let's define fragment of a MST G as a sub-tree of G, that is, a connected set of nodes and edges of G. There are two properties of MST:

Given a fragment of a minimum spanning tree G, let e be a minimum-weight outgoing edge of the fragment. Then joining e and its adjacent non-fragment node to the fragment yields another fragment of G.^[1]
If all the edges of a connected graph have different weights, then the MST is unique.^[1]

These two properties form the basic correctness of GHS algorithm. In general, GHS algorithm is a button-up algorithm in the sense that it starts by letting each individual node be a fragment and joining fragments in a certain way to form new fragments. This fragments-joining process repeats until there is only one fragment left and property 1 and 2 imply the result fragment is an MST.

Description of the algorithm

GHS algorithm assigns a level of each fragment, which is a non-decreasing integer with initial value 0. Each non-zero level fragment has an ID, which is the ID of the core edge in the fragment. The core edge of each fragment is selected when the fragment is constructed. During the execution of the algorithm, each node can classify each of its incident edge into three categories:^[1]^[6]

Branch: Edges are those that have already been determined to be part of the minimum spanning tree.
Rejected: Edges are those that have already been determined not to be part of the minimum spanning tree.
Basic: Edges are neither of branch edge and rejected edge.

For level 0 fragments, each awaken node will do the followings:

Chooses its minimum-weight incident edge and marks that edge as branch
Sends a message via the branch edge to notify the node on the other side.
Waiting for message from the other end of the fragment.

The edge chosen by both nodes it connects becomes the core with level 1.

For non-zero level fragment, an execution of the algorithm can be separated into three stages in each level:

Broadcast

The two nodes adjacent to the core broadcast messages to the rest of the nodes. The messages are sent via the branch edge but not via the core. Each broadcast message contains the ID and level of the fragment. At the end of this stage, each node received the new fragment ID and level.

Convertcast

In this stage, each node tries to cooperate to find the minimum weight outgoing edge of the fragment. Outgoing edges are edges connecting to other fragments. The messages sent in this stage are in the opposite direction of the broadcast stage. Initialized by all the leaves (the nodes have only one branch edge), a message is sent through the branch edge. The message contains the minimum weight (infinity if not exist) of its outgoing edge it found (the way to find the minimum outgoing edge will be discussed later). For each non-leaf node, (let the number of its branch edges is n) after receiving n-1 convertcast messages, it will pick up the minimum weight from the messages and compare it to the weights of its incident outgoing edges. The smallest weight will be sent toward the branch it received broadcast from.

Change core

After the complete of the previous stage, the two nodes connected by the core can acknowledge each other the best edges they received. Then they can identify the minimum outgoing edge. A message will be sent from the core to the minimum outgoing edge via a set of branch edge(s). Finally, a message will be sent out via the chosen outgoing edge to request to combine those two fragments. Depending on the levels of those two fragments, one of 2 combined operations could perform to form a new fragment (details discussed below).

How to find minimum weight outgoing edge?

As discussed above, every node needs to find its minimum weight outgoing edge after the receipt of a broadcast message from the core. If node n receives a broadcast, it will pick its minimum weight basic edge and send a message to the node on the other side with its fragment’s ID and level. Let node n’ be the node on the other side, then node n’ will decide whether the edge is an outgoing edge and send back a message to notify node n of the result. The decision is according to followings:
Case 1: Fragment_ID(n) = Fragment_ID(n’)
Then node n and n’ belongs to same fragment.
Case 2: Fragment_ID(n) != Fragment_ID(n’) and Level(n) <= Level(n’)
Then node n and n’ belongs to the different fragments.
Case 3: Fragment_ID(n) != Fragment_ID(n’) and Level(n) > Level(n’)
We can’t make any conclusion. The reason is those two nodes may belongs to the same fragment already but node n’ haven’t discovered this fact yet due to the delay of the broadcast message. In this case, the algorithm let node n’ postpone the response until its level becomes higher or equal to the level it received from node n.

How to combine two fragments?

Let F and F’ be the two fragments that need to be combined. There is two ways to do this:^[1]^[6]

Merge: This operation occurs if both F and F’ share the common minimum weight outgoing edge, and Level(F) = Level(F’). The combined fragment will have level as Level(F) + 1.
Absorb: This operation occurs if Level(F) < Level(F’). The combined fragment will have the same level as F’.

Furthermore, when this operation occurs, F must be in the stage of changing the core while F’ can be in broadcast or convertcast stage. Therefore, some work is needed to make sure absorb operation won’t interfere with operations in F’. Let e be the edge that F and F’ want to combine with and node n and n’ be the two nodes connected by e in F and F’ respectively. There are two cases to consider:
Case 1: Node n’ hasn’t sent convertcast message back to the core.
In this case, node n’ can simply initial a broadcast to F to update the fragment ID and collect minimum weight outgoing edge in F.
Case 2: Node n’ has already finished the convertcast.
That means n’ has packed a minimum weight outgoing edge e’, which is different from e(otherwise, node n’ will postpone the response according to the algorithm), into the convertcast message that sends to the core and weight(e’) < weight(e). Therefore, e’ is still the minimum outgoing edge to choose after F is absorbed.

Progress property

This algorithm has a nice property that the lowest level fragments won’t be blocked, although some operations in non-lowest level fragments may be blocked. This property implies the algorithm will eventually terminate with a minimum spanning tree.

Approximation algorithms

An $O(\log n)$ -approximation algorithm was developed by Maleq Khan and Gopal Pandurangan.^[7] This algorithm runs in $O(D+L\log n)$ time, where $L$ is the local shortest path diameter^[7] of the graph.

References

^ ^a ^b ^c ^d ^e ^f ^g ^h ⁱ ^j ^k Robert G. Gallager, Pierre A. Humblet, and P. M. Spira, “A distributed algorithm for minimum-weight spanning trees,” ACM Transactions on Programming Languages and Systems, vol. 5, no. 1, pp. 66–77, January 1983.
^ Baruch Awerbuch. “Optimal Distributed Algorithms for Minimum Weight Spanning Tree, Counting, Leader Election, and Related Problems,” Proceedings of the 19th ACM Symposium on Theory of Computing (STOC), New York City, New York, May 1987.
^ Juan Garay, Shay Kutten and David Peleg, “A Sub-Linear Time Distributed Algorithm for Minimum-Weight Spanning Trees (Extended Abstract),” Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 1993.
^ Shay Kutten and David Peleg, “Fast Distributed Construction of Smallk-Dominating Sets and Applications,” Journal of Algorithms, Volume 28, Issue 1, July 1998, Pages 40-66.
^ David Peleg and Vitaly Rubinovich “A near tight lower bound on the time complexity of Distributed Minimum Spanning Tree Construction,“ SIAM Journal on Computing, 2000, and IEEE Symposium on Foundations of Computer Science (FOCS), 1999.
^ ^a ^b Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
^ ^a ^b Maleq Khan and Gopal Pandurangan. “A Fast Distributed Approximation Algorithm for Minimum Spanning Trees,” Distributed Computing, vol. 20, no. 6, pp. 391–402, Apr. 2008.

[GHS-1] ^ ^a ^b ^c ^d ^e ^f ^g ^h ⁱ ^j ^k Robert G. Gallager, Pierre A. Humblet, and P. M. Spira, “A distributed algorithm for minimum-weight spanning trees,” ACM Transactions on Programming Languages and Systems, vol. 5, no. 1, pp. 66–77, January 1983.

[2] Baruch Awerbuch. “Optimal Distributed Algorithms for Minimum Weight Spanning Tree, Counting, Leader Election, and Related Problems,” Proceedings of the 19th ACM Symposium on Theory of Computing (STOC), New York City, New York, May 1987.

[3] Juan Garay, Shay Kutten and David Peleg, “A Sub-Linear Time Distributed Algorithm for Minimum-Weight Spanning Trees (Extended Abstract),” Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 1993.

[4] Shay Kutten and David Peleg, “Fast Distributed Construction of Smallk-Dominating Sets and Applications,” Journal of Algorithms, Volume 28, Issue 1, July 1998, Pages 40-66.

[5] David Peleg and Vitaly Rubinovich “A near tight lower bound on the time complexity of Distributed Minimum Spanning Tree Construction,“ SIAM Journal on Computing, 2000, and IEEE Symposium on Foundations of Computer Science (FOCS), 1999.

[Lynch-6] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.

[khan-7] Maleq Khan and Gopal Pandurangan. “A Fast Distributed Approximation Algorithm for Minimum Spanning Trees,” Distributed Computing, vol. 20, no. 6, pp. 391–402, Apr. 2008.

[1]

[2]

[3]

[4]

[5]

[6]

[7]