Expected linear time MST algorithm
Overview
The key insight to the algorithm is a random sampling step which partitions a graph into two subgraphs by randomly selecting edges to include in each subgraph. The algorithm recursively finds the minimum spanning forest of the first subproblem and uses the solution in conjunction with the linear time verification algorithm to discard edges in the graph that cannot be in the minimum spanning tree. A procedure taken from Borůvka's algorithm is also used to reduce the size of the graph at each recursion.
Borůvka Step
Each iteration of the algorithm relies on an adaptation of Borůvka's Algorithm referred to as a Borůvka Step
Input: A graph G with no isolated vertices 1 For each vertex v, select the lightest edge incident on v 2 Create a contracted graph G' by replacing each component of G connected by the edges selected in step 1 with a single vertex 3 Remove all isolated vertices, self-loops, and non-minimal repetitive edges from G' Output: The edges selected in step 1 and the contracted graph G'
A Borůvka Step is equivalent to the inner loop of Borůvka's algorithm which runs in O(m) time where m is the number of edges in G. Furthermore, since each edge can be selected at most twice (once by each incident vertex) the maximum number of disconnected components after step 1 is equal to half the number of vertices. Thus, a Borůvka step reduces the number of vertices in the graph by at least a factor of two and deletes at least n/2 edges where n is the number of vertices in G.
Example Execution of a Borůvka Step
F-heavy and F-light Edges
In each iteration the algorithm removes edges with particular properties that exclude them from the minimum spanning tree. These are called F-heavy edges and are defined as follows. Let F be a forest on the graph H. An F-heavy edge is an edge e connecting vertices u,v whose weight is strictly greater than the weight of the heaviest edge on the path from u to v in F. (If a path does not exist in F it is considered to have infinite weight). Any edge that is not F-heavy is F-light. If H is a subgraph of G then any F-heavy edge in G cannot be in the minimum spanning tree of G by the cycle property. Given a forest, F-heavy edges can be computed in linear time using a minimum spanning tree verification algorithm[1][2].