Matroid

In combinatorial mathematics, a matroid is a structure that captures the essence of a notion of "independence" that generalizes linear independence in vector spaces. Formally, a matroid M on a finite set E is a pair (E, I), where I is a collection of subsets of E (called the independent sets) with the following properties:

the empty set is independent
every subset of an independent set is independent
if A and B are two independent sets and A has more elements than B, then there exists an element in A which is not in B and when added to B still gives an independent set

Examples

Let E be a finite set and k a natural number. The subsets of E with at most k elements are the independent sets of a matroid on E.
If E is any finite subset of a vector space V, then we can define a matroid M on E by taking the linearly independent elements in E to be the independent sets of M.
Every finite simple graph G gives rise to a matroid as follows: take as E the set of all edges in G and consider a set of edges independent iff it does not contain a simple cycle. This is called the forest matroid or graphic matroid.

Further definitions and properties

A subset of E is called dependent if it is not independent. A dependent set all of whose proper subsets are independent is called a circuit (with the terminology coming from the graph example above). An independent set all that is not properly contained in another independent set is called a basis (with the terminology coming from the vector space example above). Hence bases are maximal independent sets, and circuits are minimal dependent sets. An important fact is that all bases of a matroid have the same number of elements, called the rank of M. In general, the circuits of M have different sizes.

In the first example matroid above, a basis is a basis in the sense of linear algebra of the subspace spanned by E, and a circuit is a minimal set of dependent vectors of E. In the second example, a basis is the same as a spanning forest of the graph G, and circuits are cycles in the graph. In the third example, a basis is any subset of E with k elements, and a circuit is any subset of k + 1 elements.

If A is a subset of E, then a matroid on A can be defined by considering a subset of A independent if and only if it is independent in M. This allows to talk about the rank of any subset of E.

The rank function r assigns a natural number to every subset of E and has the following properties:

r(A) ≤ |A| for all subsets A of E
if A and B are subsets of E with A ⊆ B, then r(A) ≤ r(B)
for any two subsets A and B of E, we have r(A ∪ B) + r(A ∩ B) ≤ r(A) + r(B)

In fact, these three properties can be used as one of many alternative definitions of matroids: the independent sets are then defined as those subsets A of E with r(A) = |A|.

If M is a matroid, we can define the dual matroid M* by taking the same underlying set and calling a set independent in M* if and only if it is contained in the complement of some basis of M. One checks easily that M* is indeed a matroid.

Weighted matroids and greedy algorithms

A weight function w:E→R⁺ for a matroid M=(E, I) assigns a strictly positive weight to each element of E. We extend the function to subsets of E by summation; w(A) is the sum of w(x) over x in A. A matroid with an associated weight function is called a weighted matroid.

As a simple example, say we wish to find the maximum spanning forest of a graph. That is, given a graph and a weight for each edge, find a forest containing every vertex and maximizing the total weight of the edges in the tree. This problem arises in some clustering applications. If we look at the definition of the forest matroid above, we see that the maximum spanning forest is simply the independent set with largest total weight — such a set must span the graph, for otherwise we can add edges without creating cycles. But how do we find it?

An independent set of largest total weight is called an optimal set. Optimal sets are always bases, because if an edge can be added, it should be; this only increases the total weight. As it turns out, there is a trivial greedy algorithm for computing an optimal set of a matroid. It works as follows:

Let A be the empty set.
For each x in E, taken in (monotonically) decreasing order by weight
- if A ∪ {x} is independent, then set A to A ∪ {x}.

The easiest way to traverse the members of E in the desired order is to sort them. This requires O(|E|log|E|) time using a comparison sorting algorithm. We also need to test for each x whether A ∪ {x} is independent; assuming independence tests require O(f(|E|)) time, the total time for the algorithm is O(|E|log|E| + |E|f(|E|)).

If we want to find a minimum spanning tree instead, we simply "invert" the weight function by subtracting it from a large constant exceeding the total graph weight. In other words, let w_min(x) = W - w(x), where W exceeds the total weight over all graph edges. Many more optimization problems about all sorts of matroids and weight functions can be solved in this trivial way, although in many cases more efficient algorithms can be found that exploit more specialized properties.

History

The matroid concept was introduced by Hassler Whitney in 1935 in his paper "On the abstract properties of linear dependence".

Links and references

Steven R. Pagano: Matroids and Signed Graphs
Oxley, James G.: "Matroid Theory", Oxford University Press, New York, 1992
PlanetMath article on matroids. Contains several other equivalent definitions of matroids.
Sandra Kingan: Matroid theory. Lots of links.
T. Cormen, C. Leiserson, R. Rivest. Introduction to Algorithms. MIT Press, 1990. Section 16.4, Theoretical foundations for greedy methods. ISBN 0262032937.