The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states:

H(X,Y) = H(X) + H(Y|X)
That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X.
This follows immediately from the definitions of conditional and joint entropy, and the fact from probability theory that the joint probability is the product of the marginal and conditional probability:

P(X,Y) = P(X) P(Y|X)

\Rightarrow \log P(X,Y) = \log P(X) + \log P(Y|X)
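Concretely, taking negative logarithms of this identity and then expectations over the joint distribution recovers the entropy chain rule:

H(X,Y) = -\mathbb{E}[\log P(X,Y)] = -\mathbb{E}[\log P(X)] - \mathbb{E}[\log P(Y|X)] = H(X) + H(Y|X).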
The equivalent statement for Kolmogorov complexity does not hold exactly; it is true only up to a logarithmic term:

K(x,y) = K(x) + K(y|x) + O(\log K(x,y))
It states that the shortest program reproducing X and Y is obtained by concatenating a program to reproduce X with a program to reproduce Y given X, plus at most a logarithmic term. Using this statement, one can define an analogue of mutual information for Kolmogorov complexity.
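One standard way to do so, following the algorithmic information theory literature, is to set I(x : y) = K(y) - K(y|x); the chain rule is then precisely what makes this quantity symmetric up to a logarithmic term:

I(x : y) = I(y : x) + O(\log K(x,y)).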
Proof
The ≤ direction is obvious: we can write a program to produce x and y by concatenating a program to produce x, a program to produce y given access to x, and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for x and y (log K(x,y) upper-bounds this length).
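Quantitatively: if p is a shortest program for x and q a shortest program for y given x, the combined description is the length |p| in a self-delimiting encoding, followed by p, followed by q. A typical self-delimiting encoding of a number m costs about 2 log m bits (the exact constant depends on the encoding chosen), which gives

K(x,y) \le K(x) + K(y|x) + 2\log K(x,y) + O(1).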
The ≥ direction is rather more difficult. The key to the proof is the construction of the set

A = \{(u,z) : K(u,z) \le K(x,y)\};

that is, the construction of the set of all pairs (u,z) such that the shortest input (for a universal Turing machine) that produces u and z (and some way to distinguish u from z) is shorter than the shortest input producing x and y. We will also need

A_x = \{z : K(x,z) \le K(x,y)\},

the subset of A where u = x.
Enumerating A is not hard (although not fast!). In parallel, simulate all 2^{K(x,y)+1} programs with length at most K(x,y). Output each pair (u,z) as the simulated program terminates; eventually, any particular such pair will be produced. We can restrict to A_x simply by ignoring any output where u ≠ x. We will assume that the inequality (the ≥ direction of the chain rule) does not hold and derive a contradiction by finding a short description for x in terms of its index in the enumeration of a subset of A.
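The parallel simulation just described is a standard dovetailing argument, sketched below in Python. Here machine_factory (returning a simulator with a one-step step() method) and decode_pair are hypothetical stand-ins for a concrete universal machine and pairing encoding, not real library functions:

    def enumerate_A(n, machine_factory, decode_pair):
        """Dovetail all programs of length <= n; yield each pair (u, z)
        as the program that produces it halts."""
        # Candidate programs: every binary string of length 0..n
        # (2^(n+1) - 1 of them, matching the count used in the proof).
        programs = [""] + [format(i, "b").zfill(length)
                           for length in range(1, n + 1)
                           for i in range(2 ** length)]
        # machine_factory(p) is assumed to return a simulator whose
        # step() advances one step and returns the final output string
        # once p halts (None while it is still running).
        machines = {p: machine_factory(p) for p in programs}
        while machines:
            # One step of every still-running program per round, so a
            # non-halting program can never block the enumeration.
            for p in list(machines):
                output = machines[p].step()
                if output is not None:
                    del machines[p]
                    yield decode_pair(output)  # the pair (u, z) p produces

    def enumerate_A_x(x, n, machine_factory, decode_pair):
        """Restrict the enumeration to A_x by discarding pairs with u != x."""
        for u, z in enumerate_A(n, machine_factory, decode_pair):
            if u == x:
                yield z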
In particular, we can describe y by giving the index in which y is output when enumerating A_x (which is less than or equal to |A_x|) along with x and the value of K(x,y). Thus,

K(y|x) \le \log|A_x| + O(\log K(x,y))

where the O(\log K(x,y)) term is from encoding the value of K(x,y) and from the overhead of the enumerator, which does not depend on x or y.
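Unpacked, the three parts of this conditional description of y are

K(y|x) \le \underbrace{\log|A_x|}_{\text{index of } y} + \underbrace{O(\log K(x,y))}_{\text{self-delimiting } K(x,y)} + \underbrace{O(1)}_{\text{enumerator}};

given x as the conditioning input, one reruns the enumeration of A_x and stops at the stated index to recover y.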
Assume by way of contradiction that

K(y|x) > K(x,y) - K(x) + O(\log K(x,y))

for some x, y. (Since the O-term here is arbitrary, this means the claimed inequality fails for every choice of hidden constant.) Combining this with the above, we have

\log|A_x| + O(\log K(x,y)) \ge K(y|x) > K(x,y) - K(x) + O(\log K(x,y))

So for any constant c and some x, y,

\log|A_x| > K(x,y) - K(x) + c \log K(x,y)
Let e = K(x,y) - K(x) + c \log K(x,y); then 2^e < |A_x|.
Now, we can enumerate a set of possible values for x by finding all u such that |A_u| > 2^e -- list only the u such that some z is output after more than 2^e other values when enumerating A_u. x is one such u. Let U be the set of all such u. We can enumerate U by enumerating each A_u (in parallel, as usual) and outputting a u only when some z would be output for A_u and more than 2^e other values for the A_u would be output. Note that given K(x,y), e and the index of x in U we can enumerate U to recover x.
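The same style of accounting gives the short description of x that the contradiction below relies on:

K(x) \le \underbrace{\log|U|}_{\text{index of } x \text{ in } U} + \underbrace{O(\log K(x,y))}_{\text{encodings of } K(x,y) \text{ and } e} + O(1),

where writing down e costs only O(\log K(x,y)) bits because e \le K(x,y) + c \log K(x,y).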
Note that \{(u,z) : u \in U, z \in A_u\} is a subset of A, and A has at most 2^{K(x,y)+1} elements, since there are only that many programs of length at most K(x,y). So we have:

|U| < \frac{|A|}{2^e} \le \frac{2^{K(x,y)+1}}{2^e}

where the first inequality follows since each element of U corresponds to an A_u with strictly more than 2^e elements.
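Both counting facts can be verified directly: distinct pairs in A have distinct shortest programs of length at most K(x,y), so

|A| \le \sum_{i=0}^{K(x,y)} 2^i = 2^{K(x,y)+1} - 1 < 2^{K(x,y)+1},

and the sets \{(u,z) : z \in A_u\} for u \in U are disjoint subsets of A, so

|A| \ge \sum_{u \in U} |A_u| > |U| \cdot 2^e.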
But this leads to a contradiction:

K(x) \le \log|U| + O(\log K(x,y)) < K(x,y) + 1 - e + O(\log K(x,y))

which when we substitute in e gives

K(x) < K(x,y) + 1 - (K(x,y) - K(x) + c \log K(x,y)) + O(\log K(x,y)) = K(x) + 1 - c \log K(x,y) + O(\log K(x,y))

which for large enough c gives K(x) < K(x).