Uniform convergence in probability is a concept in probability theory with applications to statistical learning theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family converge to their theoretical probabilities.
For a class of predicates $H$ defined on a set $X$ and a set of samples $x=(x_1,x_2,\dots,x_m)$, where $x_i\in X$, the empirical frequency of $h\in H$ on $x$ is

$$\widehat{Q}_x(h)=\frac{1}{m}\left|\{\,i:1\le i\le m,\ h(x_i)=1\,\}\right|.$$
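To make the definition concrete, here is a minimal sketch in Python (the predicate, distribution, and sample size are hypothetical choices for illustration, not part of the definition):

```python
import random

def empirical_frequency(h, xs):
    """Empirical frequency of predicate h on sample xs:
    the fraction of sample points on which h evaluates to 1 (True)."""
    return sum(1 for xi in xs if h(xi)) / len(xs)

# Hypothetical example: X = [0, 1), h(y) = 1 iff y < 0.3,
# with the sample drawn uniformly from X.
random.seed(0)
h = lambda y: y < 0.3
xs = [random.random() for _ in range(1000)]

print(empirical_frequency(h, xs))  # close to the theoretical probability 0.3
```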
The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with replacement) from $X$ according to a distribution $P$, then with high probability the empirical frequency will be close to its expected value, which is the theoretical probability $Q_P(h)=P\{y\in X:h(y)=1\}$.
Here "simple" means that the Vapnik-Chervonenkis dimension of the class is small relative to the size of the sample.
In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
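As an illustration of this statement (a hypothetical setup chosen for simplicity, not part of the theorem): take $X=[0,1]$ with $P$ uniform and $H$ the class of threshold predicates $h_t(y)=\mathbf{1}[y\le t]$, whose VC dimension is 1 and for which $Q_P(h_t)=t$. The sketch below estimates the worst-case gap $\sup_t|Q_P(h_t)-\widehat{Q}_x(h_t)|$ and shows it shrinking as the sample grows:

```python
import random

def worst_case_deviation(xs):
    """sup over t of |Q_P(h_t) - Q-hat_x(h_t)| for h_t(y) = 1[y <= t],
    with P uniform on [0, 1] (so Q_P(h_t) = t).  It suffices to check the
    sorted sample points, where the empirical frequency jumps."""
    xs = sorted(xs)
    m = len(xs)
    dev = 0.0
    for i, xi in enumerate(xs):
        # Empirical frequency is i/m just below xi and (i + 1)/m at xi.
        dev = max(dev, abs(xi - i / m), abs(xi - (i + 1) / m))
    return dev

random.seed(0)
for m in (100, 1000, 10000):
    print(m, worst_case_deviation([random.random() for _ in range(m)]))
```

The printed gap decreases with $m$, as the theorem predicts for a class of small VC dimension.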
The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis[1] using the concept of growth function: $\Pi_H(m)$ denotes the maximum, over all choices of $m$ points from $X$, of the number of distinct labelings $(h(x_1),\dots,h(x_m))$ realized by functions $h\in H$.
From the point of view of learning theory, one can consider $H$ to be the concept/hypothesis class defined over the instance set $X$. Before getting into the details of the proof of the theorem, we state Sauer's Lemma, which we will need in the proof.
Lemma: $\Pi_H(m)\le\left(\frac{em}{d}\right)^d$ for $m\ge d$, where $d$ is the VC dimension of the concept class $H$.
Corollary: $\Pi_H(m)=O(m^d)$, i.e., the growth function is only polynomial in $m$ whenever the VC dimension of $H$ is finite.
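A quick numerical reading of the lemma (with a hypothetical VC dimension $d=3$): the bound $(em/d)^d$ grows polynomially in $m$, while the number of all possible $0/1$ labelings, $2^m$, grows exponentially:

```python
import math

def sauer_bound(m, d):
    """The (em/d)^d upper bound on the growth function from the lemma (m >= d)."""
    return (math.e * m / d) ** d

d = 3  # hypothetical VC dimension
for m in (10, 20, 40, 80):
    print(m, round(sauer_bound(m, d)), 2 ** m)
```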
Proof of uniform convergence theorem
The proof below follows [1] and [2]. Before getting into the details, we first present a high-level overview of the proof.
1. Symmetrization: We transform the problem of analyzing $|Q_P(h)-\widehat{Q}_x(h)|\ge\varepsilon$ into the problem of analyzing $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as a testing sample used to estimate $Q_P(h)$.
2. Permutation: Since $r$ and $s$ are picked identically and independently, swapping elements between them does not change the probability distribution on $r$ and $s$. So we bound the probability of $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2$ for some $h\in H$ by considering the effect of a specific collection of permutations of the joint sample $x=r\|s$, namely the permutations $\sigma(x)$ which swap $x_i$ and $x_{m+i}$ for $i$ in some subset of $\{1,2,\dots,m\}$. The symbol $r\|s$ means the concatenation of $r$ and $s$.
3. Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample, and hence, if $H$ has finite VC dimension, the problem reduces to one involving a finite function class.
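Before the formal argument, here is a Monte Carlo sanity check of the symmetrization step, under the same hypothetical threshold-class setup as above ($X=[0,1]$, $P$ uniform, $h_t(y)=\mathbf{1}[y\le t]$); the particular values of $m$, $\varepsilon$, and the trial count are arbitrary illustration choices:

```python
import random

def max_dev_true(xs):
    # sup over t of |Q_P(h_t) - Q-hat_xs(h_t)| with P uniform on [0, 1]
    xs = sorted(xs)
    m = len(xs)
    return max(max(abs(x - i / m), abs(x - (i + 1) / m))
               for i, x in enumerate(xs))

def max_dev_pair(r, s):
    # sup over t of |Q-hat_r(h_t) - Q-hat_s(h_t)|, checked at all sample points
    best = 0.0
    for t in set(r) | set(s):
        fr = sum(1 for v in r if v <= t) / len(r)
        fs = sum(1 for v in s if v <= t) / len(s)
        best = max(best, abs(fr - fs))
    return best

random.seed(1)
m, eps, trials = 25, 0.3, 2000   # m >= 2 / eps^2, as the lemma below requires
pV = sum(max_dev_true([random.random() for _ in range(m)]) >= eps
         for _ in range(trials)) / trials
pR = sum(max_dev_pair([random.random() for _ in range(m)],
                      [random.random() for _ in range(m)]) >= eps / 2
         for _ in range(trials)) / trials
print(pV, 2 * pR)  # the symmetrization lemma below asserts P^m(V) <= 2 P^2m(R)
```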
We present the technical details of the proof.
Symmetrization
Lemma: Let $V=\{x\in X^m:|Q_P(h)-\widehat{Q}_x(h)|\ge\varepsilon\ \text{for some}\ h\in H\}$ and

$$R=\{(r,s)\in X^m\times X^m:|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2\ \text{for some}\ h\in H\}.$$

Then for $m\ge\frac{2}{\varepsilon^2}$, $P^m(V)\le 2P^{2m}(R)$.
Proof:
By the triangle inequality, if $|Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon$ and $|Q_P(h)-\widehat{Q}_s(h)|\le\varepsilon/2$, then $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2$.
Therefore,

$$P^{2m}(R)\ \ge\ P^{2m}\{(r,s):\exists h\in H,\ |Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon\ \text{and}\ |Q_P(h)-\widehat{Q}_s(h)|\le\varepsilon/2\}$$

and

$$P^{2m}\{(r,s):\exists h\in H,\ |Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon\ \text{and}\ |Q_P(h)-\widehat{Q}_s(h)|\le\varepsilon/2\}=\int_V P^m\{s:\exists h\in H,\ |Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon\ \text{and}\ |Q_P(h)-\widehat{Q}_s(h)|\le\varepsilon/2\}\,dP^m(r)=:A$$

[since $r$ and $s$ are independent].
Now for $r\in V$, fix an $h\in H$ such that $|Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon$. For this $h$, we shall show that

$$P^m\left\{s:|Q_P(h)-\widehat{Q}_s(h)|\le\frac{\varepsilon}{2}\right\}\ge\frac{1}{2}.$$

Thus for any $r\in V$ the inner probability is at least $\frac{1}{2}$, so $A\ge\frac{P^m(V)}{2}$ and hence $P^{2m}(R)\ge\frac{P^m(V)}{2}$. This establishes the first step of the high-level plan.
Notice that $m\cdot\widehat{Q}_s(h)$ is a binomial random variable with expectation $m\cdot Q_P(h)$ and variance $m\cdot Q_P(h)(1-Q_P(h))$. By Chebyshev's inequality we get

$$P^m\left\{s:|Q_P(h)-\widehat{Q}_s(h)|>\frac{\varepsilon}{2}\right\}\le\frac{m\cdot Q_P(h)(1-Q_P(h))}{(\varepsilon m/2)^2}\le\frac{1}{\varepsilon^2 m}\le\frac{1}{2}$$

for the stated bound on $m$. Here we use the fact that $x(1-x)\le\frac{1}{4}$ for $x\in[0,1]$.
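A small numeric check of this Chebyshev step, with hypothetical values $Q_P(h)=0.3$, $\varepsilon=0.2$, and the smallest allowed sample size $m=2/\varepsilon^2=50$:

```python
import random

q, eps = 0.3, 0.2            # hypothetical Q_P(h) and epsilon
m = int(2 / eps ** 2)        # 50, the smallest m the lemma allows

# Chebyshev bound m*q*(1-q) / (eps*m/2)^2, and its relaxation 1/(eps^2 * m).
print((m * q * (1 - q)) / (eps * m / 2) ** 2, 1 / (eps ** 2 * m))  # 0.42 <= 0.5

# Empirical estimate of P^m{ |Q_P(h) - Q-hat_s(h)| > eps/2 }.
random.seed(2)
trials = 20000
freq = sum(
    abs(q - sum(random.random() < q for _ in range(m)) / m) > eps / 2
    for _ in range(trials)
) / trials
print(freq)  # well below the Chebyshev bound above
```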
Permutations
Let $\Gamma_m$ be the set of all permutations of $\{1,2,3,\dots,2m\}$ that swap $i$ and $m+i$ for $i$ in some subset of $\{1,2,3,\dots,m\}$.
Lemma: Let $R$ be any subset of $X^{2m}$ and $P$ any probability distribution on $X$. Then,

$$P^{2m}(R)=E[\Pr[\sigma(x)\in R]]\le\max_{x\in X^{2m}}\Pr[\sigma(x)\in R],$$

where the expectation is over $x$ chosen according to $P^{2m}$, and the probability is over $\sigma$ chosen uniformly from $\Gamma_m$.
Proof:
For any $\sigma\in\Gamma_m$,

$$P^{2m}(R)=P^{2m}\{x:\sigma(x)\in R\}$$

[since coordinate permutations preserve the product distribution $P^{2m}$]. Therefore,

$$P^{2m}(R)=\frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m}P^{2m}\{x:\sigma(x)\in R\}$$

$$=\frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m}\int_{X^{2m}}\mathbf{1}_{\sigma(x)\in R}\,dP^{2m}(x)\qquad\text{[because $\Gamma_m$ is finite]}$$

$$=\int_{X^{2m}}\frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m}\mathbf{1}_{\sigma(x)\in R}\,dP^{2m}(x)\qquad\text{[the inner average is $\Pr[\sigma(x)\in R]$]}$$

$$\le\max_{x\in X^{2m}}\left(\frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m}\mathbf{1}_{\sigma(x)\in R}\right)=\max_{x\in X^{2m}}\Pr[\sigma(x)\in R].$$
The maximum is guaranteed to exist, since there is only a finite set of values that the probability under a random permutation can take.
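Note that $|\Gamma_m|=2^m$: choosing $\sigma$ uniformly from $\Gamma_m$ amounts to flipping an independent fair coin for each $i\le m$ to decide whether positions $i$ and $m+i$ are swapped. The following sketch (with a hypothetical fixed sample $x$ and a hypothetical event $R$) estimates $\Pr[\sigma(x)\in R]$ this way:

```python
import random

def random_swap_permutation(x, m):
    """Apply a uniformly random element of Gamma_m to x = r || s of length 2m:
    independently for each i, swap positions i and m + i with probability 1/2."""
    x = list(x)
    for i in range(m):
        if random.random() < 0.5:
            x[i], x[m + i] = x[m + i], x[i]
    return x

random.seed(3)
m = 10
x = [random.random() for _ in range(2 * m)]  # fixed joint sample (hypothetical)

def in_R(y):
    # Hypothetical event R: the two halves' means differ by at least 0.2.
    return abs(sum(y[:m]) / m - sum(y[m:]) / m) >= 0.2

trials = 100000
hits = sum(in_R(random_swap_permutation(x, m)) for _ in range(trials)) / trials
print(hits)  # Monte Carlo estimate of Pr[sigma(x) in R] for this fixed x
```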
Reduction to a finite class
Lemma: With $R$ and $\Gamma_m$ as in the previous lemma,

$$\max_{x\in X^{2m}}\Pr[\sigma(x)\in R]\le 2\Pi_H(2m)e^{-\varepsilon^2 m/8}.$$
Proof:
Let us fix some $x=(x_1,x_2,\dots,x_{2m})\in X^{2m}$ and let $t=|H|_x|$, the number of distinct restrictions of functions in $H$ to the points of $x$, which is at most $\Pi_H(2m)$. This means there are functions $h_1,h_2,\dots,h_t\in H$ such that for any $h\in H$ there is an $i$ between $1$ and $t$ with $h_i(x_k)=h(x_k)$ for $1\le k\le 2m$.
We see that $\sigma(x)\in R$ iff some $h$ in $H$ satisfies

$$\left|\frac{1}{m}\left|\{\,1\le i\le m:h(x_{\sigma(i)})=1\,\}\right|-\frac{1}{m}\left|\{\,m+1\le i\le 2m:h(x_{\sigma(i)})=1\,\}\right|\right|\ge\frac{\varepsilon}{2}.$$
Hence define $w_i^j=1$ if $h_j(x_i)=1$ and $w_i^j=0$ otherwise, for $1\le i\le 2m$ and $1\le j\le t$. Then $\sigma(x)\in R$ iff some $j$ in $\{1,\dots,t\}$ satisfies $\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j-\sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right|\ge\frac{\varepsilon}{2}$. By the union bound we get

$$\Pr[\sigma(x)\in R]\le t\cdot\max_{1\le j\le t}\Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j-\sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right|\ge\frac{\varepsilon}{2}\right]\le\Pi_H(2m)\cdot\max_{1\le j\le t}\Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j-\sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right|\ge\frac{\varepsilon}{2}\right].$$
Since the distribution over the permutations $\sigma$ is uniform, for each $i$ the difference $w_{\sigma(i)}^j-w_{\sigma(m+i)}^j$ equals $\pm\left|w_i^j-w_{m+i}^j\right|$ with equal probability.
Thus,

$$\Pr\left[\left|\frac{1}{m}\sum_{i=1}^m\left(w_{\sigma(i)}^j-w_{\sigma(m+i)}^j\right)\right|\ge\frac{\varepsilon}{2}\right]=\Pr\left[\left|\frac{1}{m}\sum_{i=1}^m\left|w_i^j-w_{m+i}^j\right|\beta_i\right|\ge\frac{\varepsilon}{2}\right],$$

where the probability on the right is over the $\beta_i$, independent uniform random signs (both possibilities $\pm 1$ are equally likely). Since each summand lies in $[-\frac{1}{m},\frac{1}{m}]$ and has mean zero, by Hoeffding's inequality this is at most $2e^{-m\varepsilon^2/8}$.
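To see the Hoeffding step numerically, consider the worst case where every $|w_i^j-w_{m+i}^j|$ equals 1, so the sum is just an average of $m$ independent random signs; the sketch below (hypothetical $m$ and $\varepsilon$) compares the empirical tail probability with the bound $2e^{-m\varepsilon^2/8}$:

```python
import math
import random

random.seed(4)
m, eps, trials = 100, 0.5, 20000  # hypothetical parameters

def deviates_once():
    # Is |(1/m) * (sum of m independent +-1 signs)| >= eps / 2 ?
    return abs(sum(random.choice((-1, 1)) for _ in range(m))) / m >= eps / 2

freq = sum(deviates_once() for _ in range(trials)) / trials
print(freq, 2 * math.exp(-m * eps ** 2 / 8))  # empirical tail <= Hoeffding bound
```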
Finally, combining the three parts of the proof, we get the Uniform Convergence Theorem:

$$P^m\{x\in X^m:|Q_P(h)-\widehat{Q}_x(h)|\ge\varepsilon\ \text{for some}\ h\in H\}=P^m(V)\le 2P^{2m}(R)\le 2\max_{x\in X^{2m}}\Pr[\sigma(x)\in R]\le 4\Pi_H(2m)e^{-\varepsilon^2 m/8}.$$
References
Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025.
This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968.
The translation was reproduced as:
Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.