Growth function

The growth function (also called the shatter coefficient) measures the richness of a set family. It is used especially in statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.^[1]

Definitions

Set-family definition

Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

H\cap C:=\{h\cap C\mid h\in H\}

The index of $H$ with respect to $C$ is the size of this intersection:

\operatorname {Index} (H,C):=|H\cap C|

Since every member of $H\cap C$ is a subset of $C$, if a set $C_{m}$ has $m$ elements then the index is at most $2^{m}$.

The growth function measures the size of $H\cap C$ as a function of $|C|$ . Formally:

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H\cap C|
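As a minimal sketch, assuming a finite domain so that the maximum over all $C$ can be enumerated, the index and the growth function can be computed by brute force. The function names and the toy family of "prefix" sets are illustrative choices, not taken from the cited sources.

```python
from itertools import combinations

def index(H, C):
    """Index(H, C): the number of distinct sets h ∩ C with h in H."""
    C = frozenset(C)
    return len({frozenset(h) & C for h in H})

def growth(H, domain, m):
    """Growth(H, m) by brute force over all m-element subsets of a finite domain."""
    return max(index(H, C) for C in combinations(domain, m))

# Hypothetical toy family: the prefix sets {}, {0}, {0,1}, ... over {0,1,2,3}.
domain = range(4)
H = [set(range(k)) for k in range(5)]

print([growth(H, domain, m) for m in range(5)])  # → [1, 2, 3, 4, 5]
```

For this family the growth function is $m+1$, far below the trivial ceiling $2^{m}$. The enumeration is exponential in the domain size, so it is only practical for small examples.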

Hypothesis-class definition

Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C=\{x_{1},\ldots ,x_{m}\}$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$:^[2]^: 45

H_{C}:=\{(h(x_{1}),\ldots ,h(x_{m}))\mid h\in H,x_{i}\in C\}

The growth function measures the size of $H_{C}$ as a function of $|C|$ :^[2]^: 49

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H_{C}|
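The restriction can be sketched in a few lines, assuming hypotheses are represented as Python callables returning 0 or 1. The threshold classifiers and the sample points below are hypothetical and serve only to illustrate the definition.

```python
def restriction(H, C):
    """H_C: the set of label vectors (h(x_1), ..., h(x_m)) realized by some h in H."""
    return {tuple(h(x) for x in C) for h in H}

# Hypothetical finite hypothesis class: threshold classifiers x -> [x > t].
thresholds = [0.0, 1.0, 2.0, 3.0]
H = [lambda x, t=t: int(x > t) for t in thresholds]

C = (0.5, 1.5, 2.5)
print(sorted(restriction(H, C)))
# → [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
```

Each label vector corresponds to the subset of $C$ on which the hypothesis equals 1, which is why the two definitions agree.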

Examples

1. The domain is the real line $\mathbb {R}$. The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x\in \mathbb {R} \mid x>x_{0}\}$ for some $x_{0}\in \mathbb {R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$, the set containing the two largest elements of $C$, and so on. Therefore: $\operatorname {Growth} (H,m)=m+1$.^[1]^: Ex.1 The same is true whether $H$ contains open half-lines, closed half-lines, or both.

2. The domain is the segment $[0,1]$ . The set-family $H$ contains all the open sets. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all possible subsets of $C$ . There are $2^{m}$ such subsets, so $\operatorname {Growth} (H,m)=2^{m}$ . ^[1]^: Ex.2

3. The domain is the Euclidean space $\mathbb {R} ^{n}$. The set-family $H$ contains all the half-spaces of the form $x\cdot \phi \geq 1$, where $\phi$ is a fixed vector. Then $\operatorname {Growth} (H,m)=\operatorname {Comp} (n,m)$, where Comp is the number of components in a partitioning of an $n$-dimensional space by $m$ hyperplanes.^[1]^: Ex.3

4. The domain is the real line $\mathbb {R}$. The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{x\in \mathbb {R} \mid x_{0}\leq x\leq x_{1}\}$ for some $x_{0},x_{1}\in \mathbb {R}$. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all runs of consecutive elements of $C$ (in sorted order), including the empty run. There are ${m+1 \choose 2}$ nonempty runs, so the number of runs is ${m+1 \choose 2}+1$ and $\operatorname {Growth} (H,m)={m+1 \choose 2}+1$.
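The counts in examples 1 and 4 can be checked numerically. The sketch below assumes a set $C$ of distinct real numbers and enumerates only a finite list of candidate thresholds and endpoints, which is enough to realize every possible trace on $C$. The helper names and the sample points are hypothetical.

```python
from math import comb

def ray_traces(C):
    """Distinct sets {x in C : x > x0} over all thresholds x0."""
    pts = sorted(C)
    if not pts:
        return {frozenset()}
    # One threshold below every point, plus each point itself, realizes every trace.
    thresholds = [pts[0] - 1] + pts
    return {frozenset(x for x in pts if x > t) for t in thresholds}

def interval_traces(C):
    """Distinct sets {x in C : a <= x <= b} over all closed intervals [a, b]."""
    pts = sorted(C)
    if not pts:
        return {frozenset()}
    # Two endpoints below every point give the empty trace; the points give all runs.
    ends = [pts[0] - 2, pts[0] - 1] + pts
    return {frozenset(x for x in pts if a <= x <= b)
            for a in ends for b in ends if a <= b}

C = [0.5, 2.0, 3.25, 7.0, 9.5]   # arbitrary distinct sample points
m = len(C)
print(len(ray_traces(C)), m + 1)                     # → 6 6
print(len(interval_traces(C)), comb(m + 1, 2) + 1)   # → 16 16
```

This computes the index for one particular $C$; since the argument in the examples does not depend on which $m$ distinct points are chosen, the same count is also the value of the growth function.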

Properties

The growth function has two trivial bounds.

1. For any finite $H$ :

\operatorname {Growth} (H,m)\leq |H|

since for every $C$ , the number of elements in $H\cap C$ is at most $|H|$ . Therefore, the growth function is mainly interesting when $H$ is infinite.

2. For any nonempty $H$ :

\operatorname {Growth} (H,m)\leq 2^{m}

That is, the growth function has an exponential upper bound.

We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$, i.e. $H\cap C=2^{C}$. If $H$ shatters some set $C$ of size $m$, then $\operatorname {Growth} (H,m)=2^{m}$, which is the upper bound.

3. The following is a property of the Index function:

If, for some set $C_{m}$ of size $m$ and for some number $n\leq m$, $|H\cap C_{m}|\geq \operatorname {Comp} (n,m)$, then there exists a subset $C_{n}\subseteq C_{m}$ of size $n$ such that $|H\cap C_{n}|=2^{n}$.

This implies the following property of the Growth function.^[1]^: Th.1 For every family $H$ there are two cases:

The exponential case: $\operatorname {Growth} (H,m)=2^{m}$ identically.
The polynomial case: $\operatorname {Growth} (H,m)$ is majorized by $\operatorname {Comp} (n,m)\leq m^{n}+1$ , where $n$ is the smallest integer for which $\operatorname {Growth} (H,n)<2^{n}$ .

The VC dimension of $H$ is defined according to these two cases:

In the polynomial case, $\operatorname {VCDim} (H)=n-1$, which is the largest integer $d$ for which $\operatorname {Growth} (H,d)=2^{d}$.
In the exponential case $\operatorname {VCDim} (H)=\infty$ .

So $\operatorname {VCDim} (H)\geq d$ if and only if $\operatorname {Growth} (H,d)=2^{d}$.

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname {Growth} (H,d)$ is equal to or smaller than $2^{d}$, while the growth function tells us exactly how $\operatorname {Growth} (H,m)$ changes as a function of $m$.
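The relation between the two notions can be illustrated by reading the VC dimension off the growth function. The following is a sketch under the assumption of a finite domain and a nonempty family; the integer intervals stand in for the real intervals of example 4, and all names are hypothetical.

```python
from itertools import combinations

def growth(H, domain, m):
    """Growth(H, m) by brute force over all m-element subsets of a finite domain."""
    return max(len({h & frozenset(C) for h in H})
               for C in combinations(domain, m))

def vc_dimension(H, domain):
    """Largest d with Growth(H, d) == 2**d, for a nonempty H on a finite domain."""
    d = 0
    while d < len(domain) and growth(H, domain, d + 1) == 2 ** (d + 1):
        d += 1
    return d

domain = list(range(6))
# Integer intervals {a, ..., b}; pairs with a > b contribute the empty set.
H = {frozenset(range(a, b + 1)) for a in domain for b in domain}

print([growth(H, domain, m) for m in range(5)])  # → [1, 2, 4, 7, 11]
print(vc_dimension(H, domain))                   # → 2
```

The growth function equals $2^{m}$ up to $m=2$ and falls below it at $m=3$, so the VC dimension is 2, while the full sequence records how the count grows afterwards.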

4. Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:^[2]^: 49

If $\operatorname {VCDim} (H)=d$, then for all $m$:

\operatorname {Growth} (H,m)\leq \sum _{i=0}^{d}{m \choose i}

In particular, for all $m>d+1$:

\operatorname {Growth} (H,m)\leq (em/d)^{d}=O(m^{d})

so the growth function grows polynomially, rather than exponentially, with $m$.
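To get a feel for the two bounds in the lemma, they can be tabulated next to the trivial bound $2^{m}$. This is only a numerical illustration; the function names and the choice $d=3$ are arbitrary.

```python
from math import comb, e

def sauer_bound(m, d):
    """Sum of binomial coefficients C(m, i) for i = 0..d."""
    return sum(comb(m, i) for i in range(d + 1))

def poly_bound(m, d):
    """The closed-form bound (e*m/d)**d, for d >= 1."""
    return (e * m / d) ** d

d = 3  # arbitrary illustrative VC dimension
for m in (5, 10, 20, 40):
    print(m, sauer_bound(m, d), round(poly_bound(m, d), 1), 2 ** m)
```

For $d=2$ the binomial sum equals ${m+1 \choose 2}+1$, which is exactly the growth function of the intervals in example 4, so the first bound cannot be improved in general.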

Applications in probability theory

Let $H$ be a family of subsets of some universal set $X$. Suppose we choose a set $C_{m}$ that contains $m$ elements of $X$, each drawn independently at random according to a probability measure on $X$. For each element $h\in H$ we calculate the relative frequency $|h\cap C_{m}|/m$ and compare it to the probability of $h$. Then, the difference satisfies the following upper bound:^[1]^: Th.2

\Pr \left[\sup _{h\in H}{\big |}|h\cap C_{m}|/m-\Pr[h]{\big |}>\epsilon \right]\leq 4\cdot \operatorname {Growth} (H,2m)\cdot \exp(-\epsilon ^{2}\cdot m/8)
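The right-hand side of this bound can be evaluated directly once the growth function is known. The sketch below assumes the family of real intervals from example 4, for which $\operatorname {Growth} (H,m)={m+1 \choose 2}+1$; the function names, the values of $\epsilon$ and $\delta$, and the doubling search are illustrative choices.

```python
from math import comb, exp

def vc_bound(growth_2m, m, eps):
    """Right-hand side: 4 * Growth(H, 2m) * exp(-eps**2 * m / 8)."""
    return 4 * growth_2m * exp(-eps ** 2 * m / 8)

def intervals_growth(m):
    """Growth function of real intervals (example 4)."""
    return comb(m + 1, 2) + 1

def sufficient_m(eps, delta, growth):
    """A power of two m at which the bound is at most delta (not the smallest such m)."""
    m = 1
    while vc_bound(growth(2 * m), m, eps) > delta:
        m *= 2
    return m

eps, delta = 0.1, 0.05  # hypothetical accuracy and confidence targets
m = sufficient_m(eps, delta, intervals_growth)
print(m, vc_bound(intervals_growth(2 * m), m, eps))
```

Because the growth function here is polynomial in $m$ while the exponential factor decays, the bound eventually drops below any $\delta >0$; with an exponential growth function $2^{2m}$ it would not, and the loop above would not terminate.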

References

[1] Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. The paper was first published in 1968 in Russian. The first English translation, by B. Seckler, appeared in 1971. The translation was reproduced in 2015: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
[2] Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.
