Occam learning
In Occam learning, named after Occam's razor, a PAC learning algorithm is evaluated by the succinctness of its output and by its performance on the training set, rather than directly by its predictive power on a test set. Occam learnability is essentially equivalent to PAC learnability.
Definitions of Occam learning and succinctness
The succinctness of a concept $c$ in a concept class $\mathcal{C}$ can be expressed by the length $\mathrm{size}(c)$ of the shortest bit string that can represent $c$ in $\mathcal{C}$. Occam learning connects the succinctness of a learning algorithm's output to its predictive power on unseen data.
Let $\mathcal{C}$ and $\mathcal{H}$ be concept classes containing target concepts and hypotheses respectively, and let the sample set $S$ contain $m$ samples, each containing $n$ bits. Then, for constants $\alpha \geq 0$ and $0 \leq \beta < 1$, a learning algorithm $L$ is an $(\alpha,\beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$ if, given $S$ labeled according to a concept $c \in \mathcal{C}$, $L$ outputs a hypothesis $h \in \mathcal{H}$ such that $h$ is consistent with $c$ on $S$ (that is, $h(x) = c(x)$ for all $x \in S$) and $\mathrm{size}(h) \leq (n \cdot \mathrm{size}(c))^{\alpha} \, m^{\beta}$.[1][2]
Such an algorithm is called an efficient $(\alpha,\beta)$-Occam algorithm if it runs in time polynomial in $n$, $m$, and $\mathrm{size}(c)$.
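As an illustration (not part of the original definition), the following is a minimal Python sketch of the classical elimination algorithm for learning Boolean conjunctions, a standard textbook example of an Occam algorithm: assuming the sample is labeled by some conjunction, the hypothesis it returns is consistent with the sample and can be written in $O(n)$ bits regardless of the number of examples $m$, so it meets the Occam size bound with $\beta = 0$. Function and variable names are illustrative only.

```python
# Illustrative sketch: the "elimination" learner for Boolean conjunctions
# (literals x_i and their negations) over n variables. Assuming the sample
# is labeled by some conjunction, the returned hypothesis is consistent with
# the sample and has description length O(n), independent of m.

def learn_conjunction(samples):
    """samples: list of (x, label) pairs, x a tuple of n bits, label 0 or 1.
    Returns the surviving literals as a set of (index, polarity) pairs,
    where polarity True stands for x_i and False for its negation."""
    n = len(samples[0][0])
    # Start with every possible literal.
    literals = {(i, pol) for i in range(n) for pol in (True, False)}
    # Each positive example removes the literals it falsifies.
    for x, label in samples:
        if label == 1:
            literals = {(i, pol) for (i, pol) in literals if bool(x[i]) == pol}
    return literals

def predict(literals, x):
    """Evaluate the learned conjunction on an input x."""
    return int(all(bool(x[i]) == pol for (i, pol) in literals))

# Tiny example: target concept c(x) = x_0 AND (NOT x_2) over n = 3 variables.
data = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(data)
assert all(predict(h, x) == y for (x, y) in data)  # consistent with the sample
```

Because the returned hypothesis always ranges over at most $2n$ literals, its description length does not grow with $m$, which is exactly the succinctness requirement in the definition above.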
Equivalence of Occam and PAC learning
Any efficient Occam algorithm is also an efficient PAC learning algorithm. More precisely, let $L$ be an efficient $(\alpha,\beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$. Then there exists a constant $a > 0$ such that, for any $0 < \epsilon, \delta < 1$ and any distribution $\mathcal{D}$, given $m \geq a\left(\frac{1}{\epsilon}\log\frac{1}{\delta} + \left(\frac{(n \cdot \mathrm{size}(c))^{\alpha}}{\epsilon}\right)^{\frac{1}{1-\beta}}\right)$ samples of $n$ bits each, drawn from $\mathcal{D}$ and labeled according to a concept $c \in \mathcal{C}$, $L$ will output a hypothesis $h \in \mathcal{H}$ such that $\mathrm{error}(h) \leq \epsilon$ with probability at least $1 - \delta$.[1]
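A brief sketch of the standard argument (an informal restatement of the cardinality-based proof in [1], not a verbatim quotation): an $(\alpha,\beta)$-Occam algorithm always outputs a hypothesis from a set of at most $2^{(n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}}$ candidates. For any single hypothesis with error greater than $\epsilon$, the probability that it is consistent with $m$ independent samples is at most $(1-\epsilon)^{m} \leq e^{-\epsilon m}$, so by a union bound the probability that the algorithm outputs some such "bad" hypothesis is at most $2^{(n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}} \, e^{-\epsilon m}$. Choosing $m$ as in the bound above makes this quantity at most $\delta$.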
Conversely, any efficient PAC learning algorithm can be converted into an efficient Occam algorithm, provided the concept class satisfies a mild closure condition (polynomial closure under exception lists).[3]
Improving sample complexity for common problems
Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions,[1] conjunctions with few relevant variables,[4] and decision lists.[5]
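As a concrete illustration of the kind of improvement involved (an example calculation, not taken verbatim from the cited papers): a conjunction with at most $k$ relevant literals over $n$ Boolean variables can be written with $O(k \log n)$ bits, so an Occam algorithm whose output is such a conjunction effectively searches a hypothesis class of size at most $(2n+1)^{k}$. The standard cardinality bound for consistent learners, $m \geq \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$, then gives a sample complexity of $O\!\left(\frac{1}{\epsilon}\left(k \log n + \log\frac{1}{\delta}\right)\right)$, which grows only logarithmically with the total number of variables $n$.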
Extensions
Occam algorithms have also been shown to be successful for PAC learning in the presence of errors,[6][7] probabilistic concepts,[8] function learning,[9] and Markovian non-independent examples.[10]
References
1. Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory, chapter 2. MIT Press.
2. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24(6), 377-380.
3. Board, R., & Pitt, L. (1990). On the necessity of Occam algorithms. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing (pp. 54-63). ACM.
4. Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36(2), 177-221.
5. Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2(3), 229-246.
6. Angluin, D., & Laird, P. (1988). Learning from noisy examples. Machine Learning, 2(4), 343-370.
7. Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807-837.
8. Kearns, M. J., & Schapire, R. E. (1990). Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 382-391). IEEE.
9. Natarajan, B. K. (1993). Occam's razor for functions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory (pp. 370-376). ACM.
10. Aldous, D., & Vazirani, U. (1990). A Markovian extension of Valiant's learning model. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 392-396). IEEE.