Occam learning
In Occam learning, named after Occam's razor, a PAC learning algorithm is evaluated by the succinctness of its output and by its performance on the training set, rather than directly by its predictive power on a test set. Occam learnability is essentially equivalent to PAC learnability.
Definitions of Occam learning and succinctness
The succinctness of a concept $c$ in a concept class $\mathcal{C}$ can be expressed by the length $\mathrm{size}(c)$ of the shortest bit string that can represent $c$ in $\mathcal{C}$. Occam learning connects the succinctness of a learning algorithm's output to its predictive power on unseen data.
Let $\mathcal{C}$ and $\mathcal{H}$ be concept classes containing target concepts and hypotheses respectively, and let the sample set $S$ contain $m$ samples, each containing $n$ bits. Then, for constants $\alpha \geq 0$ and $0 \leq \beta < 1$, a learning algorithm $L$ is an $(\alpha,\beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$ if, given $S$ labeled according to a concept $c \in \mathcal{C}$, $L$ outputs a hypothesis $h \in \mathcal{H}$ such that $h$ is consistent with $c$ on $S$ (that is, $h(x) = c(x)$ for all $x \in S$) and $\mathrm{size}(h) \leq (n \cdot \mathrm{size}(c))^{\alpha} \, m^{\beta}$.[1][2]
Such an algorithm is called an efficient $(\alpha,\beta)$-Occam algorithm if it runs in time polynomial in $n$, $m$, and $\mathrm{size}(c)$.
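As an illustration (not part of the original definition), the following is a minimal Python sketch of the classical elimination algorithm for learning Boolean conjunctions, a standard textbook example of an Occam algorithm: assuming the sample is labeled by some conjunction, the hypothesis it returns is consistent with the sample and can be written in $O(n)$ bits regardless of the number of examples $m$, so it meets the Occam size bound with $\beta = 0$. Function and variable names are illustrative only.

```python
# Illustrative sketch: the "elimination" learner for Boolean conjunctions
# (literals x_i and their negations) over n variables. Assuming the sample
# is labeled by some conjunction, the returned hypothesis is consistent with
# the sample and has description length O(n), independent of m.

def learn_conjunction(samples):
    """samples: list of (x, label) pairs, x a tuple of n bits, label 0 or 1.
    Returns the surviving literals as a set of (index, polarity) pairs,
    where polarity True stands for x_i and False for its negation."""
    n = len(samples[0][0])
    # Start with every possible literal.
    literals = {(i, pol) for i in range(n) for pol in (True, False)}
    # Each positive example removes the literals it falsifies.
    for x, label in samples:
        if label == 1:
            literals = {(i, pol) for (i, pol) in literals if bool(x[i]) == pol}
    return literals

def predict(literals, x):
    """Evaluate the learned conjunction on an input x."""
    return int(all(bool(x[i]) == pol for (i, pol) in literals))

# Tiny example: target concept c(x) = x_0 AND (NOT x_2) over n = 3 variables.
data = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(data)
assert all(predict(h, x) == y for (x, y) in data)  # consistent with the sample
```

Because the returned hypothesis always ranges over at most $2n$ literals, its description length does not grow with $m$, which is exactly the succinctness requirement in the definition above.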
Equivalence of Occam and PAC learning
Any efficient Occam algorithm is also an efficient PAC learning algorithm. More precisely, let $L$ be an efficient $(\alpha,\beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$. Then there exists a constant $a > 0$ such that, for any $0 < \epsilon, \delta < 1$ and any distribution $\mathcal{D}$, given $m \geq a\left(\frac{1}{\epsilon}\log\frac{1}{\delta} + \left(\frac{(n \cdot \mathrm{size}(c))^{\alpha}}{\epsilon}\right)^{\frac{1}{1-\beta}}\right)$ samples of $n$ bits each, drawn from $\mathcal{D}$ and labeled according to a concept $c \in \mathcal{C}$, $L$ will output a hypothesis $h \in \mathcal{H}$ such that $\mathrm{error}(h) \leq \epsilon$ with probability at least $1 - \delta$.[1]
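A brief sketch of the standard argument (an informal restatement of the cardinality-based proof in [1], not a verbatim quotation): an $(\alpha,\beta)$-Occam algorithm always outputs a hypothesis from a set of at most $2^{(n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}}$ candidates. For any single hypothesis with error greater than $\epsilon$, the probability that it is consistent with $m$ independent samples is at most $(1-\epsilon)^{m} \leq e^{-\epsilon m}$, so by a union bound the probability that the algorithm outputs some such "bad" hypothesis is at most $2^{(n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}} \, e^{-\epsilon m}$. Choosing $m$ as in the bound above makes this quantity at most $\delta$.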
Conversely, any efficient PAC learning algorithm can be converted into an efficient Occam algorithm, provided the concept class satisfies a mild closure condition (polynomial closure under exception lists).[3]
Improving sample complexity for common problems
Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions,[1] conjunctions with few relevant variables,[4] and decision lists.[5]
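As a concrete illustration of the kind of improvement involved (an example calculation, not taken verbatim from the cited papers): a conjunction with at most $k$ relevant literals over $n$ Boolean variables can be written with $O(k \log n)$ bits, so an Occam algorithm whose output is such a conjunction effectively searches a hypothesis class of size at most $(2n+1)^{k}$. The standard cardinality bound for consistent learners, $m \geq \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$, then gives a sample complexity of $O\!\left(\frac{1}{\epsilon}\left(k \log n + \log\frac{1}{\delta}\right)\right)$, which grows only logarithmically with the total number of variables $n$.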
Extensions
Occam algorithms have also been shown to be successful for PAC learning in the presence of errors,[6][7] probabilistic concepts,[8] function learning,[9] and Markovian non-independent examples.[10]
References
1. Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory, chapter 2. MIT Press.
2. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24(6), 377-380.
3. Board, R., & Pitt, L. (1990). On the necessity of Occam algorithms. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing (pp. 54-63). ACM.
4. Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36(2), 177-221.
5. Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2(3), 229-246.
6. Angluin, D., & Laird, P. (1988). Learning from noisy examples. Machine Learning, 2(4), 343-370.
7. Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807-837.
8. Kearns, M. J., & Schapire, R. E. (1990). Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 382-391). IEEE.
9. Natarajan, B. K. (1993). Occam's razor for functions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory (pp. 370-376). ACM.
10. Aldous, D., & Vazirani, U. (1990). A Markovian extension of Valiant's learning model. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 392-396). IEEE.