Learning curve (machine learning)

In machine learning (ML), a learning curve (or training curve) is a graphical representation that shows how a model's performance on a training set (and usually a validation set) changes with the number of training iterations (epochs) or the amount of training data.^[1] Typically, the number of training epochs or training set size is plotted on the x-axis, and the value of the loss function (and possibly some other metric such as the cross-validation score) on the y-axis.

Synonyms include error curve, experience curve, improvement curve and generalization curve.^[2]

More abstractly, learning curves plot the relationship between learning effort and predictive performance, where learning effort usually means the number of training samples, and predictive performance means accuracy on testing samples.^[3]

Learning curves have many useful purposes in ML, including:^[4]^[5]^[6]

choosing model parameters during design,
adjusting optimization to improve convergence,
diagnosing problems such as overfitting (or underfitting),
and determining the amount of data used for training.

Formal definition

One model of machine learning is producing a function, $f(x)$, which, given some information $x$, predicts some variable $y$, from training data $X_{\text{train}}$ and $Y_{\text{train}}$. It is distinct from mathematical optimization because $f$ should predict well for $x$ outside of $X_{\text{train}}$.

We often constrain the possible functions to a parameterized family of functions, $\{f_{\theta }(x):\theta \in \Theta \}$, so that the function generalizes better,^[7] so that it has properties that make finding a good $f$ easier, or because we have some a priori reason to think that these properties hold.^[7]^: 172

Because it is generally not possible to produce a function that fits the data perfectly, a loss function $L(f_{\theta }(X),Y)$ is needed to measure how good the predictions are. We then define an optimization process which finds a $\theta$ that minimizes $L(f_{\theta }(X),Y)$; this minimizer is referred to as $\theta ^{*}(X,Y)$.
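The definitions above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the parameterized family is the set of straight lines, that the loss is the mean squared error, and that the optimization process is the closed-form least-squares solution; the names `f`, `loss` and `fit` and the synthetic data are hypothetical and do not come from the cited sources.

```python
import random

def f(theta, x):
    """One member of the family {f_theta}: a line with theta = (intercept, slope)."""
    intercept, slope = theta
    return intercept + slope * x

def loss(theta, xs, ys):
    """Mean squared error, playing the role of L(f_theta(X), Y)."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys):
    """theta*(X, Y): the least-squares minimizer of the loss, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)

# Synthetic training data (an assumption): y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
X_train = [rng.uniform(-1, 1) for _ in range(50)]
Y_train = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in X_train]

theta_star = fit(X_train, Y_train)
train_loss = loss(theta_star, X_train, Y_train)
print("theta* =", theta_star, "training loss =", train_loss)
```

Because the noise prevents a perfect fit, the training loss at $\theta ^{*}$ is small but not zero, which is the situation the loss function is meant to quantify.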

Training curve for amount of data

If our training data is $\{x_{1},x_{2},\dots ,x_{n}\},\{y_{1},y_{2},\dots ,y_{n}\}$ and our validation data is $X'=\{x_{1}',x_{2}',\dots ,x_{m}'\},Y'=\{y_{1}',y_{2}',\dots ,y_{m}'\}$, a learning curve is the plot of the two curves

$i\mapsto L(f_{\theta ^{*}(X_{i},Y_{i})}(X_{i}),Y_{i})$
$i\mapsto L(f_{\theta ^{*}(X_{i},Y_{i})}(X'),Y')$

where $X_{i}=\{x_{1},x_{2},\dots ,x_{i}\}$ and $Y_{i}=\{y_{1},y_{2},\dots ,y_{i}\}$ are the first $i$ training examples.
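As a rough sketch of how these two curves might be computed, the following refits a model on growing prefixes of the training data and records the training and validation loss for each prefix. The linear model, the mean squared error, the synthetic data, the chosen prefix sizes and the use of the full validation set at every size are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit; returns theta = (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)

def mse(theta, xs, ys):
    """Mean squared error of the line theta on the pairs (xs, ys)."""
    intercept, slope = theta
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(rng, n):
    """Hypothetical data source: y = 1 + 2x plus Gaussian noise (std 0.3)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
X, Y = sample(rng, 200)          # training data, n = 200
X_val, Y_val = sample(rng, 100)  # validation data, m = 100

sizes = [2, 5, 10, 20, 50, 100, 200]
train_curve, val_curve = [], []
for i in sizes:
    X_i, Y_i = X[:i], Y[:i]            # first i training examples
    theta = fit_line(X_i, Y_i)         # theta*(X_i, Y_i)
    train_curve.append(mse(theta, X_i, Y_i))
    val_curve.append(mse(theta, X_val, Y_val))

for i, tr, va in zip(sizes, train_curve, val_curve):
    print(f"i={i:3d}  train={tr:.4f}  validation={va:.4f}")
```

With only two training points the line passes through both, so the training loss starts at essentially zero; as $i$ grows, both losses are expected to settle near the noise variance.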

Training curve for number of iterations

Many optimization processes are iterative, repeating the same step until the process converges to an optimal value. Gradient descent is one such algorithm. If $\theta _{i}^{*}$ denotes the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of the two curves

$i\mapsto L(f_{\theta _{i}^{*}(X,Y)}(X),Y)$
$i\mapsto L(f_{\theta _{i}^{*}(X,Y)}(X'),Y')$
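A minimal sketch of this iteration-based curve, assuming plain full-batch gradient descent on a linear model with mean squared error, is given below. The learning rate, the number of steps, the starting point and the synthetic data are illustrative choices rather than values taken from the cited sources.

```python
import random

def mse(theta, xs, ys):
    """Mean squared error of the line theta = (intercept, slope)."""
    intercept, slope = theta
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(theta, xs, ys):
    """Gradient of the mean squared error with respect to (intercept, slope)."""
    intercept, slope = theta
    n = len(xs)
    residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
    grad_intercept = 2.0 * sum(residuals) / n
    grad_slope = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return grad_intercept, grad_slope

def sample(rng, n):
    """Hypothetical data source: y = 1 + 2x plus Gaussian noise (std 0.3)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
X, Y = sample(rng, 200)
X_val, Y_val = sample(rng, 100)

learning_rate = 0.1   # assumed step size, small enough for this problem
theta = (0.0, 0.0)    # theta_0: starting point before any step
train_curve, val_curve = [], []
for i in range(200):
    # Record the loss of theta_i on the training and validation sets.
    train_curve.append(mse(theta, X, Y))
    val_curve.append(mse(theta, X_val, Y_val))
    # One gradient descent step produces theta_{i+1}.
    g_intercept, g_slope = mse_gradient(theta, X, Y)
    theta = (theta[0] - learning_rate * g_intercept,
             theta[1] - learning_rate * g_slope)

print("first/last training loss:  ", train_curve[0], train_curve[-1])
print("first/last validation loss:", val_curve[0], val_curve[-1])
```

For this convex problem and step size the training loss does not increase from one step to the next; with more flexible models the validation curve may turn upward while the training curve keeps falling, which is the usual sign of overfitting.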

Choosing the size of the training dataset

A learning curve is a tool for finding out how much a machine learning model benefits from adding more training data, and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the size of the training set increases, the model will not benefit much from more training data.^[8]
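The plateau described here can be illustrated with a deliberately underfitting model. The sketch below assumes a constant predictor (the mean of the training targets) applied to data with a linear trend, and it tracks error rather than score, so a score that is "too low" corresponds to an error that stays high. The model, the data and the set sizes are hypothetical choices for illustration.

```python
import random

def fit_constant(ys):
    """High-bias model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)

def mse_constant(c, ys):
    """Mean squared error of the constant prediction c."""
    return sum((c - y) ** 2 for y in ys) / len(ys)

def sample(rng, n):
    """Hypothetical data source: y = 1 + 2x plus Gaussian noise (std 0.1)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

rng = random.Random(0)
_, Y = sample(rng, 400)      # the constant model ignores the inputs
_, Y_val = sample(rng, 200)

noise_variance = 0.1 ** 2    # lowest error any model could reach on this data

sizes = [10, 50, 100, 200, 400]
train_curve, val_curve = [], []
for i in sizes:
    c = fit_constant(Y[:i])
    train_curve.append(mse_constant(c, Y[:i]))
    val_curve.append(mse_constant(c, Y_val))

for i, tr, va in zip(sizes, train_curve, val_curve):
    print(f"n={i:3d}  train error={tr:.3f}  validation error={va:.3f}")
```

Both errors are expected to level off far above the noise variance and close to each other, which suggests a bias problem: a more expressive model, rather than more data, is the likely remedy in such a case.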

In the machine learning domain, learning curves come in two forms that differ in the x-axis: the experience of the model is graphed either as the number of training examples used for learning or as the number of iterations used in training the model.^[9]

References

[1] Mohr, Felix; van Rijn, Jan N. (2022). "Learning Curves for Decision Making in Supervised Machine Learning - A Survey". arXiv:2201.12150.
[2] Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (6): 7799–7819. arXiv:2103.10948. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828. PMID 36350870.
[3] Perlich, Claudia (2010), "Learning Curves in Machine Learning", in Sammut, Claude; Webb, Geoffrey I. (eds.), Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 577–580, doi:10.1007/978-0-387-30164-8_452, ISBN 978-0-387-30164-8, retrieved 2023-07-06
[4] Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113 Fig. 3.
[5] "Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn.
[6] Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning-Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15.
[7] Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016-11-18). Deep Learning. MIT Press. p. 108. ISBN 978-0-262-03561-3.
[8] scikit-learn developers. "Validation curves: plotting scores to evaluate models". scikit-learn 0.20.2 documentation. Retrieved February 15, 2019.
[9] Sammut, Claude; Webb, Geoffrey I. (Eds.) (28 March 2011). Encyclopedia of Machine Learning (1st ed.). Springer. p. 578. ISBN 978-0-387-30768-8.
