Platt scaling
In machine learning, Platt scaling or Platt calibration is a method of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines,[1] replacing an earlier method by Vapnik, but can be applied to other classification models.[2] Platt scaling works by fitting a logistic regression model to a classifier's scores.
Description
Let f be a real-valued function used as a binary classifier that predicts, for examples x, a label y from the set {+1, -1} as y = sign(f(x)) (disregarding the possibility of a zero output for now). When a probability P(y=1|x) is required instead, but the model does not provide one (or gives poor probability estimates), Platt scaling can be used. This method produces probabilities
- $P(y=1 \mid x) = \frac{1}{1 + \exp(Af(x) + B)}$,
i.e., a logistic transformation of the classifier scores f(x). Note that predictions can now be made according to y = 1 iff P(y=1|x) > ½; if B ≠ 0, the probability estimates contain a correction compared to the old decision function y = sign(f(x)).[3]
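The transformation itself is a simple elementwise sigmoid applied to the raw scores. As a minimal sketch (the parameter values A = -1.7 and B = 0.1 below are illustrative assumptions, not values from the source):

```python
import numpy as np

def platt_probabilities(scores, A, B):
    """Map raw classifier scores f(x) to calibrated probabilities
    P(y=1|x) = 1 / (1 + exp(A*f(x) + B))."""
    scores = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(A * scores + B))

# With A < 0, the probability increases with the score;
# a nonzero B shifts the decision threshold away from f(x) = 0.
probs = platt_probabilities([-2.0, -0.5, 0.0, 0.5, 2.0], A=-1.7, B=0.1)
labels = np.where(probs > 0.5, 1, -1)
```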
The (scalar) parameters A and B are estimated using a maximum likelihood method. The training set for parameter optimization is typically the same as that for the original classifier f. To avoid overfitting to this set, a held-out calibration set or cross-validation can be used, but Platt additionally suggests transforming the labels y to target probabilities
- $t_{+} = \frac{N_{+}+1}{N_{+}+2}$ for positive samples (y = 1), and
- $t_{-} = \frac{1}{N_{-}+2}$ for negative samples (y = -1).
Here, N₊ and N₋ are the numbers of positive and negative samples, respectively. This transformation follows by applying Bayes' rule to a model of out-of-sample data that has a uniform prior over the labels.[1]
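A sketch of this label transformation (the function name platt_targets is a hypothetical helper, not part of any standard library):

```python
import numpy as np

def platt_targets(y):
    """Replace hard labels y in {+1, -1} with the soft target
    probabilities t+ = (N+ + 1)/(N+ + 2) and t- = 1/(N- + 2)."""
    y = np.asarray(y)
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == -1)
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    return np.where(y == 1, t_pos, t_neg)

# e.g. with 3 positives and 2 negatives:
# platt_targets([1, 1, 1, -1, -1]) -> [0.8, 0.8, 0.8, 0.25, 0.25]
```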
Platt himself suggested using the Levenberg–Marquardt algorithm to optimize the parameters, but a Newton algorithm was later proposed that should be more numerically stable.[4]
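The maximum-likelihood fit itself amounts to minimizing the cross-entropy between the sigmoid outputs and the soft targets. The sketch below uses a general-purpose optimizer (scipy's BFGS) as a stand-in for the specialized Newton iteration of reference [4]; the starting point (-1, 0) is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, y):
    """Estimate the sigmoid parameters (A, B) by maximum likelihood on
    (score, target) pairs, using Platt's soft targets."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
    targets = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0),
                       1.0 / (n_neg + 2.0))

    def neg_log_likelihood(params):
        A, B = params
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        eps = 1e-12  # guard against log(0)
        return -np.sum(targets * np.log(p + eps)
                       + (1.0 - targets) * np.log(1.0 - p + eps))

    result = minimize(neg_log_likelihood, x0=np.array([-1.0, 0.0]),
                      method="BFGS")
    return tuple(result.x)  # (A, B)
```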
See also
- Relevance vector machine: probabilistic alternative to the support vector machine
References
- ^ a b Platt, John (1999). "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods" (PDF). Advances in large margin classifiers. 10 (3): 61–74.
- ^ Niculescu-Mizil, Alexandru; Caruana, Rich (2005). Predicting good probabilities with supervised learning (PDF). ICML.
- ^ Olivier Chapelle; Vladimir Vapnik; Olivier Bousquet; Sayan Mukherjee (2002). "Choosing multiple parameters for support vector machines" (PDF). Machine Learning. 46: 131–159.
- ^ Lin, Hsuan-Tien; Lin, Chih-Jen; Weng, Ruby C. (2007). "A note on Platt's probabilistic outputs for support vector machines" (PDF). Machine Learning. 68 (3): 267–276.