
Platt scaling


In machine learning, Platt scaling or Platt calibration is a method of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines,[1] replacing an earlier method by Vapnik, but can be applied to other classification models.[2] Platt scaling works by fitting a logistic regression model to a classifier's scores.

Description

Let f be a real-valued function that is used as a binary classifier to predict, for examples x, a label y from the set {+1, -1} as y = sign(f(x)) (disregarding the possibility of a zero output for now). When a probability P(y=1|x) is required instead, but the model does not provide one (or gives poor probability estimates), Platt scaling can be used. The method produces probabilities

P(y=1 \mid x) = \frac{1}{1 + \exp(A f(x) + B)},

i.e., a logistic transformation of the classifier scores f(x).
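
As a minimal sketch (the parameter values, score values, and function name are illustrative, not taken from the source), applying the calibration once A and B are known is just a logistic sigmoid over the raw scores:

    import numpy as np

    def platt_probability(f_x, A, B):
        """Map a raw classifier score f(x) to P(y=1|x) = 1 / (1 + exp(A*f(x) + B))."""
        return 1.0 / (1.0 + np.exp(A * f_x + B))

    # Illustrative values only: A is typically negative, so large positive
    # scores map to probabilities close to 1.
    scores = np.array([-2.0, 0.0, 3.5])   # hypothetical SVM decision values f(x)
    print(platt_probability(scores, A=-1.2, B=0.1))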

The (scalar) parameters A and B are estimated using a maximum likelihood method. The training set for parameter optimization is typically the same as that for the original classifier f. To avoid overfitting to this set, a held-out calibration set or cross-validation can be used, but Platt additionally suggests transforming the labels y to target probabilities

t_+ = \frac{N_+ + 1}{N_+ + 2} for positive samples (y = 1), and
t_- = \frac{1}{N_- + 2} for negative samples (y = -1).

Here, N_+ and N_- are the numbers of positive and negative samples, respectively. This transformation follows by applying Bayes' rule to a model of out-of-sample data that has a uniform prior over the labels.[1]
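
As a small illustration (the counts below are hypothetical, not from the source), the targets depend only on the class counts in the calibration set:

    def platt_targets(n_pos, n_neg):
        """Return the target probabilities (t_plus, t_minus) used in place of the hard labels 1 and 0."""
        t_plus = (n_pos + 1.0) / (n_pos + 2.0)
        t_minus = 1.0 / (n_neg + 2.0)
        return t_plus, t_minus

    # With, say, 80 positive and 20 negative calibration samples (hypothetical numbers):
    # t_plus = 81/82 ≈ 0.988 and t_minus = 1/22 ≈ 0.045
    print(platt_targets(80, 20))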

Platt himself suggested using the Levenberg–Marquardt algorithm to optimize the parameters, but a Newton algorithm was later proposed that should be more numerically stable.[3]
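
The sketch below fits A and B by minimizing the cross-entropy (negative log-likelihood) between the sigmoid outputs and the Platt targets with a general-purpose optimizer from SciPy. It is an illustrative stand-in under those assumptions, not a reproduction of Platt's Levenberg–Marquardt procedure or of the Newton method of Lin et al.; the function and variable names are likewise assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def fit_platt(scores, labels):
        """Fit (A, B) so that 1 / (1 + exp(A*f + B)) approximates P(y=1|x).

        scores: raw classifier outputs f(x) on the calibration set
        labels: array of +1 / -1 labels
        """
        n_pos = np.sum(labels == 1)
        n_neg = np.sum(labels == -1)
        t_plus = (n_pos + 1.0) / (n_pos + 2.0)   # Platt's target for y = +1
        t_minus = 1.0 / (n_neg + 2.0)            # Platt's target for y = -1
        t = np.where(labels == 1, t_plus, t_minus)

        def neg_log_likelihood(params):
            A, B = params
            p = 1.0 / (1.0 + np.exp(A * scores + B))
            eps = 1e-12                          # guard against log(0)
            return -np.sum(t * np.log(p + eps) + (1.0 - t) * np.log(1.0 - p + eps))

        result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="BFGS")
        return result.x  # (A, B)

    # Hypothetical usage with made-up calibration data:
    rng = np.random.default_rng(0)
    labels = rng.choice([-1, 1], size=200)
    scores = labels * 2.0 + rng.normal(size=200)  # informative but noisy scores
    A, B = fit_platt(scores, labels)
    print(A, B)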

References

  1. ^ a b Platt, John (1999). "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods" (PDF). Advances in large margin classifiers. 10 (3): 61–74.
  2. ^ Niculescu-Mizil, Alexandru; Caruana, Rich (2005). "Predicting good probabilities with supervised learning" (PDF). ICML.
  3. ^ Lin, Hsuan-Tien; Lin, Chih-Jen; Weng, Ruby C. (2007). "A note on Platt's probabilistic outputs for support vector machines" (PDF). Machine Learning. 68 (3): 267–276.