Polynomial kernel

In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables.

Definition

For degree-d polynomials, the polynomial kernel is defined as[1]

K(x, y) = (x^\mathsf{T} y + c)^d

where x and y are vectors in the input space, i.e. vectors of features computed from training or test samples, and c \ge 0 is a constant trading off the influence of higher-order versus lower-order terms in the polynomial. For the case of the quadratic kernel (d = 2) over an n-dimensional input space, we have:

K(x, y) = \left( \sum_{i=1}^n x_i y_i + c \right)^2 = \sum_{i=1}^n (x_i^2)(y_i^2) + \sum_{i=2}^n \sum_{j=1}^{i-1} (\sqrt{2}\, x_i x_j)(\sqrt{2}\, y_i y_j) + \sum_{i=1}^n (\sqrt{2c}\, x_i)(\sqrt{2c}\, y_i) + c^2

From this we see that K(x, y) = \langle \varphi(x), \varphi(y) \rangle, i.e. the kernel is the inner product in a feature space induced by the mapping

\varphi(x) = (x_n^2, \ldots, x_1^2, \sqrt{2}\, x_n x_{n-1}, \ldots, \sqrt{2}\, x_2 x_1, \sqrt{2c}\, x_n, \ldots, \sqrt{2c}\, x_1, c)

When the input features are binary-valued (booleans) and c = 0, the mapped features correspond to conjunctions of input features.[2] (The \sqrt{2} factors in the expansion are immaterial.[3])
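
As a sanity check on this equivalence, the following Python sketch (the helper names and example vectors are illustrative, not taken from the references) verifies numerically that the degree-2 kernel with constant c matches the inner product of the explicit feature map above:

import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def quad_feature_map(x, c=1.0):
    """Explicit feature map phi for the degree-2 polynomial kernel."""
    n = len(x)
    squares = x ** 2                          # x_i^2 terms
    cross = [np.sqrt(2) * x[i] * x[j]         # sqrt(2) x_i x_j terms, i > j
             for i in range(n) for j in range(i)]
    linear = np.sqrt(2 * c) * x               # sqrt(2c) x_i terms
    return np.concatenate([squares, cross, linear, [c]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
# K(x, y) equals <phi(x), phi(y)> up to floating-point error.
assert np.isclose(poly_kernel(x, y), quad_feature_map(x) @ quad_feature_map(y))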

Practical use

Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in natural language processing (NLP).[2][3] The most common degree is d=2, since larger degrees tend to overfit on NLP problems.
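
As an illustration of such usage, a degree-2 polynomial kernel SVM can be fit with an off-the-shelf library; the sketch below assumes scikit-learn and synthetic data, neither of which comes from the cited papers. Note that scikit-learn parameterizes the kernel as (gamma * <x, y> + coef0)^degree, so coef0 plays the role of the constant c above.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Degree-2 polynomial kernel: (1.0 * <x, y> + 1.0)^2.
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0)
clf.fit(X, y)
print(clf.score(X, y))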

Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the usual non-linear SVM training algorithms, including:

  • full expansion of the kernel prior to training/testing with a linear SVM,[3] i.e. full computation of the mapping \varphi (see the sketch after this list)
  • basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a training set to produce an approximate expansion[4]
  • inverted indexing of support vectors[4][2]
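
A minimal sketch of the first approach, assuming scikit-learn tooling (not taken from the cited papers): expand the degree-2 mapping explicitly, then train a linear SVM on the expanded features. PolynomialFeatures omits the \sqrt{2} scalings of the mapping above, which amounts only to a per-feature rescaling.

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Explicitly compute the degree-2 feature expansion, then fit a linear SVM
# on the expanded features instead of using a kernelized solver.
model = make_pipeline(PolynomialFeatures(degree=2), LinearSVC(max_iter=10000))
model.fit(X, y)
print(model.score(X, y))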

References

  1. ^ Lecture notes, Computational Learning Theory course (Tufts University): http://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf
  2. ^ a b c Yoav Goldberg and Michael Elhadad (2008). splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL-08: HLT.
  3. ^ a b c Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). Training and testing low-degree polynomial data mappings via linear SVM. J. Machine Learning Research 11:1471–1490.
  4. ^ a b T. Kudo and Y. Matsumoto (2003). Fast methods for kernel-based text analysis. Proc. ACL 2003.