Polynomial kernel
In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables.
Definition
For degree-$d$ polynomials, the polynomial kernel is defined as[1]

$$K(x, y) = (x^\mathsf{T} y + c)^d$$
where $x$ and $y$ are vectors in the input space, i.e. vectors of features computed from training or test samples, and $c \geq 0$ is a constant trading off the influence of higher-order versus lower-order terms in the polynomial. (A further generalized polykernel divides $x^\mathsf{T} y$ by a user-specified scalar parameter $a$.[2]) For the case of the quadratic kernel ($d = 2$, with $x, y \in \mathbb{R}^n$), we have:

$$K(x, y) = \left(\sum_{i=1}^n x_i y_i + c\right)^2 = \sum_{i=1}^n \left(x_i^2\right)\left(y_i^2\right) + \sum_{i=2}^n \sum_{j=1}^{i-1} \left(\sqrt{2}\, x_i x_j\right)\left(\sqrt{2}\, y_i y_j\right) + \sum_{i=1}^n \left(\sqrt{2c}\, x_i\right)\left(\sqrt{2c}\, y_i\right) + c^2$$
From this we see that $K(x, y) = \langle \varphi(x), \varphi(y) \rangle$ is the inner product in a feature space induced by the mapping

$$\varphi(x) = \left\langle x_n^2, \ldots, x_1^2, \sqrt{2}\, x_n x_{n-1}, \ldots, \sqrt{2}\, x_n x_1, \sqrt{2}\, x_{n-1} x_{n-2}, \ldots, \sqrt{2}\, x_2 x_1, \sqrt{2c}\, x_n, \ldots, \sqrt{2c}\, x_1, c \right\rangle$$
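As a quick sanity check on this identity, the short Python sketch below (an illustration, not taken from the cited sources; the example vectors and helper names are made up) evaluates the quadratic kernel directly and via the explicit feature map $\varphi$, and confirms the two agree:

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x.T y + c)^d."""
    return (x @ y + c) ** d

def quadratic_feature_map(x, c=1.0):
    """Explicit feature map phi for d = 2: squared features,
    sqrt(2)-scaled pairwise products, sqrt(2c)-scaled originals, and c."""
    n = len(x)
    squares = [x[i] ** 2 for i in range(n)]
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(n) for j in range(i)]
    linear = [np.sqrt(2 * c) * x[i] for i in range(n)]
    return np.array(squares + cross + linear + [c])

x = np.array([1.0, 2.0, 3.0])   # hypothetical input vectors
y = np.array([0.5, -1.0, 2.0])

direct = poly_kernel(x, y, c=1.0, d=2)
via_map = quadratic_feature_map(x) @ quadratic_feature_map(y)
print(direct, via_map)          # both print 30.25
```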
When the input features are binary-valued (booleans) and $c = 0$, the mapped features correspond to logical conjunctions of input features.[3] (The $\sqrt{2}$ factors in the expansion are immaterial for this.[4])
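To make the conjunction reading concrete, here is a tiny illustration with made-up boolean vectors (not from the cited papers): with $c = 0$ and $d = 2$, each coordinate of $\varphi(x)$ is, up to a constant factor, a product $x_i x_j$, which for boolean inputs is the conjunction $x_i \wedge x_j$.

```python
import numpy as np

# Hypothetical boolean feature vectors (e.g. word-presence indicators).
x = np.array([1, 0, 1, 1])
y = np.array([1, 1, 1, 0])

# With c = 0 and d = 2, each mapped feature is (up to a constant factor)
# a product x_i * x_j, i.e. the conjunction x_i AND x_j for boolean inputs.
pairs = [(i, j) for i in range(len(x)) for j in range(i + 1)]
shared = [(i, j) for i, j in pairs if x[i] & x[j] & y[i] & y[j]]
print(shared)          # conjunctions active in both vectors: [(0, 0), (2, 0), (2, 2)]
print((x @ y) ** 2)    # kernel value 4: a weighted count of those shared conjunctions
```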
Practical use
Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in natural language processing (NLP).[3][4] The most common degree is d=2, since larger degrees tend to overfit on NLP problems.
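As a minimal sketch of this typical setup, assuming scikit-learn and a made-up stand-in dataset: scikit-learn's 'poly' kernel computes $(\gamma \langle x, x' \rangle + \text{coef0})^{\text{degree}}$, so degree=2 with coef0=1 and gamma=1 corresponds to the common quadratic kernel with $c = 1$.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data standing in for an NLP feature matrix (hypothetical).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Quadratic polynomial kernel: (gamma * <x, x'> + coef0)^degree with degree = 2.
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```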
Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the usual non-linear SVM training algorithms, including:
- full expansion of the kernel prior to training/testing with a linear SVM,[4] i.e. full computation of the mapping $\varphi$ (see the sketch after this list)
- basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a training set to produce an approximate expansion[5]
- inverted indexing of support vectors[5][3]
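As a hedged illustration of the first approach (full expansion followed by a linear SVM), the sketch below uses scikit-learn's PolynomialFeatures to materialize the degree-2 mapping explicitly and then trains LinearSVC on the expanded features; this mirrors the idea described in [4], not the exact implementation or data from that paper.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Hypothetical stand-in data; in NLP this would be a (sparse) feature matrix.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Materialize the degree-2 feature map explicitly, then train a linear SVM
# on the expanded features instead of using a kernelized solver.
model = make_pipeline(PolynomialFeatures(degree=2), LinearSVC(max_iter=10000))
model.fit(X, y)
print(model.score(X, y))
```

Note that PolynomialFeatures omits the $\sqrt{2}$ scaling factors from the expansion above; this changes only the relative weighting of the features, not the set of products the linear model can use.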
One problem with the polynomial kernel is that it may suffer from numerical instability: when $x^\mathsf{T} y + c < 1$, $K(x, y) = (x^\mathsf{T} y + c)^d$ tends to zero as $d$ is increased, whereas when $x^\mathsf{T} y + c > 1$, $K(x, y)$ tends to infinity.[2]
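A quick numerical illustration of this behaviour (the base values below are made up, not taken from [2]):

```python
# The same base raised to increasing degrees either collapses toward zero or
# blows up, depending on whether x.T y + c is below or above 1.
for base in (0.8, 1.3):          # stand-ins for x.T y + c
    for d in (2, 10, 50):
        print(base, d, base ** d)
# 0.8 ** 50 is about 1.4e-5, while 1.3 ** 50 is about 5.0e5.
```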
References
- [1] http://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf
- [2] Chih-Jen Lin (2012). Machine learning software: design and practical use. Talk at Machine Learning Summer School, Kyoto.
- [3] Yoav Goldberg and Michael Elhadad (2008). splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL-08: HLT.
- [4] Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). Training and testing low-degree polynomial data mappings via linear SVM. J. Machine Learning Research 11:1471–1490.
- [5] T. Kudo and Y. Matsumoto (2003). Fast methods for kernel-based text analysis. Proc. ACL 2003.