Polynomial kernel
In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models, that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables.
Definition
For degree-d polynomials, the polynomial kernel is defined as[1]
where and are vectors in the input space, i.e. vectors of features computed from training or test samples, is a constant trading off the influence of higher-order versus lower-order terms in the polynomial. For the case of the quadratic kernel, we have:
From this we see that is the inner product in a feature space induced by the mapping
When the input features are binary-valued (booleans), the mapped features corresponds to conjunctions of input features.[2] (The terms in the expansion are immaterial.[3])
Practical use
Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in natural language processing (NLP).[2][3] The most common degree is d=2, since larger degrees tend to overfit on NLP problems.
Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the usual non-linear SVM training algorithms, including:
- full expansion of the kernel prior to training/testing with a linear SVM,[3] i.e. full computation of the mapping
- basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a training set to produce an approximate expansion[4]
- inverted indexing of support vectors[4][2]
References
- ^ http://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf
- ^ a b c Yoav Goldberg and Michael Elhadad (2008). splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL-08: HLT.
- ^ a b c Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). Training and testing low-degree polynomial data mappings via linear SVM. J. Machine Learning Research 11:1471–1490.
- ^ a b T. Kudo and Y. Matsumoto (2003). Fast methods for kernel-based text analysis. Proc. ACL 2003.