Tensor sketch
In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure.
History
The term tensor sketch was coined in 2013,[3] describing a technique by Rasmus Pagh[4] from the same year.
Application to general matrices
Variations
Data Oblivious
- Original version, using FFT
- Dense matrices
- General theorem
- Recursive construction
Data Aware
Applications
Compressed Matrix Multiplication
Compact Multilinear Pooling
Explicit Polynomial Kernels
Kernel methods are popular in machine learning because they give the algorithm designer the freedom to design a "feature space" in which to measure the similarity of their data points. A simple kernel-based binary classifier is based on the following computation:

$\hat{y}(\mathbf{x}') = \operatorname{sign}\left(\sum_{i=1}^n y_i k(\mathbf{x}_i, \mathbf{x}')\right),$

where $\mathbf{x}_i \in \mathbb{R}^d$ are the data points, $y_i$ is the label of the $i$th point (either −1 or +1), and $\hat{y}(\mathbf{x}')$ is the prediction of the class of $\mathbf{x}'$. The function $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is the kernel. Typical examples are the radial basis function kernel, $k(x, x') = \exp(-\lVert x - x'\rVert_2^2)$, and polynomial kernels such as $k(x, x') = (1 + \langle x, x'\rangle)^2$.
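For illustration, a minimal NumPy sketch of the implicit approach; the toy data, the labels, and the specific choice of the polynomial kernel $(1 + \langle x, x'\rangle)^2$ are assumptions made for the example, not part of the article:

```python
import numpy as np

def poly_kernel(x, xp):
    """Polynomial kernel k(x, x') = (1 + <x, x'>)^2, one of the typical examples."""
    return (1.0 + np.dot(x, xp)) ** 2

def predict_implicit(X, y, x_new, kernel=poly_kernel):
    """sign( sum_i y_i k(x_i, x_new) ): every prediction touches all n data points."""
    return np.sign(sum(y_i * kernel(x_i, x_new) for x_i, y_i in zip(X, y)))

# Toy data: n = 100 points in R^5 with labels in {-1, +1} (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.sign(rng.standard_normal(100))
print(predict_implicit(X, y, rng.standard_normal(5)))
```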
When used this way, the kernel method is called "implicit". Sometimes it is faster to do an "explicit" kernel method, in which a pair of functions $f, g : \mathbb{R}^d \to \mathbb{R}^D$ are found, such that $k(x, x') = \langle f(x), g(x')\rangle$. This allows the above computation to be expressed as

$\hat{y}(\mathbf{x}') = \operatorname{sign}\left\langle \sum_{i=1}^n y_i f(\mathbf{x}_i),\, g(\mathbf{x}')\right\rangle = \operatorname{sign}\,\langle v, g(\mathbf{x}')\rangle,$

where the value $v = \sum_{i=1}^n y_i f(\mathbf{x}_i)$ can be computed in advance.
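Continuing the toy example, a sketch of the explicit variant, here assuming the degree-2 polynomial kernel $k(x, x') = \langle x, x'\rangle^2$ with its exact feature map $f(x) = g(x) = x \otimes x$ (flattened); the helper names `feature_map`, `fit_explicit`, and `predict_explicit` are illustrative:

```python
import numpy as np

def feature_map(x):
    """Exact explicit feature map for k(x, x') = <x, x'>^2:
    f(x) = x (x) x flattened to length d^2, so <f(x), f(x')> = <x, x'>^2."""
    return np.outer(x, x).ravel()

def fit_explicit(X, y):
    """Precompute v = sum_i y_i f(x_i) once for the whole training set."""
    return sum(y_i * feature_map(x_i) for x_i, y_i in zip(X, y))

def predict_explicit(v, x_new):
    """Each prediction is now a single inner product in R^D."""
    return np.sign(np.dot(v, feature_map(x_new)))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.sign(rng.standard_normal(100))
v = fit_explicit(X, y)               # here D = 5**2 = 25
print(predict_explicit(v, rng.standard_normal(5)))
```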
The problem with this method is that the feature space can be very large. That is, $D \gg d$. For example, for the polynomial kernel $k(x, x') = \langle x, x'\rangle^3$ we get $f(x) = x \otimes x \otimes x$ and $g(x') = x' \otimes x' \otimes x'$, where $\otimes$ is the tensor product and where $D = d^3$. If $d$ is already large, $D$ can be much larger than the number of data points ($n$) and so the explicit method is inefficient.
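A small numerical check (illustrative, not from the article) that the flattened triple tensor product realizes the cubic polynomial kernel, and that its dimension is $d^3$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
x, xp = rng.standard_normal(d), rng.standard_normal(d)

# f(x) = x (x) x (x) x, flattened: D = d**3 coordinates, one per product x_i * x_j * x_k.
f_x  = np.einsum('i,j,k->ijk', x,  x,  x ).ravel()
f_xp = np.einsum('i,j,k->ijk', xp, xp, xp).ravel()

print(f_x.shape)                                            # (1000,), i.e. D = d**3
print(np.allclose(np.dot(f_x, f_xp), np.dot(x, xp) ** 3))   # True: <f(x), f(x')> = <x, x'>^3
```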
The idea of tensor sketch is that we can compute approximate functions $f^{\text{sketch}}, g^{\text{sketch}} : \mathbb{R}^d \to \mathbb{R}^D$, where $D$ can even be smaller than $d$, and which still have the property that $\langle f^{\text{sketch}}(x), g^{\text{sketch}}(x')\rangle \approx k(x, x')$.
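A minimal sketch of one way such approximate feature maps can be realized, in the CountSketch-plus-FFT style of the original construction ("Original version, using FFT" above); the simple random hash arrays below stand in for the hash families analyzed in the cited papers, and all parameter values are illustrative:

```python
import numpy as np

def tensor_sketch(x, hashes, signs, D):
    """TensorSketch of x for the degree-p polynomial kernel, where p = len(hashes):
    multiply the FFTs of p independent CountSketches of x, then invert the FFT.
    The inner product of two such sketches estimates <x, x'>^p."""
    prod = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        cs = np.zeros(D)
        np.add.at(cs, h, s * x)          # CountSketch: bucket h(i) accumulates s(i) * x_i
        prod *= np.fft.fft(cs)
    return np.real(np.fft.ifft(prod))

rng = np.random.default_rng(2)
d, D, p = 100, 1024, 3                    # sketch length D, versus d**p = 10**6 exact features
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs  = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]

x = rng.standard_normal(d)
xp = x + 0.1 * rng.standard_normal(d)     # correlated pair, so the kernel value is non-negligible
estimate = np.dot(tensor_sketch(x, hashes, signs, D),
                  tensor_sketch(xp, hashes, signs, D))
print(estimate, np.dot(x, xp) ** 3)       # estimate is close to <x, x'>^3; error shrinks as D grows
```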
This method was shown in 2020[5] to work even for high-degree polynomials and radial basis function kernels.
- ^ Pham, Ninh; Pagh, Rasmus (2013). Fast and scalable polynomial kernels via explicit feature maps. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. doi:10.1145/2487575.2487591.
- ^ Pagh, Rasmus (2013). "Compressed matrix multiplication". ACM Transactions on Computation Theory. Article No. 9 (August 2013). Association for Computing Machinery. doi:10.1145/2493252.2493254.
- ^ Ahle, Thomas; Kapralov, Michael; Knudsen, Jakob; Pagh, Rasmus; Velingker, Ameya; Woodruff, David; Zandieh, Amir (2020). Oblivious Sketching of High-Degree Polynomial Kernels. ACM-SIAM Symposium on Discrete Algorithms. Association for Computing Machinery. doi:10.1137/1.9781611975994.9.