Tensor sketch
In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure.
History
The term tensor sketch was coined in 2013,[3] describing a technique by Rasmus Pagh[4] from the same year.
Application to general matrices
Variations
Data Oblivious
- Original version, using FFT
- Dense matrices
- General theorem
- Recursive construction
Data Aware
Applications
Compressed Matrix Multiplication
Compact Multilinear Pooling
Explicit Polynomial Kernels
Kernel methods are popular in machine learning because they give the algorithm designer the freedom to design a "feature space" in which to measure the similarity of their data points. A simple kernel-based binary classifier is based on the following computation:

$\hat{y}(\mathbf{x}') = \operatorname{sign}\left(\sum_{i=1}^n y_i k(\mathbf{x}_i, \mathbf{x}')\right),$

where $\mathbf{x}_i \in \mathbb{R}^d$ are the data points, $y_i$ is the label of the $i$th point (either −1 or +1), and $\hat{y}(\mathbf{x}')$ is the prediction of the class of $\mathbf{x}'$. The function $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is the kernel. Typical examples are the radial basis function kernel, $k(x, x') = \exp(-\lVert x - x'\rVert_2^2)$, and polynomial kernels such as $k(x, x') = (1 + \langle x, x'\rangle)^2$.
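For illustration, a minimal NumPy sketch of the implicit approach; the toy data, the labels, and the specific choice of the polynomial kernel $(1 + \langle x, x'\rangle)^2$ are assumptions made for the example, not part of the article:

```python
import numpy as np

def poly_kernel(x, xp):
    """Polynomial kernel k(x, x') = (1 + <x, x'>)^2, one of the typical examples."""
    return (1.0 + np.dot(x, xp)) ** 2

def predict_implicit(X, y, x_new, kernel=poly_kernel):
    """sign( sum_i y_i k(x_i, x_new) ): every prediction touches all n data points."""
    return np.sign(sum(y_i * kernel(x_i, x_new) for x_i, y_i in zip(X, y)))

# Toy data: n = 100 points in R^5 with labels in {-1, +1} (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.sign(rng.standard_normal(100))
print(predict_implicit(X, y, rng.standard_normal(5)))
```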
When used this way, the kernel method is called "implicit". Sometimes it is faster to do an "explicit" kernel method, in which a pair of functions $f, g : \mathbb{R}^d \to \mathbb{R}^D$ are found, such that $k(x, x') = \langle f(x), g(x')\rangle$. This allows the above computation to be expressed as

$\hat{y}(\mathbf{x}') = \operatorname{sign}\left\langle \sum_{i=1}^n y_i f(\mathbf{x}_i),\, g(\mathbf{x}')\right\rangle = \operatorname{sign}\,\langle v, g(\mathbf{x}')\rangle,$

where the value $v = \sum_{i=1}^n y_i f(\mathbf{x}_i)$ can be computed in advance.
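Continuing the toy example, a sketch of the explicit variant, here assuming the degree-2 polynomial kernel $k(x, x') = \langle x, x'\rangle^2$ with its exact feature map $f(x) = g(x) = x \otimes x$ (flattened); the helper names `feature_map`, `fit_explicit`, and `predict_explicit` are illustrative:

```python
import numpy as np

def feature_map(x):
    """Exact explicit feature map for k(x, x') = <x, x'>^2:
    f(x) = x (x) x flattened to length d^2, so <f(x), f(x')> = <x, x'>^2."""
    return np.outer(x, x).ravel()

def fit_explicit(X, y):
    """Precompute v = sum_i y_i f(x_i) once for the whole training set."""
    return sum(y_i * feature_map(x_i) for x_i, y_i in zip(X, y))

def predict_explicit(v, x_new):
    """Each prediction is now a single inner product in R^D."""
    return np.sign(np.dot(v, feature_map(x_new)))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.sign(rng.standard_normal(100))
v = fit_explicit(X, y)               # here D = 5**2 = 25
print(predict_explicit(v, rng.standard_normal(5)))
```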
The problem with this method is that the feature space can be very large. That is, $D \gg d$. For example, for the polynomial kernel $k(x, x') = \langle x, x'\rangle^3$ we get $f(x) = x \otimes x \otimes x$ and $g(x') = x' \otimes x' \otimes x'$, where $\otimes$ is the tensor product and where $D = d^3$. If $d$ is already large, $D$ can be much larger than the number of data points ($n$) and so the explicit method is inefficient.
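A small numerical check (illustrative, not from the article) that the flattened triple tensor product realizes the cubic polynomial kernel, and that its dimension is $d^3$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
x, xp = rng.standard_normal(d), rng.standard_normal(d)

# f(x) = x (x) x (x) x, flattened: D = d**3 coordinates, one per product x_i * x_j * x_k.
f_x  = np.einsum('i,j,k->ijk', x,  x,  x ).ravel()
f_xp = np.einsum('i,j,k->ijk', xp, xp, xp).ravel()

print(f_x.shape)                                            # (1000,), i.e. D = d**3
print(np.allclose(np.dot(f_x, f_xp), np.dot(x, xp) ** 3))   # True: <f(x), f(x')> = <x, x'>^3
```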
The idea of tensor sketch is that we can compute approximate functions $f^{\text{sketch}}, g^{\text{sketch}} : \mathbb{R}^d \to \mathbb{R}^D$, where $D$ can even be smaller than $d$, and which still have the property that $\langle f^{\text{sketch}}(x), g^{\text{sketch}}(x')\rangle \approx k(x, x')$.
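A minimal sketch of one way such approximate feature maps can be realized, in the CountSketch-plus-FFT style of the original construction ("Original version, using FFT" above); the simple random hash arrays below stand in for the hash families analyzed in the cited papers, and all parameter values are illustrative:

```python
import numpy as np

def tensor_sketch(x, hashes, signs, D):
    """TensorSketch of x for the degree-p polynomial kernel, where p = len(hashes):
    multiply the FFTs of p independent CountSketches of x, then invert the FFT.
    The inner product of two such sketches estimates <x, x'>^p."""
    prod = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        cs = np.zeros(D)
        np.add.at(cs, h, s * x)          # CountSketch: bucket h(i) accumulates s(i) * x_i
        prod *= np.fft.fft(cs)
    return np.real(np.fft.ifft(prod))

rng = np.random.default_rng(2)
d, D, p = 100, 1024, 3                    # sketch length D, versus d**p = 10**6 exact features
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs  = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]

x = rng.standard_normal(d)
xp = x + 0.1 * rng.standard_normal(d)     # correlated pair, so the kernel value is non-negligible
estimate = np.dot(tensor_sketch(x, hashes, signs, D),
                  tensor_sketch(xp, hashes, signs, D))
print(estimate, np.dot(x, xp) ** 3)       # estimate is close to <x, x'>^3; error shrinks as D grows
```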
This method was shown in 2020[5] to work even for high-degree polynomials and radial basis function kernels.
- ^ Pham, Ninh; Pagh, Rasmus (2013). Fast and scalable polynomial kernels via explicit feature maps. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. doi:10.1145/2487575.2487591.
- ^ Pagh, Rasmus (2013). "Compressed matrix multiplication". ACM Transactions on Computation Theory. Article No. 9 (August 2013). Association for Computing Machinery. doi:10.1145/2493252.2493254.
- ^ Ahle, Thomas; Kapralov, Michael; Knudsen, Jakob; Pagh, Rasmus; Velingker, Ameya; Woodruff, David; Zandieh, Amir (2020). Oblivious Sketching of High-Degree Polynomial Kernels. ACM-SIAM Symposium on Discrete Algorithms. Association for Computing Machinery. doi:10.1137/1.9781611975994.9.