Tensor (machine learning)
In machine learning, a tensor is a way of embedding higher-dimensional data in a compact form suitable for artificial neural networks. While tensors may be stored as multi-dimensional arrays, they have come to be defined and used in ways that are specific to machine learning. Tensors are used to embed images, video, volumes, sounds, and relationships among words and concepts into neural networks in a unified way.
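As an illustration (a minimal PyTorch sketch, with shapes chosen arbitrarily rather than drawn from the sources cited below), multi-dimensional data such as images and video map directly onto tensors of increasing order:

```python
import torch

# A greyscale image as a 2nd-order tensor: (height, width).
image = torch.rand(224, 224)

# A colour video clip as a 4th-order tensor:
# (frames, channels, height, width).
video = torch.rand(16, 3, 224, 224)

print(image.ndim, tuple(image.shape))  # 2 (224, 224)
print(video.ndim, tuple(video.shape))  # 4 (16, 3, 224, 224)
```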
Tensors have many benefits in machine learning. With tensor decomposition, a tensor can be factorized into several smaller tensors, which enables networks with far fewer parameters.[1] The computation of gradients, a central step of the backpropagation algorithm, can be performed on tensors in a consistent way, enabling unified frameworks such as PyTorch[2] and TensorFlow.[3] This generalization lets designers of neural networks work with building blocks that can be composed through implicit mathematical relationships. Operations on tensors can be computed efficiently because they can be expressed in terms of arrays of linear operators built from matrix multiplication and the Kronecker product.[4] This has led to the adoption of GPUs using CUDA and of dedicated tensor hardware, such as Google's Tensor Processing Unit and Nvidia's Tensor Core. These developments have greatly accelerated neural network architectures and increased the size and complexity of the models that can be trained.
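The parameter-reduction and gradient claims can be illustrated together with a short PyTorch sketch (an illustrative example whose sizes and rank are chosen arbitrarily, not taken from the cited works): a dense weight matrix is replaced by two low-rank factors, and automatic differentiation computes gradients with respect to both factors.

```python
import torch

# A dense 256 x 256 weight matrix would have 65,536 parameters;
# rank-8 factors U and V together have only 2 * 256 * 8 = 4,096.
U = torch.rand(256, 8, requires_grad=True)
V = torch.rand(8, 256, requires_grad=True)
x = torch.rand(256)

# Gradients flow through the factored tensors exactly as they would
# through the full matrix, which is what backpropagation relies on.
loss = (U @ (V @ x)).pow(2).sum()
loss.backward()

print(U.grad.shape, V.grad.shape)  # torch.Size([256, 8]) torch.Size([8, 256])
```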
Etymology
The term tensor carries several meanings. A mathematical tensor relates to the machine-learning usage as a multi-dimensional object, but the formal apparatus of multilinear maps over vector spaces is not needed to understand tensors in the context of machine learning. Likewise, the notion of a tensor field in physics and engineering is generally not the sense in which tensors are used here. The term was adopted in machine learning through early work that identified the value of tensor decomposition as a way to reduce the number of learned parameters in optimization problems.[5] Because tensors provide a unified abstraction, it is beneficial, but not strictly necessary, for the machine-learning practitioner to understand tensor decomposition or backpropagation when designing new models.
History
The history of tensors in machine learning builds on the use of vectors and matrices for optimization problems. Tensors appeared in several areas as a form of data reduction.
Natural Language Processing
One of the first uses of tensors in machine learning appeared in natural language processing, where early networks attempted to learn relationships between concepts in text. A single word can be expressed as a vector via Word2vec, so a relationship between two words can be encoded in a matrix. For more complex relationships, however, such as subject-object-verb triples, it was necessary to build higher-dimensional networks. In 2009, the work of Sutskever introduced Bayesian Clustered Tensor Factorization to model relational concepts while reducing the parameter space.[6]
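The jump from pairwise to three-way relations can be sketched with tensor contractions (an illustrative toy example with random values, not the Bayesian Clustered Tensor Factorization model itself): a matrix scores a word pair, while a third-order tensor scores a subject-object-verb triple.

```python
import torch

d = 50                                   # embedding dimension (illustrative)
subj, obj, verb = (torch.rand(d) for _ in range(3))

# Pairwise relation: contract a matrix W with two word vectors.
W = torch.rand(d, d)
pair_score = subj @ W @ obj              # scalar score for (subj, obj)

# Three-way relation: contract a 3rd-order tensor T along all modes.
T = torch.rand(d, d, d)
triple_score = torch.einsum('ijk,i,j,k->', T, subj, obj, verb)

print(pair_score.item(), triple_score.item())
```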
Computer Vision
In computer vision, early work by Kunihiko Fukushima in 1980 used arrays (matrices) to develop what became the convolutional neural network. Because two-dimensional images are naturally expressed as matrices, tensors did not appear in this field until later. Working with video requires embedding three or more dimensions in neural networks. In the 2000s, the field of multilinear subspace learning was established to focus specifically on dimensionality reduction of data sets by using tensors;[7] in this way, tensors are related to principal component analysis. Tensors then came to be used directly in convolutional neural networks to represent images and video.[8] Panagakis et al. survey the use of tensors in computer vision.[9]
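As a sketch of how video enters a convolutional network as a single tensor (an illustrative PyTorch example with arbitrarily chosen shapes), a 3-D convolution consumes a fifth-order minibatch tensor directly:

```python
import torch
from torch import nn

# A minibatch of video clips as a 5th-order tensor:
# (batch, channels, frames, height, width).
clips = torch.rand(8, 3, 16, 64, 64)

# A 3-D convolution slides its kernel over the frame, height and
# width modes at once, so the video is processed as one tensor.
conv = nn.Conv3d(in_channels=3, out_channels=12, kernel_size=3, padding=1)
features = conv(clips)

print(features.shape)  # torch.Size([8, 12, 16, 64, 64])
```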
Hardware
References
- ^ Sidiropoulos, N (2016). "Tensor Decomposition for Signal Processing and Machine Learning". IEEE Transactions on Signal Processing. 65 (13).
- ^ Paszke, A (2019). "PyTorch: An Imperative Style, High-Performance Deep Learning Library". Proceedings of the 33rd International Conference on Neural Information Processing Systems: 8026–8037.
- ^ Abadi, M (2016). "TensorFlow: A System for Large-Scale Machine Learning". Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation.
- ^ Grout, I (2018). "Hardware Considerations for Tensor Implementation and Analysis Using the Field Programmable Gate Array". Electronics. 7 (320).
- ^ Sutskever, I (2009). "Modeling Relational Data using Bayesian Clustered Tensor Factorization". Advances in Neural Information Processing Systems. 22.
- ^ Sutskever, I (2009). "Modeling Relational Data using Bayesian Clustered Tensor Factorization". Advances in Neural Information Processing Systems. 22.
- ^ Lu, Haiping; Plataniotis, K. N.; Venetsanopoulos, A. N. (2013). Multilinear Subspace Learning. CRC Press.
- ^ Kossaifi, Jean (2019). "T-Net: Parameterizing Fully Convolutional Nets with a Single High-Order Tensor". arXiv.
- ^ Panagakis, Yannis (2021). "Tensor Methods in Computer Vision and Deep Learning". Proceedings of the IEEE.