Transformer (deep learning architecture)
The Transformer is a deep learning model introduced in 2017, used primarily in the field of natural language processing (NLP).[1] Like recurrent neural networks (RNNs), Transformers are designed to handle ordered sequences of data, such as natural language. Unlike RNNs, however, Transformers do not require the sequence to be processed in order. For example, if the input data is a natural-language sentence, the Transformer does not need to process its beginning before its end. This property allows for much more parallelization than with RNNs.
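This order-independence comes from the self-attention mechanism, in which every position in the sequence attends to every other position through a single matrix operation, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, as defined in the cited paper.[1] The following is a minimal illustrative sketch in Python/NumPy of that scaled dot-product attention operation; the function names, the toy 4-token input, and the use of the raw input as Q, K, and V (rather than learned projections) are simplifications for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d_k) matrices.
    # Every position attends to every other position in one matrix
    # multiplication; there is no step-by-step recurrence over time.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) attention scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy example: a 4-token sequence with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real Transformer, Q, K and V are learned linear projections of x;
# here the input is reused directly, purely for illustration.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because the whole score matrix is computed at once, all sequence positions can be processed in parallel on hardware such as GPUs, which is the source of the parallelization advantage described above.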
Since their introduction, Transformers have become the basic building block of most state-of-the-art architectures in NLP, replacing gated recurrent neural network models such as the long short-term memory (LSTM) in many cases. Because the Transformer architecture facilitates parallelization during training, it has enabled training on much larger datasets than was previously possible. This led to the development of pretrained systems such as BERT and GPT-2, which are trained on huge amounts of general language data before release and can then be fine-tuned for specific language tasks.[2][3]
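As an illustration of this pretrain-then-fine-tune workflow, the sketch below loads a pretrained BERT checkpoint and attaches a fresh two-class classification head using the Hugging Face Transformers library; the checkpoint name, label count, and example sentence are arbitrary choices for this example, and the actual fine-tuning loop (optimizer, task dataset, training steps) is omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pretrained BERT checkpoint and add an (initially untrained)
# 2-class classification head on top; only this head and the subsequently
# fine-tuned weights are specific to the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A single forward pass on one example sentence; a real fine-tuning run
# would iterate over a labelled task dataset with an optimizer.
inputs = tokenizer("Transformers process whole sentences at once.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```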
References
- ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (2017-06-12). "Attention Is All You Need". arXiv:1706.03762.
- ^ "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-08-25.
- ^ "Better Language Models and Their Implications". OpenAI. 2019-02-14. Retrieved 2019-08-25.