Generative pre-trained transformer

[Image caption: The GPT model]

Generative pre-trained transformer (GPT) is a type of language model trained on a large corpus of text data to generate human-like text. It is based on the transformer deep learning architecture and can be fine-tuned for various natural language processing tasks such as text generation, language translation, and text classification. The "pre-training" in its name refers to the initial training on a large text corpus, which gives the model a solid foundation for performing well on downstream tasks with limited amounts of task-specific data.
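As an illustration of the generation step described above, the following is a minimal sketch that loads a publicly released GPT-2 checkpoint and samples a continuation of a prompt. It assumes the open-source Hugging Face transformers library and the "gpt2" model name, neither of which is specified by this article.

```python
# Minimal sketch: generate text with a pre-trained GPT model.
# Assumes the Hugging Face "transformers" library and the public "gpt2" checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Fine-tuning for a downstream task would start from the same pre-trained weights and continue training on task-specific data rather than sampling directly, as described in the History section below.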

Uses

History

Model | Architecture | Parameter count | Training data
GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax | 0.12 billion | BookCorpus:[4] 4.5 GB of text, from 7,000 unpublished books of various genres
GPT-2 | GPT-1, but with modified normalization | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit
GPT-3 | GPT-2, but with modification to allow larger scaling | 175 billion | 570 GB of plaintext, 0.4 trillion tokens; mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2)
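The parameter counts in the table can be roughly reproduced from the architecture description. The sketch below estimates the size of a decoder-only transformer for the GPT-1 configuration; the hidden size, feed-forward size, context length, and vocabulary size used here are taken from the original paper[5] and are assumptions for illustration rather than figures given in the table.

```python
# Rough parameter-count estimate for a decoder-only transformer (GPT-1-style configuration).
# Hidden size 768, feed-forward size 3072, context 512 and ~40,000-token BPE vocabulary
# are assumed from the GPT-1 paper; the number of attention heads does not affect the count.
def transformer_params(layers, d_model, d_ff, vocab, context):
    attention = 4 * d_model * d_model                  # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff                  # two linear maps per block
    blocks = layers * (attention + feed_forward)
    embeddings = vocab * d_model + context * d_model   # token + learned position embeddings
    return blocks + embeddings                         # biases and layer norms omitted

print(transformer_params(layers=12, d_model=768, d_ff=3072, vocab=40000, context=512))
# ~1.16e8, i.e. roughly the 0.12 billion parameters listed for GPT-1
```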

On June 11, 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced the Generative Pre-trained Transformer (GPT).[5] At the time, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well-annotated and made it prohibitively expensive and time-consuming to train extremely large models;[5][6] many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building.[6] In contrast, GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language-modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.[5]
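The two objectives can be sketched as follows. This is a simplified illustration of the scheme described in [5], not OpenAI's code; the tensor shapes, the auxiliary-loss weight, and the toy inputs are illustrative assumptions.

```python
# Sketch of the two-stage objective: next-token language modeling for pre-training,
# then a supervised task loss (with an auxiliary LM term) for fine-tuning.
import torch
import torch.nn.functional as F

def language_modeling_loss(logits, token_ids):
    """Unsupervised pre-training objective: predict each token from the ones before it."""
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 0..n-2
        token_ids[:, 1:].reshape(-1),                 # targets are the same tokens shifted by one
    )

def fine_tuning_loss(task_logits, labels, logits, token_ids, lm_weight=0.5):
    """Supervised fine-tuning objective: task loss plus an auxiliary language-modeling term."""
    task_loss = F.cross_entropy(task_logits, labels)
    return task_loss + lm_weight * language_modeling_loss(logits, token_ids)

# Toy tensors standing in for model outputs: batch of 2, sequence of 8, vocabulary of 100, 3 classes.
logits = torch.randn(2, 8, 100)
token_ids = torch.randint(0, 100, (2, 8))
task_logits = torch.randn(2, 3)
labels = torch.randint(0, 3, (2,))
print(language_modeling_loss(logits, token_ids).item())
print(fine_tuning_loss(task_logits, labels, logits, token_ids).item())
```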

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".[5]

During transfer to a downstream task, GPT applies task-specific input adaptations derived from traversal-style approaches, which convert structured text inputs into a single contiguous sequence of tokens that the pre-trained model can process.[5]
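For example, a textual-entailment instance consisting of a premise and a hypothesis is linearized into one token sequence with delimiter tokens between its parts, and each candidate answer in a multiple-choice task yields its own linearized sequence.[5] The sketch below is illustrative; the special-token strings are assumptions rather than the exact tokens used in the paper.

```python
# Sketch of traversal-style input adaptation: flatten structured inputs into one token sequence.
# The start, delimiter and extract token strings are placeholders, not the paper's exact tokens.
START, DELIM, EXTRACT = "<s>", "$", "<e>"

def entailment_sequence(premise_tokens, hypothesis_tokens):
    """Flatten a structured (premise, hypothesis) pair into one contiguous token sequence."""
    return [START] + premise_tokens + [DELIM] + hypothesis_tokens + [EXTRACT]

def multiple_choice_sequences(context_tokens, answer_token_lists):
    """Each (context, candidate answer) pair becomes its own linearized sequence."""
    return [[START] + context_tokens + [DELIM] + answer + [EXTRACT]
            for answer in answer_token_lists]

print(entailment_sequence(["it", "rains"], ["the", "ground", "is", "wet"]))
```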

References

  1. ^ Roose, Kevin (5 December 2022). "The Brilliance and Weirdness of ChatGPT". The New York Times. Archived from the original on January 18, 2023. Retrieved 26 December 2022. Like those tools, ChatGPT — which stands for "generative pre-trained transformer" — landed with a splash.
  2. ^ Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 9781544361376. Archived from the original on January 10, 2023. Retrieved 10 January 2023.
  3. ^ Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H; et al. (2022). "BioGPT: generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). doi:10.1093/bib/bbac409. PMID 36156661.
  4. ^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27.
  5. ^ a b c d e Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  6. ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.