Generative pre-trained transformer

Framework for generative artificial intelligence


The original GPT model

Generative pre-trained transformers (GPT) are a family of large language models (LLMs)[1][2] and a prominent framework for generative artificial intelligence.[3][4] The concept and first such model were introduced in 2018 by the American artificial intelligence organization OpenAI.[5] GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large datasets of unlabelled text, and able to generate novel human-like text.[2][6] As of 2023, most LLMs have these characteristics[7] and are sometimes referred to broadly as GPTs.[8]
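
For illustration, the following is a minimal sketch of the kind of autoregressive text generation such models perform, using the openly released GPT-2 model through the Hugging Face transformers library; the model choice and sampling settings are assumptions of this sketch rather than details from the article.

```python
# A minimal sketch of autoregressive text generation with an openly available
# GPT-style model (GPT-2) via the Hugging Face "transformers" library.
# Model choice and sampling settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Generative pre-trained transformers are"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the next token given everything generated so far.
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,                        # sample rather than always taking the top token
    top_p=0.95,                            # nucleus sampling
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated padding token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```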

Between 2018 and 2023, OpenAI released four major numbered GPT foundational models, each significantly more capable than the previous one due to increased size (number of trainable parameters) and training. The most recent of these is GPT-4 (2023), for which OpenAI declined to publish the size or training details, citing "the competitive landscape and the safety implications of large-scale models".[9] These "GPT-n" models have been the basis for various other products and technologies, including models fine-tuned for instruction following, which in turn power the ChatGPT chatbot service.[1]

The term "GPT" is also used in the names of some generative LLMs developed by others, such as a series of GPT-3 inspired models created by EleutherAI,[10] and recently a series of seven models created by Cerebras.[11] Major companies in other industries (e.g. sales, finance) also use the term "GPT" in the names of their services involving or utilizing a GPT technology, like "EinsteinGPT" and "BloombergGPT".[12][13]

History

Generative pre-training (GP) was a long-established concept in machine learning applications,[14][15] but the transformer architecture was not available until 2017, when it was introduced by Google.[16] That development led to the emergence of large language models like BERT in 2018[17] and XLNet in 2019,[18] which were pre-trained transformers (PT) but not designed to be generative (they were "encoder-only").[19] Also around that time, in 2018, OpenAI published its article entitled "Improving Language Understanding by Generative Pre-Training," in which it introduced the first generative pre-trained transformer (GPT) system.[20]

Prior to transformer-based architectures, the best-performing neural NLP (natural language processing) models commonly employed supervised learning from large amounts of manually-labeled data. The reliance on supervised learning limited their use on datasets that were not well-annotated, and also made it prohibitively expensive and time-consuming to train extremely large language models.[20]

The semi-supervised approach OpenAI employed to make a large-scale generative system, and the first to do so with a transformer model, involved two stages: an unsupervised generative "pre-training" stage to set initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage to adapt these parameters to a target task.[20]
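
As a rough illustration of this two-stage recipe, the sketch below pre-trains on raw text with a next-token (language modeling) objective and then fine-tunes on a small labeled task framed as text continuation. It uses a Hugging Face causal language model as a stand-in; the model name, toy data, and hyperparameters are assumptions of the sketch, and the original paper actually attached a task-specific output head during fine-tuning rather than reusing the language modeling head as done here.

```python
# A minimal sketch of the two-stage recipe: unsupervised next-token
# "pre-training" followed by supervised "fine-tuning". Model name, toy data
# and hyperparameters are illustrative assumptions; the original GPT paper
# attached a task-specific output head for fine-tuning, which this sketch
# simplifies to plain text continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(text):
    """One gradient step of the language-modelling objective on a piece of text."""
    batch = tokenizer(text, return_tensors="pt")
    # labels=input_ids makes the library compute the next-token cross-entropy loss.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Stage 1: unsupervised generative pre-training on unlabelled text.
for text in ["Raw text drawn from a large unlabelled corpus ...",
             "More raw text ..."]:
    training_step(text)

# Stage 2: supervised discriminative fine-tuning on a labelled target task.
for prompt, answer in [("Review: a wonderful film. Sentiment:", " positive")]:
    training_step(prompt + answer)
```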

Foundational models

A foundational model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks.[21] Thus far the most notable GPT foundation models are from OpenAI's numbered "GPT-n" series, the most recent of which is GPT-4.

Notable GPT foundation models
Model | Architecture | Parameter count | Training data | Release date
Original GPT (GPT-1)[22] | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax | 117 million | BookCorpus:[23] 4.5 GB of text from 7,000 unpublished books of various genres | June 11, 2018[5]
GPT-2 | GPT-1, but with modified normalization | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit | February 14, 2019 (initial/limited version) and November 5, 2019 (full version)[24]
GPT-3 | GPT-2, but with modification to allow larger scaling | 175 billion | 570 GB of plaintext, 0.4 trillion tokens; mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2) | May 28, 2020[25] (then March 15, 2022, for a revision ultimately termed GPT-3.5)
GPT-4 | Also trained with both text prediction and RLHF; accepts both text and images as input; further details are not public[9] | Undisclosed | Undisclosed | March 14, 2023
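
A back-of-the-envelope calculation shows where a figure like the original GPT's 117 million parameters comes from. The hyperparameters used below (12 layers, 768-dimensional states, a roughly 40,000-token vocabulary, 512-token context) are taken from the original GPT paper and are assumptions of this sketch, not values stated in the table above.

```python
# A back-of-the-envelope estimate of a GPT-style decoder's parameter count.
# The GPT-1 hyperparameters below (12 layers, 768-dimensional states, ~40k-token
# vocabulary, 512-token context) come from the original paper and are assumptions
# of this sketch, not figures stated in the table.
def decoder_params(n_layers, d_model, vocab_size, context_len, d_ff=None):
    d_ff = d_ff or 4 * d_model            # conventional 4x feed-forward width
    embeddings = vocab_size * d_model     # token embedding matrix (shared with output softmax)
    positions = context_len * d_model     # learned position embeddings
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff     # two dense layers per block
    per_layer = attention + feed_forward  # biases and layer norms ignored
    return embeddings + positions + n_layers * per_layer

print(f"GPT-1-like model: ~{decoder_params(12, 768, 40478, 512) / 1e6:.0f}M parameters")
# Prints roughly 116M, close to the 117 million reported for the original GPT;
# biases, layer norms, and rounding account for the small difference.
```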

Foundational GPT models can be fine-tuned further to produce other related systems targeted at a particular task and/or domain.

In January 2022, OpenAI introduced "InstructGPT", a series of models fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models.[26][27] Advantages over the bare foundational models included higher accuracy, less negative/toxic sentiment, and generally better alignment with user needs. Hence, OpenAI began using this as the basis for its API service offerings.[28]

In November 2022, OpenAI launched ChatGPT, an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT.[29] The model was trained using RLHF, with human AI trainers providing conversations in which they played both the user and the AI. This new dialogue dataset was mixed with the InstructGPT dataset to give a conversational format suitable for a chatbot.
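
The exact data format OpenAI used has not been published, but a minimal sketch of how such trainer-written dialogues might be flattened into plain text for next-token training could look as follows; the role labels and separator token are purely illustrative assumptions.

```python
# A minimal sketch of flattening trainer-written dialogues into plain text for
# next-token training. The role labels and separator token are purely
# illustrative assumptions; OpenAI has not published ChatGPT's exact format.
conversation = [
    {"role": "user", "content": "What is a transformer?"},
    {"role": "assistant", "content": "A neural network architecture based on attention."},
    {"role": "user", "content": "Who introduced it?"},
    {"role": "assistant", "content": "Researchers at Google, in 2017."},
]

def to_training_text(messages, end_token="<|end|>"):
    """Flatten a list of chat messages into a single training string."""
    return "".join(f"{m['role'].capitalize()}: {m['content']}{end_token}\n" for m in messages)

print(to_training_text(conversation))
```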

References



  1. a b Mohammed Haddad: How does GPT-4 work and how can you start using it in ChatGPT? In: www.aljazeera.com.
  2. a b Generative AI: a game-changer society needs to be ready for. In: World Economic Forum.
  3. https://pub.towardsai.net/generative-ai-and-future-c3b1695876f2
  4. https://www.computer.org/csdl/magazine/co/2022/10/09903869/1H0G6xvtREk
  5. a b Improving language understanding with unsupervised learning. In: openai.com. Retrieved March 18, 2023 (American English).
  6. The A to Z of Artificial Intelligence. In: Time. April 13, 2023.
  7. Rob Toews: The Next Generation Of Large Language Models. In: Forbes.
  8. Joe Mckendrick: Most Jobs Soon To Be 'Influenced' By Artificial Intelligence, Research Out Of OpenAI And University Of Pennsylvania Suggests. Forbes, March 13, 2023.
  9. a b OpenAI: GPT-4 Technical Report. 2023, retrieved March 16, 2023.
  10. EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J.
  11. Cerebras Systems Releases Seven New GPT Models Trained on CS-2 Wafer-Scale Systems. Cerebras Systems (press release), March 28, 2023.
  12. Ryan Morrison: Salesforce launches EinsteinGPT built with OpenAI technology. In: Tech Monitor. March 7, 2023.
  13. The ChatGPT of Finance is Here, Bloomberg is Combining AI and Fintech. In: Forbes.
  14. http://cs224d.stanford.edu/papers/maas_paper.pdf
  15. https://www.cambridge.org/core/journals/apsipa-transactions-on-signal-and-information-processing/article/tutorial-survey-of-architectures-algorithms-and-applications-for-deep-learning/023B6ADF962FA37F8EC684B209E3DFAE
  16. https://arxiv.org/abs/1706.03762
  17. https://arxiv.org/abs/1810.04805v2
  18. https://proceedings.neurips.cc/paper_files/paper/2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf
  19. https://analyticsindiamag.com/google-introduces-new-architecture-to-reduce-cost-of-transformers/
  20. a b c Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever: Improving Language Understanding by Generative Pre-Training. OpenAI, June 11, 2018, p. 12, retrieved January 23, 2021.
  21. https://hai.stanford.edu/news/introducing-center-research-foundation-models-crfm
  22. https://www.makeuseof.com/gpt-models-explained-and-compared/
  23. Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler: Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. IEEE International Conference on Computer Vision (ICCV) 2015. 2015, pp. 19–27, arXiv:1506.06724 (cv-foundation.org, retrieved February 7, 2023).
  24. https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters
  25. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei: Language Models are Few-Shot Learners. July 22, 2020, arXiv:2005.14165v4.
  26. Aligning language models to follow instructions. In: openai.com. January 27, 2022.
  27. Long Ouyang, Jeff Wu, Xu Jiang, et al.: Training language models to follow instructions with human feedback. 2022, arXiv:2203.02155.
  28. https://analyticsindiamag.com/openai-dumps-its-own-gpt-3-for-something-called-instructgpt-and-for-right-reason/
  29. Introducing ChatGPT. In: openai.com. November 30, 2022.