GPT (language model)

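Generative pre-trained transformers (GPT)^[1] are a family of large language models (LLMs)^[3]^[1]^[4] that OpenAI began developing in 2018^[2], and an important framework for generative artificial intelligence^[5]^[6]. GPT models are artificial neural networks based on the Transformer architecture. They are pre-trained on large datasets of unlabeled text and can generate text that resembles human natural language^[1]^[4]. As of 2023, most LLMs share these characteristics^[7] and are widely referred to as GPTs^[8]^[9].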

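OpenAI has released highly influential GPT foundation models, numbered sequentially to form the "GPT-n" series^[10]. Each model is significantly more capable than its predecessor, owing to increases in size (the number of trainable parameters) and in the amount of training. The most recent of these is GPT-4, released in March 2023. These models form the basis for more task-specific GPT systems, including models fine-tuned to follow instructions, which in turn power the ChatGPT chatbot service^[3].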

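The term "GPT" is also used in the names and descriptions of models built by other developers. For example, other GPT foundation models include a series of models developed by EleutherAI^[11] and seven models developed by Cerebras^[12]. In addition, companies in various industries have developed task-specific GPTs for their own fields, such as Salesforce's "EinsteinGPT" (for customer relationship management)^[13] and Bloomberg's "BloombergGPT" (for finance)^[14].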

History

Initial development

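Generative pretraining (GP) is a long-established concept in machine learning applications^[15]^[16], but it was not until 2017 that Google employees invented the Transformer architecture^[17]. This made possible large language models such as BERT (2018)^[18] and XLNet (2019)^[19]. These models were pre-trained transformers (PT), but they were not designed to be generative and were instead "encoder-only"^[20]. In 2018, OpenAI published the article "Improving Language Understanding by Generative Pre-Training", which introduced the first generative pre-trained transformer (GPT) system ("GPT-1")^[21].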

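Before transformer-based architectures appeared, the best-performing neural natural language processing (NLP) models typically relied on supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well annotated, and it made training very large language models time-consuming and very expensive^[21].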

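OpenAI adopted a semi-supervised approach to build a large-scale generative system, and it was the first to do so with a Transformer model. The approach has two stages: an unsupervised generative "pre-training" stage, in which a language modeling objective is used to set the initial parameters, and a supervised discriminative "fine-tuning" stage, in which these parameters are adapted to a target task^[21].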
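The two stages can be sketched as two objectives. This is a minimal illustration, not OpenAI's implementation: the log-likelihood form of each objective and the weighted auxiliary term in `combined_objective` follow the formulation in the cited paper^[21] as an assumption, and the vocabulary, the uniform "models", the context size `k`, and the weight `lam` are all hypothetical placeholders standing in for a real Transformer.

```python
import math

VOCAB = ["a", "b", "c", "d"]  # hypothetical toy vocabulary


def uniform_next_token(context, token):
    """Hypothetical stand-in for a language model: every token equally likely."""
    return 1.0 / len(VOCAB)


def uniform_label(tokens, label):
    """Hypothetical stand-in for a classifier head over two labels."""
    return 0.5


def pretraining_objective(tokens, next_token_prob, k):
    """Stage 1 (no labels): sum of log P(token | up to k preceding tokens)."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])
        total += math.log(next_token_prob(context, token))
    return total


def finetuning_objective(examples, label_prob):
    """Stage 2 (labeled data): sum of log P(label | input tokens)."""
    return sum(math.log(label_prob(x, y)) for x, y in examples)


def combined_objective(examples, next_token_prob, label_prob, k, lam):
    """Fine-tuning objective plus a weighted language-modeling term
    computed on the labeled inputs (assumed auxiliary objective)."""
    l2 = finetuning_objective(examples, label_prob)
    l1 = sum(pretraining_objective(x, next_token_prob, k) for x, _ in examples)
    return l2 + lam * l1


unlabeled = ["a", "b", "c", "a", "b"]
labeled = [(["a", "b"], 1), (["c", "d", "a"], 0)]

print(pretraining_objective(unlabeled, uniform_next_token, k=2))
print(finetuning_objective(labeled, uniform_label))
print(combined_objective(labeled, uniform_next_token, uniform_label, k=2, lam=0.5))
```

In an actual system both probabilities would come from the same network, so maximizing the first objective on unlabeled text sets the parameters that the second objective then adjusts.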

Applications

ChatGPT (Chat Generative Pre-trained Transformer^[1]^[4]) is a chatbot released by OpenAI on November 30, 2022. It uses GPT-3.5 and applies reinforcement learning from human feedback (RLHF).
BioGPT is a GPT model developed by Microsoft^[22] that focuses on the biomedical domain.^[23]
ProtGPT2 is a GPT model that focuses on protein research.^[24]

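Foundation models

GPT version history
Model	Parameter count	Training data
GPT-1	120 million	BookCorpus^[25]: a corpus of 7,000 unpublished books totaling 4.5 GB, covering a variety of literary genres and topics.
GPT-2	1.5 billion	WebText: a corpus of eight million documents totaling 40 GB, collected from the 45 million highest-voted web pages on Reddit and covering a variety of topics and sources such as news, forums, blogs, Wikipedia, and social media.
GPT-3	175 billion	A large-scale text corpus totaling 570 GB and containing about 400 billion tokens, drawn mainly from CommonCrawl, WebText, English Wikipedia, and two book corpora (Books1 and Books2).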
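As a rough illustration of the growth in scale, the parameter counts in the table above can be compared from one version to the next. The figures are taken from the table as listed (120 million, 1.5 billion, and 175 billion). The helper name `growth_factors` is hypothetical.

```python
# Parameter counts as listed in the version history table.
PARAMETER_COUNTS = {
    "GPT-1": 120_000_000,
    "GPT-2": 1_500_000_000,
    "GPT-3": 175_000_000_000,
}


def growth_factors(counts):
    """Return the ratio of each model's parameter count to its predecessor's."""
    names = list(counts)
    return {
        f"{prev} -> {curr}": counts[curr] / counts[prev]
        for prev, curr in zip(names, names[1:])
    }


for step, factor in growth_factors(PARAMETER_COUNTS).items():
    print(f"{step}: about {factor:.1f}x more parameters")
```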

References

[1] Feng Zhiwei. 冯志伟教授聊ChatGPT [Professor Feng Zhiwei talks about ChatGPT]. 中国科技术语. [2023-02-27]. (Archived from the original on 2023-02-27). Via WeChat Official Accounts Platform.
[2] Citation error: no content was provided for the reference named "gpt1".
[3] Haddad, Mohammed. How does GPT-4 work and how can you start using it in ChatGPT?. www.aljazeera.com.
[4] The A to Z of Artificial Intelligence. Time. April 13, 2023.
[5] Hu, Luhui. Generative AI and Future. Medium. November 15, 2022.
[6] CSDL | IEEE Computer Society. www.computer.org.
[7] Toews, Rob. The Next Generation Of Large Language Models. Forbes.
[8] Toews, Rob. The Next Generation Of Large Language Models. Forbes.
[9] Mckendrick, Joe. Most Jobs Soon To Be 'Influenced' By Artificial Intelligence, Research Out Of OpenAI And University Of Pennsylvania Suggests. Forbes. March 13, 2023.
[10] GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared. MUO. April 11, 2023.
[11] Alford, Anthony. EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J. InfoQ. July 13, 2021.
[12] News (press release).
[13] Morrison, Ryan. Salesforce launches EinsteinGPT built with OpenAI technology. Tech Monitor. 7 March 2023.
[14] The ChatGPT of Finance is Here, Bloomberg is Combining AI and Fintech. Forbes.
[15] Hinton, Geoffrey; et al. Deep neural networks for acoustic modeling in speech recognition (PDF). IEEE Signal Processing Magazine. October 15, 2012. S2CID 206485943. doi:10.1109/MSP.2012.2205597.
[16] Deng, Li. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing. 2014-01-22, 3: e2 [2023-05-21]. S2CID 9928823. doi:10.1017/atsip.2013.9.
[17] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia. Attention Is All You Need. December 5, 2017. arXiv:1706.03762.
[18] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. May 24, 2019. arXiv:1810.04805v2.
[19] Yang, Zhilin; et al. XLNet (PDF). Proceedings from NeurIPS 2019. 2019.
[20] Naik, Amit Raja. Google Introduces New Architecture To Reduce Cost Of Transformers. Analytics India Magazine. September 23, 2021.
[21] Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya. Improving Language Understanding by Generative Pre-Training (PDF). OpenAI: 12. 11 June 2018 [23 January 2021]. (Archived (PDF) from the original on 26 January 2021).
[22] Bastian, Matthias. BioGPT is a Microsoft language model trained for biomedical tasks. The Decoder. 2023-01-29 [2023-02-27]. (Archived from the original on 2023-02-07).
[23] Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H; et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022, 23 (6). PMID 36156661. doi:10.1093/bib/bbac409.
[24] Ferruz, N.; Schmidt, S.; Höcker, B.; et al. ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications. 2022, 13 [2023-02-27]. doi:10.1038/s41467-022-32007-7. (Archived from the original on 2023-02-08).
[25] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books: 19–27. 2015 [2023-02-27]. (Archived from the original on 2023-02-05).
