
List of large language models


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

This page lists notable large language models.

For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64×10^19 FLOP. Also, only the cost of the largest model is listed.
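
As a worked example of this unit conversion, here is a minimal Python sketch; the 3,640 petaFLOP-day figure for GPT-3 is taken from the table below.

```python
# Unit conversion for the training-cost column:
# 1 petaFLOP-day = 1e15 FLOP/s sustained for one day (86,400 s) = 8.64e19 FLOP.
PFLOP_DAY = 1e15 * 86_400

# Worked example: GPT-3 is listed at 3,640 petaFLOP-days of training compute.
gpt3_flop = 3_640 * PFLOP_DAY

print(f"1 petaFLOP-day = {PFLOP_DAY:.2e} FLOP")          # 8.64e+19 FLOP
print(f"GPT-3 training compute ~ {gpt3_flop:.2e} FLOP")  # ~3.14e+23 FLOP
```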

Name Release date[a] Developer Number of parameters (billions)[b] Corpus size Training cost (petaFLOP-day) License[c] Notes
GPT-1 June 2018 OpenAI 0.117 1[1] MIT[2] First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.
BERT October 2018 Google 0.340[3] 3.3 billion words[3] 9[4] Apache 2.0[5] An early and influential language model.[6] Encoder-only, and thus not built to be prompted or generative.[7] Training took 4 days on 64 TPUv2 chips.[8]
T5 October 2019 Google 11[9] 34 billion tokens[9] Apache 2.0[10] Base model for many Google projects, such as Imagen.[11]
XLNet June 2019 Google 0.340[12] 33 billion words 330 Apache 2.0[13] An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[14]
GPT-2 February 2019 OpenAI 1.5[15] 40 GB[16] (~10 billion tokens)[17] 28[18] MIT[19] Trained on 32 TPU v3 chips for one week.[18]
GPT-3 May 2020 OpenAI 175[20] 300 billion tokens[17] 3640[21] Proprietary A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[22]
GPT-Neo March 2021 EleutherAI 2.7[23] 825 GiB[24] MIT[25] The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[25]
GPT-J June 2021 EleutherAI 6[26] 825 GiB[24] 200[27] Apache 2.0 GPT-3-style language model.
Megatron-Turing NLG October 2021[28] Microsoft and Nvidia 530[29] 338.6 billion tokens[29] 38000[30] Restricted web access Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[30]
Ernie 3.0 Titan December 2021 Baidu 260[31] 4 Tb Proprietary Chinese-language LLM. Ernie Bot is based on this model.
Claude[32] December 2021 Anthropic 52[33] 400 billion tokens[33] beta Fine-tuned for desirable behavior in conversations.[34]
GLaM (Generalist Language Model) December 2021 Google 1200[35] 1.6 trillion tokens[35] 5600[35] Proprietary Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher December 2021 DeepMind 280[36] 300 billion tokens[37] 5833[38] Proprietary Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications) January 2022 Google 137[39] 1.56T words,[39] 168 billion tokens[37] 4110[40] Proprietary Specialized for response generation in conversations.
GPT-NeoX February 2022 EleutherAI 20[41] 825 GiB[24] 740[27] Apache 2.0 Based on the Megatron architecture.
Chinchilla March 2022 DeepMind 70[42] 1.4 trillion tokens[42][37] 6805[38] Proprietary Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law.
PaLM (Pathways Language Model) April 2022 Google 540[43] 768 billion tokens[42] 29,250[38] Proprietary Trained for ~60 days on ~6000 TPU v4 chips.[38] As of October 2024, it is the largest dense Transformer published.
OPT (Open Pretrained Transformer) May 2022 Meta 175[44] 180 billion tokens[45] 310[27] Non-commercial research[d] GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.[46]
YaLM 100B June 2022 Yandex 100[47] 1.7TB[47] Apache 2.0 English-Russian model based on Microsoft's Megatron-LM.
Minerva June 2022 Google 540[48] 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[48] Proprietary For solving "mathematical and scientific questions using step-by-step reasoning".[49] Initialized from PaLM models, then finetuned on mathematical and scientific data.
BLOOM July 2022 Large collaboration led by Hugging Face 175[50] 350 billion tokens (1.6TB)[51] Responsible AI Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages).
Galactica November 2022 Meta 120 106 billion tokens[52] Unknown CC-BY-NC-4.0 Trained on scientific text and modalities.
AlexaTM (Teacher Models) November 2022 Amazon 20[53] 1.3 trillion[54] Proprietary[55] Bidirectional sequence-to-sequence architecture.
LLaMA (Large Language Model Meta AI) February 2023 Meta AI 65[56] 1.4 trillion[56] 6300[57] Non-commercial research[e] Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[56]
GPT-4 March 2023 OpenAI Unknown[f] (according to rumors: 1760)[59] Unknown Unknown Proprietary Available for ChatGPT Plus users and used in several products.
Chameleon June 2024 Meta AI 34[60] 4.4 trillion
Cerebras-GPT March 2023 Cerebras 13[61] 270[27] Apache 2.0 Trained with the Chinchilla formula.
Falcon March 2023 Technology Innovation Institute 40[62] 1 trillion tokens, from RefinedWeb (filtered web text corpus)[63] plus some "curated corpora".[64] 2800[57] Apache 2.0[65]
BloombergGPT March 2023 Bloomberg L.P. 50 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets[66] Proprietary Trained on financial data from proprietary sources, for financial tasks.
PanGu-Σ March 2023 Huawei 1085 329 billion tokens[67] Proprietary
OpenAssistant[68] March 2023 LAION 17 1.5 trillion tokens Apache 2.0 Trained on crowdsourced open data.
Jurassic-2[69] March 2023 AI21 Labs Unknown Unknown Proprietary Multilingual.[70]
PaLM 2 (Pathways Language Model 2) May 2023 Google 340[71] 3.6 trillion tokens[71] 85,000[57] Proprietary Was used in the Bard chatbot.[72]
Llama 2 July 2023 Meta AI 70[73] 2 trillion tokens[73] 21,000 Llama 2 license 1.7 million A100-hours.[74]
Claude 2 July 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot.[75]
Granite 13b July 2023 IBM Unknown Unknown Unknown Proprietary Used in IBM Watsonx.[76]
Mistral 7B September 2023 Mistral AI 7.3[77] Unknown Apache 2.0
Claude 2.1 November 2023 Anthropic Unknown Unknown Unknown Proprietary Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[78]
Grok-1[79] November 2023 xAI 314 Unknown Unknown Apache 2.0 Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter).[80]
Gemini 1.0 December 2023 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, comes in three sizes. Used in the chatbot of the same name.[81]
Mixtral 8x7B December 2023 Mistral AI 46.7 Unknown Unknown Apache 2.0 Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[82] Mixture-of-experts model, with 12.9 billion parameters activated per token.[83]
Mixtral 8x22B April 2024 Mistral AI 141 Unknown Unknown Apache 2.0 [84]
DeepSeek LLM November 29, 2023 DeepSeek 67 2T tokens[85] 12,000 DeepSeek License Trained on English and Chinese text. 1e24 FLOPs for 67B. 1e23 FLOPs for 7B.[85]
Phi-2 December 2023 Microsoft 2.7 1.4T tokens 419[86] MIT Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.[86]
Gemini 1.5 February 2024 Google DeepMind Unknown Unknown Unknown Proprietary Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.[87]
Gemini Ultra February 2024 Google DeepMind Unknown Unknown Unknown
Gemma February 2024 Google DeepMind 7 6T tokens Unknown Gemma Terms of Use[88]
Claude 3 March 2024 Anthropic Unknown Unknown Unknown Proprietary Includes three models: Haiku, Sonnet, and Opus.[89]
Nova October 2024 Rubik's AI Unknown Unknown Unknown Proprietary Includes three models: Nova-Instant, Nova-Air, and Nova-Pro.
DBRX March 2024 Databricks and Mosaic ML 136 12T tokens Databricks Open Model License Training cost 10 million USD.
Fugaku-LLM May 2024 Fujitsu, Tokyo Institute of Technology, etc. 13 380B tokens The largest model ever trained on CPUs only, on the Fugaku supercomputer.[90]
Phi-3 April 2024 Microsoft 14[91] 4.8T tokens MIT Microsoft markets them as "small language models".[92]
Granite Code Models May 2024 IBM Unknown Unknown Unknown Apache 2.0
Qwen2 June 2024 Alibaba Cloud 72[93] 3T tokens Unknown Qwen License Multiple sizes, the smallest being 0.5B.
DeepSeek V2 June 2024 DeepSeek 236 8.1T tokens 28,000 DeepSeek License 1.4M GPU-hours on H800.[94]
Nemotron-4 June 2024 Nvidia 340 9T tokens 200,000 NVIDIA Open Model License Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[95][96]
Llama 3.1 July 2024 Meta AI 405 15.6T tokens 440,000 Llama 3 license The 405B version took 31 million GPU-hours on H100-80GB, at 3.8E25 FLOPs.[97][98]
DeepSeek V3 December 2024 DeepSeek 671 14.8T tokens 56,000 DeepSeek License 2.788M GPU-hours on H800 GPUs.[99]
Amazon Nova December 2024 Amazon Unknown Unknown Unknown Proprietary Includes three models: Nova Micro, Nova Lite, and Nova Pro.[100]
DeepSeek R1 January 2025 DeepSeek 671 Unknown Unknown MIT No pretraining. Reinforcement-learned upon V3-Base.[101][102]
Qwen2.5 January 2025 Alibaba 72 18T tokens Unknown Qwen License [103]
MiniMax-Text-01 January 2025 Minimax 456 4.7T tokens[104] Unknown Minimax Model license [105][104]
Gemini 2.0 February 2025 Google DeepMind Unknown Unknown Unknown Proprietary Three models released: Flash, Flash-Lite and Pro.[106][107][108]
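
The training-cost column can be roughly cross-checked against the parameter and token counts using the common rule of thumb of about 6 FLOP per parameter per training token. This is an approximation, not the methodology used by the labs or by this table; a minimal Python sketch using the Llama 3.1 row above (405 billion parameters, 15.6 trillion tokens, reported at 3.8E25 FLOPs and 440,000 petaFLOP-days):

```python
PFLOP_DAY = 1e15 * 86_400  # 8.64e19 FLOP, as defined above


def approx_training_flop(params: float, tokens: float) -> float:
    """Rough training compute via the ~6 * N * D rule of thumb."""
    return 6 * params * tokens


# Llama 3.1: 405e9 parameters, 15.6e12 training tokens (from the table above).
flop = approx_training_flop(405e9, 15.6e12)
print(f"~{flop:.2e} FLOP, ~{flop / PFLOP_DAY:,.0f} petaFLOP-days")
# ~3.79e+25 FLOP, ~438,750 petaFLOP-days (the table lists 440,000)
```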


Notes

  1. ^ This is the date that documentation describing the model's architecture was first released.
  2. ^ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  3. ^ This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated.
  4. ^ The smaller models including 66B are publicly available, while the 175B model is available on request.
  5. ^ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  6. ^ As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[58]

References

  1. ^ Improving language understanding with unsupervised learning. openai.com. June 11, 2018 [2023-03-18]. (原始內容存檔於2023-03-18). 
  2. ^ finetune-transformer-lm. GitHub. [2 January 2024]. (原始內容存檔於19 May 2023). 
  3. ^ 3.0 3.1 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2可免費查閱 [cs.CL]. 
  4. ^ Prickett, Nicole Hemsoth. Cerebras Shifts Architecture To Meet Massive AI/ML Models. The Next Platform. 2021-08-24 [2023-06-20]. (原始內容存檔於2023-06-20). 
  5. ^ BERT. March 13, 2023 [March 13, 2023]. (原始內容存檔於January 13, 2021) –透過GitHub. 
  6. ^ Manning, Christopher D. Human Language Understanding & Reasoning. Daedalus. 2022, 151 (2): 127–138 [2023-03-09]. S2CID 248377870. doi:10.1162/daed_a_01905可免費查閱. (原始內容存檔於2023-11-17). 
  7. ^ Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris. Bidirectional Language Models Are Also Few-shot Learners. 2022. arXiv:2209.14500可免費查閱 [cs.LG]. 
  8. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 11 October 2018. arXiv:1810.04805v2可免費查閱 [cs.CL]. 
  9. ^ 9.0 9.1 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. 2020, 21 (140): 1–67 [2025-02-11]. ISSN 1533-7928. arXiv:1910.10683可免費查閱. (原始內容存檔於2024-10-05). 
  10. ^ google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02 [2024-04-04], (原始內容存檔於2024-03-29) 
  11. ^ Imagen: Text-to-Image Diffusion Models. imagen.research.google. [2024-04-04]. (原始內容存檔於2024-03-27). 
  12. ^ Pretrained models — transformers 2.0.0 documentation. huggingface.co. [2024-08-05]. (原始內容存檔於2024-08-05). 
  13. ^ xlnet. GitHub. [2 January 2024]. (原始內容存檔於2 January 2024). 
  14. ^ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2 January 2020. arXiv:1906.08237可免費查閱 [cs.CL]. 
  15. ^ GPT-2: 1.5B Release. OpenAI. 2019-11-05 [2019-11-14]. (原始內容存檔於2019-11-14) (英語). 
  16. ^ Better language models and their implications. openai.com. [2023-03-13]. (原始內容存檔於2023-03-16). 
  17. ^ 17.0 17.1 OpenAI's GPT-3 Language Model: A Technical Overview. lambdalabs.com. 3 June 2020 [13 March 2023]. (原始內容存檔於27 March 2023). 
  18. ^ 18.0 18.1 openai-community/gpt2-xl · Hugging Face. huggingface.co. [2024-07-24]. (原始內容存檔於2024-07-24). 
  19. ^ gpt-2. GitHub. [13 March 2023]. (原始內容存檔於11 March 2023). 
  20. ^ Wiggers, Kyle. The emerging types of language models and why they matter. TechCrunch. 28 April 2022 [9 March 2023]. (原始內容存檔於16 March 2023). 
  21. ^ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario. Language Models are Few-Shot Learners. May 28, 2020. arXiv:2005.14165v4可免費查閱 [cs.CL]. 
  22. ^ ChatGPT: Optimizing Language Models for Dialogue. OpenAI. 2022-11-30 [2023-01-13]. (原始內容存檔於2022-11-30). 
  23. ^ GPT Neo. March 15, 2023 [March 12, 2023]. (原始內容存檔於March 12, 2023) –透過GitHub. 
  24. ^ 24.0 24.1 24.2 Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. 31 December 2020. arXiv:2101.00027可免費查閱 [cs.CL]. 
  25. ^ 25.0 25.1 Iyer, Abhishek. GPT-3's free alternative GPT-Neo is something to be excited about. VentureBeat. 15 May 2021 [13 March 2023]. (原始內容存檔於9 March 2023). 
  26. ^ GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront. www.forefront.ai. [2023-02-28]. (原始內容存檔於2023-03-09). 
  27. ^ 27.0 27.1 27.2 27.3 Dey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. 2023-04-01. arXiv:2304.03208可免費查閱 [cs.LG]. 
  28. ^ Alvi, Ali; Kharya, Paresh. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model. Microsoft Research. 11 October 2021 [13 March 2023]. (原始內容存檔於13 March 2023). 
  29. ^ 29.0 29.1 Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. 2022-02-04. arXiv:2201.11990可免費查閱 [cs.CL]. 
  30. ^ 30.0 30.1 Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong, DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale, 2022-07-21, arXiv:2201.05596可免費查閱 
  31. ^ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. December 23, 2021. arXiv:2112.12731可免費查閱 [cs.CL]. 
  32. ^ Product. Anthropic. [14 March 2023]. (原始內容存檔於16 March 2023). 
  33. ^ 33.0 33.1 Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. A General Language Assistant as a Laboratory for Alignment. 9 December 2021. arXiv:2112.00861可免費查閱 [cs.CL]. 
  34. ^ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. Constitutional AI: Harmlessness from AI Feedback. 15 December 2022. arXiv:2212.08073可免費查閱 [cs.CL]. 
  35. ^ 35.0 35.1 35.2 Dai, Andrew M; Du, Nan. More Efficient In-Context Learning with GLaM. ai.googleblog.com. December 9, 2021 [2023-03-09]. (原始內容存檔於2023-03-12). 
  36. ^ Language modelling at scale: Gopher, ethical considerations, and retrieval. www.deepmind.com. 8 December 2021 [20 March 2023]. (原始內容存檔於20 March 2023). 
  37. ^ 37.0 37.1 37.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. Training Compute-Optimal Large Language Models. 29 March 2022. arXiv:2203.15556可免費查閱 [cs.CL]. 
  38. ^ 38.0 38.1 38.2 38.3 Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways 互聯網檔案館存檔,存檔日期2023-06-10.
  39. ^ 39.0 39.1 Cheng, Heng-Tze; Thoppilan, Romal. LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything. ai.googleblog.com. January 21, 2022 [2023-03-09]. (原始內容存檔於2022-03-25). 
  40. ^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo. LaMDA: Language Models for Dialog Applications. 2022-01-01. arXiv:2201.08239可免費查閱 [cs.CL]. 
  41. ^ Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models: 95–136. 2022-05-01 [2022-12-19]. (原始內容存檔於2022-12-10). 
  42. ^ 42.0 42.1 42.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent. An empirical analysis of compute-optimal large language model training. Deepmind Blog. 12 April 2022 [9 March 2023]. (原始內容存檔於13 April 2022). 
  43. ^ Narang, Sharan; Chowdhery, Aakanksha. Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance. ai.googleblog.com. April 4, 2022 [2023-03-09]. (原始內容存檔於2022-04-04) (英語). 
  44. ^ Susan Zhang; Mona Diab; Luke Zettlemoyer. Democratizing access to large-scale language models with OPT-175B. ai.facebook.com. [2023-03-12]. (原始內容存檔於2023-03-12). 
  45. ^ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke. OPT: Open Pre-trained Transformer Language Models. 21 June 2022. arXiv:2205.01068可免費查閱 [cs.CL]. 
  46. ^ metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq. GitHub. [2024-10-18]. (原始內容存檔於2024-01-24) (英語). 
  47. ^ 47.0 47.1 Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay, YaLM 100B, 2022-06-22 [2023-03-18], (原始內容存檔於2023-06-16) 
  48. ^ 48.0 48.1 Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant. Solving Quantitative Reasoning Problems with Language Models. 30 June 2022. arXiv:2206.14858可免費查閱 [cs.CL]. 
  49. ^ Minerva: Solving Quantitative Reasoning Problems with Language Models. ai.googleblog.com. 30 June 2022 [20 March 2023]. (原始內容存檔於2022-06-30). 
  50. ^ Ananthaswamy, Anil. In AI, is bigger always better?. Nature. 8 March 2023, 615 (7951): 202–205 [9 March 2023]. Bibcode:2023Natur.615..202A. PMID 36890378. S2CID 257380916. doi:10.1038/d41586-023-00641-w. (原始內容存檔於16 March 2023). 
  51. ^ bigscience/bloom · Hugging Face. huggingface.co. [2023-03-13]. (原始內容存檔於2023-04-12). 
  52. ^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert. Galactica: A Large Language Model for Science. 16 November 2022. arXiv:2211.09085可免費查閱 [cs.CL]. 
  53. ^ 20B-parameter Alexa model sets new marks in few-shot learning. Amazon Science. 2 August 2022 [12 March 2023]. (原始內容存檔於15 March 2023). 
  54. ^ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model. 3 August 2022. arXiv:2208.01448可免費查閱 [cs.CL]. 
  55. ^ AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog. aws.amazon.com. 17 November 2022 [13 March 2023]. (原始內容存檔於13 March 2023). 
  56. ^ 56.0 56.1 56.2 Introducing LLaMA: A foundational, 65-billion-parameter large language model. Meta AI. 24 February 2023 [9 March 2023]. (原始內容存檔於3 March 2023). 
  57. ^ 57.0 57.1 57.2 The Falcon has landed in the Hugging Face ecosystem. huggingface.co. [2023-06-20]. (原始內容存檔於2023-06-20). 
  58. ^ GPT-4 Technical Report (PDF). OpenAI. 2023 [March 14, 2023]. (原始內容存檔 (PDF)於March 14, 2023). 
  59. ^ Schreiner, Maximilian. GPT-4 architecture, datasets, costs and more leaked. THE DECODER. 2023-07-11 [2024-07-26]. (原始內容存檔於2023-07-12) (美國英語). 
  60. ^ Dickson, Ben. Meta introduces Chameleon, a state-of-the-art multimodal model. VentureBeat. 22 May 2024 [2025-02-11]. (原始內容存檔於2025-02-11). 
  61. ^ Dey, Nolan. Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models. Cerebras. March 28, 2023 [March 28, 2023]. (原始內容存檔於March 28, 2023). 
  62. ^ Abu Dhabi-based TII launches its own version of ChatGPT. tii.ae. [2023-04-03]. (原始內容存檔於2023-04-03). 
  63. ^ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. 2023-06-01. arXiv:2306.01116可免費查閱 [cs.CL]. 
  64. ^ tiiuae/falcon-40b · Hugging Face. huggingface.co. 2023-06-09 [2023-06-20]. (原始內容存檔於2023-06-02). 
  65. ^ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free 互聯網檔案館存檔,存檔日期2024-02-08., 31 May 2023
  66. ^ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon. BloombergGPT: A Large Language Model for Finance. March 30, 2023. arXiv:2303.17564可免費查閱 [cs.LG]. 
  67. ^ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing. March 19, 2023. arXiv:2303.10845可免費查閱 [cs.CL]. 
  68. ^ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew. OpenAssistant Conversations – Democratizing Large Language Model Alignment. 2023-04-14. arXiv:2304.07327可免費查閱 [cs.CL]. 
  69. ^ Wrobel, Sharon. Tel Aviv startup rolls out new advanced AI language model to rival OpenAI. www.timesofisrael.com. [2023-07-24]. (原始內容存檔於2023-07-24). 
  70. ^ Wiggers, Kyle. With Bedrock, Amazon enters the generative AI race. TechCrunch. 2023-04-13 [2023-07-24]. (原始內容存檔於2023-07-24). 
  71. ^ 71.0 71.1 Elias, Jennifer. Google's newest A.I. model uses nearly five times more text data for training than its predecessor. CNBC. 16 May 2023 [18 May 2023]. (原始內容存檔於16 May 2023). 
  72. ^ Introducing PaLM 2. Google. May 10, 2023 [May 18, 2023]. (原始內容存檔於May 18, 2023). 
  73. ^ 73.0 73.1 Introducing Llama 2: The Next Generation of Our Open Source Large Language Model. Meta AI. 2023 [2023-07-19]. (原始內容存檔於2024-01-05). 
  74. ^ llama/MODEL_CARD.md at main · meta-llama/llama. GitHub. [2024-05-28]. (原始內容存檔於2024-05-28). 
  75. ^ Claude 2. anthropic.com. [12 December 2023]. (原始內容存檔於15 December 2023). 
  76. ^ Nirmal, Dinesh. Building AI for business: IBM's Granite foundation models. IBM Blog. 2023-09-07 [2024-08-11]. (原始內容存檔於2024-07-22) (美國英語). 
  77. ^ Announcing Mistral 7B. Mistral. 2023 [2023-10-06]. (原始內容存檔於2024-01-06). 
  78. ^ Introducing Claude 2.1. anthropic.com. [12 December 2023]. (原始內容存檔於15 December 2023). 
  79. ^ xai-org/grok-1, xai-org, 2024-03-19 [2024-03-19], (原始內容存檔於2024-05-28) 
  80. ^ Grok-1 model card. x.ai. [12 December 2023]. (原始內容存檔於2023-11-05). 
  81. ^ Gemini – Google DeepMind. deepmind.google. [12 December 2023]. (原始內容存檔於8 December 2023). 
  82. ^ Franzen, Carl. Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance. VentureBeat. 11 December 2023 [12 December 2023]. (原始內容存檔於11 December 2023). 
  83. ^ Mixtral of experts. mistral.ai. 11 December 2023 [12 December 2023]. (原始內容存檔於13 February 2024). 
  84. ^ AI, Mistral. Cheaper, Better, Faster, Stronger. mistral.ai. 2024-04-17 [2024-05-05]. (原始內容存檔於2024-05-05). 
  85. ^ 85.0 85.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, 2024-01-05 [2025-02-11], arXiv:2401.02954可免費查閱, (原始內容存檔於2025-03-29) 
  86. ^ 86.0 86.1 Hughes, Alyssa. Phi-2: The surprising power of small language models. Microsoft Research. 12 December 2023 [13 December 2023]. (原始內容存檔於12 December 2023). 
  87. ^ Our next-generation model: Gemini 1.5. Google. 15 February 2024 [16 February 2024]. (原始內容存檔於16 February 2024). This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens. 
  88. ^ Gemma. [2025-02-11]. (原始內容存檔於2024-02-21) –透過GitHub. 
  89. ^ Introducing the next generation of Claude. www.anthropic.com. [2024-03-04]. (原始內容存檔於2024-03-04). 
  90. ^ Fugaku-LLM/Fugaku-LLM-13B · Hugging Face. huggingface.co. [2024-05-17]. (原始內容存檔於2024-05-17). 
  91. ^ Phi-3. azure.microsoft.com. 23 April 2024 [2024-04-28]. (原始內容存檔於2024-04-27). 
  92. ^ Phi-3 Model Documentation. huggingface.co. [2024-04-28]. (原始內容存檔於2024-05-13). 
  93. ^ Qwen2. GitHub. [2024-06-17]. (原始內容存檔於2024-06-17). 
  94. ^ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, 2024-06-19 [2025-02-11], arXiv:2405.04434可免費查閱, (原始內容存檔於2025-03-30) 
  95. ^ nvidia/Nemotron-4-340B-Base · Hugging Face. huggingface.co. 2024-06-14 [2024-06-15]. (原始內容存檔於2024-06-15). 
  96. ^ Nemotron-4 340B | Research. research.nvidia.com. [2024-06-15]. (原始內容存檔於2024-06-15). 
  97. ^ "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta. [2025-02-11]. (原始內容存檔於2024-07-24). 
  98. ^ llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models. GitHub. [2024-07-23]. (原始內容存檔於2024-07-23) (英語). 
  99. ^ deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26 [2024-12-26], (原始內容存檔於2025-03-27) 
  100. ^ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27 [2024-12-27], (原始內容存檔於2025-02-11) 
  101. ^ deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21 [2025-01-21], (原始內容存檔於2025-02-04) 
  102. ^ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, 2025-01-22 [2025-02-11], arXiv:2501.12948可免費查閱, (原始內容存檔於2025-04-09) 
  103. ^ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng, Qwen2.5 Technical Report, 2025-01-03 [2025-02-11], arXiv:2412.15115可免費查閱, (原始內容存檔於2025-04-01) 
  104. ^ 104.0 104.1 MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao, MiniMax-01: Scaling Foundation Models with Lightning Attention, 2025-01-14 [2025-01-26], arXiv:2501.08313可免費查閱, (原始內容存檔於2025-03-22) 
  105. ^ MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26 [2025-01-26] 
  106. ^ Kavukcuoglu, Koray. Gemini 2.0 is now available to everyone. Google. [6 February 2025]. (原始內容存檔於2025-04-10). 
  107. ^ Gemini 2.0: Flash, Flash-Lite and Pro. Google for Developers. [6 February 2025]. (原始內容存檔於2025-04-10). 
  108. ^ Franzen, Carl. Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search. VentureBeat. 5 February 2025 [6 February 2025]. (原始內容存檔於2025-03-17).