Large language model

From Wikipedia, the free encyclopedia


A large language model (LLM) is a general-purpose language model consisting of a neural network with many parameters (typically billions of weights or more). LLMs trained on large quantities of unlabelled text perform well at a wide variety of tasks, a development which, since their emergence around 2018, has shifted the focus of natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.[1]

Properties

Though the term large language model has no formal definition, it generally refers to deep learning models with a parameter count on the order of billions or more.[2] LLMs are general-purpose models which excel at a wide range of tasks, rather than being trained for one specific task (such as sentiment analysis, named entity recognition, or mathematical reasoning).[1][3]

LLMs also exhibit emergent capabilities not present in smaller language models. One example is the ability to solve "few-shot prompted" tasks, in which the model is given a small number of worked examples of a task as input text and must then complete a new instance, without undergoing any further training in the form of parameter updates. For example, a sentiment analysis task of labelling the sentiment of a movie review could be prompted as follows:[3]

Review: This movie stinks. Sentiment: negative

Review: This movie is fantastic! Sentiment:

If the model outputs "positive", then it has correctly solved the task.
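A minimal sketch of how such a few-shot prompt might be submitted to a model programmatically, here using the Hugging Face transformers library; the model name is an illustrative placeholder, since few-shot ability emerges reliably only in much larger models:

    from transformers import pipeline

    # Load a causal language model. "gpt2" is a small placeholder model;
    # emergent few-shot behaviour is observed only at much larger scales.
    generator = pipeline("text-generation", model="gpt2")

    prompt = ("Review: This movie stinks. Sentiment: negative\n"
              "Review: This movie is fantastic! Sentiment:")

    # Greedily generate a single additional token to complete the label.
    result = generator(prompt, max_new_tokens=1, do_sample=False)
    print(result[0]["generated_text"])  # correct if it ends in " positive"

Note that no parameters are updated: the examples in the prompt are consumed purely at inference time.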

Architecture

Since 2018, large language models have generally used the transformer architecture (whereas, previously, recurrent architectures such as the LSTM were most common).[1]
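The central operation of the transformer is self-attention, which lets every position in a sequence attend to every other position. A minimal sketch of scaled dot-product attention, the core of this mechanism (names and shapes are illustrative; multiple heads, masking, and learned projections are omitted):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (sequence_length, d_k) arrays of queries, keys, values.
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
        scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                            # attention-weighted sum of values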

References

  1. Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus.
  2. Carlini, Nicholas; et al. (2021). "Extracting Training Data from Large Language Models". USENIX Security Symposium. https://www.usenix.org/system/files/sec21-carlini-extracting.pdf
  3. Wei, Jason (2022). "Emergent Abilities of Large Language Models".