Large language model
A large language model (LLM) is a general-purpose language model consisting of a neural network with many parameters (typically billions of weights or more). LLMs trained on large quantities of unlabelled text perform well at a wide variety of tasks, a development which, since their emergence around 2018, has shifted the focus of natural language processing research away from the earlier paradigm of training specialized supervised models for specific tasks.[1]
Properties
Though the term large language model has no formal definition, it generally refers to deep learning models with a parameter count on the order of billions or more.[2] LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as sentiment analysis, named entity recognition, or mathematical reasoning).[1][3]
LLMs also exhibit emergent capabilities not present in smaller language models. One example is the ability to solve tasks via "few-shot prompting", in which the model is given a small number of worked examples of a task as input text and must then solve a new instance, without undergoing any actual training in the form of parameter updates. For example, a sentiment analysis task of labelling the sentiment of a movie review could be prompted as follows:[3]
Review: This movie stinks.
Sentiment: negative
Review: This movie is fantastic!
Sentiment:
If the model outputs "positive", then it has correctly solved the task.
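The following is a minimal sketch of how such a few-shot prompt could be submitted to a language model programmatically, assuming the Hugging Face transformers text-generation pipeline as a generic interface; the model name and decoding settings are illustrative placeholders, and a small model such as GPT-2 may not actually exhibit the few-shot behaviour described above.

# A minimal sketch of few-shot prompting, assuming the Hugging Face
# transformers text-generation pipeline; the model name and decoding
# settings are illustrative placeholders, not taken from the article.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder; real LLMs are far larger

prompt = (
    "Review: This movie stinks.\n"
    "Sentiment: negative\n\n"
    "Review: This movie is fantastic!\n"
    "Sentiment:"
)

# The model simply continues the prompt; no parameter updates (training) occur.
completion = generator(prompt, max_new_tokens=2, do_sample=False)
print(completion[0]["generated_text"])  # a capable LLM would append " positive"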
Architecture
Since 2018, large language models have generally used the transformer architecture (whereas, previously, recurrent architectures such as the LSTM were most common).[1]
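As a rough illustration of what a transformer computes, the following is a minimal sketch of scaled dot-product self-attention, the core operation of the transformer architecture; the dimensions and variable names are illustrative and not tied to any particular model.

# A minimal sketch of scaled dot-product self-attention, the core operation
# of the transformer architecture; shapes and names are illustrative only.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (sequence_length, d_model); Wq/Wk/Wv: (d_model, d_head) projections
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # each token: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, model width 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)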
References
- ^ a b c Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus.
- ^ Carlini, Nicholas; et al. (2021). "Extracting Training Data from Large Language Models". USENIX Security Symposium. https://www.usenix.org/system/files/sec21-carlini-extracting.pdf.
- ^ a b Wei, Jason; et al. (2022). "Emergent Abilities of Large Language Models". Transactions on Machine Learning Research.