Top-p sampling
Top-p sampling, also called nucleus sampling, is a technique for language model decoding introduced by Holtzman et al. in 2019.[1] Naively selecting the highest-probability token at each step of auto-regressive decoding is known to produce text that is repetitive and otherwise unnatural. Top-p sampling avoids this by setting a threshold p and restricting sampling to the smallest set of most probable tokens whose cumulative probability reaches p; the probabilities within this "nucleus" are renormalized and the next token is drawn from it.
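A minimal sketch of the procedure in Python/NumPy, assuming a vector of next-token logits (the function name `top_p_sample` and the default `p=0.9` are illustrative, not taken from the cited paper):

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample a token index from the nucleus (top-p set) of a logit vector."""
    rng = rng or np.random.default_rng()
    # Softmax with the max subtracted for numerical stability.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Sort token indices by descending probability.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # The nucleus is the smallest prefix whose cumulative probability reaches p.
    cutoff = int(np.searchsorted(np.cumsum(sorted_probs), p)) + 1
    nucleus, nucleus_probs = order[:cutoff], sorted_probs[:cutoff]
    # Renormalize within the nucleus and draw the next token from it.
    return int(rng.choice(nucleus, p=nucleus_probs / nucleus_probs.sum()))

# Example: with these logits the first two tokens carry most of the mass,
# so at p=0.9 the tail tokens are rarely (or never) sampled.
print(top_p_sample(np.array([2.0, 1.5, 0.2, 0.1]), p=0.9))
```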
Top-k sampling is similar, except that the sample is drawn from the k highest-probability tokens regardless of their cumulative probability. The advantage of top-p sampling is that it avoids the difficult problem of choosing an optimal value of k, which can vary depending on the shape of the output distribution and on the particular task and dataset.[2]
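For contrast, a top-k variant under the same assumptions differs only in how the candidate set is cut off: by a fixed count rather than by cumulative probability.

```python
import numpy as np

def top_k_sample(logits, k=50, rng=None):
    """Sample a token index from the k highest-probability tokens."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Keep a fixed number of candidates, regardless of how much
    # probability mass they actually cover.
    top = np.argsort(probs)[::-1][:k]
    # Renormalize over the top-k set and sample from it.
    return int(rng.choice(top, p=probs[top] / probs[top].sum()))
```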
The top-p sampling technique is used in popular large language model applications like ChatGPT and is implemented in language modeling libraries and APIs such as Hugging Face Transformers and Cohere.[3]
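In the Hugging Face Transformers generation API referenced above, nucleus sampling is exposed through the `top_p` parameter of `generate`. A brief sketch; the `gpt2` checkpoint and the prompt are arbitrary illustrative choices:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The curious case of", return_tensors="pt")
# do_sample=True switches from greedy decoding to sampling; top_p=0.92
# restricts sampling to the nucleus, and top_k=0 disables top-k filtering.
outputs = model.generate(**inputs, do_sample=True, top_p=0.92,
                         top_k=0, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```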
- ^ Holtzman, Ari; Buys, Jan; Du, Li; Forbes, Maxwell; Choi, Yejin (22 April 2019). "The Curious Case of Neural Text Degeneration". Retrieved 23 August 2023.
- ^ McCaffrey, James D. "Nucleus Sampling for Natural Language Processing". Retrieved 23 August 2023.
- ^ von Platen, Patrick. "How to generate text: using different decoding methods for language generation with Transformers". Hugging Face. Retrieved 23 August 2023.