Retrieval-augmented generation



Retrieval-augmented generation (RAG) is a type of information retrieval process. It modifies interactions with a large language model (LLM) so that the model responds to queries with reference to a specified set of documents, drawing on those documents in preference to information from its own vast, static training data. This allows LLMs to use domain-specific and/or updated information.[1] Use cases include giving chatbots access to internal company data, or supplying factual information only from an authoritative source.[2]

The RAG process is made up of four key stages. First, all the data must be prepared and indexed for use by the LLM. Thereafter, each query passes through retrieval, augmentation, and generation phases.[1]

Indexing

The data to be referenced must first be converted into LLM embeddings, numerical representations in the form of large vectors. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs).[1] These embeddings are then stored in a vector database to allow for document retrieval.
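The indexing stage can be illustrated with a minimal sketch. The `embed` function below is a toy bag-of-words stand-in for a real LLM embedding model, and the "vector database" is just a pair of Python lists; both are illustrative assumptions, not a production design:

```python
import math
from collections import Counter

# Toy stand-in for a learned embedding model: a bag-of-words vector over a
# small fixed vocabulary. A real RAG system would call an LLM embedding model.
VOCAB = ["rag", "retrieval", "llm", "vector", "database", "prompt"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalise for cosine similarity

# A minimal "vector database": parallel lists of documents and embeddings.
documents = [
    "RAG combines retrieval with an LLM",
    "A vector database stores document embeddings",
    "Prompt engineering shapes LLM output",
]
index = [embed(doc) for doc in documents]
```

Because every stored vector is unit-normalised, cosine similarity at query time reduces to a plain dot product.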

 
[Figure] Overview of RAG process: user input and context from documents are combined into an LLM prompt to get tailored responses.

Retrieval

Given a user query, a document retriever is first called to select the most relevant documents which will be used to augment the query.[3] This is done by encoding the query as a vector embedding and then comparing it to the vectors of the source documents.[2] This comparison can be done using a variety of methods, which depend in part on the type of indexing used.[1]
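The retrieval step described above can be sketched as a nearest-neighbour search by cosine similarity. The `embed` function here is a hypothetical bag-of-words stand-in for a real embedding model:

```python
import math
from collections import Counter

VOCAB = ["rag", "retrieval", "llm", "vector", "database", "prompt"]

def embed(text: str) -> list[float]:
    # Toy bag-of-words embedding as a stand-in for a real embedding model.
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # On unit vectors, cosine similarity reduces to a dot product; rank
    # documents by descending similarity to the query and keep the top k.
    q = embed(query)
    scored = sorted(
        docs,
        key=lambda d: -sum(a * b for a, b in zip(q, embed(d))),
    )
    return scored[:k]

docs = [
    "A vector database stores document embeddings",
    "Prompt engineering shapes LLM output",
    "RAG combines retrieval with an LLM",
]
top = retrieve("how does a vector database work", docs, k=1)
```

Real systems replace the linear scan with an approximate nearest-neighbour index so retrieval stays fast over large collections.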

Augmentation

The model feeds the relevant retrieved information into the LLM via prompt engineering of the user's original query.[2] Newer implementations can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains, and using memory and self-improvement to learn from previous retrievals.[1]
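At its simplest, augmentation amounts to assembling a prompt that places the retrieved passages ahead of the user's question. A minimal sketch; the exact prompt wording is an assumption, not a standard:

```python
def augment_prompt(query: str, retrieved: list[str]) -> str:
    # Prepend retrieved passages as context and instruct the model to
    # answer only from that context. The wording is illustrative.
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = augment_prompt(
    "What stores embeddings?",
    ["A vector database stores document embeddings"],
)
```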

Generation

Finally, the LLM can generate output based on both the query and the retrieved documents.[4] Some models incorporate extra steps to improve output, such as re-ranking of retrieved information, context selection, and fine-tuning.[1]
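The re-ranking step can be sketched with a simple lexical scorer. The word-overlap heuristic below is a hypothetical illustration; production systems typically use a learned cross-encoder model instead:

```python
def rerank(query: str, candidates: list[str]) -> list[str]:
    # Order retrieved candidates by how many distinct query words they
    # contain, most overlapping first. Purely illustrative scoring.
    q_words = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(q_words & set(doc.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)

candidates = [
    "Prompt engineering shapes LLM output",
    "A vector database stores document embeddings",
]
ranked = rerank("where are document embeddings stored", candidates)
```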

Challenges

If the external data source is large, retrieval can be slow. The use of RAG does not completely eliminate the general challenges faced by LLMs, including hallucination.[3]

References


{{compu-ai-stub}} [[Category:Large language models]] [[Category:Natural language processing]] [[Category:Information retrieval systems]]

  1. Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang: Retrieval-Augmented Generation for Large Language Models: A Survey. In: arXiv eprint. 2023, doi:10.48550/arXiv.2312.10997.
  2. What is RAG? - Retrieval-Augmented Generation AI Explained - AWS. Amazon Web Services, Inc. Retrieved 16 July 2024.
  3. Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook. freeCodeCamp.org, 11 June 2024. Retrieved 16 July 2024.
  4. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In: Advances in Neural Information Processing Systems 33. Curran Associates, Inc., 2020, pp. 9459–9474, arXiv:2005.11401 (neurips.cc).