Wikipedia:Using neural network language models on Wikipedia
This idea is in the brainstorming stage. Feel free to add new ideas; improve, clarify and classify the ideas already here; and discuss the merits of these ideas on the talk page.
Feel free to edit this page!

With the rise of machine learning, discussions about Wikipedia and AI models are becoming more and more heated. As of December 2022, with the free public release of ChatGPT, AI has shown its potential to either massively improve or disrupt Wikipedia. It is clear that research is needed to inform discussions surrounding potential AI policies, so I made this page to catalog my observations of ChatGPT and its potential uses based on its capabilities. And yes, this page is written entirely by human editors.
NOTICE: Don't use neural networks to generate content; use them to assist you in creating content. Especially in the neural network context, confidence in the result does not imply validity.
Proposed guidelines
Based on my research below, here are my proposed guidelines on how to align neural network models with our purpose of building an encyclopedia. Some of the guidelines are obvious from common sense, but I think it is worth writing them down.
- You may not ask neural networks to write original content or find sources, as these neural networks do not know what is right and wrong. Adding this kind of content would jeopardize Wikipedia's WP:OR and WP:RS policies. Even if it is heavily edited by humans, seek alternatives that do not use the neural network's original content.
- You may use these neural networks as a writing advisor, i.e. asking for outlines, asking how to improve a paragraph, asking for criticism of the text, etc. However, you should be aware that the information they give you can be unreliable and flat-out wrong. Use due diligence and common sense when deciding whether to incorporate a neural network's suggestion.
- You may use these neural networks for copyediting and paraphrasing, but note that they may not properly detect grammatical errors or keep key information intact. Use due diligence and heavily edit the response from the neural network.
- Use due diligence when crafting prompts for neural networks. Prompts designed for Wikipedia should use natural sentences, be as descriptive as possible, and include keywords such as "encyclopedic", "keep the meaning intact", etc. to discourage the AI from adding original content.
- You are responsible for making sure that using a neural network will not be disruptive to Wikipedia. Therefore, you must state in the edit summary whether a neural network was used, and what it was used for.
Potential uses
Planning an article
It is no surprise that the bot can give coherent answers, since it is based on the earlier GPT-3 model. As many have noted, original content from AI models should not be imported directly to Wikipedia due to sourcing and accuracy concerns. I am very impressed, however, that the bot knows about our internal policies and gives a reasonable outline of how a Wikipedia article may be structured. It seems ChatGPT uses Wikipedia's policy pages in addition to articles in its dataset.
Based on the results, AI models seem to be a very powerful brainstorming tool, and via prompt engineering, these models allow an impressive amount of refinement to the plan. AI can also be a great pointer to potential sources and can remind editors of Wikipedia's content policies (NPOV, RS, etc.). Even though original content from AI is not suitable for importing into Wikipedia, editors can use it as inspiration for research ideas. In the future, when Abstract Wikipedia becomes a reality, AI tools could be a massive help in organizing information during the planning stage of an article. This research is somewhat limited by the fact that the article SpaceX Starship already existed when the AI was trained.
Copyediting paragraphs
AI copyediting of Wikipedia text as of 2022 can slightly reduce the work copyeditors need to do. However, human supervision is critical when using such tools. This task relies heavily on prompt engineering for the AI to give satisfactory results. I settled on the prompt "Can you copyedit this paragraph from Wikipedia while still keeping the tone and the information as intact as possible:" followed by the paragraph in plain text without citations. There seems to be room for improvement in the prompt, as ChatGPT may occasionally produce text with run-on sentences or grammatical errors, but otherwise the text is usually clearer after a run through the AI.
Even though the AI is conservative about removing information and details, the text's length usually decreases by quite a bit as it removes redundant phrases. The AI is also good at reordering phrases to make the text more coherent, though sometimes at the cost of grammar errors and obscured meaning. In more developed articles, the AI tends to make more minor fixes and is less inclined to slash out content. In my opinion, ChatGPT can be used on Wikipedia as a coherence checker, as long as care is taken to ensure that no critical information is lost.
I have published these AI-generated texts below on Wikipedia after very heavily modifying them. Overall, I think that ChatGPT does reduce the copyediting work quite a bit, but not as much as a lot of people make it out to be. Think of the AI response as a "second opinion" about what to cut, not as an authoritative answer.
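The prompt workflow described above can be sketched in code. This is a minimal illustration rather than an actual tool used in these experiments: `build_copyedit_prompt` is a hypothetical helper that strips bracketed citation markers and prepends the copyedit instruction quoted above.

```python
import re

COPYEDIT_PROMPT = (
    "Can you copyedit this paragraph from Wikipedia while still keeping "
    "the tone and the information as intact as possible:"
)

def build_copyedit_prompt(paragraph: str) -> str:
    """Remove bracketed reference markers like [1] or [2], tidy the
    whitespace left behind, and prepend the copyedit instruction."""
    plain = re.sub(r"\[\d+\]", "", paragraph)
    plain = re.sub(r"\s{2,}", " ", plain).strip()
    return f"{COPYEDIT_PROMPT}\n\n{plain}"
```

The resulting string would then be pasted into ChatGPT (or sent via an API) by hand; the point is only that citations are removed before the text reaches the model, since footnote markers are not natural prose.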
Shorten a bloated section
Based on User:JPxG's optimistic results using ChatGPT to condense plot summaries, I tried my hand at condensing sections in general, which some articles in Wikipedia:Vital articles are guilty of needing. I found ChatGPT to be prone to the "garbage in, garbage out" problem; if the text contains a lot of junk and not enough useful details, it may try to repackage that junk in the result, even when you have explicitly told it not to.
Potential pitfalls
Requesting citations
Use great caution when asking ChatGPT for specific sources. Its neural model will likely respond with persuasive-looking citations, but they generally should not be relied upon without detailed examination. Sometimes the bot will list a real author along with a fictitious article or book title that looks authentic but is not, and sometimes both the author and the title are invented. On the other hand, major authors and works are known to it, so if you ask about Chomskyan linguistics, it will know about Aspects of the Theory of Syntax and other works.
Countermeasures
One of the main concerns about using these language models is that a person cannot detect whether a text is original or written by AI.
Algorithms
In a demo by Hugging Face at [1] (based on RoBERTa), even with a heavily edited paragraph (such as those in § Copyediting paragraphs), the detector can distinguish AI text from real text with extremely high confidence (>99%); make sure to remove the reference notes "[1], [2]" beforehand. Such a model could be extremely useful for ORES, a MediaWiki machine learning API primarily used to detect vandalism in Special:RecentChanges. Over time, however, these models will have a harder time finding "abnormalities" as AI text generation becomes more sophisticated.
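As a sketch of how such a detector might be wired into a review workflow: the snippet below assumes the detector reports "Real"/"Fake" probabilities, as the Hugging Face demo does. The 99% threshold mirrors the confidence figure above; the wrapper functions are hypothetical, not production ORES code, and the model name in the comment should be verified before use.

```python
import re

# In practice the scores would come from a transformers pipeline, e.g.
#   from transformers import pipeline
#   detector = pipeline("text-classification",
#                       model="openai-community/roberta-base-openai-detector")
# Here we only sketch the pre- and post-processing around that call.

def strip_reference_notes(text: str) -> str:
    """Remove footnote markers like [1], [2] before running the detector,
    as noted above; they are not natural prose and skew the scores."""
    return re.sub(r"\[\d+\]", "", text).strip()

def flag_as_ai(scores: dict, threshold: float = 0.99) -> bool:
    """Flag text only when the 'Fake' (AI-generated) probability clears
    the high-confidence threshold reported in the demo."""
    return scores.get("Fake", 0.0) >= threshold
```

Keeping the threshold high trades recall for precision, which suits a patrolling context: a false accusation of AI use is more disruptive than a missed detection.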
See also
- Wikipedia:Large language models, a draft proposal for a Wikipedia guideline on the use of language models
- Wikipedia:Artificial intelligence, an essay about the use of artificial intelligence on Wikipedia and Wikimedia projects
- Initial version of Artwork title, a surviving article developed from raw LLM output (before this page had been developed)
Demonstrations
- User:JPxG/LLM demonstration (wikitext markup, table rotation, reference analysis, article improvement suggestions, plot summarization, reference- and infobox-based expansion, proseline repair, uncited text tagging, table formatting and color schemes)
- User:Fuzheado/ChatGPT (PyWikiBot code, writing from scratch, Wikidata parsing, CSV parsing)
- User:DraconicDark/ChatGPT (lead expansion)