Wikipedia:WikiProject AI Cleanup
This is a WikiProject, an open group of Wikipedia editors. New participants are welcome; feel free to talk to us!
Wikipedia editors are making a guide to identifying AI-generated writing and the kinds of problems it tends to introduce. Your contributions are welcome!
Welcome to WikiProject AI Cleanup, a collaboration to combat the increasing problem of unsourced, poorly written AI-generated content on Wikipedia. If you would like to help, add yourself as a participant in the project, inquire on the talk page, and see the to-do list.
Goals
Since 2022, large language models (LLMs) like GPTs have become a convenient tool for writing at scale. Unfortunately, these models virtually always fail to properly source claims and often introduce errors. Essays like WP:LLM strongly encourage care in using them for editing articles. These are the project's goals:
- To identify text written by AI, and proofread such text to make sure it follows Wikipedia's policies. Any unsourced or likely inaccurate claims should be removed.
- To identify AI-generated images and ensure appropriate usage.
- To help and keep track of AI-using editors who may not realize the deficiencies of AI as a writing tool.
The purpose of this project is not to restrict or ban the use of AI in articles, but to verify that its output is acceptable and constructive, and to fix or remove it otherwise.
Editing advice
- Tag articles with appropriate templates, remove unsourced information, and warn users who add unsourced AI-generated content to articles.
- Pages that are clearly entirely LLM-generated and show no sign of human review can be nominated for speedy deletion under WP:G15.
- Identifying AI-assisted edits is difficult in most cases since the generated text is often indistinguishable from human text. The signs of AI writing page provides a list of characteristics that are associated with text generated by AI chatbots.
- If the text contains phrases like "as an AI model" or "as of my last knowledge update", or if the editor copy-pasted the prompt used to generate the text together with the AI response, the text is almost certainly AI-generated.
- Other indications include the presence of fake references or other obvious AI hallucinations. AI content sometimes takes a promotional tone, reading like a tourism website. Other times, the AI gets confused and will write about a hotel instead of a nearby village.
- AI content detection tools like GPTZero are unreliable and should not be used as the sole means of determining whether text is AI-generated. Given the high rate of false positives, deleting or tagging content purely because it was flagged by an automatic AI detector is not acceptable.
- When it lacks more precise information, AI will often describe very generic and common features in detail, praising a village for its fertile farmlands, livestock and scenic countryside despite it being in an arid mountain range.
- AI content is not always "unsourced"—sometimes it has real sources that are unrelated to the article's topic, sometimes it creates its own fake sources, and sometimes it uses legitimate sources to create the AI content. Be careful when removing bad AI content not to remove legitimate sources, and always check the cited sources for legitimacy.
- Example: the article Leninist historiography was entirely written by AI and previously included a list of completely fake sources in Russian and Hungarian at the bottom of the page. Google turned up no results for these sources.
- Another example: the article Estola albosignata, about a beetle species, had paragraphs written by AI sourced to actual German and French sources. While the cited articles were real, they were completely off-topic, with the French one discussing a completely unrelated lifeform.
- Sometimes entire articles are AI-generated, and in such a case, make sure to check that the topic is legitimate and notable. Occasionally, WP:HOAXes have made it onto Wikipedia because AI tools can create fake citations that may appear legitimate.
- Example: the article Amberlihisar was created in January 2023, passed articles for creation, and was not discovered to be entirely fictional until December 2023. It has since been deleted.
- Text that was present in an article before November 30, 2022 (the release date of ChatGPT) is very unlikely to be AI-generated.
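Two of the checks above are mechanical enough to sketch in code: looking for the tell-tale phrases quoted in the advice, and comparing the date a passage was added against ChatGPT's release date. The following is a minimal illustration only, not a tool this project uses. The function names are hypothetical, the phrase list contains only the two phrases quoted above, and the cutoff is treated as midnight UTC, which is an assumption. A match is a strong signal, but the absence of a match says nothing about whether text is AI-generated.

```python
from datetime import datetime, timezone

# Only the two phrases quoted in the advice above; anything added to this
# tuple would be an assumption of whoever extends it.
TELLTALE_PHRASES = (
    "as an ai model",
    "as of my last knowledge update",
)

# Release date of ChatGPT as given in the advice above.
# Treating it as midnight UTC is an assumption made for this sketch.
CHATGPT_RELEASE = datetime(2022, 11, 30, tzinfo=timezone.utc)


def find_telltale_phrases(text: str) -> list[str]:
    """Return the tell-tale phrases that occur in text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]


def predates_chatgpt(added_at: datetime) -> bool:
    """Return True if the text was added before ChatGPT's release.

    added_at must be timezone-aware; comparing a naive datetime with
    CHATGPT_RELEASE raises TypeError.
    """
    return added_at < CHATGPT_RELEASE
```

For example, `find_telltale_phrases("As an AI model, I cannot browse the web.")` returns the first phrase, and `predates_chatgpt` returns `True` for a timestamp in 2021. Neither check replaces reading the text and verifying its sources.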
Open tasks
See Category:Articles containing suspected AI-generated texts for all articles that have been tagged as possibly {{AI-generated}}. The tasks page recommends ways to handle articles, talk page discussions, and sources that use AI-generated content.
Participants
Primary contacts: Chaotıċ Enby (talk · contribs) • 3df (talk) • Queen of Hearts talk
Feel free to add yourself here!
List of participants
Resources
Essays
Information
- AI - Article text generation
- Perennial sources - Large language models
- LLM dungeon, a list of LLM-created articles with bogus sources maintained by JPxG
- LLM demonstration 1 & LLM demonstration 2, experiments with AI and Wikipedia done by JPxG
- AI Images and German Wikipedia
- Academic sources regarding synthetic content
Relevant discussions
These threads may be useful for editors seeking information about how AI has previously been handled on Wikipedia.
Project resources
- List of uses of ChatGPT at Wikipedia
- Articles using ChatGPT as a reference
- AI images in non-AI contexts
- Wikipedia:Signs of AI writing
- AI cleanup thread in the Wikimedia Discord
- Wikipedia:WikiProject AI Cleanup/VWF bot log, an automated log of images categorised as AI/upscaled on Commons which are in use on Wikipedia. It updates every Sunday, using the script at User:DreamRimmer/commonsfileusage.py, and has an ignore list for AI-related articles.