Wikipedia talk:WikiProject AI Cleanup

From Wikipedia, the free encyclopedia

User rapidly creating long bios that GPTZero says are 100% probability AI-generated

Please see Special:Contributions/HRShami. I tested the first paragraph of Calin Belta § Career and the first paragraph of David L. Woodruff § Career and got a 100% AI-generated score from GPTZero in both cases, but the likelihood of AI generation is also suggested by the speed at which these articles are being generated. Sourcing quality is poor: many opinions about what the subjects have accomplished, mostly sourced to the publications of the subjects themselves; spot-checking the references in the Woodruff article found that they backed up maybe 1/3 of the claims in the text they purported to be references for. —David Eppstein (talk) 07:34, 27 February 2025 (UTC)[reply]

I have been writing articles pretty much the same way since the pre-GPT era. It's a very standard Wikipedia way. The thought of checking my writing against GPTZero did not even occur to me because I absolutely despise AI-generated writing. After your message I checked three articles on GPTZero and it declared "moderately confident that writing is human" and "certainly human writing" on all three. In any writing, if you pick a very small part of it, no machine can tell correctly whether it is AI or human. You must check the whole writing. Even checking single paragraphs of my writing generated "human content" on GPTZero for most of the paragraphs. If just one paragraph in an article with 8 or 9 paragraphs returns "AI Generated", with the rest of the paragraphs returning "Human Content", I think we should accept the writing as human content. I don't know what you mean by speed. I have written a total of 10 articles in February and edited one article completely. If I used AI, I could easily generate 10 articles a day. I might have misplaced references in the Woodruff article, which is a human error. Sometimes, other editors point out that the reference is not correct for the preceding information and I fix it with the correct reference. I asked ChatGPT to generate the same Woodruff article. I suggest you do the same. Even after multiple prompts, the article generated by ChatGPT was nowhere near my writing. HRShami (talk) 10:05, 27 February 2025 (UTC)
Please don't accuse people of using AI based on GPTZero -- it is often wrong, to the point that its wrongness has made the news. Especially, as the person above says, if you only test certain paragraphs. It also might be better to ask first if someone is using AI before making a public accusation -- I don't imagine you'd like it either if someone called your articles AI-generated. Mrfoogles (talk) 06:07, 26 March 2025 (UTC)

Old Gods of Appalachia

I believe the episode summaries in Old Gods of Appalachia are AI-generated. It looks like a large number of summaries were added in a single edit by an editor who has previously been warned for using AI-generated content. It looks like someone else has also questioned whether it's AI-generated content on the talk page. I'm looking for a second opinion, guidance on what to do, or assistance in cleaning it up. TipsyElephant (talk) 00:17, 16 March 2025 (UTC)

Some of them definitely sound like AI to me. In the first one alone: "The narrative delves into", "The prologue highlights the interconnectedness"... Chaotic Enby (talk · contribs) 00:58, 16 March 2025 (UTC)


Likely AI contents scraping, but also likely public relations editing

This may be of interest to members here: Wikipedia:Conflict_of_interest/Noticeboard#User_Hifisamurai and https://commons.wikimedia.org/wiki/Special:Log/Hifisamurai Graywalls (talk) 09:24, 16 March 2025 (UTC)

Chatbot additions to VG (nerve agent)

This is being discussed by members of the chemistry project at WT:WikiProject Chemicals#Use of chatbot in VG (nerve_agent) but may be of wider interest. Please comment there, not here. Mike Turnbull (talk) 15:32, 16 March 2025 (UTC)

Passive or active cleanup?

I'm interested and excited to help with this effort. I'm curious how folks here practice AI cleanup. Do you actively look for AI slop or are you passively aware of it while doing other tasks?

I spent some time this AM reviewing Special:RecentChanges expecting to find more instances of potentially AI-generated content given the lengthy policy discussions on Village pump. I'm in tune with some of the quirks and language tendencies of popular chat models in other contexts, so I guess I was surprised not to find anything obvious. I'm not an experienced editor by any means... Does anyone have any tips related to visual cues they look for in edit history summaries that merit a closer look? Zentavious (talk) 14:44, 20 March 2025 (UTC)

I would say I'm doing a mix of passive cleanup (cleaning it up while doing other tasks such as new page patrolling), semi-active cleanup (cleaning articles reported by other users as potentially AI-generated), and behind-the-scenes technical work. Regarding history and edit summary alone, there's often less to work with, but two clues are long, structured edit summaries (often generated by LLMs, although humans can also take care of writing good edit summaries!), and repeated long additions by the same user in a short time, especially on different articles. That last one is particularly telling: if the same editor makes 5000-byte additions every five minutes, they likely haven't written everything by themselves. Chaotic Enby (talk · contribs) 17:37, 20 March 2025 (UTC)
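
The "repeated long additions in a short time" clue above can be sketched in code. This is a minimal illustration, not an existing tool: the function name, the `(user, timestamp, bytes_added)` tuple layout, and the thresholds (5000 bytes, five-minute gaps, a run of three edits) are all assumptions chosen to mirror the example in the comment. A hit is only a reason for a closer look, not proof of LLM use.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_rapid_large_additions(edits, min_bytes=5000,
                               max_gap=timedelta(minutes=5), min_run=3):
    """Return the set of users with at least `min_run` consecutive large
    additions, each at most `max_gap` after the previous one.

    `edits` is an iterable of (user, timestamp, bytes_added) tuples.
    This layout is hypothetical; adapt it to however the edit history
    was exported. `min_run` is assumed to be 2 or more.
    """
    large_by_user = defaultdict(list)
    for user, timestamp, bytes_added in edits:
        if bytes_added >= min_bytes:
            large_by_user[user].append(timestamp)

    flagged = set()
    for user, stamps in large_by_user.items():
        stamps.sort()
        run = 1
        for prev, cur in zip(stamps, stamps[1:]):
            # Extend the run if this edit closely follows the previous one,
            # otherwise start counting again from this edit.
            run = run + 1 if cur - prev <= max_gap else 1
            if run >= min_run:
                flagged.add(user)
                break
    return flagged


t0 = datetime(2025, 3, 20, 12, 0)
sample_edits = [
    ("EditorA", t0, 5200),
    ("EditorA", t0 + timedelta(minutes=4), 6100),
    ("EditorA", t0 + timedelta(minutes=9), 5500),
    ("EditorB", t0, 7000),
    ("EditorB", t0 + timedelta(hours=2), 5100),
    ("EditorC", t0, 40),
]
print(flag_rapid_large_additions(sample_edits))
```

Small or widely spaced edits are ignored on purpose, since the clue is about volume per unit of time rather than about any single edit.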
Thank you very much for the tips. The structured summaries note is a great suggestion. Cheers, Zentavious (talk) 14:29, 25 March 2025 (UTC)
If you're trying to find suspicious articles more easily, Category:Articles_containing_suspected_AI-generated_texts is a good place to start. In a sense I guess it's a combination of active and passive -- passively, articles are tagged, and people who feel like being active try to fix them. I'm not surprised, given AI isn't that common, that you didn't find much at recent changes, though. Mrfoogles (talk) 06:11, 26 March 2025 (UTC)
Is the tag intended to mark only AI content that is unacceptable and/or unconstructive? Or is it intended to disclose the use of AI universally, including above-the-bar AI-assisted edits? Zentavious (talk) 13:49, 27 March 2025 (UTC)

I'm not sure where the threshold is for the outright removal of AI-generated text. At Elkmont, Alabama, an editor has stated--when asked if they are using AI--"I am using something to help me edit the text". I reverted their edit twice, because the tone was extremely formal and out of line with Wikipedia's voice. The input of others would be appreciated! Thanks. Magnolia677 (talk) 15:26, 23 March 2025 (UTC)

In this case, I would say that WP:NOTEVERYTHING and WP:INDISCRIMINATE apply, and that it is reasonable to revert the edits. I mean, these are all delightful:
  • Farmers were diligently planting corn, with hopes for a bountiful harvest if conditions remained favorable, while wheat and oat crops showed promise. The cotton market was active, and concerns arose over potential losses in the peach crop due to recent frosts
  • T. O. Bridgforth celebrated his 55th birthday with a large family reunion and dinner, which was described as one of the most sumptuous meals enjoyed since the end of a severe drought
  • The article closed with lighthearted local anecdotes, including a humorous mix-up involving a wheelbarrow and an umbrella
but not remotely encyclopedic. There are also some instances of external URLs in the content body, which violates WP:NOELBODY. You might politely point them in the direction of WP:LLM too, and suggest that, if they must continue to use an LLM assistant, they add well-cited encyclopedic content in smaller chunks rather than one huge swathe of text, so that each addition may be considered on its own merit. Cheers, SunloungerFrog (talk) 16:08, 23 March 2025 (UTC)
Went in and deleted some text with fake citations -- if someone adds unsourced content, you have the right to challenge it, and if they can't source it (and it's not "the sky is blue") then it is reasonable to remove it. I've had that happen to me before (it was annoying but you know, lacking a source, I didn't try to put it back). And at the point where it has fake citations like [11], which could only have been added by an AI, it is definitely reasonable to delete it. Mrfoogles (talk) 06:15, 26 March 2025 (UTC)
If they continue to add the same unsourced content, that sounds like WP:Disruptive editing. See that page for guidance on how to deal with it. Mrfoogles (talk) 06:16, 26 March 2025 (UTC)

Free play

Do you think that Free play is AI-generated? See Talk:Free play for more context. GenericUser24 (talk) 01:46, 27 March 2025 (UTC)

It's possible, but it's also possibly a certain sociology/psychology style (that corpus might be where LLMs get some of their flair). Both possibilities are likely due to how the article seems to have been written as an essay, rather than built from sources. The resulting tonal issues have already been raised on the talk page. CMD (talk) 06:03, 27 March 2025 (UTC)

Listenbourg

Two people keep re-adding AI-generated images to the Listenbourg article, where the only support for them is two sentences in a single source. Those two sentences are just there to explain that the name sounds European enough that DALL-E generated vaguely European buildings when prompted with it. Can I please get another person to give their input here? I think it is frankly absurd and stupid that this is even something I have to debate with those two, as it very clearly is not relevant to the topic at hand. NineOnLB (talk) 04:48, 28 March 2025 (UTC)

@NineOnLB: I'll take a look at it. scope_creepTalk 08:13, 30 March 2025 (UTC)
While I've replied on the merits of the image, I would note that the way you worded this post might be seen as WP:CANVASSING. A more neutral notification would have been ideal, such as "We are having a disagreement on Talk:Listenbourg about whether to include an AI-generated illustration. Can we please get more input in the discussion?" Otherwise, {{WikiProject please see}} can generate a pre-written notification message for you. Chaotic Enby (talk · contribs) 11:01, 30 March 2025 (UTC)
Gotcha, will keep in mind for the future and thank you for that resource. IzzySwag (talk) 13:13, 30 March 2025 (UTC)

Suspicious Draft:Kushwaha community of nepal

This may be irrelevant if the draft never gets accepted, but I wanted to have a closer look as discrepancies in language proficiency between the article and the user's comments on discussion pages have tripped my alarms. I'm already watching this user for other reasons and wondering whether LLM use is yet another concern. The draft has been declined at AFC by Sophisticatedevening, Theroadislong, and DoubleGrazing.

  • Sample article text
  • The Kushwahas share close historical and cultural ties with the Kushwahas of Bihar and Uttar Pradesh in India. Many migrated to Nepal over centuries, bringing with them a rich agricultural tradition. The community traces its lineage to the Suryavanshi dynasty and is traditionally associated with Kshatriya and Vaishya status. They are considered to be descendants of the legendary King Kush, the son of Lord Rama.. Historical records suggest their presence in the Madhesh region predates modern Nepal.
  • Maurya dynasty: Linked to Emperor Chandragupta Maurya.The Kushwaha community traces its lineage to the Mauryan Empire through historical and cultural traditions. They identify as descendants of the Suryavanshi Kshatriyas, particularly linking themselves to Chandragupta Maurya, the founder of the Maurya dynasty. The Mauryas, originally from a farming and warrior background, were believed to have belonged to the (Koiri) or Shakya lineage, which aligns with the Kushwaha identity. Over time, the Kushwahas continued their association with agriculture while maintaining their historical pride in their supposed Mauryan ancestry.
  • One of the most notable Kachhwaha rulers was Maharaja Sawai Jai Singh II, the founder of Jaipur. He was a visionary leader known for his advancements in astronomy, urban planning, and scientific research. Under his reign, Jaipur became a center of knowledge and innovation, featuring well-planned streets, grand palaces, and the famous **Jantar Mantar observatories**. (Markdown formatting copied from an LLM?)
  • Sample source check
  • Jha, Hari Bansh (1993). The Terai Community and National Integration in Nepal. Centre for Economic and Technical Studies. ISBN 978-81-7022-523-2.
  • According to Worldcat and Open Library, this ISBN belongs to Indian library and information science literature, 1990-1991 by Sewa Singh.
  • But a book titled The Terai Community and National Integration in Nepal by Hari Bansh Jha does appear in Worldcat and Google Books.
  • Sharma, Vikram (2015). "The Political Strategies of the Kachhwaha Rajputs". Indian Historical Review. 42 (3): 210–230. doi:10.1177/1234567890. Dodgy DOI. There is an Indian Historical Review and volume 42 does line up with 2015, but it looks like they were publishing only two issues a year (as far as I can tell from Sage via TWL). No matching title for "The Political Strategies of the Kachhwaha Rajputs" in Indian Historical Review, TWL, or Google Scholar.
  • Singh, Rajendra (2010). The Kachhwaha Dynasty: History and Heritage. Oxford University Press. pp. 45–60. ISBN 978-0198066759. Invalid ISBN. No book with this title in Worldcat or Google Books.


My preliminary verdict: could be LLM-style or just lazy puffery, but inconsistent with user's writing in discussion pages; possibly some hallucinated refs. Copyvio unlikely according to Earwig. — ClaudineChionh (she/her · talk · contribs · email · global) 13:01, 29 March 2025 (UTC)[reply]
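
The "Invalid ISBN" note in the source check above can be reproduced with the standard ISBN-13 check-digit rule (weights alternate 1 and 3 over the first twelve digits). The sketch below is only an illustration; the function name is made up for this example. A passing checksum says nothing about whether the number belongs to the cited book: the first sample ISBN passes the check yet, per Worldcat and Open Library, belongs to a different title.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit. Hyphens and spaces are ignored."""
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not cleaned.isdigit():
        return False
    digits = [int(ch) for ch in cleaned]
    # Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(d * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits[:12]))
    expected_check_digit = (10 - total % 10) % 10
    return digits[12] == expected_check_digit


# The two ISBNs quoted in the sample source check.
print(isbn13_is_valid("978-81-7022-523-2"))  # checksum passes
print(isbn13_is_valid("978-0198066759"))     # checksum fails
```

A failed checksum is a cheap first signal; confirming a hallucinated reference still needs a catalogue lookup like the ones described above.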

I'd say there is a very strong possibility. It looks like there was some effort to clean up the formatting, as there are no obvious markdown red flags and headings look fine, but the contrast with their comments is super suspicious. I'd run each paragraph individually through GPTZero (I would but I ran out of scans this month), and see if you get any hits. Also, it is super strange (suspicious?) that in one of the earliest versions of it they added "From Wikipedia, the free encyclopedia" in the lead. If it is more than likely that all of it is AI, I'll probably go back and decline it for LLM, and if they resubmit someone else will probably reject it for notability. Sophisticatedevening🍷(talk) 14:12, 29 March 2025 (UTC)
Also they left this comment not too long ago at the AfC help desk: "sir/mam plesae accept it it is for the kuswaha people of nepal not india please" Sophisticatedevening🍷(talk) 14:20, 29 March 2025 (UTC)
Thanks, good to get a second opinion/vibe check on this. And they were spamming the Teahouse about accepting the draft too. ClaudineChionh (she/her · talk · contribs · email · global) 01:36, 30 March 2025 (UTC)
I agree it looks somewhat generated. The language is a bit stilted and artificial, almost like a brochure. Who would write like that? But we probably only have a window of about 2-3 years before we won't be able to tell. scope_creepTalk 08:12, 30 March 2025 (UTC)
I agree, there is a big difference between how this draft is written, and how the user communicates on talk pages etc.
Oddly, though, the text (even the original version) has some punctuation, capitalisation, and similar mistakes in it, so if it is AI-generated, then AI may need some remedial English grammar lessons. -- DoubleGrazing (talk) 11:17, 30 March 2025 (UTC)

WP:UPSD Update

Following Wikipedia:Village_pump_(policy)/Archive_201#URLs_with_utm_source=chatgpt.com_codes, I have added detection for possible AI-generated slop to my script.

Possible AI-slop sources will be flagged in orange, though I'm open to changing that color in the future if it causes issues. If you have the script, you can see it in action on those articles.

For now the list of AI sources is limited to ChatGPT (utm_source=chatgpt.com), but if you know of other ChatGPT-like domains, let me know!

Headbomb {t · c · p · b} 22:24, 8 April 2025 (UTC)

Thanks, this is awesome, I've already found a bunch of garbage to revert. You're probably already aware of this, but there's also a filter for this, Special:AbuseFilter/1346, being trialed. Apocheir (talk) 21:52, 9 April 2025 (UTC)
Thanks for the EF, I'll add the other AI agents to my script! Headbomb {t · c · p · b} 21:57, 9 April 2025 (UTC)
@Samwalton9: I've added m365copilot.com to the EF, since that was listed at Microsoft Copilot. I think I did it right? Headbomb {t · c · p · b} 22:10, 9 April 2025 (UTC)
If you want, you can take a look at a relevant Phabricator task where I tested out the outputs of a few LLMs to see if any others gave a utm_source parameter. It seems to be exclusive to ChatGPT. Chaotic Enby (talk · contribs) 22:29, 9 April 2025 (UTC)
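
For readers curious what the utm_source check discussed in this thread amounts to, here is a minimal sketch. It is not the actual WP:UPSD script or the edit filter, whose code is not shown here; the function name and the `AI_UTM_SOURCES` set are assumptions for illustration, seeded only with the one value named above.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical list for this sketch; only chatgpt.com is named in the thread
# as a utm_source value.
AI_UTM_SOURCES = {"chatgpt.com"}


def has_ai_utm_source(url: str) -> bool:
    """Return True if the URL's query string carries a utm_source value
    that appears in AI_UTM_SOURCES."""
    query = urlparse(url).query
    values = parse_qs(query).get("utm_source", [])
    return any(value.lower() in AI_UTM_SOURCES for value in values)


print(has_ai_utm_source("https://example.org/page?utm_source=chatgpt.com"))
print(has_ai_utm_source("https://example.org/page?utm_source=newsletter"))
```

A match shows only that the link was probably copied out of a ChatGPT answer. The cited page may still be a fine source, so a flag is a prompt to check the surrounding text rather than grounds for automatic removal.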