Wikipedia talk:Large language models/Archive 7
This is an archive of past discussions on Wikipedia:Large language models. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
LLM Experiment - Sources and unsourced information helper
I conducted an experiment to see if LLMs (Chat-GPT4) could identify sourced and unsourced information in a Wikipedia article. The experiment can be viewed step-by-step here. The Weather Event Writer (Talk Page) 03:05, 21 September 2023 (UTC)
- Hello WeatherWriter and thanks for sharing your experiment. I think there is some potential in using LLMs to help with verification. One note regarding your experiment: the amount of history of which ChatGPT is aware is limited. This is the case even if you tell it to "Keep track of this source". So as the chat grows longer, it will forget the earlier steps in the exchange, which contain the text of the sources. But it does not tell you this and may attempt to answer your question nonetheless, probably with hallucinations. This issue becomes more serious with very long sources or very many sources. One way to solve this problem is to restrict oneself to the verification of one claim and one source at a time and start a new chat for each new claim/source. Another issue is that your process only tells you which sources verify a claim but does not cite the passage that verifies it. So you would have to trust that it is not a hallucination instead of being able to check it for yourself.
- For a user script that implements a similar idea, see User:Phlsph7/SourceVerificationAIAssistant. Phlsph7 (talk) 08:17, 21 September 2023 (UTC)
- Great work! I did some follow-up work with the 32k version of GPT-4. I could fit the texts of all of the RS in and pose a single prompt to find unsupported material. See User_talk:WeatherWriter/LLM_Experiment_1 for more details. Nowa (talk) 07:38, 22 September 2023 (UTC)
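(A minimal sketch of the one-claim-per-chat verification workflow described above, for anyone who wants to script it rather than work in the chat interface. It assumes the official openai Python package; the model name and prompt wording are placeholder assumptions, not a tested tool, and any quoted passage it returns still has to be checked against the source by a human.)

# Hypothetical sketch: verify one claim against one source per fresh chat,
# so the source text never falls out of the model's context window.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Claim:\n{claim}\n\n"
    "Source text:\n{source}\n\n"
    "Does the source verify the claim? Answer YES or NO, then quote the exact "
    "passage that supports your answer."
)

def verify_claim(claim, source_text):
    # Each call is a brand-new conversation: one claim, one source, no history.
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(claim=claim, source=source_text)}],
    )
    return response.choices[0].message.content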
"Wikipedia:CHATGPT" listed at Redirects for discussion
The redirect Wikipedia:CHATGPT has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2023 October 2 § Wikipedia:CHATGPT until a consensus is reached. - CHAMPION (talk) (contributions) (logs) 10:22, 2 October 2023 (UTC)
Information page
Based on the discussion above, #RfC: Is this proposal ready to be promoted?, it is clear that this page will not be promoted either to policy or to guideline. However, the RfC didn't really specify what would happen to the page if it isn't promoted. In the absence of such clarity, I am considering swapping out the {{Proposal}} template with {{Information page}}. (An information page is a kind of project-space page that "intends to describe some aspect(s) of Wikipedia's norms, customs, technicalities, or practices", and it doesn't require much consensus for a page to be labelled as such.) Would there be any objection to that? It would be a softer template than something like {{failed}}, and I'm not sure if we have enough consensus to use {{supplement}}. Mz7 (talk) 03:43, 7 October 2023 (UTC)
- I would object because this page doesn't describe any aspect of Wikipedia's norms, customs, etc. I would suggest it be labeled an essay (not a failed proposal), because it conveys the opinion of some editors on an issue relating to Wikipedia, and that's what an essay is. Levivich (talk) 05:34, 7 October 2023 (UTC)
- I have no objection to {{essay}}. Mz7 (talk) 19:53, 7 October 2023 (UTC)
Done: switched {{proposal}} to {{essay}} in the absence of objections. [1] The "essay" label doesn't really require much consensus to use, anyway, and as indicated above, it doesn't seem like there is consensus to use a different label for this page. Mz7 (talk) 21:37, 9 October 2023 (UTC)
- Essay should be fine. Cheers, · · · Peter Southwood (talk): 12:11, 12 October 2023 (UTC)
- I would prefer {{failed proposal}} to essay as long as the page remains substantially unchanged, since it was drafted as policy, is worded like one, and risks being brought up as semi-authoritative in the same way many essays are. If someone decides to substantially rewrite it into a more typical essay format, no objection to the essay tag being brought back. DFlhb (talk) 19:54, 14 October 2023 (UTC)
- I took a first step to make it more essay-like by using less authoritative language. I tried to keep the changes to a minimum: most changes involve replacing expressions like "must" with "should". Phlsph7 (talk) 07:52, 15 October 2023 (UTC)
Paper
Good paper on the topic of LLM-assisted writing and the kind of tools we might like to create, if there are any software folk here. https://arxiv.org/pdf/2309.15337.pdf Talpedia 14:56, 9 November 2023 (UTC)
- Copyright on that seems to be 2018, btw. - Dank (push to talk) 15:07, 9 November 2023 (UTC)
- Hmm... odd the paper says "7 Sep 2023" in the margin and is using gpt-4 (which was released this year). Talpedia 18:02, 9 November 2023 (UTC)
AI being used by reliable sources
@ActivelyDisinterested has suggested a list in the Wikipedia: project namespace to collect notes on sources that we have considered reliable in the past, but which seem to be using LLM-generated content. Please see Wikipedia talk:Reliable sources#Use of AI content generation by previously reliable sources if you're interested in this subject and/or would like to help compile the list. WhatamIdoing (talk) 22:29, 1 December 2023 (UTC)
Discussion at Wikipedia:Templates for discussion/Log/2023 December 4 § Template:OpenAI
You are invited to join the discussion at Wikipedia:Templates for discussion/Log/2023 December 4 § Template:OpenAI. –Novem Linguae (talk) 05:32, 4 December 2023 (UTC)
Discussion at Wikipedia:Templates for discussion/Log/2023 December 4 § Template:AI-generated notification
You are invited to join the discussion at Wikipedia:Templates for discussion/Log/2023 December 4 § Template:AI-generated notification. –Novem Linguae (talk) 05:32, 4 December 2023 (UTC)
Drafts for 15,616 genes, only 9% wrong
- As a demonstration of the potential for AI to impact scientific practice, we use WikiCrow to generate draft articles for the 15,616 human protein-coding genes that currently lack Wikipedia articles, or that have article stubs. WikiCrow creates articles in 8 minutes, is much more consistent than human editors at citing its sources, and makes incorrect inferences or statements about 9% of the time, a number that we expect to improve as we mature our systems.
https://www.futurehouse.org/wikicrow
Welp. Sandizer (talk) 18:41, 13 December 2023 (UTC)
Estimating risk
Raw ChatGPT assist
Question to ChatGPT: "write a list of activities that can be assisted by LLMs, to write encyclopedias, in order of riskiness". ChatGPT's response appeared to have errors, e.g. drafting is likely more risky than language refinement, but ChatGPT said the opposite. "Assisting in the generation of citations and references" is almost hallucinatory? ChatGPT response:
Sure, it's important to note that language models like GPT-3.5, often referred to as large language models (LLMs), can be powerful tools but may not always provide accurate or up-to-date information. The riskiness of using LLMs for certain activities can vary, and it's crucial to verify the information obtained from these models. Here's a list of activities related to writing encyclopedias, ordered from lower to higher perceived risk:
- Gathering Basic Information: Summarizing established facts and information. Compiling lists of well-known data.
- Creating Outlines: Generating initial outlines for encyclopedia articles. Providing structure and key points for various topics.
- Generating Drafts: Assisting in drafting sections of encyclopedia entries. Offering ideas and context for historical events or scientific concepts.
- Language Refinement: Improving the readability and coherence of drafted content. Suggesting alternative phrasing for clarity.
- Explaining Concepts: Providing explanations for complex concepts in simple terms. Offering background information on specific subjects.
- Citing Sources and References: Assisting in the generation of citations and references. Helping with proper attribution of information.
- Fact-Checking: Cross-referencing information with reliable sources. Verifying the accuracy of historical or scientific details.
- In-Depth Research Assistance: Assisting in more comprehensive research on specific topics. Offering insights into less well-known or niche subjects.
- Handling Controversial Topics: Providing information on controversial issues with caution. Giving balanced perspectives and highlighting multiple viewpoints.
- Ethical Considerations: Assessing and addressing ethical concerns in content creation. Ensuring that information presented is unbiased and fair.
- Up-to-Date Information: Verifying the recency of information to ensure it is current. Handling topics that may evolve rapidly with care.
- Legal and Sensitivity Issues: Avoiding the generation of content that may violate laws or ethical guidelines. Exercising caution when dealing with sensitive or potentially harmful topics.
Remember that while LLMs can be valuable tools, they are not infallible, and human oversight is essential. Critical evaluation, fact-checking, and ensuring compliance with ethical and legal standards should always be part of the content creation process.
Corrections are welcome. The list below, bot-assisted but revised by a human, estimates riskiness from least to most.
- For talk pages
- Gathering Basic Information. Summarizing established facts and information. Compiling lists of well-known data.
- Generating outlines for articles. Providing structure and key points.
- Offering ideas and context for historical events or scientific concepts.
- In-Depth Research Assistance: Assisting in more comprehensive research on specific topics. Offering insights into less well-known or niche subjects.
- Main space
- Spell-checking
- Language Refinement: Improving readability and coherence of drafted content. Suggesting alternative phrasing for clarity. Copyedit, reformulate, simplify.
- Explaining Concepts: Providing explanations for complex concepts in simple terms. Offering background information on specific subjects.
- Suggest images
- Suggest wikilinks
- Generating summaries e.g. lead summaries or other summaries
- Suggest expansion
- Higher risk:
- Generating Drafts: Assisting in drafting sections of entries.
- Fact-Checking: Cross-referencing information with reliable sources. Verifying the accuracy of historical or scientific details.
- Up-to-Date Information: Verifying the recency of information to ensure it is current. Handling topics that may evolve rapidly with care.
- Handling Controversial Topics: Providing information on controversial issues with caution. Giving balanced perspectives and highlighting multiple viewpoints.
- Citing Sources and References: Assisting in the generation of citations and references. Helping with proper attribution of information.
- Ethical Considerations: Assessing and addressing ethical concerns in content creation. Ensuring that information presented is unbiased and fair.
- Legal and Sensitivity Issues
Tom B (talk) 16:23, 20 December 2023 (UTC)
Discussion at Wikipedia talk:WikiProject AI Cleanup § Proposal: adopting WP:LLM as this WikiProject's WP:ADVICEPAGE
You are invited to join the discussion at Wikipedia talk:WikiProject AI Cleanup § Proposal: adopting WP:LLM as this WikiProject's WP:ADVICEPAGE. QueenofHearts 21:37, 9 January 2024 (UTC)
Someone implemented a full-on article generator, and Anthropic gave it an award
Blog post description, GitHub repo, based on this Stanford work, which also has a repo, and a live working demo.
Have people noticed those kinds of articles? The outline structure is more distinctive and flamboyant than we usually see from human editors. 141.239.252.245 (talk) 05:43, 26 April 2024 (UTC)
- Just saw this. Taking a look now. Thanks for posting. Nowa (talk) 12:07, 9 November 2024 (UTC)
- I took a look at the tool and, in its current state, it is unsuitable for drafting Wikipedia articles. The big problem is that the text in the drafted articles is not necessarily supported by the cited references. Here is an example. See my comments at the bottom of the article. Another problem is that the LLM engages in wp:synth, where it takes two facts stated in the references and extrapolates a conclusion from them. You can see an example of synth here. Again, scroll to my comments at the bottom. Nowa (talk) 23:24, 10 November 2024 (UTC)
- Using the app a bit more, it's actually a helpful "search engine". The references are relevant to the topic of concern. So it might be useful for getting references for a Wikipedia article. Here is an example. Nowa (talk) 21:48, 11 November 2024 (UTC)
Opening sentence
There seems to be something wrong with the opening sentence. I can't parse it:
- "While large language models (often known as "chatbots") are very useful, machine-generated text (like human-generated) often contains errors, is useless, whilst seeming accurate."
Am I missing something? Or does it need editing? AndyJones (talk) 12:35, 28 August 2024 (UTC)
- Yeah it was made not-very-sensical in Special:Diff/1230068490
—Alalch E. 14:36, 28 August 2024 (UTC)
− While [[large language model]]s (colloquially termed "AI chatbots" in some contexts) can be very useful, machine-generated text (much like human-generated text) can contain errors or flaws, or be outright useless.
+ While [[large language model]]s (often known as "chatbots") are very useful, machine-generated text (like human-generated) often contains errors, is useless, whilst seeming accurate.
- I have reverted the above mentioned change. —Alalch E. 21:37, 29 August 2024 (UTC)
Ban it. Full stop.
I was shocked and horrified to see a banner on an article announcing that it may contain "hallucinations" due to use of LLM content. A more accurate word for that template would be "lies", "fabrications", or "misinformation". So I would like that change to be made. But even more so, I firmly believe that using LLM-generated content goes directly against the ethos of the Wikipedia project and as such, that material needs to be completely, unambiguously banned from addition. Where can we go about enacting such a policy? Matt Gies (talk) 15:08, 2 November 2024 (UTC)
- Wikipedia:Village pump (policy) is the place to do it. Make sure you read the previous discussions about this topic before, or you'll get nowhere. The main ones are linked at the top of this talk page. Tercer (talk) 15:39, 2 November 2024 (UTC)
Using Claude 3.5 to summarize reference content for Wikipedia
I've been evaluating some of the more improved LLMs to see if they do a better job of summarizing reference content for Wikipedia without undue copyright infringement. Claude 3.5 seems to do a reasonably good job. See Feral_pig#Canada. I fed the content of a National Geographic article into Claude and asked it to summarize the content in Wikipedia format, including the citation.
To check for copyvio, I used Microsoft Word's "compare" function to see how the wiki draft compared to the original reference. I posted the results of the comparison on my google drive here.
As far as I can tell:
- The wiki draft content looks faithful to the reference.
- There are no hallucinations.
- The longest extracted phrase from the original text was "...throughout western and central Canada, from British Columbia to Manitoba.."
- There is no wp:synth.
Did I miss anything? Is there a more challenging task we should give it? Nowa (talk) 12:38, 21 November 2024 (UTC)
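(A rough script-level stand-in for the Word "compare" check above, for anyone without Word: it reports the longest run of words the draft shares verbatim with the source, which is what the "longest extracted phrase" observation measures. difflib is in the Python standard library; the file names are placeholders.)

from difflib import SequenceMatcher

def longest_shared_phrase(source, draft):
    # Compare word sequences rather than characters to find verbatim runs.
    src_words = source.split()
    drf_words = draft.split()
    matcher = SequenceMatcher(None, src_words, drf_words, autojunk=False)
    match = matcher.find_longest_match(0, len(src_words), 0, len(drf_words))
    return " ".join(src_words[match.a:match.a + match.size])

# Placeholder file names for the reference text and the LLM draft.
source_text = open("original_article.txt", encoding="utf-8").read()
draft_text = open("llm_draft.txt", encoding="utf-8").read()
print(longest_shared_phrase(source_text, draft_text))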
- Simply put, I see no value in using these systems to "summarize" a citation. — The Hand That Feeds You:Bite 17:00, 21 November 2024 (UTC)
- Seems awfully like close paraphrasing anyway, when you change
- "In the late 1980s and early 1990s, some Canadian farmers imported wild boars from Europe to raise for meat. But as wild boars are wont to do, some of them escaped, either digging under fences or barreling through them. Others were set free once the boar meat market cooled. At first, it didn’t seem like a big problem; many thought they couldn’t survive Canada’s long winters. "
into
- "Canadian farmers began importing wild boars from Europe in the late 1980s and early 1990s for meat production. Some of these animals escaped from farms by digging under or breaking through fences, while others were intentionally released when the boar meat market declined. Initially dismissed as a minor concern due to assumptions about their inability to survive Canadian winters" Fram (talk) 17:18, 21 November 2024 (UTC)
@Fram Interesting. I thought I would see how the paraphrasing of a human-drafted new section based on a reference compares to an LLM-drafted new section based on the same reference. I went back to when the section "Attacks on Humans" was introduced into the feral pig article here. I then asked Claude to draft a two-sentence section based on the same NYT reference. Here's how the human-drafted and Claude-drafted sections compare:
NYT reference excerpts | Human-drafted section | Claude-drafted section
---|---|---
"Christine Rollins, 59, was feet away from the front door when she was killed. It was only the fifth documented fatal wild hog attack in the country since 1825... it just a few steps away from the front door... where she worked as a caretaker for an older couple. she was attacked by a herd of feral hogs... There were about 100 documented attacks by feral hogs on humans in the United States between 1825 and 2012" | "Feral pigs in the living in the United States have been known to attack and fatally injure human beings. A 59 year old Texas woman named Christine Rollins was attacked and killed only a few feet away from the front door of her workplace by a herd of feral pigs in the Texas town of Anahuac, which is 50 miles east of Houston. This incident was the fifth documented fatal feral pig attack in the United States since 1825." | "While feral hog attacks on humans are rare, with only 100 documented attacks in the United States between 1825 and 2012, they can be fatal as demonstrated by the 2019 death of Christine Rollins in Anahuac, Texas, who was killed by a herd of feral hogs outside a home where she worked as a caretaker. This was only the fifth documented fatal wild hog attack in the United States since 1825."
I agree that the Claude-drafted section is a closer paraphrase of the reference than the human-drafted section.
Any other comparisons worth noting? Nowa (talk) 16:29, 22 November 2024 (UTC)
- or we could get a person to do it Skeletons are the axiom (talk) 15:01, 31 January 2025 (UTC)
For an LLM tester
I wonder if this very boring task could be done by an LLM.
We now have Wikipedia:Missing Wikipedians#D and Wikipedia:Missing Wikipedians/2000–2009#D. (I've manually split the first three sections). The goal is to evaluate each line of wikitext. Any line that mentions *only* years before 2010 goes in one list. Any line that mentions any years after 2009 goes in another list. (Any line that does not mention any years should go in a third list, but I don't think there are any of these.)
Is anyone willing to try it for me? WhatamIdoing (talk) 02:25, 14 December 2024 (UTC)
- Wouldn't this be a better task for a script than an LLM? Since LLMs by design predict text based on processed input, rather than actually comprehending said input, isn't there every chance that plugging this wikitext into an LLM wouldn't actually reproduce the content but instead produce an invented approximation of it? (Like fake usernames 'prompted' by the real ones?) Hydrangeans (she/her | talk | edits) 18:48, 29 January 2025 (UTC)
- In this case, that should be obvious from the diff that you would (obviously, right?) check before posting the change.
- I didn't think it was worth someone writing a specialized script for one-time use if there were alternatives. If it could be done with grep, then so much the better. WhatamIdoing (talk) 18:53, 29 January 2025 (UTC)
- I tried doing the removal with ChatGPT, but messages are limited in length, so I had to manually copy and paste sections at a time, and ChatGPT quickly got overwhelmed as I reached the daily limit, so it switched to an older version, which didn't work so well and made a fair amount of errors. Ultimately I just copied and pasted the list into a text file, uploaded it into R as a tsv named Missing, ran Missing <- Missing[!grepl("^\\*", Missing$V1) | grepl("201[0-9]|202[0-5]", Missing$V1), ], then exported it, and copied and pasted the text back into the article. Photos of Japan (talk) 04:41, 30 January 2025 (UTC)
- a script would be better imo Skeletons are the axiom (talk) 15:05, 31 January 2025 (UTC)
- Doing it in R seems to have worked, with only a handful of false positives (all due to oddities in the text). The good news is that it's done for now, until we need to split off the next decade. WhatamIdoing (talk) 16:52, 31 January 2025 (UTC)
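(For future splits, here is a throwaway-script version of the sorting described above, which avoids the LLM context-length problems mentioned earlier: lines mentioning any year from 2010 on go to one list, lines whose years are all pre-2010 go to another, and lines with no year go to a third. The file name is a placeholder for wikitext saved locally.)

import re

recent, old, undated = [], [], []
with open("missing_wikipedians.txt", encoding="utf-8") as f:
    for line in f:
        # Pull out every four-digit year on the line.
        years = [int(y) for y in re.findall(r"\b(19\d{2}|20\d{2})\b", line)]
        if not years:
            undated.append(line)
        elif any(y >= 2010 for y in years):
            recent.append(line)
        else:
            old.append(line)

print(len(old), "pre-2010 lines;", len(recent), "lines with a 2010+ year;", len(undated), "undated lines")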
Copyright of LLM output
Generative AI output is widely held to be ineligible for copyright (except under UK and Hong Kong law), but just in case, the terms of use for ChatGPT and Claude contain provisions assigning any copyrights to outputs to the user, meaning that the user would have the authority to license those outputs under CC-BY-SA. Meta AI's terms contain no such provision. I suggest that editors who contribute AI-generated text to Wikipedia only use GenAI services that grant them copyright ownership of the output. Qzekrom (she/her • talk) 18:01, 27 December 2024 (UTC)
LLM-generated content
I've noticed links containing "utm_source=chatgpt.com" in some articles (for example in Lucid_Motors). It's quite likely that the content supported by these sources was written by LLMs. While there are currently only a few such articles, maybe it would make sense to create some kind of filter or alert? Alaexis¿question? 23:28, 28 December 2024 (UTC)
- If I understand it correctly, this is a UTM parameter to track where the user came from. For example, the article Lucid Motors gives one source as https://marketrealist.com/p/who-makes-lucid-motors-battery/?utm_source=chatgpt.com. If my interpretation is correct, this means that the user clicked on a link on chatgpt.com, which brought them to the website marketrealist.com. In that case, it doesn't say anything about whether the website marketrealist.com was created by an LLM. Phlsph7 (talk) 09:50, 29 December 2024 (UTC)
- Probably you're right. Alaexis¿question? 20:58, 30 December 2024 (UTC)
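(A tiny illustration of the point above: the tracking parameter can be stripped without changing which page the link points to. A standard-library sketch, using the URL from the example above:)

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_utm(url):
    # Drop any utm_* tracking parameters; everything else is left untouched.
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    return urlunparse(parts._replace(query=urlencode(query)))

print(strip_utm("https://marketrealist.com/p/who-makes-lucid-motors-battery/?utm_source=chatgpt.com"))
# -> https://marketrealist.com/p/who-makes-lucid-motors-battery/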
- A filter might still be useful as someone could be having ChatGPT writing a section for them and then citing the sources. E.g. "ChatGPT, write me a summary of X with sources", and then copying and pasting the ChatGPT summary along with the sources. Photos of Japan (talk) 21:56, 30 December 2024 (UTC)
- @Photos of Japan: to confirm that I'm accurately understanding what you're describing...
- Are you imagining a user experience where Wikipedia is able to detect when someone is copying and pasting content that was "written" by service likes ChatGPT, Claude, Gemini, etc.? If so, what can you imagine happening next?
- For context, I ask the above as a product manager at the Wikimedia Foundation who is helping to develop a system that is meant to offer the kind of "in-the-moment" guidance/feedback/awareness I understand you to be describing here. Further, we're in the midst of building a version of what we're calling "Paste Check" as I type this! In fact, as part of this work, we investigated whether we could reliably detect pastes from popular LLMs. PPelberg (WMF) (talk) 22:10, 27 January 2025 (UTC)
- @User:PPelberg (WMF) If a link includes "utm_source=chatgpt.com" then that means it was copied from ChatGPT, but that doesn't mean any other associated text added alongside the link was also copied from ChatGPT, but it makes it more likely that it was. I would propose a filter flagging such edits so that a user patrolling recent changes who is interested in reviewing content added by LLMs (such as potentially a member of the WP:WikiProject AI Cleanup) is able to review it. Photos of Japan (talk) 00:15, 28 January 2025 (UTC)
- @Photos of Japan I don't think it makes it especially more likely (as ChatGPT would most likely link to content it is familiar with from its training data, rather than content someone just created with ChatGPT). To clarify, "utm_source=chatgpt.com" isn't part of the address itself: https://en.wikipedia.org/wiki/Wikipedia_talk:Large_language_models and https://en.wikipedia.org/wiki/Wikipedia_talk:Large_language_models?utm_source=chatgpt.com are the exact same page. However, ChatGPT might not be ideal in terms of recommending links (as it hasn't necessarily read the content of the pages), and such a link is good evidence that the edit should be reviewed by a human. I'm pinging @Queen of Hearts (WPAIC's edit filter manager) on this one, and cross-posting this just in case. Chaotic Enby (talk · contribs) 11:22, 31 January 2025 (UTC)
- Here's a list of articles that currently have a "utm_source=chatgpt.com" link. Nobody (talk) 12:06, 31 January 2025 (UTC)
- Oof, that's a lot. Checking out the first one, here. The source chat found is about the topic, but doesn't support all of its assigned content (which presumably comes from somewhere else in its knowledge base), labelled the link as dead for some reason, and got the publisher wrong. CMD (talk) 12:20, 31 January 2025 (UTC)
- As someone who regularly uses large language models, but not for the purpose of contributing to any Wikimedia project, I have noticed that large language models can, and often do, pick unreliable sources. Google's AI overviews often list Wikipedia and blogs in its sources. I have noticed that Google Gemini sometimes lists Wikipedia as a source when I use it. If Wikipedia imposes a duty to use reliable sources, in my opinion, we should create an edit filter, and warn, if not disallow, any citations that involve links to large language models, such as ChatGPT, Google Gemini, Claude, and Microsoft Copilot. Wikipedia already lists some AI-generated articles as generally not reliable. See Wikipedia:Reliable sources/Perennial sources#Large language models. Z. Patterson (talk) 22:49, 1 February 2025 (UTC)
- page_namespace == 0 & added_lines rlike "utm_source=chatgpt\.com"
Chaotic Enby (talk · contribs) 23:18, 1 February 2025 (UTC)
- @Chaotic Enby: I think this will work. I also suggest writing this.
equals_to_any(page_namespace, 0, 10, 118) & ( llmurl := "\b(chatgpt|copilot\.microsoft|gemini\.google|groq)\.\w{2,3}\b"; added_lines irlike (llmurl) & !(removed_lines irlike (llmurl)) & !(summary irlike "^(?:revert|restore|rv|undid)|AFCH|speedy deletion|reFill") & !(added_lines irlike "\{\{(db[\-\|]|delete\||sd\||speedy deletion|(subst:)?copyvio|copypaste|close paraphrasing)|\.pdf") )
- An edit filter manager can add more LLM links, if necessary. It would work similarly to Special:AbuseFilter/1045. Z. Patterson (talk) 23:43, 1 February 2025 (UTC)
- Do we currently have an edit filter for references that cite Wikipedia? There's about 1,500 articles that link to wikipedia.org and once you get past our articles about ourselves most of them are inappropriate. Almost all of our 172 articles linking to the mobile version of our site are inappropriate as well. Photos of Japan (talk) 05:47, 2 February 2025 (UTC)
- Filter 1057 (hist · log) Nobody (talk) 08:05, 2 February 2025 (UTC)
- Thanks. I noticed that it doesn't include any other Wikimedia-related sites such as Wiktionary, to which we appear to have a few hundred citations. Adding a citation referencing Wiktionary doesn't appear to trigger any filter, and Citebot will also add dates to it, which I find curious given that WP:UPSD highlights it as a 'Wikimedia-related website'. Photos of Japan (talk) 07:28, 3 February 2025 (UTC)
- The filter was pretty much created by accident when modifying an LTA filter. (Here) They kept it because it seemed useful for dealing with CIRCULAR issues. There wasn't really a discussion about whether references to, for example, Wikibooks, Wiktionary, Wikivoyage, Wikiquote, Wikidata, or Wikisource should be on the filter too as CIRCULAR or SPS sources. Nobody (talk) 08:33, 3 February 2025 (UTC)
- I see. There's few enough of them that I could probably manually remove most of them some day. Photos of Japan (talk) 03:52, 4 February 2025 (UTC)
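(For editors doing this kind of cleanup offline rather than via an edit filter, a hedged sketch of a scan like the list linked above: it looks through saved article wikitext for links carrying an LLM referrer parameter. The directory name and the extra domains are assumptions; only chatgpt.com has actually been observed in this discussion.)

import re
from pathlib import Path

# Domains beyond chatgpt.com are speculative additions, included only as examples.
LLM_REFERRERS = re.compile(
    r"utm_source=(chatgpt\.com|copilot\.microsoft\.com|gemini\.google\.com)", re.I
)

for path in Path("article_wikitext").glob("*.txt"):  # placeholder dump directory
    hits = LLM_REFERRERS.findall(path.read_text(encoding="utf-8"))
    if hits:
        print(f"{path.name}: {len(hits)} LLM-referrer link(s)")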
This section appears to be AI-generated and could use attention from someone more familiar with fixing or removing AI-generated output. Skyerise (talk) 13:56, 29 January 2025 (UTC)