Wikipedia:Bots/Noticeboard
Here we coordinate and discuss Wikipedia issues related to bots and other programs interacting with the MediaWiki software. Bot operators are the main users of this noticeboard, but even if you are not one, your comments will be welcome. Just make sure you are aware of our bot policy and know where to post your issue.
Do not post here if you came to
- discuss non-urgent bot issues, bugs and suggestions for improvement. Do that at the bot operator's talk page
- discuss urgent/major bot issues. Do that according to instructions at WP:BOTISSUE
- discuss general questions about the MediaWiki software and syntax. We have the village pump's technical section for that
- request approval for your new bot. Here is where you should do it
- request new functionality for bots. Share your ideas at the dedicated page
Is Monkbot 21 removing url-status=dead desirable?
The recently approved Wikipedia:Bots/Requests for approval/Monkbot 21 is primarily concerned with changing |pages= to |article-number=, a helpful fix for many common journal pagination schemes. However, it has an ancillary task of completely removing |url-status=dead from CS1 citations, which doesn't seem to have been addressed in the BRFA and might be controversial. There are 1.7 million uses of this parameter value, so I'm not sure if this removal is desired by the community.
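For readers who haven't seen the edits, here is a minimal sketch of the kind of removal being discussed, using the mwparserfromhell library. This is illustrative only, not Monkbot's actual code, and the condition that an |archive-url= be present before the flag is dropped is an assumption based on the description above.

```python
# Illustrative sketch only, not Monkbot's actual code: drop |url-status=dead
# from CS1 citation templates. Requiring |archive-url= to be present before
# removing the flag is an assumption based on the discussion above.
import mwparserfromhell

def strip_dead_flag(wikitext: str) -> str:
    code = mwparserfromhell.parse(wikitext)
    for tpl in code.filter_templates():
        if not str(tpl.name).strip().lower().startswith("cite"):
            continue
        if (tpl.has("url-status") and tpl.has("archive-url")
                and str(tpl.get("url-status").value).strip().lower() == "dead"):
            tpl.remove("url-status")
    return str(code)

print(strip_dead_flag(
    "{{cite web |title=Example |url=https://example.com "
    "|archive-url=https://web.archive.org/web/2025/https://example.com "
    "|archive-date=2025-02-24 |url-status=dead}}"
))
```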
This task conflicts with multiple other bots:
- GreenC's Wayback Medic, which changes the |url-status= parameter to live or dead upon request at Wikipedia:Link rot/URL change requests (example)
- InternetArchiveBot adds |url-status=dead when it can find good values for |archive-url= and |archive-date= (example).
It is also in contravention of the citation documentation:
- Wikipedia:Link rot specifically encourages editors to add the flag: "Within citation templates, put the archive URL in |archive-url= and add an |archive-date=. If the link is still valid, include |url-status=live, otherwise set |url-status=dead."
- Template:Cite web/doc § Using "archive-url" and "archive-date" (and optionally "url-status") for webpages that have been archived describes the parameter as optional, not redundant.
There's a handful of complaints about this:
- I was wondering why from Shearonink
- What is Monkbot doing? from Mikeblas
- This is quite annoying from Traumnovelle
Because many editors add |archive-url= to citations for live URLs without also including |url-status=live, I had always used |url-status=dead to confirm that I had manually certified the URL as dead, removing ambiguity. I'd be open to a discussion on deprecating |url-status=dead but it doesn't seem like one ever happened. Dan Leonard (talk • contribs) 17:11, 8 October 2025 (UTC)
- In the What is Monkbot doing? conversation, some surprising defaults were explained to me. But for the liveness parameter, I think it ends up being that |url-status=dead is a default, and the bot author uses the bot to enforce their opinion that "adding url-status equals dead to cs1|2 templates does nothing other than clutter the wikitext".
- As you're saying, Dan Leonard, to me the presence of the parameter means someone has tried and verified the link, and that it is actually dead. There might be a practice of adding |archive-url= proactively to plan for the event that the original URL should ever go dead or become inaccessible in the future. (Maybe it's also a prompt for the Wayback Machine to archive the page?)
- In a lapse of judgement, I found myself editing Gaza genocide where an editor had, as a habit, refused to add |url-status=live to citations, and may have actually removed it. That made verifying citations more difficult, since the archive websites aren't particularly fast and often clicked to death.
- And so no: I don't think |url-status=dead is extraneous, and should not be removed by robots. -- mikeblas (talk) 17:40, 8 October 2025 (UTC)
- Either way it won't break my bot or IABot. My bot does add url-status=dead when generating new archive URLs, but this is more due to old code back before this was an issue. I think TTM is the main proponent of removing it. I can see arguments either way. Until there is a clear consensus I probably won't change, mainly because it would take some work. I suspect this will be a problem for many tools that are adding archives (VE?). But, if TTM wants to remove it as a cosmetic edit while doing other work on the citation, I have no qualms with that either. -- GreenC 19:37, 8 October 2025 (UTC)
- Sorry. I'm not new, but every day someone uses at least one abbreviation I've never heard of before: What is "TTM"? -- mikeblas (talk) 20:03, 8 October 2025 (UTC)
- The person whose task we are discussing! :) -- GreenC 00:00, 9 October 2025 (UTC)
- Trappist the monk. –Novem Linguae (talk) 00:40, 9 October 2025 (UTC)
- I have reviewed the BRFA and don’t see any approval to remove the above-discussed parameter, so the bot should be limited to only the approved functions. Cleaning up other parameters would need to be discussed and approved by BAG. – DreamRimmer ■ 00:48, 9 October 2025 (UTC)
- Ah, got it. Thank you for the helpful answer, Novem Linguae. -- mikeblas (talk) 03:20, 9 October 2025 (UTC)
- I approved this task, so I'll comment. |url-status=dead is indeed the default behaviour; it does literally nothing but clutter the wikitext. Compare:
- {{cite web |title=Article of things |website=example.com |url=https://example.com |archive-url=http://webarchive.com/2025-02-04-asdfasdf/htts://example.com |archive-date=2025-02-24 |url-status=dead}} renders as "Article of things". example.com. Archived from the original on 2025-02-24.
- {{cite web |title=Article of things |website=example.com |url=https://example.com |archive-url=http://webarchive.com/2025-02-04-asdfasdf/htts://example.com |archive-date=2025-02-24}} renders as "Article of things". example.com. Archived from the original on 2025-02-24.
- Literally nothing changes. Removal therefore unclutters the wikitext, and while it's not something a bot should do on its own per WP:COSMETICBOT, as part of other edits, it's fine. Headbomb {t · c · p · b} 19:42, 8 October 2025 (UTC)
- Is this the end of the conversation, or are you willing to consider opinions to the contrary? -- mikeblas (talk) 20:04, 8 October 2025 (UTC)
- I don't think anyone disputed that |url-status=dead is the default, but rather just the claim that it is purely clutter. I see it as having significant semantic value. As Mikeblas and I mentioned, many citations are erroneously written by humans to include archive URLs for live webpages without also setting |url-status=live, so explicit inclusion of the dead value signifies that it is actually dead. Personally, I've always thought |url-status= should be a required value rather than assuming dead, but it's far too late to fix that. Dan Leonard (talk • contribs) 20:11, 8 October 2025 (UTC)
- Re: "many citations are erroneously written by humans to include archive URLs for live webpages without also setting |url-status=live" - See, this is the issue, if there is ever one. The solution, if you really, really care, is to identify live links, since that changes the template behaviour, not dead ones.
- "Article of things". example.com. Archived from the original on 2025-02-24.
- Or you can just leave the templates as they are since archive links are always functional.
- "Article of things". example.com. Archived from the original on 2025-02-24.
- Headbomb {t · c · p · b} 22:24, 8 October 2025 (UTC)
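As a rough illustration of the "identify live links" suggestion above (an assumption about approach, not any existing bot's logic), a liveness check for the original URL might look like this:

```python
# Rough illustration (assumed approach, not any existing bot's logic) of
# checking whether an original URL still responds before marking it
# |url-status=live. Treating any status below 400 as "live" is a simplification.
import requests

def looks_live(url: str) -> bool:
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        if r.status_code == 405:  # some servers reject HEAD; retry with GET
            r = requests.get(url, allow_redirects=True, timeout=10, stream=True)
        return r.status_code < 400
    except requests.RequestException:
        return False

print(looks_live("https://example.com"))
```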
- The solution? Sorry, I missed something. What is it that you're solving with your suggestion? -- mikeblas (talk) 03:26, 9 October 2025 (UTC)
- In terms of human readability, having the specification of the URL status as dead in the markup is of great use; it indicates that someone has checked the source and found the link has gone dead. As someone who hand-codes his reference links and hand-edits those placed there by others, I resent a bot having been designed to erase my efforts in order to make what is going on less obvious to those editing the text. If you wish to eliminate "dead" as a valid setting for that parameter, the bot page is not the place for that conversation. -- Nat Gertler (talk) 23:03, 8 October 2025 (UTC)
- And to learn that there's one bot that adds and one that removes seems pretty absurd. -- mikeblas (talk) 03:19, 9 October 2025 (UTC)
- Here's my take on the above discussion:
- Of the "handful of complaints", it seems two accepted (at the time) the operator's explanation that |url-status=dead is redundant. I don't see that the OP here has tried talking to the operator before bringing their complaint here, which IMO is a bit of a faux pas.
- While some claim that there's a semantic difference between |url-status=dead and the parameter not being present, namely that the former indicates a human actually verified the deadness, that doesn't seem to be reflected at Help:Citation Style 1#Web archives or Template:Cite web#csdoc urlstatus. It's not clear to what extent other bots adding |url-status=dead reflects that semantic difference either.
- While Wikipedia:Bots/Requests for approval/Monkbot 21 does not explicitly list this secondary cleanup, it does state "more at User:Monkbot/task_21: Replace page(s) with article-number", and that page does (and did at the time of approval) describe it, and some of the trial edits included it.
- Overall, as far as the bot goes it seems to be operating within approval. Establishing a consensus that there's a semantic difference between |url-status=dead and leaving the parameter out, or that the parameter should be left despite there being no semantic difference, is probably better for some other forum. Anomie⚔ 01:44, 9 October 2025 (UTC)
- I accepted it in the sense that I don't think debating it is worth any time, but I do not feel that simply removing a parameter because it is redundant is a useful task, even as part of another task. It's like changing {{cn}} to {{citation needed}}: it is a useless change, but it's even more useless to bother getting people not to make such a change. Traumnovelle (talk) 02:47, 9 October 2025 (UTC)
- Traumnovelle, {{cn}} to {{citation needed}} is useful, because cn is cryptic shorthand and citation needed is comprehensible. It is also cosmetic, though, so shouldn't be done on its own. — Qwerfjkltalk 08:59, 9 October 2025 (UTC)
- You're the one of the three I didn't count as accepting the explanation, since you ended with disagreement ("If the renderings are the same then there is no need to change it") rather than thanks. There's nothing wrong with that, BTW. Reasonable editors can and do disagree. Anomie⚔ 13:34, 9 October 2025 (UTC)
- I was one of the handful of opinions and I guess I've changed my mind since March. As an editor and as a reader I appreciate url-status=dead. Personally I haven't cared if it can't be seen by readers but as an editor & when I am doing research and verifying info or a citation...then, yeah it is valuable to me. - Shearonink (talk) 03:03, 9 October 2025 (UTC)
- Aha, I finally found a previous relevant discussion: Help talk:Citation Style 1/Archive 93 § url-status=dead (and a side-topic about language=en). I'm not sure what, if any, consensus can be found there but at least I've now answered my initial question of "it doesn't seem like one ever happened". Pinging participants SMcCandlish, Trappist the monk, jacobolus, David Eppstein, GreenC, Ceyockey, ActivelyDisinterested, Folly Mox, Izno, Nigel Ish, and Grorp. Dan Leonard (talk • contribs) 03:39, 9 October 2025 (UTC)
- Count me among the people who use url-status=dead to indicate that the url has actually been checked to be dead rather than just relying on the default behavior of the citation templates to treat it as dead. It is semantically meaningful even if it is visually no different than when blank. This was not a bot-approved task, should not have been a bot-approved task, and the bots should not do it, especially because the ambiguity has led to some bots adding it and some removing it. —David Eppstein (talk) 05:28, 9 October 2025 (UTC)
- Personally I don't think it's useful, as its absence is the same as its inclusion. However if other editors find it useful I don't see a pressing requirement to remove it. Since the last discussion I've made an effort to include it because of that. -- LCU ActivelyDisinterested «@» °∆t° 11:46, 9 October 2025 (UTC)
- Yes, it is desirable. Yes, that parameter combination is redundant. Yes, the url-status being dead is the default when an archive-url is present (and the former parameter serves no purpose when the latter is not). Yes, any time a bot or a human "gnome" changes something that touches citations in any way, some handful of people will reflexively complain, because we have a number of editors obsessive about "their" citation formatting detailia that no one else cares about (it has always been this way). No, that does not mean there is an actual "controversy" or "dispute", not when many thousands of editors don't have an issue with it, most of those who care to look into the matter understand why the parameter should be removed, and no one is putting up a fuss but editors you can count on one hand. Cf. WP:BIKESHED and WP:1AM. — SMcCandlish ☏ ¢ 😼 21:00, 13 October 2025 (UTC)
- One bot sets it. Another removes it. But it doesn't matter? -- mikeblas (talk) 21:33, 13 October 2025 (UTC)
- Ah, the strategy of simply sloughing off the expressed concerns by belittling the people stating them. I haven't seen the "many thousands of editors" showing support for it either, but feel free to point them out. -- Nat Gertler (talk) 22:57, 13 October 2025 (UTC)
- Not to mention SMcCandlish's most recent comment on this was "there is an active dispute about this particular parameter+value, |url-status=dead (a dispute in which I'm now neutral), but at least two of you are still going around removing it at will despite vociferous objections". The reason I've opened this discussion was because, in SMcCandlish's own words, "someone's still removing |url-status=dead from random articles, but I don't get the sense that there's actually a consensus in favor of the idea. I like the notion of removing any parameter that is actually redundant, but in fairness I'm not certain this is seen to be reundant, regardless what the original idea for implementing the parameter was". This parameter value is listed as valid in the CS1 documentation, appears in help pages, and is used 1.7 million times. If it really should be removed, it should be deprecated by community consensus. It's fine, of course, to change one's mind (and especially so over the course of two years). But to now turn around to call people who share his original position "obsessive" and say "no one else cares about" this is insufferably belittling. Dan Leonard (talk • contribs) 20:02, 15 October 2025 (UTC)
- Not to mention SMcCandlish's most recent comment on this was
Discussion at Wikipedia:Administrators' noticeboard § SineBot, benign helper or closet vandal?
You are invited to join the discussion at Wikipedia:Administrators' noticeboard § SineBot, benign helper or closet vandal?, which may be of interest to this noticeboard. Tenshi! (Talk page) 16:25, 10 October 2025 (UTC)
Global bot approval request for SchlurcherBot
Hello!
In accordance with the policy, this message is to notify you that there is a new approval request for a global bot.
The discussion is available at Steward_requests/Bot_status#Global_bot_status_for_User:SchlurcherBot on Meta-Wiki. All Wikimedia community members are invited to participate.
Thank you for your time.
Best regards, MediaWiki message delivery (talk) 22:10, 21 October 2025 (UTC)
- Note that, per WP:GLOBALBOTS, only interwiki link updating bots may operate on the English Wikipedia without a BRFA. In this case, Wikipedia:Bots/Requests for approval/SchlurcherBot has already approved this task locally. Anomie⚔ 23:00, 21 October 2025 (UTC)
Mass revert request
I had a typo in my code that resulted in 356 edits that need reverting. Is there still a mass revert script? The list of articles. Example Special:Diff/1305772241/1317946469 (the "SKIPDEADURL" string). -- GreenC 15:38, 22 October 2025 (UTC)
- Is this fixable with a regex? If so, might be a good fit for WP:AWBREQ. –Novem Linguae (talk) 15:52, 22 October 2025 (UTC)
- Looking at the diff, looks like it's not fixable with a regex. Might still be a good fit for AWBREQ if a good mass revert userscript isn't found. That's the place to request a medium number of repetitive edits (somewhere between "too many edits for me to do" and "not enough edits to bother coding up a bot"). –Novem Linguae (talk) 15:54, 22 October 2025 (UTC)
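For reference, a hedged sketch of what such a mass self-revert could look like with Pywikibot. The page list here is a hypothetical placeholder, and this assumes the bad edit is the page's latest revision; it is not the script actually used below.

```python
# Hedged sketch of a mass self-revert with Pywikibot: for each affected page,
# restore the revision that preceded the bad edit if the marker string is
# present. The marker comes from the request above; the page list is a
# hypothetical placeholder, and this assumes the bad edit is still the
# latest revision.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
MARKER = "SKIPDEADURL"

for title in ["Example article 1", "Example article 2"]:  # hypothetical list
    page = pywikibot.Page(site, title)
    if MARKER not in page.text:
        continue
    revs = list(page.revisions(total=2))              # newest first
    if len(revs) < 2:
        continue
    page.text = page.getOldVersion(revs[1].revid)     # text before the bad edit
    page.save(summary=f"Revert edit that introduced {MARKER}")
```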
Doing... with alt account. – DreamRimmer ■ 16:31, 22 October 2025 (UTC)
Done Special:Contributions/DreamRimmer Alt – DreamRimmer ■ 16:59, 22 October 2025 (UTC)
- Awesome, thank you User:DreamRimmer! I'll rerun the pages with the fix. -- GreenC 18:35, 22 October 2025 (UTC)
- GreenC, in future try Wikipedia:Kill-It-With-Fire. — Qwerfjkltalk 17:47, 22 October 2025 (UTC)
- Cool didn't know about that. -- GreenC 18:35, 22 October 2025 (UTC)
AI-driven article review bot
I am in the process of developing an AI-based article improvement suggestions bot. Before I go any further, I'd like to raise possible policy issues up front, before I go to the effort of actually implementing it and making a proposal.
Background: I've been experimenting with using LLMs and web searching to do multi-step systematic review to find errors in articles, with very promising results. Here's my methodology:
- Select an article using 'Random article'
- Get Claude to perform a review of that article, giving it the article's wikitext as an input (Claude, and I imagine other LLM agents, has been blocked from accessing Wikipedia directly.)
- Based on that, tell it to perform a set of web searches to find sources to confirm or deny any factual errors it thinks it may have found. (It incorrectly 'believes' that it cannot access the web unless actually told to.) It is forbidden to use Wikipedia as a source. I will soon add more stringent criteria on sources.
- Based on the output of those searches, perform a review of the claims based on the evidence it has found. It is instructed to generate a detailed rationale for each claim, together with source URLs to back up its assertions.
- Finally, based on that, select the single correction out of the remaining errors that it is most confident about.
Note that this is a multi-stage process, with each round of checking being isolated from the previous rounds.
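A rough sketch of that multi-stage pipeline, with hypothetical call_llm() and web_search() helpers standing in for the actual prompts, model, and search tooling (none of which are shown here):

```python
# Rough sketch of the multi-stage review described above. call_llm() and
# web_search() are hypothetical placeholders; the real prompts, model and
# search tooling are not shown here.
import json

def call_llm(instruction: str, payload: str) -> str:
    """Hypothetical helper: send a prompt plus payload to the LLM, return its reply."""
    raise NotImplementedError

def web_search(query: str, exclude_site: str = "wikipedia.org") -> list[str]:
    """Hypothetical helper: run a web search, excluding Wikipedia as a source."""
    raise NotImplementedError

def review_article(wikitext: str) -> dict | None:
    # Stage 1: initial review of the raw wikitext.
    claims = json.loads(call_llm(
        "List statements in this article that may be factual errors, "
        "each with a suggested search query, as a JSON list.", wikitext))

    # Stage 2: targeted web searches for each suspect claim (Wikipedia excluded).
    evidence = {c["claim"]: web_search(c["query"]) for c in claims}

    # Stage 3: isolated re-review of each claim against the gathered sources.
    verdicts = [json.loads(call_llm(
        "Given these sources, is the claim wrong? Reply as JSON with "
        "verdict, confidence, rationale and source_urls.",
        json.dumps({"claim": claim, "sources": sources})))
        for claim, sources in evidence.items()]

    # Stage 4: report only the confirmed error the model is most confident about.
    errors = [v for v in verdicts if v.get("verdict") == "error"]
    return max(errors, key=lambda v: v.get("confidence", 0), default=None)
```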
So far, the results have been stunning. Not a single bogus error has passed through this multi-step review process, with the LLM revising its opinion at the systematic review phase, and some of the errors detected have been based on foreign-language sources, with the script finding and correctly interpreting them without even being directed to do so.
If I can make this accurate and reliable enough, I am then contemplating using it to drive a bot that would then notify other editors by putting a comment on the article's talk page, wrapped in a template that could be used to style the text and also put the talk page into appropriate tracking categories. The template would clearly label the report as AI-generated, and warn editors that they are responsible for fact-checking the reports themselves against the sources and should not blindly incorporate its suggestions into articles, or use its own words verbatim.
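A sketch of that delivery step under those assumptions; the {{AI review report}} wrapper template and the wording of the notice are hypothetical placeholders, not existing templates:

```python
# Hedged sketch of the talk-page delivery described above, using Pywikibot.
# {{AI review report}} is a hypothetical wrapper template, not an existing one.
import pywikibot

def post_report(article_title: str, report: str) -> None:
    site = pywikibot.Site("en", "wikipedia")
    talk = pywikibot.Page(site, f"Talk:{article_title}")
    talk.text += (
        "\n\n== Automated review suggestion ==\n"
        "{{AI review report|1=" + report + "}}\n"
        "''This report was generated with LLM assistance; please verify it "
        "against the cited sources before changing the article.'' ~~~~\n"
    )
    talk.save(summary="Posting AI-assisted review suggestion for human review")
```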
My aim would be to make false positive rates < 1%. If the bot can successfully review 100 articles without a single error, I would regard that as evidence of that criterion likely being met, and would then take the bot to this forum as a bot proposal.
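As a back-of-the-envelope check on that acceptance test (illustrative arithmetic only, assuming independent reviews): 100 clean articles is consistent with a sub-1% false positive rate, though it only bounds the true rate at roughly 3% with 95% confidence.

```python
# Illustrative arithmetic only, assuming independent reviews.
p = 0.01                   # hypothesised per-article false-positive rate
print((1 - p) ** 100)      # ~0.37: 100 clean reviews are quite plausible at p = 1%

n = 100                    # "rule of three": zero errors in n trials gives an
print(3 / n)               # approximate 95% upper bound of 3/n, i.e. 3%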
Just as with my geocoding bot, I can then record that the article has been visited, and not go back to the same article again for some time, if ever, to avoid annoying other editors with repeated reports. There are, after all, millions of articles to review, so this isn't really a limitation. To avoid overwhelming the editing community with noise, I could limit the bot to perhaps 100 edits a day - the aim is to be a helper to editors, not a taskmaster. Not to mention that LLM access costs money, and this sort of multi-stage review consumes a lot of tokens per article reviewed.
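The bookkeeping for that could be as simple as a visit log plus a daily cap; the following is an assumed design for illustration, not existing code:

```python
# Assumed design, not existing code: a small JSON log of visited articles
# plus a daily posting cap, as described above.
import json
import time
from pathlib import Path

LOG_FILE = Path("visited.json")
DAILY_LIMIT = 100                   # max reports per day
REVISIT_AFTER = 180 * 24 * 3600     # skip articles reviewed in the last ~6 months

def load_log() -> dict:
    return json.loads(LOG_FILE.read_text()) if LOG_FILE.exists() else {}

def may_review(title: str, posted_today: int, log: dict) -> bool:
    if posted_today >= DAILY_LIMIT:
        return False
    return time.time() - log.get(title, 0.0) > REVISIT_AFTER

def mark_reviewed(title: str, log: dict) -> None:
    log[title] = time.time()
    LOG_FILE.write_text(json.dumps(log))
```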
In the interests of transparency, I will of course make the code open source.
Before I go any further in working on this, there is an obvious policy issue, which is the legitimacy of making automated edits containing text generated by LLMs, even on talk pages and clearly marked as such.
I would like to hear your opinions on this. — The Anome (talk) 11:35, 6 November 2025 (UTC)
- Wasting the Earth's resources to find some minor errors like this is ethically rather hard to defend. Fram (talk) 11:36, 6 November 2025 (UTC)
- Hello again, Fram! That cost perhaps $0.001 to $0.01 of electricity to find. I can provide exact costings for further queries, if you'd like. Perhaps you might find this a bit more impressive? — The Anome (talk) 11:45, 6 November 2025 (UTC)
- Not really, no. It's an additional detail, not an error. Obviously you only know after the fact whether you will find something worthwhile or have just wasted a LLM run. And I don't really care about the cost of a single run anyway, it's the general "let's use LLMs for everything, from the important to the trivial" which in toto leads to the creation of these huge resource-guzzling centers, with datasets created by misusing the often copyrighted work of others. If you want to improve articles, read articles, use your brain and skills, and improve them based on your own work. Fram (talk) 12:11, 6 November 2025 (UTC)
- Your rejection of this on these grounds is ludicrous. We sit in our warm houses, heated and lit by resource-guzzling heating, using computers made by resource-guzzling manufacturing, communicating via the (surprisingly) resource-guzzling internet, eating food made by resource-guzzling agriculture... there are far more serious problems than GPU barns to worry about. Can we please have a serious discussion about improving the encyclopedia instead? — The Anome (talk) 12:23, 6 November 2025 (UTC)
- @The Anome We best find some common ground before wasting precious resources on debating. Dismissing concerns is not as effective as finding common ground and exploring our options from there (even tho debating online can be a lot of fun). Polygnotus (talk) 12:25, 6 November 2025 (UTC)
- Bye. Fram (talk) 12:30, 6 November 2025 (UTC)
- @Fram You may be interested in joining the meta:Sustainability Initiative.
- It is true that in the grand scheme of things the fact that we try to write and improve an encyclopedia is rather hard to defend; people are literally dying of hunger as we speak.
- It is also true that when a new tech hype shows up idiots try to cram it into anything, and we don't really need vibrators to be on the blockchain.
- On the other hand, it is wise to explore to make sure we understand it, and to not throw out the baby with the bathwater. LLMs are both a giant threat and an opportunity for Wikipedia. Polygnotus (talk) 12:15, 6 November 2025 (UTC)
- Thanks, but no, I'm not interested in editing meta in any way or shape. Fram (talk) 12:32, 6 November 2025 (UTC)
- @Fram And you don't appear to have many userboxen.
Switching to (relatively) green energy is a no-brainer, and many data centers support it. Polygnotus (talk) 12:35, 6 November 2025 (UTC)
- At the risk of derailing the conversation further, full electrification of the economy using sustainable power and energy storage is indeed the way to go. I'm an enthusiastic booster of green energy. — The Anome (talk) 12:49, 6 November 2025 (UTC)
- Resource guzzling centers? I thought DeepSeek proved that a mid-range, off-the-shelf PC was all you needed for AI? ~2025-31723-49 (talk) 16:27, 6 November 2025 (UTC)
- DeepSeek#Training_framework: "As of 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs" Polygnotus (talk) 09:19, 9 November 2025 (UTC)
- As far as WP:Bot policy goes, I don't see a problem other than needing community consensus. As far as that, your discussion at WP:VPI#AI citation-checking bot is a good start but we'd probably want to see approval in a WP:VPR discussion to let it go ahead, considering general community sentiment against using LLMs to directly produce content or discussion comments. Anomie⚔ 12:20, 6 November 2025 (UTC)
- Just to be clear, I'm emphatically against using LLMs to generate Wikipedia content; this would mark the beginning of the end for Wikipedia and its descent into a Grokipedia clone. This isn't that - it's flagging things for human attention. Following the earlier discussion, I realised that in-page annotation was a bad idea, so I've already changed my proposal based on that. — The Anome (talk) 12:30, 6 November 2025 (UTC)
- That doesn't change anything I said. There seems to be enough sentiment against LLMs in general that we'd want to see a well-attended WP:VPR discussion showing consensus for your idea. Anomie⚔ 12:34, 6 November 2025 (UTC)
- @Anomie: I'm in total agreement with you; perhaps I didn't make myself clear enough above. I'm more than emphatically against abuse of LLMs to generate 'fact', I'm violently opposed to it. Grokipedia stands as an awful warning of what happens if you try. Any use of LLMs on Wikipedia, in any way, needs justification, and that's why I'm talking about this in public, in a number of appropriate forums, trying to get feedback about all the different aspects of this. (Oh - to give another example of what I would consider a legitimate use of LLMs, adding WikiProject tags to talk pages of articles without them comes to mind. Proposal to follow there, too.) — The Anome (talk) 12:40, 6 November 2025 (UTC)
- Grokipedia is not an attempt to generate facts, it is an attempt to create alternative facts to not offend tiny-brained far right idiots. Polygnotus (talk) 12:42, 6 November 2025 (UTC)
- Indeed. To quote Frank Herbert: "Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them." — The Anome (talk) 12:43, 6 November 2025 (UTC)
- I find the argument that "using electricity on bots is wasteful" to be unpersuasive. Electricity isn't something we need to be rationing at the user level. If it's truly that expensive, limits will be introduced via raising prices, or Toolforge will introduce restrictions.
- However the inaccuracy of LLMs often wastes an incredible amount of editor time that could be spent doing other productive things onwiki. So my presumption for anything LLM is that it's bad until proven otherwise.
- Anyway, got any diffs of what kinds of reports this bot would make, so that we can evaluate the LLM's accuracy? –Novem Linguae (talk) 13:19, 6 November 2025 (UTC)
- I will code up an implementation of my hand-driven experiments so far, and post some. You can see a little bit of the early results at User:The Anome/Claude experiment. — The Anome (talk) 13:22, 6 November 2025 (UTC)
- My objection was not about "using electricity on bots", but about the costs of creating and running LLMs. Wikipedia should not support or rely upon such a dreadful, wasteful, thieving, mind-numbing technology. Fram (talk) 13:31, 6 November 2025 (UTC)
- The WMF essentially caused us to miss out on a generation of new editors by ignoring that the common way to interact with the internet was via mobile devices. Now the community is doing the same thing by ignoring that LLMs have changed the way that many people research and interact with the internet. That will be two generations of new editors shut out. We should be seriously considering ways that LLMs can be used constructively on Wikipedia so that we don't wither away. The number of English speakers on the internet has ballooned while active editor numbers have stagnated, active admin numbers are dropping, and our page views are starting to decline. Maybe we shouldn't accelerate the decline? ScottishFinnishRadish (talk) 13:32, 6 November 2025 (UTC)
- At the risk of seeming to play both sides, I think the real value of Wikipedia is that it represents the human distillation of the consensus reality of human knowledge. The NPOV principle and the social infrastructure built around Wikipedia have made it the nearest thing to a single source of truth in the modern world - even if that truth is "sources differ" or "there is no consensus", Wikipedia reports the controversy in as neutral a way as we possibly can. And there is huge value in that. It's also the case that LLMs are to a substantial extent driven by Wikipedia as one of their central sources of knowledge, as no other single source binds all human knowledge together to the same degree. Nothing we do should imperil that. But at the same time, to ignore the potential utility of LLMs to aid human curation of human knowledge would also be crazy. We can either use the tool to help human endeavour - by checking facts, finding citations, and so on - or have it used against us to take it away. Feeding LLM-generated text directly into Wikipedia will help bring about model collapse, and that would be a disaster. (Ironically, another valid use of LLMs would be to detect LLM editing of Wikipedia so it can be rooted out - but that's another discussion for another day.) — The Anome (talk) 14:07, 6 November 2025 (UTC)
- I broadly agree. Human hands need to be involved, and I don't think it's very likely we'll reach a point where we can just say "hey, make me an article on shit flow diagrams" and accept whatever it shits out. But coming up with constructive use cases and what editors need to know and do to use those tools to contribute is necessary for the continued relevancy of Wikipedia. ScottishFinnishRadish (talk) 14:39, 6 November 2025 (UTC)
- Exactly. Man discovered how to use fire in prehistoric times, we discovered deep learning just recently. But man learned the lesson that there is a big difference between using contained, controlled fire to cook and power things, and setting yourself on fire. And that's the lesson we need to learn now. — The Anome (talk) 15:20, 6 November 2025 (UTC)
- To use the chainsaw analogy from after my RFA, I used a chainsaw to cut down the tree I used to build my bed, but I didn't use it to cut lap joints or mortices. Tools used in the right way are effective. Used incorrectly they end in a ruined bed and a missing finger. ScottishFinnishRadish (talk) 15:57, 6 November 2025 (UTC)
- I am dubious that this particular proposal would encourage LLM-friendly editors to get involved. It seems to me (and I don't claim to be an expert) that LLM fans generally want AI to do their work for them, while this would be the other way 'round: the AI is making the choices then asking for human hands to do the work for it. -- Nat Gertler (talk) 15:35, 6 November 2025 (UTC)
- I support The Anome's efforts thus far. Running various reports to create lists for human editors (and sometimes bots operated by human editors) to review and fix is something that we have been doing on Wikipedia for a long time. This proposed report looks like the latest version of that. Whether it's analyzing a database dump for possible typos, or generating a report of possibly invalid ISBNs, we have helpful tools that create hundreds of lists for humans to analyze. One more, if it is of high quality, is welcome. And as for bringing in LLM-friendly editors, who knows? The goal is to help editors find errors in Wikipedia articles and fix them. We don't know what kinds of editors will show up to do that. – Jonesey95 (talk) 16:06, 6 November 2025 (UTC)
- On the sample page, the output looks pretty verbose. If the bot could be configured to just print reports with bulleted lists, and each bullet contains a page wikilink and then one suggestion per bullet, that could be a pretty efficient format. Wouldn't even need a BRFA for that since the reports could be printed to the bot's userspace.
- I think asking to let the bot put these suggestions on article talk pages would be much more controversial, so perhaps might want to avoid that route. –Novem Linguae (talk) 16:13, 6 November 2025 (UTC)
- Just generally speaking, yes, page views are declining, but I don't know if the correct answer to all of this is to embrace LLMs. With respect to this specific bot idea, while I'm not against what is being done here, I do feel like I would be a bit out of my depth if I were to describe "why" the LLM is able to say "yes this claim is supported" (which imo is not a good thing for a "fact checking" bot). I would personally favor a more low-tech approach of feeding RoBERTa chunks of data/sentences from a URL to create a vectored DB indexing each work and then using cosine similarity or some kind of textual entailment model to verify if a statement is supported by the text. I'm not sure how it would compare with an LLM in terms of emissions or electricity usage, but it would allow us to say "the model saw X claim and Y text and gave the wrong prediction here" rather than "god knows why it thought X is fine". Sohom (talk) 16:46, 6 November 2025 (UTC)
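For concreteness, a rough sketch of the retrieval-plus-similarity approach described in the comment above, using the sentence-transformers library. A small MiniLM model stands in for RoBERTa here, and the threshold is an arbitrary illustration rather than a tuned value:

```python
# Rough sketch of the lower-tech approach described above, using the
# sentence-transformers library. The MiniLM model stands in for RoBERTa,
# and the 0.6 threshold is an arbitrary illustration, not a tuned value.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

claim = "The bridge opened to traffic in 1932."
source_chunks = [
    "Construction finished in late 1931 and the bridge opened in March 1932.",
    "The bridge carries four lanes of road traffic.",
]

claim_vec = model.encode(claim, convert_to_tensor=True)
chunk_vecs = model.encode(source_chunks, convert_to_tensor=True)
scores = util.cos_sim(claim_vec, chunk_vecs)[0]

best = float(scores.max())
print(f"best support score: {best:.2f}")
print("claim looks supported" if best > 0.6 else "claim needs human review")
```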
- There's a difference between broadly embracing LLMs and figuring out constructive use cases for them. ScottishFinnishRadish (talk) 17:19, 6 November 2025 (UTC)
- @ScottishFinnishRadish, we should consider each technical approach on its technical merits and not shoehorn technical solutions due to a sense of urgency fueled by external factors as you lay out above. I don't have problems with using LLMs in constructive use cases, but it needs to be the square key in the square hole; I feel like there is a flavor of "let's just smash the hexagonal key into the square hole" in this thread. Sohom (talk) 19:19, 6 November 2025 (UTC)
- There's a difference between broadly embracing LLMs and figuring out constructive use cases for them. ScottishFinnishRadish (talk) 17:19, 6 November 2025 (UTC)
- I guess there is no real policy precedent yet for an LLM "AI" bot. It all still falls under WP:CONTEXTBOT and normal need for consensus. I guess the closest we have is Cluebot machine learning stuff, where some false positives are deemed acceptable. But of course that's not creating content. Then again, in this case it's manually reviewed, so it's "just" assisted editing. So it's under WP:LLM (but even that hasn't moved beyond an essay). I suppose my main concern would be "who is going to verify that the editor isn't just mass-adding content without verifying?" In fact, how is the operator themselves verifying it? That has always been one of the main reasons for the bot policy to exist. No one can review all the millions of edits made by bots. And it's much more difficult when every edit is unique. When LLMs hallucinate, they're confidently wrong and it can be really hard to tell if they're wrong. But I can see how with some effort one can assemble an agent LLM that has the tools for feedback to validate its own work, i.e. searching the web or looking at its own addition "critically", and reducing errors greatly. Anyway, I'm just thinking out loud. Because this is very novel, I don't think we'll find out the broader community reaction until something like it goes live. Though I can definitely predict there will be a lot of both reasonable and reactionary division on this, so we might want to up the policy/guidelines sooner than later. I just don't know with what exactly... — HELLKNOWZ ∣ TALK 16:43, 6 November 2025 (UTC)
- I'm working my way through all this as I try to pin the proposal down to something that is supportable by community consensus. But I have thought of a way to deal with the problem of people just banging the bot's corrections in verbatim, and that's for the bot itself to re-visit articles to check for over-literal non-trivial use of the bot's output in edits. Which can then in turn be reported for review. As I said earlier, writing the bot itself is the easy bit; it's engagement with the Wikipedia ecosystem of editors, culture, processes and rules that's the hard bit. — The Anome (talk) 17:10, 6 November 2025 (UTC)
- To copy over somewhat my comment at the pump, an automatic post-to-talkpage bot has its drawbacks, and may go the way of being mostly ignored like the EL bot. Such a tool would work best as an on-request tool, especially if it's only picking out 1 issue each run. CMD (talk) 08:40, 7 November 2025 (UTC)
- An on-request review-bot would seem like a good way to go. — The Anome (talk) 17:38, 7 November 2025 (UTC)
- By the way, I've searched for "EL bot", and can't find out what this refers to. Can you tell me? It sounds like something I should know. — The Anome (talk) 10:59, 9 November 2025 (UTC)
- The Anome, it's the bot that checked whether the external links in an article were dead, updated them to archive links, and then posted a talkpage notice about fixing them. E.g. Talk:Links (series)#External links modified 2. — Qwerfjkltalk 14:07, 9 November 2025 (UTC)