Wikipedia talk:Large language models/Archive 6
This is an archive of past discussions on Wikipedia:Large language models. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Secondary sources
Should we cite any of the factual claims about LLMs on this page to reliable sources? –LaundryPizza03 (dc̄) 23:42, 24 June 2023 (UTC)
LLMs and UNDUE
Is it worth mentioning that LLMs also have trouble with WP:DUE? When asked to summarise content, they may produce a summary which places too much weight on certain details, or tries to summarise parts of the article which would normally be left out of the lead. I've been involved in some cleanup of LLM-generated lead sections, which often seem to be overly interested in a subject's relationship history or personal affairs, and less so in their career (for which they were actually notable). Mako001 (C) (T) 🇺🇦 00:14, 23 July 2023 (UTC)
- Yes, because this is afaik an even bigger issue than inventing stuff from thin air, as it is even more complex to detect. -- Zache (talk) 03:05, 23 July 2023 (UTC)
Non-compliant LLM text: remove or tag?
This draft currently recommends tagging non-compliant contributions with {{AI generated}}. It should recommend removal instead.
The tagging recommendation is inconsistent with user warning templates like {{uw-ai1}}. If LLM text were worth keeping, then why would we warn people who add it? There's no point in trying to fix non-compliant LLM text, which will either have no sources or likely-made-up sources. It's better to remove. Do LLMs write more accurate text than the deleted WP:DCGAR? I doubt it.
Let's try to keep this discussion organized: should the draft recommend removal, or tagging? Note that we're only talking about deleting raw LLM outputs added to existing articles. For deleting fully LLM-generated articles through WP:CSD, there's a current discussion elsewhere.
DFlhb (talk) 11:50, 3 June 2023 (UTC)
- Friendly ping for editors who participated in a previous discussion on this: Novem Linguae, Barkeep49, Thryduulf. DFlhb (talk) 12:49, 3 June 2023 (UTC)
- My opinion has not changed since the previous discussions - whether text is AI-generated or not is irrelevant (and impossible to determine reliably even if it was relevant). In all cases you should do with AI-generated content exactly what you would do with identical content generated by a human - i.e. ideally fix it. If you can't fix it then tag it with what needs fixing. If it can't be fixed then nominate it for deletion using the same process you would use if it was human-generated. Thryduulf (talk) 13:07, 3 June 2023 (UTC)
- Thanks, this logic is compelling. If we can't tell them apart then our response must be the same. In that case, do we need {{AI generated}}, or would the existing templates suffice? DFlhb (talk) 13:21, 3 June 2023 (UTC)
- I don't see any need for that template - it's speculation that even if correct doesn't contain anything useful that existing templates do not (and in at least most cases they also do it better). Also, even if we could reliably tell human and AI-generated content apart, our response should be identical in both cases anyway because the goal is always to produce well-written, verifiable encyclopaedic content. Thryduulf (talk) 13:36, 3 June 2023 (UTC)
- I disagree that experienced editors can't figure out what is AI-generated and what is not. According to this template's transclusion count, it is used 108 times, which is good evidence that there are at least some folks who feel confident enough to spot AI-generated content. I definitely think that the wording of this template should recommend deletion rather than fixing. AI-generated content tends to be fluent-sounding but factually incorrect, sometimes complete with fake references. It reminds me a lot of a WP:CCI in terms of the level of effort to create the content (low) versus the level of effort to clean it up (high). Because of this ratio, I consider AI-generated content to be quite pernicious. –Novem Linguae (talk) 14:06, 3 June 2023 (UTC)
- In other words, it's a guess and the actual problem is not that it is written by an AI but that it tends to be factually incorrect. Why should AI-written content that is factually incorrect be treated differently to human-written content that is factually incorrect? Why does it matter which it is? Thryduulf (talk) 14:16, 3 June 2023 (UTC)
- Because given its structure, it would take 5 times as much work to try to use the material (in the context of a Wikipedia article) as it would to delete & replace it. North8000 (talk) 14:15, 3 June 2023 (UTC)
- A good analogy is badly written, unstructured and undocumented software. 10 times fewer hours to nuke and replace than to reverse engineer and rebuild a herd of cats. North8000 (talk) 14:21, 3 June 2023 (UTC)
- I'm not arguing that material in that state shouldn't be deleted, I'm arguing that whether it's in that state because it was written by AI or whether it is in that state because it was written by a human is irrelevant. Thryduulf (talk) 19:02, 3 June 2023 (UTC)
- That's true in theory, but a big difference is that human-written content is usually presumed to be salvageable as long as the topic is notable while AI should be treated more like something written by a known hoaxer. –dlthewave ☎ 20:15, 3 June 2023 (UTC)
- IMO the context of how it was generated is important in trying to figure out how to deal with it. For example, if you (Thryduulf) wrote "the sky is green", it would probably be worth the time to find out what you intended... e.g. maybe in certain contexts / times. Or (knowing that there must have been some reason) take time to see if there are instances when the sky actually is green and build upon what you wrote. If the "sky is green" was built by a random word generator or typed by a chimpanzee, it would be silly to waste my time on such an effort. North8000 (talk) 21:56, 3 June 2023 (UTC)
- @Dlthewave and @North8000 These require you to know whether text was generated by a human or by a LLM. There is no reliable way to do this, so what you are doing is considering the totality of the text and making a judgement about whether it is worth your (or someone else's) effort to spend time on it. The process and outcome are the same regardless of whether the content is human-written or machine-written, so it's irrelevant which it is. Thryduulf (talk) 22:02, 3 June 2023 (UTC)
- We're really talking about two different things. I was answering: "presuming that one knew, should it be treated differently?" You are asserting one premise that it is impossible to know, and then based on that saying that that if that premise is true, then the question that I answered is moot. Sincerely, North8000 (talk) 01:29, 4 June 2023 (UTC)
- I was explaining why the question is irrelevant - it isn't possible to know, so there is no point presuming. However, even if we were somehow able to know, there is no reason to treat it differently because what matters is the quality (including writing style, verifiability, etc) not who wrote it. Thryduulf (talk) 09:28, 4 June 2023 (UTC)
- Why wouldn't it be possible to know? I've seen several users make suspect contributions, those users were asked if they used an LLM, and they admitted they did; in cases where they admit it, we know for sure, we're not presuming.
- I'm not convinced that users can never tell it's an LLM. There were several cases at ANI of users successfully detecting it, including a hilariously obvious instance from AfD. What I said below seems to work: if we identify it, delete; if we can't identify, by default we do what we normally do. DFlhb (talk) 10:53, 4 June 2023 (UTC)
- Plus whenever the discussion gets more detailed I think it will almost inevitably come out. For example, let's say that there is a phrase in there that makes no sense and you ask the person who put it in "what did you mean by that?" Are they going to make up lies that they wrote something stupid in order to cover for the bot? Or blame the bot? North8000 (talk) 13:43, 5 June 2023 (UTC)
- Note that every transclusion of that template is on drafts, not articles, and those drafts are just tagged so no one wastes time working on them (because presumably MfD/CSD would fail).
- I've just reviewed those drafts. They're all very blatantly promotional, and have all the hallmarks of LLM text: stilted writing, "In conclusion...". There's no identification problem there, and indeed we should delete that stuff, pointless to fix.
- When it's easy to identify, we should delete, since it's basically spam. When it's not easy to identify, people won't come to this draft/policy for advice anyway, and they'll just do what they normally do. So I guess it's fine for this draft to recommend deletion for "identif[ied] LLM-originated content that does not to comply with our core content policies". DFlhb (talk) 15:31, 3 June 2023 (UTC)
- My thought is that although AI-generated content should generally be removed without question, it's not always black-and-white enough to delete on sight. Just as we have Template:Copyright violation and the more extreme Template:Copyvio, there are cases where it's unclear whether or to what extent AI was used. Maybe the editor wants to wait for a second opinion, circle back to deal with it later or keep the article tagged while it's going through AfD/CSD. –dlthewave ☎ 18:22, 3 June 2023 (UTC)
- Good point. In essence saying that LLM content should be removed, but that there should be normal careful processes when there is a question, which is/will be often. North8000 (talk) 15:07, 6 June 2023 (UTC)
- @Dlthewave: In my view, we should take as stern a stance on LLM as we do on SOCK. An inflexible, blanket "do not use an LLM" should be policy, because (1) LLMs have the potential to completely destroy the entire Wikipedia project by overwhelming it with massive volumes of WP:BOLLOCKS, (2) those who have to resort to LLMs most probably lack basic WP:COMPETENCE to build an encyclopedia, and (3) even good-faith LLM users would create thankless busywork for legitimate contributors who have to waste time and energy cleaning up after them (which we don't need any more of as it stands, look at the WP:AfC backlog!). If possible, as soon as a reliable LLM-detector is developed, it should be used similarly to CheckUser, and violators (who are confirmed beyond reasonable doubt) should be indef banned. 〜 Festucalex • talk 20:31, 6 July 2023 (UTC)
- Have any of you guys checked out jenni.ai? Supposedly it has a million users... though I don't believe that number. It has a nice feature where it will search the academic literature for matches. If we implement something similar perhaps we could let people plug in their own literature libraries... Talpedia 15:42, 24 July 2023 (UTC)
Zero-click AI-search concerns
Have you checked out perplexity.ai lately?
If you prompt it to "write 800+ words on" whatever "with multiple headings and bullet points", it creates an article on whatever, that looks a lot like a Wikipedia article.
Even general prompts (with just the subject name) return fairly detailed responses, usually with the information the user was looking for, often quoted from Wikipedia, greatly reducing the need to actually visit Wikipedia, or any other page for that matter.
My concern is that, as perplexity.ai and similar search engines gain in popularity and use, they may eventually cause a noticeable reduction in organic (i.e., human) traffic to Wikipedia, which may in turn diminish the flow of new editors, causing Wikipedia to become more and more out of date.
Meanwhile, bot (crawler) traffic to Wikipedia may continue to increase, driving the total traffic figure upwards, thereby hiding an organic traffic reduction.
I'm interested in how this issue can be tracked.
So, my question is: "How can organic traffic on Wikipedia be measured?"
I look forward to your replies. — The Transhumanist 10:43, 2 August 2023 (UTC)
P.S.: And it now has picture support.
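For reference, the Wikimedia Pageviews REST API already separates human from crawler traffic via its agent parameter (user, spider, automated), so a rough organic-traffic trend can be pulled directly. Below is a minimal sketch in Python; the endpoint and parameters come from the public API, but the script itself, including the date range, is only illustrative. Note that the agent split relies on user-agent detection, so undeclared bots would still be counted as "user" traffic.
```python
# Sketch: compare "user" (organic) vs "spider" (crawler) pageviews for English Wikipedia.
# Uses the public Wikimedia Pageviews REST API; dates and names are illustrative only.
import requests

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"
HEADERS = {"User-Agent": "organic-traffic-check/0.1 (example script)"}

def monthly_views(agent, start="2023010100", end="2023080100"):
    """Return monthly view counts for the given agent type ('user', 'spider', 'automated')."""
    url = f"{API}/en.wikipedia.org/all-access/{agent}/monthly/{start}/{end}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return {item["timestamp"]: item["views"] for item in resp.json()["items"]}

if __name__ == "__main__":
    human = monthly_views("user")
    crawler = monthly_views("spider")
    for month in sorted(human):
        total = human[month] + crawler.get(month, 0)
        share = human[month] / total if total else 0
        print(f"{month}: human={human[month]:,} crawler={crawler.get(month, 0):,} "
              f"organic share={share:.1%}")
```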
Citogenesis concerns
One thing the proposal should probably mention is that most LLMs were trained on Wikipedia; this means that using them (even just running things past them for verification) risks WP:CITOGENESIS issues. -- Aquillion (talk) 15:36, 30 July 2023 (UTC)
- Absolutely. This is the thing that should be avoided. Kirill C1 (talk) 09:28, 2 August 2023 (UTC)
- And future LLMs and new editions of LLMs will be trained on Wikipedia, a Wikipedia that has in part been edited by LLMs, thus creating a feedback loop. That is, LLMs will be trained on their own edits, which could amplify errors and bias, and compound citogenesis. — The Transhumanist 10:43, 2 August 2023 (UTC)
- I had included this in the draft at some point but it was removed for whatever reason.—Alalch E. 22:31, 14 August 2023 (UTC)
perplexity.ai revisited
It's been a while since we've gone over the capabilities of perplexity.ai.
It has been improving rapidly.
Pulling its responses mostly from the web pages in its search results, it can:
- Prevent most chatbot "hallucinations"
- Answer in natural language prose
- Write source code
- Write MediaWiki wiki text "in a code block" (when explicitly requested)
- Do calculations
- Conduct comparisons
- Answer in your preferred language
- Make lists
- Produce tables (rendered or not)
- Expand prose that you provide
- Summarize:
- Works
- Books
- Plays
- Movies
- Specific web pages
- News articles
- News coverage by a particular newspaper for a specific time frame
- Current events (such as the Ukraine War)
- Write from a particular viewpoint or style, as a:
- Blog entry
- News article
- Encyclopedia article
- Documentary script
- Commercial or ad
- Poem
- Resume
- Particular person, like "Albert Einstein" (though, sometimes it refuses, depending on how you word the prompt)
- Etc.
- Note: Each style/viewpoint request represents a different enough prompt that you get more detail on a topic the more styles/viewpoints you request via a sequence of prompts.
- And it now has picture support
In addition to processing search results in the above manners, it can access the general capabilities of its underlying LLMs and other tools, to process data that you provide it in your prompt. Such as:
- Convert a format
- Copy edit prose
- Translate a passage
- Etc.
As it answers in natural language, using it is more like reading web pages than conventional search engine results pages, and the better you get at using it, the longer you can go without dipping into an actual web page.
They recently got an influx of 25 million dollars, and have expanded their team, who are detailed on the site. This has accelerated the rate at which the tool is increasing in capability.
Some improvements just become available, without any announcements. For example, the maximum sizes of prompts and of results have been going up. Results were limited to around 250 words, then went up to around 400, and are now over 800.
The answer buffer is actually larger than the maximum response it can present, so you can ask it to "tell me the rest", and it will continue where it left off.
After each response, potential follow-up questions are provided via multiple choice, sometimes about things that may not have occurred to you.
It also remembers previous prompts and has access to its previous responses (in the current session only), and therefore can discern the context of your subsequent prompts and follow your instructions on what to do with previous results, like redisplay them further refined per your instructions.
All in all, this is a general purpose web browsing and summarization tool, that is more powerful than conventional browsing and searching, because it zeroes in on exactly what you request it to and pulls that data from web pages for you to digest directly.
Essentially, it is a textual genie that obeys your commands and fulfills your wishes, as long as it can do so in the form of text and available images.
What it can do is limited mostly by your limitations in dreaming things up for it to do. It may surprise you.
On the darker side, sometimes you may inadvertently prompt it to argue with you! When that happens, create a new session. ;)
I hope this explanation of the tool has helped.
Sincerely, — The Transhumanist 12:01, 2 August 2023 (UTC)
Radical proposition on ban of LLM
I propose an outright ban of Large language models. Are they reliable? No. Can they perform a simple task like summarising a plot correctly? No. I emphasise, correctly. How can we distinguish the users who have good knowledge of them and can handle them from those who can't? The rule will only work if it unambiguously prohibits the use of such tools, which, let's be honest, are incompatible with Wikipedia rules. Kirill C1 (talk) 11:03, 25 July 2023 (UTC)
- @Кирилл С1: Although I want LLMs to be able to be used to improve the encyclopaedia, I too find myself often thinking that they would be better off prohibited. As has been pointed out by others, LLMs give the most plausible answer, not necessarily the most accurate. It's only when they fail to be plausible that the errors get detected. I still don't quite know if I'd support an outright LLM ban, since it would remain hard to enforce, but then again, the only other option is an evil bit-type solution. Mako001 (C) (T) 🇺🇦 07:07, 30 July 2023 (UTC)
- We already had various similar discussions in the past here. One problem with a general ban is that it excludes way too much. For example, some popular spellcheckers like Grammarly are based on LLMs. And autocompletion functions while typing can be based on LLMs. Autocompletion functions for single words are very common for mobile users when entering text. LLMs have many applications and these are just a few examples. I assume it was not your intention to make a general ban in this sense. LLM technology is also more and more implemented into regular word processing software, like Microsoft Office.
- Many of the formulations in our current draft reflect exactly this issue: it currently only bans certain types of uses, like "Do not publish content on Wikipedia obtained by asking LLMs to write original content or generate references". This is probably the more fruitful approach.
- As a side-note: this draft is a draft of a policy. Policies reflect a very wide consensus among editors and are not considered "radical", as the title of your post states. Phlsph7 (talk) 13:44, 30 July 2023 (UTC)
- This has been brought up several times and did not garner support. Also, how would "large" language model be defined? Large language models are not ok, but small ones are ok (seems backward)? Is OCR assisted with a language model acceptable? Are predictive text, word completion, autocomplete, grammar checking etc. permissible?
- Seems like an almost neo-luddite reaction to something that is just unavoidable. —DIYeditor (talk) 16:36, 30 July 2023 (UTC)
- @Кирилл С1: I'm trying to collect examples of poor summarization by LLMs, and have a different experience with them than yours. Can you please give me some examples of failed summaries you've encountered? Sandizer (talk) 16:42, 17 August 2023 (UTC)
I'm pretty strongly opposed to LLM use in most if not all cases, but an outright ban is a non-starter for several reasons. First, previous discussions have shown that it's unlikely to achieve consensus. Second, such a rule would be challenged outright every time an editor comes up with a Great Idea that they think can be accomplished with LLMs. If we ban all use by default but also have a process where specific uses can be approved, we'll be protected from misuse but folks will also have a path they can follow if they think they can use the technology productively. Perhaps we could start out allowing spelling/grammar checkers and the like. –dlthewave ☎ 15:21, 30 July 2023 (UTC)
Based on discussions seen on the village pumps and other locations, I think there is a possibility of reaching a consensus on disallowing the use of programs to generate text, including text generated based on human prompts, thus disallowing copy-editing existing text. I agree that a ban on a specific technology is unlikely due to its many uses, but I also think focusing on technology is too limiting. I think the base principle of having text essentially written by human authors is what many editors will support. isaacl (talk) 16:11, 30 July 2023 (UTC)
- I agree that having a general ban on a technology with a variety of uses is not a good idea. Regarding your suggestion: I assume you want to allow LLM usage for spellcheckers. Allowing this while banning copyediting will be a difficult line to draw. Grammarly also includes some basic copyediting functions and would, presumably, also be banned in this case. Phlsph7 (talk) 14:02, 2 August 2023 (UTC)
- As I discussed at Wikipedia talk:Large language models/Archive 5 § Focus on types of uses, I consider spellcheckers and grammar checkers to be analysis tools/features, rather than text generators. Yes, with more and more software integrating text generation features (such as the ones integrated with Microsoft Word), text generated by these features would be prohibited under this principle. isaacl (talk) 14:14, 2 August 2023 (UTC)
US copyright law in the news
There was a recent ruling about AI-generated images in the US. Here are some of the news articles:
- https://www.jpost.com/business-and-innovation/all-news/article-755483
- https://www.theverge.com/2023/8/19/23838458/ai-generated-art-no-copyright-district-court
- https://www.hollywoodreporter.com/business/business-news/ai-works-not-copyrightable-studios-1235570316/
WhatamIdoing (talk) 19:39, 20 August 2023 (UTC)
- "absent any guiding human hand" is an insult to the hundreds of thousands of artists whose work was used to train the algorithm. On the other hand I don't want the copyright owned by the non-artist data engineer/compilation manager. Sandizer (talk) 01:31, 21 August 2023 (UTC)
- I think the point is that there is no human guiding the AI to decide which artists' work to emulate, or how to go about emulating them. (I don't know enough about this particular AI system to know what kinds of datasets it uses.) WhatamIdoing (talk) 03:59, 21 August 2023 (UTC)
Promoting to policy
Since the discussion has largely died down, I think it would be best to hold an RfC to see if it is ready to be promoted into policy. What do y'all think? Ca talk to me! 12:17, 26 August 2023 (UTC)
- I like that idea! Llightex (talk) 14:00, 26 August 2023 (UTC)
Problems with basic guidelines 5 & 8
For the basic guidelines, I think we need to change the following points:
- 5. You must denote that an LLM was used in the edit summary.
- 8. Do not use LLMs to write your talk page or edit summary comments.
The reason is the following: mobile users often use autocompletion features, which are usually enabled by default. Autocompletion features are sometimes based on LLMs. The two guidelines would mean that the affected mobile users would have to declare LLM-use in almost every edit and would not be able to write edit summaries or post comments on talk pages. It would basically keep them from any editing since they can't even write the edit summaries to declare their LLM use. Phlsph7 (talk) 08:41, 28 August 2023 (UTC)
- As previously discussed, this is why I think focusing on technology is the wrong approach. Users have no idea about the specific technology being used by their spellcheck/word suggestion tools, and no one objects to these tools being used to assist editing. I think it would be more effective for any new guideline or policy to address specific use cases for which truly new guidance is required, versus just repeating sections of other guidelines or policies. Providing additional guidance for other guidelines or policies in the context of specific situations can be provided with explanatory essays. isaacl (talk) 16:24, 28 August 2023 (UTC)
- That's an important observation that users are often not aware of the underlying technology they are using. Focusing on specific use cases could solve that problem. But it could be difficult to provide general rules this way, like our basic guidelines. Phlsph7 (talk) 07:20, 29 August 2023 (UTC)
- I think it will be easier to describe general principles when looking at uses, rather than technology. For example, similar to your changes to the nutshell, there could be consensus that programs should not be used to generate text submitted to Wikipedia. isaacl (talk) 16:18, 29 August 2023 (UTC)
- I don't think this is an either-or decision: we can focus on both technology and uses. One quick fix for the problem at hand that implements your idea would be to slightly restrict what we mean by LLM for the purpose of this draft. Currently, it contains the passage
LLMs power many applications, such as AI chatbots and AI search engines. They are used for a growing number of features in common applications, such as word processors, spreadsheets, etc. In this policy, the terms "LLM" and "LLM output" refer to all such programs and applications and their outputs.
We could change it to something like: "While LLMs power applications with many different functions, this policy covers primarily the use of chatbots and similar external tools used to create and alter text."
- I'm not sure if "chatbots and similar external tools used to create and alter text" is the best formulation to characterize those tools. The term "external" is meant to exclude applications running in the background without the user knowing it. Maybe a footnote could be added. The formulation is intentionally vague and reflects our own ignorance of what those present and future tools might be. It would cover ChatGPT while at the same time excluding mobile autocompletion features.
- I don't know what the prospects of a "technology-free" version of this draft would be since it currently focuses a lot on LLMs. It would probably require extensive revisions to most parts. Phlsph7 (talk) 17:13, 29 August 2023 (UTC)
- I feel the technology is irrelevant to what many editors and readers want: text that is essentially written by humans, not programs. It doesn't matter how any programs being used to assist are coded. isaacl (talk) 17:16, 29 August 2023 (UTC)
- One issue would be that many of the problems discussed here concern primarily LLMs, like hallucinations. Another issue is practical: it would be a lot of work to implement this idea. We might have to start a new draft from scratch. Phlsph7 (talk) 17:26, 29 August 2023 (UTC)
- Yeah, this is my deal, basically. jp×g 05:50, 1 September 2023 (UTC)
- Also: it could well be that we encounter similar problems when trying to describe exactly which uses are ok and which ones aren't. Phlsph7 (talk) 17:28, 29 August 2023 (UTC)
- It doesn't matter why programs may create incorrect facts, or the technology that leads to it. Specific explanations putting matters into context for a specific technology can still exist in explanatory essays. Yes, changes along these lines would require, as previously discussed multiple times, stripping down this proposal to a more barebones version. isaacl (talk) 17:32, 29 August 2023 (UTC)
- re "text that is essentially written by humans, not programs.": Wikipedia:Bots have been a thing since essentially forever. The same goes for tools that assist with editing and patrolling that are human supervised Wikipedia:Tools/Editing_tools. I think the sane thing is that LLMs should be human-supervised for now. --Kim Bruning (talk) 23:09, 29 August 2023 (UTC) though, seeing the progress we've seen in the past year, who knows what the technology will be capable of next year.
- Bots perform specific copy edits; they don't generate new text. If, though, there were a consensus that new text written by programs was acceptable, then again I think a policy or guideline should state this general principle, without referring to the underlying technology, which is subject to rapid change. isaacl (talk) 00:44, 30 August 2023 (UTC)
- This draft is focused on LLMs and even gets its name from them. For a draft that focuses primarily on different forms of uses without particular regard to the underlying technology, it would probably be best to start something new rather than to try to adjust this one. Phlsph7 (talk) 17:26, 30 August 2023 (UTC)
- I'm not suggesting that this page shouldn't exist. I'm only discussing why I feel it would be better to have a policy or guideline based on more general principles, rather than anchoring it to a specific technology, and thus don't support having this page as a policy or guideline. isaacl (talk) 17:31, 30 August 2023 (UTC)
How to deal with bogus AI citations
A colleague at the place where I work pointed me toward a citation in the "Pileated woodpecker" article which seems to be bogus ("Woodpecker excavations promote tree decay and carbon storage in an old forest"); this citation was added by Filippetr2 back in March. After doing a quick search for it and realizing that the DOI was misassigned and the title did not yield any hits, I removed it. We suspect this was an AI-generated citation. What is the policy for dealing with this sort of thing?--Gen. Quon[Talk] 14:08, 28 August 2023 (UTC)
- At least for citations using cite tags with a DOI or ISBN, we could probably have a bot that checks that those are valid and that the other parts of the cite matches the information the DOI / ISBN points to. Obviously a degree of fuzziness would be needed, and it couldn't just remove them because of that, but it could flag or tag cites with clear-cut issues for review (ie. totally invalid DOI / ISBN, or the data that those point to don't match at all.) That would help with this and would be nice-to-have in general; AI-generated citations would tend to use that format and wouldn't have valid values. Of course, it wouldn't help against a deliberate attacker (who would just remove the DOI / ISBN) but it would help catch people who just don't know any better or other casual problems. --Aquillion (talk) 18:39, 28 August 2023 (UTC)
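A minimal sketch of the kind of check described above, using the public Crossref REST API to resolve a DOI and fuzzily compare its registered title against the cited title. The function names, similarity threshold, and example DOI are illustrative only, not part of any existing bot; as noted above, such a script could only flag suspect citations for human review, not remove them.
```python
# Sketch: flag citations whose DOI does not resolve, or whose registered title
# doesn't resemble the cited title. Threshold and helper names are illustrative.
from difflib import SequenceMatcher
import requests

def crossref_title(doi):
    """Return the title registered for a DOI at Crossref, or None if the DOI doesn't resolve."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    if resp.status_code != 200:
        return None
    titles = resp.json()["message"].get("title", [])
    return titles[0] if titles else None

def check_citation(doi, cited_title, threshold=0.6):
    """Return a short verdict: ok, invalid DOI, or title mismatch (for human review)."""
    registered = crossref_title(doi)
    if registered is None:
        return "invalid DOI -- flag for review"
    similarity = SequenceMatcher(None, cited_title.lower(), registered.lower()).ratio()
    if similarity < threshold:
        return f"title mismatch (similarity {similarity:.2f}) -- flag for review"
    return "ok"

# Illustrative example: a made-up DOI with the bogus title mentioned above would be flagged.
print(check_citation("10.1000/example-doi",
                     "Woodpecker excavations promote tree decay and carbon storage in an old forest"))
```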
Policy versus this
It was said a few months ago -- I don't remember if by me or someone else, but I do remember agreeing with it -- that making an RfC for this whole thing to be made a policy was probably going to be a difficult process and its adoption was unlikely, since this page is extremely long and contains a lot of extremely detailed stuff. That is to say, there are about a dozen separate things that could each be subject to their own entire giant page-filling multiple-option RfC. It doesn't seem to me like many people are just going to agree to all of them being instituted as-is with no modification. jp×g 19:48, 29 August 2023 (UTC)
- Perhaps you are referring to Wikipedia talk:Large language models/Archive 5 § Circling back to getting this into a presentable state, or the earlier discussion, Wikipedia talk:Large language models/Archive 4 § Major trim. For better or worse, only a few of the participants on this page have shown an interest in having a slimmer policy. isaacl (talk) 00:56, 30 August 2023 (UTC)
Adding content when one does not know where it comes from.
Adding unverified content generated by a LLM is basically adding content when one does not know where it came from. When we add content from personal knowledge or personal research we take personal responsibility for adding that content, whether it is true, verifiable, misleading, made up or whatever, and the existing guidance and policy covers that. LLMs are just another source of text which may be verifiable, nonsense, misleading, or even occasionally true. The biggest difference is that LLMs are prolific compared to normal humans.

The existing Wikipedia policies and guidance are generally effective for content added without having to specify how the editor got the content, because the editor is responsible for their edits. We accept that there is variability in interpretation, some people misunderstand or misrepresent the sources, sometimes the effort to avoid copyright infringement or condense the content distorts the information. Sometimes we just make mistakes. Professional writers also have these problems, and hope that their editors will find the mistakes. We rely on our fellow Wikipedians to edit our work. These are things that make good content creation difficult, and why not everyone is suited to content creation. We deal with it. LLMs just make the same problems more common and on a potentially larger scale.

Competence is required. Wikipedia may be the encyclopedia that anyone can edit, but only as long as they follow the rules and are acceptably competent in the type of work they choose to do, and are able to learn and adapt to the environment, and develop into useful members of the community. If they fail to comply with the terms of use or do useful work, they get thrown out. Using content generated by an LLM without checking the product first is like putting a gun to your own head without checking if it is loaded. It is conclusive evidence of incompetence. Cheers, · · · Peter Southwood (talk): 05:47, 31 August 2023 (UTC)
- I like that paragraph and think it is true. jp×g 02:04, 2 September 2023 (UTC)
- I agree that this is an excellent argument against allowing such content at all, and that it does fall under WP:CIR. These programs, and that is all they are, are not competent to write an encyclopedia based on verifiable information. That should be our policy. The fact that people will try it anyway is an invalid argument, as that would also apply to policies like WP:SOCK and WP:PAID. Beeblebrox (talk) 20:23, 6 September 2023 (UTC)