Wikipedia talk:Large language models/Archive 6


Secondary sources

Should we cite any of the factual claims about LLMs on this page to reliable sources? –LaundryPizza03 (d) 23:42, 24 June 2023 (UTC)

LLMs and UNDUE

Is it worth mentioning that LLMs also have trouble with WP:DUE? When asked to summarise content, they may produce a summary which places too much weight on certain details, or tries to summarise parts of the article which would normally be left out of the lead. I've been involved in some cleanup of LLM-generated lead sections, which often seem to be overly interested in a subject's relationship history or personal affairs, and less so in their career (for which they were actually notable). Mako001 (C)  (T)  🇺🇦 00:14, 23 July 2023 (UTC)

Yes, because this is afaik an even bigger issue than inventing things out of thin air, as it is even harder to detect. -- Zache (talk) 03:05, 23 July 2023 (UTC)

Non-compliant LLM text: remove or tag?

This draft currently recommends tagging non-compliant contributions with {{AI generated}}. It should recommend removal instead.

The tagging recommendation is inconsistent with user-warning templates like {{uw-ai1}}: if LLM text were worth keeping, why would we warn the people who add it? There's no point in trying to fix non-compliant LLM text, which will have either no sources or likely made-up sources. It's better to remove it. Do LLMs write more accurate text than the deleted WP:DCGAR? I doubt it.

Let's try to keep this discussion organized: should the draft recommend removal, or tagging? Note that we're only talking about deleting raw LLM outputs added to existing articles. For deleting fully LLM-generated articles through WP:CSD, there's a current discussion elsewhere.

DFlhb (talk) 11:50, 3 June 2023 (UTC)

Friendly ping for editors who participated in a previous discussion on this: Novem Linguae, Barkeep49, Thryduulf. DFlhb (talk) 12:49, 3 June 2023 (UTC)
  • My opinion has not changed since the previous discussions - whether text is AI-generated or not is irrelevant (and impossible to determine reliably even if it were relevant). In all cases you should do with AI-generated content exactly what you would do with identical content generated by a human - i.e. ideally fix it. If you can't fix it then tag it with what needs fixing. If it can't be fixed then nominate it for deletion using the same process you would use if it was human-generated. Thryduulf (talk) 13:07, 3 June 2023 (UTC)
    Thanks, this logic is compelling. If we can't tell them apart then our response must be the same. In that case, do we need {{AI generated}}, or would the existing templates suffice? DFlhb (talk) 13:21, 3 June 2023 (UTC)
    I don't see any need for that template - it's speculation that even if correct doesn't contain anything useful that existing templates do not (and in at least most cases they also do it better). Also, even if we could reliably tell human and AI-generated content apart, our response should be identical in both cases anyway because the goal is always to produce well-written, verifiable encyclopaedic content. Thryduulf (talk) 13:36, 3 June 2023 (UTC)
  • I disagree that experienced editors can't figure out what is AI-generated and what is not. According to this template's transclusion count, it is used 108 times, which is good evidence that there are at least some folks who feel confident enough to spot AI-generated content. I definitely think that the wording of this template should recommend deletion rather than fixing. AI-generated content tends to be fluent-sounding but factually incorrect, sometimes complete with fake references. It reminds me a lot of a WP:CCI in terms of the level of effort to create the content (low) versus the level of effort to clean it up (high). Because of this ratio, I consider AI-generated content to be quite pernicious. –Novem Linguae (talk) 14:06, 3 June 2023 (UTC)
    In other words, it's a guess and the actual problem is not that it is written by an AI but that it tends to be factually incorrect. Why should AI-written content that is factually incorrect be treated differently to human-written content that is factually incorrect? Why does it matter which it is? Thryduulf (talk) 14:16, 3 June 2023 (UTC)
Because given its structure, it would take five times as much work to try to use the material (in the context of a Wikipedia article) as it would to delete and replace it. North8000 (talk) 14:15, 3 June 2023 (UTC)
A good analogy is badly written, unstructured and undocumented software: ten times fewer hours to nuke and replace it than to reverse-engineer and rebuild that herd of cats. North8000 (talk) 14:21, 3 June 2023 (UTC)
I'm not arguing that material in that state shouldn't be deleted, I'm arguing that whether it's in that state because it was written by AI or whether it is in that state because it was written by a human is irrelevant. Thryduulf (talk) 19:02, 3 June 2023 (UTC)
That's true in theory, but a big difference is that human-written content is usually presumed to be salvageable as long as the topic is notable, while AI output should be treated more like something written by a known hoaxer. –dlthewave 20:15, 3 June 2023 (UTC)
IMO the context of how it was generated is important in trying to figure out how to deal with it. For example, if you (Thryduulf) wrote "the sky is green", it would probably be worth the time to find out what you intended, e.g. whether it holds in certain contexts or at certain times, or (knowing that there must have been some reason) to check whether there are instances when the sky actually is green and build upon what you wrote. If "the sky is green" was produced by a random word generator or typed by a chimpanzee, it would be silly to waste my time on such an effort. North8000 (talk) 21:56, 3 June 2023 (UTC)
@Dlthewave and @North8000 These require you to know whether text was generated by a human or by an LLM. There is no reliable way to do this, so what you are doing is considering the totality of the text and making a judgement about whether it is worth your (or someone else's) effort to spend time on it. The process and outcome are the same regardless of whether the content is human-written or machine-written, so it's irrelevant which it is. Thryduulf (talk) 22:02, 3 June 2023 (UTC)
We're really talking about two different things. I was answering: "presuming that one knew, should it be treated differently?" You are asserting the premise that it is impossible to know, and then saying that if that premise is true, the question I answered is moot. Sincerely, North8000 (talk) 01:29, 4 June 2023 (UTC)
I was explaining why the question is irrelevant - it isn't possible to know, so there is no point presuming. However, even if we were somehow able to know, there is no reason to treat it differently, because what matters is the quality (including writing style, verifiability, etc.), not who wrote it. Thryduulf (talk) 09:28, 4 June 2023 (UTC)
Why wouldn't it be possible to know? I've seen several users make suspect contributions; when those users were asked if they had used an LLM, they admitted they had. In cases where they admit it, we know for sure; we're not presuming.
I'm not convinced that users can never tell it's an LLM. There were several cases at ANI of users successfully detecting it, including a hilariously obvious instance from AfD. What I said below seems to work: if we identify it, delete it; if we can't identify it, we do what we normally do by default. DFlhb (talk) 10:53, 4 June 2023 (UTC)
Plus, whenever the discussion gets more detailed, I think it will almost inevitably come out. For example, let's say that there is a phrase in there that makes no sense and you ask the person who put it in, "what did you mean by that?" Are they going to lie that they wrote something stupid in order to cover for the bot, or blame the bot? North8000 (talk) 13:43, 5 June 2023 (UTC)
Note that every transclusion of that template is on drafts, not articles, and those drafts are just tagged so no one wastes time working on them (because presumably MfD/CSD would fail).
I've just reviewed those drafts. They're all very blatantly promotional, and have all the hallmarks of LLM text: stilted writing, "In conclusion...". There's no identification problem there, and indeed we should delete that stuff, pointless to fix.
When it's easy to identify, we should delete, since it's basically spam. When it's not easy to identify, people won't come to this draft/policy for advice anyway, and they'll just do what they normally do. So I guess it's fine for this draft to recommend deletion for identif[ied] LLM-originated content that does not comply with our core content policies. DFlhb (talk) 15:31, 3 June 2023 (UTC)
  • My thought is that although AI-generated content should generally be removed without question, it's not always black-and-white enough to delete on sight. Just as we have Template:Copyright violation and the more extreme Template:Copyvio, there are cases where it's unclear whether or to what extent AI was used. Maybe the editor wants to wait for a second opinion, circle back to deal with it later or keep the article tagged while it's going through AfD/CSD. –dlthewave 18:22, 3 June 2023 (UTC)
Good point. In essence, that says LLM content should be removed, but normal careful processes should apply when there is a question, which is and will often be the case. North8000 (talk) 15:07, 6 June 2023 (UTC)
@Dlthewave: In my view, we should take as stern a stance on LLM as we do on SOCK. An inflexible, blanket "do not use an LLM" should be policy, because (1) LLMs have the potential to completely destroy the entire Wikipedia project by overwhelming it with massive volumes of WP:BOLLOCKS, (2) those who have to resort to LLMs most probably lack basic WP:COMPETENCE to build an encyclopedia, and (3) even good-faith LLM users would create thankless busywork for legitimate contributors who have to waste time and energy cleaning up after them (which we don't need any more of as it stands, look at the WP:AfC backlog!). If possible, as soon as a reliable LLM-detector is developed, it should be used similarly to CheckUser, and violators (who are confirmed beyond reasonable doubt) should be indef banned. Festucalextalk 20:31, 6 July 2023 (UTC)
Have any of you guys checked out jenni.ai? Supposedly it has a million users... though I don't believe that number. It has a nice feature where it will search academic literature for matches. If we implement something similar, perhaps we could let people plug in their own literature libraries... Talpedia 15:42, 24 July 2023 (UTC)

Zero-click AI-search concerns

Have you checked out perplexity.ai lately?

If you prompt it to "write 800+ words on" a given subject "with multiple headings and bullet points", it creates an article on that subject that looks a lot like a Wikipedia article.

Even general prompts (with just the subject name) return fairly detailed responses, usually with the information the user was looking for, often quoted from Wikipedia, greatly reducing the need to actually visit Wikipedia, or any other page for that matter.

My concern is that, as perplexity.ai and similar search engines gain in popularity and use, they may eventually cause a noticeable reduction in organic (i.e., human) traffic to Wikipedia, which may in turn diminish the flow of new editors, causing Wikipedia to become more and more out of date.

Meanwhile, bot (crawler) traffic to Wikipedia may continue to increase, driving the total traffic figure upwards, thereby hiding an organic traffic reduction.

I'm interested in how this issue can be tracked.

So, my question is: "How can organic traffic on Wikipedia be measured?"

I look forward to your replies.    — The Transhumanist   10:43, 2 August 2023 (UTC)

P.S.: And it now has picture support.
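
On the measurement question: one place to start, as a rough sketch, is the Wikimedia Pageviews API, which reports aggregate pageview counts broken down by agent type ("user" approximating organic traffic, "spider" and "automated" covering bots), so the two series can be tracked separately over time. A minimal Python example follows; the User-Agent string and date range are illustrative assumptions, not fixed values:

    import requests

    # Wikimedia Pageviews API: sitewide pageview totals for en.wikipedia,
    # broken down by agent type ("user" = human, "spider"/"automated" = bots).
    BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"

    def monthly_views(agent, start="2023010100", end="2023080100"):
        # Timestamps use the API's YYYYMMDDHH format.
        url = f"{BASE}/en.wikipedia/all-access/{agent}/monthly/{start}/{end}"
        # Wikimedia asks API clients to send an identifying User-Agent.
        resp = requests.get(url, headers={"User-Agent": "organic-traffic-check/0.1"})
        resp.raise_for_status()
        return {item["timestamp"]: item["views"] for item in resp.json()["items"]}

    human = monthly_views("user")
    spider = monthly_views("spider")
    for month, views in human.items():
        print(month, "human:", views, "spider:", spider.get(month))

A declining "user" series alongside a rising "spider"/"automated" series would match the pattern described above, though the API's agent classification is heuristic, so some bot traffic may still be counted as human.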