User:Tiggerjay/LLM
This essay is in development. It contains the advice or opinions of one or more Wikipedia contributors. Essays may represent widespread norms or minority viewpoints. Consider these views with discretion, especially since this page is still under construction.
This is a draft based on my recent interactions with LLMs. There was a recent discussion at Wikipedia:Village pump (policy) § LLM/chatbot comments in discussions that I participated in. Here is a consolidation of my opinions, as well as some next steps from my perspective:
The Problem
The use of generative text from AI models such as ChatGPT is increasing in 2024 as these tools become more widely available for little or no cost. At the same time, there is still a significant problem with AI slop, a broad term for AI-generated output that is blatantly and obviously wrong to real people. Unfortunately, people continue to put their trust in these tools to provide accurate information; the output might sound accurate but is, in fact, incorrect. This is often seen when a response quotes policy but misapplies it, or interprets it in novel ways that are not generally used by experienced editors. However, because the argument sounds convincing to the uninformed, the new editor is misled into believing the generated statements.
Conversations between experienced editors and those using LLMs can be challenging at best, because the responses often seem out of context with the greater overarching discussion.
Identifying LLM
The following factors are commonly seen in LLM responses. No single factor is definitive proof, but the more of them a response includes, the greater the likelihood (an illustrative sketch of how such signals might be combined follows this list):
- Written using a formal letter style with an addressee, body, and closing salutation
- Excessive apologies and statements about wanting to learn and improve
- References to policy that are missing the proper wikilink, so a shortcut appears as plain text (WP:PRIMARY) rather than as a linked [[WP:PRIMARY]]
- An uncommon or novel understanding of policy
- The response appears to only address the immediately prior question and seems to lack the context of the entire conversation
- Not repeating the same statement used earlier (e.g. many real people dig in on their understanding; LLM responses tend not to)
- Perfect spelling and grammar
- Multiple paragraphs separated by double line breaks
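Purely as an illustration, and not as an endorsement of automated detection (the closing remarks below note a rough consensus against relying on automated tools), the following minimal sketch shows how signals like these could be tallied to flag a comment for closer human review. All names, patterns, and thresholds here are hypothetical assumptions, not an established tool.

```python
import re

# Hypothetical heuristics mirroring the list above; illustrative only.
SIGNALS = {
    # Formal letter style: opening addressee plus a closing salutation
    "formal_letter_style": lambda text: bool(re.match(r"(Dear|Hello|Hi)\b", text))
        and "regards" in text.lower(),
    # Excessive apology / eagerness-to-improve language
    "apology_language": lambda text: any(
        phrase in text.lower()
        for phrase in ("i apologize", "i am sorry", "eager to learn", "happy to improve")
    ),
    # Policy shortcut present as plain text rather than as a wikilink
    "unlinked_policy_shortcut": lambda text: bool(re.search(r"(?<!\[\[)\bWP:[A-Z]+", text)),
    # Multiple paragraphs separated by double line breaks
    "double_line_breaks": lambda text: text.count("\n\n") >= 2,
}

def count_signals(comment: str) -> int:
    """Count how many heuristics a comment matches.

    A higher count only suggests that closer human review is warranted;
    it is not proof of LLM generation.
    """
    return sum(1 for check in SIGNALS.values() if check(comment))

sample = (
    "Dear editors,\n\n"
    "I apologize for the confusion and am eager to learn. Per WP:PRIMARY, "
    "the source should not be used this way.\n\n"
    "Kind regards"
)
print(count_signals(sample))  # matches all four hypothetical signals
```

Even a comment that matches every signal should be handled as a WP:DUCK judgment call by an experienced editor, not acted on mechanically.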
Community Discussions (from admin closing remarks)
There is a strong consensus that comments that do not represent an actual person's thoughts are not useful in discussions. Thus, if a comment is written entirely by an LLM, it is (in principle) not appropriate. The main topic of debate was the enforceability of this principle. Opinions vary on the reliability of GPTZero, and I would say there is a rough consensus against any form of AI detection that relies solely on it or other automated tools. Overall, however, I see a consensus that it is within admins' and closers' discretion to discount, strike, or collapse obvious use of generative LLMs or similar AI technologies. This is a WP:DUCK matter, and as with other WP:DUCK matters, there is not going to be a blackletter definition of "obvious", but I think we can all agree there are some comments that could only ever be LLM-generated. As with other matters of discretion, like behavioral sockpuppetry blocks, experienced users can apply their best judgment, subject to community review.
The word "generative" is very, very important here, though. This consensus does not apply to comments where the reasoning is the editor's own, but an LLM has been used to refine their meaning. Editors who are non-fluent speakers, or have developmental or learning disabilities, are welcome to edit here as long as they can follow our policies and guidelines; this consensus should not be taken to deny them the option of using assistive technologies to improve their comments. In practice, this sets a good lower bound for obviousness, as any comment that could conceivably be LLM-assisted is, by definition, not obviously LLM-generated.
Regarding comments that are more borderline in LLM likelihood, it's worth reviewing what's already allowed by policy and guidelines: LLM-written comments will usually add little of substance to a discussion, and closers are already expected to ignore unhelpful discussions. If comments contain fabrications of fact, that is a blockable form of disruptive editing, whether the fabrication comes from a human or a hallucinating chatbot. And while WP:TPO says disruptive comments "are usually best left as-is or archived", there is a "usually" in there, so removing patently disruptive content is within editors' discretion, whether or not LLM usage is unambiguous.
My position
I think part of the nuance required here is the difference between someone using an automated tool for assistance versus true bot-like behavior. Unauthorized bot behavior is already prohibited, which should help address the concern that we mere humans cannot keep up with LLM bots. I agree that we cannot, but I don't see much of that.

I am also not inclined toward the position that "if you cannot write, you cannot contribute". I can imagine that 15 years ago some of us might have made the same statement about spelling and grammar: if you cannot spell properly without auto-correct, you have no right to edit an encyclopedia. There are a significant number of very intelligent people affected by dyslexia, Asperger's, and similar conditions who have been contributing using various technology tools for assistance. How many of us have Grammarly or something similar running in our web browser? And beyond that, tools and what they're called will continue to evolve.

I am very much against banning LLM use, largely because it can become an unnecessary witch hunt. There are people who will use the tools constructively, and those who will not. I can see some places where use should probably be banned, such as using an LLM to determine consensus in a discussion that needs closing (AfD, RM, etc.). But even in those areas, I think many of our existing policies and guidelines already address most of the actual concerns we're seeing with that activity. As long as people are held accountable for how they use the tools, who cares what the tool is called in 2000, 2020, or 2040?
Proposed Solutions (What we do about LLMs)
- Consider refining WP:BOTP so that it encapsulates LLM-type bot behavior, as well as some sort of threshold on "non-human" editing rates (perhaps as part of WP:MEATBOT)
- Make policy or guidelines very clear, as a bright line, that a user will be treated the same regardless of what tools they use, LLM or otherwise, and that disruptive editing will be handled accordingly.
- Perhaps a single-warning template to that effect, to welcome people who appear to be using an LLM, reminding them that they are responsible for their adherence to policy and that LLMs tend to get policy wrong.