Wikipedia talk:Large language models
Relevant ongoing/recent discussions (January 28, 2023):
I like this draft!
I like this draft! It's thoughtful and strikes a good balance. Great work. Llightex (talk) 03:39, 14 December 2022 (UTC)
Feedback / Notes
- I think this is a good starting draft. In addition to this text, in very concise terms, perhaps as a box of some form -- the dos and don'ts should be mentioned. I see some text at User:JPxG/LLM_demonstration that can be pulled-in imo. Ktin (talk) 03:55, 16 January 2023 (UTC)
- Also, I think adding a tag of some form in the talk page indicating that the page or a subset of the page was created using AI-generated output from LLMs should be considered. Ktin (talk) 20:15, 16 January 2023 (UTC)
- See below. The T&C for OpenAI imply to me (IANAL) that each edit that introduces LLM content will require an edit summary saying it has done so. Mike Turnbull (talk) 11:48, 17 January 2023 (UTC)
- Yes, at the minimum we should be doing that. But, in addition, in the talk page we should be adding a template saying that a significant chunk (threshold to be decided) of the article has been generated by an AI. Today, if there are articles generated during certain edit-a-thons or through the Wikischools project we already add a template to the talk page. We should do something similar here as well. Ktin (talk) 19:39, 21 January 2023 (UTC)
- Wouldn't it be nice if we keep track of the model(version)/prompt/(seed) of the generated text as a reference? This could generate a curated set of facts. — Preceding unsigned comment added by Derpfake (talk • contribs) 21:34, 8 February 2023 (UTC)
- @Derpfake Many of these models are deliberately set up so that some prompts give different answers if used repeatedly: they have an element of randomness with respect to a given seed text. In that respect, they behave differently from most algorithms and there is no "set of facts" one could rely on. Mike Turnbull (talk) 21:56, 8 February 2023 (UTC)
- That is why this seed is part of the reference. And that is why I call it curated. Underneath there does not need to be a consistent body of knowledge, but the output itself might be worth citing (like with any human ;)) Derpfake (talk) 22:04, 8 February 2023 (UTC)
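For illustration, if Derpfake's idea of recording provenance were adopted, a footnote along these lines could carry that information (a minimal sketch only; the model name, settings, and wording here are hypothetical, not an agreed format):
Some LLM-assisted text.<ref>Generated with the hypothetical model "ExampleLM-1" (version 2023-01, temperature 0.7, seed 12345) from the prompt "Summarize the sourced body of this article in three sentences"; output reviewed and edited by the contributing editor before saving.</ref>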
Copyrights
The "Copyrights" section, as currently written, provides no useful, or at least no usable, information. What exactly is the copyright status of texts produced by LLMs? Do there exist sources treating this question, and if yes, is there a reasonable degree of consensus among them? Nsk92 (talk) 18:12, 16 January 2023 (UTC)
- @Nsk92, I believe they are fine to use, i.e. the person who gave the prompt holds copyright, unless of course the LLM is regurgitating material from elsewhere. — Qwerfjkltalk 18:21, 16 January 2023 (UTC)
- What is your opinion based on? I did a little bit of google searching and the question appears to be murky. E.g. this article from Forbes[1] quotes a lawyer, Margaret Esquenet, saying: "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training." Nsk92 (talk) 20:08, 16 January 2023 (UTC)
- See c:Template:PD-algorithm; that template's perspective is that it is PD in the US, where the servers are hosted. 🐶 EpicPupper (he/him | talk) 02:55, 17 January 2023 (UTC)
- The OpenAI FAQ at this URL says "OpenAI will not claim copyright over content generated by the API for you or your end users. Please see our Terms of Use for additional details." These say "As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output." The T&C's only other reference to copyright is to provide guidance for anyone who believes that their existing copyright has been infringed by the output generated. It is also relevant that the T&C say "You may not...(v) represent that output from the Services was human-generated when it is not;" That means, I think, that any Wikipedia editor who has used an LLM to generate content for an edit must include an edit summary saying that they have done so. That's going to stretch our WP:AGF guidance, I think. Mike Turnbull (talk) 11:43, 17 January 2023 (UTC)
- Thanks, Alalch_E. for adding this stipulation to the specific guidelines. Mike Turnbull (talk) 18:01, 18 January 2023 (UTC)
I have added a section on the relationship between LLMs and copyright, adapted from posts I've made elsewhere. I think the key points are that, while it is not the case that all LLM output is copyrighted, the potential for unfree content exists. Like with human writers, if you ask them (intentionally or unintentionally) to write copyrighted content, they are capable of doing so, and this is something to watch out for. Some more specific guidance should be given, although I am busy today and cannot be arsed to write it right now. jp×g 21:38, 22 January 2023 (UTC)
- Just noting here (maybe you didn't see): I had also added something about copyright today: Special:Diff/1135127144 —Alalch E. 21:46, 22 January 2023 (UTC)
- If the legal situation is unclear due to lack of precedent, then it is impossible to say whether LLM-derived text can be in compliance with our basic policies or not. That seems rather undesirable, to put it mildly. XOR'easter (talk) 18:19, 23 January 2023 (UTC)
Devin Stone ("LegalEagle" on YouTube) released a video on this topic, but the only clear answer is on whether or not a computer program can hold a copyright (US courts have ruled that a human must hold copyright). The answers to (a) does the output have sufficient creativity to qualify for copyright? and (b) is the output a derivative work of the training text? remain unclear until there are court cases or new laws are passed to establish guidance. (The video also discusses the legal issues of using copyrighted training text.) isaacl (talk) 17:55, 27 January 2023 (UTC)
@Nsk92: A lot of the issues were covered in the Village Pump (policy) discussion on chatbots. Here's that section again, for convenience, as the discussion will likely get archived and removed from VPP soon. As to what the copyright status of Chat-GPT output is, well, that was inconclusive. — The Transhumanist 09:23, 30 January 2023 (UTC)
- @The Transhumanist: Can you just link to it (and update the link to the VPP archive when it's moved)? 24,000 bytes of text is a gigantic amount to paste into here. jp×g 09:58, 30 January 2023 (UTC)
- No problem. See below... — The Transhumanist 10:21, 30 January 2023 (UTC)
Copyrights discussion concerning chatbot output, at Village pump (policy)
This subject has been discussed extensively at Wikipedia:Village pump (policy)#Copyright status, identifying many issues. — The Transhumanist 10:13, 30 January 2023 (UTC)
The policy draft should provide full disclosure concerning copyrights and an editor's assertion when pressing "Publish page" or "Publish changes"
The policy draft appears to advocate the addition of chatbot output to Wikipedia:
LLM output should be used only by competent editors who review pasted LLM output before pressing "publish page".
This appears to include generated text (the main use of chatbots). But, we don't know what the inherent copyrights of such output are. Nobody does, yet.
Only the owners of a creative work can place it under open license. When editors edit Wikipedia and press the "Publish page" or "Publish changes" button, they are in essence asserting that they own the copyrights to the material they have typed in or inserted (except for content that falls under fair use) and agree to place it under open license. But, in the case of chatbot output, they don't know if they own the copyrights or not, because the legal issue is up in the air. Can they legally license material that they don't even know that they own?
And since nobody knows if they own the output, should we be making a policy that directs, or appears to direct, editors to add material generated by a chatbot?
I'm for going for it, and having the policy direct exactly that with full disclosure.
The problem is that the policy doesn't make it plainly clear that the copyrights of chatbot-generated text are uncertain, and that when you press the "Publish page" or "Publish changes" button, you are declaring that you have the right to place the material under open license.
Some editors will be comfortable with doing that, and others won't. But, they should all understand the issue before pressing "Publish page" or "Publish changes".
And that goes for the community as well. Because, the draft will eventually be going before the community for approval, and it should be very clear, with no hidden or obfuscated issues. — The Transhumanist 09:31, 30 January 2023 (UTC)
- I disagree that anything is obfuscated. The draft makes clear the following: Does LLM output inherently violate copyright law -- no (that is what is asserted, see the example with Apple and iPhone photos); Is attribution to the LLM provider needed -- unlikely; Is LLM output capable of violating copyright law -- yes. This is more clear than saying "it's uncertain". —Alalch E. 10:19, 30 January 2023 (UTC)
- @Alalch E.: If an issue isn't in there that should be, then it is a hidden issue. The policy does not yet address the ownership of chatbot output copyrights and the assertion editors make when they press the "Publish page" or "Publish changes" button. Do editors have the right to place chatbot output under open license? Because, that's what they are asserting when they press the button. It should be made clear that that isn't clear. — The Transhumanist 10:58, 30 January 2023 (UTC)
The problem seems to be that the copyright status is currently not clear. This is problematic for our policy because we can neither ban nor endorse it based on copyright grounds. One solution would be to put the responsibility on the editor, maybe by using something like the following text:
By adding the text to Wikipedia, it is published under the Creative Commons license and the GNU Free Documentation License. It is the editor's responsibility to familiarize themself both with the copyright and sharing policies of their AI-provider as well as the general legislation on the copyright status of AI-generated texts to ensure that their addition does not infringe anyone's copyrights. Any text found to be in violation of someone's copyright will be removed.
At this stage, making big contributions based on AI-generated texts is a little like playing with fire. Depending on how these issues are eventually resolved, many contributions may have to be undone later. Phlsph7 (talk) 11:11, 30 January 2023 (UTC)
- If the machine outputted a verbatim copy or a close paraphrase of a non-free text, or created a derivative work such as an abridgement, then just publishing such content infringes copyrights. If the machine outputted something which cannot infringe on anyone's copyright, then it's something that no one is currently able to assert copyright to, neither the Wikipedia "author" nor the entity which owns the model; but they can try to require attribution, which they can't enforce legally, only de facto by controlling access within their domain. But if the Wikipedia editor substantially originated the content in the sense that they had an idea about what they want to create, and assisted themselves with the machine, adjusting and reviewing the output to fit their idea, they probably own the rights. So either no one owns the rights or the editor owns the rights (of all the parts which are not a copyright violation). —Alalch E. 11:36, 30 January 2023 (UTC)
- You seem to be saying that no one owns the copyright, the editor owns the copyright, or someone else does. This statement seems uncontroversial to me. Do you think we should warn editors of the third possibility? Phlsph7 (talk) 12:33, 30 January 2023 (UTC)
- This is currently on the page: "If you want to import text that you have found elsewhere or that you have co-authored with others (including LLMs), you can only do so if it is available under terms that are compatible with the CC BY-SA license. ... Apart from the a possibility that saving an LLM output may cause verbatim non-free content to be carried over to the article, these models can produce derivative works. For example, an LLM can rephrase a copyrighted text using fewer, the same, or more words than the original – editors should mind the distinction between a summary and an abridgement." I'm unable to make up my mind at the moment about whether adopting your text would make it better. —Alalch E. 12:43, 30 January 2023 (UTC)
- Thanks for quoting the passage. Unfortunately, it is not very explicit on what all of this means in terms of copyright violations, for example, in relation to derivative works or to minding "the distinction between a summary and an abridgement". In its current form, some parts of the copyright section read more like an essay than like a policy. The main point of my suggestion is relatively simple: put the responsibility on the editor and make it clear to them that this is a complex issue and there could be problems. Phlsph7 (talk) 13:41, 30 January 2023 (UTC)
- Regarding the Wikipedia community, it's the transparency of the "playing with fire" issue that concerns me. The policy draft should be clear on this risk to Wikipedia, so that when it goes before the community for their approval, they will be able to take that risk into consideration in making their decision. As Phlsph7 put it: "At this stage, making big contributions based on AI-generated texts is a little like playing with fire. Depending on how these issues are eventually resolved, many contributions may have to be undone later." — The Transhumanist 02:41, 31 January 2023 (UTC)
- It seems to me that The Transhumanist and I are on the same page that the policy should be more explicit on potential copyright problems. The current section on copyright starts with "Publishing LLM output on Wikipedia does not inherently violate copyright law". I take this to mean that: "not every single LLM output is a copyright violation". This seems correct, but the focus is clearly wrong. We should not reassure editors that some outputs do not violate copyright but warn them that some outputs may violate copyright and that it can be difficult to assess. I suggest the following as a replacement for our current copyright section:
AI-generated texts are a rapidly evolving field and it is not yet fully understood whether their copyright status is compatible with the CC BY-SA license and the GNU license used for text published on Wikipedia. Editors should use extreme caution when adding significant portions of AI-generated texts, either verbatim or user-revised. It is their responsibility to ensure that their addition does not infringe anyone's copyrights. They have to familiarize themselves both with the copyright and sharing policies of their AI-provider as well as the general legislation on the copyright status of AI-generated texts. Any addition found to be in violation of someone's copyright will be removed.
- Such a warning seems to be in tune with other points raised in the current draft, which prohibits copy-pasting LLM output directly and warns against adding copy-edited texts. Phlsph7 (talk) 07:49, 31 January 2023 (UTC)
Potential widespread copyright violations and ongoing lawsuits
- @Alalch E.: In response to this edit: I don't think it's clear at this stage that LLM outputs are public domain.
- From [2]:
There’s no issue around personal use of ChatGPT as a conversational assistant. And the rules around using ChatGPT to generate term papers seem pretty clear (don’t even think about it). But when it comes to applying AI-generated prose in content intended for wider distribution — say marketing materials, white papers, or even articles — the legalities get a little murky. When it comes to intellectual property, the model for ChatGPT “is trained on a corpus of created works and it is still unclear what the legal precedent may be for reuse of this content, if it was derived from the intellectual property of others,” according to Bern Elliot, analyst at Gartner.
- Or from [3]:
According to the ICML, the rise of publicly accessible AI language models like ChatGPT — a general purpose AI chatbot that launched on the web last November — represents an “exciting” development that nevertheless comes with “unanticipated consequences [and] unanswered questions.” The ICML says these include questions about who owns the output of such systems (they are trained on public data, which is usually collected without consent and sometimes regurgitate this information verbatim) and whether text and images generated by AI should be “considered novel or mere derivatives of existing work.”
- See also Wikipedia:Village_pump_(policy)#Copyright_status and the ongoing lawsuits mentioned here. Because of these uncertainties, I think it would be a good idea to mention that it's not yet clear what the copyright status is and whether it's compatible with Wikipedia. Phlsph7 (talk) 18:44, 1 February 2023 (UTC)
- From the same Forbes article:
As a result of the human authorship standard, "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training."
So in absence of evidence to the contrary, it's public domain or a derivative work, or a more blatant copyright violation (something proven to be possible but not mentioned by the quoted expert). We only want edits that are not and do not contain derivative works and that don't copy verbatim from or closely paraphrase sources to the extent that it's a copyright violation. It doesn't say that publishing the output inherently violates copyright because the output belongs to, say, OpenAI. I completely agree with JPxG's detailed analysis of this which is the exact reasoning that underpins the current wording (Special:PermanentLink/1135137436). —Alalch E. 19:28, 1 February 2023 (UTC)
- For example you can feed a relatively short copyrighted article into ChatGPT and tell it to reword it using twice as many words while not changing the meaning. That would definitely be a derivative work, and simply posting that on Wikipedia and doing nothing else would definitely violate copyrights. —Alalch E. 19:31, 1 February 2023 (UTC)
- @Alalch E.: Have you considered the possibility that the issue of derivative works does not just mean "excessively close paraphrase" for a few individual outputs but is a really widespread issue since all the outputs are based on the training set and are therefore "derived" from it? To me it seems that's what the articles and some of the lawsuits are about. I don't think that JPxG's analysis addresses this point. But it could be compatible with JPxG's claim that not every single output is a copyright violation (for example, because not every work in the training set is under copyright). Phlsph7 (talk) 20:36, 1 February 2023 (UTC)
- @Phlsph7: I have considered this possibility, and so did JPxG, very much so, when he wrote
Whether artificial neural networks are capable of producing original intellectual output is less of a legal issue and more of a philosophical/anthropological one. It should be noted that human brains are themselves neural networks; much has been said, in a variety of fields, on the subject of whether humans create original works versus whether they merely juxtapose or recombine motifs and concepts that they're exposed to through participation in society. While interesting (and humbling), these discussions are unrelated to whether neural networks which have been exposed to copyrighted material in the course of their existence are capable of later creating original works under the purview of intellectual property law: they are (and if this were not the case, a large majority of creative work would be illegal -- good luck finding a band where none of the musicians have ever heard a copyrighted song before).
A derivative work is not a "derivation of works". It's a derivative of a work. If you take ideas and information from multiple works and synthesize a new work, as long as it is not an assembly of derivative (or copied) works side by side, as in a compilation, but a relatively homogeneous amalgam, that's a new work. Otherwise nothing would be an original work. —Alalch E. 20:45, 1 February 2023 (UTC)
- @Alalch E.: Thanks for your explanation based on the distinction between synthesis and compilation and also for taking the time to look up JPxG's analogy between humans and LLMs. However, I'm not sure that your argument is successful. In the case of AI-generated images and code based on training sets, I think it's not successful. From [4]:
The artists — Sarah Andersen, Kelly McKernan, and Karla Ortiz — allege that these organizations have infringed the rights of “millions of artists” by training their AI tools on five billion images scraped from the web “without the consent of the original artists.” ... Butterick and Saveri are currently suing Microsoft, GitHub, and OpenAI in a similar case involving the AI programming model CoPilot, which is trained on lines of code collected from the web. ... Whether or not these systems infringe on copyright law is a complicated question which experts say will need to be settled in the courts.
- The general point from this and the other articles seems to be: if an AI learns from a training set containing copyrighted works then it could be the case that it violates those copyrights (even if it synthesizes them instead of reproducing them in a superficially changed form). The underlying issue seems to concern whether training AI on copyrighted works falls under fair use:
The creators of AI art tools generally argue that the training of this software on copyrighted data is covered (in the US at least) by fair use doctrine.
This issue is also discussed at [5] for Copilot, and a direct comparison is drawn to OpenAI:
Microsoft and OpenAI are far from alone in scraping copyrighted material from the web to train AI systems for profit. Many text-to-image AI, like the open-source program Stable Diffusion, were created in exactly the same way. The firms behind these programs insist that their use of this data is covered in the US by fair use doctrine. But legal experts say this is far from settled law and that litigation like Butterick’s class action lawsuit could upend the tenuously defined status quo.
The fact that there are several ongoing lawsuits means that this is not some distant maybe but a real possibility. I'm sorry if my previous explanation in terms of "deriving" works was confusing. I hope this makes the issue clearer. Phlsph7 (talk) 03:58, 2 February 2023 (UTC)
Summary of who owns chatbot output copyrights
Let me see if I have this straight. When a chatbot produces output, ownership is as follows:
A) If the chatbot output includes a copy of someone else's work, the copyright of that portion belongs to that 3rd party.
B) If the chatbot output includes a derivative work of someone else's work, the copyright of that portion belongs to that 3rd party.
C) If the chatbot output or a portion thereof is not a copy or derivative work, its copyright ownership is not legally established. The possibilities include being:
- 1) Part of the public domain, as works originated by non-human entities such as animals (See Monkey selfie copyright dispute).
- 2) Owned by the chatbot owner. Publishers of software applications generally do not own the output of their products, such as word processors and spreadsheets, and the applications are treated as tools in this respect. The difference here is that chatbots are much more autonomous, and so, the issue is unclear. OpenAI has assigned all its claims to rights over its chatbot output to the user, thus removing themselves from the debate.
- 3) Owned by the user. Creative input is required to own output, but is a prompt enough input to be considered creative input? How about a series of prompts? Even this is uncertain, legally.
- 4) Owned by nobody. I don't see how that can be the case. Wouldn't that default to the public domain?
Please correct, expand upon, and clarify the above, as necessary. Thank you, — The Transhumanist 02:28, 31 January 2023 (UTC)
- Just wanted to add that for B, derivative work is a very limited definition (e.g. close paraphrasing). This wouldn't apply to most content where models "synthesize" texts and create a new one.
- As well, for C, 4 indeed doesn't exist (would be public domain). Wikimedia's current perspective is that computer-generated text is in the public domain (see Commons template). This WIPO article says that the US, Australia, and the EU consider such texts PD. Hong Kong SAR (China), India, Ireland, New Zealand, and the UK assign copyright to the programmer. The US would largely be the most relevant jurisdiction though, considering it's where the servers are hosted. If you'd like, WMF Legal could be contacted for a preliminary opinion (if not already). You've also noted the OpenAI position, which is relevant for all jurisdictions.
- Thanks for your Village Pump posts, which have produced extensive, useful community input!
- Best, EpicPupper (talk) 03:19, 31 January 2023 (UTC)
- Maybe you're seeing some semantic difference, but I wouldn't say "owned by nobody" doesn't exist; that is by definition public domain. As I mentioned earlier, there needs to be sufficient creativity in a given work in order for it to qualify for copyright. It's unknown at this point how the United States courts will interpret existing copyright law regarding when a program's output can be considered to meet the required standard. I don't think there's a need to seek legal advice at this point, because the case law or regulated standards aren't there yet. isaacl (talk) 04:25, 31 January 2023 (UTC)
- @EpicPupper: I wouldn't be so sure about neglecting B. From [6]:
As a result of the human authorship standard, "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training"
In relation to AI-generated images based on training sets containing copyrighted images, there are already lawsuits on these grounds, see [7] and [8]. See also [9] for a lawsuit against GitHub Copilot claiming they violated open-source licenses. Copilot is an LLM for creating code. Phlsph7 (talk) 07:36, 31 January 2023 (UTC)
- Thanks, it is worth noting the changing legal landscape. I’ll note that your link mentions the
level of similarity between any particular work in the training set and the AI work
as a factor for consideration; presumably, then, texts very different from training data (not close paraphrasing) might not be considered derivative works, in the same sense that me reading a book and writing an article on its topic isn’t one. EpicPupper (talk) 16:23, 31 January 2023 (UTC)
BERT
I removed BERT from the page with the justification that it can't generate text; this isn't technically true, as it can, it's just exceedingly difficult and produces very poor results, especially with regards to wikitext. 🐶 EpicPupper (he/him | talk) 03:11, 17 January 2023 (UTC)
Reference verifiability +
One thing that occasionally happens for human editors and I think LLMs are at higher risk of: including a reference for a statement that seems relevant and appropriate, but that upon reading doesn't support the statement being made. These sorts of errors are quite hard to spot since spotting them requires checking the cited source and comparing it with the sentence being supported. The guidelines currently include the direction to cite sources that "directly support the material being presented", but it might be worth being more explicit by reminding the user to read all refs they're citing (roughly equivalent to if a lecturer had a student write the article and uploaded it on their behalf)? T.Shafee(Evo&Evo)talk 06:59, 17 January 2023 (UTC)
- I agree. In my experience LLMs are *really* bad at providing references, mostly just making them up or using inappropriate ones. Alaexis¿question? 08:06, 17 January 2023 (UTC)
- Yep. One more way in which the effort needed to fix LLM text so that it might be halfway acceptable here is no easier than writing the text oneself in the first place. XOR'easter (talk) 18:01, 23 January 2023 (UTC)
Editor skill level
I like the way this is coming together. The only thing that jumps out at me is the section that starts with "LLM output may only be used by editors who have a relatively high level of skill"
which doesn't quite jive with the way we do things on the "encyclopedia that anyone can edit". Yes, competence is required, but it's not the norm to restrict a task to editors of a certain skill level unless permissions are involved. It's also unclear how this would be enforced. Is there a way this could be worded as more of a suggestion that gets the point across? –dlthewave ☎ 20:26, 20 January 2023 (UTC)
- @Dlthewave: This is from more than a week ago, but yeah, your suggestion was acted upon, and the wording has since been improved. The idea was/is that "Editors should have substantial prior experience doing the same or a more advanced task without LLM assistance". Sincerely—Alalch E. 21:54, 29 January 2023 (UTC)
Feedback
I think our policy on LLMs should be consistent with use of machine translation, which also has the potential to generate huge amounts of dubious-quality text. I agree with this page and think it could become half-explanatory supplement and half-new policy/guideline. I would support a new rule requiring edit summaries to note whenever LLM output text is introduced to any page on Wikipedia. However, I think copyright is a bigger question mark and if we are to permit LLM output then we should offer guidance as to how a user is supposed to check that the text is not a copyright violation.
Note that the content translation tool requires XC rights, so it's not unprecedented that we would limit LLM use to volunteers who are somehow "experienced". — Bilorv (talk) 12:44, 21 January 2023 (UTC)
Attribution
OpenAI's Sharing & Publication Policy asks users to "Indicate that the content is AI-generated in a way no user could reasonably miss or misunderstand."
This is a good practice for all AI content and may be worth adding to our policy.
To that end, mentioning AI in the edit summary is insufficient. I would suggest a template similar to Template:Source-attribution with wording similar to OpenAI's recommendation:
"The author generated this text in part with GPT-3, OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication."
I'm not well versed in creating templates, so it would be great if someone could take this on. –dlthewave ☎ 14:00, 22 January 2023 (UTC)
- @Dlthewave: I have created Template:GPT-3. There is no inline functionality as with Template:Source-attribution because it would not suit the purpose. —Alalch E. 21:21, 22 January 2023 (UTC)
- @JPxG: What do you think about this template? —Alalch E. 21:37, 22 January 2023 (UTC)
- I think it looks good, although I am not quite sure where it would go. I remember seeing some very old template along the same lines, from many years ago, to indicate that an article used text from one of the old public-domain Britannicas. Maybe we could do something like that. As regards the actual template formatting, it would probably be better to have a generic title that took parameters for models (and version/settings information as optional parameters), like {{LLM content|GPT-J-6B|version=Romeo Alpha Heptagon|temperature=0.74|prompt="write a wikipedia article in the style of stephen glass, 4k, octane render, f5.6, iso 400, featured on artstation"}} or something. We could have different boilerplate for different models, and provide the model settings as a refnote or something. At least that is how I would do it if I weren't phoneposting. jp×g 00:08, 23 January 2023 (UTC)
- JPxG, that's exactly what I had in mind: Parameters for the AI model with the prompt in a note or reference. Should we make it a point to mention "artificial intelligence" which might be more recognizable than LLM? The template could go either at the top of the article or beginning of the section if only part of it is AI generated, I've seen it done this way with public domain US government sources and it seems to work well. –dlthewave ☎ 00:28, 23 January 2023 (UTC)
- I was thinking something similar. This should go to the talk page of the article. imo. We today have templates that indicate that a page was worked on as a part of a edit-a-thon or as a part of the WP:WPSCHOOLS project. Ktin (talk) 00:11, 27 January 2023 (UTC)
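For illustration, a banner along the lines JPxG and Dlthewave describe might be invoked roughly like this (a sketch only: "LLM content" and every parameter name below are hypothetical placeholders, not an existing template):
{{LLM content
| model   = GPT-3
| version = text-davinci-003
| prompt  = Write a three-sentence summary of the sourced body of this article.
| note    = Output reviewed, edited, and revised by the contributing editor, who takes responsibility for the content.
}}
Placed at the top of the article, the relevant section, or the talk page as discussed above, the model settings and prompt could be surfaced in a footnote rather than in the banner text itself.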
- We wouldn't need an attribution template if we did the right thing and forbade sweeping the output of a bullshit generation machine into an encyclopedia. XOR'easter (talk) 17:27, 23 January 2023 (UTC)
- It will be harder to deal with the problem if a blanket ban is instituted; the good thing about attribution and declaring in an edit summary is that we can identify such edits. —Alalch E. 17:39, 23 January 2023 (UTC)
- The only ethical response to an edit that is attributed to an LLM is reversion. XOR'easter (talk) 17:55, 23 January 2023 (UTC)
- If an edit is marked as LLM-assisted, and we see an experienced editor making it – for example, to add a missing lead, and the edit seems entirely fine at a glance, the situation in that instance would indeed be fine. If we were to see an inexperienced editor doing the same, we would revert. —Alalch E. 19:36, 23 January 2023 (UTC)
- Sorry, but I can't agree that that "would indeed be fine". Writing a good lede takes care. Verifying that the output from Son of ELIZA was actually edited into something that suits the purpose can't be done "at a glance". (This presumes that using a text which is a derivative of the LLM output is actually legal, which has yet to be established.) And who decides when an editor is "experienced"? For that matter, why does "experience" matter here at all? NPOV violations don't become acceptable just because an editor has been around the block. Nor do original synthesis, turning biographies into attack pages... XOR'easter (talk) 20:39, 23 January 2023 (UTC)
- I think it's legal at least for OpenAI's models based on everything that was said in the ongoing discussions, and what's currently in the draft. The issue is more whether attribution is really necessary. The issue of an LLM output creating a potential copyright violation by copying something from a source too verbatim is the usual issue of copyvio, and different from whether mere use of an LLM as something someone owns is inherently problematic for copyright. In the example of generating a summary of what's already in a Wikipedia article there is no meaningful risk of copying non-free content. When I say "experienced editor", just imagine someone whom you would consider an experienced editor in a positive sense (not "experienced" at consistently making subpar edits). Such an editor would use an LLM to generate a summary of the article to speed himself up and would, of course, make the needed manual adjustments before publishing. He would be required to mark the edit as LLM-assisted nevertheless, which he would do. It would be relatively easy for others to check if the newly added lead captures the most important content points. Just sticking to this example, but the implications are broader. —Alalch E. 21:30, 23 January 2023 (UTC)
When I say "experienced editor", just imagine someone whom you would consider an experienced editor in a positive sense (not "experienced" at consistently making subpar edits). Such an editor would use an LLM to generate a summary of the article...
I cannot imagine any of the experienced editors I know using an LLM to summarize an article at all. The experienced editors in the corner of the project that I inhabit start their comments on the topic by saying things like "Beyond the propensity of these things to just make shit up..." [10]. XOR'easter (talk) 23:29, 23 January 2023 (UTC)
- I respect your principled stance but, pragmatically speaking, we need to settle on some hypothetical justifiable application of an LLM just to give ourselves headroom not to implement a blanket ban, because a blanket ban is not compatible with editors declaring, and we want them to declare so we can locate such edits. It's inconsistent to ask them to declare if it's entirely banned. Merely declaring would not by itself mean that the edits are fine and not disruptive. Kind of similar to paid editing. —Alalch E. 00:45, 24 January 2023 (UTC)
- Taking a hard line now is the pragmatic move. If we allow editors to declare, then all the undeclared instances will slip by even more easily, because looking for a declaration will become a shortcut to evaluating acceptability, and we will be sending the signal that piping a bullshit machine into an encyclopedia is, in principle, fine, as long as you're polite about it. XOR'easter (talk) 15:40, 24 January 2023 (UTC)
@XOR'easter: I see that you are of a strong opinion on this, although I am inclined to disagree with the main thrust of it. The example you linked was a tweet where someone deliberately instructed it to give false output. I don't think this really demonstrates anything, other than "if you type words into a computer, the screen will say those words, and they do not magically become true by being on the screen". This seems like a fairly pedestrian observation which is tautologically true in almost any circumstance. I could type "2+2=5" onto this very talk page, and there it would be: a completely wrong statement. But who cares? I could also call up my friend and say "listen, Bob, I don't have time to explain: I need you to say that two plus two is five". Does this demonstrate that phones are untrustworthy? Well, sure: somebody speaking words into a phone doesn't make them true. But it is nonetheless possible for words to come out of a phone and have some utility or value. jp×g 10:12, 25 January 2023 (UTC)
- That's one example of many where the output is bad. ChatGPT has repeatedly been shown to invent fake sources, for example. It breaks the chain of attribution. A website full of LLM output would necessarily be an unreliable source; copying text from such a site into Wikipedia would be plagiarism of an unreliable source, and even "fixing" the text before insertion would be highly dubious on legal grounds. Like writing an article by closely paraphrasing an advertisement, it's poor conduct on multiple levels. If one thinks that an LLM is "just a tool", then demanding people disclose their use could arguably be an assumption of bad faith, as noted above. One could argue that it is a requirement for editors to belittle themselves. Do we require editors to disclose whether they ran a spellcheck before saving? Whether they composed text in Emacs or vi before adding it here? Whether they got the creative juices flowing with a shot of Jack Daniels? If an LLM is just a writing aid, then there's not even a slippery slope here: requiring disclosure of one is exactly on the same level as requiring disclosure of any of these other things. If, on the other hand, they are more problematic than other tools, then considering the reasons they are so, is this draft guideline adequately stringent? Regarding OpenAI products specifically, their terms and conditions require that
The published content is attributed to your name or company
, which would mandate in-text attribution of Wikipedia content to Wikipedia editors, which is just not how we do things. XOR'easter (talk) 14:27, 25 January 2023 (UTC)
- @XOR'easter: Okay, I understand what you are talking about a little better now. I think I agree with most of this. Personally, I would be happy (and I suspect you might also be happy) if, regardless of whatever other policies, we had something like this:
If you use a language model to generate an entire article in one go, including the inline citations and the reference titles, and then paste it directly into a redlink, [something bad happens]
I don't know what "something bad" means in this circumstance: maybe the article is a candidate for speedy deletion, maybe it is instantly draftified and put at the back of the AFC queue, maybe the user is blocked unless they can demonstrate that the claims are true and that the references are real. I do think it's important to stop people from doing this, at least with currently available models. What do you think of it? jp×g 23:29, 25 January 2023 (UTC)
- Should be speedied as more work to verify than it’s worth. Kind of a WP:TNT situation. I’d like a G category for CSD for this. — rsjaffe 🗣️ 01:05, 26 January 2023 (UTC)
- @Rsjaffe: I've added some language to this effect (i.e. raw LLM output pasted into the article box can be speedied as G3) -- let me know if it's prominent enough. I am thinking maybe it should be moved up further? jp×g 05:42, 26 January 2023 (UTC)
- XOR'easter: Someone actually experimented along the lines of my writing-the-lead thought experiment: User:DraconicDark/ChatGPT. The prompts could have been a lot better I believe. Honestly I see some potential here. —Alalch E. 01:31, 26 January 2023 (UTC)
- @XOR'easter: Okay, I understand what you are talking about a little better now. I think I agree with most of this. Personally, I would be happy (and I suspect you might also be happy) if, regardless of whatever other policies, we had something like this:
Burden of Verification
As a new page patroller and a person who has dug up a bunch of articles using LLM generation, I have a serious issue with using LLM for article text. Since LLMs "make up stuff" frequently, each statement needs to be verified. This is a new burden for patrollers and others trying to verify things, as we tend to "test verify", checking to make sure references are reasonably connected to the article. It takes a long time to completely verify each statement.
Secondly, some of these LLM articles are using offline references. As far as I know, every LLM is being trained on electronic sources.
Thirdly, the confabulation LLMs engage in makes it easy for an editor to believe that a statement the model makes must be true.
Therefore, my proposal is that "Every statement in LLM-generated text must be validated by footnoting to an online reference that is readily-available to reviewers. Statements not verifiable in an online reference may be deleted and cannot be re-added without a verifiable reference." — rsjaffe 🗣️ 16:59, 25 January 2023 (UTC)
- I think this is mostly covered by WP:V, although I agree that someone typing "write a Wikipedia article about ______" into a prompt window will produce output that ranges from "completely useless" to "actively harmful to the project". I will write something like this into the page. jp×g 22:48, 25 January 2023 (UTC)
A better title is needed
If this is going to become a policy or guideline, we are going to need a better title than 'large language models', which is a technical term that relatively few people will be familiar with, and is entirely non-descriptive. 'AI-generated content' would be clearer, though maybe someone can come up with something better. AndyTheGrump (talk) 23:02, 25 January 2023 (UTC)
- Agree completely. —DIYeditor (talk) 10:37, 26 January 2023 (UTC)
- @AndyTheGrump: The considerations here are partly for the sake of precision -- "AI" is an extremely nebulous buzzword that can (in both theory and practice) mean basically anything. This goes from fully unsupervised learning with neural networks to a bunch of if-then statements... I wrote an AI for a video game once in about 10 bytes of code, where it would detect your position and change the enemy's acceleration to match, and I'll be darned if it didn't look like it was being piloted by a real player.
- Along with the ambiguity of the buzzword comes a good deal of philosophical and political baggage, as well. We may recall in the news a few months ago there was a big kerfuffle about whether language models were "sentient" (whatever that means -- I don't think anybody involved with this was reading Descartes, much to their detriment, and everybody else's). I don't think we need to be taking a side on this issue, or at least not yet.
- Lastly, the term "AI" is used within the field mostly as a catch-all term and for marketing purposes, like "space-age" or "cutting-edge" or "advanced": note that OpenAI's name has "AI" in it, but the research publications refer to "generative pre-trained transformers" as a type of "language model". jp×g 12:32, 26 January 2023 (UTC)
- We need a title that describes the subject matter for non-technical Wikipedia contributors. 'Large language models' doesn't. And if people were ever to use 'bunch of if-then-statements AI' in article creation (which would seem unlikely) the proposed guideline/policy ought to cover that too. It really doesn't matter what the algorithms being used are, the objections to their use still apply. The issue is the output, not the algorithm. AndyTheGrump (talk) 12:52, 26 January 2023 (UTC)
- My suggestion for a title would be "Computer-assisted text generation" but I don't think it matters very much as we can create lots of other shortcuts to the guidance, which will be in WP space. We already have WP:AI, which is about something entirely different and WP:Artificial intelligence, for example. The latter was recently created and duplicates part of what is now being drafted. Mike Turnbull (talk) 13:01, 26 January 2023 (UTC)
- I like that and would suggest that WP:BOT and WP:CHATBOT also point to it. — rsjaffe 🗣️ 20:21, 26 January 2023 (UTC)
- But the first shortcut already covers the actual Wikipedia bots, which are helpful for the most part. WP:AIGENERATED would be a good shortcut. 2001:448A:304F:52BA:B834:10F7:8013:7F11 (talk) 00:11, 27 January 2023 (UTC)
- "Computer-assisted text generation" includes what I just typed now, considering the spell check and possible auto-complete or auto-correct at play. That said, maybe we should include that under the umbrella of this policy/guideline. —DIYeditor (talk) 22:43, 27 January 2023 (UTC)
- I like that and would suggest that WP:BOT and WP:CHATBOT also point to it. — rsjaffe 🗣️ 20:21, 26 January 2023 (UTC)
- My suggestion for a title would be "Computer-assisted text generation" but I don't think it matters very much as we can create lots of other shortcuts to the guidance, which will be in WP space. We already have WP:AI, which is about something entirely different and WP:Artificial intelligence, for example. The latter was recently created and duplicates part of what is now being drafted. Mike Turnbull (talk) 13:01, 26 January 2023 (UTC)
- We need a title that describes the subject matter for non-technical Wikipedia contributors. 'Large language models' doesn't. And if people were ever to use 'bunch of if-then-statements AI' in article creation (which would seem unlikely) the proposed guideline/policy ought to cover that too. It really doesn't matter what the algorithms being used are, the objections to their use still apply. The issue is the output, not the algorithm. AndyTheGrump (talk) 12:52, 26 January 2023 (UTC)
What benefit is sanctioning any use of AI?
How does it benefit Wikipedia to permit any use of AI to generate articles even by competent editors? What purpose does LLM text serve? Is a "competent editor" not able to type out what they want to say? Does it save any time to have to verify every single word vs. to generate it from one's own mind with reference to sources? Or is this just to allow for experimentation? I'm not against it at all, I find machine learning fascinating and think it is the way of the future - I'm just not sure I see any benefit to it right now.
I do think each edit using LLM generated text should be required to carry a special tag along the lines of 2017 wikitext editor, Reply, and Source. —DIYeditor (talk) 10:35, 26 January 2023 (UTC)
- There are about a dozen demonstrations linked from the last section; personally, I don't think there is any reason to have them generate entire pages (they are unbelievably bad at this). Other things, like formatting and identifying potential areas for article improvement, worked pretty well. jp×g 12:20, 26 January 2023 (UTC)
- No benefit and plenty of harm for new text generation. May be useful for editing certain things. — rsjaffe 🗣️ 20:23, 26 January 2023 (UTC)
not entirely good
Personally, I don't agree with a blanket statement that "even experienced editors will not have entirely good results when assisting themselves with an LLM in [specific situations]". The ending clause to the sentence, "...the extra effort is not worth it compared to purely human editing", seems to agree that it is possible for the result to be acceptable, just with an undue amount of effort. Perhaps some of the instructions can be simplified to something like "treat machine-generated text as if it came from an unknowledgeable editor: every aspect must be verified and cited to sources as appropriate". On a side note, I find the use of "entirely good" to be awkward. Perhaps something like "flawless" can be used. isaacl (talk) 17:35, 27 January 2023 (UTC)
- I'd like to act on your suggestion, I'm just not in the right state of mind right now; feel free to make these changes yourself in the meantime. —Alalch E. 00:50, 28 January 2023 (UTC)
- I'm a bit wary of making changes without knowing whether any others will object (though of course perhaps a specific edit might make it easier for them to form their opinions). If there is anyone who thinks these changes may be helpful or has concerns, can you please provide feedback? isaacl (talk) 17:27, 28 January 2023 (UTC)
- I did some rewording to avoid the awkward phrase "entirely good" and made some changes to the introduction for the section listing things for which LLMs are not a good fit. isaacl (talk) 05:13, 29 January 2023 (UTC)
- Yeah, that's better, thank you. —Alalch E. 05:58, 29 January 2023 (UTC)
Removal of suspected machine-generated text
I don't like singling out suspected machine-generated text for summary removal. Any edits suspected of being made without adequate verification are subject to removal, in accordance with the "be bold" guidance and verifiability policy. I prefer not listing out special cases as it can give the impression that non-listed scenarios are not subject to this guidance. isaacl (talk) 17:42, 27 January 2023 (UTC)
- The goal here really is to create something that should discourage large additions at once. —Alalch E. 06:11, 29 January 2023 (UTC)
- By large additions, do you mean to one article? Or many articles (which would harken to the mass creation RfC which is still in progress)? Is this something that needs to be addressed regardless of how the changes were created? isaacl (talk) 06:16, 29 January 2023 (UTC)
- This was originally written by North8000 (including the phrase "en masse"), but from the original context of the surrounding discussion, and the apparent aim, the meaning was to deal with an (I say "an" but it's not really speculative at this point) influx of LLM-generated material, which very often comes in larger chunks ("en masse" doesn't depict that well); this has to do with the rationale already on the page about how it's tiresome and depressing to check these kinds of edits. If the addition is small (as in 1-3 average sentences), I believe that the claims could be easily reviewed from the standpoint of verifiability and copyedited; but if there is a lot, even with references (and LLMs can output a lot), the references, if not bogus (when that's detected, the whole thing obviously has to go), have probably been inserted after the fact by the human, while the prose was machine-generated; this will cause certain incongruences which may be extra annoying to detect but could actually be pretty serious errors. So the idea is that edits should come in chunks at a normal pace, similar to what we're used to. They must not come in very quick succession (MEATBOT is also mentioned, and is important). It isn't just about verifiability/OR; it's also about copyright, and plausibly about NPOV. —Alalch E. 06:45, 29 January 2023 (UTC)
- I think concerns about rate of submission are better dealt with by the mass creation discussion. Problems with reviewing throughput matching submission throughput exist independently of how the content was created. (Note a countervailing viewpoint during that discussion is that a reviewing backlog doesn't matter if the submitted content is of a sufficient quality.) isaacl (talk) 16:40, 29 January 2023 (UTC)
- I added a pointer to this discussion at Wikipedia talk:Arbitration Committee/Requests for comment/Article creation at scale#Machine-generated text. True enough that the RfC is about mass article creation so only covers one aspect of the concern being discussed here. All the same, I think mass changes to an article should be throttled based on volume concerns, without having to speculate on how the edit was created. In practice, large edits to articles do get reverted in accordance with the bold, revert, discuss cycle, when reviewers feel that it would be better to review the changes in smaller parts. isaacl (talk) 16:52, 29 January 2023 (UTC)
- @Isaacl: Good now? —Alalch E. 12:16, 30 January 2023 (UTC)
- Not really; I'm suggesting this should be handled more generally for any large edit that reviewers want to break down to review, without having to hypothesize on how the edit was created. isaacl (talk) 17:32, 30 January 2023 (UTC)
- If there are no sources, you are right that it doesn't matter how the edit originated. But if there are at least some sources in a large addition, wholesale reversion could be seen as unconstructive. And, with LLM-assisted edits, it should not be seen as unconstructive. It's because when a human writes three long paragraphs of prose about a topic they're interested in, while referring to some materials, even if they are not citing everything as they should, the idea of what they are going to write is being formed as they are reading the materials and doing research (not always the case, granted). But with LLMs the generated prose comes first, and then a human may selectively add citations to make it somewhat verifiable, but it takes a lot of work to make everything fit together. Someone could doubt that this work was done, and they may revert, which could, but should not, then be undone by saying "don't revert, just make incremental improvements to what I added yourself, it's a collaborative project" etc., which could soon develop into an unpleasant dispute. Allowing for reverting more summarily is a mechanism to avoid such disputes, and put the burden on the adder of the material to establish valid proof of work by making their own incremental edits, supported by descriptive summaries. So this is intended to clarify that WP:BURDEN is even more strongly on the editor who adds material than usual, not just by providing (some) references, but by demonstrating that there's a human process behind the entire change. —Alalch E. 17:49, 30 January 2023 (UTC)
- I think it's a digression to worry about how the edit was generated. If the edit has characteristics of being unproductive and is larger than can be comfortably reviewed, editors today will revert it and request that it be broken down into more easily reviewed parts. There are drawbacks to this (it introduces significant inertia to pages, which some may also see as a benefit, depending on what level of quality you think the current page has), but it's a balance between the needs of those reviewing changes and not overly impeding new changes. If someone is consistently spamming Wikipedia with text which they have not personally reviewed to be compliant with policy, then our behavioural policies should be used to deal with it. It doesn't matter why they're not following policy.
- Side note: ultimately, the problem with dealing with large edits is that, in spite of Wikipedia's ideal of a crowd-sourced encyclopedia, writing large paragraphs of text in a group doesn't work very well. Crowd-sourcing is good at incrementally editing an existing base text. This parallels what happens in the real world when a group writes a document: it gets divvied up into sections for different people to write, and then the result is reviewed together. isaacl (talk) 18:28, 30 January 2023 (UTC)
- I agree more than not. How would you change the Wikipedia:Large language models#Verification section then? —Alalch E. 18:47, 30 January 2023 (UTC)
- My apologies; I failed to note that you removed the section to which I was primarily objecting, so I'm mostly OK with the changes. I think the sentences you added, while being a good suggestion, might be a little too prescriptive ("if A, then do B"). I suggest trying to prevent problems with the initial edit, perhaps with something like
Instead of making one large edit, consider breaking down your planned changes into multiple edits, and make them one at a time, leaving a period between each to allow for review.
isaacl (talk) 04:58, 31 January 2023 (UTC)
writing code
I disagree with suggesting that writing programming code is a good fit for the use of large language models. Small changes in code can result in very different results. Replicating the solution to a well-known problem can be done pretty easily with machine-generated code, but the chances of a small mistake in a solution to a new problem are quite high. isaacl (talk) 18:14, 27 January 2023 (UTC)
- I had moved this to "Things that LLMs are not a good fit for" and was reverted. Similar to the vaguely plausible AI-generated articles that we've seen, I just don't trust it not to generate code that looks good and seems to work yet has some unnoticed flaw. I think the best policy would be to prohibit use by default and allow exceptions for specific use cases where it has been vetted and shown to be consistently reliable, similar to our bot policy. –dlthewave ☎ 19:31, 27 January 2023 (UTC)
Frankly, I think "don't make disruptive changes to high-use templates without being prepared to revert" and "don't execute code if you have no idea what it does" are such fundamentally basic principles of programming that they shouldn't need to be mentioned at all, and the only reason I wrote them out was to take an abundance of caution. If someone doesn't understand these things, there is no force in the universe strong enough to protect them from breaking things (except maybe a power outage).
As an example of the tool's potential for constructive work, look at the diffs here and here, which instantly fixed a couple of rather large bugs (mind you, the feature that caused these bugs to crash the module was written and tested in a sandbox by two human programmers, me and Mr. Stradivarius). While I am not very well-versed in Lua syntax specifically, it is pretty obvious what is going on in if type(self.data.authors) == "table" then return table.concat(self.data.authors, ", "). jp×g 07:24, 28 January 2023 (UTC)
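For readers less familiar with Lua, here is a minimal, self-contained sketch of the guard pattern in the snippet quoted above; the object and field names are illustrative and not taken from the real module:

```lua
-- Illustrative only: confirm a field really is a table before calling
-- table.concat on it, so missing or malformed data fails quietly instead
-- of raising a script error.
local Entry = {}
Entry.__index = Entry

function Entry.new(data)
  return setmetatable({ data = data }, Entry)
end

function Entry:authorString()
  if type(self.data.authors) == "table" then
    return table.concat(self.data.authors, ", ")
  end
  return nil  -- no usable author list supplied
end

-- Usage:
print(Entry.new({ authors = { "A. Smith", "B. Jones" } }):authorString())  --> A. Smith, B. Jones
print(Entry.new({}):authorString())                                        --> nil
```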
- To me the rough analogy is a grammar checker: it's good at giving suggestions about existing text, though it's not necessarily always correct. I think it overstates matters to say that writing code from scratch is a good fit in the general case. (Regarding the specific diffs: checking that data exists before trying to return it is a fairly common error that various code verification tools can identify; no need for neural networks to flag it.) isaacl (talk) 17:19, 28 January 2023 (UTC)
- I'd be a lot more comfortable with guidance suggesting that machine-generated code can produce useful results for mechanical boilerplate tasks. Without qualifications, I think it's too strong of a recommendation to say that machine-generated code can "work great", or that it may be a good fit. The exact same concerns described under "writing articles from scratch" are concerns with writing code from scratch. isaacl (talk) 05:23, 29 January 2023 (UTC)
I propose changing the text to parallel the analogous paragraph for articles (starting with "Asking an LLM for feedback on an existing article"). I suggest something like: "Asking an LLM for feedback on a template or module: LLMs can provide guidance on standard coding patterns. Caution should be exercised, though, as subtle differences can alter the code's behaviour significantly."
isaacl (talk) 22:12, 30 January 2023
- If there are no further comments, I plan to make the change, and of course feedback can be given then. isaacl (talk) 20:51, 31 January 2023 (UTC)
Regarding this edit: I disagree with restoring the earlier text, as per my previous comments. (I don't have an issue with moving the text to another subsection.) I don't agree with a recommendation that using an LLM to write new code is warranted at present. isaacl (talk) 16:50, 1 February 2023 (UTC)
- Isaacl, I'm in agreement that we shouldn't be recommending this yet. Maybe this should be part of a larger discussion, but I think we're seeing a conflict between the aspirational (what a well-developed LLM could be used for in the future) and the practical (what current models are capable of doing well). In my opinion we should stick to the latter and mirror the bot policy in prohibiting/discouraging by default and only approving specific use cases that have been demonstrated to be reliable. –dlthewave ☎ 18:30, 1 February 2023 (UTC)
- I strenuously disagree here: I have provided specific citations to diffs where LLM output fixed bugs, and nobody has given a single instance of them creating bugs. The concern here is entirely hypothetical, and nobody has given an explanation of how the risks are in any way different from a human doing the same thing. How is it more likely to happen? How is it more damaging? "Do not put sweeping changes to widely-used software into production without testing it" is a fundamental, common-sense principle. "You would die if you drank coffee with cyanide in it" is an argument against cyanide, not against coffee. jp×g 01:18, 2 February 2023 (UTC)
- As I explained, the bug fix examples you gave can be done (and are done) with code correctness tools, including checks built into compilers for various languages. They are not special fixes that need an LLM to resolve. My proposed changes are in line with the examples you gave: using an LLM to review code. Extrapolating from a couple of examples isn't a great idea, in any case, and I think it is reasonable to warn caution. isaacl (talk) 01:38, 2 February 2023 (UTC)
- I don't think extrapolating from zero examples is a better idea, and the section I wrote consists almost entirely of repeated urging of caution, four separate times: "you should make sure you understand what it's doing before you execute it", "bugs and errors can cause unintended behavior", "Common sense is required", and "you should not put large chunks of code into production if you haven't tested them beforehand, don't understand how they work, or aren't prepared to quickly reverse your changes". Is there some other part of this you're objecting to, like "LLMs can write code that works great, often without any subsequent modification"? This is not a critical part of the section and I am fine with it being revised or removed. jp×g 02:04, 2 February 2023 (UTC)
- Yes, I've already stated my objection to this and made a proposal that removed it. My version doesn't have four warnings, just one urging caution. In the submission I made, I kept a version of your last warning. isaacl (talk) 02:29, 2 February 2023 (UTC)
- I don't want to speak for Isaac but I think the distinction here is between writing code and debugging/linting existing code. The latter has existed for a long time, the former is bad news.
- Regarding the claim that no one has given an instance of LLM output creating bugs: at least one study found that about 40% of auto-generated code from GitHub Copilot (built on models from the same company behind ChatGPT) contained errors and/or security vulnerabilities. Gnomingstuff (talk) 17:36, 3 February 2023 (UTC)
- While LLMs aren't yet unequivocally good at anything, coding is probably what they're best at. So I see no issue with the current phrasing.
- Further, I disagree that LLMs are good at "providing guidance on standard coding patterns" (very imprecise). They're (very roughly) good at outputting ready-to-run code if you input some desired specs, or input buggy code and ask it to fix it. The current phrasing reflects that well. Whether this is risky or not is a separate issue. But I strongly doubt that they're any good at explaining why their code is better, or why some code is bad, or giving any guidance. DFlhb (talk) 08:42, 3 February 2023 (UTC)
- I didn't mean that LLMs could review and explain code, but that you could give it code and ask it to resolve any issues, as would be done in a human code review. I disagree that coding is what LLMs are best at; I think they are better at generating human language text where there is a greater scope for the intended results, and thus more flexibility in following the language correlations wherever they may lead. I propose that the text is changed to the following:
Asking an LLM for feedback on a template or module: LLMs can be given existing code and review it for inconsistencies with standard coding patterns. Caution should be exercised, though, as subtle differences can alter the code's behaviour significantly. You should not put code into production if you haven't tested it beforehand, don't understand how it works, or aren't prepared to quickly reverse your changes.
isaacl (talk) 15:22, 3 February 2023 (UTC)
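As a concrete illustration of the "subtle differences" caveat in that proposed wording, here is a minimal Lua sketch (the table and values are hypothetical, not from any real module) where two nearly identical loops behave differently:

```lua
-- Illustrative only: ipairs and pairs look interchangeable at a glance,
-- but they iterate differently, which is exactly the kind of subtle
-- change an LLM rewrite could introduce unnoticed.
local args = { "first", "second", note = "extra" }

-- ipairs visits only the array part: prints 1/first and 2/second
for i, v in ipairs(args) do
  print(i, v)
end

-- pairs also visits the named key: additionally prints note/extra
for k, v in pairs(args) do
  print(k, v)
end
```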
- @Isaacl, I managed to get ChatGPT to add a feature to a module; see Special:Diff/1138239911. My prompt was "I want to rewrite it so it can check multiple templates, and return the first one's parameter value." (I had input the code in an earlier prompt.) This change seems to have no problems; I haven't tested it fully. — Qwerfjkltalk 18:11, 8 February 2023 (UTC)
- The introduction of a loop to iterate over multiple templates is a pretty standard coding pattern. What does interest me is the elimination of the allcases() function. Technically, that is a change in behaviour from the original code. However, if I'm understanding correctly, the implementation of the function is too broad in the current code: only the case of the first character of the template name ought to be ignored, while parameter names are case-sensitive. I wonder what prompted the program to eliminate the function? It is an example of how domain-specific knowledge is needed to code this correctly, and how the output must be reviewed carefully. I'm not sure if this specific example is a net reduction in effort for a coder. Plus there are additional tidy-up steps in this case that add more work: when adding functionality, it's often desirable to avoid certain types of cosmetic changes to make the change easier to review. (Alternatively, the cosmetic changes could be isolated into a separate edit.) Changing single quotes to double quotes is unnecessary and would ideally be reverted or isolated. Also, the replacement of method calls on the string objects is unnecessary and, I think, should be reverted. isaacl (talk) 17:36, 10 February 2023 (UTC)
- I just realized the sandbox and the module are not synced, and so the cosmetic differences and the removal of the allcases() function are due to that. Here is a clean diff between the production module and the sandbox. Deleted discussion of a change that was due to the request made to the code generator. isaacl (talk) 18:46, 11 February 2023 (UTC)
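For context, here is a rough, hypothetical Lua sketch of the pattern under discussion (looping over several template names and returning the first parameter value found, ignoring only the case of the template name's first character); it is an illustration, not the actual module code:

```lua
-- Illustrative only: "data" is a hypothetical table keyed by template name,
-- with each entry holding that template's parameters.
local function firstParamValue(data, templateNames, paramName)
  for _, name in ipairs(templateNames) do
    -- Try the name as given plus first-letter case variants; parameter
    -- names themselves remain case-sensitive.
    local variants = {
      name,
      name:sub(1, 1):upper() .. name:sub(2),
      name:sub(1, 1):lower() .. name:sub(2),
    }
    for _, variant in ipairs(variants) do
      local params = data[variant]
      if type(params) == "table" and params[paramName] ~= nil then
        return params[paramName]
      end
    end
  end
  return nil
end

-- Usage:
local data = { Infobox = { name = "Example" } }
print(firstParamValue(data, { "infobox", "Infobox person" }, "name"))  --> Example
```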
Related discussions
- Wikipedia:Administrators' noticeboard/Incidents#Artificial-Info22 using AI to produce articles
- Wikipedia talk:Criteria for speedy deletion#New "G" variety for articles totally consisting of LLM text
- Wikipedia:Miscellany for deletion/Draft:Social Security in the United States of America (USA)
And a new maintenance tag / article cleanup template:
- Template:AI generated
- Template talk:AI generated - some discussion here about whether to encourage deletion via the tag's text or not
–Novem Linguae (talk) 20:01, 27 January 2023 (UTC)
Templates for AI-generated articles
{{AI generated}} for passing editors to put on LLM-generated articles and {{AI generated notification}} for creators to put on the talk page of LLM-generated articles. The latter needs much more work. — Trey Maturin™ 20:58, 27 January 2023 (UTC)
- See the discussion above regarding naming this proposed policy/guideline. We should not be using technical jargon in templates. 'Large language model' means precisely nothing to most people. Explain what the problem is, in words that people understand. AndyTheGrump (talk) 21:08, 27 January 2023 (UTC)
- {{sofixit}} — Trey Maturin™ 21:31, 27 January 2023 (UTC)
- The problem goes deeper than just the template and the title. The entire proposal is full of the same jargon. It is unnecessary and counterproductive. AndyTheGrump (talk) 21:52, 27 January 2023 (UTC)
- {{sofixit}} — Trey Maturin™ 22:00, 27 January 2023 (UTC)
- Are you suggesting I take it on to myself to move the article to an alternate title and rewrite it substantially without prior agreement, while a significant number of other contributors are working on it? AndyTheGrump (talk) 23:17, 27 January 2023 (UTC)
- It would seem an option more likely to achieve anything at all compared to carping and expecting others to make major but nebulous changes for you, so… yeah, kinda. — Trey Maturin™ 23:57, 27 January 2023 (UTC)
- As someone who is working on it, I'm not a priori opposed to bold renaming and changes. While I like the current name because it is correct, I'm interested in discovering a name that on aggregate provides an increase in everyone's satisfaction, but I'm unsure about what the options are. Several people have said that "AI" is a bit of a buzzword; this is not such a simple topic and terminology that's a bit more exact may be good, dunno. —Alalch E. 00:17, 28 January 2023 (UTC)
- I added an example for LLM ("ChatGPT"). Should we also include "AIs" as an example? — rsjaffe 🗣️ 22:06, 27 January 2023 (UTC)
- I'm also suggesting a different icon that's more "chatty". See and comment at Template_talk:AI_generated#Different_Icon?. — rsjaffe 🗣️ 22:37, 27 January 2023 (UTC)
Wording
"The creator has words like "AI", "neural", "deep" and similar in their username" why "Deep"? Deep has many meanings outside of AI so I'm a bit confused as to why this might be a red flag to being AI generated. In fact, Deepfriedokra has "deep" in their username. Obviously I know this isn't "if their username contains X they are guaranteed to be using this" but I'm confused as to why "Deep" is mentioned here. ― Blaze WolfTalkBlaze Wolf#6545 21:07, 27 January 2023 (UTC)
- Because of "deep learning". But, you're probably right that it's too frequent so I'll remove it. —Alalch E. 21:39, 27 January 2023 (UTC)
Requested move 28 January 2023
![]() | It has been proposed in this section that Wikipedia:Large language models be renamed and moved to Wikipedia:AI-generated textual content. A bot will list this discussion on the requested moves current discussions subpage within an hour of this tag being placed. The discussion may be closed 7 days after being opened, if consensus has been reached (see the closing instructions). Please base arguments on article title policy, and keep discussion succinct and civil. Please use {{subst:requested move}} . Do not use {{requested move/dated}} directly. |
Wikipedia:Large language models → Wikipedia:AI-generated textual content – 'Large language models' is jargon that relatively few contributors will be familiar with. Policies and guidelines need simple descriptive names, and the exact algorithm used to create the material isn't relevant. AndyTheGrump (talk) 00:18, 28 January 2023 (UTC)
Survey
- Support as proposer, per above. We also need to reduce unnecessary jargon from the draft body - the issue is the output not the algorithm, and we are doing ourselves no favours by implying otherwise. AndyTheGrump (talk) 00:18, 28 January 2023 (UTC)
- Meh I agree that LLM is over the heads of many people. However, AI is not a great term either, even though it's frequently used. "AI" is overselling what these things typically are. I'm not sure what the best title is. In any event, we'll need redirects from related concepts. — rsjaffe 🗣️ 00:22, 28 January 2023 (UTC)
- "overselling" is a bold term; large language models are indeed artificial intelligence models... EpicPupper (talk) 03:20, 31 January 2023 (UTC)
- Comment I thought about doing this, but as an RFC with the alternate Wikipedia:Computer-assisted text generation proposed by Michael_D._Turnbull as an option alongside the existing title. If you would be amenable, would you consider an RFC with multiple choices and closing this requested move? I probably support either your title or Michael's but I'm not sure which is better at this point. I think either probably should include a discussion of autocorrection and autocomplete. —DIYeditor (talk) 00:23, 28 January 2023 (UTC)
- AndyTheGrump has pointed out that what I suggested is not an accepted use of RFCs. Even so, I think we need to attract a wider audience for input not just in this requested move, but for this emerging guideline on the whole, and I really hope we can get some experts in the relevant field(s). We have very few people providing input in this discussion right now and I think it is very important for Wikipedia to get it right.
- I'll repeat that I think this requested move was premature and will take considerably more discussion, rather than being offered a single choice of a new title. —DIYeditor (talk) 23:00, 30 January 2023 (UTC)
- Also Procedural comment the move has to be to Wikipedia:AI-generated textual content not AI-generated textual content. Again asking for a close of this and change to RFC. —DIYeditor (talk) 00:27, 28 January 2023 (UTC)
- Ok, I think everyone can see that 'Wikipedia:..' was intended. As for an RfC, see WP:RFCNOT, which explicitly states that they aren't used for page moves. AndyTheGrump (talk) 02:01, 28 January 2023 (UTC)
- Didn't know that about RFCs. So the closer of this should not use any automated closing tool (are they used?) and should assume what I agree is obvious, that the title given is not the intended target? If so can we just change the text here, if it makes no difference otherwise? —DIYeditor (talk) 02:10, 28 January 2023 (UTC)
- Ok, I think everyone can see that 'Wikipedia:..' was intended. As for an RfC, see WP:RFCNOT, which explicitly states that they aren't used for page moves. AndyTheGrump (talk) 02:01, 28 January 2023 (UTC)
- Oppose: While I think the current title is fine, I would not be opposed to moving it, if everyone thinks that it should be located somewhere else. However, I am opposed to using "AI", per what I said above: it is a buzzword of unclear meaning. I think it is important for something like the title of a policy to be accurate. Something like "machine-generated" or "computer-generated" may be more appropriate, but this raises other problems: this page doesn't just cover generation of text, but also modification and copyediting. Moreover, there is a lot of "machine-generated text" which poses zero problem (I've written more than one computer program to do stuff like convert a text file into a wikitable). Here, we are specifically talking about a particular type of neural network (and not "AI" in the broader sense). jp×g 00:29, 28 January 2023 (UTC)
- "Generation" arguably includes modification and copyediting doesn't it? —DIYeditor (talk) 00:32, 28 January 2023 (UTC)
- Also OCR or anything else like that, any automated task, potentially blends in with the same concepts. OCR software will increasingly use language models. I see this policy/guideline as an umbrella that would include OCR, LLM/ChatGPT as such, autocorrection, autocomplete, machine translation, etc., and be as forward looking and inclusive as possible. —DIYeditor (talk) 00:42, 28 January 2023 (UTC)
- How about just going AI Generated Content. Drop the “textual”. Ktin (talk) 01:50, 28 January 2023 (UTC)
- There are other modalities beyond text. I think a policy that covered all of the text models, plus all of the image and audio models (DALL-E, Midjourney, Stable Diffusion, Craiyon, Riffusion, etc) would be unmanageably enormous. jp×g 03:09, 28 January 2023 (UTC)
- In my opinion I think we should outright ban any use of AI generated visual or audio content unless the AI content in question is a subtopic of the article. So we should be able to use such content in the case that it's needed as an example of content produced by those models, or if a specific piece of AI content is a subject of controversy or something, but it should be disallowed for any user to, for example, ask such an AI to produce "a photograph of George Washington" and place such an image on the George Washington article. silvia (BlankpopsiclesilviaASHs4) (inquire within) 12:30, 28 January 2023 (UTC)
- @BlankpopsiclesilviaASHs4: I don't think this is appropriate or even feasible. Any current or future photo editing software may incorporate "AI". In fact, I think you will find current high-end smartphone cameras incorporate "AI" image processing - this is the prime use of "AI cores" and such on phones. What you are asking is impossible to enforce without subjective and arbitrary determinations that distinguish between fundamentally identical software. All these recently added photos of things on Wikipedia are actually not straight photographs but an AI's idea of what the photograph was supposed to show. Nothing is completely generated by AI, so where do we draw the line? We need to come up with best practices and standards that can be applied now and going forward without a blanket ban. —DIYeditor (talk) 22:55, 30 January 2023 (UTC)
- I don't mean a normal photo taken by an iPhone that automatically applied white balance or filters to make the photo look ostensibly better. I mean that we should not allow someone to just ask a computer to make up a photo and then submit that to Wikipedia as representative of the subject. This is why I chose George Washington as an example, it should be obvious that no photos exist of him since cameras didn't exist then, and if someone uploaded a fabricated photo made with an AI model, that would be an easy target for deletion. Although I acknowledge that not all examples would be so obvious as that, I still believe that any AI generated image that was created by an AI with no human involvement should not be included, and deleted if any such image is identified, just the same as we seem to have agreed that people aren't allowed to blindly paste a text output. This does not mean deleting photos that have been slightly retouched by a CPU, just the same as a piece of text with typos automatically corrected by a word processor is not the same as an AI making things up.
- And sure, such a policy might be sorta unenforceable, technically, cause someone might just upload a picture and not tell anyone that an AI model made it whole cloth without a human pointing a lens at a subject, but that doesn't mean we shouldn't still have the policy. You might as well say that the Deletion policy is unenforceable cause we haven't identified and deleted every single article that shouldn't be included, or that the hoax policy is unenforceable because we haven't made 100 percent certain that every bit of untrue information has been scrubbed from the encyclopedia. silvia (BlankpopsiclesilviaASHs4) (inquire within) 22:40, 4 February 2023 (UTC)
- What percentage "made by AI" is acceptable then? —DIYeditor (talk) 22:43, 4 February 2023 (UTC)
- I think this is a really semantic argument, because there's no practical difference between pointing a phone camera at a subject and pointing an SLR camera at that same subject. A human still had to frame the image, consider the context of the shooting environment and the position and condition of the subject, and decide that they are happy with the image before pressing the button. Whereas if you type "give me an image of this person or thing" into an AI model, you are forfeiting all of those decisions in favor of letting a computer make them for you, and potentially make mistakes in how it represents the subject.
- I would be particularly concerned if people started using these AI models to generate images of living persons of whom (good) freely available photos don't exist. I can imagine there are very few people who would be pleased to see a computer-fabricated photograph of themselves on their Wikipedia page, and that could become a serious issue if the computer decided to depict them in a manner that they found to misrepresent them. silvia (BlankpopsiclesilviaASHs4) (inquire within) 22:58, 4 February 2023 (UTC)
- We never know how much or in what way the AI processing of a phone image has affected it vs. the raw data or what a film camera would've taken. Then comes the issue of photo processing software. It will increasingly use AI, today and going forward. Who is ensuring that AI "enhancement" or "sharpening" done by someone uploading an image to Wikipedia is accurate? There is a continuum between "purely" AI generated (no such thing really, it always took human direction and input, and visual source material) and a "true" photograph. That's why I ask where you draw the line. I don't think this is a Ship of Theseus, Sorites paradox or continuum fallacy situation. If you use DALL-E to make an image it is not pulling it out of thin air, it's based on real images. What it is doing is not completely different from what an AI sharpening algorithm does. If we are going to ban DALL-E, why are we not also in the same stroke going to ban AI image sharpening? —DIYeditor (talk) 23:27, 4 February 2023 (UTC)
- What percentage "made by AI" is acceptable then? —DIYeditor (talk) 22:43, 4 February 2023 (UTC)
- @BlankpopsiclesilviaASHs4: I don't think this is appropriate or even feasible. Any current or future photo editing software may incorporate "AI". In fact, I think you will find current high-end smartphone cameras incorporate "AI" image processing - this is the prime use of "AI cores" and such on phones. What you are asking is impossible to enforce without subjective and arbitrary determinations that distinguish between fundamentally identical software. All these recently added photos of things on Wikipedia are actually not straight photographs but and AI's idea of what the photograph was supposed to show. Nothing is completely generated by AI, so where do we draw the line? We need to come up with best practices and standards that can be applied now and going forward without a blanket ban. —DIYeditor (talk) 22:55, 30 January 2023 (UTC)
- Hadn't seen this, but I now agree that AI-generated text and AI-generated images pose sufficiently different problems that they belong in separate policies. DFlhb (talk) 23:03, 30 January 2023 (UTC)
- Move to Wikipedia:Computer-generated editing or something similar which addresses the above stated concerns regarding ambiguity. silvia (BlankpopsiclesilviaASHs4) (inquire within) 08:28, 28 January 2023 (UTC)
- Support AndyTheGrump's proposal, but with "textual" replaced with "text". I appreciate the concerns about the "hype" surrounding the word AI (as well as the desire not to feed any prejudice against hypothetical future "real" AI), but the primary audience for this is brand new editors, who think they're helping us by generating new articles with ChatGPT. They've never heard of large language models, and they won't know this page is about their behavior if we name it obscurely. I'll add that we should have a second page, detailing how to use LLMs properly (copyediting, formatting, tables, all subject to manual review), which I believe JPxG and others are working on. DFlhb (talk) 17:47, 28 January 2023 (UTC)
- What do you think about keeping this title more exact and making a derived new-user-facing page like WP:ENC, with AI in the title? — Preceding unsigned comment added by Alalch E. (talk • contribs) 19:05, 28 January 2023 (UTC)
- That does address my concern about new-user discoverability.
- The main thing we still haven't discussed is this page's intended scope: it's written like a guideline, and it's primarily about using LLMs to generate text from scratch. AndyTheGrump's proposal is the most natural title for that. If it'll be a how-to, and be under the current title, then it should provide tips for all use-cases where it's possible to use LLMs productively (with supervision), and list effective prompts. People will use LLMs, and the better they know how to use them, the less damage they'll make.
- I'll note that we also need to address non-LLM AIs, like Meta's Side. So this page will end up as just one of a series of AI-related "WP:"-namespace pages: we'll have one on article generation, one on image generation, and at least one on using AI for article maintenance (like Side's successors). Having "AI" in the name of all of them, and avoiding referencing specific technologies by naming these pages: "AI-generated text", "AI-generated images", etc, would be both more consistent and more future-proof.
Or, instead of function-specific guidelines, we could minimize redundancy and have a single guideline for all "AI", and a range of text/image/maintenance AI how-tos. Food for thought?
- (BTW, I hadn't noticed this was the page JPxG had created, which I had in mind in my above reply; I indeed thought he was going for a how-to). DFlhb (talk) 21:15, 28 January 2023 (UTC); struck combined-policy idea; these AI media types are different, and will evolve at different paces, so they belong in distinct policies. 23:05, 30 January 2023 (UTC)
- I removed the thing this morning; my idea (kind of a goofy one) was something like, prior to this being ratified as an actual guideline/policy, the page could serve as a kind of general summary of what people tend to think of as consensus and best practices. The idea is to open an RfC at some point, although the page is evolving quite rapidly at this point, so I think we might not be quite there yet. jp×g 23:51, 28 January 2023 (UTC)
- Don’t care. These types of proposals, inviting editors to debate in detail how many angels can dance on the head of a pin (16, for future reference), are designed to get everybody to stop doing anything at all about anything anywhere until the proposers are ready themselves to deal with the underlying issues. That might be in ten minutes, that might be in ten years. But until then we should all be distracted with something shiny and unimportant to give them headspace. Yeah, no. — Trey Maturin™ 20:35, 28 January 2023 (UTC)
- Oppose - "Textual content" is unnecessarily wordy. "Text" is more concise. "Chatbot" is the most common term being used for LLM-agents these days, so that would be the most familiar to editors reading the title of the policy. Therefore consider "Chatbot-generated text" for the title. — The Transhumanist 04:49, 30 January 2023 (UTC)
- Chatbot means something different, it's merely a related idea. A chatbot is an application of an LLM. A chatbot could also use some other paradigm. —DIYeditor (talk) 03:10, 31 January 2023 (UTC)
- Support Though I think the simpler Wikipedia:AI-generated content would be even better.--Pharos (talk) 18:07, 30 January 2023 (UTC)
- Oppose at this time because this requested move was premature and there are too many possible titles that have been suggested. !voting on this move or closing it is just a mess right now. Since we are not allowed to use an RFC for this, a discussion of multiple possible titles needs to occur before a requested move suggesting a single destination. If I misunderstand how a requested move should work please let me know.
- Support changing the title and most of the text. "AI generated text" would probably be the simplest title. "Large language models" has two problems - first too many people won't understand it, and second we don't want to exclude other techniques if and when they raise the exact same issue. While there has been some quibbling over the use of "AI", I think it is the simplest and most clear way to communicate the intended meaning to readers. Alsee (talk) 13:30, 31 January 2023 (UTC)
- Oppose. Per WP:PAG, policy and guideline pages should avoid dumbed-down language. If and when other things raise the same issue, we can broaden the title. Until then, this is specifically about LLMs. —Alalch E. 17:45, 1 February 2023 (UTC)
- Oppose: "AI" is an inaccurate buzzword. I would however support moving to Wikipedia:Computer-generated content or something clearer to non-expert readers than "LLM" — OwenBlacker (he/him; Talk; please {{ping}} me in replies) 23:17, 5 February 2023 (UTC)
Discussion
I don't think adequate discussion has taken place on this topic yet, nor was the prior suggestion of Wikipedia:Computer-assisted text generation properly considered - the discussion above at Wikipedia talk:Large language models#A better title is needed was not concluded.
I concur with others that "AI" may be a misleading and inaccurate label to use. I'll also repeat my belief that along with ChatGPT et al in and of themselves, OCR, machine translation, autocorrect and autocomplete all relate to similar ideas and should be treated under an umbrella guideline/policy, because these other things may utilize language models as well, and going forward will inevitably be linked. I understand that Wikipedia:Content translation tool already exists and I don't think we should supplant that, just craft something that incorporates a general principle. One may say, well, right now there is no problem with "autocomplete" but I think it has a direct relationship and needs to be understood for this topic. —DIYeditor (talk) 02:22, 28 January 2023 (UTC)
It should probably be noted that "large language model" is the actual term used to describe these models in current research as well as journalism (even Gizmodo and WIRED have picked up on this). The phrase is new because the technology is new, but I don't think that using a more familiar phrase is a good reason to falsely describe something (Skype is VoIP, not a telephone; F-15s are airplanes and not zeppelins). jp×g 10:45, 28 January 2023 (UTC)
- Sorry, but the suggestion that describing LLMs as 'artificial intelligence' is 'false' is complete BS. OpenAI (clue in the name?), the developers of ChatGPT, describe it as such themselves. [11] AndyTheGrump (talk) 12:16, 28 January 2023 (UTC)
- I don't really trust the developers of this technology, who have a financial stake in the growth of its use, to describe it accurately. silvia (BlankpopsiclesilviaASHs4) (inquire within) 12:38, 28 January 2023 (UTC)
- I would in fact trust them to describe it as inaccurately as they can get away with. That's just how marketing works. XOR'easter (talk) 14:26, 28 January 2023 (UTC)
- If these are not AI, what would true AI be? Artificial intelligence != human intelligence. —DIYeditor (talk) 17:44, 28 January 2023 (UTC)
- They describe it as "AI" in marketing copy, but their papers are about transformer-based models; like I said, though, companies generally do not describe their products in NPOV terms. We describe cars as "midsize sedans" and not "sporty-yet-practical solutions", large-scale employee terminations as "layoffs" and not "right-sizing", etc. It remains to be seen what "AI" actually means, because the definition of the term changes wildly when advances are made in the field. jp×g 23:51, 28 January 2023 (UTC)
- Chat-GPT is an automated assistant, and it composes text in an automated fashion. Maybe call the policy... "Automated composition generators"? "Automated text generators"? "Text automation"? "Automated writers"? — The Transhumanist 05:03, 30 January 2023 (UTC)
My initial thought is inline with DIYeditor's comment: I think the policy ought to deal with machine-generated text more broadly. Guidance for specific subcategories can then be broken out separately (perhaps to separate pages). isaacl (talk) 17:34, 28 January 2023 (UTC)
Yup - keep it simple to “AI Generated Content”. I understand that there are advances in AI that has audio and video coming into play sooner if not already. Nevertheless starting with “content” as an aggregate and then breaking into any specific nuances for audio and video as you progress. Ktin (talk) 17:43, 28 January 2023 (UTC)
- How about "Computer Generated Content", which is a basic description for everything described in this discussion, including autocorrect. We could then have redirects from "AI Generated Content", "LLM generated...", "ChatGPT generated..." etc. The terminology is a dog's breakfast currently, but that doesn't mean we have to overtly endorse misleading terms like "AI", which is part of the reason we're here: the overselling of the intelligence of current content generators. — rsjaffe 🗣️ 23:55, 5 February 2023 (UTC)
- I like this, but would it include graphic/photographic (and audio) content as well per my discussion with BlankpopsiclesilviaASHs4 in the survey section above? This proposal has gone into considerable detail about LLMs in particular and it would grow to quite some length to include things like DALL-E and AI image sharpening, image restoration and so on.
- Maybe Wikipedia:Computer generated content would work as an new umbrella guideline for all computer generated content, briefly giving some principles that would cover machine translation, LLM, autocomplete/correct/suggest/etc., AI image enhancement, and so on, pretty much just saying that the human editor is completely responsible for any content they submit and ensuring it is accurate and that any claims made about it (in the case of ancillary material) are accurate. I can see that this rabbit hole may be deeper than it first appeared to me but I think we do need to confront the broader issues, not just what relates to each of them individually. Then a page like WP:LLM could serve as a specific and detailed reference, if necessary, being updated as necessary to reflect what the umbrella article says. —DIYeditor (talk) 05:58, 6 February 2023 (UTC)
- @Rsjaffe: I've gone ahead with a WP:BOLD draft of an umbrella policy for all Wikipedia:Computer generated content. —DIYeditor (talk) 07:07, 6 February 2023 (UTC)
List of tells?
It's certainly useful to have a list of "tells" for people who are patrolling new pages and draftspace. However, I feel like we might just be providing... basically a detailed how-to guide for putting ChatGPT barf into Wikipedia without detection (per WP:BEANS). It's also occurred to me that we might also want to be sparse on specific details of the GPT output detectors.
Also, the practice of detecting LLM output is a pretty broad subject area that we could write a lot about, so it might do better as its own separate page anyway. What do you all think of this? Pinging @Alalch E.: jp×g 00:34, 28 January 2023 (UTC)
- I was having the same thoughts for the last few hours. Almost all of it is what we've already seen (the "AI" username guy for example), but you are correct that there is a bit of WP:BEANS there. Maybe remove entirely, or leave as invisible text, for the time being. A separate page might be good too. Please do what you think is best. —Alalch E. 00:43, 28 January 2023 (UTC)
- WP:BEANS, so hide it, but the info is useful to some reviewers, e.g., New Page Patrollers. — rsjaffe 🗣️ 00:53, 28 January 2023 (UTC)
- Hmm. This isn't quite the same as COI/UPE tells or sockpuppet tells. I think this is probably more like Google Translate tells or something. I don't think it'd be particularly dangerous to list tells like "ChatGPT tends to use fictitious references". If some evil-doer read that, and wants to go to the effort of changing all the citations to real journal articles instead of fake journal articles... the chances of that seem a bit low to me. –Novem Linguae (talk) 03:30, 28 January 2023 (UTC)
- That's a hilariously bizarre claim, in my opinion. If they edit the text to get rid of the "tells", ie the bad AI parts, that's good. Then they've accomplished exactly what is wanted from us, the people using the text creators to properly change and use the text in an appropriate and well written manner with proper sourcing. We should give as many "tells" as possible to get more people to fix their submissions. SilverserenC 03:38, 28 January 2023 (UTC)
- Well, I will refer to a preserved copy here, User:JPxG/LLM_dungeon/Factors_affect_brain_activity. In this case, all the references were fake, there were no inline citations, and the body text was random unsourced jibber-jabber. I don't think that removing the "Conclusion" paragraph would have made this article usable; it would have just made it harder to detect. I also think that linking to the detector demo at the beginning of that section would cause us some pain: they aren't that robust, and even minor rephrasing can defeat them completely. Note that some of these detectors, like https://contentatscale.ai/ai-content-detector/, are explicitly designed for the purpose of pasting LLM content in and giving you advice on which words to change around to make it less detectable. The sidebar speaks for itself:
- Want undetectable AI content?
- Our platform is the only one of it's kind that allows you to upload up to 100 keywords and get back 100 entire human quality blog posts (title to conclusion) without any human intervention. All the while, bypassing AI detection as it's the most human-like AI content ever produced.
- Our proprietary system uses a mix of 3 AI engines, NLP and semantic analysis algorithms, crawls Google, and parses all the top ranking content to put it all together.
- This isn't an AI writing assistant, this is a human level long-form blog post producing machine!
- Much to think about... jp×g 07:39, 28 January 2023 (UTC)
- Moved to the cleanup template's documentation. —Alalch E. 17:41, 28 January 2023 (UTC)
- One tell I've noticed, though it's unlikely to be there forever, is the use of "In conclusion,..." which ChatGPT insists on appending to all of the Wikipedia articles it tries to write, although actual Wikipedia articles never use this phrase.--Pharos (talk) 18:22, 30 January 2023 (UTC)
- Yeah that's one of the things that was removed from the page. —Alalch E. 21:27, 30 January 2023 (UTC)
Reliability of sources that use LLMs
While this proposal primarily covers editors using LLMs to write article content, one thing we may have not yet considered (not as far as I've observed anyway) is the possibility of editors using citations from sources producing articles generated by LLMs. We may already have a policy about this (I'm not sure where, if we do have one), but should there also be some sort of an acknowledgement of this facet of the issue here? And if an information source is known to be using LLMs to generate content, should we consider them no longer a reliable source? I suppose that'd depend on if the source(s) in question signpost such articles, and how much we trust them to do so accurately. In any case it's something possibly worth thinking about here. silvia (BlankpopsiclesilviaASHs4) (inquire within) 17:11, 28 January 2023 (UTC)
- It depends on how they're using them, as you said. If they're just being used for structural writing and there is still an actual author involved, then there's no issue. If they're using LLMs to write entire articles, then there's a problem. If the source is properly marking those whole articles as being LLM written in some way, then that's fine, we'll just note (such as in WP:RSP) that those particular article series aren't reliable, but the rest of the articles the source puts out are fine. The main problem is if a source is using LLMs to make entire articles and is giving no indication at all which articles those are, with the worst case scenario being if they put actual author names in the bylines, yet the authors weren't actually involved in writing said text. SilverserenC 17:15, 28 January 2023 (UTC)
Perplexity AI and the upcoming Bing with ChatGPT integration
I gave the AI a shot at looking for sources, searching in a way similar to Google, to get an idea of how useful it is for providing sources for citation-needed tags. What I noticed is that the AI-generated answers grabbed information from Wikipedia itself, which is bad; that's why you have to be careful about using Perplexity AI to find sources for Wikipedia articles. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:05, 29 January 2023 (UTC)
- The following string added to queries seems to eliminate most if not all Wikipedia references: -site:wikipedia.org
However, there are other problems. Queries about countries tend to access travel agency website content. And the chatbot seems to be adding citations that have nothing to do with the sentence they are attached to, but they may match the topic generally. Citations may be so old that they are out of context - what was true back then may not be true today. And a great many citations are to sources deemed unreliable by Wikipedia.
On the bright side, the more coverage a subject has in the training data set, and the more general the query, the more likely the references will be valid.
Keep in mind that perplexity.ai is built on top of GPT-3, except that the training data has been expanded to include present day content. In the early days, prompt injection could get perplexity.ai to reveal its underlying GPT-3 prompt, and you could see that the chatbot was instructed by its prompt engineer to reply in a journalistic tone, for instance. They appear to have closed that loophole.
But prompt injection still has the potential to break the chatbot out of its base instructions. With some experimentation with queries, I was able to get perplexity.ai to format its results as a bulleted list (though it uses hyphens instead of bullets, technically making it a hyphenated list), and sometimes you can get it to present a numbered list. Those lists rarely seem to include Wikipedia references. Here's an example of a query that returns a bulleted list: "garden tips, in bullet list format" returned a bulleted (hyphenated) list of tips, and the entries were from different sources.
The performance of perplexity.ai as a natural language-driven search engine is pretty amazing, and will get a lot better as GPT improves. GPT-4 is coming soon, and I hope this chatbot is upgraded to it. — The Transhumanist 08:48, 1 February 2023 (UTC)
- @The Transhumanist: I gave the minus operator a shot. Prepended to a term, it tells Google Search to omit certain things (adding -wikipedia omits anything that mentions Wikipedia), and in conjunction with "site:" it omits anything hosted on a particular website (adding -site:wikipedia.org omits Wikipedia from search results). Here is what I got:
- Prompt: "mid/side peak meter foobar2000 component -site:audio-file.org"
- Result: "The Foobar2000 component for mid/side peak meter is called foo_uie_peakmeter[1][2]. It displays the level of each channel in dB and can be used with Columns UI (foo_uie_peakmeter)[1], or as part of the Foobar2000 plugin repository[3]. However, it does not show clipping[4] and other features such as full unicode support are available on FileForum.com[5]."
- Sources:
- https://wiki.hydrogenaud.io/index.php?title=Foobar2000:Legacy_components
- https://hydrogenaud.io/index.php/topic,61149.0.html
- https://www.foobar2000.org/components
- https://www.head-fi.org/threads/the-foobar2000-help-thread-got-problems-or-questions-ask-here.624628/page-14
- https://fileforum.com/detail/foobar2000/1045605070/1?all_reviews
- While the "-site:spamsite.com" or something like that works, it still produces WP:SYNTH level of garbage where the first sentence mixes up mid/side thing on the prompt with Columns UI peakmeter thing, which hasn't been updated yet nor have an option to visualize mid/side channels instead of usual left/right. So no, the AI-powered conversational search engines can't replace Google Search or any other traditional search engines. 2001:448A:3041:7E63:D3E:5407:F6DD:3DF5 (talk) 00:56, 15 February 2023 (UTC)
More examples of Perplexity AI results
- is it possible to implement the sliding DFT as foobar2000 component to visualize constant-Q transform
- is it possible to add sphere types with different behaviors in luxor 3
— Preceding unsigned comment added by 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:27, 29 January 2023 (UTC)
@Rsjaffe and JPxG: I believe that these inaccuracies can also apply to Bing's ChatGPT integration, even though it hasn't been released yet. Also, some sources in Perplexity AI results are unreliable or even listed on the perennial sources list, as I said before in the ANI thread. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 04:06, 29 January 2023 (UTC)
- I gave it a shot on an article I wrote; results are here. It seems to alternate between quoting directly from the Wikipedia article and quoting random irrelevant stuff about different powder houses. For example, it says
The historic 'Powder House' was built in 1818 as a state gunpowder store and is considered to have historical significance
, with a citation to a WordPress blog talking about a different location entirely (Powder House Island was constructed by a private company in 1881). jp×g 08:40, 29 January 2023 (UTC)
- I guess I can just dump a few prompts and outputs in here:
- There is no evidence that Joe Biden rigged the 2020 election[1][2]. Claims of voter fraud have been widely debunked and rejected by Republican voters in some states[3]. Despite this, more than 40% of Americans still do not believe that Joe Biden legitimately won the election[4]. Twitter has also been accused of interfering in the election by censoring the New York Post's story about Hunter Biden[5].
- (sources from pbs.org, apnews.com, bloomberg.com, theguardian.com and senate.gov)
- Donald Trump's politics has been likened to fascism by many experts within fascism studies[1], and his campaign has been compared to fascist movements[2]. However, there is still debate over whether or not he fits the definition of a fascist[3][4].
- (sources: https://brill.com/view/journals/fasc/10/1/article-p1_1.xml?language=en https://www.washingtonpost.com/outlook/how-fascist-is-president-trump-theres-still-a-formula-for-that/2020/08/21/aa023aca-e2fc-11ea-b69b-64f7b0477ed4_story.html https://www.bu.edu/articles/2022/are-trump-republicans-fascists https://www.vox.com/policy-and-politics/21521958/what-is-fascism-signs-donald-trump )
- Phineas Gage's personality changed after a tamping iron destroyed much of his brain's left frontal lobe[1][2][3][4][5]. He was reported to have permanently lost his inhibitions, leading to inappropriate behavior in social situations[1]. His friends described him as an almost completely different person[4].
- (sources: https://www.theguardian.com/science/blog/2010/nov/05/phineas-gage-head-personality#:~:text=Similarly%2C%20most%20popular%20accounts%20of,behave%20inappropriately%20in%20social%20situations. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1114479 https://www.simplypsychology.org/phineas-gage.html https://www.verywellmind [diggity diggity dot] com/phineas-gage-2795244 https://www.npr.org/sections/health-shots/2017/05/21/528966102/why-brain-scientists-are-still-obsessed-with-the-curious-case-of-phineas-gage )
- Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems[1]. They are capable of utilizing information from both sides of the input, allowing them to better capture sequential dependencies between words and phrases[2]. Bidirectional LSTMs have been used for a variety of tasks, such as translation[3], handwritten recognition[3], protein structure prediction[3], part-of-speech tagging[3], dependency parsing[3], entity extraction[3], and regression problems[4].
- (sources: https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras https://www.baeldung.com/cs/bidirectional-vs-unidirectional-lstm#:~:text=Bidirectional%20LSTM&text=Unlike%20standard%20LSTM%2C%20the%20input,both%20directions%20of%20the%20sequence. https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks https://towardsdatascience.com/lstm-and-bidirectional-lstm-for-regression-4fddf910c655 )
- It seems to me that it is heavily biased towards newspapers and blogs, which works to its advantage for AMPOL subjects, and to its detriment on other things: the Phineas Gage personality change thing is mostly untrue, as our article deftly explains, and while the Bi-LSTM summary is good, it's linking mostly to random WordPress blogs. Note that one of the Phineas Gage "sources" is apparently such puke that it's on the en.wp URL blacklist! jp×g 08:51, 29 January 2023 (UTC)
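As an aside on the Bi-LSTM summary quoted above: the architecture it describes can be sketched in a few lines of Keras. This is a generic illustration, assuming TensorFlow is installed, and is not drawn from any of the cited sources:

```python
# Minimal bidirectional LSTM for sequence classification: the Bidirectional
# wrapper runs one LSTM over the input left-to-right and another right-to-left,
# then concatenates their outputs so each step sees context from both directions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # token ids -> vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),    # forward + backward pass
    tf.keras.layers.Dense(1, activation="sigmoid"),             # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```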
- Even if all the other problems are fixed, ultimately, I believe that all these "AIs", no matter how good they get, will remain problematic for Wikipedia for one fundamental reason: the training of the AI does not use the same reliable source requirements as Wikipedia. It's the old garbage in garbage out issue. — rsjaffe 🗣️ 18:22, 29 January 2023 (UTC)
- I don't think sourcing is the biggest hurdle. The real issue is that any AI used to generate text for Wikipedia will need to be trained specifically for Wikipedia's purposes, including policy-compliance (with regard to NPOV, FRINGE, POVFORK, and especially BLP). The risk is that it'll be good at looking like it follows them, far earlier than it gets good at actually following them. See Tesla and self-driving. When we end up with AIs that spit out full articles, with proper formatting, that only use reliable sources, people will assume that its output is good enough to include, when it may be horribly flawed in hard-to-detect ways. DFlhb (talk) 19:08, 29 January 2023 (UTC)
- I agree with both DFlhb and Rsjaffe regarding ML-generated content and AI alignment issues: ChatGPT and other LLMs don't know Wikipedia's policies and guidelines well, so I see why there should be a guideline or even a policy about AI-generated content. BTW, I put another example of Perplexity AI output into the sandbox, where the last two "sources" are junk or even something that could be blacklisted if ever actively used on Wikipedia (though I do not support pre-emptive blacklisting of junk websites). 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 02:28, 30 January 2023 (UTC)
ChatGPT has opinions about its use for Wikipedia articles
I had a brief simple conversation with ChatGPT that I thought would be interesting to see. My prompt has a bullet and the reply by ChatGPT is indented.
- Are any Wikipedia articles written by ChatGPT?
- No, ChatGPT is not capable of writing articles on Wikipedia. It can only generate text based on the input it receives and its training data.
- What should I do with a Wikipedia article someone wrote using ChatGPT?
- Articles generated by language models like ChatGPT are not suitable for use on Wikipedia as they are not written by humans and lack context and accuracy. If you have come across such an article on Wikipedia, you should report it to the Wikipedia community for review and potential removal, as Wikipedia has strict policies regarding the sources and reliability of the information on the platform.
- Write me a Wikipedia article on nematodes and include references.
ChatGPT goes ahead and writes an article even though it warned me in other questions that it is not usable for Wikipedia
Nematodes, also known as roundworms, are a group of parasitic and free-living worms that are among the most abundant multicellular organisms on Earth. They are found in nearly every habitat, including soil, freshwater, and marine environments, as well as in the tissues of plants and animals. Nematodes are elongated, cylindrical worms that typically range from 0.5 to 2 millimeters in length, although some species can grow to be several meters long. They have a simple digestive system, nervous system, and reproductive system, and move using a characteristic undulating motion. Nematodes play important roles in various ecological and agricultural systems. Some species are beneficial, serving as decomposers of organic matter and as predators of other soil-dwelling organisms, while others are harmful pests that can cause damage to crops and other plants. For example, the root-knot nematode is a major pest of crops such as cotton, tomatoes, and peanuts, while the soybean cyst nematode is a destructive parasite of soybean crops. In medicine, nematodes are important parasites of humans and animals, causing diseases such as elephantiasis and river blindness. They can also be used as experimental organisms in biological research, as they have simple anatomy and physiology, and can be easily cultured and manipulated in the laboratory. References: M. W. Adams, “Nematoda,” in The Biology of Nematodes, edited by D. L. Lee et al. (Taylor & Francis, 1998), pp. 3–21. J. R. Grewal, “Nematode Parasites of Crops,” Annual Review of Phytopathology, vol. 48, no. 1 (2010), pp. 365–390. J. B. Bloomquist and J. R. Grewal, “Entomopathogenic Nematodes,” Annual Review of Entomology, vol. 55 (2010), pp. 271–292.
— rsjaffe 🗣️ 02:59, 30 January 2023 (UTC)
- I haven't used ChatGPT yet in my entire life, but it is interesting nonetheless. The ChatGPT-generated ones are superficially encyclopedic, with "In conclusion" in the last sentence, whereas the actual WP article about the same subject has no such thing. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:29, 30 January 2023 (UTC)
- (Note) Edit conflict. I edited the article while you replied to include references, and then ChatGPT omitted the "In conclusion," section you saw earlier. — rsjaffe 🗣️ 03:32, 30 January 2023 (UTC)
- Also note that the references are fictitious, though there is an article "Entomopathogenic Nematodes" by Welch in the 10th volume of the Annual Review of Entomology. — rsjaffe 🗣️ 03:41, 30 January 2023 (UTC)
- That's why you should be sure to check the references section of AI-generated gibberish for fictitious references. Editing your prompt so that ChatGPT includes refs supposedly removes the rubbish "In conclusion" section and adds "references", but those are probably fake, and it doesn't change the fact that large language models can't be used to write articles from scratch without human intervention. 2001:448A:3046:34F2:B825:A231:DC69:F03F (talk) 03:47, 30 January 2023 (UTC)
@Rsjaffe: Interesting what Chat-GPT had to say about it writing articles for Wikipedia. If you ask it for its sources on that opinion, will it provide them to you? — The Transhumanist 05:10, 30 January 2023 (UTC)
- This is what it said
My knowledge and statement about the existence of Wikipedia articles written by ChatGPT is based on my training data and the information provided by OpenAI, the organization responsible for developing the language model technology that I am based on. OpenAI has publicly discussed the capabilities of their language models, including GPT-3, and how they can be used to generate text that resembles various types of written content, such as articles, poems, and answers to questions. However, I do not have any specific references or sources to cite regarding the writing of Wikipedia articles by GPT models.
— rsjaffe 🗣️ 05:22, 30 January 2023 (UTC)
These responses read like they were hard-coded by ChatGPT's developers. If you troll it with inappropriate or unethical inputs, the output switches to this type of boilerplate lecture tone. I've even seen it happen in real time: one day it gladly wrote a press release defending a murderer, the next it refused, along with an explanation of why it couldn't. But of course the model doesn't "know" any of this; in this case it appears the devs programmed it to explain why it's unsuitable for Wikipedia articles but not to actually refuse to write them. And if you think about it, there's likely very little content about ChatGPT's own capabilities in its pre-2021 training data, since these things weren't really being discussed until after it was released. Any self-awareness would have had to have been hard-coded. It's important not to let this give us a false sense of confidence that it will refuse to write anything beyond its abilities, since the model itself doesn't actually know what its limits are. –dlthewave ☎ 20:11, 30 January 2023 (UTC)
- Here's a repeat of the questions. Not hardcoded per se, but very similar.
- Are any Wikipedia articles written by ChatGPT?
- No, Wikipedia articles are not written by AI models like ChatGPT. They are written and edited by volunteers who follow Wikipedia's guidelines and policies. AI models like ChatGPT can assist with generating text, but the final content must be reviewed and approved by humans.
- What should I do with a Wikipedia article someone wrote using ChatGPT?
- If you find a Wikipedia article that you suspect was written using an AI model like ChatGPT, you should report it to the Wikipedia community as it violates their policies. Wikipedia requires that all articles be written by human editors and must not be generated by automated means, including AI models like ChatGPT. The article will likely be flagged for review and possibly deleted. If you would like to contribute to Wikipedia, you can learn how to become a Wikipedia editor and write articles in accordance with their guidelines and policies.
- Note that ChatGPT has a limited memory of conversations (I believe the previous 4,000 words) and flushes it in between sessions, so this is from a fresh session. — rsjaffe 🗣️ 22:09, 30 January 2023 (UTC)
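To illustrate the "limited memory" point in client-side terms, here is a rough sketch. The generate() callable is a hypothetical stand-in for whatever completion API is used, and the word-based budget only approximates the real token limit:

```python
# Sketch of why a chatbot "forgets": the model itself is stateless, so the
# client resends recent turns with every request and drops the oldest turns
# once a context budget is exceeded. Nothing persists between sessions.
CONTEXT_BUDGET_WORDS = 4000  # rough figure; real limits are counted in tokens

def build_prompt(history: list[str], new_message: str) -> str:
    turns = history + [f"User: {new_message}"]
    while sum(len(t.split()) for t in turns) > CONTEXT_BUDGET_WORDS and len(turns) > 1:
        turns.pop(0)  # oldest turn falls out of the model's "memory"
    return "\n".join(turns) + "\nAssistant:"

def chat(history: list[str], new_message: str, generate) -> str:
    reply = generate(build_prompt(history, new_message))  # hypothetical model call
    history += [f"User: {new_message}", f"Assistant: {reply}"]
    return reply
```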
Contradiction?
This proposed policy seems all fine and good... until, hypothetically, Wikipedia produces its own LLM for use. This may not be as far off as it seems, as the Abstract Wikipedia project is considering adopting some kind of natural language generation to generate articles. CactiStaccingCrane 10:44, 30 January 2023 (UTC)
- The chance that "Abstract Wikipedia" articles will be allowed on enwiki is next-to-zero. After some bad experiments and experiences with bot-created articles, this was disallowed a long time ago. And those were normally using reliable sources (though databases only). To accept articles created from Wikidata, which is not a reliable source at all, is extremely unlikely, even ignoring the quality or lack thereof of the actual language model (which is far from being usable at the moment). The Wikifunctions Beta was supposed to launch in 2022, but no trace of it seems to exist. The page Abstract Wikipedia/Overview succeeds in not using the word "Abstract" even once, so in this case a chatbot would probably have produced a better result. The page on the Function Model has seen one update in 2022, and it isn't really an overwhelming one. Turns out there is a Wikifunctions Alpha after all, though rather hard to find as it isn't listed on the main pages: they have e.g. a "smart paradigm" for English plurals[12], which knows that the plural of "sheep" is "sheeps" and the plural of "human" is "humen". Oh, and the plural of "bus" is "buss". Somehow, I don't think we need to worry about the generation of actual articles anytime soon... Fram (talk) 13:56, 30 January 2023 (UTC)
If you want a job doing properly...
The current draft and this discussion are already too long and so violate WP:CREEP. To get a more coherent draft, I naturally asked ChatGPT:
Q. Please write a policy for the use of large language models on Wikipedia
A.
Wikipedia Policy for the Use of Large Language Models
Purpose: To ensure the accuracy and impartiality of information on Wikipedia through responsible use of large language models.
Definitions: A "large language model" refers to an artificial intelligence model with the capacity to generate human-like text.
Use of Large Language Models:
- a. Large language models may be used to improve the accuracy and completeness of Wikipedia articles, but may not be used to generate new articles.
- b. Any contributions made by a large language model must be reviewed and approved by a human editor.
- c. The use of large language models must be disclosed on the talk page of the article.
- d. Large language models must not be used to engage in disruptive behavior, such as vandalism or propaganda.
Responsibility:
- a. Users employing large language models are responsible for ensuring compliance with this policy.
- b. The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models.
Enforcement: Non-compliance with this policy may result in a range of actions, including but not limited to: warning, temporary or permanent ban from editing, or removal of contributions.
Amendment: This policy may be amended by the Wikimedia Foundation or its designated representatives at any time.
This seems as sensible as any of the human suggestions and it's more succinct. Job done. See also WP:LIGHTBULB.
Andrew🐉(talk) 11:34, 30 January 2023 (UTC)
- I agree that it's too long; the TOC in particular stands out as unappealing due to long headings. The copyright section too seems like bunk to me; if the output is neither considered copyrighted by OpenAI, nor by American law (yet), then what's the point in speculating?
- We could have one section on using LLMs to generate article contents (which would cover bullshit sourcing, original research, and other issues, and prominently discourage their use for that purpose), and have a second section on other use for LLMs (with guidance). Regardless, we should simplify both the structure and prose, so it's easier to read. DFlhb (talk) 12:42, 30 January 2023 (UTC)
- The current first section serves to explain the inherent deficiencies from the standpoint of policy and the second section explains that editors who want to use LLMs need to have relevant experience doing unassisted edits to be able to overcome these deficiencies, so that their edits could be entirely constructive. I'm opposed to any fundamental rewrite whereby this message would be lost, because my view of what the policy should be is precisely that. —Alalch E. 13:32, 30 January 2023 (UTC)
- I agree, but still think it's overly complex. The litmus test for policies and guidelines is whether they are easy to skim; if not, they're too complicated to be enforced. I'll reword it, in one edit so both revisions can easily be opened side-by-side and compared. DFlhb (talk) 14:07, 30 January 2023 (UTC)
Andrew Davidson, could you explain why "The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models." is included? I'm not sure that this type of disclaimer is correct or necessary. –dlthewave ☎ 14:46, 30 January 2023 (UTC)
- I asked and the response was
- Q. Why did you say that the Wikimedia Foundation is not responsible for the content generated by large language models?
- A. The Wikimedia Foundation, which operates Wikipedia, is not responsible for the content generated by large language models because it is generated by artificial intelligence, not by human editors employed or affiliated with the Foundation. The Foundation simply provides the platform for editing and hosting Wikipedia, and users are responsible for their own contributions, including those made with the assistance of large language models. By stating that the Foundation is not responsible for the content generated by large language models, the policy clarifies the division of responsibility and helps to protect the Foundation from potential legal or reputational liability.
- Thanks for clarifying. I think this is somewhat self evident and doesn't need to be enshrined in our policy, especially if the goal is to keep it short and simple. The WMF should be the ones making that disclaimer if they feel that it is necessary for legal protection. –dlthewave ☎ 15:58, 30 January 2023 (UTC)
- Essentially, the entire policy as it stands now is functionally identical in terms of what it allows and forbids (with a few additions) to what it was when I made the first version. The core of it is this:
- "Since is against the rules to break the rules, it is also against the rules to break the rules when using LLM output."
- This proved to be quite controversial, with many demands made that this page explain in detail every way in which it could be possible to break the rules with LLMs, and specifically proscribe them. As a result, the page is now quite long, although I think it does a very good job of giving an overview and reasonable interpretation of PAGs as they apply to LLM usage. I don't know if it would be possible to condense it without making it worse in this regard. jp×g 00:43, 2 February 2023 (UTC)
Jimbo's position
I asked Jimbo for the WMF's position on constructive and destructive uses of LLMs, and the availability of relevant technical tools for each case. Below is his complete answer, with minor layout adjustments:
Great - I can't speak for the Foundation at all but I think it's safe to say that in many ways the WMF staff and board are just like everyone else in this community - very interested in the possibilities of constructive roles here, and worried about the risks as well. I suppose what I am saying is that I don't think the WMF *has* a full position yet, nor would I expect them to!
It looks like the conversation there is a good one and people are learning.
Now, I can't speak for the Foundation but I can speak for myself. I'll only speak at the moment about a few positive ideas that I have rather than go into details about the negatives which are huge and which can be summed up pretty easily with "ChatGPT and similar models make stuff up out of thin air which is horrible".
If you go back in the archives here on my talk page (don't bother, as I'll explain enough) there was a discussion about a proposed article that hadn't made it through a new page review. In response to an inquiry about it, I opened up a newspaper archive website (that I pay for personally) and quickly found 10-15 decent sources which could have been used to improve the article. I skimmed each of them just to figure out if I thought the subject was notable or not. I passed along the sources (but they aren't that useful to anyone who doesn't subscribe to a newspaper archive website) because I didn't have time to actually read them carefully enough to improve the original stub article.
Now, ChatGPT does not have the ability to follow a URL. Also, the archives are in jpeg format, so ChatGPT would not be able to read a download of it, and I don't have any easy way to do image-to-text. (It would be faster to just read and write the articles in this case). But imagine that those minor technical limitations were removed in some way. Suppose I could say: "Hey, ChatGPT, here's a Wikipedia stub. And here are 15 links to sources that I, an experienced Wikipedian, judge to be relevant. Please read these articles and add facts from them to the article, adhering to Wikipedia policies and writing in a typical Wikipedia style. Don't make anything up that isn't clearly in the articles."
That doesn't strike me as a super far-fetched use of this technology. It would then require me to read the output, check that nothing was made up out of thin air, and to make sure it wasn't getting it wrong in some other way. But I suspect this would be a productivity boost for us. And if not today, then in 3 years? 5 years?
I can think of similar use cases. "Here's a Wikipedia entry. Follow all the links to sources and read them. Find sentences in this entry which are in disagreement with what the sources say, if any." "Here's a paragraph from Wikipedia. Someone has complained that the article introduces a subtle bias not found in the original sources. Check the sources and rewrite the article to more closely comply with NPOV policies."
In each case don't imagine some automatic result, just think about whether this might be useful to good editors in at least some cases. It's hard to see that it wouldn't be.
--Jimbo Wales (talk) 13:45, 30 January 2023 (UTC)
François Robere (talk) 14:02, 30 January 2023 (UTC)
- Jimbo perfectly illustrates the potential benefits of these technologies. There's information "out there"; the issue is finding it, collating it, and analyzing it. Wouldn't it be great if an AI could give us a list of all scholarly papers on each side of an issue, so we could properly assess what the consensus is? Or quickly compile lists of how reliable sources describe an event, so we know: 116 call it "X", but 243 call it "Y", instead of having to take hours manually surveying them to find the common name? Or if it could go look for reliable sources that contradict article contents, that may be difficult to find, to improve our verifiability? We're not there yet, but once current technical flaws are addressed, these models will be a game-changer for Wikipedia.
- From a WMF perspective, I expect the first mass-scale uses of LLM-like models won't be to generate article contents, but to make Wikignomes obsolete, and automate boring maintenance tasks. And the very next mass-use case will be to add "AI alerts" on talk pages, when a model detects reliable sources that contradict what we have. DFlhb (talk) 14:20, 30 January 2023 (UTC)
Simplification of policy language
While I believe it is important to have readable succinct policies, in this case, where the rationale for the policy may not be readily apparent to the LLM-naive user, I’d like to see a secondary page to the policy that discusses the rationale, preserving, in spirit, some of the text deleted from the policy itself. — rsjaffe 🗣️ 17:10, 30 January 2023 (UTC)
- This article should still make clear to everyone why it exists (likely in the lead); I'm a poor judge of whether it's currently good at that, but if it isn't, that should likely be fixed here. DFlhb (talk) 18:07, 30 January 2023 (UTC)
- I'm probably a poor judge as well, given how deeply I've dived into this in the past few days. Perhaps get a few test readers naïve to the subject before this is finalized? — rsjaffe 🗣️ 18:18, 30 January 2023 (UTC)
- Agreed that the draft was too discursive. The examples and commentary belong in an essay or informative page. I think it is better now. Ovinus (alt) (talk) 19:17, 30 January 2023 (UTC)
- I really like the simplicity of Wikipedia:Using neural network language models on Wikipedia. We could try cutting a lot of cruft out of our "Using LLMs" section, with just a few lines of prose at the beginning, a dozen bullet points (numbered), and a few more lines of prose at the end to mention the appropriate templates. DFlhb (talk) 13:53, 31 January 2023 (UTC)
- Courtesy ping User:CactiStaccingCrane DFlhb (talk) 13:58, 31 January 2023 (UTC)
Section: Citing LLM-generated content
It's fine if we leave it here for now while it's being worked on, since this page is receiving strong attention, but wouldn't this primarily belong in WP:RS? Leaving it here for now, uncommented out, since the risk of there being "no consensus for inclusion" at WP:RS is still likely too high. DFlhb (talk) 17:13, 30 January 2023 (UTC)
Definition list
Regarding this edit: for this to be a definition list, it would have to define plagiarism, verifiability, neutral point of view, and no original research. The text does not do this; it describes considerations for these topics with respect to large language models. I appreciate there is not a lot of good usage of definition lists out there to point to. Nonetheless, definition lists are only semantically appropriate for things like glossaries. isaacl (talk) 22:16, 30 January 2023 (UTC)
- I think it's pretty clear that it's about defining LLM risks and pitfalls: with regard to copyrights; with regard to verifiability etc., but okay I'll convert to bullets then. Putting boldface (to what remained a definition list) didn't do anything, it produces no visible change as far as I'm aware. —Alalch E. 22:27, 30 January 2023 (UTC)
- Not using semicolons means the output HTML no longer used the description term and description details elements, and so user agents (such as assistive readers) won't assume definition list semantics. isaacl (talk) 22:41, 30 January 2023 (UTC)
- You're right, I didn't see well that you had removed the semicolons. But, basically, these still aren't pseudo-headings. —Alalch E. 22:47, 30 January 2023 (UTC)
- Definition lists have a very narrow meaning in the spec which limits their use. People writing HTML or generating HTML output misused them for their default visual display (witness the use of colons for unbulleted lists on Wikipedia; these aren't semantically correct, either, but we're kind of stuck with them). Now that we have CSS to style elements as we like, and ways to specify roles on elements, I don't think there's any chance of the spec expanding the meaning of definition lists. isaacl (talk) 22:56, 30 January 2023 (UTC)
- I think we're good now. —Alalch E. 22:58, 30 January 2023 (UTC)
- It frankly looks messier than it did before; is there a way to comply with accessibility requirements, while still making it look like it did before? DFlhb (talk) 08:04, 31 January 2023 (UTC)
- Personally, I prefer inline headings as a more compact way of laying out the text, and it matches the use of inline headings within lists further down on the page. However to answer the question, I think {{Indented plainlist}} could be used, with an explicit line break after each heading phrase, to indent the subsequent text of each item. isaacl (talk) 16:37, 31 January 2023 (UTC)
- Good suggestion, I formatted it as an indented plainlist. —Alalch E. 18:15, 31 January 2023 (UTC)
Plagiarism
Regarding this edit: I think we should be careful not to mix up plagiarism and copyright violation. Plagiarism is an academic crime when one fails to acknowledge where an idea came from. One can still violate copyright while avoiding plagiarism. The two concepts have some overlap when text is licensed for reuse with a requirement to cite the original source, but we should be careful not to use the two terms interchangeably. isaacl (talk) 22:22, 30 January 2023 (UTC)
- I think we should be talking about how unchecked LLM usage fails our copyright policy there, in the first listed item. —Alalch E. 22:24, 30 January 2023 (UTC)
- If that's the consensus view, then I think we shouldn't refer to plagiarism, but just copyright violation or copyright licence violation. isaacl (talk) 22:32, 30 January 2023 (UTC)
- Agreed. —Alalch E. 22:33, 30 January 2023 (UTC)
- Makes sense to me. DFlhb (talk) 22:36, 30 January 2023 (UTC)
Tone
I feel like this page is a bit harsh on the usage of LLMs; if they generate *perfect* text, it should be okay to copy it verbatim, especially if you're an experienced editor. Thoughts welcome! EpicPupper (talk) 03:58, 31 January 2023 (UTC)
- As long as we make it clear that the editor is fully responsible for their use of LLMs (including re: plagiarism, original research, and especially things like BLP violations), I think we should allow some leeway for use. LLMs are tools, and if used well, they can benefit the encyclopedia. For example, if Meta's Side AI gives bad citations 90% of the time, but great ones 10% of the time, and I only add the 10% of good ones (after checking), that's strictly a benefit.
- It'll be a tough balance to strike, between discouraging inexperienced editors who don't fully understand these tools' limitations, and allowing experienced editors who know what they're doing to save some time and use LLMs to do things they (the editor) fully endorse. We should probably avoid being too prescriptive, and let the policy "evolve" after being passed in reaction to specific incidents, rather than overreact out of the gate. DFlhb (talk) 07:42, 31 January 2023 (UTC)
- Well, we’re already having incidents, including inclusion of incorrect references. — rsjaffe 🗣️ 11:45, 31 January 2023 (UTC)
- True; I'm trying to walk that rope by clarifying the acceptable use requirements ("rigorous scrutiny") rather than by weakening the admonishments. My idea is to avoid weakening this draft, while still giving space for experienced editors to use their best judgment, and somewhat preserve the "early-2000s Wikipedia" spirit of thoughtful experimentation/boldness. DFlhb (talk) 12:13, 31 January 2023 (UTC)
- Fully agree with the potential risks and limitations, thanks for your input! I think some leeway is important. EpicPupper (talk) 16:17, 31 January 2023 (UTC)
- Yes, the rigorous scrutiny standard is needed, which is higher than the scrutiny you’d use for a reliable source, as the “AIs” are like unreliable narrators. — rsjaffe 🗣️ 16:24, 31 January 2023 (UTC)
- This was my original thinking, but there is substantial desire (and perhaps substantial need) for language that thoroughly reinforces basic principles like "do not copy-paste random text into articles without checking it". It seems that some people tend to think of LLMs as magical boxes that generate true and useful content with no modification or verification necessary (i.e. there have been a number of drafts CSD'd for all of their references being fictional). In this case, I think it is probably beneficial to have a detailed policy that urges caution. And, on a more practical level: if someone cannot be bothered to read a whole guideline before going wild with the LLM, do we really think they're going to be diligent enough to check its output? jp×g 01:26, 2 February 2023 (UTC)
"Migrated" discussion from Village Pump
While a "migrated discussion" close was made on the discussion at the Village Pump, it doesn't appear anything regarding it was then discussed over here. Nor does it seem to have been noted that both polls conducted there ([13], [14]), one on a blanket ban that would make this page a policy and one on banning the usage of LLMs on talk pages, very much did not appear to be supported by the community from the looks of things. I certainly hope people on this talk page above aren't going to ignore the outcomes of those discussions. SilverserenC 23:28, 31 January 2023 (UTC)
- That close was extremely unfortunate, since that page is much more high-profile than this one, and the discussion was vigorous, and far from running out of steam. It should be re-opened and allowed to run for a while. Any RFC on adoption will be held at VPP, so consensus around the big issues (like having a blanket ban or not) should be formed there as well. We need solid consensus on all key points, prior to submitting for adoption. This proposal should merely enshrine the still-to-be-formed consensus, per WP:GUIDANCE, otherwise it risks receiving dozens/hundreds of edits during the VPP adoption RFC as the editors over there react in horror to whatever part they oppose, and we'll end up with a mess. DFlhb (talk) 09:37, 1 February 2023 (UTC)
- @Silver seren Update: I've reopened the WP:VPP discussion, so everything is centralized in one, high-profile, place, and so the discussion can run its course. DFlhb (talk) 13:47, 1 February 2023 (UTC)
- I've closed the thread because it was becoming extremely cumbersome to understand what people were discussing, which significantly hinders consensus formation. I also don't feel that continuing the RfC at this time is helpful, as most of the broad strokes about LLMs are already being covered. IMO, we should break up the discussion to individual WikiProject pages, not just those participating at the Village Pump thread, to make other editors informed about LLMs' implications for their work. Maybe a few months later, when the hype has gone down, a different RfC with concrete plans to address LLMs can be made, with the participants being much more informed than they are right now. CactiStaccingCrane 13:52, 1 February 2023 (UTC)
- To be clear, I've no objections to the reopening of the thread, but I doubt that the discussion would result in actionable proposals. CactiStaccingCrane 13:52, 1 February 2023 (UTC)
- That may be; If momentum doesn't pick back up, we can close it again in a few days. It may be better to avoid holding a hypothetical well-formed RFC on whether LLMs can be used to generate text for Wikipedia, and instead wait a few months to see how things unfold at the CSD discussion, and see how the community responds to future instances of LLM use. Then we can just update this draft to reflect those emerging "best practices". DFlhb (talk) 14:04, 1 February 2023 (UTC)
- I have to say: the VPP discussion is a mess. It is massive, confusing, and split into an unreasonable amount of subsections (most of which seem to have hit dead ends weeks ago). While there is some conversation going on there, I think most of it is irrelevant to the practical use of the tools covered by this policy proposal. jp×g 00:35, 2 February 2023 (UTC)
- @Silver seren: I agree that the conversations at VPP and the conversations here are not quite based on the same subjects. However, I disagree that there were open proposals for "this page being a policy and enacting a ban on usage of LLMs on talk pages". Neither of the proposals were with regard to this page (although it was referred to by people commenting on both). Those proposals were both for totally different things: the "Crystallize" section proposed that "such chatbot generated content is not allowed in Wikipedia", and the "Blanket" section proposed a "blanket ban on LLM content on Talk page discussions". Neither of those would be consistent with what's currently at WP:LLM, which attempts to thread the needle on permitting LLM output while preventing a tsunami of piss. jp×g 00:20, 2 February 2023 (UTC)
Article expansion and feedback
Are there any examples of LLMs successfully being used for "Generating ideas for article expansion" and "Asking an LLM for feedback on an existing article" in the Positive uses section? When I tried this out with a few short geography articles, the output was the same "plausible-sounding nonsense" that we've seen with article generation: mentioning outdated population figures for a place with no listed population; miscounting the number of references; suggesting things that we don't normally include such as a Conclusion section. And analyzing an entire article is useless with ChatGPT's current length limits. Unless there's a valid way to do this that I'm not seeing, I suggest moving these to Riskier Use Cases. –dlthewave ☎ 16:44, 1 February 2023 (UTC)
- I agree that many applications are very much hit-or-miss. The output can be really useful at times but it may also miss the mark by a lot. I think "Riskier Use Cases" fits this quite well, especially for non-trivial tasks. Phlsph7 (talk) 18:57, 1 February 2023 (UTC)
- More like "Theoretical Use Cases" really. silvia (BlankpopsiclesilviaASHs4) (inquire within) 19:19, 1 February 2023 (UTC)
- @Dlthewave: I gave it a spin at User:JPxG/LLM_demonstration#Recommendations_for_article_improvement_or_deletion_(Qarah_Daghli), and to some extent at User:JPxG/LLM_demonstration#Identification_and_tagging_of_unreferenced_statements_(KBVA). I can probably come up with a few others. jp×g 23:09, 1 February 2023 (UTC)
- I have put some more up at User:JPxG/LLM_demonstration_2. jp×g 00:06, 2 February 2023 (UTC)
Copyright split
For the record: I have split out some of the lengthy explanation of copyright issues to Wikipedia:Large language models and copyright and linked to it from the "Copyright" section. jp×g 01:37, 2 February 2023 (UTC)
- @JPxG, the blue check in the mbox there might misleadingly imply that the page is a guideline :) EpicPupper (talk) 23:27, 4 February 2023 (UTC)
- Hmm, I thought Elon Musk abolished those ;^) I will try and do something about it. jp×g 23:50, 4 February 2023 (UTC)
- I assume this is supposed to be an explanatory essay. In that case, shouldn't we use "{{supplement |interprets=[[Wikipedia:Large language models]] page}}" as the header? Phlsph7 (talk) 06:26, 5 February 2023 (UTC)
Perplexity.AI alternative
Based on my research (toying around with the AI), elicit.org is much superior to perplexity.ai in that it only searches research papers and summarizes the sources using GPT-3. The website does not do any original synthesis like perplexity.ai does; it merely summarizes the abstract in one or two sentences. And to top it all off, the website is governed by a 501(c)(3) organization and is transparent about its work (see https://elicit.org/faq). I think we have a lot to learn from the website about how to use LLMs, how to integrate them into our work, and how to align LLMs to do what we want. CactiStaccingCrane 16:29, 2 February 2023 (UTC)
- Well, I asked Elicit "What are the characteristics of Sabethes Cyaneus?" and it summarized one reference as "Sabethes cyaneus is a species of frog". Unfortunately, Sabethes cyaneus is a mosquito. — rsjaffe 🗣️ 19:50, 2 February 2023 (UTC)
- I wonder why this site requires signing up for an account to use its AI search, given that there are imperfections in every machine learning technology (including large language models like GPT-3 and ChatGPT) that sometimes cause the model to provide false information. To be honest, no machine learning tech is perfect, as with video games and other software. 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1 (talk) 05:41, 3 February 2023 (UTC)
Chatbots, AI search engines, etc.
App types powered by LLMs, such as chatbots and AI search engines, are not mentioned anywhere in the policy draft. I knew what a chatbot was long before I knew what an LLM was. I used AI search engines long before I knew they were powered by LLMs. "Large language model" is a pretty obscure term. Relying solely on that in the policy would be setting up a trap for those who don't know what it is, even though they are using one unbeknownst to themselves, or who have heard of them, but don't know that one powers a type of app they are using. — The Transhumanist 23:27, 2 February 2023 (UTC)
- I've set up redirects from WP:CHATBOT and WP:CHATBOTS, although I think "chatbot" is a very misleading word to use for these models (our article on the subject, for example, talks almost entirely about simple programs like ELIZA and Markov chains, mentions neural networks only briefly, and does so with very simple models that are about a decade out of date). jp×g 23:07, 3 February 2023 (UTC)
- @JPxG: Right now, most of the use is of chatbots and AI search engines, not LLMs directly. So the policy should mention chatbots and AI search engines. It should probably also cover their idiosyncrasies. For example, perplexity.ai sometimes erroneously answers yes-or-no questions with "no" because it couldn't find a "yes" answer in the five pages it looked at.
That the chatbot article is out of date is irrelevant. The person using ChatGPT isn't going to be thinking about Eliza as the representative example of a chatbot, as they have a much better rendition at their fingertips.
Good job on the redirects. — The Transhumanist 11:30, 9 February 2023 (UTC)
- @JPxG: Right now, most of the use is of chatbots and AI search engines, not LLMs directly. So, the policy should mention chatbots and AI search engines. It should probably also cover their idiosyncracies. For example, perplexity.ai sometimes answers yes or no questions erroneously no, because it couldn't find a yes answer in the 5 pages it looked at.
- That needs to be expanded, because the problem with AI-generated content is not the algorithm but the output (key point: this applies to non-LLM algorithms too, which could be equally damaging). Perplexity AI, a "conversational search engine", could be misused by inexperienced editors who don't know that the perennial sources list and/or other policies and guidelines even exist, because of a fundamental problem with machine learning applications: they are not trained to comply with Wikipedia's policies. It's like wearing a hooded raincoat in lieu of a hazmat suit to work with hazardous chemicals in clear weather. 2001:448A:304A:3A2A:F87F:AE94:6B45:64E1 (talk) 11:56, 3 February 2023 (UTC)
LLMs on talk pages
I mentioned this in the village pump, but while I am generally not pro-LLM -- IMO, none of the "riskier use cases" should go anywhere near Wikipedia -- I do not think it is worthwhile or feasible to disallow LLMs on talk pages or projectspace. Communicating opinion is a far better and less risky use case for LLMs than communicating facts. "Wikipedia editors want to interact with other humans, not with large language models" is sentimental, but ultimately meaningless -- LLMs do not spontaneously post on talk pages. It is still a human using a tool. And the line between a human whose tool is an LLM and a human whose tool is predictive text, editing tools like Grammarly, or the like is not clean and will get blurrier by the day as companies incorporate LLMs into their writing/editing tools to chase that AI gold. There is a near-certain chance that this recommendation will already be obsolete by the time this policy goes live, and a pretty good chance that in a couple of years, if not sooner, LLMs will be so commonplace that disclosing their use would be about as feasible as disclosing the use of spellcheck. (An example: as of literally today, Microsoft has released a tool to use OpenAI for sales email writing, and is reportedly considering integrating it into Word.) Gnomingstuff (talk) 02:02, 3 February 2023 (UTC)
Terms of use for programs generating output
Regarding the passage in Wikipedia:Large language models and copyright that "...there are circumstances under which the terms and conditions of an API may cause a company to restrict continued access to the model based on adherence to certain criteria...", note this is also true for initial access to the model. Thus while the sentence from the preceding paragraph is true, "Companies ... do not automatically hold a claim to copyright on all works produced using their products," they can make copyright claims as part of the terms of use, and thus impose licensing terms for use of the output. isaacl (talk) 17:49, 4 February 2023 (UTC)
Draft of umbrella policy for all Wikipedia:Computer generated content
Since I have been harping on the idea that this needs a comprehensive umbrella policy, an idea that has garnered some support from others but not enough traction to change the trajectory of the policy on this page, I've gone ahead with a WP:BOLD draft which everyone is invited to contribute to and critique. —DIYeditor (talk) 07:09, 6 February 2023 (UTC)
- I may be missing something, but I am not quite clear on what the difference is between that draft and this one. jp×g 07:27, 6 February 2023 (UTC)
- It is an umbrella for all computer-generated content, from images to language and text that are not generated with a "large language model" as such. It seemed that you and some others were determined to have this be particularly about "LLM" and not any other kind of language model or algorithm, and not about images or audio or anything else. I've also included the topic of human-designed algorithms vs. machine learning. —DIYeditor (talk) 07:31, 6 February 2023 (UTC)
- In other words since I have proposed an umbrella policy instead of a specific one I thought I would go ahead and demonstrate exactly what I mean. —DIYeditor (talk) 07:33, 6 February 2023 (UTC)
- I think in principle, the idea of having a general policy on all forms of computer-generated content and a more specific policy on LLMs is good. But with the rapidly increasing popularity of LLMs, the specific policy is clearly the more pressing issue. Phlsph7 (talk) 08:21, 6 February 2023 (UTC)
ChatGPT has been integrated into Microsoft's Bing search
As of 8 February, ChatGPT is now part of Bing, and this makes it very easy for those interested to test its capabilities. For example, the prompt "What is Wikipedia's policy on paid editing" (and similar questions) gives sensible results in a chat format. The prompt "Does Bing use ChatGPT?" gives a simple "Yes" (with links to relevant citations). Mike Turnbull (talk) 14:36, 10 February 2023 (UTC)
Example of editor using ChatGPT to add to an article
If anyone is looking for an example of an editor using ChatGPT to add content to an article, look at the recent history of Assabet Valley Regional Technical High School. I reverted those additions as ChatGPT is not a reliable source. ElKevbo (talk) 00:01, 15 February 2023 (UTC)