Talk:Reflection (artificial intelligence)


How you can contribute

  • Would it be better to shorten or supplement the content transferred from Prompt engineering?
  • The history section could be expanded, ideally incorporating the achievements of external teams (both foundational developments and agent-based systems like DeepClaude).
  • A list of benchmarks and their results should be included.

TheTeslak (talk) 11:04, 5 February 2025 (UTC)[reply]

Thanks for leading this. As of now, it needs more references (e.g. the Coconut and R1 papers) and a mention of reinforcement learning applied to reasoning steps. Hplotter (talk) 16:34, 5 February 2025 (UTC)[reply]
The "Techniques" section was copied from the article prompt engineering. So it's mostly relevant for how humans should prompt LLMs, rather than for how to train LLMs for reflection. This part seems to need improvement. Alenoach (talk) 02:02, 6 February 2025 (UTC)[reply]

Non-LLM reflection


I support broadening the definition to every neural network that feeds back to neurons it has already passed through, on the same input. This would also include, e.g., fully-connected modern Hopfield networks, hence my previous distinction, since such a definition would not be tied specifically to LLMs. However, I don't want to veer into WP:No original research, so if you are aware of literature on such a broader definition, please add it. Hplotter (talk) 10:18, 7 February 2025 (UTC)[reply]
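A purely illustrative toy sketch (made-up numbers, not from the article or any cited source) of what "feedback to neurons already passed, on the same input" can look like, using a modern-Hopfield-style retrieval in which the same state vector is repeatedly fed back through the stored patterns:

import numpy as np

def hopfield_retrieve(patterns, query, beta=4.0, steps=5):
    # Toy modern-Hopfield-style update: the state is fed back through the same
    # "neurons" (stored patterns) several times on one input.
    state = query
    for _ in range(steps):
        scores = np.exp(beta * patterns @ state)   # similarity to each stored pattern
        scores /= scores.sum()                     # softmax over patterns
        state = patterns.T @ scores                # output fed back as the next input
    return state

patterns = np.array([[1.0, -1.0, 1.0],
                     [-1.0, 1.0, 1.0]])
print(hopfield_retrieve(patterns, np.array([0.9, -0.8, 0.2])))  # converges toward the first pattern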

Why is this needed?


I'm not at all sure that this article is needed; the original prompt engineering page is already questionable enough, given the pseudoscientific nature of the topic. Similar to that article, this one seems to be heavily biased towards the non-consensus view that prompt engineering is an actual scientific discipline that can be meaningfully considered a kind of engineering. cgranade (talk) 21:39, 8 February 2025 (UTC)[reply]

I totally get your concerns about scientific rigor, but what’s your suggestion? Just delete both articles?
The general principles of reflection are important to describe, and reflection itself is a real and rapidly developing area within LLM research. The lack of a strong evidentiary base is precisely why there aren’t many well-developed articles on the topic yet. LLM research, in general, doesn’t follow the same academic and peer-review traditions as classical sciences, which makes it harder to meet traditional standards of rigor. But this article isn’t trying to position reflection in LLMs as a classical scientific discipline: it's documenting an emerging area.
I've read the discussion on prompt engineering and agree that, from a classical academic perspective, the evidentiary base is still developing. But what about fields like neurobiology? It would be unreasonable to say it doesn't exist or that articles on it aren't needed, even though its evidentiary standards can be debated. The same applies to political science, which deals with complex, often subjective topics yet is widely documented on Wikipedia.
If you'd like to discuss the scientific validity of specific claims or concerns about bias toward a non-consensus perspective, I'm happy to go over them. TheTeslak (talk) 02:20, 14 February 2025 (UTC)[reply]
Regarding bias, let’s look at OpenAI’s papers. They are, of course, an interested party, and their work hasn’t undergone independent peer review. But there’s no evidence that they are falsifying their results (please share if you have any). I understand that framing the question this way isn’t entirely scientific, but as I mentioned earlier, this is a rapidly evolving field and not a "pure" science. We do have confirmations from different sources showing similar results, and their papers reflect the broader development of models. Judging by the community discussions, this approach seems to be acceptable, as long as the necessary caveats are included.
I would add to the Criticism section that the research isn’t rigorous enough, but what can I cite to ensure it doesn’t come across as just personal opinion? Aside from the fact that these studies aren’t peer-reviewed, it would be useful to have a breakdown of specific flaws, ideally in highly cited works as well. TheTeslak (talk) 02:40, 14 February 2025 (UTC)[reply]

Potential merge


The article Reasoning language model covers a very similar topic. Maybe we should consider merging the two articles. Alenoach (talk) 20:36, 20 February 2025 (UTC)[reply]

I agree with the need for a merge, though I think we should keep the 'reflection' title, as 'reasoning' is unspecific (all non-reflective LLMs reason). Hplotter (talk) 11:49, 11 March 2025 (UTC)[reply]
I agree that a merge is necessary. Initially, I hadn't seen the RLM article by Cosmia Nebula (which is very good) and ended up here after reading the discussion at this talk page.
The main issue here is the naming. In the prompt engineering discussion, Alenoach suggested "reflection," which I believe is the most accurate term. Currently, there is no standard term; various studies and discussions refer to these models as Large Reasoning Models, Reasoning Language Models, and Reasoning Models.
I would also like to point out that another established term exists: VLM. However, it is used less frequently than LLM when referring to multimodal models. Moreover, I think that distinguishing such terms has its limitations. Modern models often have the ability to speak, listen, and handle various inputs, so introducing umbrella terms such as VARLM or splitting the concepts into V, A, and R would be excessive.
I would like to hear @Cosmia Nebula's opinion. TheTeslak (talk) 00:35, 14 March 2025 (UTC)[reply]
The name "Reflection" is unclear.
Is it supposed to cover all language models that "reflect"... on what? Does Chain of Thought count as reflection? If so, then it does not fall under the format as shown in the lead image, which clearly shows a very specific format of what "reflection" is supposed to be:
[Lead image: Reflective agent architecture with self-reflection, evaluation, short/long-term memory, and environment interaction]
Assuming that definition, many reasoning language models do not reflect, yet they do reason. Specifically, DeepSeek-R1 and Grok 3 do not reflect. They simply generate one very long thinking trace.
In fact, it seems to me that "Reflection" is supposed to be a quite specific Cognitive architecture. If you consider "just run a very long chain of thought" to be a good cognitive architecture, then I have no reply, other than to note that, if that's the case, the lead image is wrong.
I think the page should pivot to covering the specific "Reflection" approach to reasoning language models, instead of attempting to cover all reasoning language models and other test-time computing methods. pony in a strange land (talk) 02:10, 14 March 2025 (UTC)[reply]
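To make the contrast above concrete, here is a minimal toy sketch of the two formats: a single long thinking trace versus the loop shown in the lead image (self-reflection, evaluation, memory, environment feedback). The function names (toy_generate, toy_evaluate) are made-up stand-ins, not any real model API.

def toy_generate(prompt, context=()):
    # stand-in for an LLM call
    return f"attempt for {prompt!r} using {len(context)} stored reflection(s)"

def toy_evaluate(attempt):
    # stand-in for an evaluator or environment check (unit tests, a verifier, ...)
    ok = "1 stored reflection" in attempt           # toy criterion: succeed after one reflection
    return ok, "feedback: the first attempt was unsupported"

def long_chain_of_thought(task):
    # DeepSeek-R1 / Grok 3 style: one very long trace, no outer loop and no memory
    return toy_generate(f"<think> ... long reasoning about {task} ... </think>")

def reflective_agent(task, max_rounds=3):
    # lead-image style: act, evaluate, store a verbal self-reflection, retry
    long_term_memory = []                           # reflections persist across attempts
    attempt = ""
    for _ in range(max_rounds):
        attempt = toy_generate(task, context=long_term_memory)
        ok, feedback = toy_evaluate(attempt)
        if ok:
            break
        reflection = toy_generate(f"why it failed: {feedback}")
        long_term_memory.append(reflection)         # the failed attempt itself is discarded
    return attempt

print(reflective_agent("prove the claim"))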
Okay. Do you view this article as a distinct topic, or do you believe it should be merged? Since both articles receive very few contributions, their completeness suffers. TheTeslak (talk) 18:52, 14 March 2025 (UTC)[reply]
For me, what makes reflection specific is that there is a feedback connection to earlier layers: either after a full pass (the last layers connect back to the first ones after decoding, as in every commercial LLM that implements it to this day), or without decoding in between (continuously in latent space), or via an inter-layer feedback connection (sub-network recurrence, so far only in a few research publications). In other words, this refers to recurrence in depth, a topological property. I think it was clearer in one of my versions, but Alenoach partially reverted it. Hplotter (talk) 16:54, 15 March 2025 (UTC)[reply]
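For illustration only, a toy sketch of the three feedback topologies described above (feedback through decoding, feedback in latent space, and an inter-layer loop); the weights are random placeholders, nothing is taken from a specific paper, and only the wiring is the point.

import numpy as np

rng = np.random.default_rng(0)
d, vocab = 4, 8
layers = [(lambda h, W=rng.standard_normal((d, d)) * 0.3: np.tanh(W @ h)) for _ in range(3)]
W_out = rng.standard_normal((vocab, d)) * 0.3       # hidden state -> token logits
E = rng.standard_normal((vocab, d)) * 0.3           # token id -> embedding

def full_pass(h):
    for layer in layers:
        h = layer(h)
    return h

def feedback_via_decoding(h, steps=3):
    # (a) the last layers connect back to the first ones *through* decoding:
    # hidden -> discrete token -> embedding -> hidden (ordinary autoregressive generation)
    for _ in range(steps):
        token = int(np.argmax(W_out @ full_pass(h)))
        h = E[token]
    return h

def feedback_in_latent_space(h, steps=3):
    # (b) the last layers connect back to the first ones with no decoding in between
    # (continuous recurrence in latent space)
    for _ in range(steps):
        h = full_pass(h)
    return h

def feedback_between_layers(h, steps=3):
    # (c) an inter-layer feedback connection: an earlier block is re-applied before
    # the pass continues (sub-network recurrence in depth)
    for _ in range(steps):
        h = layers[1](layers[0](h))                 # loop over the first two layers only
    return layers[2](h)

h0 = rng.standard_normal(d)
print(feedback_via_decoding(h0), feedback_in_latent_space(h0), feedback_between_layers(h0))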
Looking back at one of my edits, there is indeed a sentence in the "Introduction" section that seemed clearer before. Feel free to modify it if the current phrasing is not great. Alenoach (talk) 01:10, 16 March 2025 (UTC)[reply]
I eventually modified the sentence in this edit to be closer to the original meaning. Alenoach (talk) 01:39, 16 March 2025 (UTC)[reply]
I think ‘Feedback (artificial intelligence)’ as a title for the merge solves issues with both: it is used in the literature (both in AI and neuroscience, sometimes as ‘top-down feedback’), meaningful (unlike ‘reasoning’), and unspecific to LLMs. Hplotter (talk) 08:24, 6 April 2025 (UTC)[reply]

Badly sourced promotional essay


It reads like an essay ("Introduction"?) and the sourcing is almost entirely arXiv preprints and promotional material from the companies themselves. Is this just advertising? - David Gerard (talk) 08:56, 10 March 2025 (UTC)[reply]