Talk:Reflection (artificial intelligence)
This article has not yet been rated on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
How you can contribute
- Would it be better to shorten or supplement the content transferred from Prompt engineering?
- The history section could be expanded, ideally incorporating the achievements of external teams (both foundational developments and agent-based systems like DeepClaude).
- A list of benchmarks and their results should be included.
TheTeslak (talk) 11:04, 5 February 2025 (UTC)
- Thanks for leading this. As of now, it needs more references (e.g. the Coconut and R1 papers) and a mention of reinforcement learning applied to reasoning steps. Hplotter (talk) 16:34, 5 February 2025 (UTC)
- The "Techniques" section was copied from the article prompt engineering. So it's mostly relevant for how humans should prompt LLMs, rather than for how to train LLMs for reflection. This part seems to need improvement. Alenoach (talk) 02:02, 6 February 2025 (UTC)
Non-LLM reflection
I support broadening the definition to every neural network that feeds back to neurons already passed, on the same input. This would also include e.g. fully-connected modern Hopfield networks (hence my earlier distinction), since it would not be tied specifically to LLMs. However, I don't want to veer into WP:No original research, so if you are aware of literature about such a broader definition, please add it. Hplotter (talk) 10:18, 7 February 2025 (UTC)
Why is this needed?
I'm not at all sure that this article is needed; the original prompt engineering page already is questionable enough, given the pseudoscientific nature of the topic. Similar to that article, this seems to be highly biased towards the nonconsensus view that prompt engineering is an actual scientific discipline that can be meaningfully considered as a kind of engineering. cgranade (talk) 21:39, 8 February 2025 (UTC)
- I totally get your concerns about scientific rigor, but what’s your suggestion? Just delete both articles?
- The general principles of reflection are important to describe, and reflection itself is a real and rapidly developing area within LLM research. The lack of a strong evidentiary base is precisely why there aren't many well-developed articles on the topic yet. LLM research in general doesn't follow the same academic and peer-review traditions as the classical sciences, which makes it harder to meet traditional standards of rigor. But this article isn't trying to position reflection in LLMs as a classical scientific discipline: it's documenting an emerging area.
- I've read the discussion on prompt engineering and agree that, from a classical academic perspective, the evidentiary base is still developing. But what about fields like neurobiology? It would be unreasonable to say it doesn't exist or that articles on it aren’t needed, even though its evidentiary standards can be debated. The same applies to political science, which deals with complex, often subjective topics yet is widely documented on Wiki.
- If you'd like to discuss the scientific validity of specific claims or concerns about bias toward a non-consensus perspective, I'm happy to go over them. TheTeslak (talk) 02:20, 14 February 2025 (UTC)
- Regarding bias, let’s look at OpenAI’s papers. They are, of course, an interested party, and their work hasn’t undergone independent peer review. But there’s no evidence that they are falsifying their results (please share if you have any). I understand that framing the question this way isn’t entirely scientific, but as I mentioned earlier, this is a rapidly evolving field and not a "pure" science. We do have confirmations from different sources showing similar results, and their papers reflect the broader development of models. Judging by the community discussions, this approach seems to be acceptable, as long as the necessary caveats are included.
- I would add to the Criticism section that the research isn’t rigorous enough, but what can I cite to ensure it doesn’t come across as just personal opinion? Aside from the fact that these studies aren’t peer-reviewed, it would be useful to have a breakdown of specific flaws, ideally in highly cited works as well. TheTeslak (talk) 02:40, 14 February 2025 (UTC)
Potential merge
The article Reasoning language model covers a very similar topic. Maybe we should consider merging the two articles. Alenoach (talk) 20:36, 20 February 2025 (UTC)
- I agree with the need for a merge, but I think we should keep the 'reflection' title, as 'reasoning' is unspecific (all non-reflective LLMs reason). Hplotter (talk) 11:49, 11 March 2025 (UTC)
Badly sourced promotional essay
It reads like an essay ("Introduction"?) and the sourcing is almost entirely arXiv preprints and promotional material from the companies themselves. Is this just advertising? - David Gerard (talk) 08:56, 10 March 2025 (UTC)