Wikipedia:Articles for deletion/Inner alignment
- Inner alignment (edit | talk | history | protect | delete | links | watch | logs | views) – (View log | edits since nomination)
- (Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL)
The article does not currently cite reliable sources. Current citations include the forums "LessWrong" and "AI Alignment Forum", and blog articles on "AISafety.info", Medium, and LinkedIn. A web search turned up the following primary source articles:
- Li et al., "Alleviating Action Hallucination for LLM-based Embodied Agents via Inner and Outer Alignment", PRAI 2024
- Melo et al., "Machines that halt resolve the undecidability of artificial intelligence alignment", Sci. Rep. 2025
- Safron et al., "Value Cores for Inner and Outer Alignment", IWAI 2022
I am recommending this article for deletion since I could find no references to this concept in reliable secondary sources. Elestrophe (talk) 01:40, 25 June 2025 (UTC)
If you came here because someone asked you to, or you read a message on another website, please note that this is not a majority vote, but instead a discussion among Wikipedia contributors. Wikipedia has policies and guidelines regarding the encyclopedia's content, and consensus (agreement) is gauged based on the merits of the arguments, not by counting votes.
However, you are invited to participate and your opinion is welcome. Remember to assume good faith on the part of others and to sign your posts on this page by adding ~~~~ at the end. Note: Comments may be tagged as follows: suspected single-purpose accounts: {{subst:spa|username}}; suspected canvassed users: {{subst:canvassed|username}}; accounts blocked for sockpuppetry: {{subst:csm|username}} or {{subst:csp|username}}.
- Keep: This concept seems to exist and to be a confounding factor in artificial intelligence spaces, and therefore has some value to the overall encyclopedia. Because AI is advancing at such a rate, and because such advancements raise challenges faster than scientific study of those challenges can adequately be conducted, I would argue that there is some limited room for article creation before full adequate sourcing exists. There is a fine line between what I am talking about and a violation of WP:CRYSTALBALL and WP:NOR, but I would maintain that it is better to have an article in this case than not. Foxtrot620 (talk) 18:23, 25 June 2025 (UTC)
- Creating an article "before full adequate sourcing exists" is a violation of the No Original Research policy, full stop. Stepwise Continuous Dysfunction (talk) 00:20, 26 June 2025 (UTC)
- Note: This discussion has been included in the list of Technology-related deletion discussions. WCQuidditch ☎ ✎ 02:27, 25 June 2025 (UTC)
- Keep - this is a notable concept. I just added a reference to the article from Scientific Reports. A Google Scholar search for
"inner alignment" artificial intelligence
turns up 300+ results. Many are preprints, but many peer-reviewed papers and books remain. --A. B. (talk • contribs • global count) 20:43, 25 June 2025 (UTC)
- Scientific Reports is not a good journal. It's the cash grab of the Nature company. The majority of Wikipedia's own article about it is the "Controversies" section, for goodness' sake. Stepwise Continuous Dysfunction (talk) 00:12, 26 June 2025 (UTC)
- Keep The article has been improved, and the concept itself is notable and increasingly discussed in the academic literature. The notion of “inner alignment” is widely cited in alignment research and has already been formalized. While the original discussions emerged on platforms like the AI Alignment Forum and LessWrong, the term has since migrated into peer-reviewed academic publications. Southernhemisphere (talk) 23:15, 25 June 2025 (UTC)
- Delete In the absence of actual serious literature, i.e., multiple reliably-published articles that cover the topic in depth, this is just an advertisement for an ideology. The current sourcing is dreadful, running the gamut from LessWrong to LinkedIn, and a search for better options did not turn up nearly enough to indicate that this needs an article rather than, at most, a sentence somewhere else. Stepwise Continuous Dysfunction (talk) 00:17, 26 June 2025 (UTC)
- The passages referenced to LessWrong and LinkedIn have been deleted. While the article requires further refinement, the topic remains highly relevant. Southernhemisphere (talk) 05:27, 26 June 2025 (UTC)
- OK, now remove "aisafety.info" (a primary, non-independent source with no editorial standards that can be discerned). And "Bluedot Impact" (likewise). And the blog post about a podcast episode on Medium, which fails every test one could want for a source good enough to build an encyclopedia article upon. What's left? Not much. Stepwise Continuous Dysfunction (talk) 06:42, 26 June 2025 (UTC)
- Keep Judging deletion by what is in the article today rather than by what is out there is not how it works. Being poorly or incompletely written is not grounds to delete. Google this:
"Inner alignment" artificial intelligence
Lots of stuff turns up if we but look: [1], [2], [3], [4], [5]. The concept exists and is notable, and this is a newer science, so you have to dig more. -- Very Polite Person (talk) 03:50, 26 June 2025 (UTC)
- The first link is to the arXiv preprint version of a conference proceedings paper in a conference with unknown standards. The lead author was at OpenAI, which means that the paper has to be judged for the possibility of criti-hype, and in any event it should be regarded as primary and not independent. The second is a page of search results from a search engine that does not screen for peer review and even includes a self-published book. The third is in Scientific Reports, which, via this essay, I learned has published crackpot physics. The fifth is a thesis, which is generally not a good kind of source to use. In short, there is much less here than meets the eye. Stepwise Continuous Dysfunction (talk) 06:38, 26 June 2025 (UTC)
- I will note that a doctoral thesis is an allowable reliable source. However, hinging an article like this on a single source is not appropriate. This is why I proposed draftification. This topic could very well be one that generates reliable sources, but it's clearly not there yet. Simonm223 (talk) 13:34, 26 June 2025 (UTC)
- Delete The only source that looks halfway like credible computer science is a wildly speculative pre-print from 2024 sponsored by Google and Microsoft. The article looks like covert advertising for AIsafety.info. Jujodon (talk) 10:14, 26 June 2025 (UTC)
- Draftify as WP:TOOSOON. If reliable academic sources come forward for this article then that's fine, but preprints and blogs are not reliable sources. Simonm223 (talk) 13:31, 26 June 2025 (UTC)
- Delete or draftify. Is there a single RS for this? Perhaps we could move the article to arXiv too, or maybe viXra - David Gerard (talk) 18:50, 26 June 2025 (UTC)
- Keep. Inner alignment is a notable and emerging concept in AI safety, now cited in peer-reviewed sources such as Scientific Reports (Melo et al., 2025) and PRAI 2024 (Li et al.). While the article began with less formal sources, newer academic literature confirms its relevance. Per WP:GNG, the topic has significant coverage in reliable sources. Improvements are ongoing, and deletion would be premature for a concept gaining scholarly traction. Sebasargent (talk) 19:05, 26 June 2025 (UTC) — Sebasargent (talk • contribs) has made few or no other edits outside this topic.
- "emerging concept" places it squarely as WP:TOOSOON - David Gerard (talk) 23:54, 26 June 2025 (UTC)
- Inner alignment is an urgent topic because it addresses a core safety challenge in the development of powerful AI systems, especially those based on LLMs or other ML techniques. Southernhemisphere (talk) 00:04, 27 June 2025 (UTC)
- "emerging concept" places it squarely as WP:TOOSOON - David Gerard (talk) 23:54, 26 June 2025 (UTC)
- I have just removed the many paragraphs cited solely to blog posts, arXiv preprints, Medium posts, some guy's website, or nothing at all. This is now a three-paragraph article with two cites. Is that really all there is to this? Nothing else in a solid RS? - David Gerard (talk) 00:03, 27 June 2025 (UTC)
- The article should be fixed and enhanced, not deleted. Inner alignment is crucial to preventing both existential risks and suffering risks. Misaligned AI systems may pursue unintended goals, leading to human extinction or vast suffering. Ensuring AI internal goals match human values is key to avoiding catastrophic outcomes as AI systems become more capable and autonomous. Southernhemisphere (talk) 00:06, 27 June 2025 (UTC)
- If you seriously claim that LLMs will lead to the end of humanity, then this sounds like the topic is squarely within the purview of WP:FRINGE. This puts upon it strong RS requirements. Right now it has two RSes, and in one of those the topic is merely a passing mention in a footnote. Given this, you really, really need more solid sourcing. I just posted a call on WP:FTN asking for good sourcing - David Gerard (talk) 00:10, 27 June 2025 (UTC)
- The article doesn’t assert that LLMs will end humanity, but notes that some researchers view inner alignment as a potential contributor to AI risk. I agree that stronger secondary sources are needed and will work on adding more reliable references to reflect the seriousness of the topic neutrally. Southernhemisphere (talk) 00:14, 27 June 2025 (UTC)
- To speak to your point, User:David Gerard: as an expert in Emergency Management who has spent a great deal of time studying global catastrophic risk, I can say that the idea that AI could lead to the end of humanity is far from fringe science. The fact that essentially every AI company working towards AGI has a team working on catastrophic risk is more than enough evidence that AI poses a possible existential threat. Essentially no one on either side of the AI debate disagrees that AI poses a general catastrophic risk. They may disagree on the level of risk and everything else, but the risk is universally acknowledged to be there. - Foxtrot620 (talk) 00:50, 27 June 2025 (UTC)
- Every "AI" company would have a team working on catastrophic risk is not significant evidence, because they would still have those teams just for hype under the null hypothesis of lack of belief in catastrophic risk. It would almost certainly fail to reject the null with p < .05, and the Bayes factor would be so small that it shouldn't convince you of anything that you don't already have very high priors for. (Which, sure, might be reasonable for some narrow statements, like companies believing actual AGI "possibly" posing existential risks. Companies believing the current marginal dollar spent on this providing more benefit to them on the "actual risk" side compared to the "attract investment and other hype" is going to be a nah from me) Alpha3031 (t • c) 03:42, 27 June 2025 (UTC)