
Wikipedia:Articles for deletion/Inner alignment

This is an old revision of this page, as edited by A. B. (talk | contribs) at 03:15, 28 June 2025 (Inner alignment: Reply to Bishonen). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Inner alignment (edit | talk | history | protect | delete | links | watch | logs | views) – (View log | edits since nomination)
(Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL)

The article does not currently cite reliable sources. Current citations include the forums "LessWrong" and "AI Alignment Forum", and blog articles on "AISafety.info", Medium, and LinkedIn. A web search turned up the following primary source articles:

I am recommending this article for deletion since I could find no references to this concept in reliable secondary sources. Elestrophe (talk) 01:40, 25 June 2025 (UTC)[reply]

  • Keep: This concept seems to exist and to be a confounding factor in artificial intelligence spaces, and therefore has some value to the overall encyclopedia. Because AI is advancing at such a rate, and because such advancements raise challenges faster than scientific study of those challenges can be adequately conducted, I would argue that there is some limited room for article creation before full adequate sourcing exists. There is a fine line between what I am talking about and a violation of WP:CRYSTALBALL and WP:NOR, but I would contend that it is better to have an article in this case than not to have one. Foxtrot620 (talk) 18:23, 25 June 2025 (UTC)[reply]
    Creating an article "before full adequate sourcing exists" is a violation of the No Original Research policy, full stop. Stepwise Continuous Dysfunction (talk) 00:20, 26 June 2025 (UTC)[reply]
  • Note: This discussion has been included in the list of Technology-related deletion discussions. WCQuidditch 02:27, 25 June 2025 (UTC)[reply]
  • Keep - this is a notable concept. I just added a reference to the article from Scientific Reports. A Google Scholar search for "inner alignment" artificial intelligence turns up 300+ results. Many are preprints, but many peer-reviewed papers remain. Books, too. --A. B. (talkcontribsglobal count) 20:43, 25 June 2025 (UTC)[reply]
    Scientific Reports is not a good journal. It's the cash-grab of the Nature company. The majority of Wikipedia's own article about it is the "Controversies" section, for goodness' sake. Stepwise Continuous Dysfunction (talk) 00:12, 26 June 2025 (UTC)[reply]
  • Keep The article has been improved, and the concept itself is notable and increasingly discussed in the academic literature. The notion of “inner alignment” is widely cited in alignment research and has already been formalized. While the original discussions emerged on platforms like the AI Alignment Forum and LessWrong, the term has since migrated into peer-reviewed academic publications. Southernhemisphere (talk) 23:15, 25 June 2025 (UTC)[reply]
  • Delete In the absence of actual serious literature, i.e., multiple reliably-published articles that cover the topic in depth, this is just an advertisement for an ideology. The current sourcing is dreadful, running the gamut from LessWrong to LinkedIn, and a search for better options did not turn up nearly enough to indicate that this needs an article rather than, at most, a sentence somewhere else. Stepwise Continuous Dysfunction (talk) 00:17, 26 June 2025 (UTC)[reply]
    The LessWrong and LinkedIn references have been removed. While the article requires further refinement, the topic remains highly relevant. Southernhemisphere (talk) 05:27, 26 June 2025 (UTC)[reply]
    OK, now remove "aisafety.info" (a primary, non-independent source with no editorial standards that can be discerned). And "Bluedot Impact" (likewise). And the blog post about a podcast episode on Medium, which fails every test one could want for a source good enough to build an encyclopedia article upon. What's left? Not much. Stepwise Continuous Dysfunction (talk) 06:42, 26 June 2025 (UTC)[reply]
  • Keep Deleting based on what is in the article today rather than what is out there is not how it works. Being poorly or incompletely written is not grounds to delete. Google this: "Inner alignment" artificial intelligence. There is lots of stuff if we but look: [1], [2], [3], [4], [5]. The concept exists and is notable, and this is a newer science, so you have to dig more. -- Very Polite Person (talk) 03:50, 26 June 2025 (UTC)[reply]
    The first link is to the arXiv preprint version of a conference proceedings paper in a conference with unknown standards. The lead author was at OpenAI, which means that the paper has to be judged for the possibility of criti-hype, and in any event, should be regarded as primary and not independent. The second is a page of search results from a search engine that does not screen for peer review and even includes a self-published book. The third is in Scientific Reports, which via this essay I learned has published crackpot physics. The fifth is a thesis, which is generally not a good kind of source to use. In short, there is much less here than meets the eye. Stepwise Continuous Dysfunction (talk) 06:38, 26 June 2025 (UTC)[reply]
    I will note that a doctoral thesis is an allowable reliable source. However, hinging an article like this on a single source is not appropriate. This is why I proposed draftification. This topic could very well be one that generates reliable sources, but it's clearly not there yet. Simonm223 (talk) 13:34, 26 June 2025 (UTC)[reply]
  • Delete The only source that looks halfway like credible computer science is a wildly speculative pre-print from 2024 sponsored by Google and Microsoft. The article looks like covert advertising for AIsafety.info. Jujodon (talk) 10:14, 26 June 2025 (UTC)[reply]
  • Draftify as WP:TOOSOON. If reliable academic sources come forward for this article then that's fine, but preprints and blogs are not reliable sources. Simonm223 (talk) 13:31, 26 June 2025 (UTC)[reply]
  • Delete or draftify. Is there a single RS for this? Perhaps we could move the article to arXiv too, or maybe viXra - David Gerard (talk) 18:50, 26 June 2025 (UTC)[reply]
  • Keep. Inner alignment is a notable and emerging concept in AI safety, now cited in peer-reviewed sources such as Scientific Reports (Melo et al., 2025) and PRAI 2024 (Li et al.). While the article began with less formal sources, newer academic literature confirms its relevance. Per WP:GNG, the topic has significant coverage in reliable sources. Improvements are ongoing, and deletion would be premature for a concept gaining scholarly traction. Sebasargent (talk) 19:05, 26 June 2025 (UTC) Sebasargent (talkcontribs) has made few or no other edits outside this topic. [reply]
  • I have just removed the many paragraphs cited solely to blog posts, arXiv preprints, Medium posts, some guy's website, or nothing at all. This is now a three-paragraph article with two cites. Is that really all there is to this? Nothing else in a solid RS? - David Gerard (talk) 00:03, 27 June 2025 (UTC)[reply]
    The article should be fixed and enhanced, not deleted. Inner alignment is crucial to preventing both existential risks and suffering risks. Misaligned AI systems may pursue unintended goals, leading to human extinction or vast suffering. Ensuring AI internal goals match human values is key to avoiding catastrophic outcomes as AI systems become more capable and autonomous. Southernhemisphere (talk) 00:06, 27 June 2025 (UTC)[reply]
    If you seriously claim that LLMs will lead to the end of humanity, then this sounds like the topic is squarely within the purview of WP:FRINGE. This puts upon it strong RS requirements. Right now it has two RSes, and in one of those the topic is merely a passing mention in a footnote. Given this, you really, really need more solid sourcing. I just posted a call on WP:FTN asking for good sourcing - David Gerard (talk) 00:10, 27 June 2025 (UTC)[reply]
    The article doesn’t assert that LLMs will end humanity, but notes that some researchers view inner alignment as a potential contributor to AI risk. I agree that stronger secondary sources are needed and will work on adding more reliable references to reflect the seriousness of the topic neutrally. Southernhemisphere (talk) 00:14, 27 June 2025 (UTC)[reply]
    To speak to your point, User:David Gerard: as an expert in Emergency Management, and someone who has spent a great deal of time studying global catastrophic risk, I consider the idea that AI could lead to the end of humanity far from fringe science. The fact that essentially every AI company working towards AGI has a team working on catastrophic risk is more than enough evidence that AI poses a possible existential threat. Essentially no one on either side of the AI debate disagrees that AI poses a general catastrophic risk. They may disagree on the level of risk and everything else, but the risk is universally acknowledged to be there. - Foxtrot620 (talk) 00:50, 27 June 2025 (UTC)[reply]
    Every "AI" company having a team working on catastrophic risk is not significant evidence, because they would still have those teams just for hype under the null hypothesis of lack of belief in catastrophic risk. It would almost certainly fail to reject the null with p < .05, and the Bayes factor would be so small that it shouldn't convince you of anything that you don't already have very high priors for. (Which, sure, might be reasonable for some narrow statements, like companies believing actual AGI "possibly" posing existential risks. Companies believing the current marginal dollar spent on this providing more benefit to them on the "actual risk" side compared to the "attract investment and other hype" is going to be a nah from me) Alpha3031 (tc) 03:42, 27 June 2025 (UTC)[reply]
    I want to pause and reframe, because I don't think this is conveying the point I need to be heard here. While your points are valid, they don't invalidate the concerns I'm raising about AI risk. I want to present this from an emergency management perspective, my area of expertise, in order to ensure that it's fully understood.
(The following concerns the general subject of AI risk, not the article nor the specific topic.)
  • In emergency management, we assess risk based on three core factors: scale, likelihood, and severity. A risk is worth planning for if any two factors are high. If all three factors are high, or if the likelihood is certain, planning is essential. (A minimal sketch of this decision rule appears after this comment.)
    Let's illustrate this with some examples in a hypothetical Midwest US town, "Anytown," with a population of 70,000:
    Tornado:
    Likelihood: High (Midwest location).
    Scale: High (could impact the entire town).
    Severity: High (could destroy Anytown).
    Conclusion: A tornado is a critical risk to prepare for.
    Asteroid Impact:
    Likelihood: Very low.
    Scale: Variable (could be a house or the entire city), but large impacts are extremely low likelihood.
    Severity: Variable (from a ruined garden to flattening the town).
    Conclusion: Not a primary risk for Anytown to plan for due to low likelihood.
    Pandemic:
    Likelihood: Certain (history shows pandemics recur).
    Scale: High (will impact the entire town).
    Severity: Generally high if classified as a pandemic.
    Conclusion: A pandemic is an essential risk to prepare for.
    Tsunami:
    Likelihood: Essentially impossible (Anytown is landlocked).
    Conclusion: Not a risk for Anytown to plan for.
    Now, applying this established emergency management framework to AI and AGI: we have multiple companies actively developing AGI, often with questionable ethical guidelines and insufficient safeguards. While the likelihood of AGI reaching a critical stage where it poses a significant threat is currently unknown, its potential scale and severity could both be of the absolute highest level, impacting the entire globe. By the same emergency management principles that tell us a tornado is a threat to prepare for, AI is a threat to prepare for as well. This is not fringe science; it's a direct application of widely accepted risk assessment principles.
    It's also crucial to differentiate here, as the risk isn't just with the theoretical AGI. While AGI poses a potential Global Catastrophic Risk, the issue of AI risk isn't limited to hypothetical future scenarios. AI is already demonstrating tangible risks at various levels:
    We know, indisputably, that current AI has already contributed to loss of life. For instance, when UnitedHealthcare implemented an AI system for prior authorizations, it wrongfully denied countless claims, leading to treatment delays and, tragically, patient deaths. This wasn't AGI; it was basic AI with real-world, life-or-death consequences. While not a global risk, it was certainly a significant risk for the over 22 million patients insured by UHC. It was a national-level impact from AI, and it's one that happened.
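To make the two-of-three rule above concrete, here is a minimal Python sketch (not part of any editor's comment; the function name and the simplified ratings are illustrative assumptions, and the "variable" asteroid ratings are collapsed to "low"):

# Minimal sketch of the decision rule described in the comment above:
# a hazard is worth planning for if any two of likelihood, scale, and
# severity are high; planning is essential if all three are high or the
# likelihood is certain.
def assess(hazard, likelihood, scale, severity):
    # likelihood is "certain", "high", "low", or "none";
    # scale and severity are "high" or "low".
    if likelihood == "none":
        return f"{hazard}: not a risk to plan for"
    highs = sum(rating == "high" for rating in (likelihood, scale, severity))
    if likelihood == "certain" or highs == 3:
        return f"{hazard}: planning is essential"
    if highs >= 2:
        return f"{hazard}: worth planning for"
    return f"{hazard}: not a primary planning risk"

# The Anytown examples from the comment:
print(assess("Tornado", "high", "high", "high"))       # planning is essential
print(assess("Pandemic", "certain", "high", "high"))   # planning is essential
print(assess("Asteroid impact", "low", "low", "low"))  # not a primary planning risk
print(assess("Tsunami", "none", "high", "high"))       # not a risk to plan for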
  • Comment - I have added 3 refs to the article that I got from a quick check of the Wikipedia Library:
  • Li, Kanxue; Zheng, Qi; Zhan, Yibing; Zhang, Chong; Zhang, Tianle; Lin, Xu; Qi, Chongchong; Li, Lusong; Tao, Dapeng (August 2024). "Alleviating Action Hallucination for LLM-based Embodied Agents via Inner and Outer Alignment". 2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI): 613–621. doi:10.1109/PRAI62207.2024.10826957. Accessed via The Wikipedia Library.
  • Kilian, Kyle A.; Ventura, Christopher J.; Bailey, Mark M. (1 August 2023). "Examining the differential risk from high-level artificial intelligence and the question of control". Futures. 151: 103182. doi:10.1016/j.futures.2023.103182. ISSN 0016-3287. Accessed via The Wikipedia Library.
  • Hartridge, Samuel; Walker-Munro, Brendan (4 April 2025). "Autonomous Weapons Systems and the AI Alignment Problem". Journal of International Humanitarian Legal Studies. 16 (1). Brill | Nijhoff: 38–65. doi:10.1163/18781527-bja10107. ISSN 1878-1373. Accessed via The Wikipedia Library.
--A. B. (talkcontribsglobal count) 22:00, 27 June 2025 (UTC)[reply]
And yet you did not check them - the third only mentions "inner alignment" in a footnote pointing somewhere else. Please review WP:REFBOMB - David Gerard (talk) 00:35, 28 June 2025 (UTC)[reply]
The third ref discusses alignment in general and is written for less technical people.
David, what's your analysis of the other two references? Thanks, --A. B. (talkcontribsglobal count) 00:57, 28 June 2025 (UTC)[reply]