Draft:List of unsolved problems in AI safety

From Wikipedia, the free encyclopedia


This article lists notable unsolved problems in AI safety. Artificial intelligence (AI) safety is an interdisciplinary field focused on preventing accidents, misuse, and other harmful consequences arising from AI systems. A problem is considered unsolved if no solution is known or if experts significantly disagree about a proposed solution.

Risk

AI risk concerns the probability and magnitude of harmful outcomes caused by artificial intelligence systems, particularly as systems gain greater autonomy and influence over society.[1]
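
One common, simplified formalization quantifies risk as expected harm (the notation below is illustrative rather than drawn from the cited sources):

$$R = \sum_{i} p_i \, h_i$$

where $p_i$ is the probability of harmful outcome $i$ and $h_i$ is its magnitude. Disagreements about AI risk typically concern estimates of both factors.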

  • How likely are the various pathways through which AI could cause significant, catastrophic, or existential harm?[2][3]
  • What would follow the creation of artificial general intelligence?[4][5]
  • What would follow the creation of superintelligence?[6][7]

Alignment

AI alignment is the problem of building machines that faithfully try to do what we want them to do (or what we ought to want them to do).[8]

  • What are the human values or intentions that AI should be aligned with?[9][10]
  • How do we align increasingly capable systems?[11][12]
  • How can we understand and verify the objectives and reasoning processes of complex AI models?[13][14]

Control

AI control refers to the technical and procedural measures designed to prevent AI systems from causing unacceptable outcomes, even if those systems actively attempt to subvert safety measures. It focuses on maintaining human oversight, regardless of whether the AI's objectives align with human intentions.[15]

  • Can a sufficiently intelligent AI be controlled?[6][16][17]

Ethics

Ethical issues in AI safety concern fairness, accountability, transparency, and the moral status of AI systems. These questions overlap with, but are distinct from, technical safety, and focus on the societal consequences of AI deployment.[18]

  • How can algorithmic biases be overcome?[19][20]
  • How can the environmental impact of AI be reduced?[21][22]
  • How can the moral status of AI systems be evaluated?[23][24]

Governance

AI governance examines institutional, legal, and policy mechanisms for managing risks and ensuring the safe development and deployment of AI technologies.[25]

  • How can AI be safely developed, evaluated, and deployed?[26][27]
  • How can society balance innovation in AI with the prevention of irreversible harms?[28][29]
  • Who is responsible for the actions of an AI model?[30][31]

References

  1. ^ Future of Life Institute. "Introductory Resources on AI Risks". futureoflife.org. Retrieved 30 October 2025.
  2. ^ Turchin, Alexey; Denkenberger, David (2018-05-03). "Classification of global catastrophic risks connected with artificial intelligence". AI & Society. 35 (1): 147–163. doi:10.1007/s00146-018-0845-5. ISSN 0951-5666. S2CID 19208453.
  3. ^ Chin, Ze Shen (2025). "Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks". arXiv:2508.06411 [cs.CY].
  4. ^ Ord, Toby (2020). The Precipice: Existential Risk and the Future of Humanity. New York: Hachette Books. p. 468. ISBN 9780316484916. Retrieved 29 October 2025.
  5. ^ McLean, Scott; Read, Gemma J. M.; Thompson, Jason; Baber, Chris; Stanton, Neville A.; Salmon, Paul M. (2021). "The risks associated with artificial general intelligence: a systematic review". Journal of Experimental & Theoretical Artificial Intelligence. 35 (4): 1–17. doi:10.1080/0952813X.2021.1964003. Retrieved 29 October 2025.
  6. ^ a b Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford: Oxford University Press. ISBN 9780199678112.
  7. ^ PauseAI. "The extinction risk of superintelligent AI". PauseAI. Retrieved 29 October 2025.
  8. ^ Christiano, Paul (21 March 2014). "AI Alignment". Retrieved 30 October 2025.
  9. ^ World Economic Forum (8 October 2024). "AI Value Alignment: Guiding Artificial Intelligence Towards Shared Human Goals". World Economic Forum. Retrieved 27 October 2025.
  10. ^ Mitchell, Melanie (13 December 2022). "What Does It Mean to Align AI With Human Values?". Quanta Magazine. Retrieved 29 October 2025.
  11. ^ Ji, Jiaming; Qiu, Tianyi; Chen, Boyuan (2023). "AI Alignment: A Comprehensive Survey". arXiv:2310.19852 [cs.AI].
  12. ^ Grey, Markov; Segerie, Charbel-Raphaël (2025). "Scalable Oversight". AI Safety Atlas. Retrieved 29 October 2025.
  13. ^ Tegmark, Max; Omohundro, Steve (2023). "Provably safe systems: the only path to controllable AGI". arXiv:2309.01933 [cs.CY].
  14. ^ Grey, Markov; Segerie, Charbel-Raphaël (2025). "Chapter 9 – Interpretability". AI Safety Atlas. Retrieved 29 October 2025.
  15. ^ Shlegeris, Buck; Greenblatt, Ryan (7 May 2024). "The case for ensuring that powerful AIs are controlled". Redwood Research Blog. Retrieved 30 October 2025.
  16. ^ Shlegeris, Buck; Greenblatt, Ryan (7 May 2024). "The case for ensuring that powerful AIs are controlled". Redwood Research Blog. Retrieved 30 October 2025.
  17. ^ Yampolskiy, Roman V. (2020). "On Controllability of AI". arXiv:2008.04071 [cs.CY].
  18. ^ Hagendorff, Thilo (2020). "The Ethics of AI Ethics: An Evaluation of Guidelines". Minds and Machines. 30 (1): 99–120. doi:10.1007/s11023-020-09517-8. Retrieved 30 October 2025.
  19. ^ Varsha, P. S. (2023). "How can we manage biases in artificial intelligence systems – A systematic literature review". International Journal of Information Management Data Insights. 3 (1): 100165. doi:10.1016/j.jjimei.2023.100165. Retrieved 30 October 2025.
  20. ^ Ferrara, Emilio (2024). "Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies". Sci. 6 (1): 3. doi:10.3390/sci6010003.
  21. ^ Artificial Intelligence (AI) end-to-end: The Environmental Impact of the Full AI Lifecycle Needs to be Comprehensively Assessed – Issue Note (Report). United Nations Environment Programme. September 2024. Retrieved 30 October 2025.
  22. ^ Ren, Shaolei; Wierman, Adam (15 July 2024). "The Uneven Distribution of AI's Environmental Impacts". Harvard Business Review. Retrieved 30 October 2025.
  23. ^ "Moral Status of Digital Minds". 80,000 Hours. Centre for Effective Altruism. 2023. Retrieved 30 October 2025.
  24. ^ Shulman, Carl; Bostrom, Nick (2021). "Sharing the World with Digital Minds". In Steve Clarke; Hazem Zohny; Julian Savulescu (eds.). Rethinking Moral Status. Oxford University Press. pp. 306–326. doi:10.1093/oso/9780192894076.003.0018. ISBN 978-0-19-289407-6. Retrieved 30 October 2025.
  25. ^ Dafoe, Allan (27 August 2018). AI Governance: A Research Agenda (Report). Centre for the Governance of AI. Retrieved 30 October 2025.
  26. ^ Ren, Richard; Basart, Steven; Khoja, Adam (2024). "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?". arXiv:2407.21792 [cs.LG].
  27. ^ Papagiannidis, Emmanouil; Mikalef, Patrick; Conboy, Kieran (2025). "Responsible artificial intelligence governance: A review and research framework". Journal of Strategic Information Systems. 34 (2): 101885. doi:10.1016/j.jsis.2024.101885. Retrieved 27 October 2025.
  28. ^ Bengio, Yoshua; Hinton, Geoffrey; Yao, Andrew; et al. (2024). "Managing extreme AI risks amid rapid progress …". Science. 384 (6698): 842–845. arXiv:2310.17688. Bibcode:2024Sci...384..842B. doi:10.1126/science.adn0117. PMID 38768279. Retrieved 26 October 2025.
  29. ^ "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023". UK Government. 2 November 2023. Retrieved 29 October 2025.
  30. ^ Recommendation on the Ethics of Artificial Intelligence (Programme and meeting document). Paris: UNESCO. 2022. SHS/BIO/PI/2021/1. Retrieved 27 October 2025.
  31. ^ Coeckelbergh, Mark (2020). "Artificial Intelligence, Responsibility, and Moral Status". AI & Society. 35 (4): 1033–1040. doi:10.1007/s00146-019-00931-5 (inactive 30 October 2025). Retrieved 30 October 2025.