Draft:List of unsolved problems in AI safety

From Wikipedia, the free encyclopedia


This article lists notable unsolved problems in AI safety. Artificial intelligence (AI) safety is an interdisciplinary field focused on preventing accidents, misuse, and other harmful consequences arising from AI systems. A problem is considered unsolved if no solution is known or if experts significantly disagree about a proposed solution.

Risk

AI risk concerns the probability and magnitude of harmful outcomes caused by artificial intelligence systems, particularly as systems gain greater autonomy and influence over society.[1]
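
One common, simplified formalization quantifies risk as expected harm (the notation below is illustrative rather than drawn from the cited sources):

$$R = \sum_{i} p_i \, h_i$$

where $p_i$ is the probability of harmful outcome $i$ and $h_i$ is its magnitude. Disagreements about AI risk typically concern estimates of both factors.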

  • How likely are the various pathways through which AI could cause significant, catastrophic, or existential harm?[2][3]
  • What would follow the creation of artificial general intelligence?[4][5]
  • What would follow the creation of superintelligence?[6][7]

Alignment

AI alignment is the problem of building machines that faithfully try to do what we want them to do (or what we ought to want them to do).[8]

  • What are the human values or intentions that AI should be aligned with?[9][10]
  • How do we align increasingly capable systems?[11][12]
  • How can we understand and verify the objectives and reasoning processes of complex AI models?[13][14]

Control

AI control refers to the technical and procedural measures designed to prevent AI systems from causing unacceptable outcomes, even if those systems actively attempt to subvert safety measures. It focuses on maintaining human oversight, regardless of whether the AI's objectives align with human intentions.[15]

  • Can a sufficiently intelligent AI be controlled?[6][16][17]

Ethics

Ethical issues in AI safety concern fairness, accountability, transparency, and the moral status of AI systems. These questions overlap with, but are distinct from, technical safety, and focus on the societal consequences of AI deployment.[18]

  • How can algorithmic biases be overcome?[19][20]
  • How can the environmental impact of AI be reduced?[21][22]
  • How can the moral status of AI systems be evaluated?[23][24]

Governance

AI governance examines institutional, legal, and policy mechanisms for managing risks and ensuring the safe development and deployment of AI technologies.[25]

  • How can AI be safely developed, evaluated, and deployed?[26][27]
  • How can society balance innovation in AI with the prevention of irreversible harms?[28][29]
  • Who is responsible for the actions of an AI model?[30][31]

References

  1. ^ Future of Life Institute. "Introductory Resources on AI Risks". futureoflife.org. Retrieved 30 October 2025.
  2. ^ Turchin, Alexey; Denkenberger, David (2018-05-03). "Classification of global catastrophic risks connected with artificial intelligence". AI & Society. 35 (1): 147–163. doi:10.1007/s00146-018-0845-5. ISSN 0951-5666. S2CID 19208453.
  3. ^ Chin, Ze Shen (2025). "Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks". arXiv:2508.06411 [cs.CY].
  4. ^ Ord, Toby (2020). The Precipice: Existential Risk and the Future of Humanity. New York: Hachette Books. p. 468. ISBN 9780316484916. Retrieved 29 October 2025.
  5. ^ McLean, Scott; Read, Gemma J. M.; Thompson, Jason; Baber, Chris; Stanton, Neville A.; Salmon, Paul M. (2021). "The risks associated with artificial general intelligence: a systematic review". Journal of Experimental & Theoretical Artificial Intelligence. 35 (4): 1–17. doi:10.1080/0952813X.2021.1964003. Retrieved 29 October 2025.
  6. ^ a b Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford: Oxford University Press. ISBN 9780199678112.
  7. ^ PauseAI. "The extinction risk of superintelligent AI". PauseAI. Retrieved 29 October 2025.
  8. ^ Christiano, Paul (21 March 2014). "AI Alignment". Retrieved 30 October 2025.
  9. ^ World Economic Forum (8 October 2024). "AI Value Alignment: Guiding Artificial Intelligence Towards Shared Human Goals". World Economic Forum. Retrieved 27 October 2025.
  10. ^ Mitchell, Melanie (13 December 2022). "What Does It Mean to Align AI With Human Values?". Quanta Magazine. Retrieved 29 October 2025.
  11. ^ Ji, Jiaming; Qiu, Tianyi; Chen, Boyuan (2023). "AI Alignment: A Comprehensive Survey". arXiv:2310.19852 [cs.AI].
  12. ^ Grey, Markov; Segerie, Charbel-Raphaël (2025). "Scalable Oversight". AI Safety Atlas. Retrieved 29 October 2025.
  13. ^ Tegmark, Max; Omohundro, Steve (2023). "Provably safe systems: the only path to controllable AGI". arXiv:2309.01933 [cs.CY].
  14. ^ Grey, Markov; Segerie, Charbel-Raphaël (2025). "Chapter 9 – Interpretability". AI Safety Atlas. Retrieved 29 October 2025.
  15. ^ Shlegeris, Buck; Greenblatt, Ryan (7 May 2024). "The case for ensuring that powerful AIs are controlled". Redwood Research Blog. Retrieved 30 October 2025.
  16. ^ Shlegeris, Buck; Greenblatt, Ryan (7 May 2024). "The case for ensuring that powerful AIs are controlled". Redwood Research Blog. Retrieved 30 October 2025.
  17. ^ Yampolskiy, Roman V. (2020). "On Controllability of AI". arXiv:2008.04071 [cs.CY].
  18. ^ Hagendorff, Thilo (2020). "The Ethics of AI Ethics: An Evaluation of Guidelines". Minds and Machines. 30 (1): 99–120. doi:10.1007/s11023-020-09517-8. Retrieved 30 October 2025.
  19. ^ Varsha, P. S. (2023). "How can we manage biases in artificial intelligence systems – A systematic literature review". International Journal of Information Management Data Insights. 3 (1): 100165. doi:10.1016/j.jjimei.2023.100165. Retrieved 30 October 2025.
  20. ^ Ferrara, Emilio (2024). "Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies". Sci. 6 (1): 3. doi:10.3390/sci6010003.
  21. ^ Artificial Intelligence (AI) end-to-end: The Environmental Impact of the Full AI Lifecycle Needs to be Comprehensively Assessed – Issue Note (Report). United Nations Environment Programme. September 2024. Retrieved 30 October 2025.
  22. ^ Ren, Shaolei; Wierman, Adam (15 July 2024). "The Uneven Distribution of AI's Environmental Impacts". Harvard Business Review. Retrieved 30 October 2025.
  23. ^ "Moral Status of Digital Minds". 80,000 Hours. Centre for Effective Altruism. 2023. Retrieved 30 October 2025.
  24. ^ Shulman, Carl; Bostrom, Nick (2021). "Sharing the World with Digital Minds". In Steve Clarke; Hazem Zohny; Julian Savulescu (eds.). Rethinking Moral Status. Oxford University Press. pp. 306–326. doi:10.1093/oso/9780192894076.003.0018. ISBN 978-0-19-289407-6. Retrieved 30 October 2025.
  25. ^ Dafoe, Allan (27 August 2018). AI Governance: A Research Agenda (Report). Centre for the Governance of AI. Retrieved 30 October 2025.
  26. ^ Ren, Richard; Basart, Steven; Khoja, Adam (2024). "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?". arXiv:2407.21792 [cs.LG].
  27. ^ Papagiannidis, Emmanouil; Mikalef, Patrick; Conboy, Kieran (2025). "Responsible artificial intelligence governance: A review and research framework". Journal of Strategic Information Systems. 34 (2): 101885. doi:10.1016/j.jsis.2024.101885. Retrieved 27 October 2025.
  28. ^ Bengio, Yoshua; Hinton, Geoffrey; Yao, Andrew; et al. (2024). "Managing extreme AI risks amid rapid progress …". Science. 384 (6698): 842–845. arXiv:2310.17688. Bibcode:2024Sci...384..842B. doi:10.1126/science.adn0117. PMID 38768279. Retrieved 26 October 2025.
  29. ^ "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023". UK Government. 2 November 2023. Retrieved 29 October 2025.
  30. ^ Recommendation on the Ethics of Artificial Intelligence (Programme and meeting document). Paris: UNESCO. 2022. SHS/BIO/PI/2021/1. Retrieved 27 October 2025.
  31. ^ Coeckelbergh, Mark (2020). "Artificial Intelligence, Responsibility, and Moral Status". AI & Society. 35 (4): 1033–1040. doi:10.1007/s00146-019-00931-5 (inactive 30 October 2025). Retrieved 30 October 2025.