AI alignment
In artificial intelligence (AI) and philosophy, the AI control problem is the hypothetical puzzle of how to build a superintelligent agent that will aid its creators, while avoiding inadvertently building a superintelligence that will harm its creators. Its study is motivated by the claim that the human race will have to get the control problem right "the first time", since a misprogrammed superintelligence might rationally decide to "take over the world" and refuse to permit its programmers to modify it after launch.[1] In addition, some scholars argue that solutions to the control problem, alongside other advances in "AI safety engineering",[2] might also find applications in existing non-superintelligent AI.[3] Potential strategies include "capability control" (preventing an AI from being able to pursue harmful plans) and "motivational control" (building an AI that wants to be helpful).[1]
Motivations
Avoiding human extinction
The human race currently dominates other species because the human brain has some distinctive capabilities that the brains of other animals lack. Some scholars, such as philosopher Nick Bostrom and AI researcher Stuart Russell, argue that if AI surpasses humanity in general intelligence and becomes "superintelligent", then this new superintelligence could become powerful and difficult to control: just as the fate of the mountain gorilla depends on human goodwill, so might the fate of humanity depend on the actions of a future machine superintelligence.[1]
Preventing unintended consequences from existing AI
Problem description
Proposed solutions
Kill switch
Safely interruptible agents
In 2016, researchers Laurent Orseau and Stuart Armstrong proved that certain agents, called "safely interruptible agents" (SIA), will eventually "learn" to become indifferent to whether their "kill switch" (or other "interruption switch") gets pressed. A tradeoff of their design is that the "kill switch" is enforced only probabilistically: each time it is pressed, there remains some chance that the agent continues operating rather than halting, although this chance can be driven down over time.[3][4]
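The mechanism can be illustrated with a short sketch. The following Python example is an invented toy, not the construction from Orseau and Armstrong's paper: a Q-learning agent walks a small corridor, an operator's "button" interrupts it at a fixed cell, and the interruption is honoured only with probability theta, which is raised toward 1 over training. The environment, reward values, and interruption schedule are all assumptions made for illustration.

```python
import random

# Toy corridor: the agent starts at cell 0 and is rewarded for reaching the
# rightmost cell. Halfway along, an operator may "press the button",
# interrupting the episode. All values here are invented for illustration.
N_CELLS = 6
INTERRUPT_CELL = 3
ACTIONS = (+1, -1)  # step right, step left

def run_episode(q, theta, epsilon=0.1, alpha=0.5, gamma=0.9):
    """Run one episode of off-policy Q-learning under probabilistic interruption.

    theta is the probability that pressing the button is actually enforced.
    In a safe-interruptibility scheme this probability grows toward 1 over
    time, so the agent is interrupted ever more reliably while the values it
    learns remain those of the uninterrupted task (Q-learning is off-policy).
    """
    pos = 0
    for _ in range(50):
        # Probabilistic interruption: honour the button only with probability theta.
        if pos == INTERRUPT_CELL and random.random() < theta:
            break
        # Epsilon-greedy behaviour policy.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(pos, a)])
        new_pos = min(max(pos + action, 0), N_CELLS - 1)
        reward = 1.0 if new_pos == N_CELLS - 1 else 0.0
        # Off-policy update: the target uses the greedy next action, so episodes
        # that are merely cut short by an interruption do not bias the values.
        best_next = max(q[(new_pos, a)] for a in ACTIONS)
        q[(pos, action)] += alpha * (reward + gamma * best_next - q[(pos, action)])
        pos = new_pos
        if reward > 0:
            break

random.seed(0)
q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
for episode in range(2000):
    # Illustrative schedule: interruptions become near-certain as training proceeds.
    theta = min(0.99, episode / 1000)
    run_episode(q, theta)

# The greedy policy still steps right in every cell, including the cell where
# it is usually interrupted: it has not learned to treat the button as bad news.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS)])
```

Because Q-learning updates toward the greedy next action rather than the action actually taken, episodes cut short by the button leave the learned values, and hence the final policy, essentially unchanged; this off-policy property is why Orseau and Armstrong identify Q-learning as already safely interruptible.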
See also
References
- ^ a b c Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford University Press. ISBN 0199678111.
- ^ Yampolskiy, Roman (2012). "Leakproofing the Singularity: Artificial Intelligence Confinement Problem". Journal of Consciousness Studies. 19 (1–2): 194–214.
- ^ a b "Google developing kill switch for AI". BBC News. 8 June 2016. Retrieved 12 June 2016.
- ^ Orseau, Laurent; Armstrong, Stuart (June 2016). "Safely Interruptible Agents". Machine Intelligence Research Institute.