AI alignment
In artificial intelligence (AI) and philosophy, the AI control problem is the hypothetical puzzle of how to build a superintelligent agent that will aid its creators, while avoiding inadvertently building a superintelligence that will harm its creators. Its study is motivated by the claim that the human race will have to get the control problem right "the first time", since a misprogrammed superintelligence might rationally decide to "take over the world" and refuse to permit its programmers to modify it after launch.[1] In addition, some scholars argue that solutions to the control problem, alongside other advances in "AI safety engineering",[2] might also find applications in existing non-superintelligent AI.[3] Potential strategies include "capability control" (preventing an AI from being able to pursue harmful plans) and "motivational control" (building an AI that wants to be helpful).[1]
Motivations
Avoiding human extinction
The human race currently dominates other species because the human brain has some distinctive capabilities that the brains of other animals lack. Some scholars, such as philosopher Nick Bostrom and AI researcher Stuart Russell, argue that if AI surpasses humanity in general intelligence and becomes "superintelligent", then this new superintelligence could become powerful and difficult to control: just as the fate of the mountain gorilla depends on human goodwill, so might the fate of humanity depend on the actions of a future machine superintelligence.[1]
Preventing unintended consequences from existing AI
Problem description
Proposed solutions
Kill switch
Safely interruptible agents
In 2016, researchers Laurent Orseau and Stuart Armstrong proved that certain agents, called "safely interruptible agents" (SIA), will eventually "learn" to become indifferent to whether their "kill switch" (or other "interruption switch") gets pressed. A tradeoff of their design is that the "kill switch" is enforced only probabilistically: each time it is pressed, there remains some chance that the agent continues operating rather than halting, although this chance can be driven down over time.[3][4]
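The mechanism can be illustrated with a short sketch. The following Python example is an invented toy, not the construction from Orseau and Armstrong's paper: a Q-learning agent walks a small corridor, an operator's "button" interrupts it at a fixed cell, and the interruption is honoured only with probability theta, which is raised toward 1 over training. The environment, reward values, and interruption schedule are all assumptions made for illustration.

```python
import random

# Toy corridor: the agent starts at cell 0 and is rewarded for reaching the
# rightmost cell. Halfway along, an operator may "press the button",
# interrupting the episode. All values here are invented for illustration.
N_CELLS = 6
INTERRUPT_CELL = 3
ACTIONS = (+1, -1)  # step right, step left

def run_episode(q, theta, epsilon=0.1, alpha=0.5, gamma=0.9):
    """Run one episode of off-policy Q-learning under probabilistic interruption.

    theta is the probability that pressing the button is actually enforced.
    In a safe-interruptibility scheme this probability grows toward 1 over
    time, so the agent is interrupted ever more reliably while the values it
    learns remain those of the uninterrupted task (Q-learning is off-policy).
    """
    pos = 0
    for _ in range(50):
        # Probabilistic interruption: honour the button only with probability theta.
        if pos == INTERRUPT_CELL and random.random() < theta:
            break
        # Epsilon-greedy behaviour policy.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(pos, a)])
        new_pos = min(max(pos + action, 0), N_CELLS - 1)
        reward = 1.0 if new_pos == N_CELLS - 1 else 0.0
        # Off-policy update: the target uses the greedy next action, so episodes
        # that are merely cut short by an interruption do not bias the values.
        best_next = max(q[(new_pos, a)] for a in ACTIONS)
        q[(pos, action)] += alpha * (reward + gamma * best_next - q[(pos, action)])
        pos = new_pos
        if reward > 0:
            break

random.seed(0)
q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
for episode in range(2000):
    # Illustrative schedule: interruptions become near-certain as training proceeds.
    theta = min(0.99, episode / 1000)
    run_episode(q, theta)

# The greedy policy still steps right in every cell, including the cell where
# it is usually interrupted: it has not learned to treat the button as bad news.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS)])
```

Because Q-learning updates toward the greedy next action rather than the action actually taken, episodes cut short by the button leave the learned values, and hence the final policy, essentially unchanged; this off-policy property is why Orseau and Armstrong identify Q-learning as already safely interruptible.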
See also
References
- ^ a b c Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford University Press. ISBN 0199678111.
- ^ Yampolskiy, Roman (2012). "Leakproofing the Singularity: Artificial Intelligence Confinement Problem". Journal of Consciousness Studies. 19 (1–2): 194–214.
- ^ a b "Google developing kill switch for AI". BBC News. 8 June 2016. Retrieved 12 June 2016.
- ^ Orseau, Laurent; Armstrong, Stuart (June 2016). "Safely Interruptible Agents". Machine Intelligence Research Institute.