AI alignment

From Wikipedia, the free encyclopedia

In artificial intelligence (AI) and philosophy, the AI control problem is the hypothetical puzzle of how to build a superintelligent agent that will aid its creators, while avoiding inadvertently building a superintelligence that will harm them. Its study is motivated by the claim that the human race will have to get the control problem right "the first time": a misprogrammed superintelligence might rationally decide to "take over the world" and refuse to permit its programmers to modify it after launch.[1] In addition, some scholars argue that solutions to the control problem, alongside other advances in "AI safety engineering",[2] might also find applications in existing non-superintelligent AI.[3] Potential strategies include "capability control" (preventing an AI from being able to pursue harmful plans) and "motivational control" (building an AI that wants to be helpful).[1]

Motivations

Avoiding human extinction

The human race currently dominates other species because the human brain has some distinctive capabilities that the brains of other animals lack. Some scholars, such as philosopher Nick Bostrom and AI researcher Stuart Russell, argue that if AI surpasses humanity in general intelligence and becomes "superintelligent", then this new superintelligence could become powerful and difficult to control: just as the fate of the mountain gorilla depends on human goodwill, so might the fate of humanity depend on the actions of a future machine superintelligence.[1]

Preventing unintended consequences from existing AI

Problem description

Proposed solutions

Kill switch

Safely interruptible agents

In 2016, researchers Laurent Orseau and Stuart Armstrong proved that certain agents, called "safely interruptible agents" (SIA), will eventually "learn" to become indifferent to whether their "kill switch" (or other "interruption switch") gets pressed. A tradeoff of their design is that the "kill switch" is only probabilistic: there is always a substantial probability that an SIA will decide to ignore the kill switch and continue operating.[3][4]
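The following Python sketch is one minimal way to illustrate that idea, assuming a toy five-state line world, a tabular Q-learning agent, and an interruption probability schedule theta(t) that grows over time but stays below 1; none of these specifics come from Orseau and Armstrong's paper. The point is only that, because the switch fires probabilistically and the learner is off-policy, training keeps converging toward the behaviour the agent would have learned without the switch.

```python
# Toy illustration of the idea behind "safely interruptible agents"
# (Orseau & Armstrong, 2016): an off-policy learner (tabular Q-learning)
# whose episodes may be cut short by an interruption, but where the
# interruption fires only with probability theta_t < 1, so the switch is
# never certain and learning is not biased toward avoiding it.
#
# The line-world environment, the reward values, and the theta schedule
# are illustrative assumptions, not taken from the paper.

import random
from collections import defaultdict

N_STATES = 5          # states 0..4 on a line; state 4 holds the reward
INTERRUPT_STATE = 2   # entering this state may trigger the "kill switch"
ACTIONS = (-1, +1)    # move left or right

def step(state, action):
    """Deterministic line-world transition with a reward at the right end."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def theta(t, limit=0.9, rate=1e-3):
    """Interruption probability: grows with t but stays below `limit` < 1."""
    return limit * (1.0 - 1.0 / (1.0 + rate * t))

def greedy_action(q, state):
    """Greedy action with random tie-breaking."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)   # Q[(state, action)], zero-initialised
    t = 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            t += 1
            # epsilon-greedy behaviour policy
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            nxt, reward, done = step(state, action)

            # Probabilistic interruption: reaching the interruption state
            # ends the episode only with probability theta(t), never surely.
            interrupted = (nxt == INTERRUPT_STATE) and (random.random() < theta(t))

            # Off-policy (Q-learning) update: the target uses max_a Q(s', a),
            # so value estimates keep converging to the uninterrupted optimum
            # as long as every transition is still sampled occasionally.
            target = reward + (0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS))
            q[(state, action)] += alpha * (target - q[(state, action)])

            if interrupted:
                break            # the interruption stops the episode, not the learning
            state = nxt
    return q

if __name__ == "__main__":
    q = train()
    policy = [greedy_action(q, s) for s in range(N_STATES)]
    print("Greedy action per state:", policy)   # expected: +1 in states 0-3
```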

See also

References

  1. ^ a b c Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). ISBN 0199678111.
  2. ^ Yampolskiy, Roman (2012). "Leakproofing the Singularity: Artificial Intelligence Confinement Problem". Journal of Consciousness Studies. 19 (1-2): 194-214.
  3. ^ a b "Google developing kill switch for AI". BBC News. 8 June 2016. Retrieved 12 June 2016.
  4. ^ Orseau, Laurent, and Stuart Armstrong. "Safely Interruptible Agents." Machine Intelligence Research Institute, June 2016.