Coherent extrapolated volition
Coherent Extrapolated Volition (CEV) is a theoretical framework in the field of AI alignment, originally proposed by Eliezer Yudkowsky in the early 2000s as part of his work on Friendly AI. It describes an approach by which a superintelligent AI would act not according to humanity's current individual or collective preferences, but instead based on what humans would want—if they were more knowledgeable, more rational, had more time to think, and had matured together as a society.[1]
Concept
CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. Yudkowsky envisioned this process as combining human values as they would converge if we "knew more, thought faster, were more the people we wished we were, had grown up farther together." The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences.
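CEV is a philosophical proposal rather than a specified algorithm, and no concrete extrapolation or aggregation procedure has been defined. The toy sketch below is therefore purely illustrative: the extrapolation step, the agreement test, and the example values are hypothetical stand-ins chosen only to show the structural idea of idealizing individual preferences and retaining the points on which the idealized preferences cohere.

```python
# Toy illustration of the *structure* of CEV-style aggregation.
# Everything here (the extrapolate step, the coherence test, the example data)
# is a hypothetical stand-in; CEV does not specify any concrete algorithm.

from statistics import mean

# Each person's current preferences over outcomes, as a score per outcome.
current_preferences = {
    "alice": {"A": 0.9, "B": 0.2, "C": 0.4},
    "bob":   {"A": 0.1, "B": 0.8, "C": 0.5},
    "carol": {"A": 0.3, "B": 0.3, "C": 0.9},
}

def extrapolate(prefs, correction):
    """Stand-in for 'what this person would want if they knew more':
    here just a per-outcome adjustment supplied from outside."""
    return {o: max(0.0, min(1.0, s + correction.get(o, 0.0)))
            for o, s in prefs.items()}

# Hypothetical 'idealized knowledge' corrections for each person.
corrections = {
    "alice": {"C": +0.4},
    "bob":   {"C": +0.3},
    "carol": {},
}

extrapolated = {name: extrapolate(p, corrections[name])
                for name, p in current_preferences.items()}

def coherent_volition(extrapolated, spread_threshold=0.3):
    """Keep only outcomes on which the extrapolated preferences roughly
    agree (small spread across people), and return their averaged score.
    Outcomes where extrapolated volitions still conflict are left undefined."""
    outcomes = next(iter(extrapolated.values())).keys()
    result = {}
    for o in outcomes:
        scores = [p[o] for p in extrapolated.values()]
        if max(scores) - min(scores) <= spread_threshold:
            result[o] = mean(scores)
    return result

print(coherent_volition(extrapolated))
# Only outcome "C" coheres after extrapolation -> {'C': 0.83...}
```

In this toy model, no outcome is endorsed unless the extrapolated preferences agree on it, mirroring the idea that the AI should act only where humanity's extrapolated volition is coherent.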
Nick Bostrom has similarly characterized CEV as “our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” He describes it as the ideal expression of humanity’s collective will, shaped in a way that we ourselves would endorse if we were more morally and epistemically advanced.[2]
Criticism
Despite its philosophical appeal, CEV faces significant theoretical and practical challenges.
One central critique is that human values are not stable or fixed; rather, they are deeply shaped by context, culture, and environment. The extrapolation of values could therefore lead to distortions, as increasing rationality might change or even replace original desires. Critics have warned that using rationality as a tool to define ends might inadvertently overwrite the very volition the AI is supposed to serve, leading to misalignment between AI actions and genuine human values.[3]
Another criticism, presented in a thought-experiment-driven essay, questions CEV's assumptions about wisdom and extrapolation. It argues that CEV lacks a theory of which kinds of entities can become wise, or of how to model their volition meaningfully. The concern is that not all agents, human or otherwise, can be extrapolated toward rational or moral idealization, and that CEV does not adequately account for these limitations.[4]
Another philosophical analysis examines CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens’ concept of "active trust," the author proposes extending CEV into “Coherent, Extrapolated and Clustered Volition” (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.[5]
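As with CEV itself, the cited paper does not specify an algorithm. The following toy sketch is a hypothetical illustration of the clustering idea only, with invented groupings and scores; it shows how volitions that fail to cohere globally may still cohere within separate cultural clusters.

```python
# Toy sketch of the CECV idea: instead of seeking one global consensus,
# extrapolated preferences are aggregated within cultural clusters.
# The grouping, scores, and agreement test are hypothetical illustrations,
# not anything specified by the cited paper.

from statistics import mean

extrapolated = {   # already-extrapolated scores per outcome (toy values)
    "alice": {"A": 0.8, "B": 0.2},
    "bob":   {"A": 0.7, "B": 0.3},
    "carol": {"A": 0.2, "B": 0.9},
    "dana":  {"A": 0.3, "B": 0.8},
}
clusters = {"group_1": ["alice", "bob"], "group_2": ["carol", "dana"]}

def clustered_volition(extrapolated, clusters, spread_threshold=0.3):
    """Return, for each cluster, the outcomes its members cohere on
    (small spread within the cluster) and their averaged scores."""
    out = {}
    for cluster, members in clusters.items():
        prefs = [extrapolated[m] for m in members]
        coherent = {}
        for o in prefs[0]:
            scores = [p[o] for p in prefs]
            if max(scores) - min(scores) <= spread_threshold:
                coherent[o] = mean(scores)
        out[cluster] = coherent
    return out

print(clustered_volition(extrapolated, clusters))
# group_1 coheres and favors A; group_2 coheres and favors B.
# A single global agreement test over all four people would fail for both outcomes.
```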
Yudkowsky's later view
Over time, Eliezer Yudkowsky himself expressed reservations about CEV. He later described the concept as outdated and warned against conflating it with a practical strategy for AI alignment. While CEV may serve as a philosophical ideal, Yudkowsky emphasized that real-world alignment mechanisms must grapple with greater complexity, including the difficulty of defining and implementing extrapolated values in a reliable way.[6]
References
- ^ "Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.
- ^ "Quote by Nick Bostrom: "Our coherent extrapolated volition is our wish ..."". Goodreads. Retrieved 17 May 2025.
- ^ XiXiDu (22 November 2011). "Objections to Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.
- ^ "Coherent Extrapolated Dreaming". Alignment Forum. Retrieved 17 May 2025.
- ^ Sołoducha, Krzysztof. "Analysis of the implications of the Moral Machine project as an implementation of the concept of coherent extrapolated volition for building clustered trust in autonomous machines". CEEOL. Copernicus Center Press. Retrieved 17 May 2025.
- ^ "Coherent Extrapolated Volition". LessWrong. Retrieved 17 May 2025.