Beyond CEV: Grounding AI Alignment in Evolutionary Psychology
Coverage of lessw-blog
A critical examination of Coherent Extrapolated Volition suggests biological realism may offer a more practical foundation for AI value learning.
In a recent post, lessw-blog presents a critique of one of the foundational concepts in AI safety: Coherent Extrapolated Volition (CEV). The article, titled "Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV," argues that while CEV has served as a theoretical placeholder for defining "human values," it may be too abstract and computationally intractable to serve as a practical blueprint for aligning advanced artificial intelligence.
The "Alignment Problem" remains the central challenge in AGI development: how do we ensure a superintelligent system pursues goals that are beneficial to humanity? For years, the concept of CEV has been a dominant answer in the safety community. CEV suggests that an AI should not strictly adhere to what humans say they want, or even what they currently think they want. Instead, it should pursue what humans would want if they knew more, thought faster, and were more the people they wished they were. While philosophically robust, translating this idealized extrapolation into code is fraught with difficulty. If the foundation of value definition is shaky, the entire alignment architecture risks collapse.
The author contends that CEV relies on "hand-wavy" definitions and assumes a convergence of human values that might not materialize without superhuman computational resources to simulate the extrapolation process. The critique highlights a significant gap between the theoretical elegance of CEV and the engineering reality required to implement it. Specifically, the post questions whether the extrapolation process yields a unique, coherent result, or whether it diverges depending on initial conditions and the specific definition of "coherence."
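The convergence worry can be illustrated with a toy iteration (again our own, not the author's): if "reflection" is modeled as repeatedly applying an update rule to a preference parameter, different starting points settle into different stable attractors, so the extrapolated result is not unique.

```python
# Toy numerical illustration of the divergence concern. The update rule is
# made up; its only purpose is to have two stable fixed points (+1 and -1),
# so the "extrapolated" value depends entirely on where reflection starts.
def reflect(x: float) -> float:
    return x + 0.1 * (x - x**3)

def extrapolate(x0: float, steps: int = 200) -> float:
    x = x0
    for _ in range(steps):
        x = reflect(x)
    return x

print(extrapolate(0.05))   # settles near +1
print(extrapolate(-0.05))  # settles near -1
```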
As an alternative, the post proposes grounding value learning in evolutionary psychology. Rather than relying on an idealized future version of human psychology, this approach suggests looking at the biological and evolutionary origins of human desires. By understanding the specific adaptations that drive human behavior (the "generator" of our values, rather than just the expressed preferences), researchers might construct a more concrete, observable, and stable target for AI value learning systems. This shifts the focus from abstract philosophy to empirical science, potentially offering a more rigorous path toward defining the "human" in "human-compatible AI."
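One way to picture what such a target could look like, offered purely as an assumed formalization rather than anything proposed in the post, is to treat expressed preferences as noisy readouts of a small set of evolved drives and make the drive weights, not the surface preferences, the quantity the system learns. The drive list, model, and function names below are illustrative assumptions.

```python
# A minimal sketch, assuming a linear "drives generate preferences" model.
# Expressed preferences are fit as noisy readouts of evolved drives, and the
# learning target is the drive weights rather than the surface preferences.
import numpy as np

DRIVES = ["nutrition", "status", "affiliation", "safety", "curiosity"]

def infer_drive_weights(features: np.ndarray, choices: np.ndarray) -> np.ndarray:
    """Least-squares fit of drive weights from observed preference scores.

    features: (n_observations, n_drives) -- how strongly each observed option
              engages each evolved drive.
    choices:  (n_observations,) -- observed preference strength.
    """
    weights, *_ = np.linalg.lstsq(features, choices, rcond=None)
    return weights

# Synthetic demo: recover the hidden drive weights from simulated noisy behavior.
rng = np.random.default_rng(0)
true_w = np.array([0.9, 0.4, 0.7, 1.2, 0.3])
X = rng.random((200, len(DRIVES)))
y = X @ true_w + rng.normal(0, 0.05, size=200)
print(dict(zip(DRIVES, infer_drive_weights(X, y).round(2))))
```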
This proposal represents a significant pivot in alignment theory, moving away from theoretical idealization toward biological realism. For researchers and engineers tracking the frontiers of AI safety, this discussion offers a fresh perspective on how we might mathematically define human values.
Read the full post on LessWrong
Key Takeaways
- The post critiques Coherent Extrapolated Volition (CEV) as too "hand-wavy" and difficult to implement in practice.
- It questions the assumption that human values will naturally converge into a coherent set without superhuman simulation capabilities.
- Evolutionary psychology is proposed as a more grounded alternative, focusing on the biological origins of human desires.
- The author argues for shifting value learning from abstract philosophical extrapolation to empirical scientific observation.
- This approach aims to resolve the ambiguity of "human values" by identifying the evolutionary adaptations that generate them.