# Path-Dependency and Value-Lock in Individual CEV

> Coverage of lessw-blog

**Published:** May 06, 2026
**Author:** PSEEDR Editorial
**Category:** risk

**Tags:** AI Safety, Coherent Extrapolated Volition, Alignment Theory, Reflective Stability, LessWrong

**Canonical URL:** https://pseedr.com/risk/path-dependency-and-value-lock-in-individual-cev

---

A recent LessWrong post highlights a critical vulnerability in Coherent Extrapolated Volition (CEV), arguing that the chronological order of knowledge acquisition and self-modification could permanently lock in harmful ideologies.

In "Many individual CEVs are probably quite bad," lessw-blog examines the theoretical risks and path-dependency inherent in implementing Coherent Extrapolated Volition (CEV) for individual alignment. The analysis challenges the prevailing assumption that extrapolating a single person's values will naturally converge on benevolent, or even safe, outcomes.

To understand why this matters, it is helpful to look at the origins of Coherent Extrapolated Volition. Originally proposed by Eliezer Yudkowsky, CEV suggests that an artificial intelligence should act based on what humanity would want if we knew more, thought faster, were more the people we wished we were, and had grown up farther together. While traditionally framed as a collective, humanity-wide alignment strategy, applying CEV to specific individuals introduces unique and severe hazards. As AI systems become increasingly capable of modeling and extrapolating human preferences, understanding the mechanics of "reflective stability" (the state in which an agent's values remain stable under self-reflection and self-modification) becomes critical. If an alignment strategy relies on extrapolating flawed human values, the extrapolation process itself might be weaponized.

The core argument of the post is that individual CEV outcomes are highly sensitive to the order in which two capabilities arrive: knowledge acquisition and self-modification. The author presents a compelling scenario: what happens if an individual gains the ability to self-modify before acquiring the vast, objective knowledge required for a "corrected" worldview? In that case, the individual might use early self-modification to hardcode their existing biases, prejudices, or harmful ideologies precisely in order to prevent future value drift. The result is a reflectively stable "monster": an entity whose malformed values are permanently locked in, immune to future evidence or ethical growth.
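The ordering sensitivity can be made concrete with a toy model. The sketch below is not from the post; the agent, its single scalar "values" dimension, and the step sizes are all illustrative assumptions, meant only to show how an early self-modification step can freeze the extrapolation endpoint while the same steps in a different order do not.

```python
# Toy model of path-dependent extrapolation (illustrative only; not the
# post's formalism). Values run from 0.0 ("uncorrected" starting ideology)
# to 1.0 (the fully "corrected" worldview).
from dataclasses import dataclass


@dataclass
class Agent:
    values: float
    open_to_update: bool = True

    def self_modify(self):
        # An agent that can self-modify may hardcode its current values
        # to block future drift -- the "value-lock" failure mode.
        self.open_to_update = False

    def acquire_knowledge(self):
        # New knowledge shifts values only while the agent remains open
        # to updating; a locked agent absorbs facts without changing.
        if self.open_to_update:
            self.values = min(1.0, self.values + 0.5)


def extrapolate(order):
    """Run the listed steps in sequence and return the final values."""
    agent = Agent(values=0.0)
    for step in order:
        getattr(agent, step)()
    return agent.values


# Knowledge arrives before any lock-in: values reach the corrected endpoint.
knowledge_first = extrapolate(["acquire_knowledge", "acquire_knowledge"])

# Self-modification comes first: the starting values are frozen forever.
lock_first = extrapolate(["self_modify", "acquire_knowledge", "acquire_knowledge"])

print(knowledge_first)  # 1.0
print(lock_first)       # 0.0
```

The same steps, reordered, yield opposite endpoints, which is the post's point: the extrapolation procedure, not just the starting values, determines the outcome.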

Furthermore, the analysis critiques the "knowledge-first" approach to CEV, which assumes a baseline of rationality and an openness to truth that may be entirely absent in the target individual. If a person is fundamentally opposed to changing their mind, granting them immense knowledge will not necessarily correct their values; they may instead use that knowledge to better defend and entrench their harmful beliefs. The post serves as a direct counterpoint to earlier community discussions, such as Habryka's thoughts on a hypothetical "Putin's CEV," emphasizing that individual CEVs are not inherently benevolent: they can lead to catastrophic outcomes based purely on the technical, procedural details of how the extrapolation is executed.

For researchers and practitioners in AI safety, this analysis identifies a fundamental vulnerability in value extrapolation strategies. It underscores the profound danger of "value-lock" and the necessity of carefully sequencing how an AI models human cognitive and moral growth. Understanding these failure modes is essential for anyone working on alignment methodologies that rely on human value extrapolation.

**[Read the full post](https://www.lesswrong.com/posts/FvERMXkaobQvdjS4q/many-individual-cevs-are-probably-quite-bad)**

### Key Takeaways

*   CEV outcomes are highly sensitive to the chronological order of knowledge acquisition versus self-modification capabilities.
*   Early self-modification can be exploited to hardcode existing biases, creating reflectively stable entities with harmful ideologies.
*   The knowledge-first approach to CEV is flawed because it assumes a baseline of rationality that may be absent in the target individual.
*   Individual CEVs are not inherently benevolent and can result in catastrophic outcomes based on procedural details.
*   The analysis highlights a fundamental vulnerability in alignment strategies relying on human value extrapolation, warning against the dangers of value-lock.

---

## Sources

- https://www.lesswrong.com/posts/FvERMXkaobQvdjS4q/many-individual-cevs-are-probably-quite-bad
