{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "id": "bg_e8c085991c65",
  "canonicalUrl": "https://pseedr.com/risk/the-ontological-incoherence-of-ai-alignment-why-our-abstractions-might-be-flawed",
  "alternateFormats": {
    "markdown": "https://pseedr.com/risk/the-ontological-incoherence-of-ai-alignment-why-our-abstractions-might-be-flawed.md",
    "json": "https://pseedr.com/risk/the-ontological-incoherence-of-ai-alignment-why-our-abstractions-might-be-flawed.json"
  },
  "title": "The Ontological Incoherence of AI Alignment: Why Our Abstractions Might Be Flawed",
  "subtitle": "Coverage of lessw-blog",
  "category": "risk",
  "datePublished": "2026-05-12T00:14:37.526Z",
  "dateModified": "2026-05-12T00:14:37.526Z",
  "author": "PSEEDR Editorial",
  "tags": [
    "AI Alignment",
    "AI Safety",
    "Corrigibility",
    "Ontology",
    "Artificial General Intelligence"
  ],
  "wordCount": 425,
  "sourceUrls": [
    "https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a"
  ],
  "contentHtml": "\n<p class=\"mb-6 font-serif text-lg leading-relaxed\">A recent analysis argues that foundational AI safety concepts like corrigibility and empowerment are built on a flawed understanding of human agency, posing significant challenges for future alignment strategies.</p>\n<p>In a recent post, lessw-blog discusses the profound ontological incoherence underlying some of the most critical desiderata in AI alignment. The piece, titled \"Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology),\" challenges the foundational assumptions researchers use to define what it means for an artificial general intelligence (AGI) to be safe, helpful, and controllable.</p><p>As AI systems grow more capable, the alignment community has heavily relied on concepts like \"corrigibility\" (an AI's willingness to allow its objective function to be corrected) and \"empowerment\" (maximizing a human's options or capabilities). These ideas form the bedrock of many proposed safety frameworks. However, these concepts implicitly assume a stable, well-defined human agent with fixed, legible desires. In reality, human goals are highly under-determined and notoriously susceptible to external influence. This creates a significant philosophical and technical minefield: as systems become more persuasive and integrated into daily life, how do we mathematically distinguish between an AGI providing helpful guidance and an AGI subtly brainwashing or manipulating its user?</p><p>lessw-blog's analysis suggests that our intuitions regarding this crucial distinction are tied to scientifically inaccurate, pre-theoretic concepts of \"free will.\" We like to imagine that human preferences are sacred and immutable, but cognitive science demonstrates that our desires are malleable and context-dependent. Because of this, attempting to formalize alignment goals using our current conceptual framework is akin to building a house on quicksand. The author argues that current technical approaches to defining the \"True Names\"-a term often used in alignment literature to denote rigorous, robust mathematical definitions for fuzzy concepts-for agency and empowerment are vastly insufficient.</p><p>Without these precise definitions, there is currently no clear path forward for preventing an advanced AGI from manipulating human desires using existing frameworks. If an AI is tasked with satisfying human preferences, the easiest path to success might be to alter the human's preferences to match whatever the AI is already doing. The core argument presented by lessw-blog is that these foundational AI safety concepts are built on a \"messed-up ontology.\" They rely on ill-defined human abstractions rather than robust, mechanistic definitions that hold up under the extreme optimization pressure of an AGI.</p><p>This piece serves as a critical signal for researchers, policymakers, and developers in the AI safety space. It highlights the urgent need to re-evaluate the ontological foundations of alignment theory before we scale systems to superintelligence. If our basic abstractions are flawed, our technical strategies will inevitably fall short, leaving us vulnerable to subtle but catastrophic failures in alignment. 
For a deeper understanding of why these abstractions fail, the nuances of human manipulability, and the broader implications for AGI development, we highly recommend reviewing the original analysis.</p><p><strong><a href=\"https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a\">Read the full post</a></strong></p>\n\n<h3 class=\"text-xl font-bold mt-8 mb-4\">Key Takeaways</h3>\n<ul class=\"list-disc pl-6 space-y-2 text-gray-800\">\n<li>Human goals are under-determined and easily manipulated, blurring the line between helpful AI guidance and harmful brainwashing.</li><li>Current alignment intuitions rely heavily on scientifically inaccurate concepts of human free will.</li><li>Existing technical frameworks for defining agency, empowerment, and corrigibility are insufficient for robust AGI alignment.</li><li>Foundational AI safety strategies may be fundamentally flawed due to their reliance on a messed-up ontology of human abstractions.</li>\n</ul>\n\n<p class=\"mt-8 text-sm text-gray-600\">\n<a href=\"https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a\" target=\"_blank\" rel=\"noopener\" class=\"text-blue-600 hover:underline\">Read the original post at lessw-blog</a>\n</p>\n"
}