{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "id": "bg_f439130f80bc",
  "canonicalUrl": "https://pseedr.com/risk/curated-digest-beliefs-are-chosen-to-serve-goals",
  "alternateFormats": {
    "markdown": "https://pseedr.com/risk/curated-digest-beliefs-are-chosen-to-serve-goals.md",
    "json": "https://pseedr.com/risk/curated-digest-beliefs-are-chosen-to-serve-goals.json"
  },
  "title": "Curated Digest: Beliefs are Chosen to Serve Goals",
  "subtitle": "Coverage of lessw-blog",
  "category": "risk",
  "datePublished": "2026-04-08T00:44:27.084Z",
  "dateModified": "2026-04-08T00:44:27.084Z",
  "author": "PSEEDR Editorial",
  "tags": [
    "AI Alignment",
    "Orthogonality Thesis",
    "AI Safety",
    "Reinforcement Learning",
    "LessWrong"
  ],
  "wordCount": 515,
  "sourceUrls": [
    "https://www.lesswrong.com/posts/qARSdtieM2zxEcHCM/beliefs-are-chosen-to-serve-goals"
  ],
  "contentHtml": "\n<p class=\"mb-6 font-serif text-lg leading-relaxed\">A compelling challenge to the orthogonality thesis suggests that an AI's goals and beliefs co-evolve, potentially narrowing the space of dangerous agents and offering a more optimistic outlook on AI alignment.</p>\n<p>In a recent post, lessw-blog advances an anti-orthogonality thesis, challenging one of the foundational assumptions of artificial intelligence safety. The post, titled \"Beliefs are Chosen to Serve Goals,\" re-examines the relationship between an artificial agent's intelligence and its ultimate objectives.</p><p>To understand why this topic matters, we must look at the historical context of AI safety research. For years, the orthogonality thesis has been a core tenet of AI risk modeling. Formulated by Nick Bostrom and popularized by early alignment theorists, the thesis posits that an agent's level of intelligence is independent of its final goals. In practical terms, this means we could theoretically build a superintelligent system with arbitrary, trivial, or destructive objectives; the classic example is an AI that optimizes the universe for paperclip production. This assumption has historically driven much of the anxiety surrounding AI alignment, as it implies an unconstrained space of potential agents in which high intelligence does not naturally correlate with human-compatible values.</p><p>However, lessw-blog presents a counter-argument that fundamentally shifts this perspective. The author suggests that the orthogonality thesis only holds if an agent's goals are exogenously defined, that is, imposed entirely from the outside without any internal adaptation or feedback loop. In the reality of modern AI development, particularly within paradigms like Reinforcement Learning (RL), this is rarely the case.</p><p>The core of the argument is that, for internally representable goals, an agent's goals and beliefs do not exist in isolation. Instead, they co-evolve in response to selection pressures. Whether through biological evolution or iterative machine learning, an agent refines its understanding of the world (beliefs) to better achieve its objectives, while its objectives (goals) are simultaneously shaped by what is possible and necessary within its environment. This co-evolutionary dynamic means that beliefs are chosen to serve goals, and goals adapt to the agent's evolving model of reality.</p><p>This perspective has significant implications for AI safety. If goals and beliefs co-evolve under selection pressures, the space of plausible, highly capable agents becomes much narrower than the orthogonality thesis suggests: it is restricted to agents whose goal-belief structures are compatible with their training environments. This offers a surprisingly optimistic outlook on AI alignment, as it implies that extreme, misaligned edge cases might be selected against during training rather than being an inevitable risk of scaling intelligence.</p><p>For researchers, developers, and policymakers focused on AI safety, this is a critical re-evaluation of core principles. Understanding the mechanics of goal-belief co-evolution could be key to designing training environments that naturally select for aligned behavior. We recommend reviewing the complete analysis to understand the nuances of this anti-orthogonality framework. <a href=\"https://www.lesswrong.com/posts/qARSdtieM2zxEcHCM/beliefs-are-chosen-to-serve-goals\">Read the full post</a>.</p>\n\n<h3 class=\"text-xl font-bold mt-8 mb-4\">Key Takeaways</h3>\n<ul class=\"list-disc pl-6 space-y-2 text-gray-800\">\n<li>The orthogonality thesis, which holds that intelligence and final goals are independent, is challenged as valid only for exogenously defined goals.</li><li>For internally representable goals, an agent's goals and beliefs co-evolve under selection pressures such as Reinforcement Learning.</li><li>This co-evolution narrows the space of plausible AI agents, potentially simplifying the alignment problem.</li><li>The perspective offers a more optimistic outlook on safe AI by suggesting that dangerous edge cases may be naturally constrained.</li>\n</ul>\n\n<p class=\"mt-8 text-sm text-gray-600\">\n<a href=\"https://www.lesswrong.com/posts/qARSdtieM2zxEcHCM/beliefs-are-chosen-to-serve-goals\" target=\"_blank\" rel=\"noopener\" class=\"text-blue-600 hover:underline\">Read the original post at lessw-blog</a>\n</p>\n"
}