PSEEDR

Distinguishing Coherence from Representation in Decision Theory

Coverage of lessw-blog

PSEEDR Editorial

A recent inquiry on LessWrong challenges the conflation of two foundational concepts in decision theory, prompting a re-evaluation of how we justify rational agent behavior.

In a recent post, lessw-blog hosts a critical discussion regarding the foundational mathematics of rational decision-making. The author poses a deceptively simple question: What is the precise difference between "coherence theorems" and "representation theorems"? While often used interchangeably in high-level discussions about Artificial Intelligence safety, the post suggests that distinguishing between the two is vital for constructing robust models of agent rationality.

Contextualizing the Debate
For researchers in AI alignment and game theory, the assumption that a sufficiently advanced agent will act to maximize expected utility is central. This assumption is usually backed by mathematical proofs. However, the specific nature of these proofs matters. If we misunderstand the derivation of rational behavior, we risk misinterpreting how an AI might behave under pressure or how it might organize its internal goals.

The distinction is generally framed as follows: Coherence theorems are pragmatic warnings. They imply that if an agent's preferences violate specific rules (axioms), the agent is vulnerable to exploitation, such as "Dutch booking," in which the agent is guaranteed to lose resources no matter what happens. Representation theorems, conversely, are structural descriptions. They demonstrate that if an agent's preferences do satisfy certain axioms, those preferences can be mathematically represented as the maximization of an expected utility function.
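To make the coherence side concrete, consider a money pump, the canonical exploitation argument. The sketch below is purely illustrative and does not appear in the original post: an agent with cyclic preferences (A over B, B over C, C over A) will pay a small fee for every trade up to a "better" item, and so can be walked around the cycle indefinitely, losing money while ending up holding exactly what it started with.

  # Minimal money-pump sketch (illustrative; assumes a fixed fee per trade).
  # Cyclic preferences: the agent strictly prefers A to B, B to C, and C to A.
  prefers = {("A", "B"), ("B", "C"), ("C", "A")}

  FEE = 1.0  # amount the agent willingly pays to swap into a preferred item

  def run_money_pump(start_item: str, cycles: int) -> float:
      """Trade around the preference cycle; return the agent's total loss."""
      item, loss = start_item, 0.0
      for _ in range(3 * cycles):  # three trades complete one full cycle
          # Offer whichever item the agent prefers to its current holding.
          offer = next(x for x in "ABC" if (x, item) in prefers)
          item = offer   # the agent accepts the "upgrade"...
          loss += FEE    # ...and pays the fee each time
      return loss

  print(run_money_pump("A", cycles=10))  # 30.0: back at A, 30 units poorer

Transitivity, one of the standard coherence axioms, is precisely what rules this out; the content of a coherence theorem is that any agent violating such an axiom is open to this kind of guaranteed loss.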

The Core Argument
The LessWrong post highlights a gap in the current literature. It notes that resources like the AI safety wiki "Stampy" do not clearly demarcate these concepts, potentially leading to confusion. The discussion touches on Savage's subjective expected utility model, debating whether it falls strictly under representation or bridges the gap between the two. The author argues that conflating "avoiding loss" (coherence) with "having a utility function" (representation) obscures the different mechanisms that might drive an AI toward rational behavior.
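For reference, here is the standard textbook form of Savage's result (a paraphrase, not a quotation from the post). Given his axioms on preferences over acts f and g, which map states s in a state space S to outcomes, there exist a subjective probability measure P (unique) and a utility function u (unique up to positive affine transformation), both constructed from the preferences alone, such that

  f \succeq g \iff \int_S u(f(s)) \, dP(s) \ge \int_S u(g(s)) \, dP(s)

Because both P and u are outputs of the theorem rather than inputs, the result reads most naturally as a representation theorem, even though its axioms are often motivated by coherence-style exploitation arguments; this dual character is presumably why the post treats its classification as debatable.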

This is not merely a semantic dispute; it impacts how we predict the behavior of future systems. If an AI is rational only because it seeks to avoid resource forfeiture (coherence), its behavior might differ from an AI that is structurally designed to maximize a specific utility function (representation).

We recommend this post to readers interested in the mathematical underpinnings of AI safety and decision theory. The ensuing community discussion offers valuable insights into the axioms that define our expectations of machine intelligence.

Read the full post on LessWrong

Key Takeaways

  • Coherence theorems are framed as arguments against exploitability (e.g., avoiding Dutch books).
  • Representation theorems demonstrate that rational preferences can be mathematically modeled as expected utility maximization.
  • Current AI safety literature often conflates these two distinct mathematical justifications.
  • The classification of foundational results, such as Savage's subjective expected utility theorem, remains a point of debate.
  • Distinguishing these concepts is crucial for accurate modeling of AI agent incentives and behavior.
