PSEEDR

Rethinking Agency: The Shift from Utility Functions to Goal-Models

Coverage of lessw-blog

· PSEEDR Editorial

In a recent theoretical post, lessw-blog examines the foundational structures of intelligent agents, proposing a move away from traditional utility functions toward a framework of "goal-models."

In the prevailing paradigms of artificial intelligence and economics, rational agency is almost exclusively defined by the maximization of a utility function. Whether it is a reward signal in Reinforcement Learning (RL) or a loss function in neural network training, the assumption is that an agent's motivation can be collapsed into a scalar value representing "goodness." The post challenges this abstraction, arguing that it may be insufficient for describing complex intelligent behavior, and proposes the concept of "goal-models" in its place.

The core argument suggests that rather than simply chasing a high reward value, intelligent agents should be viewed as possessing a generative model of a desired future state. This "goal-model" functions analogously to a "world-model." While a world-model represents the agent's best prediction of the current reality (what is), a goal-model represents the agent's prediction of a preferred reality (what should be). This reframing draws inspiration from predictive processing and active inference, where action is driven by the attempt to minimize the divergence between the internal model and external sensory data.
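The divergence-minimization idea can be made concrete with a minimal sketch. Everything here is illustrative, not the author's formalism: the four-state world, the `np.roll` action dynamics, and the choice of KL divergence as the "distance" between goal-model and predicted belief are all assumptions made for the example.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as arrays."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical: distributions over four abstract world states.
world_model = [0.7, 0.2, 0.05, 0.05]   # what the agent believes *is*
goal_model  = [0.05, 0.05, 0.2, 0.7]   # what the agent believes *should be*

def predicted_belief(world, shift):
    # Toy action dynamics: an action cyclically shifts probability mass.
    w = np.roll(world, shift)
    return w / w.sum()

# Each candidate action yields a predicted belief; the agent picks the
# action whose predicted outcome is closest to the goal-model.
actions = {name: predicted_belief(world_model, s)
           for name, s in [("stay", 0), ("step", 1), ("leap", 3)]}
best = min(actions, key=lambda a: kl_divergence(goal_model, actions[a]))
```

On this toy setup, "leap" wins because it moves the predicted belief closest to the goal-model; the scalar reward never appears, yet action selection falls out of comparing two generative models.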

One of the primary advantages of this perspective is the introduction of "distance" into the motivational landscape. In traditional utility frameworks, states are simply assigned values. In a goal-model framework, an agent can calculate the structural distance between its current belief (world-model) and its target (goal-model). This allows for a more nuanced understanding of planning and trajectory than simple reward maximization typically affords.
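The planning advantage of a distance metric can be illustrated with a small sketch. The state features (`door_open`, `key_held`, `room`), the distance function, and the plan are all invented for this example; the point is only that a structural distance ranks intermediate states, whereas a scalar utility that is 1 at the goal and 0 elsewhere cannot.

```python
# Current belief and desired state as hypothetical feature assignments.
current = {"door_open": 0, "key_held": 0, "room": 1}
goal    = {"door_open": 1, "key_held": 1, "room": 3}

def distance(a, b):
    """Toy structural distance: mismatched boolean features plus room gap."""
    flags = sum(a[k] != b[k] for k in ("door_open", "key_held"))
    return flags + abs(a["room"] - b["room"])

# Each step of a candidate plan can be scored by how much it shrinks
# the distance to the goal-model, giving a gradient a sparse scalar
# reward would not provide until the goal is actually reached.
plan = [
    {"door_open": 0, "key_held": 1, "room": 1},  # pick up key
    {"door_open": 1, "key_held": 1, "room": 1},  # open door
    {"door_open": 1, "key_held": 1, "room": 3},  # walk to room 3
]
dists = [distance(s, goal) for s in [current] + plan]
```

Here `dists` decreases monotonically along the plan, so every intermediate step is visibly progress; a goal-indicator utility would score every non-goal state identically.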

However, the post acknowledges a significant theoretical hurdle: the definition of a "world-model" itself remains ambiguous in the broader AI community. The author expresses dissatisfaction with defining world-models merely as generative models over observations (e.g., simply predicting the next token or pixel). Such a definition fails to capture the internal coherence required for true understanding.

Instead, lessw-blog suggests that a robust world-model is likely the result of a complex consensus-formation process. The post hypothesizes that an agent's internal representation is composed of numerous smaller generative models that must compete or collaborate to produce a single, unified prediction. The author points to probabilistic dependency graphs as a potential formalism for mapping this internal negotiation. This implies that "goals" and "beliefs" are not static variables but dynamic outputs of an internal consensus mechanism.
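One standard way to cash out "many small models negotiating one prediction" is a product-of-experts combination, shown in this minimal sketch. The three "modules" and their distributions are fabricated for illustration, and the post itself points to probabilistic dependency graphs rather than this particular mechanism.

```python
import numpy as np

# Hypothetical: three small generative models each predict a distribution
# over the same four-valued variable. Consensus is their normalized
# product, so an outcome survives only if no module strongly vetoes it.
experts = np.array([
    [0.4, 0.4, 0.1, 0.1],   # "vision" module
    [0.1, 0.6, 0.2, 0.1],   # "memory" module
    [0.3, 0.3, 0.3, 0.1],   # "prior" module
])

consensus = experts.prod(axis=0)
consensus /= consensus.sum()
winner = int(consensus.argmax())
```

No single module dictates the outcome: the winning state is the one every module assigns reasonable probability, which is the sense in which the unified prediction is a consensus output rather than a stored variable.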

While the author describes these concepts as "inchoate," the implications for AI alignment and interpretability are significant. Moving from opaque utility functions to structured goal-models could provide researchers with better tools to inspect what an AI is actually trying to achieve, rather than inferring its intent solely from its behavior or reward signals.

For those interested in the theoretical underpinnings of agent design and the intersection of AI with cognitive science, this post offers a compelling look at the open problems in defining agency.

Read the full post at LessWrong

Key Takeaways

  • Beyond Utility Functions: The post argues for replacing the concept of scalar utility functions with "goal-models": generative representations of desired world states.
  • Predictive Processing Roots: The framework treats beliefs and goals as parallel generative models, where action attempts to minimize the distance between the two.
  • Metric of Distance: Unlike reward functions, goal-models allow for calculating the structural distance between the current state and the desired state.
  • Consensus-Based World Models: The author critiques current definitions of world-models, suggesting they should be viewed as consensus mechanisms among many smaller internal models.
  • Theoretical Foundations: The post highlights that our current definitions of both world-models and goal-models are insufficiently precise for advanced agent design.

