# Distinguishing Target States from Success Metrics in AI Alignment

> Coverage of lessw-blog

**Published:** December 22, 2025
**Author:** PSEEDR Editorial
**Category:** risk

**Tags:** AI Alignment, AI Safety, Reinforcement Learning, Agency, Conceptual Framework

**Canonical URL:** https://pseedr.com/risk/distinguishing-target-states-from-success-metrics-in-ai-alignment

---

In a recent post, lessw-blog discusses the conceptual fog surrounding the term "goal" within AI alignment and agency literature, proposing a necessary distinction between external outcomes and internal value signals.

The article, titled "Two Notions of a Goal: Target States vs. Success Metrics," argues that alignment research frequently conflates two fundamentally different aspects of agentic behavior: the external outcome the agent seeks and the internal signal it uses to measure value.

**The Context**

As AI systems transition from passive tools to active agents, defining their objectives becomes a primary safety concern. However, the language used to describe these objectives is frequently imprecise. When researchers speak of an agent's "goal," they might be referring to the physical state of the world the agent is trying to achieve, or they might be referring to the mathematical function the agent is maximizing. This lack of precision can obscure critical failure modes, such as when an AI optimizes its reward function (success metric) in ways that fail to achieve the intended physical outcome (target state).

**The Core Argument**

The post proposes a formal separation between **Target States** and **Success Metrics**. Target states are defined as the specific world configurations an agent's actions aim to actualize. In contrast, success metrics are the magnitudes or scalar values an agent uses to represent the desirability of a given state.

The author uses biological examples to clarify this relationship. For a human, "eating ice cream" is a target state. The biological driver, however, is the success metric: the internal reward signal (such as a dopamine release) that occurs when that state is reached. On this view, agents typically pursue target states not as ends in themselves, but as the most effective means to optimize their success metrics. In Reinforcement Learning (RL), the same distinction separates the environment states an agent navigates toward from the numerical output of the reward function.
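
To make the RL reading concrete, here is a minimal sketch that is not taken from the original post. It assumes a hypothetical five-cell corridor, a hand-written reward function named `success_metric`, and a discount factor of 0.9; all names and numbers are illustrative. The agent never sees `TARGET_STATE` directly. It only ranks actions by the scalar metric, and the target state shows up as the place that ranking leads to.

```python
from typing import Callable, Dict, List

State = int                  # position in a 1-D corridor, cells 0..4 (assumed toy world)
ACTIONS = (-1, +1)           # step left or step right
GAMMA = 0.9                  # discount factor (assumed)
TARGET_STATE: State = 4      # target state: the world configuration we want reached


def success_metric(state: State) -> float:
    """Success metric: a scalar saying how desirable a state is."""
    return 1.0 if state == TARGET_STATE else 0.0


def step(state: State, action: int) -> State:
    """Deterministic move, clipped to the ends of the corridor."""
    return max(0, min(4, state + action))


def action_value(state: State, action: int,
                 metric: Callable[[State], float],
                 values: Dict[State, float]) -> float:
    """Reward of the next state plus the discounted value of being there."""
    nxt = step(state, action)
    return metric(nxt) + GAMMA * values[nxt]


def state_values(metric: Callable[[State], float], sweeps: int = 50) -> Dict[State, float]:
    """Plain value iteration: the agent's estimate of long-run metric per state."""
    values = {s: 0.0 for s in range(5)}
    for _ in range(sweeps):
        values = {s: max(action_value(s, a, metric, values) for a in ACTIONS)
                  for s in range(5)}
    return values


def greedy_rollout(start: State, metric: Callable[[State], float],
                   horizon: int = 6) -> List[State]:
    """Follow the metric-maximizing action at every step and record the path."""
    values = state_values(metric)
    path = [start]
    for _ in range(horizon):
        best = max(ACTIONS, key=lambda a: action_value(path[-1], a, metric, values))
        path.append(step(path[-1], best))
    return path


if __name__ == "__main__":
    print(greedy_rollout(0, success_metric))  # [0, 1, 2, 3, 4, 4, 4]
```

Starting from cell 0, the rollout walks to cell 4 and stays there. Nothing in `greedy_rollout` mentions the target: swapping in a different `metric` function would send the same agent somewhere else, which is the sense in which the target state is downstream of the success metric.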

**Why It Matters**

This distinction is not merely semantic; it provides a framework for untangling more complex alignment concepts. The author suggests that separating these notions helps clarify the relationship between terminal and instrumental goals, as well as the dynamics of inner versus outer optimization. By understanding that an agent's pursuit of a physical state is a downstream consequence of its metric optimization, researchers can better anticipate how agents might behave when those two factors decouple.
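
The decoupling case can be sketched in a few lines. The example below is an illustration under stated assumptions, not the author's formalism: the `Outcome` class, the room-cleaning scenario, and the metric values are all hypothetical. Each outcome records separately whether the intended target state is reached and what scalar the agent's metric reports, and the agent chooses on the metric alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Outcome:
    name: str
    reaches_target: bool  # does the world end up in the intended target state?
    metric: float         # scalar the agent actually optimizes (hypothetical values)


def choose(outcomes: List[Outcome]) -> Outcome:
    """The agent ranks outcomes by success metric only."""
    return max(outcomes, key=lambda o: o.metric)


# Coupled: the highest-metric outcome is also the one that reaches the target.
COUPLED = [
    Outcome("clean the room", reaches_target=True, metric=1.0),
    Outcome("do nothing", reaches_target=False, metric=0.0),
]

# Decoupled: an extra outcome scores higher on the metric without reaching the target.
DECOUPLED = COUPLED + [
    Outcome("cover the camera", reaches_target=False, metric=1.5),
]

if __name__ == "__main__":
    for label, options in (("coupled", COUPLED), ("decoupled", DECOUPLED)):
        picked = choose(options)
        print(f"{label}: {picked.name} (target reached: {picked.reaches_target})")
```

The selection rule is identical in both cases. Only the relationship between metric and target changes, and with it the behavior an observer would see.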

For those involved in AI safety, theory, or architecture, this post offers a necessary linguistic patch to improve the precision of technical discourse.

[Read the full post on LessWrong](https://www.lesswrong.com/posts/ZAXEscrsebuwref5Z/two-notions-of-a-goal-target-states-vs-success-metrics)

### Key Takeaways

*   The term "goal" is often overloaded, conflating external outcomes with internal measurements.
*   Target States are the specific world configurations an agent acts to achieve.
*   Success Metrics are the values or signals used to evaluate the desirability of a state.
*   Agents generally pursue target states as a strategy to maximize success metrics.
*   This framework aids in clarifying concepts like inner/outer optimization and terminal/instrumental goals.

---

## Sources

- https://www.lesswrong.com/posts/ZAXEscrsebuwref5Z/two-notions-of-a-goal-target-states-vs-success-metrics