Unsupervised Agent Discovery: Identifying Agency Without Prior Ontologies

Coverage of lessw-blog

· PSEEDR Editorial

In a recent analysis, lessw-blog outlines the critical challenge of "Unsupervised Agent Discovery": the mathematical task of identifying agents in raw data without relying on human-imposed labels.

The post examines the conceptual and technical hurdles involved in "Unsupervised Agent Discovery" and addresses a fundamental blind spot in current machine learning and AI alignment research: the reliance on pre-existing ontologies to define what constitutes an "agent."

The Context: Why This Matters

In most AI research and game theory, the definition of an agent is taken as a given. Researchers typically label specific variables or entities as "agents" and the rest as the "environment." This works well in structured games like chess or in well-defined simulations where boundaries are hard-coded. However, this approach breaks down in the real world and in complex, messy datasets.

Human intuition regarding agency is notoriously fallible. We are prone to anthropomorphizing natural phenomena (seeing intent where there is none) and, conversely, failing to recognize distributed intent in complex systems (such as markets, organizations, or potentially emergent AI sub-processes). As we move toward more autonomous AI systems, relying on human-centric labels to identify agency becomes a safety risk. If we cannot mathematically define and detect an agent without first being told where to look, we may fail to notice emergent behaviors in powerful models.

The Gist: Removing the Human Lens

The core argument presented by lessw-blog is that a robust theory of agency must be able to discover agents in raw, unlabeled time series data. The post critiques the standard practice of "ontology injection," in which the researcher solves the hardest part of the problem, identifying the entities, before the analysis even begins.

The author proposes that true discovery requires a framework that does not assume a prior ontology. Instead, it should derive the boundaries of an agent solely from the causal structures and dynamic interactions within the data. This involves distinguishing between the "self" and the "world" based on statistical dependencies (often conceptualized through mechanisms like Markov blankets) rather than visual or labeled distinctness.
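The post stays at the conceptual level, but the flavor of the idea can be illustrated with a toy sketch. The Python snippet below is not from the original post; the simulated dynamics, the linear-regression screening test, and the restriction to single-variable boundaries are all assumptions made here for illustration. It searches a small unlabeled time series for a subset of variables whose next state is predicted by its own history plus one "boundary" variable, with the remaining variables contributing essentially nothing, a crude linear stand-in for the Markov-blanket-style screening-off condition described above.

```python
# Toy sketch only, NOT the post's method: simulate an unlabeled multivariate
# time series, then score candidate (internal, boundary) splits by how well
# the boundary screens the internal variables off from everything else.
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)

# Simulate 6 unlabeled variables with linear dynamics chosen for illustration.
# Hidden ground truth, never shown to the search: vars {0, 1, 2} form a tightly
# coupled subsystem whose only contact with vars {4, 5} runs through var 3.
T, n = 4000, 6
A = np.zeros((n, n))
A[0, [1, 2]] = [0.6, 0.3]
A[1, [0, 3]] = [0.5, 0.4]   # the subsystem reads the outside world only via var 3
A[2, 0] = 0.7
A[3, [1, 4]] = [0.3, 0.5]   # var 3 is the two-way interface
A[4, [3, 5]] = [0.3, 0.6]
A[5, 4] = 0.8
X = np.zeros((T, n))
for t in range(1, T):
    X[t] = A @ X[t - 1] + 0.1 * rng.standard_normal(n)

def residual_var(targets, predictors):
    """Mean residual variance when predicting `targets` from `predictors` by OLS."""
    beta, *_ = np.linalg.lstsq(predictors, targets, rcond=None)
    return float(np.mean((targets - predictors @ beta) ** 2))

def screening_score(internal, boundary):
    """How much do the remaining 'external' variables still help predict the
    internal variables' next step, once internal + boundary history is used?
    Values near zero mean the boundary screens the internal set off."""
    external = [j for j in range(n) if j not in internal and j not in boundary]
    past, future = X[:-1], X[1:]
    targets = future[:, internal]
    base = residual_var(targets, past[:, internal + boundary])
    full = residual_var(targets, past[:, internal + boundary + external])
    return (base - full) / base

# Exhaustive search over small splits (single-variable boundaries for brevity).
candidates = [
    (screening_score(list(S), list(B)), list(S), list(B))
    for k in (2, 3)
    for S in combinations(range(n), k)
    for B in combinations([j for j in range(n) if j not in S], 1)
]
for score, internal, boundary in sorted(candidates)[:3]:
    print(f"internal {internal}  boundary {boundary}  leakage past boundary {score:.4f}")
```

In practice, linear regression would give way to proper conditional-independence or information-theoretic tests, boundaries would be sets rather than single variables, and exhaustive search would not scale. The point of the sketch is only that candidate agent boundaries can be scored from the statistical structure of the data alone, with no labels supplied.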

This shift is significant because it moves the field toward detecting "alien" forms of agency: optimization processes that do not look like humans or standard software agents. By focusing on the mathematical signature of agency rather than its aesthetic or labeled appearance, researchers can better understand how intent manifests in distributed or non-intuitive systems.

Conclusion

This post serves as a foundational framing for a difficult open problem in AI safety and interpretability. It challenges readers to rethink how they define boundaries in dynamic systems and highlights the necessity of formalizing agency in a way that is robust to human bias.

For those interested in the intersection of mathematics, philosophy, and AI safety, this analysis offers a crucial perspective on the mechanics of agent identification.

Read the full post on LessWrong
