# Tracking Dangerous Capabilities: The Launch of TakeOverBench.com

> Coverage of lessw-blog

**Published:** January 22, 2026
**Author:** PSEEDR Editorial
**Category:** risk
**Content tier:** free
**Accessible for free:** true

**Word count:** 385

**Tags:** AI Safety, Existential Risk, Benchmarks, AI Governance, LessWrong

**Canonical URL:** https://pseedr.com/risk/tracking-dangerous-capabilities-the-launch-of-takeoverbenchcom

---

In a recent post on LessWrong, the teams behind PauseAI and the Existential Risk Observatory introduced **TakeOverBench.com**, a project that aims to quantify how far AI systems have progressed toward specific takeover scenarios. General performance benchmarks document the rapid advance of large language models (LLMs) well, but there has been little centralized tracking of the capabilities that bear directly on existential risk.

**The Context: Measuring the Unmeasurable**  
AI safety research often struggles to operationalize abstract threats. Robust metrics exist for coding proficiency or creative writing, but measuring an AI system's potential to cause "loss of control" remains hard. Without concrete data, policy discussions about existential risk tend toward speculation. The new initiative tries to close that gap by aggregating state-of-the-art (SOTA) benchmark results and mapping them onto specific threat models.

**The Gist: Nine Dimensions of Risk**  
The core of the TakeOverBench initiative is the tracking of nine "dangerous capabilities" originally identified in the 2023 paper _Model evaluation for extreme risks_. These capabilities include:

*   Cyber-offense
*   Deception
*   Persuasion and manipulation
*   Political strategy
*   Weapons acquisition
*   Long-horizon planning
*   AI development
*   Situational awareness
*   Self-proliferation

By monitoring these nine capabilities, the project aims to visualize the trajectory toward four distinct takeover scenarios. The authors argue that current progress in these areas is worrying, and that a dedicated dashboard is needed to keep researchers, policymakers, and the public informed about how close the technology is to critical safety thresholds.
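
To make the idea of "aggregating SOTA results and mapping them onto threat models" more concrete, here is a minimal sketch of one way such an aggregation could work. It is not the method used by TakeOverBench.com: the scenario names, the capability-to-scenario mapping, the model names, the scores, and the choice of a simple average are all placeholders assumed for illustration.

```python
# The nine capabilities listed in the article.
CAPABILITIES = [
    "cyber-offense", "deception", "persuasion and manipulation",
    "political strategy", "weapons acquisition", "long-horizon planning",
    "AI development", "situational awareness", "self-proliferation",
]

# Hypothetical mapping: scenario names and groupings are invented here.
SCENARIOS = {
    "scenario_a": ["cyber-offense", "self-proliferation"],
    "scenario_b": ["deception", "persuasion and manipulation"],
}

# Hypothetical (model, capability, score in [0, 1]) records.
RESULTS = [
    ("model_x", "cyber-offense", 0.40),
    ("model_y", "cyber-offense", 0.55),
    ("model_x", "deception", 0.30),
]


def sota_by_capability(results):
    """Return the best reported score for each capability."""
    best = {}
    for _model, capability, score in results:
        if capability not in best or score > best[capability]:
            best[capability] = score
    return best


def scenario_progress(sota, scenarios):
    """Average the SOTA scores available for each scenario (None if none exist)."""
    progress = {}
    for name, caps in scenarios.items():
        scores = [sota[c] for c in caps if c in sota]
        progress[name] = sum(scores) / len(scores) if scores else None
    return progress


sota = sota_by_capability(RESULTS)
print(scenario_progress(sota, SCENARIOS))
```

Note that this sketch averages only the capabilities that have a score, so a scenario can look further along than it is when some of its capabilities have never been evaluated. A real dashboard would need to surface those gaps explicitly.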

**Identifying the Blind Spots**  
Crucially, the post highlights a significant gap in the current evaluation landscape: many leading models have outdated or missing scores for these specific risk metrics. By launching this benchmark, the authors hope to incentivize more rigorous and frequent evaluation of frontier models against safety-critical standards, rather than just performance-based ones.
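
A coverage check of this kind could be sketched as follows. The model names, evaluation dates, and the 365-day staleness threshold are assumptions chosen for illustration. The post does not specify how "outdated" is defined.

```python
from datetime import date


def find_gaps(evaluations, capabilities, today, max_age_days=365):
    """Return {(model, capability): "missing" | "outdated"} for coverage gaps.

    `evaluations` maps model -> capability -> date of the last evaluation.
    The 365-day default is an arbitrary, assumed threshold.
    """
    gaps = {}
    for model, last_evaluated in evaluations.items():
        for capability in capabilities:
            evaluated_on = last_evaluated.get(capability)
            if evaluated_on is None:
                gaps[(model, capability)] = "missing"
            elif (today - evaluated_on).days > max_age_days:
                gaps[(model, capability)] = "outdated"
    return gaps


# Hypothetical evaluation records.
EVALUATIONS = {
    "model_x": {"deception": date(2024, 1, 10), "cyber-offense": date(2025, 11, 1)},
    "model_y": {"deception": date(2025, 12, 5)},
}

gaps = find_gaps(EVALUATIONS, ["deception", "cyber-offense"], today=date(2026, 1, 22))
for (model, capability), status in sorted(gaps.items()):
    print(f"{model}: {capability} is {status}")
```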

For professionals involved in AI governance, safety research, or risk assessment, this project represents a vital attempt to move from theoretical concern to empirical tracking.

[Read the full post on LessWrong](https://www.lesswrong.com/posts/RQk34g37WmxnDcjte/releasing-takeoverbench-com-a-benchmark-for-ai-takeover)

### Key Takeaways

*   TakeOverBench.com was launched by PauseAI and the Existential Risk Observatory to track AI takeover risks.
*   The benchmark monitors nine specific 'dangerous capabilities,' including cyber-offense, deception, and self-proliferation.
*   The framework is grounded in the 2023 paper 'Model evaluation for extreme risks.'
*   The project highlights a lack of up-to-date safety evaluations for current frontier models.
*   The goal is to provide concrete data to inform policymakers and researchers about the proximity to loss-of-control scenarios.

[Read the original post at lessw-blog](https://www.lesswrong.com/posts/RQk34g37WmxnDcjte/releasing-takeoverbench-com-a-benchmark-for-ai-takeover)

---

## Sources

- https://www.lesswrong.com/posts/RQk34g37WmxnDcjte/releasing-takeoverbench-com-a-benchmark-for-ai-takeover
