Quantifying Safety: An Empirical Analysis of AI Lab Research Priorities
Coverage of lessw-blog
A recent LessWrong post attempts to measure the proportion of research output dedicated to AI safety across OpenAI, Anthropic, and Google DeepMind, challenging common intuitions with data.
In a recent post on LessWrong, a contributor explores a fundamental question facing the artificial intelligence industry: how well does the public perception of AI safety prioritization align with measurable research output? The post, titled "How Much of AI Labs' Research Is Safety?", attempts to quantify the research portfolios of three major players (OpenAI, Anthropic, and Google DeepMind) to see whether the data supports prevailing community narratives.
The Context: Reputation vs. Reality
As frontier models become increasingly capable, the balance between advancing capabilities and ensuring safety has become a primary criterion for evaluating AI laboratories. A common heuristic in the technical community suggests a hierarchy of safety dedication, often placing Anthropic at the top (due to its "Constitutional AI" branding and origins), followed in varying order by Google DeepMind and OpenAI. However, these reputations are frequently built on mission statements, media narratives, and high-profile departures rather than hard data. For observers, policymakers, and researchers, distinguishing between marketing positioning and actual resource allocation is becoming increasingly difficult.
The Analysis
The author approaches this problem programmatically, analyzing publication data to test these intuitions. The methodology involves examining specific datasets for each lab:
- OpenAI: 59 posts spanning 2016 to 2025.
- Anthropic: 86 posts from 2021 to 2025.
- Google DeepMind: 233 papers, with an index starting in 2023.
By categorizing these outputs, the analysis seeks to determine what percentage of a lab's public-facing intellectual contribution is explicitly focused on safety versus general capabilities or product announcements.
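As a rough illustration of what "percentage of output focused on safety" means in practice, the following is a minimal sketch, not the post author's code. The keyword list, the `is_safety` and `safety_share` helpers, and the placeholder titles are all assumptions made for this example; the post's actual classification criteria are not described here and may differ substantially.

```python
# Hypothetical keyword list; the post's real classification criteria
# are not given in this summary.
SAFETY_KEYWORDS = ("safety", "alignment", "interpretability", "red teaming")


def is_safety(title: str) -> bool:
    """Crude keyword match on the lowercased title (an assumption,
    not the author's method)."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in SAFETY_KEYWORDS)


def safety_share(titles_by_lab: dict[str, list[str]]) -> dict[str, float]:
    """Return, per lab, the fraction of titles flagged as safety-related."""
    shares: dict[str, float] = {}
    for lab, titles in titles_by_lab.items():
        if not titles:
            # Avoid division by zero for a lab with no indexed output.
            shares[lab] = 0.0
            continue
        flagged = sum(is_safety(title) for title in titles)
        shares[lab] = flagged / len(titles)
    return shares


# Placeholder titles invented for illustration only; not real publications.
example = {
    "Lab A": [
        "A note on alignment evaluation",
        "Faster training recipe",
        "New product launch",
        "Interpretability of small models",
    ],
    "Lab B": ["Scaling study", "Safety case sketch"],
}

if __name__ == "__main__":
    for lab, share in safety_share(example).items():
        print(f"{lab}: {share:.0%} of titles flagged as safety")
```

A keyword match like this is the simplest possible classifier and will misfile many items, so any real analysis would need a more careful labeling step. The point of the sketch is only that the headline number is a per-lab ratio of flagged items to total items.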
Epistemic Limitations
Crucially, the post addresses the "epistemic status" of such an analysis. The author candidly notes that while the measurement of publication counts is accurate, it has a "dubious connection to the latent variable of interest." In other words, public research output is an imperfect proxy for internal engineering hours, budget allocation, or corporate culture. A lab might conduct extensive safety testing that never results in a public paper, or conversely, publish theoretical safety work that is not implemented in their models. Despite these limitations, in an industry often opaque about its internal workings, external signals like publication volume remain one of the few objective metrics available to independent observers.
This analysis is valuable not necessarily for providing a definitive ranking, but for highlighting the gap between perceived safety commitments and verifiable public contributions. It encourages a shift from relying on corporate vibes to demanding empirical evidence of safety work.
To review the methodology and the specific breakdown of the findings, we recommend reading the full analysis.
Read the full post on LessWrong
Key Takeaways
- The analysis tests the intuition that Anthropic leads in safety focus, with Google DeepMind and OpenAI ranked behind it in varying order.
- Methodology involves programmatically categorizing hundreds of publications and blog posts from the major labs.
- The author highlights the difficulty of using public output as a proxy for internal safety culture and resource allocation.
- The post serves as a critique of relying on reputation rather than measurable data when assessing AI risk mitigation.