Quantifying Risk Tolerance: Community Polls on AGI Safety Thresholds
Coverage of lessw-blog
In a recent update, lessw-blog directs attention to a series of polls designed to gauge the AI safety community's specific definitions of "doom" and acceptable risk thresholds.
The post itself functions as a logistical bridge, necessitated by the lack of native polling features on the LessWrong platform, but it points to a significant conversation on the Effective Altruism (EA) Forum about formalizing existential risk metrics and quantifying the thresholds of risk the community considers acceptable for Artificial General Intelligence (AGI).
The Context: From Abstract Fear to Quantifiable Risk
In established engineering disciplines like aviation or nuclear power, safety is not a binary state but a calculated probability. Engineers work within defined margins of error, striving for reliability standards that render catastrophic failure statistically negligible. In the domain of AGI, however, such standards are currently non-existent. The conversation often revolves around "P(doom)", the subjective probability that AGI development will lead to catastrophic outcomes, but this metric is frequently discussed without precise definitions or agreed-upon limits.
This lack of standardization presents a major hurdle for governance. Does "doom" imply human extinction, or does it include scenarios of permanent disempowerment where humanity survives but loses control of its future? Furthermore, at what statistical probability does the risk become too high to proceed with training or deployment? Without a consensus on these variables, regulatory frameworks lack a concrete target.
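To make the contrast with established engineering practice concrete, here is a minimal sketch of the kind of explicit check a regulatory framework could target if such a ceiling were ever agreed upon. Both numbers below are hypothetical placeholders, not proposed standards:

```python
# Hypothetical illustration only: neither figure reflects any agreed-upon
# standard for AGI; the point is that mature fields encode such ceilings explicitly.
ACCEPTABLE_P_DOOM = 1e-4   # hypothetical policy ceiling on catastrophic risk
estimated_p_doom = 0.05    # hypothetical aggregate estimate from assessors

def may_proceed(estimate: float, ceiling: float) -> bool:
    """Allow training/deployment only if the assessed risk stays below the ceiling."""
    return estimate < ceiling

print(may_proceed(estimated_p_doom, ACCEPTABLE_P_DOOM))  # -> False, i.e. halt
```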
The Signal: Aggregating Community Sentiment
The initiative highlighted by lessw-blog seeks to bridge this gap by gathering granular data. By directing readers to the EA Forum, the author aims to aggregate sentiment on three specific axes:
- Defining the Threat Model: The polls attempt to disentangle different types of failure modes, specifically distinguishing between large-scale mortality events and scenarios of total human disempowerment.
- Forecasting Risk: The survey asks participants to provide their subjective probability estimates for these negative outcomes, attempting to turn vague concerns into actionable data points.
- Establishing the "Stop" Threshold: Perhaps most critically, the polls seek to determine the minimum probability of doom at which AGI development should be halted. This asks the community to define the exact point where the risk of ruin outweighs the potential benefits of AGI (a toy aggregation of such responses is sketched below).
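To illustrate how such responses could become "actionable data points", the sketch below aggregates a handful of made-up answers (each respondent's subjective P(doom) plus the stop threshold they endorse) into simple summary statistics. The figures are invented for illustration and are not taken from the actual polls:

```python
from statistics import median

# Invented responses for illustration; not actual poll data.
# Each entry: (respondent's subjective P(doom), stop threshold they endorse).
responses = [
    (0.10, 0.01),
    (0.35, 0.05),
    (0.02, 0.10),
    (0.60, 0.01),
    (0.15, 0.02),
]

p_doom_estimates = [p for p, _ in responses]
stop_thresholds = [t for _, t in responses]

print("Median P(doom):        ", median(p_doom_estimates))
print("Median stop threshold: ", median(stop_thresholds))

# Share of respondents whose own P(doom) already exceeds their own
# stop threshold, i.e. who would favour halting development today.
share_favouring_halt = sum(p > t for p, t in responses) / len(responses)
print("Share favouring a halt:", share_favouring_halt)
```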
Why It Matters
For industry observers, policy analysts, and safety researchers, these polls offer a window into the evolving risk tolerance of the field. Moving the safety debate from qualitative philosophy to quantitative risk assessment is a necessary step for the maturation of the industry. If the community most knowledgeable about these systems cannot agree on what constitutes an unacceptable risk, enforcing safety standards at a policy level becomes significantly more difficult.
We recommend reading the full post and exploring the linked polls to understand the current sentiment regarding AGI existential risk.
Key Takeaways
- The post facilitates a survey on the definitions of AGI 'doom', distinguishing between mass mortality and human disempowerment.
- It seeks to aggregate community forecasts regarding the probability of these catastrophic outcomes.
- A primary goal is to identify a consensus on the "stop threshold": the specific risk probability at which AGI development should be halted.
- The initiative highlights the need to move from abstract safety discussions to quantifiable risk assessments similar to other engineering fields.