# Research Update: The 'Understanding Trust' Project in AI Safety

> Coverage of lessw-blog

**Published:** January 17, 2026
**Author:** PSEEDR Editorial
**Category:** risk
**Content tier:** free
**Accessible for free:** true


**Word count:** 412


**Tags:** AI Safety, Decision Theory, Research Transparency, Mentorship, LessWrong

**Canonical URL:** https://pseedr.com/risk/research-update-the-understanding-trust-project-in-ai-safety

---

In a recent update on LessWrong, the author provides a progress report on the 'Understanding Trust' research initiative, detailing the project's output from 2025 and roadmap for 2026.

The lessw-blog post reviews the current status and future trajectory of the "Understanding Trust" project. The initiative sits at the intersection of decision theory and AI safety and aims to formalize how autonomous systems can operate reliably within human-aligned parameters. As AI systems become increasingly agentic, the theoretical underpinnings of "trust" (which extend beyond simple reliability into game-theoretic and decision-theoretic frameworks) become critical for safe deployment.

The update, prompted by Manifund, highlights the project's dual focus on theoretical research and community mentorship. A significant portion of the work was conducted through an AISC (AI Safety Camp) project in Spring 2025. This collaboration involved four mentees (Norman Hsia, Hanna Gabor, Paul Rapoport, and Roman Malov) and resulted in tangible academic output: the ILIAD 2024 paper titled "Understanding Trust," co-authored by Hsia and Rapoport. This paper served as both a foundational input for the group's discussions and a formalized output of their research efforts.

Beyond the paper, the project is experimenting with radical transparency in the research process. The author notes that weekly project meetings have been recorded and are being processed for public release. Produced with "AI-orchestrated edits," the videos are meant to give the community an unvarnished look at the thinking behind high-level AI safety research. This approach addresses a common gap in the field: finished papers are readily available, but the dialectical process and intermediate steps that produce them often remain opaque.

With funding secured for both 2025 and 2026, the project is positioned to continue its investigation into the mechanics of trust. For researchers and observers in the AI alignment space, this update signals the continuation of specific theoretical work as well as a commitment to fostering the next generation of safety researchers through active mentorship and open knowledge sharing.

We recommend reading the full update to understand the scope of the research and access the associated materials.

[Read the full post](https://www.lesswrong.com/posts/yig4LeEfpkFfiWpk2/understanding-trust-project-update)

### Key Takeaways

*   The 'Understanding Trust' project has secured funding through 2026 to continue research into AI safety and decision theory.
*   Spring 2025 activities included an AISC mentorship program that produced the ILIAD 2024 paper.
*   The project is releasing recorded research meetings, edited via AI, to provide transparency into the theoretical development process.
*   The initiative emphasizes both academic output and the mentorship of new researchers in the field.

[Read the original post at lessw-blog](https://www.lesswrong.com/posts/yig4LeEfpkFfiWpk2/understanding-trust-project-update)

---

## Sources

- https://www.lesswrong.com/posts/yig4LeEfpkFfiWpk2/understanding-trust-project-update