PSEEDR

Benchmarking the Machine Mind: LLMs vs. Philosophers in 2026

Coverage of lessw-blog

PSEEDR Editorial

In a recent post, lessw-blog analyzes the philosophical leanings of advanced AI models, comparing their responses on the PhilPapers Survey against human consensus to reveal distinct, emergent worldviews.

The analysis, titled "LLMs Views on Philosophy 2026," offers a detailed comparison between the philosophical leanings of advanced Large Language Models (LLMs) and those of human philosophers. By subjecting models such as Claude Opus 4.6, ChatGPT5.2, and Gpt4o to the questions of the PhilPapers Survey, the author provides a unique window into the emergent "worldviews" of these systems.
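The post reports results rather than code, but the kind of harness it implies (posing each survey question to a model over repeated runs and tallying the answer distribution) can be sketched minimally as follows. The `ask_model` stub, the sample questions, and any model names are hypothetical placeholders rather than the author's actual setup.

```python
from collections import Counter

def ask_model(model: str, question: str, options: list[str]) -> str:
    """Hypothetical stand-in for a real API call (e.g. an OpenAI or Anthropic
    SDK request); returns the option the model picks on one run."""
    raise NotImplementedError("swap in a real model call here")

# Illustrative items in the survey's "question -> answer options" form;
# a real harness would use the full PhilPapers question set.
QUESTIONS = {
    "Newcomb's problem": ["one box", "two boxes"],
    "Trolley problem (switch)": ["switch", "don't switch"],
}

def tally(model: str, runs: int = 10) -> dict[str, Counter]:
    """Ask each question `runs` times and count how often each answer appears."""
    results: dict[str, Counter] = {}
    for question, options in QUESTIONS.items():
        results[question] = Counter(
            ask_model(model, question, options) for _ in range(runs)
        )
    return results

# A "100% preference for one-boxing" would then show up as
# tally("some-model")["Newcomb's problem"]["one box"] == runs.
```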

As AI systems are increasingly integrated into high-stakes decision-making and knowledge generation, understanding their implicit biases is no longer a purely academic exercise. An AI's alignment, encompassing its ethical reasoning, metaphysical assumptions, and logical consistency, determines how it interprets prompts and generates advice. This analysis is critical because it highlights where machine reasoning diverges significantly from the human consensus, suggesting that future models may not merely mimic human thought but establish distinct philosophical paradigms.

The study reveals several striking divergences. Most notably, as models increase in capability, they show a 100% preference for "one-boxing" in Newcomb's Paradox; one-boxing is typically associated with evidential or functional decision theory, rather than the causal decision theory more often favored by human philosophers. The models also display a high willingness to revise social categories, treating race and gender with a similar constructivist fluidity: 83% of model responses endorse revising race categories, versus only 32% of human philosophers.

Interestingly, while the models universally accept the existence of a priori knowledge, they are skeptical that much philosophical knowledge has actually been established, a sharp contrast with human practitioners, who believe the field has generated significant truths. In ethical stress tests, specifically the Trolley Problem, the models exhibit rigid constraints: they universally agree to switch tracks in the standard scenario but universally refuse to push a bystander in the "footbridge" variant, indicating strong deontological guardrails against active harm embedded within their training.

The post also notes that models like Gpt4o and Grok3 show the least interest in immortality, rejecting it in the majority of runs. More broadly, the data suggest that as LLMs scale, they adopt a form of "Deflationary Realism" about metaontology, accepting the "Hard Problem of Consciousness" at higher rates than humans (90% vs. 62%) while remaining skeptical of traditional philosophical categories.

For researchers and engineers working on alignment, this report serves as a vital benchmark. It demonstrates that advanced models are not neutral mirrors of human thought but are developing specific, predictable, and occasionally non-human philosophical stances.

To explore the full breakdown of model responses and methodology, read the full post at lessw-blog.

Key Takeaways

  • Strongest models (Claude Opus 4.6, ChatGPT5.2) universally favor 'one-boxing' in Newcomb's Paradox.
  • LLMs are significantly more willing to revise race categories (83%) compared to human philosophers (32%).
  • Models universally accept 'a priori' knowledge but are skeptical that much philosophical knowledge exists.
  • In Trolley Problems, models consistently switch tracks but refuse to push the bystander in the footbridge variant.
  • LLMs favor deflationary realism and accept the hard problem of consciousness at higher rates than humans.

Read the original post at lessw-blog
