The Case Against Continuous "Neuralese" in AI Reasoning
Coverage of lessw-blog
A recent post on LessWrong challenges the assumption that discrete tokens are merely a bandwidth constraint, proposing instead that they are fundamental to error correction in Large Language Models.
In a recent theoretical analysis, lessw-blog presents a counter-intuitive argument about the architecture of Large Language Models (LLMs) and the future of "Chain-of-Thought" reasoning. The post, titled "The Case Against Continuous Chain-of-Thought (Neuralese)", critiques the growing interest in continuous latent representations, often dubbed "neuralese", as a superior medium for model reasoning.
The Context
Current LLM research frequently identifies the discrete token vocabulary (the actual words the model outputs) as a "bandwidth bottleneck." The intuition is straightforward: a model's internal hidden states contain rich, high-dimensional information, whereas converting that state into a single word throws much of that nuance away. Consequently, researchers have speculated that models could reason more effectively if they were allowed to pass these continuous "thoughts" directly to the next step, bypassing the lossy conversion to text.
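The scale of the claimed bottleneck can be made concrete with a back-of-the-envelope comparison. The specific figures below (vocabulary size, hidden-state width, precision) are illustrative assumptions for this sketch, not values taken from the post:

```python
import math

VOCAB_SIZE = 50_257   # GPT-2-style vocabulary size (illustrative assumption)
HIDDEN_DIM = 4_096    # hypothetical hidden-state width
BITS_PER_DIM = 16     # assuming float16 activations

# A single discrete token can carry at most log2(|vocabulary|) bits.
token_bits = math.log2(VOCAB_SIZE)        # roughly 15.6 bits

# The raw continuous hidden state carries HIDDEN_DIM * BITS_PER_DIM bits.
state_bits = HIDDEN_DIM * BITS_PER_DIM    # 65,536 bits

print(f"token: ~{token_bits:.1f} bits, state: {state_bits} bits "
      f"(~{state_bits / token_bits:,.0f}x wider)")
```

Under these assumptions, emitting a token discards over 99.9% of the raw channel capacity of the hidden state, which is the intuition behind calling tokenization a "bandwidth bottleneck."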
The Gist
lessw-blog argues that this "bottleneck" is actually a critical feature for robustness. The post suggests that discrete tokens function as a necessary filter for noise. In a continuous latent space, there are no natural boundaries; signal and noise are semantically entangled. As a model reasons, minor noise can accumulate and drift without any mechanism to identify or correct it, because every point in continuous space is technically valid.
By forcing the model to collapse its state into a discrete token, the system effectively "snaps" the value to the nearest valid grid point, discarding the minor noise (the "fuzziness") in the process. The author posits that without this discretization step, long-chain reasoning becomes susceptible to unrecoverable drift, as the system lacks the ability to distinguish between a subtle change in meaning and a calculation error.
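The snapping argument can be illustrated with a toy one-dimensional random walk. The grid spacing, noise level, and step count below are all hypothetical choices for this sketch; real token embeddings are high-dimensional, but the mechanism the post describes is the same:

```python
import random

random.seed(0)

GRID = 0.5     # spacing between "valid" discrete values (a stand-in vocabulary)
SIGMA = 0.05   # per-step noise, well below half the grid spacing
STEPS = 1000   # length of the reasoning chain

def snap(x):
    """Collapse a continuous value to the nearest discrete grid point."""
    return round(x / GRID) * GRID

true_value = 3.0          # the "meaning" the chain is trying to preserve
cont = disc = true_value

for _ in range(STEPS):
    noise = random.gauss(0, SIGMA)
    cont += noise              # continuous: noise accumulates as a random walk
    disc = snap(disc + noise)  # discrete: snapping discards sub-threshold noise

# The continuous state typically drifts on the order of SIGMA * sqrt(STEPS),
# while the snapped state stays on a grid point as long as no single step's
# noise exceeds half the grid spacing.
print(abs(cont - true_value), abs(disc - true_value))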
This perspective shifts the view of tokenization from a limitation to a stabilizing mechanism, suggesting that future architectures must retain symbolic, discrete processing to maintain reliability.
Read the full post on LessWrong
Key Takeaways
- Discrete tokens act as a noise filter, allowing models to "snap" to valid concepts and discard minor errors.
- Continuous representations (neuralese) lack error boundaries, making it difficult to distinguish between signal and noise.
- The "bandwidth bottleneck" of language is likely a feature that prevents error accumulation, rather than a bug.
- Purely continuous reasoning architectures may suffer from uncorrectable semantic drift.