# When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

> Coverage of lessw-blog

**Published:** May 29, 2026
**Author:** PSEEDR Editorial
**Category:** platforms

**Tags:** Mechanistic Interpretability, Tensor Networks, AI Alignment, Model Auditing, Machine Learning

**Canonical URL:** https://pseedr.com/platforms/when-are-two-networks-the-same-tensor-similarity-for-mechanistic-interpretabilit

---

lessw-blog introduces a novel approach to mechanistic interpretability, demonstrating how functional similarity between tensor networks can be computed entirely from their weights without relying on input data.

**The Hook**

In a recent post, lessw-blog takes up a central question in mechanistic interpretability: how can we tell whether two neural networks have learned the same internal logic? The post introduces 'tensor similarity,' a weight-based metric for measuring functional equivalence within tensor network architectures.

**The Context**

The field of mechanistic interpretability aims to reverse-engineer neural networks, translating their opaque weights into understandable algorithms. A major hurdle in this domain is model comparison. Historically, verifying whether two models compute the same features or are functionally equivalent has required passing large, representative datasets through both networks and analyzing the correlation of their activations. This activation-based approach is computationally expensive, highly dependent on the chosen input distribution, and difficult to scale. If researchers could instead compare models directly via their weights, bypassing the data requirement entirely, it would substantially speed up model auditing, alignment verification, and the study of training dynamics.

**The Gist**

lessw-blog presents a mathematical framework that achieves exactly this for specific architectures. The core argument centers on a principled generalization of cosine similarity, termed 'tensor similarity.' The author demonstrates that for multilinear models with Gaussian inputs, the expected inner product of their activations equals the inner product of their weights. Functional similarity, averaged over the Gaussian input distribution, can therefore be computed from the weights alone.
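The identity is easiest to check in the simplest case. The following is a minimal sketch, assuming plain linear maps (order-2 weight tensors) and standard normal inputs, where the expected activation inner product E[⟨Ax, Bx⟩] equals the Frobenius inner product of the weights, tr(AᵀB). The function names `weight_cosine` and `activation_inner_mc`, the matrix shapes, and the sample count are illustrative choices and are not taken from the original post, whose higher-order multilinear setting is not reproduced here.

```python
import numpy as np


def weight_cosine(A, B):
    """Cosine similarity of two weight tensors under the Frobenius inner product."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))


def activation_inner_mc(A, B, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[<A x, B x>] for x ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, A.shape[1]))  # one input per row
    # Row-wise inner product of the two models' activations, then average.
    return float(np.mean(np.sum((X @ A.T) * (X @ B.T), axis=1)))


# Two hypothetical linear "models": B is a noisy copy of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
B = A + 0.5 * rng.standard_normal((4, 6))

weight_inner = float(np.sum(A * B))   # computed from weights only
mc_inner = activation_inner_mc(A, B)  # estimated from sampled inputs

print("weight-space inner product:", weight_inner)
print("Monte Carlo activation inner product:", mc_inner)
print("weight-only cosine similarity:", weight_cosine(A, B))
```

With enough samples the two inner products should agree up to Monte Carlo noise, while the weight-only computation needs no input data at all.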

The post also highlights the practical potential of these architectures, suggesting that tensor-transformer variants can perform competitively with standard Transformers while being considerably easier to interpret. While the analysis is robust, readers should note that several questions remain open for future exploration: the computational cost of calculating tensor similarity for billion-parameter models, specific decomposition constraints, and exact performance parity metrics.

**Conclusion**

This research is a meaningful step toward more efficient and rigorous model analysis. By enabling weight-only similarity checks, it opens new avenues for studying how networks converge on similar solutions during training. [Read the full post](https://www.lesswrong.com/posts/Yzw6KDQc336CpHmGi/when-are-two-networks-the-same-tensor-similarity-for) to review the mathematical foundations and consider the implications for interpretable AI.

### Key Takeaways

*   Functional similarity between tensor networks can be measured directly from weights, bypassing the need for input data.
*   The method relies on 'tensor similarity,' a mathematical generalization of cosine similarity.
*   For multilinear models under Gaussian input, the expected inner product of activations equals the weight-space inner product.
*   Tensor-transformer architectures show promise as performant, highly interpretable alternatives to standard Transformers.

[Read the original post at lessw-blog](https://www.lesswrong.com/posts/Yzw6KDQc336CpHmGi/when-are-two-networks-the-same-tensor-similarity-for)

---

## Sources

- https://www.lesswrong.com/posts/Yzw6KDQc336CpHmGi/when-are-two-networks-the-same-tensor-similarity-for
