# Curated Digest: Inside the Together AI Kernels Team

> Coverage of together-blog

**Published:** April 01, 2026
**Author:** PSEEDR Editorial
**Category:** stack

**Tags:** Together AI, GPU Optimization, Kernels, FlashAttention, AI Infrastructure, Inference

**Canonical URL:** https://pseedr.com/stack/curated-digest-inside-the-together-ai-kernels-team

---

Together AI's kernel team is bridging the critical performance gap between raw GPU hardware capabilities and production AI workloads through innovations like FlashAttention and ThunderKittens.

In a recent post, together-blog discusses the internal workings and strategic focus of its dedicated kernels team. The post sheds light on how specialized researchers and engineers bridge the performance gap between raw GPU hardware and the demands of production AI systems. By focusing on the lowest levels of software-hardware interaction, the team aims to extract maximum efficiency from modern compute infrastructure.

This topic is critical for the modern AI infrastructure stack, specifically where GPUs and inference are concerned. As AI models scale in size and complexity, the operational costs and latency of training and inference have skyrocketed. While modern GPUs offer massive theoretical compute power, standard software implementations often fail to utilize it fully because of memory bandwidth limitations and inefficient data movement. Optimizing kernels, the highly specialized, low-level code that dictates how mathematical operations execute on GPU silicon, is essential. It is the primary mechanism for maximizing throughput, minimizing memory bottlenecks, and ultimately making advanced AI economically viable at scale. Without these optimizations, the industry faces severe bottlenecks in both the pace of innovation and the commercial deployment of large language models.
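To make the memory-bandwidth argument concrete, here is a back-of-the-envelope sketch (illustrative only, not Together AI's code) of why fusing a chain of elementwise operations into one kernel cuts DRAM traffic. The tensor size, operation count, and byte accounting below are assumed for illustration:

```python
# Illustrative sketch: why kernel fusion matters for memory-bound workloads.
# An unfused chain of elementwise ops reads and writes the full tensor in
# DRAM at every step; a fused kernel streams each element through registers
# once, touching DRAM only at the start and end of the chain.

N = 1_000_000          # tensor elements (assumed for illustration)
BYTES = 4              # float32

def unfused_traffic(num_ops: int) -> int:
    # each op: read N elements from DRAM, write N elements back
    return num_ops * 2 * N * BYTES

def fused_traffic(num_ops: int) -> int:
    # one read and one write cover the whole fused chain
    return 2 * N * BYTES

ops = 4                # e.g. scale -> bias -> activation -> dropout mask
print(unfused_traffic(ops) // fused_traffic(ops))  # 4: fused does 4x less DRAM traffic
```

Since elementwise chains do almost no arithmetic per byte moved, this traffic reduction translates nearly one-for-one into wall-clock speedup on bandwidth-bound GPUs.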

together-blog's post explores these dynamics by highlighting the specific contributions of its kernel research division. The team is responsible for highly influential optimization projects that have already reshaped the industry, most notably FlashAttention and ThunderKittens. FlashAttention fundamentally altered how attention mechanisms process data by making them hardware-aware: tiling the computation through fast on-chip SRAM sharply reduces reads and writes to slower high-bandwidth memory. ThunderKittens continues this trajectory by offering a new framework for writing high-performance kernels. By focusing on the intricate interaction between hardware architecture and AI workloads, the researchers aim to extract every ounce of performance from existing and future infrastructure. The article serves as both a technical overview of their mission and a strong signal of where infrastructure optimization is heading next. It underscores that the next major leaps in AI capability may come not just from larger models, but from significantly smarter utilization of the underlying hardware.
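The core FlashAttention idea, an online softmax over key/value blocks so the full n-by-n score matrix is never materialized, can be sketched in NumPy. This is a simplified single-head illustration, not the team's actual CUDA implementation; the shapes and block size are arbitrary assumptions:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (n x n) score matrix: O(n^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    # Online-softmax style: process K/V in blocks, keeping only a running
    # max, running denominator, and running output per query row.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, d))
    m = np.full(n, -np.inf)     # running row max
    l = np.zeros(n)             # running softmax denominator
    for start in range(0, n, block):
        Kb, Vb = K[start:start+block], V[start:start+block]
        S = (Q @ Kb.T) * scale                  # only an (n, block) tile
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)               # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V)))  # True
```

The rescaling by `alpha` is what lets each block be folded into the accumulators without ever revisiting earlier blocks; on a GPU, each tile lives in fast SRAM while only the small running statistics persist, which is the source of the memory-traffic savings.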

For infrastructure engineers, AI researchers, and anyone tracking the economics of compute, understanding the role of kernel optimization is highly recommended. The work being done at the kernel level directly impacts the bottom line of AI deployment and the feasibility of real-time applications. [Read the full post on together-blog](https://www.together.ai/blog/inside-the-together-ai-kernels-team) to explore the team's methodology, learn more about their specific projects, and understand their vision for the future of GPU performance.

### Key Takeaways

*   Together AI has a dedicated kernels team focused on optimizing the interaction between GPU hardware and production AI workloads.
*   The team is responsible for major industry innovations, including FlashAttention and ThunderKittens.
*   Kernel optimization is a critical factor in reducing latency, increasing throughput, and lowering the operational costs of AI inference and training.
*   Bridging the gap between theoretical GPU capabilities and actual production performance is essential for the scalability of advanced AI systems.

[Read the original post at together-blog](https://www.together.ai/blog/inside-the-together-ai-kernels-team)

---

## Sources

- https://www.together.ai/blog/inside-the-together-ai-kernels-team
