XTX Markets Open Sources TernFS: A 10-Exabyte Storage Engine for the AI Era

As the infrastructure demands of generative AI continue to diverge from traditional enterprise IT requirements, the limitations of legacy file systems have become increasingly apparent. XTX Markets, a London-based quantitative trading firm known for its heavy reliance on data-driven strategies, has released TernFS to address these bottlenecks. XTX describes TernFS as a multi-region distributed file system, built for data volumes that dwarf standard enterprise deployments.

At the core of the TernFS architecture is a focus on "hyper-scale capacity." According to the technical documentation released by XTX, the system supports "up to 10EB data volume and trillion-file level scale." This capacity is critical for modern ML pipelines, where training corpora and model checkpoints consume petabytes of storage. Unlike traditional systems that struggle with metadata overhead when file counts reach the billions, TernFS appears engineered to decouple metadata management from data storage, allowing for linear scalability.
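To put the two quoted limits in proportion, the arithmetic below works out the average file size implied if both ceilings were reached at once. This is only a back-of-the-envelope sketch: it assumes decimal (SI) exabytes and exactly one trillion files, and the documentation does not say the two limits are meant to be hit simultaneously.

```python
# Back-of-the-envelope check of the quoted limits.
# Assumptions (not from the TernFS documentation): SI units, and both
# limits reached at the same time.
EB = 10**18  # bytes in one decimal exabyte
MB = 10**6   # bytes in one decimal megabyte


def average_file_size_bytes(total_bytes: int, file_count: int) -> float:
    """Mean file size when total_bytes is spread over file_count files."""
    return total_bytes / file_count


total_capacity = 10 * EB   # "up to 10EB data volume"
file_count = 10**12        # "trillion-file level scale"

avg = average_file_size_bytes(total_capacity, file_count)
print(f"Average file size at both limits: {avg / MB:.0f} MB")
# prints "Average file size at both limits: 10 MB"
```

An average of roughly 10 MB per file is consistent with a design aimed at large files rather than millions of tiny ones.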

The Immutability Trade-off

The most distinctive architectural decision in TernFS is its strict adherence to immutable data patterns. The system is explicitly focused on "immutable large files," a design choice that aligns with the Write-Once-Read-Many (WORM) nature of machine learning datasets. In a typical ML workflow, raw training data is ingested once and read millions of times; similarly, model weights are saved as static checkpoints.

By optimizing for immutability, TernFS trades flexibility for throughput and reliability. The documentation notes that the system is designed for scenarios with "rare directory changes," making it unsuitable for workloads requiring frequent file modifications, random writes, or high metadata churn. This specialization allows TernFS to strip away much of the complex locking that POSIX-compliant general-purpose file systems such as CephFS or Lustre need to coordinate concurrent writers, which in principle reduces latency and overhead for sequential reads.
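The write-once contract can be illustrated at application level on an ordinary local file system. The sketch below is not TernFS code and does not use any TernFS API; `WriteOnceStore` is a hypothetical class that relies only on Python's exclusive-create mode to show what "create once, never modify" means for a caller.

```python
import os


class WriteOnceStore:
    """Hypothetical illustration of WORM semantics on a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def put(self, name: str, data: bytes) -> None:
        # Mode "xb" is exclusive creation: open() raises FileExistsError
        # if the file already exists, so existing data is never overwritten.
        with open(os.path.join(self.root, name), "xb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        # Reads are unrestricted and can be repeated any number of times.
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()
```

A caller that needs a new version of a dataset writes it under a new name instead of modifying the old file, which is the access pattern the article describes for training corpora and checkpoints.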

Reliability and Atomicity

For ML engineers, data corruption during long-running training jobs is a catastrophic failure mode. TernFS addresses this through enforced "write atomicity." The system ensures that files are never in a "semi-written state"; a file is either fully committed and visible, or it does not exist. This binary state is maintained even during power outages or node failures, ensuring that metadata integrity is preserved without manual intervention.

This feature is particularly relevant for checkpointing large language models (LLMs). If a training run crashes while writing a checkpoint to a standard file system, the resulting partial file can be difficult to detect, potentially wasting days of compute time upon resumption. TernFS's atomic guarantees effectively eliminate this class of error.
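For comparison, this is roughly what an application has to do by hand to get all-or-nothing checkpoint visibility on a conventional POSIX file system. It is a minimal sketch using only the Python standard library; `write_checkpoint_atomically` is a hypothetical helper, and nothing here describes how TernFS implements its guarantee internally.

```python
import os
import tempfile


def write_checkpoint_atomically(path: str, payload: bytes) -> None:
    """Write payload so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before publishing
        # os.replace renames atomically on POSIX: the target name never
        # refers to a partially written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Note: a fully durable version would also fsync the directory entry;
    # that step is omitted to keep the sketch short.
```

A file system that enforces this behavior itself removes the need for every training framework to reimplement the pattern correctly.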

Integration and Licensing

XTX Markets has adopted a pragmatic approach to open-sourcing the technology. The core protocol and client libraries are released under the Apache-2.0 license with the LLVM exception. This is a single permissive license with an added exception rather than a dual-licensing scheme. Apache-2.0 carries no copyleft obligations and already permits linking into proprietary applications; the LLVM exception goes further by waiving certain attribution conditions when compiled portions of the software end up embedded in a binary, and by allowing license sections that conflict with the GPLv2 to be waived when the code is combined with GPLv2-licensed software. This suggests XTX envisions the TernFS client being integrated into closed-source proprietary trading platforms or commercial AI stacks.

In terms of interoperability, TernFS reduces the risk of vendor lock-in by supporting standard access methods. It offers FUSE mounting and an S3-compatible interface, allowing it to serve as a backend for existing cloud-native applications with little or no code refactoring. However, the current documentation does not specify how much of the S3 API is covered, particularly for advanced features such as object tagging or lifecycle policies.

Market Context

The release of TernFS places it in competition with high-performance parallel file systems like Lustre and newer software-defined storage solutions like WEKA and MinIO. However, its focus on immutable, massive-scale storage differentiates it from general-purpose competitors. While Ceph provides block, object, and file storage in a unified platform, its complexity often brings significant operational overhead. TernFS appears to offer a leaner, more specialized alternative for organizations hitting the scaling limits of existing storage during AI training.

XTX Markets' decision to open-source this internal tool underscores a broader trend: firms outside the traditional technology sector are increasingly building and releasing critical infrastructure software when off-the-shelf commercial solutions fall short of the extreme demands of high-frequency trading and large-scale AI research.