Google introduces Trillium TPU v6: improved performance for AI models
November 6, 2024
Google has announced Trillium, the latest generation of its Tensor Processing Units (TPUs) for Google Cloud customers. Trillium offers improved performance for both training and inference while reducing energy consumption and costs.
At the App Dev & Infrastructure Summit last week, Google announced Trillium, its sixth-generation TPU, which marks a significant leap in performance. Compared to the previous TPU v5e, Trillium delivers over four times better training performance and up to three times higher inference throughput. In addition, Trillium improves energy efficiency by 67 percent and doubles both the High Bandwidth Memory (HBM) capacity and the Interchip Interconnect (ICI) bandwidth. This makes the sixth generation well suited for training and serving large-scale AI models. Trillium is available in preview for Google Cloud customers.
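For developers trying the preview, a quick way to confirm which TPU generation a Cloud TPU VM exposes is to list the accelerator devices that JAX can see. This is a minimal, generic sketch; it uses only standard JAX device introspection and assumes nothing Trillium-specific.

```python
import jax

# List the accelerator devices visible to this host. On a Cloud TPU VM,
# each device reports its platform ("tpu") and a device_kind string
# identifying the TPU generation it belongs to.
for device in jax.devices():
    print(device.id, device.platform, device.device_kind)
```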
Language models
The enhancements enable larger AI models, such as large language models (LLMs) and computationally intensive diffusion models, to be trained and deployed more efficiently. Google specifically mentions models such as Gemma 2, Llama and Stable Diffusion XL as examples of workloads that benefit from the new TPU architecture.
With double the HBM capacity, Trillium can handle larger models with more weights and larger key-value caches, which contributes to more efficient resource utilization. Per-chip performance also rises significantly: peak compute performance per chip is 4.7 times higher than that of the previous generation.
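As a rough illustration of why extra HBM matters for key-value caches, the back-of-the-envelope calculation below estimates the KV-cache footprint of a hypothetical decoder-only model during serving. All model dimensions here are assumptions chosen for illustration, not figures from Google's announcement.

```python
# Illustrative only: KV-cache size for a hypothetical decoder-only LLM.
num_layers = 32          # assumed model depth
num_kv_heads = 8         # assumed number of key/value heads
head_dim = 128           # assumed dimension per head
seq_len = 8192           # cached context length
batch_size = 16          # concurrent sequences being served
bytes_per_value = 2      # bfloat16

# Keys and values are both cached, hence the leading factor of 2.
kv_cache_bytes = (2 * num_layers * num_kv_heads * head_dim
                  * seq_len * batch_size * bytes_per_value)
print(f"KV cache: {kv_cache_bytes / 2**30:.1f} GiB")  # -> 16.0 GiB
```

Even this modest hypothetical configuration consumes 16 GiB of accelerator memory for the cache alone, on top of the model weights, which is why doubled HBM capacity translates directly into larger batches or longer contexts per chip.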
Scalability and cost advantages
Trillium is designed for high scalability. Up to 256 chips can be connected in a single pod, and deployments can then be scaled out to hundreds of pods. This creates a building-scale supercomputer interconnected by the Jupiter data center network, which provides 13 petabits per second of bandwidth. The Multislice software delivers near-linear scaling even under heavy workloads, making the TPUs usable for complex, compute-intensive training scenarios.
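From a framework's point of view, the chips in a pod (or across Multislice slices) appear as a pool of devices that can be addressed through a device mesh. The snippet below is a minimal JAX sketch of data-parallel sharding across whatever TPU devices are visible to the program; it is not Trillium-specific, and the mesh shape and array sizes are purely illustrative.

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Build a 1-D device mesh over all visible accelerator chips and name the
# single mesh axis "data" for data parallelism.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch of activations along its leading (batch) dimension so each
# chip holds an equal slice of the batch.
sharding = NamedSharding(mesh, P("data"))
x = jnp.ones((jax.device_count() * 8, 1024))
x = jax.device_put(x, sharding)
print(x.sharding)
```

The same mesh abstraction extends to larger topologies; scaling across pods is handled at the infrastructure level, while the program continues to express its parallelism over named mesh axes.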
In addition to the performance improvements, Google also emphasizes Trillium’s cost-effectiveness. The new TPU offers almost 1.8x more performance per dollar than the TPU v5e and nearly twice that of the TPU v5p. This makes Trillium a cost-effective choice for customers that need high-performance, scalable infrastructure for large-scale AI training and inference.
Google hopes these innovations will usher in a new era for applications that require large-scale AI models. Trillium is now available in preview for Google Cloud users.