NVIDIA Blackwell: the new architecture that promises up to 20 PFLOPS in AI

The big star of GTC 2024 was undoubtedly NVIDIA Blackwell, the new GPU architecture that the green giant will introduce this year. It will make it possible to run generative artificial intelligence models with trillions of parameters at up to 25 times lower cost and energy consumption than the previous generation.

The size of the Blackwell GPU, designated B200, which company CEO Jen-Hsun Huang showed on stage, is impressive. It is a huge graphics core with 208 billion transistors, and it uses HBM3e memory to offer up to 8 TB/s of bandwidth with a maximum capacity of 192 GB.


With the B200, NVIDIA has moved to a chiplet design for the first time. It consists of two dies connected together to form a single superchip. A multi-chip design offers several important advantages, most notably lower cost and complexity compared with fabricating a design of such high transistor density on a single piece of silicon.

Each die has 104 billion transistors. Both are integrated in the same package and linked by an interconnect that offers a total bandwidth of 10 TB/s. This is very important: because both dies share the same package and are joined by such a fast link, they avoid the latency problems that arise when chips sit in separate packages, and the two can work as a single GPU.

B200 GPU: key specifications


Made on TSMC’s 4nm node.
Multi-chip design with two chiplets in the same package.
208 billion transistors.
160 SM units.
20,480 shaders.
8 HBM3e memory stacks with a total capacity of 192 GB.
8,192-bit memory bus and 8 TB/s of bandwidth.
Compatible with PCIe Gen6.
Maximum TGP of 1,000 watts.
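The figures in this list are consistent with one another, which a few lines of arithmetic make clear. The sketch below is only a back-of-the-envelope check under stated assumptions: it reads the bus width as 8,192 bits (eight 1,024-bit HBM stacks) and takes 1 TB/s as 10^12 bytes per second. The per-pin data rate it derives is an estimate, not an NVIDIA figure, and the helper name `derived_figures` is hypothetical.

```python
# Headline B200 figures as listed above.
# Assumptions: 8,192-bit bus (eight 1,024-bit HBM stacks), 1 TB/s = 10**12 bytes/s.
DIES = 2
TRANSISTORS_PER_DIE = 104e9
SM_COUNT = 160
SHADER_COUNT = 20_480
HBM_STACKS = 8
MEMORY_GB = 192
BUS_WIDTH_BITS = 8_192
BANDWIDTH_BYTES_PER_S = 8e12


def derived_figures():
    """Hypothetical helper: derive per-unit numbers from the headline specs."""
    return {
        "total_transistors": DIES * TRANSISTORS_PER_DIE,
        "shaders_per_sm": SHADER_COUNT // SM_COUNT,
        "gb_per_stack": MEMORY_GB / HBM_STACKS,
        "bus_bits_per_stack": BUS_WIDTH_BITS // HBM_STACKS,
        # bandwidth (bytes/s) = bus width (bits) * per-pin rate (bits/s) / 8,
        # solved here for the per-pin rate and expressed in Gbit/s.
        "per_pin_gbps": BANDWIDTH_BYTES_PER_S * 8 / BUS_WIDTH_BITS / 1e9,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

The two 104-billion-transistor dies add up to the 208 billion quoted for the B200, each SM works out to 128 shaders, and each HBM3e stack contributes 24 GB and a 1,024-bit slice of the bus, at roughly 7.8 Gbit/s per pin.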

NVIDIA also announced the B100 GPU, a less powerful solution with lower peak power consumption (up to 700 watts) that keeps the same memory capacity and bandwidth. It will reach the market at the same time as the B200 and will be the more affordable option.

Blackwell also introduces new technologies that represent an important generational leap. The most notable are:


Second-generation Transformer Engine, powered by new precision types, including new community-defined microscaling formats. This engine uses fine-grained scaling, which NVIDIA calls micro-tensor scaling, to optimize both performance and accuracy, opening the door to 4-bit floating-point (FP4) AI operations. This doubles the compute performance and the model sizes that can be supported while maintaining a high degree of accuracy.
Secure AI: Blackwell includes NVIDIA Confidential Computing, which protects sensitive data and AI models from unauthorized access through high-performance, hardware-based security. This will allow companies to adopt LLMs safely.
New decompression engine, which accelerates the entire pipeline involved in querying complex databases, as well as data science and analytics workloads. It supports the latest compression formats, such as LZ4 and Snappy, among others.
Reliability, Availability and Serviceability (RAS) engine, which provides intelligent resiliency by identifying potential faults before they occur, with the goal of maximizing uptime. It also delivers detailed diagnostic information that pinpoints problem areas and helps plan maintenance, reducing turnaround times.
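The idea behind micro-tensor scaling is easier to picture with a toy example. The sketch below is an illustration under explicit assumptions, not NVIDIA's implementation: it rounds values to the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6) and gives each block of 32 values its own power-of-two scale factor, in the spirit of the OCP Microscaling (MX) formats. The block size, the scale rule and the function names are illustrative choices.

```python
import math

# Non-negative magnitudes an FP4 E2M1 number can represent; the sign is stored separately.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_E2M1_GRID[-1]


def round_to_fp4(x: float) -> float:
    """Round x to the nearest FP4 E2M1 value (ties go to the smaller magnitude), clamping at +/-6."""
    mag = min(abs(x), FP4_MAX)
    nearest = min(FP4_E2M1_GRID, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def quantize_blocks(values, block_size=32):
    """Split values into blocks, give each block a power-of-two scale, and round to FP4.

    Returns a list of (scale, codes) pairs; each original value is approximated by scale * code.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            scale = 1.0
        else:
            # Smallest power of two that brings the block's largest magnitude within FP4_MAX.
            scale = 2.0 ** math.ceil(math.log2(amax / FP4_MAX))
        blocks.append((scale, [round_to_fp4(v / scale) for v in block]))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate values by multiplying each FP4 code by its block's scale."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    # Hypothetical data: one block of small activations and one block of large ones.
    values = [0.01 * i for i in range(32)] + [100.0 * i for i in range(32)]
    per_block = quantize_blocks(values, block_size=32)
    single_scale = quantize_blocks(values, block_size=64)
    print("per-block scales:", [scale for scale, _ in per_block])
    print("small values, per-block scaling:", dequantize_blocks(per_block)[:8])
    print("small values, one shared scale:", dequantize_blocks(single_scale)[:8])
```

With one scale per 32 values, the block of small numbers keeps usable resolution even though the other block holds values thousands of times larger. Quantizing all 64 values with a single shared scale instead rounds every small value to zero, which is the accuracy loss fine-grained scaling is meant to avoid.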

NVIDIA Blackwell vs Hopper performance


NVIDIA shared some Blackwell performance data to give us an idea of what to expect from the new architecture, and did not hesitate to compare it directly with Hopper. As we will see below, the numbers are impressive, and they only reaffirm NVIDIA's leadership position.

According to NVIDIA, Blackwell has 128 billion more transistors than Hopper, delivers up to 5 times the AI performance and has four times more memory. A GB200 system that integrates two B200 GPUs and a Grace CPU offers the following performance compared with one based on two GH100 GPUs and a Grace CPU:

20 PFLOPS in FP8, 2.5 times Hopper's performance.
20 PFLOPS in FP6, also 2.5 times Hopper's performance.
40 PFLOPS in FP4, five times Hopper's performance.
Support for models with up to 740 billion parameters, six times what Hopper can handle.
90 TFLOPS in FP64, three times Hopper's performance.
NVLink All-Reduce with SHARP offers 7.2 TB/s of bandwidth, four times that of Hopper.
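Dividing each Blackwell figure by its claimed speedup recovers the Hopper baseline NVIDIA is implicitly comparing against. In this minimal sketch (the `CLAIMS` table and the helper name are illustrative), the FP8 (2.5x) and FP4 (5x) claims both point to the same 8 PFLOPS baseline. Since Hopper has no native FP4 support, this suggests the FP4 comparison is made against Hopper's FP8 throughput; that reading is an inference from the ratios, not something NVIDIA stated.

```python
# Blackwell (GB200) figure and claimed speedup over Hopper, as quoted in the list above.
CLAIMS = {
    "FP8 (PFLOPS)": (20.0, 2.5),
    "FP4 (PFLOPS)": (40.0, 5.0),
    "FP64 (TFLOPS)": (90.0, 3.0),
    "NVLink All-Reduce (TB/s)": (7.2, 4.0),
}


def implied_hopper_baseline(blackwell_value: float, speedup: float) -> float:
    """Hypothetical helper: back out the Hopper figure implied by an 'N times faster' claim."""
    return blackwell_value / speedup


if __name__ == "__main__":
    for metric, (value, speedup) in CLAIMS.items():
        baseline = implied_hopper_baseline(value, speedup)
        print(f"{metric:26} Blackwell {value:6.1f}  x{speedup:<4}  implied Hopper {baseline:6.2f}")
```

The same arithmetic gives an implied Hopper baseline of 30 TFLOPS in FP64 and 1.8 TB/s for NVLink All-Reduce.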


Availability and release date

NVIDIA is expected to launch its B100 and B200 GPUs, as well as the GB200 "superchip," which integrates two B200 GPUs and a Grace CPU, at the end of this year, although we still have no concrete details beyond what has been announced.

It is not yet confirmed whether Blackwell will be the architecture NVIDIA uses in its next generation of mainstream consumer graphics cards, the GeForce RTX 50 series, but if it is, I think we will most likely see fully monolithic designs there.

Source: Muy Computer

David

Donald Salinas is an experienced automobile journalist and writer for Div Bracket. He brings his readers the latest news and developments from the world of automobiles, offering a unique and knowledgeable perspective on the latest trends and innovations in the automotive industry.