AMD officially unleashes the champions MI300X and MI300A in the HPC space

December 7, 2023

Nvidia, hold on tight: AMD's long-awaited AI chips are rolling off the production line in large quantities, and both the GPU and the APU offer an interesting and powerful alternative to Nvidia's Hopper chips.

Santa Claus is visiting, and the HPC community is getting AMD's Instinct MI300X and Instinct MI300A to play with. After a long wait, the chip specialist has announced general availability of these AI accelerators. HPE CEO Antonio Neri hinted last week that Nvidia's dominance would soon be challenged by other players, and it didn't take long for that prediction to come true.

HPE itself has already launched the HPE Cray Supercomputing EX255a with Instinct MI300A APUs, and Dell and Lenovo will also install MI300 chips in their HPC servers in early 2024. On the hyperscaler side, Microsoft is partnering with AMD for the Azure ND MI300X v5 VMs.

From a product to a series

AMD announced the Instinct MI300X and MI300A some time ago. Initially there was only one Instinct MI300, but in response to the current market situation it was split into two chips: the A variant represents the original concept, while the X variant is purely an accelerator.

To clarify: the AMD Instinct MI300A is an AI chip par excellence and a competitor to Nvidia's Grace Hopper superchip. First and foremost, the chip packs 228 compute units (CUs) based on the CDNA 3 GPU architecture. These CUs collectively represent 14,592 cores optimized for AI acceleration. An Epyc processor is installed on the same chip: 24 Zen 4 cores sit neatly next to the CUs and share 128 GB of HBM3 memory. This combination of memory, GPU acceleration, and integrated CPU should be ideal for AI workloads.
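As a back-of-the-envelope check of the core count quoted above, each CDNA 3 compute unit contains 64 stream processors ("cores"), the figure implied by the numbers in this paragraph:

```python
# Sanity check of the MI300A core count: 228 CDNA 3 compute units,
# each with 64 stream processors (the per-CU figure implied above).
CUS = 228
CORES_PER_CU = 64

total_cores = CUS * CORES_PER_CU
print(total_cores)  # 14592, matching the 14,592 cores in the text
```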

The AMD Instinct MI300X is a derivative. Its sibling, the MI300A, has a new architecture that offers many advantages, but customers need to optimize their workloads for it. The MI300X is a more traditional accelerator in which the Zen cores disappear. In return you get more HBM3 memory: 192 GB. This chip competes with Nvidia's more classic Hopper H100 and H200 accelerators.

According to AMD, better than Nvidia

Both chips are extremely powerful. AMD compares its new chips with the popular Hopper H100 and uses its own benchmarks to determine that the MI300X performs up to 1.6 times better in certain workloads. The enormous built-in memory also means that the entire Llama 2 model, with its 70 billion parameters, fits on a single MI300X: unique on the market today. This should make inference with such a model easier and less expensive.
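The memory claim is easy to verify with rough arithmetic: at 16-bit precision, a 70-billion-parameter model's weights alone need about 140 GB, which fits under the MI300X's 192 GB of HBM3. A minimal sketch (the estimate deliberately ignores the KV cache and activations, which add overhead on top of the weights):

```python
# Rough weights-only memory estimate for a 70B-parameter model at
# 16-bit (FP16/BF16) precision, compared against the MI300X's HBM3.
PARAMS = 70e9          # Llama 2 70B parameter count
BYTES_PER_PARAM = 2    # 2 bytes per parameter at 16-bit precision
HBM3_GB = 192          # memory on one MI300X

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(weights_gb)               # 140.0 GB for the weights alone
print(weights_gb < HBM3_GB)     # True: fits on a single MI300X
```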

When AMD pits the MI300A against a system with the H100, it reports performance improvements of up to a factor of four, mainly thanks to the shared memory on the chip. In fact, a comparison with Grace Hopper would be a little more accurate, and AMD claims the crown there too: the Instinct MI300A is said to offer twice the performance per watt of Nvidia's alternative. This is notable, because AMD relies on x86 Zen 4 cores while Nvidia uses theoretically more efficient ARM cores in Grace Hopper.

The US supercomputer El Capitan, planned since 2019, will use Instinct MI300A accelerators. The system is expected to be the first in the world to reach a computing power of two exaflops. Today, the American Frontier system is the most powerful supercomputer in the world: it breaks the one-exaflop barrier with AMD Epyc CPUs and MI250X accelerators, the predecessors of these new chips.

Software and ecosystem

Hardware alone is not enough to break through. Nvidia supports its accelerators with a complete AI ecosystem aimed at professional customers, and AMD has to match that if the Instinct MI300 is to become a success story. Fortunately, that is the goal: along with the chips, the company is announcing the open AMD ROCm 6 platform, which provides the software tools for AI development on Instinct. How successful AMD is in launching its own ecosystem will play a big role in the popularity of the Instinct MI300A and X.

With the availability of the new chips, their good performance in early benchmarks and the acceptance by cloud players and server manufacturers, a breath of fresh air finally seems to be coming into the world of AI hardware. AMD has the hardware to offer an alternative to Nvidia, although the latter is not standing still either. Hopefully by 2024, AI acceleration will no longer be the exclusive domain of one top player, but there will be at least a second party at the table.

Source: IT Daily