Nvidia unveils new GPUs to satisfy the inference hunger of new AI models
March 22, 2023
Nvidia is introducing new GPUs optimized for efficiently running trained AI models. According to the company, this is essential to the rapid democratization of complex AI functionality.
Training new AI models requires a massive number of GPUs, and that is only the first step. Microsoft alone supported OpenAI in training its GPT models with a cloud-based AI supercomputer built from tens of thousands of GPUs. To actually bring the functionality of GPT-4 to users in the form of products, Microsoft will deploy hundreds of thousands more GPUs across all of its data center regions, according to Nvidia. And that's just one customer.
Running a trained AI model is called inference. A single inference pass is far less GPU-intensive than training, but it happens on a much larger scale: when thousands of people are talking to ChatGPT at the same time, thousands of inference workloads have to run simultaneously. Now that AI functionality is becoming widely available, Nvidia says there is a need for hardware built to support that goal.
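To make that training-versus-inference contrast concrete, the sketch below shows what a single inference request looks like in PyTorch. The toy model and tensor shapes are purely illustrative assumptions on our part, not anything Nvidia or OpenAI ships; the point is that inference is one forward pass, with no gradients or optimizer state.

```python
import torch
import torch.nn as nn

# Toy stand-in model: real LLMs have billions of parameters, but the
# training-versus-inference distinction illustrated here is the same.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()  # put layers such as dropout into inference mode

request = torch.randn(1, 512)  # stand-in for one encoded user request

# Inference is a single forward pass: no gradients, no optimizer state.
# That is why each request is far cheaper than a training step, even
# though thousands of them may have to run at the same time.
with torch.no_grad():
    prediction = model(request)
```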
Nvidia L4
At GTC 2023, CEO Jensen Huang therefore presents the Nvidia L4 and the Nvidia H100 NVL. The Nvidia L4 is an accelerator designed specifically for efficient inference on video streams. The GPU takes up only a single slot and is therefore quite compact; as a result, Nvidia enthuses, it fits into virtually any server.
The GPU is said to process AI video around 120 times faster than pure CPU servers. In addition, the chip is four times faster than the previous generation of Nvidia accelerators, and for generative AI workloads focused on image generation it is 2.7 times more efficient. Google will offer servers with the card in early access in its cloud, and traditional server manufacturers have hardware planned as well.
Nvidia H100 NVL
The Nvidia H100 NVL is a Hopper-based accelerator optimized for large language model (LLM) inference, such as conversations with ChatGPT. While the L4 is a relatively modest component, the H100 NVL can confidently be called a powerhouse: it combines two GPU chips with a total of 188 GB of HBM3 memory, connected via NVLink.
The throughput of this new product is said to be about twelve times that of the Nvidia HGX A100. Compared to the standard Nvidia H100 PCIe, the H100 NVL is roughly 2.5 times more powerful. Nothing is known about availability yet.
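As a rough illustration of the workload this card targets, the sketch below serves an LLM whose weights are sharded across two GPUs, using the Hugging Face transformers and accelerate libraries. This is our own minimal example, not Nvidia software, and the checkpoint name is a hypothetical placeholder.

```python
# Serving an LLM split across two linked GPUs, the scenario the H100 NVL
# is built for. Requires: pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/large-llm"  # hypothetical checkpoint, for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" shards the weights across all visible GPUs, so a model
# too large for one card's memory can still be served from a single server.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

inputs = tokenizer("Hello, how can I help?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sharding like this is exactly where a pair of NVLink-connected GPUs with a large combined memory pool pays off: the model no longer has to fit within a single card's memory, and activations cross the fast GPU-to-GPU link instead of the PCIe bus.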
As an aside, Nvidia notes that its BlueField-3 DPU is also in production. It does not target AI workloads per se, but offloads and accelerates the networking component of servers, freeing up resources there.