NVIDIA AI Decoded: TensorRT and its importance in AI development

March 27, 2024

In this new article in the NVIDIA AI Decoded series, we focus on TensorRT, a development kit specialized in demanding inference and deep learning workloads that require a high level of performance. It includes a deep learning inference optimizer, offers a very low-latency runtime, and can multiply inference performance by up to 36 times compared to CPU-only platforms.
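To make the "optimizer plus runtime" idea concrete, here is a minimal sketch of the build step using TensorRT's Python API (8.x/9.x era): an ONNX export is parsed, the optimizer plans the network, and the result is serialized as an engine. The file names and the FP16 flag are illustrative assumptions, not details from the article.

```python
# Minimal TensorRT build flow: parse an ONNX model, let the optimizer
# plan the network, and serialize an engine for the runtime.
# "model.onnx" and the FP16 flag are illustrative choices.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit-batch networks are the standard mode in modern TensorRT.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Parse a trained model exported to ONNX.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

# The builder config is where the optimizer is steered,
# e.g. enabling FP16 kernels that map onto Tensor Cores.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

# Build and save the serialized engine for the low-latency runtime.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```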

Generative artificial intelligence has become one of the most important advances of our time, a technology that is constantly evolving and being adopted across industries. This growth in popularity has led to increased demand for generative AI at the local level, i.e., run natively on PCs and workstations without needing to resort to the cloud. This has important advantages, among which we can highlight:

  • Privacy and security.
  • No dependence on public networks.
  • Lower latency.
  • Greater control over our data.

NVIDIA GeForce RTX graphics cards feature Tensor Cores, dedicated hardware that acts as an AI accelerator and makes it possible to run generative AI locally. TensorRT has played a vital role in this regard, as it allows developers to fully exploit the potential of the Tensor Cores in GeForce RTX GPUs and, as noted above, can multiply performance by up to 36 times compared to a high-end CPU.
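The counterpart to the build step is the low-latency runtime that actually exercises the Tensor Cores. Below is a hedged sketch of loading a serialized engine and running one inference; the 1x3x224x224 input and 1000-class output assume a hypothetical image classifier, and PyTorch is used here only to manage CUDA device memory.

```python
# Runtime side of the pipeline (TensorRT 8.x/9.x API): deserialize the
# engine built earlier and run one inference on the GPU.
import tensorrt as trt
import torch  # used here only for CUDA memory management

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Device buffers whose shapes match the engine's input/output bindings
# (a hypothetical 224x224 classifier with 1000 classes).
inp = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float32)
out = torch.empty(1, 1000, device="cuda", dtype=torch.float32)

# execute_v2 takes raw device pointers, in binding order.
context.execute_v2([inp.data_ptr(), out.data_ptr()])
torch.cuda.synchronize()
print("predicted class:", int(out.argmax(dim=1)))
```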

TensorRT leads us to more efficient and accurate artificial intelligence

(Image: NVIDIA AI Decoded – Automatic1111 ControlNet demo)

This development kit gives developers deep access to the hardware so they can create and deliver fully optimized AI experiences. Compared with other similar solutions, TensorRT can roughly double performance, i.e., up to 2x the throughput of other inference frameworks.
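That "deep access to the hardware" shows up in the builder configuration, where developers control precision, scratch memory, and the shape ranges the optimizer should tune for. A sketch under assumed names: the tensor name "input" and the NCHW shape ranges are illustrative, and the network would be populated via the ONNX parser as in the earlier example.

```python
# Sketch of the control TensorRT exposes to developers (8.x/9.x Python
# API). The tensor name "input" and the shape ranges are assumptions.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
# ... populate the network with the ONNX parser, as in the earlier sketch ...

config = builder.create_builder_config()
# Cap the scratch memory the optimizer may use while timing kernel tactics.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
# Allow reduced precision where the hardware supports it (Tensor Cores).
config.set_flag(trt.BuilderFlag.FP16)

# Dynamic shapes: declare min/opt/max input sizes so the optimizer can
# pick kernels tuned for the batch range we actually expect to serve.
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  min=(1, 3, 224, 224),
                  opt=(8, 3, 224, 224),
                  max=(32, 3, 224, 224))
config.add_optimization_profile(profile)
```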

TensorRT can also accelerate popular generative AI models such as Stable Diffusion and SDXL. For example, Stable Video Diffusion performance improved by 40%, and the Stable Diffusion WebUI gained ControlNet support, tools that let us refine generative AI output by supplying guide images that steer the result. Thanks to TensorRT, ControlNet performance also improves by 40%.
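As an illustration of what ControlNet guidance means in practice, here is a minimal sketch using the Hugging Face diffusers library (not the Stable Diffusion WebUI extension the article refers to): the edge map of a guide image constrains the layout of the generated picture. The model IDs are common public checkpoints, assumed for illustration.

```python
# diffusers-based ControlNet sketch: a Canny edge map extracted from a
# guide image steers the composition of the generated picture.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Canny-edge ControlNet attached to a Stable Diffusion 1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Build the guide: extract Canny edges and stack them into an RGB image.
guide = load_image("https://huggingface.co/datasets/huggingface/"
                   "documentation-images/resolve/main/diffusers/"
                   "input_image_vermeer.png")
edges = cv2.Canny(np.array(guide), 100, 200)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# The prompt describes the content; the edge map steers the composition.
result = pipe("a portrait in stained glass style", image=edge_image).images[0]
result.save("controlnet_out.png")
```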

TensorRT acceleration also improves GeForce RTX 4080 SUPER performance by 50% in the new UL Procyon AI Image Generation benchmark compared to the fastest non-TensorRT implementation. This benchmark closely reflects the performance we can expect in real-world workloads.

DaVinci Resolve is another important application that adopted TensorRT, in version 18.6. With this update, its core AI-based tools run up to 50% faster, and up to 2.3 times faster on GeForce RTX graphics cards than on the latest Macs. Topaz Labs is another prominent company that has opted for TensorRT, seeing its applications improve in performance by up to 60%.

Optimized for LLMs

TensorRT is also optimized for the most popular LLMs and offers significant value in deep learning tasks. Here we are talking about TensorRT-LLM, an open-source library created by NVIDIA that harnesses the power of Tensor Cores for AI and deep learning tasks.

It offers full support for today's most important LLMs, including Phi-2, Llama 2, Gemma, Mistral and Code Llama, and can multiply by up to four the inference performance of the platform on which those models run. When we work locally, that platform is the computer running the models.
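For a sense of how the library is used, here is a hedged sketch based on TensorRT-LLM's high-level LLM API as found in recent tensorrt_llm releases; the model ID and sampling settings are assumptions for illustration, and the library builds or loads a TensorRT engine for the model behind the scenes.

```python
# Hedged sketch of tensorrt_llm's high-level LLM API (recent releases;
# the model ID and sampling settings are illustrative assumptions).
from tensorrt_llm import LLM, SamplingParams

# Constructing the LLM compiles or loads a TensorRT engine for the model.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["What does TensorRT-LLM accelerate?"], params):
    print(output.outputs[0].text)
```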

NVIDIA has already given an important demonstration of what this library can do with ChatRTX, which uses TensorRT-LLM and is optimized for GeForce RTX graphics cards. The company has also confirmed that it is working with the open-source community to develop native TensorRT-LLM connectors for the most popular frameworks, notably LlamaIndex and LangChain, to make the library easier to adopt and use correctly.

Source: Muy Computer
