NVIDIA’s level of involvement in AI development is simply outstanding, and TensorRT-LLM is not only further proof of this, but also a breakthrough in client-side AI computing, that is, computing performed on the very system where it is needed, unlike the current model in which these operations run on servers because of the enormous computing capacity required to complete them, especially within a reasonable time.
Bringing the AI computing model to the client naturally requires a tight connection between hardware and software, something that NVIDIA has understood very well for a long time, as we can easily see in the excellent use of technologies such as DLSS, which relies on the Tensor cores of GeForce RTX GPUs. Not to mention the collaboration that Microsoft and NVIDIA announced last May, through which the folks from Redmond will be able to enrich the AI experience in Windows 11 by relying on the green giant’s consumer hardware.
Within the artificial intelligence ecosystem, large generative language models (LLMs, Large Language Models) have recently gained particular importance. They are based on neural networks with a huge number of parameters (hundreds of billions or even more) that are trained on equally enormous datasets. Initially, LLMs were trained exclusively on text (hence the reference to language in their name), but for some time now models trained on other types of content have also been emerging.
Using LLMs requires, as you may have guessed, a lot of computing power, which is why the most popular services built on them, such as Bard, Bing, ChatGPT and Claude, among others, rely on a client-server model rather than on local computing capacity. This is where TensorRT-LLM comes in, a technology just announced by NVIDIA that aims to make a big difference in this regard.

What is TensorRT-LLM?
TensorRT-LLM is a library created by NVIDIA that, as you might have guessed from its name, leverages the AI computing power of the Tensor cores found in GeForce RTX GPUs. The library itself is not new, since the company had previously announced a version for data centers, but the leap we were not expecting is that NVIDIA has now announced TensorRT-LLM for Windows, meaning the library will make it easier to run generative LLM models on the client.
To enable this leap, TensorRT-LLM quadruples the performance of the execution platform (in this case a PC) in inference operations, the process behind an AI model’s answers to our requests, on computers whose GPUs have Tensor cores. And since we are talking about a library, developers will be able to integrate it into their applications, making the use of LLM models much more efficient and therefore suitable for running directly on client machines.
TensorRT-LLM, which is integrated into the NVIDIA TensorRT SDK, is compatible with the main LLM models, such as Llama 2 and Code Llama, and better still, the company has opted for an open source model, so developers will be able to modify it freely to suit their needs and simplify its deployment. NVIDIA has also published a variety of resources, such as usage-optimized scripts, open source models, and an extensive set of reference documentation.
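To give an idea of what that integration might look like from a developer’s point of view, here is a minimal sketch based on the Python LLM API documented in the open source TensorRT-LLM repository; the model name, prompt and sampling settings are illustrative assumptions, and the exact API surface may differ between versions of the library.

```python
# Minimal sketch: running a Llama 2 model locally through TensorRT-LLM's
# Python LLM API. Model name, prompt and sampling values are illustrative.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine optimized for the local RTX GPU.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

prompts = ["Summarize what TensorRT-LLM accelerates on GeForce RTX GPUs."]
sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Inference runs entirely on the client, on the GPU's Tensor cores.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```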

A very interesting example of what TensorRT-LLM offers is the possibility of using techniques such as RAG (Retrieval Augmented Generation), which, put simply, enriches the answers provided by the model by supplementing the knowledge acquired during training with additional resources configured by the developer. This can be used to adapt responses to a specific context, provide more up-to-date information, add depth on specific topics, and so on.
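To make the idea a little more concrete, here is a deliberately simplified sketch of the RAG pattern, independent of TensorRT-LLM itself: a retriever selects the developer-supplied documents most relevant to the question and prepends them to the prompt before it reaches the model. The keyword-overlap scoring and the run_llm() call are placeholders for illustration only.

```python
# Simplified illustration of Retrieval Augmented Generation (RAG):
# developer-supplied documents are retrieved and injected into the prompt
# so the model can answer beyond its training data.
# The keyword-overlap retriever and run_llm() are placeholders.

def score(question: str, document: str) -> int:
    """Naive relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(document.lower().split()))

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the question."""
    return sorted(documents, key=lambda d: score(question, d), reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Prepend the retrieved context to the user question."""
    context = "\n".join(retrieve(question, documents))
    return f"Use the following context to answer.\n{context}\n\nQuestion: {question}"

# Developer-configured knowledge base (e.g. recent GeForce News articles).
knowledge_base = [
    "Alan Wake 2 ships with DLSS 3.5 and full ray tracing on GeForce RTX.",
    "TensorRT-LLM for Windows accelerates LLM inference on RTX GPUs.",
]

prompt = build_prompt("Which NVIDIA technologies does Alan Wake 2 use?", knowledge_base)
# answer = run_llm(prompt)  # run_llm() stands in for any local LLM call
print(prompt)
```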
As an example, which you can see in the image above, NVIDIA shows the answer to the same question given by LLaMa 2 (left) and by an implementation that uses RAG through TensorRT-LLM, so that the model takes into account information complementary to what was used in its training. When asked about the NVIDIA technologies integrated into the highly anticipated Alan Wake 2, LLaMa 2 is unable to provide an answer (in fact, it denies that the game exists or is even in development), while the response generated by a customized implementation labeled GeForce News, which naturally includes those publications among its sources, is complete and correct. Where the original model fails, a RAG-enriched implementation of it makes up for the shortfall.
As you can imagine, the possibilities are endless. TensorRT-LLM simplifies the integration of LLM models, increases response speed by up to a factor of four, and allows models to be enriched with RAG, all computed client-side on Tensor cores and distributed as open source, which greatly multiplies what developers can build with it. This is, without a doubt, the biggest advancement to date in bringing AI to the client.

NVIDIA RTX Video Super Resolution 1.5
Staying in the realm of client-side artificial intelligence, NVIDIA also announced RTX Video Super Resolution 1.5 today, an important evolution of the technology introduced in February of this year, which, as we told you back then, aims to improve the image quality of video content viewed through Google Chrome and Microsoft Edge. As our colleague Eduardo pointed out, we can, simplifying a little, think of RTX VSR as DLSS applied to streaming content.
In the first version of this technology, compatible with RTX 30 (Ampere) and RTX 40 (Ada Lovelace) graphics cards, the model was able to tell the difference between subtle but legitimate image elements and artifacts generated in the video, so it could preserve the former and correct the latter, delivering excellent image quality when the video was rescaled from its original resolution to the screen resolution.
With the new version of RTX VSR, available today via NVIDIA’s Game Ready drivers, we find two very notable new features. The first is that its scope extends to computers with GPUs based on the Turing architecture with RT cores. This means that with NVIDIA RTX Video Super Resolution 1.5, users of RTX 20 series graphics cards will also be able to enjoy this improvement in image quality when streaming content.
The other big piece of news concerns the resolutions to which the correction applies. As mentioned above, RTX VSR 1.0 was only activated when the video was rescaled from its native resolution (whatever that was) to the resolution of the screen it was displayed on. With version 1.5, however, artifact correction and image enhancement are applied in all cases, that is, even when the original video resolution matches the resolution of the screen on which it is played.
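Conceptually, the change in behavior between the two versions can be summed up with a small, purely illustrative sketch of the logic described above; this is not NVIDIA’s actual implementation.

```python
# Illustrative-only sketch of when RTX VSR applies its AI enhancement,
# based on the behavior described above (not NVIDIA's actual code).

def vsr_applies(video_res: tuple[int, int], screen_res: tuple[int, int],
                version: str = "1.5") -> bool:
    if version == "1.0":
        # 1.0: only active when the video is rescaled to a different resolution.
        return video_res != screen_res
    # 1.5: artifact correction and enhancement run in all cases,
    # even when the video already matches the screen resolution.
    return True

print(vsr_applies((1920, 1080), (1920, 1080), version="1.0"))  # False
print(vsr_applies((1920, 1080), (1920, 1080), version="1.5"))  # True
```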