
Artificial intelligence, TOPS and tokens: everything you need to know

June 12, 2024

About six years ago, we talked about artificial intelligence as a promising technology, a future project that could profoundly change our society. I remember perfectly that, at the time, there were skeptics who said it was all smoke and mirrors, and there was also a camp that warned us not to raise our expectations because its real possibilities were being exaggerated.

Time has passed, and the truth is that we did not have to wait long to see that the most optimistic predictions were the correct ones. Artificial intelligence is changing the way we work, create, socialize and play. Its potential is so huge that it has spread across industries and at every level, and it remains promising because we are still at a relatively early stage, which means it will continue to see very important improvements.

When we talk about artificial intelligence, we all have a more or less clear idea of what we mean. However, the popularization of this technology has brought with it new concepts that are deeply connected to it and very important, yet far less well known. Today I want to dive deeper into this topic and focus on two big keys: TOPS and tokens.

Artificial intelligence and TOPS

TOPS is a unit of measure that we could compare with other more familiar ones, such as FPS (frames per second in games) or GB/s (gigabytes per second) in SSDs. The abbreviation stands for trillions of operations per second, and as usually happens when we talk about performance, "more TOPS is always better".

It is a very easy unit to understand. TOPS indicates the trillions of operations a component is capable of performing in one second. For example, if an NPU (neural processing unit) is rated at 50 TOPS, it can perform 50 trillion operations per second. That would make it less powerful than a 60 TOPS NPU.
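The arithmetic behind a TOPS rating can be sketched in a few lines. This is an illustrative calculation only (the function name and numbers are my own, and real-world throughput falls below the theoretical peak):

```python
def ops_time_seconds(num_ops: float, tops: float) -> float:
    """Time in seconds to execute num_ops operations at a given TOPS rating.

    1 TOPS = 1e12 (one trillion) operations per second, at peak
    theoretical throughput.
    """
    return num_ops / (tops * 1e12)

# A 50 TOPS NPU performs 50 trillion operations in exactly one second...
print(ops_time_seconds(50e12, 50))   # 1.0
# ...while a 100 TOPS part finishes the same workload in half the time.
print(ops_time_seconds(50e12, 100))  # 0.5
```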

The AI models we currently use need certain performance levels to run optimally, and these are measured in TOPS. For example, Microsoft Copilot+ requires a minimum of 40 TOPS to function optimally. This serves as a benchmark, setting a minimum level for simple AI models to work locally.

To run much more advanced and complex models powered by generative AI, such as intelligent assistants for digital content creation, intelligent upscaling applied to PC games (NVIDIA DLSS), image generation from text or video, and LLMs (large language models), you need far more performance, and this is where GPUs come in.

A new-generation NPU can offer around 50 TOPS, while the GeForce RTX 4090 can achieve a tremendous 1,300 TOPS thanks to its fourth-generation Tensor Cores. The difference is spectacular, and it also makes clear that there is an important divide between basic AI, which can be handled cheaply and effectively, and advanced AI, which requires more powerful, state-of-the-art components.

TOPS are just one side of the coin: meet the tokens

You now know exactly what TOPS are, but when we talk about LLMs, the unit of measurement changes and we start using tokens. I know what you are thinking: what is a token? Well, it is very simple: we can define it as a unit of the output an LLM generates. For example, a token can be a word in a phrase, or an even smaller element such as a letter or a punctuation mark.

Thus, LLM performance can be measured in tokens per second. At this point it is also important to introduce another key concept that is even less well known but essential when we talk about large language models: batch size, defined as the number of input requests that can be processed simultaneously in a single inference pass.
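Measuring tokens per second is conceptually just counting output tokens and dividing by elapsed time. A minimal sketch, assuming a hypothetical `generate` callable standing in for a real LLM runtime (the stand-in below simply splits text into word-level tokens):

```python
import time

def measure_tokens_per_second(generate, prompt):
    """Time a token generator and report (token count, tokens/second).

    `generate` is any callable returning a list of tokens; this
    interface is hypothetical, standing in for a real LLM runtime.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens), len(tokens) / elapsed

# Stand-in generator: word-level "tokens" with simulated latency.
def dummy_generate(prompt):
    time.sleep(0.01)  # pretend inference takes 10 ms
    return prompt.split()

count, rate = measure_tokens_per_second(dummy_generate, "tokens are units of model output")
print(count, round(rate, 1))
```

Real tokenizers work on sub-word units rather than whole words, so actual token counts are usually higher than a word count.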

An LLM that can handle multiple input requests from different sources and applications will outperform one limited to a single source. Working with larger batches improves throughput and the inference process, but it also increases the amount of memory the LLM needs to function properly.

To handle this type of workload, it is ideal to have a dedicated GPU with an adequate amount of graphics memory. For example, the GeForce RTX 4080 with 16 GB of graphics memory will be limited to smaller batches than the GeForce RTX 4090 with 24 GB, and the same happens if we compare the latter to the NVIDIA RTX 6000, which has 48 GB of graphics memory.
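The link between batch size and memory can be illustrated with a rough estimate of a transformer's KV cache, one of the main per-batch memory costs during inference. The formula and model dimensions below are my own illustrative assumptions, not figures from any specific product:

```python
def kv_cache_gib(batch, seq_len, layers, hidden, bytes_per=2):
    """Rough KV-cache size for a transformer during inference, in GiB.

    Two tensors (K and V) per layer, each of batch * seq_len * hidden
    elements; bytes_per=2 assumes FP16. Illustrative only -- a real
    runtime also needs memory for weights, activations and overhead.
    """
    elements = 2 * batch * seq_len * layers * hidden
    return elements * bytes_per / 2**30

# Hypothetical 32-layer model with hidden size 4096 and a 4096-token
# context: the cache grows linearly with batch size.
print(round(kv_cache_gib(1, 4096, 32, 4096), 2))  # 2.0
print(round(kv_cache_gib(8, 4096, 32, 4096), 2))  # 16.0
```

Doubling the batch doubles this cache, which is why larger batches demand GPUs with more graphics memory.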

Graphics memory matters, but specialized hardware and software also play a vital role in achieving maximum LLM performance. These workloads can take advantage of GeForce RTX and NVIDIA RTX Tensor Cores and are fully supported by the NVIDIA TensorRT development kit, which translates into more efficient and accurate AI and a better response to future challenges.

Looking at image generation, performance can also be measured by the time required to create each image. This is what Procyon does, for example, as we can see in the attached image, which shows the average performance of the GeForce RTX 4090 for laptops working in FP16 (half precision) with TensorRT as the acceleration backend.

Interesting, isn’t it? If you want to learn more about artificial intelligence, I suggest you check out NVIDIA’s AI Decoded series, where you will find more information about this technology as applied across different sectors, as well as other important terms that are key to understanding it.

Source: Muy Computer
