Old GPUs also support new LLMs

August 26, 2024


To run LLMs, you don’t necessarily need the latest hardware. Research from Estonia shows that an Nvidia Geforce RTX 3090 can keep up.

Generative AI is often equated with modern GPUs, both for training and inference. The Estonian start-up Backprop is now showing that older hardware remains relevant. Backprop specializes in cloud instances, GPU instances in particular, and the company explains in detail why a virtual machine powered by a four-year-old Nvidia GeForce RTX 3090 can still do the job.

The Nvidia RTX 3090 came onto the market in 2020, before the AI boom. The GPU is optimized for graphics and ray tracing, but delivers up to 142 TFLOPS of FP16 compute (using its tensor cores). The memory bandwidth is also respectable at 936 GB/s. The card has 24 GB of GDDR6X memory, which is a lot by classic GPU standards, but less than real high-end AI cards. Overall, the RTX 3090 has plenty of FP16 horsepower on board, accompanied by fast memory with a decent capacity.

Sufficient for a modest model

Backprop now shows that these specs are enough to run a model like Llama 3.1 8B. The GPU in the company’s instances supports inference at 12.88 tokens per second. In plain language, this means the GPU generates text faster than the average user can read it: think roughly five words per second. Ten tokens per second is about the lower limit for smooth inference.
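The reading-speed comparison above can be checked with a quick back-of-envelope calculation. The words-per-token ratio below is an assumption, not a figure from the article: one token corresponding to roughly 0.75 English words is a common rule of thumb for LLM tokenizers.

```python
# Does 12.88 tokens/s outpace a human reader? A rough sketch.
TOKENS_PER_SECOND = 12.88       # Backprop's measured figure for Llama 3.1 8B
WORDS_PER_TOKEN = 0.75          # assumed conversion factor (rule of thumb)
READING_WORDS_PER_SECOND = 5.0  # ~300 words per minute, a fast reader

generation_wps = TOKENS_PER_SECOND * WORDS_PER_TOKEN
print(f"Generation: {generation_wps:.1f} words/s, "
      f"reading: {READING_WORDS_PER_SECOND:.1f} words/s")
```

At these assumed ratios, generation lands near 9.7 words per second, comfortably above reading pace.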

Backprop tested the model mainly with short prompts, such as those found in a business chatbot. Summarizing long documents requires more computing power. The performance of Llama 3.1 8B on the RTX 3090 then drops, but not below the crucial limit of 10 tokens per second. In addition, the GPU can process 50 to 100 simultaneous requests.

Practically useful

Backprop points out that in a realistic scenario, not everyone will be querying an AI model at the same time. An instance running an RTX 3090 could, in practice, support thousands of users, as long as they all want only sporadic inference.
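The "thousands of users" claim follows from a simple capacity estimate. The duty cycle below is an illustrative assumption; only the 50-to-100 concurrent-request figure comes from the article.

```python
# Rough capacity estimate: how many sporadic users fit on one GPU?
concurrent_slots = 50  # conservative end of the 50-100 range from the article
duty_cycle = 0.05      # assumed: a user has a request in flight 5% of the time

supported_users = concurrent_slots / duty_cycle
print(f"~{supported_users:.0f} users per instance")
```

Even at the conservative end, a 5% duty cycle yields on the order of a thousand users; lighter usage patterns push that into the thousands.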

Price is the main reason to take this route. An instance with an Nvidia GeForce RTX 3090, 8 vCPUs, 60 GB RAM and 300 GB storage costs $0.36 per hour: several times less than an instance with a modern professional Nvidia GPU from a major cloud provider.
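Combining the article's price and throughput figures gives a per-token cost. This is a sketch using the single-stream rate of 12.88 tokens per second; with 50 to 100 batched requests the real cost per token would be far lower, so this is an upper bound.

```python
# Upper-bound cost per million tokens at single-stream throughput.
PRICE_PER_HOUR = 0.36      # Backprop's RTX 3090 instance price
TOKENS_PER_SECOND = 12.88  # single-stream inference rate

tokens_per_hour = TOKENS_PER_SECOND * 3600  # 46,368 tokens per hour
cost_per_million = PRICE_PER_HOUR / tokens_per_hour * 1_000_000
print(f"${cost_per_million:.2f} per million tokens (single stream)")
```

That works out to under $8 per million tokens without any batching, which is what makes the economics of older cards interesting for sporadic workloads.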

Source: IT Daily
