NVIDIA’s GeForce RTX 4090 GPU: Accelerating AI Workloads with TensorRT-LLM
6/13/2024
NVIDIA has been making waves with its GeForce RTX 40 GPUs, particularly the flagship RTX 4090. In recent AI benchmarks, these GPUs have left laptop CPUs and NPUs in the dust, and NVIDIA’s TensorRT-LLM acceleration widens the gap even further.
The Power of GeForce RTX 40 GPUs
NVIDIA’s existing GPU lineup outperforms the entire NPU ecosystem, which has only reached about 50 TOPS in 2024. In contrast, NVIDIA’s RTX AI GPUs offer several hundred TOPS, topping out at 1321 TOPS on the GeForce RTX 4090. That makes the RTX 4090 not only the fastest desktop AI solution for running Large Language Models (LLMs) but also the fastest gaming graphics card on the planet.
VRAM and Acceleration
NVIDIA’s GeForce RTX GPUs come with up to 24 GB of VRAM, while NVIDIA’s professional RTX GPUs offer up to 48 GB. This capacity matters because LLMs need large amounts of memory to hold their model weights. Beyond raw capacity, NVIDIA’s RTX hardware provides AI-specific acceleration through Tensor Cores (hardware) and TensorRT-LLM (software).
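To see why 24 GB versus 48 GB makes a difference, here is a minimal sketch that estimates how much memory an LLM's weights alone occupy. It assumes memory is simply parameter count times bytes per parameter, treats 1 GB as 1e9 bytes, and ignores the KV cache, activations, and framework overhead, so real-world usage will be higher. The function names and the precision table are illustrative, not part of any NVIDIA tool.

```python
# Rough, weights-only VRAM estimate for an LLM (illustrative sketch).
# Assumption: memory ~= parameter count x bytes per parameter.
# KV cache, activations, and runtime overhead are NOT included.

BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floating point weights
    "int8": 1.0,  # 8-bit quantized weights
    "int4": 0.5,  # 4-bit quantized weights
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_vram(num_params: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit within the given VRAM budget."""
    return weight_memory_gb(num_params, precision) <= vram_gb


if __name__ == "__main__":
    for params in (7e9, 13e9, 70e9):
        for prec in ("fp16", "int4"):
            gb = weight_memory_gb(params, prec)
            print(f"{params / 1e9:.0f}B @ {prec}: {gb:.1f} GB, "
                  f"fits 24 GB: {fits_in_vram(params, prec, 24)}, "
                  f"fits 48 GB: {fits_in_vram(params, prec, 48)}")
```

Under these assumptions a 13B model at FP16 (about 26 GB) overflows a 24 GB card but fits in 48 GB, and even a 4-bit 70B model (about 35 GB) needs the larger card.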
Token Generation Speed
Token generation throughput on the GeForce RTX 4090 is already impressive at every batch size, but it improves by more than 4x when TensorRT-LLM acceleration is enabled.
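Throughput "across batch sizes" usually means total tokens generated by every sequence in a batch, divided by wall-clock time. The sketch below shows that arithmetic and how a speedup factor is derived from two runs. The helper names are hypothetical, it assumes every sequence generates the same number of tokens, and the timings in the usage block are made-up illustrative inputs, not NVIDIA's measurements.

```python
# Hypothetical helpers: aggregate throughput of one batched generation run.
# Assumes every sequence in the batch produces the same number of new tokens.


def aggregate_tokens_per_second(batch_size: int, new_tokens_per_seq: int,
                                elapsed_s: float) -> float:
    """Total tokens generated across the whole batch, divided by wall-clock seconds."""
    if batch_size <= 0 or new_tokens_per_seq < 0:
        raise ValueError("batch_size must be positive and token count non-negative")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return batch_size * new_tokens_per_seq / elapsed_s


def speedup(accelerated_tps: float, baseline_tps: float) -> float:
    """Ratio of accelerated to baseline throughput (4.0 means 4x faster)."""
    return accelerated_tps / baseline_tps


if __name__ == "__main__":
    # Made-up timings for illustration only; these are not benchmark results.
    baseline = aggregate_tokens_per_second(batch_size=8, new_tokens_per_seq=128, elapsed_s=8.0)
    accelerated = aggregate_tokens_per_second(batch_size=8, new_tokens_per_seq=128, elapsed_s=2.0)
    print(f"baseline: {baseline:.0f} tok/s, accelerated: {accelerated:.0f} tok/s, "
          f"speedup: {speedup(accelerated, baseline):.1f}x")
```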
Benchmarks and Performance
NVIDIA has shared new benchmarks using the open-source Jan.ai platform, which recently integrated TensorRT-LLM into its local chatbot app. These benchmarks compare NVIDIA’s GeForce RTX 40 GPUs against laptop CPUs with dedicated AI NPUs.
- Without TensorRT-LLM, the GeForce RTX 4090 is 8.7x faster than the AMD Ryzen 9 8945HS CPU.
- With TensorRT-LLM enabled, that lead grows to 15x, roughly a 70% boost over the non-TensorRT-LLM configuration.
- In absolute terms, the RTX 4090 generates up to 170.63 tokens per second versus 11.57 tokens per second on the AMD CPU.
- Even the GeForce RTX 4070 Laptop GPU delivers a speedup of up to 4.45x.
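The figures in the list above are internally consistent, which a few lines of arithmetic can confirm. This sketch only reuses the numbers quoted in the article; nothing is re-measured, and the constant names are illustrative.

```python
# Cross-check the ratios reported in the Jan.ai benchmark bullets.
# All inputs are the figures quoted in the article.

RTX_4090_TRT_LLM_TPS = 170.63  # RTX 4090 with TensorRT-LLM, tokens/s
RYZEN_8945HS_TPS = 11.57       # AMD Ryzen 9 8945HS, tokens/s
SPEEDUP_WITHOUT_TRT = 8.7      # RTX 4090 vs. the CPU without TensorRT-LLM

# Ratio of the two absolute throughputs (~14.75, rounded to "15x").
speedup_with_trt = RTX_4090_TRT_LLM_TPS / RYZEN_8945HS_TPS

# Extra gain attributable to TensorRT-LLM (~0.70, the "70% boost").
trt_gain = speedup_with_trt / SPEEDUP_WITHOUT_TRT - 1

print(f"speedup with TensorRT-LLM: {speedup_with_trt:.2f}x")
print(f"gain from TensorRT-LLM: {trt_gain:.0%}")
```

The 170.63 / 11.57 ratio comes out just under 15x, and dividing it by the 8.7x baseline lead gives about a 70% uplift, matching the article's claims.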
External GPU Acceleration
NVIDIA also tested an RTX 4090 in an external GPU (eGPU) configuration. The result was a 9.07x performance uplift over the same AMD laptop CPU.
Scaling AI Computational Power
NVIDIA’s GeForce RTX 40 desktop GPUs scale from 242 TOPS at the entry level to 1321 TOPS at the high end. Measured against the latest AI NPUs, rated at 45-50 TOPS, this is 4.84x more compute at the low end and 26.42x more at the very top (both ratios taken against 50 TOPS).
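The 4.84x and 26.42x figures follow from dividing each GPU's TOPS rating by a 50 TOPS NPU. The sketch below reproduces that division; the 50 TOPS baseline is the upper end of the range the article cites, and the dictionary labels are illustrative.

```python
# Reproduce the GPU-vs-NPU compute ratios from the article's TOPS figures.

NPU_TOPS = 50  # upper end of the 45-50 TOPS NPU range cited in the article

RTX_40_DESKTOP_TOPS = {
    "entry level": 242,
    "RTX 4090": 1321,
}


def tops_ratio(gpu_tops: float, npu_tops: float = NPU_TOPS) -> float:
    """How many times more AI compute (in TOPS) the GPU offers than the NPU."""
    return gpu_tops / npu_tops


if __name__ == "__main__":
    for name, tops in RTX_40_DESKTOP_TOPS.items():
        print(f"{name}: {tops} TOPS -> {tops_ratio(tops):.2f}x a {NPU_TOPS} TOPS NPU")
```

Using the 45 TOPS end of the range instead would make the gap larger, at roughly 5.4x and 29.4x.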
Summary
NVIDIA continues to lead the AI segment, and these benchmarks reaffirm that if you need local AI performance, NVIDIA’s RTX hardware is the way to go.