AMD MI300X Outperforms NVIDIA H100 in LLM Inference AI Benchmarks

Tensorwave, an AI cloud provider, recently conducted benchmarks comparing AMD’s MI300X accelerator to NVIDIA’s H100 in Large Language Model (LLM) inference tasks. The results are impressive, with the MI300X delivering up to nearly 3x the throughput of the H100 in some scenarios.

Key Findings:

  1. Offline Performance:

    • The MI300X showed a throughput uplift of 22% to 194% over the H100 across batch sizes from 1 to 1024; at the high end that is nearly 3x the H100’s throughput.
    • AMD’s accelerator consistently outperformed the H100 in offline scenarios.
  2. Online Performance:

    • In realistic chat applications, the MI300X achieved 33% more requests per second than two NVIDIA H100 GPUs.
    • Average latency remained low at 5 seconds.
    • The MI300X excelled in generating text quickly, even under high traffic.
  3. Hardware Details:

    • AMD MI300X:
      • 192 GB VRAM, 5.3 TB/s memory bandwidth, ~1,300 TFLOPS FP16
      • ROCm 6.1.2 driver suite with MK1 inference engine
      • Tensor parallelism set to 1
    • NVIDIA H100:
      • 80 GB VRAM, 3.35 TB/s memory bandwidth, ~986 TFLOPS FP16
      • CUDA 12.2 driver stack
      • Tensor parallelism set to 2 (see the configuration sketch after this list)
  4. Notes:

    • All benchmarks used the Mixtral 8x7B model.
    • Both frameworks used FP16 compute paths.
    • Because the MI300X ran at tensor parallelism 1 and the H100 at 2, the MI300X throughput was doubled (extrapolated to a two-GPU configuration) for a like-for-like comparison with the dual-H100 setup (see the arithmetic sketch after this list).
  5. Real-World Impact:

    • The MI300X is an excellent choice for enterprises seeking fast AI inference capabilities.
    • Competitive pricing and availability make it a compelling option.
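
For context on the tensor-parallelism settings above: the article does not name the serving engine used on the H100 side, so the snippet below is only a hedged illustration using vLLM, a common open-source inference engine. The model checkpoint name and sampling settings are assumptions, not details from the benchmark.

```python
# Hedged illustration only: the benchmark article does not name the H100-side
# serving engine; vLLM is used here purely as a familiar example.
from vllm import LLM, SamplingParams

# One 192 GB MI300X can hold Mixtral 8x7B in FP16, so tensor_parallel_size=1
# suffices; an 80 GB H100 cannot, so the weights are sharded across two GPUs.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint name
    dtype="float16",
    tensor_parallel_size=2,  # 1 on the MI300X setup, 2 on the dual-H100 setup
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```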
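
And a quick back-of-the-envelope check on why the parallelism settings differ and what the factor-of-2 extrapolation means. The ~46.7B parameter count for Mixtral 8x7B and the sample throughput figures are illustrative assumptions, not numbers reported by Tensorwave.

```python
# Back-of-the-envelope sketch; parameter count and sample throughputs are
# illustrative assumptions, not figures from the benchmark.

MIXTRAL_PARAMS = 46.7e9          # approx. total parameters in Mixtral 8x7B
BYTES_PER_PARAM_FP16 = 2

weights_gb = MIXTRAL_PARAMS * BYTES_PER_PARAM_FP16 / 1e9
print(f"FP16 weights: ~{weights_gb:.0f} GB")   # ~93 GB: fits in one 192 GB
                                               # MI300X, not in one 80 GB H100

# Factor-of-2 extrapolation: the MI300X ran at tensor parallelism 1, the H100
# at 2, so single-GPU MI300X throughput is doubled to compare two GPUs
# against two GPUs.
mi300x_single_gpu_tps = 1000.0   # hypothetical tokens/s on one MI300X
h100_pair_tps = 680.0            # hypothetical tokens/s on two H100s

mi300x_pair_tps = 2 * mi300x_single_gpu_tps
uplift_pct = (mi300x_pair_tps / h100_pair_tps - 1) * 100
print(f"Uplift: {uplift_pct:.0f}%")  # a 194% uplift means ~2.9x the throughput
```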

In summary, AMD’s MI300X not only offers superior performance but also delivers value, positioning it as a strong contender in the AI accelerator market. Read more about Tensorwave’s MI300X cloud instances here.