NVIDIA’s next AI GPU architecture, Blackwell, may only be slowly emerging, but the company’s current Hopper-based H100 and newer H200 AI GPUs continue to dominate the performance landscape. These chips keep improving through software optimizations in the CUDA stack, delivering notable gains across a wide range of AI workloads.
The H200 and H100 AI GPUs consistently outperform the competition in benchmarks, including the recently released 46.7 billion parameter ‘Mixtral 8x7B’ LLM. NVIDIA’s HGX H200, which packs eight Hopper H200 GPUs connected via NVSwitch, posts impressive results in the Llama 2 70B benchmark: token generation rates of 34,864 tokens/second (offline) and 32,790 tokens/second (server) at a 1000W power limit, and 31,303 tokens/second (offline) and 30,128 tokens/second (server) at 700W. That works out to roughly a 50% performance uplift over the H100, while the H100 itself still holds its Llama 2 lead over AMD’s new Instinct MI300X AI accelerator.
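The article’s own numbers also show how little throughput the lower 700W power limit costs. Here is a minimal back-of-the-envelope sketch in plain Python, using only the figures quoted above:

```python
# Compare the HGX H200's quoted Llama 2 70B token rates at 1000W vs 700W.
# All throughput figures are taken directly from the paragraph above.

def uplift(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1) * 100

h200_tokens_per_sec = {
    "offline": {1000: 34_864, 700: 31_303},
    "server":  {1000: 32_790, 700: 30_128},
}

for scenario, results in h200_tokens_per_sec.items():
    gain = uplift(results[1000], results[700])
    print(f"{scenario}: {gain:.1f}% more tokens/second at 1000W than at 700W")
```

Running this shows the 1000W configuration is only about 11% faster offline and about 9% faster in the server scenario, despite drawing roughly 43% more power.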
The enhanced performance is attributed to software optimizations from NVIDIA that benefit both the H100 and H200. The H200’s hardware also plays a part: it offers nearly 80% more HBM capacity (141GB of HBM3E versus the H100’s 80GB of HBM3) and over 40% higher memory bandwidth (4.8TB/s versus 3.35TB/s). In multi-GPU server configurations, the Hopper H200 and H100 reach token/second outputs of up to 59,022 and 52,416, respectively, on the Mixtral 8x7B benchmark.
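To put those spec gaps next to the benchmark gap, here is a small sketch that computes the ratios. The 141GB/4.8TB/s and 80GB/3.35TB/s figures are NVIDIA’s published specifications for the SXM parts (an assumption on my part, not numbers from the benchmark results themselves); the Mixtral token rates are the ones quoted above:

```python
# Ratio of H200 to H100 on published specs and the quoted Mixtral 8x7B results.
# Spec figures assume NVIDIA's published SXM data sheets; token rates are
# the article's quoted multi-GPU numbers.

H100 = {"hbm_gb": 80,  "bandwidth_tbs": 3.35, "mixtral_tps": 52_416}
H200 = {"hbm_gb": 141, "bandwidth_tbs": 4.8,  "mixtral_tps": 59_022}

for key, label in [("hbm_gb", "HBM capacity"),
                   ("bandwidth_tbs", "memory bandwidth"),
                   ("mixtral_tps", "Mixtral 8x7B tokens/s")]:
    ratio = (H200[key] / H100[key] - 1) * 100
    print(f"{label}: H200 is {ratio:.0f}% higher than H100")
```

The takeaway: the H200’s roughly 76% capacity and 43% bandwidth advantages translate into about a 13% Mixtral throughput lead here, with the shared CUDA-stack optimizations lifting both GPUs.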
Stable Diffusion XL also benefits from these full-stack improvements, with gains of up to 27% observed on Hopper H100 and H200 AI GPUs. These advancements underline NVIDIA’s continued push in AI computing, and while the Hopper H100 and H200 are impressive, the upcoming Blackwell B200 AI GPUs are poised to push the boundaries even further as they gradually reach the market.