Intel has officially launched its Gaudi 3 AI accelerator, entering an AI chip market dominated by NVIDIA. While the Gaudi 3 falls short of NVIDIA’s H100 and H200 GPUs in raw performance, Intel is positioning it as a more affordable option with a lower total cost of ownership (TCO).
Under the hood, the Gaudi 3 comprises two chiplets with a combined 64 tensor processor cores (TPCs), wide SIMD vector processors that handle non-matrix operations. These are paired with eight matrix multiplication engines (MMEs), each built around a 256×256 MAC structure with FP32 accumulators, ensuring efficient handling of large matrix computations. Additionally, the Gaudi 3 features a generous 96MB of on-die SRAM cache, providing a rapid 19.2TB/s of bandwidth for data access.
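Those figures imply a simple back-of-envelope peak-throughput calculation. The sketch below multiplies the MAC count across the eight MMEs by an assumed clock; the article quotes no frequency, so ~1.77 GHz is picked here purely because it reproduces the 1856 TFLOPS figure Intel cites:

```python
# Back-of-envelope peak matrix throughput for Gaudi 3.
# CLOCK_GHZ is an assumption for illustration, not a figure from Intel.
MMES = 8                  # matrix multiplication engines
MACS_PER_MME = 256 * 256  # 256x256 MAC array per MME
OPS_PER_MAC = 2           # one multiply + one accumulate per MAC per cycle
CLOCK_GHZ = 1.77          # assumed MME clock (not stated in the article)

ops_per_cycle = MMES * MACS_PER_MME * OPS_PER_MAC
peak_tflops = ops_per_cycle * CLOCK_GHZ * 1e9 / 1e12
print(f"Estimated peak matrix throughput: {peak_tflops:.0f} TFLOPS")
```

At lower precisions the MACs are the fixed resource, which is why Intel quotes the same peak for BF16 and FP8 matrix math.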
For connectivity and data transfer, the Gaudi 3 boasts 24 x 200GbE networking interfaces and 14 media engines. These media engines are equipped to handle H.265, H.264, and VP9 codecs, enabling smooth vision processing tasks. The accelerator is further enhanced with 128GB of HBM2E memory, delivering a robust 3.67TB/s memory bandwidth.
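For scale, the networking and memory figures can be put side by side; note the Ethernet links are quoted in gigabits per second while the HBM bandwidth is in terabytes per second:

```python
# Aggregate I/O figures from the spec above (decimal units throughout).
links = 24
link_gbps = 200                          # gigabits per second per 200GbE link
network_tbps = links * link_gbps / 1000  # total network bandwidth, Tb/s
network_tb_per_s = network_tbps / 8      # same figure converted to TB/s

hbm_tb_per_s = 3.67                      # HBM2E memory bandwidth, TB/s

print(f"Aggregate network: {network_tbps:.1f} Tb/s ({network_tb_per_s:.1f} TB/s)")
print(f"HBM2E vs. network bandwidth: {hbm_tb_per_s / network_tb_per_s:.1f}x")
```

The 4.8 Tb/s of aggregate Ethernet is what lets Gaudi 3 scale out over standard networking rather than a proprietary interconnect.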
Intel claims that the Gaudi 3 delivers impressive performance, reaching up to 1856 BF16/FP8 matrix TFLOPS and 28.7 BF16 vector TFLOPS at a TDP of 600W. However, when compared to the NVIDIA H100, the Gaudi 3 lags behind. It shows slightly lower BF16 matrix performance (1856 vs 1979 TFLOPS), significantly lower FP8 matrix performance (1856 vs 3958 TFLOPS), and considerably lower BF16 vector performance (28.7 vs 1979 TFLOPS).
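A quick ratio calculation makes those gaps concrete. One caveat worth keeping in mind: NVIDIA’s published H100 tensor figures (1979 BF16, 3958 FP8 TFLOPS) are with-sparsity numbers, so the dense-to-dense gap is narrower than these raw ratios suggest:

```python
# Peak TFLOPS figures as quoted in the article.
specs = {
    "bf16_matrix": {"gaudi3": 1856, "h100": 1979},
    "fp8_matrix":  {"gaudi3": 1856, "h100": 3958},
    "bf16_vector": {"gaudi3": 28.7, "h100": 1979},
}

for metric, v in specs.items():
    ratio = v["gaudi3"] / v["h100"]
    print(f"{metric:12s}: Gaudi 3 at {ratio:.0%} of H100")
```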
Despite these performance gaps, Intel’s Gaudi 3 is designed specifically for large-scale generative AI, with its 64 tensor processor cores and eight matrix multiplication engines accelerating deep neural network computations and its 128GB of HBM2E memory supporting efficient training and inference. The Gaudi 3 also offers seamless compatibility with the PyTorch framework and with advanced Hugging Face transformer and diffuser models.
Intel’s commitment to the Gaudi 3 is evident in their recent collaboration with IBM. This partnership involves deploying the Gaudi 3 as a service on the IBM Cloud, aiming to reduce the total cost of ownership for AI deployments while enhancing performance and scalability.
Justin Hotard, Intel’s executive vice president and general manager of the Data Center and Artificial Intelligence Group, highlights the importance of choice in the AI landscape. He emphasizes that Intel’s Gaudi 3, alongside their Xeon 6 processors, provides customers with an open ecosystem, allowing them to implement workloads with greater performance, efficiency, and security.
While the Gaudi 3 might not be a direct performance rival to NVIDIA’s top-tier offerings, Intel’s strategy of focusing on affordability and seamless integration with popular AI frameworks could position it as a compelling alternative for organizations seeking to optimize their AI infrastructure.