Intel Gaudi3 AI Chip: Slower than H100, More Affordable

2024-09-26 16:32:10, Mr. Ming

Intel has officially introduced its Gaudi3 accelerator, designed specifically for AI workloads. The new chip is slower than NVIDIA's popular H100 and H200 GPUs, so Intel is banking on Gaudi3's lower price and total cost of ownership (TCO) to drive its success.

The Gaudi3 processor uses a dual-chiplet design with 64 Tensor Processor Cores (TPCs), which are 256-byte-wide vector processors, and eight Matrix Multiplication Engines (MMEs), each built around a 256x256 MAC structure with FP32 accumulators. It carries 96MB of on-chip SRAM cache with 19.2TB/s of bandwidth, along with 24 200GbE network interfaces and 14 media engines that handle H.265, H.264, JPEG, and VP9 for visual processing. Memory consists of 128GB of HBM2E in eight stacks, providing 3.67TB/s of bandwidth.
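As a quick sanity check on the figures above, the short Python sketch below derives a few aggregate numbers from them: the raw Ethernet bandwidth across all 24 ports, and the HBM2E capacity and bandwidth per stack. The variable names are illustrative, and the derived values are plain arithmetic on the quoted specifications rather than figures published by Intel.

```python
# Gaudi3 figures as quoted in the article.
ETHERNET_PORTS = 24
PORT_SPEED_GBIT = 200        # 200GbE, raw line rate per port, one direction
HBM_CAPACITY_GB = 128
HBM_STACKS = 8
HBM_BANDWIDTH_TBS = 3.67

# Aggregate raw Ethernet bandwidth across all ports.
total_gbit = ETHERNET_PORTS * PORT_SPEED_GBIT
total_gbyte = total_gbit / 8  # bits -> bytes

# Capacity and bandwidth per HBM2E stack, assuming an even split.
per_stack_gb = HBM_CAPACITY_GB / HBM_STACKS
per_stack_bw_gbs = HBM_BANDWIDTH_TBS * 1000 / HBM_STACKS

print(f"Ethernet: {total_gbit} Gbit/s ({total_gbyte:.0f} GB/s) raw, one direction")
print(f"HBM2E per stack: {per_stack_gb:.0f} GB, {per_stack_bw_gbs:.2f} GB/s")
```

These are upper bounds on paper; protocol overhead and real access patterns will reduce the usable bandwidth.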

Compared to its predecessor, Gaudi2, the Gaudi3 is a significant step up: Gaudi2 had 24 TPCs, two MMEs, and 96GB of HBM2E memory. However, it appears Intel has streamlined the data types its TPCs and MMEs support, as Gaudi3 offers only FP8 matrix operations and BFloat16 matrix and vector operations, dropping support for FP32, TF32, and FP16.

In terms of performance, Intel claims that Gaudi3 can deliver up to 1,856 BF16/FP8 matrix TFLOPS and 28.7 BF16 vector TFLOPS, with a thermal design power (TDP) of approximately 600W. Against NVIDIA's H100, whose datasheet peaks are quoted with sparsity, Gaudi3's BF16 matrix performance is slightly lower (1,856 vs. 1,979 TFLOPS), its FP8 matrix performance is less than half (1,856 vs. 3,958 TFLOPS), and its BF16 vector performance is far lower (28.7 vs. 1,979 TFLOPS).
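To put those gaps in proportion, the sketch below expresses each Gaudi3 figure as a fraction of the H100 figure cited alongside it. The dictionary keys and the `relative` helper are illustrative names, and the inputs are the peak numbers quoted in this article, not measured results.

```python
# Peak TFLOPS as cited in the article (vendor figures, not benchmarks).
gaudi3 = {"bf16_matrix": 1856, "fp8_matrix": 1856, "bf16_vector": 28.7}
h100 = {"bf16_matrix": 1979, "fp8_matrix": 3958, "bf16_vector": 1979}

def relative(a, b):
    """Return a[k] / b[k] for every metric k in a."""
    return {k: a[k] / b[k] for k in a}

ratios = relative(gaudi3, h100)
for metric, ratio in ratios.items():
    print(f"{metric}: Gaudi3 is {ratio:.1%} of H100")
```

On these numbers Gaudi3 reaches roughly 94% of the H100's BF16 matrix figure, 47% of its FP8 matrix figure, and about 1.5% of the figure cited for BF16 vector work.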

More crucial than the specifications is the real-world performance of Gaudi3, as it competes against AMD's Instinct MI300 series and NVIDIA's H100 and B100/B200 chips. Intel has presented slides indicating that Gaudi3 may offer substantial value compared to NVIDIA's H100, but actual performance will depend heavily on software and other factors.

Earlier this year, Intel announced that an accelerator kit based on eight Gaudi3 chips would be priced at $125,000, which works out to about $15,625 per chip. By comparison, an NVIDIA H100 currently costs $30,678, so Gaudi3 comes in at roughly half the price. However, NVIDIA's upcoming Blackwell-based B100/B200 GPUs may challenge Intel's positioning on performance.
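The pricing argument can be checked with a few lines of arithmetic. The sketch below derives the per-chip price from the kit price and then compares peak BF16 matrix TFLOPS per dollar, using only figures quoted in this article. It covers hardware list price alone, so it is a rough indication and not a TCO model; the variable names are illustrative.

```python
# Prices and peak BF16 matrix TFLOPS as quoted in the article.
KIT_PRICE_USD = 125_000
CHIPS_PER_KIT = 8
H100_PRICE_USD = 30_678
GAUDI3_BF16_TFLOPS = 1856
H100_BF16_TFLOPS = 1979

# Per-chip price implied by the eight-chip kit.
gaudi3_price = KIT_PRICE_USD / CHIPS_PER_KIT
price_ratio = gaudi3_price / H100_PRICE_USD

# Peak TFLOPS per dollar for each part, and Gaudi3's advantage.
gaudi3_tflops_per_usd = GAUDI3_BF16_TFLOPS / gaudi3_price
h100_tflops_per_usd = H100_BF16_TFLOPS / H100_PRICE_USD
advantage = gaudi3_tflops_per_usd / h100_tflops_per_usd

print(f"Gaudi3 per chip: ${gaudi3_price:,.0f} ({price_ratio:.0%} of H100 price)")
print(f"BF16 TFLOPS per dollar: Gaudi3 {gaudi3_tflops_per_usd:.3f}, "
      f"H100 {h100_tflops_per_usd:.3f} ({advantage:.2f}x)")
```

On paper this gives Gaudi3 a little under twice the peak BF16 matrix throughput per dollar, though the FP8 comparison is much closer and real-world results depend on software.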

Intel's Gaudi3 AI accelerators will be available through IBM Cloud and the Intel Tiber Developer Cloud. Systems based on Intel Xeon 6 and Gaudi3 are set for a full rollout in Q4, with offerings from Dell, HPE, and Supermicro. Dell and Supermicro systems are expected to ship in October, while Supermicro's devices will follow in December.
