On November 14th, Nvidia officially introduced its H200 GPU and a refreshed GH200 product line at the Supercomputing 23 (SC23) conference. The H200 is built on the same Hopper architecture as the H100 but adds faster, higher-capacity HBM3e high-bandwidth memory to handle the large datasets central to AI training and inference. Overall performance on large models is expected to improve by 60% to 90% over its predecessor, the H100.
The H200 delivers a 76% increase in HBM capacity, offering 141GB of HBM3e memory running at a per-pin data rate of roughly 6.25 Gbps. Across the six HBM3e stacks, total bandwidth per GPU reaches 4.8 TB/s, a substantial advance over the H100. Performance evaluations show a 60% to 90% improvement when running large models such as GPT-3 175B and Llama 2 70B.
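As a quick sanity check, the bandwidth figure follows directly from the quoted pin rate and stack count. The sketch below assumes the standard 1024-bit interface per HBM stack, which is not stated in the article:

```python
# Back-of-the-envelope check of the H200's 4.8 TB/s bandwidth figure.
# Assumption: each of the six HBM3e stacks uses a 1024-bit interface
# (typical for HBM, but not stated above).

PIN_RATE_GBPS = 6.25      # per-pin data rate quoted above, in gigabits/s
BITS_PER_STACK = 1024     # assumed interface width per HBM3e stack
NUM_STACKS = 6            # stacks per GPU

total_bus_width_bits = BITS_PER_STACK * NUM_STACKS        # 6144 bits
bandwidth_gbits_s = PIN_RATE_GBPS * total_bus_width_bits  # 38,400 Gb/s
bandwidth_tb_s = bandwidth_gbits_s / 8 / 1000             # convert to TB/s

print(f"Aggregate HBM3e bandwidth: {bandwidth_tb_s:.1f} TB/s")  # ~4.8 TB/s
```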
While raw compute on the H200 is essentially unchanged from the H100, Nvidia says its GPT-3 performance will reach roughly 18 times that of the original A100, compared with about 11 times for the H100. Notably, the H200 is drop-in compatible with the H100, so AI companies already training or serving models on the H100 can adopt the new chip without modifying their systems.
Major computer manufacturers and cloud service providers are expected to begin deploying the H200 in the second quarter of 2024. Nvidia's server partners, including ASRock Rack, ASUS, Dell, Eviden, GIGABYTE, HPE, Ingrasys, Lenovo, QCT, Supermicro, Wistron, and Wiwynn, will be able to upgrade existing systems with the H200. Leading cloud providers such as Amazon, Google, Microsoft, and Oracle are among its early adopters.
In the fast-moving AI accelerator market, Nvidia faces active competition from AMD and Intel. AMD's MI300X, with 192GB of HBM3 and 5.2 TB/s of memory bandwidth, surpasses the H200 in both capacity and bandwidth. Intel, for its part, plans to increase the HBM capacity of its Gaudi AI chips, with the upcoming Gaudi 3 targeting higher performance and improved networking.
Expanding its portfolio, Nvidia also unveiled an updated GH200 superchip, which pairs the latest H200 GPU with a Grace CPU (potentially a next-generation iteration) over the Nvidia NVLink-C2C chip-to-chip interconnect. Each GH200 superchip carries 624GB of total memory, up from the previous generation, which combined an H100 GPU and a 72-core Grace CPU with 96GB of HBM3 and 512GB of LPDDR5X in the same package.
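The article does not break down the 624GB figure; a minimal sketch follows, assuming it splits into 144GB of HBM3e on the GPU plus 480GB of LPDDR5X on the Grace CPU, per Nvidia's published GH200 specifications, with the predecessor figures taken from the text above:

```python
# Rough tally of GH200 superchip memory per generation.
# Assumption: the new 624GB total = 144GB HBM3e (GPU) + 480GB LPDDR5X (CPU);
# this split is not stated in the article.

new_hbm3e_gb, new_lpddr5x_gb = 144, 480   # assumed breakdown of updated GH200
old_hbm3_gb, old_lpddr5x_gb = 96, 512     # predecessor figures from the text

new_total = new_hbm3e_gb + new_lpddr5x_gb  # 624 GB
old_total = old_hbm3_gb + old_lpddr5x_gb   # 608 GB

print(f"Updated GH200: {new_total} GB total vs. previous {old_total} GB")
```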
Nvidia cites an 8x improvement in ICON performance for the GH200, along with substantial gains in benchmarks such as MILC, Quantum Fourier Transform, and RAG LLM inference. The H200 GPU itself will ship in new HGX H200 server boards that are compatible with existing HGX H100 systems, delivering higher performance and memory capacity without requiring infrastructure redesign.
GH200 systems expected to come online in 2024 include the Alps supercomputer at the Swiss National Supercomputing Centre and the Venado supercomputer at Los Alamos National Laboratory. Notably, the JUPITER system at the Jülich Supercomputing Centre is set to use nearly 24,000 GH200 superchips, mounted on "Quad GH200" boards of four superchips each, to deliver a total of about 93 exaflops of AI computing power.
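Those JUPITER numbers hang together on simple arithmetic: ~24,000 superchips on four-way boards implies roughly 6,000 boards, and 93 exaflops spread over them works out to about 3.9 petaflops per superchip, consistent with Hopper-class throughput presumably counted at sparse FP8 precision (an assumption, as the article does not state the precision):

```python
# Sanity check on the JUPITER deployment figures quoted above.
# Assumption: the 93 exaflops are low-precision (sparse FP8-class) AI flops.

num_superchips = 24_000       # approximate GH200 count cited above
total_ai_exaflops = 93        # total AI compute cited above
superchips_per_board = 4      # "Quad GH200" board

num_boards = num_superchips // superchips_per_board              # ~6,000 boards
pflops_per_superchip = total_ai_exaflops * 1000 / num_superchips # ~3.9 PFLOPS

print(f"~{num_boards} Quad GH200 boards, ~{pflops_per_superchip:.1f} PFLOPS per superchip")
```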