
AMD to Unveil 1.2 Million GPU Supercomputer Cluster

2024-06-27 14:04:59 | Mr.Ming

The demand for computational power in data centers is growing at an astonishing rate. AMD has revealed plans to build a supercomputer cluster comprising up to 1.2 million GPUs. This move is seen as a competitive response to Nvidia.

When asked about the 1.2 million GPU cluster, Forrest Norrod, AMD's Executive Vice President and General Manager of Data Center Solutions, confirmed that the description is nearly accurate.

Industry experts note that typical AI training clusters are built with thousands of GPUs connected through high-speed interconnects. In contrast, a cluster with 1.2 million GPUs is nearly unprecedented. Factors such as latency, power consumption, and inevitable hardware failures must be carefully considered in constructing such a cluster.

AI workloads are highly sensitive to latency: when some processes receive data later than others, the entire synchronized job can stall. Additionally, today's largest supercomputers already experience hardware failures every few hours, and scaling to roughly 30 times the size of the largest known clusters makes both problems far worse.
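The failure-rate concern above can be made concrete with a back-of-the-envelope calculation. The per-GPU mean-time-between-failures figure below is an illustrative assumption, not a number from the article; the point is only that, under independent failures, the expected time until *some* GPU in the cluster fails shrinks in proportion to the GPU count.

```python
# Sketch: how cluster-level failure frequency scales with GPU count,
# assuming independent exponential failures per GPU. The MTBF value
# is an assumed placeholder for illustration.

def cluster_mtbf_hours(per_gpu_mtbf_hours: float, num_gpus: int) -> float:
    """With independent exponential failures, the expected time until
    any one GPU fails is the per-GPU MTBF divided by the GPU count."""
    return per_gpu_mtbf_hours / num_gpus

PER_GPU_MTBF = 500_000.0  # assumed hours between failures for a single GPU

for n in (37_888, 1_200_000):  # Frontier-scale vs. the rumored cluster
    hours = cluster_mtbf_hours(PER_GPU_MTBF, n)
    print(f"{n:>9} GPUs -> roughly one failure every {hours:.1f} hours")
```

Even with a generous per-GPU MTBF, a 1.2 million GPU cluster would see failures many times per hour, which is why checkpointing and fault tolerance dominate the design of systems at this scale.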

Frontier, one of the fastest supercomputers known today, boasts "only" 37,888 GPUs. The concept of a million-GPU system underscores the seriousness of the AI race in the 2020s. While Norrod did not disclose which organization is considering such a massive system, he did mention that "very sober people" are contemplating spending hundreds of billions of dollars on AI training clusters.
