According to international media reports on January 23rd, AMD has commenced large-scale production of its latest-generation artificial intelligence (AI) and high-performance computing (HPC) accelerator, the Instinct MI300X. The initial shipments have gone to LaminiAI, a key partner that uses the accelerators to run large language model (LLM) workloads for enterprise customers.
LaminiAI, a longstanding AMD partner, has priority access to the MI300X supply. Sharon Zhou, CEO of LaminiAI, announced the deployment on social media, revealing machines each equipped with eight Instinct MI300X accelerators. The first wave of MI300X units is already operational, and LaminiAI's next-generation LLM cluster is poised for an imminent launch.
Part of the AMD Instinct MI300 series, the MI300X is a high-performance variant targeting AI and HPC applications, positioned as a pivotal product in direct competition with NVIDIA's H100. Internally, the MI300X integrates twelve chiplets built on 5nm and 6nm processes (compute dies at 5nm, I/O dies at 6nm), totaling 153 billion transistors. In terms of core design, it adopts a simpler configuration than the MI300A APU, dropping that chip's 24 Zen 4 CPU cores in favor of additional CDNA 3 GPU compute dies.
The MI300X comprises eight compute dies (XCDs) based on the CDNA 3 GPU architecture, each with 40 compute units, for a total of 320 compute units, or 20,480 stream processors. The current production version, however, ships with a slightly cut-down configuration: 304 compute units (38 per XCD), or 19,456 stream processors. On the memory side, the MI300X carries a substantial 192GB of HBM3, a 50% capacity increase over the MI250X, with up to 5.3TB/s of memory bandwidth and 896GB/s of Infinity Fabric bandwidth.
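For readers who want to check these numbers, the arithmetic is straightforward. The short Python sketch below reproduces the figures above; the 64-stream-processors-per-CU ratio and the HBM3 interface parameters (eight stacks, 1024-bit interface, roughly 5.2Gb/s per pin) are assumptions drawn from public CDNA 3 and HBM3 material, not from this announcement.

```python
# Back-of-the-envelope check of the MI300X figures quoted above.
# Assumption: 64 stream processors per CDNA 3 compute unit (CU).
SP_PER_CU = 64

full_cus = 8 * 40   # 8 XCDs x 40 CUs each (full die)
prod_cus = 8 * 38   # production part: 38 CUs enabled per XCD

print(full_cus, full_cus * SP_PER_CU)  # 320 CUs -> 20480 stream processors
print(prod_cus, prod_cus * SP_PER_CU)  # 304 CUs -> 19456 stream processors

# Assumption: eight HBM3 stacks, each with a 1024-bit interface
# at ~5.2 Gb/s per pin; divide by 8 to convert bits to bytes.
bw_gb_s = 8 * 1024 * 5.2 / 8
print(f"{bw_gb_s / 1000:.1f} TB/s")    # ~5.3 TB/s
```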
AMD claims that, in comparison to NVIDIA's H100 accelerator, the MI300X offers the following advantages (the memory ratios are verified in the sketch after this list):
· 2.4 times higher memory capacity
· 1.6 times higher memory bandwidth
· 1.3 times higher FP8 TFLOPS
· 1.3 times higher FP16 TFLOPS
· 20% faster performance in 1v1 comparisons on Llama 2 70B and FlashAttention 2
· 40% faster performance in 8v8 server configurations on Llama 2 70B
· 60% faster performance in 8v8 server configurations on Bloom 176B
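The first two ratios in the list follow directly from the published memory specifications. A minimal sketch, assuming NVIDIA's public figures for the H100 SXM part (80GB of HBM3 at roughly 3.35TB/s), confirms them:

```python
# Sanity check of AMD's memory claims against NVIDIA's H100.
# Assumption: H100 SXM figures from NVIDIA's public datasheet
# (80 GB HBM3, ~3.35 TB/s); they are not part of AMD's announcement.
mi300x = {"memory_gb": 192, "bandwidth_tb_s": 5.3}
h100 = {"memory_gb": 80, "bandwidth_tb_s": 3.35}

print(f"capacity:  {mi300x['memory_gb'] / h100['memory_gb']:.1f}x")            # 2.4x
print(f"bandwidth: {mi300x['bandwidth_tb_s'] / h100['bandwidth_tb_s']:.1f}x")  # 1.6x
```

The remaining TFLOPS and workload figures are AMD's own benchmark results and cannot be reproduced from the public specifications alone.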
Major industry players such as Meta and Microsoft have already placed significant orders for AMD Instinct MI300 series products, but LaminiAI stands out as the first company to publicly deploy the MI300X.