
Intel AI Inference: Gaudi2 & Xeon Scalable Performance

2023-09-13 11:03:34 Mr.Ming

MLCommons has published results for its MLPerf Inference v3.1 performance benchmark, covering GPT-J, a 6-billion-parameter large language model, alongside computer vision and natural language processing models. Intel submitted results for the Habana® Gaudi®2 accelerator, fourth-generation Intel® Xeon® Scalable processors, and the Intel® Xeon® CPU Max Series. The results demonstrate Intel's competitive position in AI inference and reinforce its commitment to making AI workloads easier to deploy everywhere, from cloud to network to edge.

Sandra Rivera, executive vice president and general manager of Intel's Data Center and AI business, said: "As highlighted by the most recent MLCommons results, we offer a potent and competitive suite of AI solutions, adept at fulfilling customer demands for high-performance, energy-efficient deep learning inference and training. Our product portfolio consistently excels in delivering a compelling cost-performance advantage across a spectrum of AI model scales."

Building on the MLPerf AI training results published in June and on performance measurements from Hugging Face, Habana Gaudi2 has shown outstanding performance on advanced vision and language models. These results further underscore Intel's position as a trusted provider of high-performance AI computing solutions.
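
As a concrete point of reference, the sketch below shows the kind of Hugging Face transformers code commonly used to run GPT-J inference. It is a minimal illustration under stated assumptions (the public EleutherAI/gpt-j-6b checkpoint, generic generation settings); Intel's actual Gaudi2 submissions run through Habana's SynapseAI software stack, with integrations such as optimum-habana, which is not reproduced here.

```python
# Minimal GPT-J text-generation sketch with Hugging Face transformers.
# The checkpoint name and generation settings are illustrative assumptions;
# running on Gaudi hardware additionally requires Habana's software stack
# (e.g., the optimum-habana integration), which is not shown here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-j-6b"  # public 6B-parameter GPT-J checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # reduced precision keeps the 6B weights manageable
)
model.eval()

prompt = "MLPerf Inference v3.1 measures"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```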

To meet the varied demands of its customers, Intel is committed to delivering products that address AI inference and training challenges and help bring AI into every application. Intel's AI portfolio lets customers tailor solutions to their own performance, efficiency, and cost requirements, while supporting an open, collaborative ecosystem.

Key Highlights Regarding Habana Gaudi2 Testing Results:

Habana Gaudi2's inference performance on the GPT-J model was convincingly validated:


· On both GPT-J-99 and GPT-J-99.9, Gaudi2 delivered 78.58 queries per second in the Server scenario and 84.08 samples per second in the Offline scenario (a toy sketch of the two measurement modes follows this list).

· These results were achieved using the FP8 data type, while maintaining 99.9% accuracy on this new format.
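
For readers unfamiliar with the two figures above: MLPerf reports the Offline scenario in samples per second and the Server scenario in queries per second. The toy sketch below illustrates the distinction; run_inference is a hypothetical placeholder for a real model call, and actual submissions are driven by the MLPerf LoadGen harness rather than hand-rolled timing.

```python
# Toy illustration of the two MLPerf reporting modes quoted above:
# Offline throughput (samples per second over one big batch) versus
# Server-style queries per second (requests handled one at a time).
# `run_inference` is a hypothetical stand-in for a real model call.
import time

def run_inference(batch):
    # Placeholder: pretend each sample costs ~10 ms of compute.
    time.sleep(0.010 * len(batch))
    return [f"output-{sample}" for sample in batch]

def offline_throughput(samples):
    start = time.perf_counter()
    run_inference(samples)            # one large batch, no latency target
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed     # samples per second

def server_qps(num_queries):
    start = time.perf_counter()
    for i in range(num_queries):      # each query dispatched individually
        run_inference([i])
    elapsed = time.perf_counter() - start
    return num_queries / elapsed      # queries per second

if __name__ == "__main__":
    print(f"Offline: {offline_throughput(list(range(100))):.1f} samples/s")
    print(f"Server:  {server_qps(100):.1f} queries/s")
```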

Intel continues to improve Gaudi2 performance through software updates released every six to eight weeks, and it is steadily expanding the range of AI models it covers in MLPerf benchmarks.

Key Highlights Regarding Fourth-Generation Intel Xeon Scalable Processors Testing Results:

Intel submitted results for seven inference benchmarks based on fourth-generation Intel Xeon Scalable processors, including the GPT-J model and a range of other AI workloads. The results show strong performance across vision, natural language processing, speech, and audio translation models, as well as the much larger DLRM v2 deep learning recommendation model and the GPT-J large language model. Notably, Intel remains the only vendor to publicly submit CPU results using industry-standard deep learning ecosystem software.


· Fourth-generation Intel Xeon Scalable processors are a strong choice for building and deploying general-purpose AI workloads on popular AI frameworks and libraries. For the GPT-J task of summarizing a 1,000-1,500-word news article into a roughly 100-word summary, the processors achieved two summaries per second in Offline mode and one summary per second in real-time Server mode (see the sketch after this list).

· Intel also submitted MLPerf results for the Intel Xeon CPU Max Series, which provides up to 64 GB of high-bandwidth memory. On GPT-J, it was the only CPU to reach 99.9% accuracy, a critical attribute for applications that demand the highest precision.

· Together with OEM partners, Intel submitted additional results, demonstrating the scalability of its AI performance and the broad availability of general-purpose servers based on Intel Xeon processors that can meet customer service level agreements (SLAs).
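
To make the summarization workload in the first bullet concrete, here is a minimal CPU-side sketch using Hugging Face transformers. It is an illustration under stated assumptions (the public GPT-J checkpoint, a hand-written prompt, MLPerf-style token limits), not Intel's submission harness; bfloat16 targets the AMX units on fourth-generation Xeon, and Intel Extension for PyTorch can optionally be layered on for further CPU speedups.

```python
# Sketch of the summarization task described above: condensing a long
# news article to ~100 words with GPT-J on a CPU. The prompt wording and
# token limits are illustrative assumptions, not Intel's MLPerf harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-j-6b"  # public GPT-J checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

# Optional, if intel_extension_for_pytorch is installed:
#   import intel_extension_for_pytorch as ipex
#   model = ipex.optimize(model, dtype=torch.bfloat16)

article = "..."  # a 1,000-1,500 word news article goes here
prompt = f"Summarize the following article in about 100 words:\n{article}\nSummary:"

# Cap the input length; the exact limit here is an illustrative choice.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1919)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated summary tokens, not the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```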

MLPerf is widely recognized as the industry benchmark for AI performance, enabling fair, reproducible product comparisons. Intel plans to submit new AI training performance results for upcoming MLPerf evaluations. These ongoing performance improvements underscore Intel's commitment to supporting customers and advancing AI at every scale, whether through cost-effective AI processors, high-performance AI hardware accelerators for network, cloud, and enterprise use, or GPUs.
