
At GTC 2026, NVIDIA unveiled three new systems: the Groq LPX inference rack, the Vera ETL256 CPU rack, and the STX storage reference architecture. Together, the announcements signal a strategic shift toward becoming a full-stack AI infrastructure platform provider.
Analysis from SemiAnalysis indicates that these launches reflect NVIDIA’s expansion beyond GPUs into broader domains, including inference optimization, high-density CPU deployment, and storage orchestration, positioning the company to address end-to-end AI infrastructure demands.
The Groq LPX inference rack represents a rapid commercialization effort following NVIDIA’s reported $20 billion investment in Groq-related intellectual property and talent. Built around the LP30 chip, fabricated on Samsung’s 4nm process, the system sidesteps TSMC’s constrained N3 capacity and requires no high-bandwidth memory (HBM), giving it a differentiated supply and cost position.
From an architectural perspective, the LPX rack pairs LP30 chips with NVIDIA GPUs and introduces an “Attention and Feed-Forward Decoupling” (AFD) approach: attention workloads run on the GPUs while feed-forward network (FFN) processing runs on the LPUs, significantly reducing inference latency for interactive large language model (LLM) applications. SemiAnalysis notes that LPUs are particularly effective in latency-sensitive scenarios, reinforcing their role within decoupled architectures.
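To make the split concrete, the sketch below walks a token batch through an AFD-style decoder layer: attention on the GPU side with a growing KV cache, FFN on the LPU side with a fixed shape. All function names and the stand-in arithmetic are illustrative assumptions, not a published NVIDIA or Groq API.

```python
# Illustrative AFD split: attention on the GPU (dynamic KV cache),
# FFN on the LPU (static schedule). Names and math are placeholders.

def attention_on_gpu(hidden: list[float], kv_cache: list) -> list[float]:
    """GPU stage: attention reads and grows the KV cache held in HBM."""
    kv_cache.append(list(hidden))           # dynamic, per-token cache growth
    scale = 1.0 / len(kv_cache)             # stand-in for attention math
    return [h * scale for h in hidden]

def ffn_on_lpu(hidden: list[float]) -> list[float]:
    """LPU stage: a fixed-shape FFN that a static schedule can pin to SRAM."""
    return [2.0 * h + 0.5 for h in hidden]  # stand-in for the FFN matmuls

def decoupled_layer(hidden: list[float], kv_cache: list) -> list[float]:
    attn_out = attention_on_gpu(hidden, kv_cache)
    # In the rack, tokens would cross the GPU->LPU link here (All-to-All).
    return ffn_on_lpu(attn_out)

kv_cache: list = []
hidden = [0.1, 0.2, 0.3]
for _ in range(4):                          # e.g., four decoder layers
    hidden = decoupled_layer(hidden, kv_cache)
print(hidden)
```

The point of the split is that the FFN has a fixed, known shape at every step, which suits the LPU’s static scheduling, while attention’s working set grows with the KV cache, which suits the GPU’s HBM.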
The Vera ETL256 CPU rack packs 256 CPUs into a single liquid-cooled rack and uses a copper interconnect topology to provide full intra-rack connectivity, helping to relieve CPU supply constraints as AI workloads scale. Meanwhile, the STX storage reference architecture extends NVIDIA’s reach into the storage layer, complementing its compute and networking capabilities to round out a full-stack AI infrastructure framework.
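The announcement does not detail the topology, so the quick calculation below only bounds the wiring: a full copper mesh across 256 CPUs, which is an assumption of mine rather than a confirmed design, would need n(n-1)/2 point-to-point links.

```python
# Link count if all 256 CPUs were fully meshed. Full mesh is an
# assumption used for sizing, not a detail from the announcement.
n = 256
full_mesh_links = n * (n - 1) // 2
print(f"{n} CPUs -> {full_mesh_links} point-to-point links")  # 32640
```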
NVIDIA also announced ecosystem support for the STX standard from major technology companies, including Dell Technologies, Hewlett Packard Enterprise, IBM, NetApp, Supermicro, and VAST Data, reinforcing its strategy of expanding industry influence through partnerships.
On the technical front, the LP30 chip adopts a monolithic die design with 500MB of on-chip SRAM and delivers up to 1.2 PFLOPS at FP8 precision, a significant improvement over Groq’s first-generation LPU, driven in part by the move from GlobalFoundries’ GF16 process to Samsung’s SF4 node.
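Taking the quoted figures at face value, two back-of-the-envelope calculations put the specs in context; the one-byte-per-FP8-weight and two-FLOPs-per-weight assumptions are mine, not from the announcement.

```python
# Capacity and peak-throughput math from the quoted LP30 figures.
sram_bytes = 500 * 1024**2               # 500MB on-chip SRAM (from the article)
fp8_weight_bytes = 1                     # assumption: 1 byte per FP8 weight
resident_params = sram_bytes // fp8_weight_bytes
print(f"~{resident_params / 1e6:.0f}M FP8 weights fit on-die")  # ~524M

peak_flops = 1.2e15                      # 1.2 PFLOPS at FP8 (from the article)
dense_pass_flops = 2 * resident_params   # assumption: 2 FLOPs per weight
print(f"~{dense_pass_flops / peak_flops * 1e6:.2f} us per pass at peak")
```

In other words, roughly half a billion FP8 parameters can stay resident on-die, suggesting larger models would be sharded across many LP30s rather than paged in from external memory.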
Within the AFD framework, GPUs handle the attention computations, which require dynamic access to the key-value (KV) cache and therefore benefit from HBM capacity and bandwidth, while LPUs execute statically scheduled FFN workloads for minimal latency. The two sides exchange tokens through an All-to-All mechanism for distribution and aggregation, and a ping-pong pipeline overlaps this communication with computation, further reducing effective latency and improving overall inference efficiency.
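The sketch below models the ping-pong idea with alternating micro-batches: while one micro-batch’s tokens are in flight over the GPU-LPU link, the next is computing attention, hiding the transfer latency. The micro-batch scheme, names, and timings are illustrative assumptions; a single-worker thread pool stands in for the interconnect.

```python
# Ping-pong pipeline sketch: overlap the All-to-All token exchange with
# compute on the next micro-batch. Timings and names are illustrative.
import time
from concurrent.futures import ThreadPoolExecutor

def attn(mb: str) -> str:            # stand-in for the GPU attention kernel
    time.sleep(0.01)
    return mb + ":attn"

def ffn(mb: str) -> str:             # stand-in for the LPU FFN kernel
    time.sleep(0.01)
    return mb + ":ffn"

def all_to_all(mb: str) -> str:      # stand-in for GPU<->LPU token exchange
    time.sleep(0.01)
    return mb

link = ThreadPoolExecutor(max_workers=1)   # models the interconnect
in_flight = None
done = []
for mb in ["A", "B", "C", "D"]:            # micro-batches alternate
    out = attn(mb)                         # this compute overlaps with the
    if in_flight is not None:              # previous batch's transfer
        done.append(ffn(in_flight.result()))
    in_flight = link.submit(all_to_all, out)
done.append(ffn(in_flight.result()))
link.shutdown()
print(done)   # ['A:attn:ffn', 'B:attn:ffn', 'C:attn:ffn', 'D:attn:ffn']
```

The sleeps are fake, but the structure is the point: each `all_to_all` submission returns immediately, so the next micro-batch’s `attn` call executes while the previous transfer completes in the background.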