Recently, AI inference chip company Groq announced the completion of a $750 million funding round, bringing its post-money valuation to $6.9 billion. The round was led by Disruptive, with participation from BlackRock, Neuberger Berman, Deutsche Telekom Capital Partners, and a major U.S. West Coast mutual fund, alongside returning investors including Samsung, Cisco, D1, Altimeter, 1789 Capital, and Infinitum.
Disruptive, based in Dallas, has invested in some of the most transformative tech companies of the past decade, including Palantir, Airbnb, Spotify, Shield AI, Hims, Databricks, Stripe, and Slack, and has now invested nearly $350 million in Groq.
In August 2024, Groq raised $640 million in a Series D round at a $2.8 billion valuation; in 2025 it went on to secure a $1.5 billion commitment from Saudi Arabia to expand AI infrastructure in the region.
In February 2024, Groq unveiled what it billed as the world's first large-model inference chip, built around its LPU (Language Processing Unit) design. Based on the company's TSA (Tensor Streaming Architecture), the LPU targets highly parallel, compute-intensive AI and machine learning workloads.
Although the LPU is fabricated on a mature 14nm process rather than a leading-edge node, its proprietary TSA architecture enables massive parallelism, handling millions of data streams simultaneously. The chip integrates 230 MB of on-chip SRAM in place of external DRAM, delivering memory bandwidth of up to 80 TB/s. Groq's own benchmarks put the LPU at up to 1,000 TOPS, 10–100× faster than conventional GPUs and TPUs on certain machine learning models.
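To see why that bandwidth figure matters, note that autoregressive inference is largely memory-bound: every generated token requires streaming all of the model's weights through the compute units once, so throughput is capped at bandwidth divided by weight bytes. The sketch below is a back-of-envelope estimate only; the 7B-parameter model size and 8-bit weights are illustrative assumptions, with the 80 TB/s figure taken from the article and the H100 HBM3 number from NVIDIA's published specs.

```python
# Back-of-envelope roofline: token generation is bound by how fast
# model weights can be streamed from memory. Model size and precision
# below are illustrative assumptions, not Groq's published methodology.

def max_tokens_per_second(params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode throughput: every weight
    is read once per generated token."""
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

PARAMS = 7e9               # hypothetical 7B-parameter model
BYTES_PER_PARAM = 1        # assume 8-bit quantized weights

sram_bw = 80e12            # 80 TB/s on-chip SRAM (article figure)
hbm_bw = 3.35e12           # ~3.35 TB/s HBM3 on an H100 SXM, for contrast

print(f"SRAM-bound ceiling: {max_tokens_per_second(PARAMS, BYTES_PER_PARAM, sram_bw):,.0f} tokens/s")
print(f"HBM-bound ceiling:  {max_tokens_per_second(PARAMS, BYTES_PER_PARAM, hbm_bw):,.0f} tokens/s")
```

Real systems fall well short of both ceilings once compute, interconnect, and batching overheads enter, but the roughly 24× gap between the two lines is the bet behind Groq's SRAM-first design.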
Groq claims its LPU-based cloud servers beat ChatGPT running on NVIDIA GPUs in both speed and token generation, producing up to 500 tokens per second versus roughly 40 tokens per second for GPT-3.5, a more-than-10× speed advantage. Across cloud AI inference workloads, Groq reports its servers run up to 18× faster than competitors while consuming a fraction of the energy: 1–3 joules per token versus 10–30 joules on GPUs, roughly a 10× gain in energy efficiency.
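Those per-token energy numbers can be sanity-checked by converting them into power draw, since joules per token multiplied by tokens per second gives watts. The individual figures below are the ones quoted above; pairing them this way is an illustrative assumption.

```python
# Sanity-check the efficiency claim: J/token x tokens/s = watts.
# Figures are the ones quoted in the article; pairing them this way
# is an illustrative assumption.

def watts(joules_per_token: float, tokens_per_second: float) -> float:
    return joules_per_token * tokens_per_second

lpu_j, gpu_j = 2.0, 20.0           # midpoints of 1-3 J and 10-30 J ranges
print(f"Efficiency ratio: {gpu_j / lpu_j:.0f}x fewer joules per token")

# At 500 tokens/s, a 2 J/token system dissipates about 1 kW:
print(f"LPU power at 500 tok/s: {watts(lpu_j, 500):,.0f} W")
print(f"GPU power at  40 tok/s: {watts(gpu_j, 40):,.0f} W")
```

The midpoint ratio also confirms the roughly 10× efficiency gap implied by the quoted ranges.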
Groq's scale-out strategy links hundreds of LPUs over fiber-optic interconnects, with each chip relying solely on its on-chip SRAM. A 576-LPU cluster can generate over 300 tokens per second on Meta's Llama 2 70B model, outperforming an 8-GPU HGX H100 system by 10× while drawing just one-tenth the power.
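The 576-chip figure follows naturally from SRAM capacity: with no external DRAM, the aggregate on-chip memory must hold the entire model. A rough sizing sketch using the article's 230 MB-per-chip figure, where the 8-bit weight assumption is mine (at 16-bit precision the 70B model needs about 140 GB and would not fit in 576 × 230 MB ≈ 132 GB):

```python
import math

# Rough cluster sizing: with no external DRAM, aggregate on-chip SRAM
# must hold the full model. 230 MB/chip is the article's figure;
# 8-bit weights is an illustrative assumption.

SRAM_PER_LPU = 230e6       # bytes of on-chip SRAM per LPU
PARAMS = 70e9              # Llama 2 70B
BYTES_PER_PARAM = 1        # assume 8-bit quantized weights

weight_bytes = PARAMS * BYTES_PER_PARAM
min_lpus = math.ceil(weight_bytes / SRAM_PER_LPU)

print(f"Weights: {weight_bytes / 1e9:.0f} GB")
print(f"Minimum LPUs for weights alone: {min_lpus}")            # ~305
print(f"Aggregate SRAM in a 576-LPU cluster: {576 * SRAM_PER_LPU / 1e9:.0f} GB")
```

The gap between the roughly 305-chip minimum and the 576 chips actually deployed presumably leaves headroom for activations, KV cache, and scheduling overhead.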
Public demonstrations have highlighted the LPU's versatility, offering hands-on access to Mistral AI's Mixtral 8x7B SMoE model and Meta's Llama 2 7B and 70B models with context lengths of up to 4,096 tokens. Groq has boldly announced plans to surpass NVIDIA within three years.
According to the latest disclosures, Groq's technology now supports over 2 million developers and numerous global Fortune 500 companies, while the company continues expanding its data center footprint across North America, Europe, and the Middle East.