
On January 27, Microsoft officially introduced Maia 200, its second-generation in-house AI chip. Positioned as a dedicated inference accelerator, Maia 200 targets one clear goal: making large-scale AI token generation faster and more cost-efficient for modern data center workloads.
Built on TSMC's advanced 3nm process, Maia 200 integrates native FP8 and FP4 tensor cores and a completely reworked memory architecture. The chip combines 216GB of HBM3e with up to 7TB/s bandwidth, 272MB of on-chip SRAM, and a high-throughput data movement engine to keep massive models running efficiently. According to Microsoft, this design allows Maia 200 to deliver leading performance among hyperscale, self-developed AI chips. Its FP4 performance is around three times higher than Amazon's third-generation Trainium, while FP8 performance surpasses Google's seventh-generation TPU. On a cost basis, Microsoft says Maia 200 improves performance per dollar by about 30% compared with its latest deployed hardware.
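To put the capacity figure in perspective, the quick calculation below checks which model sizes could keep their weights resident in 216GB of HBM at the low-precision formats Maia 200 targets. The parameter counts are illustrative examples, not figures Microsoft has published.

```python
# Back-of-the-envelope check of what 216 GB of HBM3e implies for model capacity.
# The parameter counts below are hypothetical examples, not Microsoft figures.

HBM_CAPACITY_GB = 216                      # per-chip HBM3e capacity stated for Maia 200
BYTES_PER_PARAM = {"FP8": 1.0, "FP4": 0.5} # bytes needed to store one weight

for params_b in (70, 200, 400):            # illustrative model sizes, in billions of parameters
    for dtype, bytes_per in BYTES_PER_PARAM.items():
        weights_gb = params_b * 1e9 * bytes_per / 1e9
        fits = "fits" if weights_gb < HBM_CAPACITY_GB else "does not fit"
        print(f"{params_b}B params @ {dtype}: ~{weights_gb:.0f} GB of weights ({fits} in {HBM_CAPACITY_GB} GB)")
```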
Maia 200 plays a key role in Microsoft's heterogeneous AI infrastructure strategy. It is designed to support a wide range of models, including OpenAI's latest GPT-5.2, helping improve the cost efficiency of platforms such as Microsoft Foundry and Microsoft 365 Copilot. Internally, Microsoft's Superintelligence team is also using Maia 200 for synthetic data generation and reinforcement learning, accelerating the creation and filtering of high-quality, domain-specific datasets to feed next-generation models.
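Microsoft has not detailed the Superintelligence team's pipeline, but the generate-then-filter pattern it describes looks roughly like the sketch below; `generate_candidates` and `score_quality` are hypothetical placeholders for a teacher model and a quality filter, not Microsoft APIs.

```python
# Illustrative generate-then-filter loop for building synthetic, domain-specific data.
# Both helper functions are hypothetical stand-ins; the real pipeline is not public.

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder for sampling a teacher model hosted on accelerator capacity.
    return [f"{prompt} -> sample {i}" for i in range(n)]

def score_quality(sample: str) -> float:
    # Placeholder for a reward model or rule-based quality filter.
    return float(len(sample) % 10) / 10

def build_dataset(prompts: list[str], per_prompt: int = 8, threshold: float = 0.5) -> list[str]:
    kept = []
    for prompt in prompts:
        for sample in generate_candidates(prompt, per_prompt):
            if score_quality(sample) >= threshold:
                kept.append(sample)
    return kept

print(len(build_dataset(["explain HBM3e", "summarize FP4 quantization"])))
```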
Deployment is already underway. Maia 200 has been rolled out in Microsoft's U.S. Central data center region near Des Moines, Iowa, with additional deployments planned for the U.S. West region near Phoenix, Arizona, and more regions to follow. The chip is tightly integrated into Azure, and Microsoft has opened a preview of the Maia SDK, offering a full toolchain for model development and optimization. The SDK includes PyTorch support, the Triton compiler, optimized kernel libraries, and access to Maia's low-level programming model, making it easier to fine-tune performance and port models across different AI accelerators.
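Because the SDK leans on standard Triton and PyTorch, existing kernels should carry over with little change. The vector-add kernel below is ordinary Triton code of the kind the toolchain compiles; whether it runs unmodified on a Maia backend is an assumption, since Microsoft's announcement does not spell out backend-specific details.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Assumes the tensors live on a device with a Triton-supported backend.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```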
From a hardware perspective, Maia 200 is built at serious scale. Each chip contains over 140 billion transistors and is optimized for modern low-precision AI workloads. At FP4, a single Maia 200 can deliver more than 10 petaFLOPS, while FP8 performance exceeds 5 petaFLOPS, all within a 750W SoC TDP. This gives it enough headroom to run today's largest models while remaining ready for even larger models in the future.
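Taken at face value, those figures imply the following peak (not sustained) efficiency per watt:

```python
# Efficiency implied by the stated peak figures: >10 PFLOPS (FP4) and >5 PFLOPS (FP8)
# within a 750 W SoC TDP. Peak numbers only; sustained throughput will be lower.

TDP_W = 750
peak_flops = {"FP4": 10e15, "FP8": 5e15}

for dtype, flops in peak_flops.items():
    tflops_per_watt = flops / TDP_W / 1e12
    print(f"{dtype}: ~{tflops_per_watt:.1f} TFLOPS per watt at peak")
```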
Microsoft also emphasizes that raw FLOPS alone do not determine AI performance; data delivery is just as critical. Maia 200 addresses this with a redesigned memory subsystem built around narrow-precision data types, dedicated DMA engines, on-chip SRAM, and a custom network-on-chip (NoC) for high-bandwidth data movement. Together, these elements significantly improve token throughput during inference.
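A rough way to see why bandwidth dominates inference: in single-batch decoding, every generated token has to stream the model's weights from HBM, so the 7TB/s figure sets an upper bound on per-chip token throughput. The sketch below assumes a hypothetical 400B-parameter model at FP4 and ignores batching, KV-cache traffic, and on-chip SRAM reuse.

```python
# Simplified, memory-bound view of decode: if each generated token streams the full
# weight set from HBM once, per-chip throughput is roughly bandwidth / weight bytes.
# The 400B-parameter model size is a hypothetical example.

HBM_BANDWIDTH_BPS = 7e12        # up to 7 TB/s, as stated for Maia 200
PARAMS = 400e9                  # hypothetical model size
BYTES_PER_PARAM_FP4 = 0.5

weight_bytes = PARAMS * BYTES_PER_PARAM_FP4
tokens_per_s = HBM_BANDWIDTH_BPS / weight_bytes
print(f"Upper bound at batch size 1: ~{tokens_per_s:.0f} tokens/s per chip")
```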
The Maia SDK is available now in preview, featuring the Triton compiler, PyTorch integration, and low-level NPL programming, as well as a Maia simulator and cost estimator. These tools let developers optimize performance and efficiency early in the development cycle, well before deployment at scale.
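Microsoft has not documented the cost estimator's interface in this announcement, but the kind of answer such a tool produces, cost per million generated tokens from throughput and an instance price, reduces to a calculation like the one below; both inputs are hypothetical placeholders.

```python
# Cost per million generated tokens from assumed throughput and an assumed hourly price.
# Both inputs are hypothetical; this is not the SDK's actual API.

def cost_per_million_tokens(tokens_per_s: float, hourly_price_usd: float) -> float:
    tokens_per_hour = tokens_per_s * 3600
    return hourly_price_usd / tokens_per_hour * 1e6

print(f"${cost_per_million_tokens(tokens_per_s=5000, hourly_price_usd=10.0):.2f} per 1M tokens")
```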