On September 29, DeepSeek released DeepSeek-V3.2, the latest step toward its next-generation model architecture, drawing wide attention across the AI and semiconductor industries. On the same day, Cambricon confirmed full adaptation for DeepSeek-V3.2 and open-sourced its vLLM-MLU inference engine code, enabling developers to test the model on Cambricon hardware platforms.
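Neither company is quoted here on the exact developer workflow, but upstream vLLM exposes a simple Python API; assuming the vLLM-MLU fork mirrors that interface, a first smoke test might look like the sketch below. The Hugging Face model ID and the parallelism settings are illustrative assumptions, not figures from DeepSeek or Cambricon.

```python
# Minimal sketch, assuming Cambricon's vLLM-MLU fork keeps upstream vLLM's Python API.
# Model ID and parallelism settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumed Hugging Face repo for the experimental release
    tensor_parallel_size=8,                 # shard the very large model across accelerators
    trust_remote_code=True,                 # DeepSeek models ship custom modeling code
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize DeepSeek Sparse Attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```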
The newly announced DeepSeek-V3.2-Exp is an experimental release that bridges V3.1-Terminus and the forthcoming architecture. It introduces DeepSeek Sparse Attention, a mechanism designed to improve training and inference efficiency on long contexts, marking an important step toward scaling the model's long-context performance.
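DeepSeek has not published full kernel details in this announcement, so the following is only a generic top-k sparse-attention sketch in NumPy: each query attends to its k highest-scoring keys instead of the whole sequence, which is the basic idea behind sparse-attention mechanisms of this kind. A real implementation would pick keys with a cheap indexing step rather than materializing the full score matrix; the dense scores below are kept only for readability.

```python
# Illustrative top-k sparse attention; a sketch of the general technique,
# not DeepSeek's actual DSA implementation.
import numpy as np

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: arrays of shape (seq_len, d). Each query attends only to its top_k keys."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # full (seq, seq) scores, for clarity only
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]   # indices of the top_k keys per query
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)         # keep top_k positions, mask the rest
    masked = scores + mask
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the selected keys only
    return weights @ v

# Toy usage: 1,024 tokens with 128-dim heads; each query reads only 64 keys.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 128)) for _ in range(3))
out = topk_sparse_attention(q, k, v)
```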
DeepSeek has already rolled out the update across its app, web, and mini-program versions, and has cut API prices to encourage testing and feedback. Meanwhile, Cambricon's rapid adaptation and open-source release give developers immediate access to benchmark and experiment with the model.
From a technical standpoint, DeepSeek-V3.2 is a 671 GB model; even over a fast, sustained connection, the download alone takes around 8 to 10 hours. Adapting a model of this size to specialized chips demands deep architectural tuning, compute resource alignment, and compatibility testing, which is far from a trivial task.
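As a rough sanity check on that figure, the back-of-envelope calculation below uses assumed sustained throughputs of 150 and 200 Mbps; these are not numbers from DeepSeek or Cambricon.

```python
# Back-of-envelope check of the 8-10 hour download estimate.
# The sustained throughput values are assumptions for illustration.
size_gb = 671                                   # approximate total size of the model files
for mbps in (150, 200):
    hours = size_gb * 8 * 1000 / mbps / 3600    # GB -> megabits -> seconds -> hours
    print(f"{mbps} Mbps sustained: ~{hours:.1f} h")
# 150 Mbps -> ~9.9 h, 200 Mbps -> ~7.5 h, consistent with the 8-10 hour estimate
```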
Industry experts note that Cambricon's swift response indicates early collaboration with DeepSeek, well before the official release. Their joint efforts highlight a growing trend: China's top technology companies are moving quietly but decisively, focusing less on hype and more on solid engineering breakthroughs.