
Chinese firm DeepSeek has released DeepSeek V3, a new open-source AI model that surpasses existing open-source models, and even closed models like OpenAI’s GPT-4o, on several benchmarks. The model has 671 billion parameters in total and handles text generation, coding, and related tasks. DeepSeek V3 uses a mixture-of-experts (MoE) architecture, which splits the model into multiple specialized expert sub-networks and activates only the experts relevant to a given input (roughly 37 billion parameters per token), which helps keep compute and hardware costs down.
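To make the routing idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in Python. The expert count, dimensions, and gating scheme are illustrative assumptions for a toy example, not DeepSeek V3’s actual implementation, which uses far more experts and a more sophisticated load-balancing strategy.

```python
# Toy sketch of mixture-of-experts (MoE) routing; all sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # DeepSeek V3 uses many more experts than this toy value
TOP_K = 2         # number of experts activated per token
HIDDEN = 16       # toy hidden dimension

# Each "expert" is a small feed-forward network; here, a single weight matrix.
experts = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(NUM_EXPERTS)]
# The router scores every expert for a given token representation.
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route token x to its top-k experts and combine their outputs."""
    scores = x @ router                 # one score per expert
    top = np.argsort(scores)[-TOP_K:]   # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()            # softmax over the selected experts only
    # Only the chosen experts run, so most parameters stay idle for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(HIDDEN)
print(moe_forward(token).shape)  # (16,)
```

The key property the sketch shows is that per-token compute scales with the number of activated experts, not the total parameter count, which is why a 671-billion-parameter MoE model can be far cheaper to run than a dense model of the same size.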
The model was trained in roughly 2.788 million H800 GPU hours, at an estimated cost of $5.576 million. That is far below the training costs reported by large U.S. tech companies, which can run into the hundreds of millions of dollars. According to the technical paper, DeepSeek V3 outperformed open-source models like Llama-3.1-405B and Qwen 2.5-72B on most benchmarks. It also beat GPT-4o on most tests, with exceptions such as SimpleQA, an English-language factual question-answering benchmark, and FRAMES. The only model that outperformed DeepSeek V3 on most benchmarks was Anthropic’s Claude 3.5 Sonnet.
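For context, the dollar figure is a straightforward product of the GPU hours and the rental rate the technical paper assumes, about $2 per H800 GPU hour:

\[
2{,}788{,}000\ \text{GPU hours} \times \$2\,/\,\text{GPU hour} = \$5{,}576{,}000 \approx \$5.58\ \text{million}
\]

Note that this covers only the final training run at assumed rental prices; it excludes earlier research, ablation experiments, and data costs.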
The code for DeepSeek V3 is available on GitHub, and the model itself is released under the company’s own model license.