Chinese AI startup DeepSeek has released preview versions of its new DeepSeek-V4 models, introducing a Pro and a Flash variant. The company claims these models can compete with leading systems such as ChatGPT and Gemini across several benchmarks.
Model Specifications and Features
The DeepSeek-V4 models include architectural upgrades, multiple reasoning modes, and a one-million-token context window. The flagship DeepSeek-V4-Pro is listed with 1.6 trillion total parameters, while the smaller DeepSeek-V4-Flash comes in at 284 billion parameters.
The new models support three reasoning modes: Non-think, Think High, and Think Max. According to DeepSeek, Non-think is aimed at daily tasks and low-risk decisions, Think High is for complex problem-solving and planning, and Think Max is intended for difficult coding and math problems.
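As a rough illustration of how an application might route requests to these modes, here is a toy sketch. The mode names come from DeepSeek's announcement, but the mapping, function name, and selection logic are purely hypothetical; DeepSeek has not published an API for mode selection.

```python
# Hypothetical sketch only: the three mode names are from DeepSeek's
# announcement, but this mapping and function are illustrative
# assumptions, not part of any published DeepSeek API.

TASK_TO_MODE = {
    "daily": "non-think",      # everyday tasks and low-risk decisions
    "planning": "think-high",  # complex problem-solving and planning
    "coding": "think-max",     # difficult coding problems
    "math": "think-max",       # difficult math problems
}

def pick_reasoning_mode(task_kind: str) -> str:
    """Map a coarse task category to one of the three announced modes,
    defaulting to the lightest mode for unrecognized categories."""
    return TASK_TO_MODE.get(task_kind, "non-think")
```

In practice, a caller would pass the chosen mode alongside the request; the point of the sketch is simply that the modes trade latency and cost against reasoning depth, so routing by task type is the natural pattern.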
Benchmark Claims
On a Hugging Face page for the model, DeepSeek states that its V4 Pro Max and V4 Pro “significantly advance the knowledge capabilities of open-source models” and that they “firmly establish [them] as the best open-source model available today.” The company also states its models achieve top-tier performance in coding benchmarks and narrow the gap with leading closed-source models on reasoning and agentic tasks.
DeepSeek published benchmark results comparing DeepSeek-V4-Pro-Max with other systems including OpenAI’s GPT-5.4, Anthropic’s Claude Opus 4.6, and Google’s Gemini 3.1 Pro. According to the figures DeepSeek shared, DeepSeek-V4-Pro-Max leads in coding and math performance: it tops the Apex Shortlist (math/coding) with a 90.2% score and has a Codeforces rating of 3206. It also ties for first place on SWE Verified, a benchmark focused on practical software engineering tasks, with an 80.6% score listed for DeepSeek-V4-Pro-Max.
In other areas, the posted results show DeepSeek trailing some competitors. Gemini 3.1 Pro leads on SimpleQA-Verified with 75.6%, against 57.9% for DeepSeek-V4-Pro-Max. GPT-5.4 ranks highest on Terminal Bench 2.0 with 75.1%, against 67.9% for DeepSeek-V4-Pro-Max; that benchmark is described in the source material as measuring effectiveness in tool use and agent-like environments.
Efficiency Improvements
DeepSeek states that the V4-Pro-Max uses nearly one-tenth the memory of its V3.2 model when handling long inputs.
Context
This release comes more than a year after DeepSeek's earlier R1 and V3 models. The announcement highlights the one-million-token context window and multiple reasoning modes, along with benchmark comparisons against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro across coding, accuracy, knowledge, reasoning, and tool use.
Source: mint – technology