DeepSeek releases DeepSeek-V4 preview models with claims of competitive performance against ChatGPT and Gemini

This article was generated by AI and cites original sources.

Chinese AI startup DeepSeek has released preview versions of its new DeepSeek-V4 models, introducing a Pro and a Flash variant. The company claims these models can compete with leading systems such as ChatGPT and Gemini across several benchmarks.

Model Specifications and Features

The DeepSeek-V4 models include architectural upgrades, multiple reasoning modes, and a one-million-token context window. The flagship DeepSeek-V4-Pro is listed with 1.6 trillion total parameters, while the smaller DeepSeek-V4-Flash is listed with 284 billion.

The new models support three reasoning modes: Non-think, Think High, and Think Max. According to DeepSeek, Non-think is aimed at everyday tasks and low-risk decisions, Think High at complex problem-solving and planning, and Think Max at difficult coding and math problems.
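DeepSeek's existing API is OpenAI-compatible, so a client might select one of these modes along the lines of the minimal sketch below. This is an illustration, not DeepSeek's documented V4 interface: the model name "deepseek-v4-pro" and the "reasoning_mode" field are assumptions, since DeepSeek has not published parameter names for the preview.

```python
# Minimal sketch of selecting a reasoning mode, assuming DeepSeek keeps its
# OpenAI-compatible chat API for V4. The model identifier "deepseek-v4-pro"
# and the "reasoning_mode" field are hypothetical; consult DeepSeek's
# documentation for the actual names.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
    api_key="YOUR_DEEPSEEK_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # hypothetical V4 model identifier
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Hypothetical extra field: "non_think", "think_high", or "think_max"
    extra_body={"reasoning_mode": "think_max"},
)
print(response.choices[0].message.content)
```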

Benchmark Claims

On a Hugging Face page for the model, DeepSeek states that its DeepSeek-V4-Pro-Max and DeepSeek-V4-Pro “significantly advance the knowledge capabilities of open-source models” and that they “firmly establish [them] as the best open-source model available today.” The company also states its models achieve top-tier performance on coding benchmarks and narrow the gap with leading closed-source models on reasoning and agentic tasks.

DeepSeek published benchmark results comparing DeepSeek-V4-Pro-Max with other systems, including OpenAI’s GPT-5.4, Anthropic’s Claude Opus 4.6, and Google’s Gemini 3.1 Pro. According to the figures DeepSeek shared, DeepSeek-V4-Pro-Max leads in coding and math: it tops the Apex Shortlist (math/coding) with a 90.2% score and holds a Codeforces rating of 3206. It also ties for first place on SWE Verified, a benchmark focused on practical software engineering tasks, with a listed score of 80.6%.

In other areas, the posted results show DeepSeek trailing some competitors. Gemini 3.1 Pro is listed as leading on SimpleQA-Verified with 75.6%, versus 57.9% for DeepSeek-V4-Pro-Max. GPT-5.4 ranks highest on Terminal Bench 2.0, which the source material describes as measuring effectiveness in tool use and agent-like environments, with 75.1% versus 67.9% for DeepSeek-V4-Pro-Max.

Efficiency Improvements

DeepSeek states that the V4-Pro-Max uses roughly one-tenth the memory of its V3.2 model when handling long inputs.

Context

This release comes more than a year after DeepSeek’s earlier R1 and V3 models. The announcement highlights the one-million-token context window and multiple reasoning modes, along with benchmark comparisons against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro across coding, accuracy, knowledge, reasoning, and tool use.

Source: mint – technology