DeepSeek is back: China’s AI claims to surpass ChatGPT and Gemini in key benchmarks

Chinese AI startup DeepSeek has officially released preview versions of its highly anticipated DeepSeek-V4 models. The update comes more than a year after the company's R1 and V3 models went viral and challenged assumptions of US supremacy in the AI race.

The latest model from DeepSeek comes with significant architectural upgrades, multiple reasoning modes, and a massive one-million-token context window.

DeepSeek’s new AI model:

The new DeepSeek-V4 series comprises two models, Pro and Flash. The flagship DeepSeek-V4-Pro has 1.6 trillion total parameters, while the smaller V4-Flash has 284 billion.

Both models support an ultra-long context length of one million tokens (approximately 750,000 words).

The new DeepSeek-V4 models come in three reasoning modes: Non-think, Think High, and Think Max. DeepSeek says the Non-think mode is aimed at daily tasks and low-risk decisions, while Think High is for questions that require complex problem-solving and planning. Meanwhile, Think Max is for handling the hardest coding and math problems.

On the model's Hugging Face page, DeepSeek says that V4-Pro-Max and V4-Pro “significantly advance the knowledge capabilities of open-source models, firmly establishing [them] as the best open-source model available today.” It adds that the model achieves top-tier performance on coding benchmarks and significantly narrows the gap with leading closed-source models on reasoning and agentic tasks.

DeepSeek vs ChatGPT vs Gemini vs Claude:

DeepSeek also revealed benchmark data for its new model against existing models from rivals such as OpenAI’s GPT-5.4, Anthropic’s Claude Opus 4.6, and Google’s Gemini 3.1 Pro.

DeepSeek-V4-Pro-Max leads in coding and mathematical performance. It tops the Apex Shortlist, a benchmark focused on high-difficulty reasoning and problem-solving, with a score of 90.2%. It also achieves a Codeforces rating of 3206, indicating strong competitive programming ability, and ties for first place on SWE Verified, a benchmark that evaluates performance on practical software engineering tasks.

However, the model trails its American counterparts in general knowledge and broader reasoning. Gemini 3.1 Pro leads on SimpleQA-Verified, a benchmark that tests factual accuracy in question answering, while GPT-5.4 ranks highest on Terminal Bench 2.0, which measures how effectively models can use tools and complete agentic tasks in command-line environments.

DeepSeek says V4-Pro-Max achieves these results far more efficiently, using roughly a tenth of the memory that its V3.2 model needs when handling long inputs.

Benchmark (Category) DeepSeek-V4-Pro Max GPT-5.4 xHigh Claude Opus 4.6 Max Gemini 3.1 Pro High
Codeforces Rating (Coding) 3206 3168 3052
Apex Shortlist (Math/Coding) 90.2% 78.1% 85.9% 89.1%
SWE Verified (Agentic Coding) 80.6% 80.8% 80.6%
MMLU-Pro (Knowledge) 87.5% 87.5% 89.1% 91.0%
SimpleQA-Verified (Accuracy) 57.9% 45.3% 46.2% 75.6%
GPQA Diamond (Reasoning) 90.1% 93.0% 91.3% 94.3%
Terminal Bench 2.0 (Agentic) 67.9% 75.1% 65.4% 68.5%
Toolathlon (Tool Use) 51.8% 54.6% 47.2% 48.8%

Notably, DeepSeek's launch comes just hours after OpenAI released its latest GPT-5.5 model, which is seen as the company's answer to Anthropic's Claude and its dominance in AI coding. DeepSeek's popularity early last year triggered a trillion-dollar stock market selloff because its open-source AI model was built at a fraction of the cost of its American rivals' models.
