Monday, March 2, 2026

Claude Opus 4.5 Outperforms ChatGPT and Gemini in AI Benchmark

Anthropic has launched Claude Opus 4.5, positioning it as the world’s most advanced AI model for coding, autonomous agents, and computer-use tasks. The release directly challenges OpenAI’s GPT-5.1 and Google’s Gemini 3 Pro. Anthropic cites stronger benchmark results in real-world software engineering and agentic tasks.

Key Takeaways

  • Claude Opus 4.5 achieves 80.9% accuracy on SWE-bench Verified, becoming the first model to surpass the 80% threshold
  • Outperforms both Google Gemini 3 Pro (76.2%) and OpenAI GPT-5.1 Codex Max (77.9%)
  • Demonstrates enhanced safety with improved resistance to prompt injection attacks
  • Available through Claude apps, website, and APIs starting at $20/month for premium access

Breakthrough Performance in Software Engineering

The core of Claude Opus 4.5’s advancement is its performance on SWE-bench Verified, a benchmark built from real-world software engineering tasks. With 80.9% accuracy, it is the first model to cross the 80% threshold. That puts it ahead of OpenAI’s GPT-5.1 Codex Max (77.9%) and Google’s Gemini 3 Pro (76.2%).

This is more than an incremental upgrade: it marks a milestone in AI’s ability to accelerate code generation and debugging. The model could automate routine tasks that previously took engineers hours of work.

Outperforming Human Candidates

On a proprietary two-hour take-home exam designed for engineering candidates, Claude Opus 4.5 scored higher than top human candidates on technical skill and judgment under time pressure.

“The take-home test is designed to assess technical ability and judgment under time pressure,” Anthropic noted. “It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.”

Advanced Agentic Capabilities

For agentic AI systems that independently complete multi-step tasks, Opus 4.5 leads the τ2-bench evaluation. One test was a simulated airline customer-service scenario with a distressed customer. The model first upgraded the customer’s cabin class, which then allowed it to modify the flights within policy.

Competing models might rigidly refuse to change a basic economy booking. This approach solved the customer’s problem instead, showing the kind of reasoning and adaptability needed for customer support, virtual assistants, and automated workflows.

Enhanced Safety Features

Safety remains central to Anthropic’s approach, and the company describes Opus 4.5 as its most robustly aligned model yet. The model is significantly more resistant to prompt injection attacks, in which deceptive inputs are crafted to trick AI models into harmful actions.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour,” the firm stated. “Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry.”

Availability and Pricing

Claude Opus 4.5 is rolling out through the Claude app on Android and iOS, the Claude website, and directly to developers via APIs. Premium access for enterprise users starts at approximately $20 per month, consistent with previous Opus versions. Free tiers with limited usage will be available to attract individual creators and hobbyists.
