Claude Opus 4.5 Outperforms ChatGPT and Gemini in AI Benchmark

Anthropic has launched Claude Opus 4.5, positioning it as the world’s most advanced AI model for coding, autonomous agents, and computer-use tasks. The release takes direct aim at OpenAI’s GPT-5.1 and Google’s Gemini 3, posting higher benchmark scores than both on real-world software engineering and agentic tasks.

Key Takeaways

  • Claude Opus 4.5 scores 80.9% on SWE-bench Verified, making it the first model to cross the 80% mark
  • It outperforms both Google Gemini 3 Pro (76.2%) and OpenAI GPT-5.1 Codex Max (77.9%) on the same benchmark
  • It shows improved resistance to prompt injection attacks
  • It is available through the Claude apps, website, and APIs, with paid access starting at about $20/month

Breakthrough Performance in Software Engineering

The centerpiece of the release is Claude Opus 4.5’s performance on SWE-bench Verified, a human-validated set of real GitHub issues that a model must resolve by producing a working code patch. Its score of 80.9% makes it the first model to cross the 80% threshold, 3.0 points ahead of GPT-5.1 Codex Max and 4.7 points ahead of Gemini 3 Pro.

This is more than an incremental upgrade: it marks a milestone in AI’s ability to speed up code generation and debugging. The model could potentially automate routine tasks that previously took hours of human effort.

Outperforming Human Candidates

On Anthropic’s own two-hour take-home exam for engineering job candidates, Claude Opus 4.5 outperformed even the strongest human candidates on technical skill and judgment under time pressure.

“The take-home test is designed to assess technical ability and judgment under time pressure,” Anthropic noted. “It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.”

Advanced Agentic Capabilities

Agentic AI systems complete multi-step tasks on their own, and here Opus 4.5 performs strongly on the τ2-bench evaluation. In a simulated airline customer-service scenario, the model found a creative solution for a distressed customer: it first upgraded the customer’s cabin class, which the airline’s policy allows, and then modified the flights, which the policy permits once the booking is no longer basic economy.

A more literal model would simply refuse the request, since the policy forbids changes to basic economy bookings. Anthropic presents this as evidence of stronger reasoning and adaptability, qualities suited to customer support, virtual assistants, and automated workflows.

Enhanced Safety Features

Safety remains central to Anthropic’s approach, and the company describes Opus 4.5 as its most robustly aligned model yet. It shows significant improvements in resisting prompt injection attacks, in which deceptive inputs try to trick an AI into harmful actions.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour,” the firm stated. “Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry.”

Availability and Pricing

Claude Opus 4.5 is rolling out through the Claude app on Android and iOS, the Claude website, and directly to developers via APIs. Paid access starts at approximately $20 per month, consistent with previous Opus versions. Free tiers with limited usage will be available to attract individual creators and hobbyists.
