Monday, December 1, 2025

Claude Opus 4.5 Outperforms ChatGPT and Gemini in AI Benchmark

Anthropic has launched Claude Opus 4.5, positioning it as the world’s most advanced AI model for coding, autonomous agents, and computer-use tasks. The release directly challenges OpenAI’s GPT-5.1 and Google’s Gemini 3 Pro, with Anthropic reporting stronger benchmark results in real-world software engineering and agentic tasks.

Key Takeaways

  • Claude Opus 4.5 scores 80.9% on SWE-bench Verified, the first model to exceed 80% on the benchmark
  • Outperforms both Google Gemini 3 Pro (76.2%) and OpenAI GPT-5.1-Codex-Max (77.9%)
  • Demonstrates enhanced safety with improved resistance to prompt injection attacks
  • Available through the Claude apps, website, and API, with paid plans starting at $20/month

Breakthrough Performance in Software Engineering

The core of Claude Opus 4.5’s advancement lies in its performance on SWE-bench Verified, a benchmark built from real GitHub issues that tests whether a model can resolve real-world software engineering problems. With a score of 80.9%, it is the first model to cross the 80% threshold, ahead of competing models from Google and OpenAI.

This is more than an incremental upgrade: it marks a milestone in AI’s ability to accelerate code generation and debugging. The model could potentially automate routine tasks that previously took hours of human effort.

Outperforming Human Candidates

On Anthropic’s own 2-hour take-home exam for engineering candidates, Claude Opus 4.5 scored higher than even the strongest human candidates, testing technical skill and judgment under time pressure.

“The take-home test is designed to assess technical ability and judgment under time pressure,” Anthropic noted. “It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.”

Advanced Agentic Capabilities

For agentic AI systems that independently complete multi-step tasks, Opus 4.5 posts strong results on the τ2-bench evaluation. In a simulated airline customer-service scenario, the model showed creative problem-solving: for a distressed customer on a basic economy ticket, it first upgraded the cabin class and then legitimately modified the flights.

Where other models might rigidly refuse to change a basic economy booking, this approach resolved the customer’s problem, showcasing the kind of reasoning and adaptability suited to customer support, virtual assistants, and automated workflows.

Enhanced Safety Features

Safety remains central to Anthropic’s approach, and the company describes Opus 4.5 as its most robustly aligned model yet. It shows significant improvements in resisting prompt injection attacks, which are deceptive inputs designed to trick AI models into taking harmful actions.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour,” the firm stated. “Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry.”

Availability and Pricing

Claude Opus 4.5 is rolling out through the Claude app on Android and iOS, the Claude website, and to developers via the API. Paid plans start at approximately $20 per month, consistent with previous Opus releases. Free tiers with limited usage will also be available to attract individual creators and hobbyists.
