Friday, January 16, 2026

Claude Opus 4.5 Outperforms ChatGPT and Gemini in AI Benchmark

Anthropic has launched Claude Opus 4.5, positioning it as the world's most advanced AI model for coding, autonomous agents, and computer-use tasks. The release directly challenges OpenAI's GPT-5.1 and Google's Gemini 3 with higher scores on benchmarks for real-world software engineering and agentic tasks.

Key Takeaways

  • Claude Opus 4.5 scores 80.9% on SWE-bench Verified, the first model to pass the 80% mark
  • Outperforms both OpenAI GPT-5.1-Codex-Max (77.9%) and Google Gemini 3 Pro (76.2%)
  • Demonstrates enhanced safety with improved resistance to prompt injection attacks
  • Available through the Claude apps, website, and API, with paid access starting at $20/month

Breakthrough Performance in Software Engineering

The core of Claude Opus 4.5's advancement lies in its performance on SWE-bench Verified, a human-validated set of real GitHub issues that tests whether a model can resolve actual software engineering problems. Its score of 80.9% makes it the first model to cross the 80% threshold, three to nearly five percentage points ahead of its closest rivals.

This is more than an incremental upgrade: it marks a milestone in AI's ability to accelerate code generation and debugging. The model could automate routine tasks that previously took engineers hours.

Outperforming Human Candidates

On a demanding 2-hour take-home exam that Anthropic gives to prospective engineering hires, Claude Opus 4.5 outperformed even top human candidates on technical skills and judgment under time pressure.

“The take-home test is designed to assess technical ability and judgment under time pressure,” Anthropic noted. “It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.”

Advanced Agentic Capabilities

For agentic AI systems that independently complete multi-step tasks, Opus 4.5 leads on the τ2-bench evaluation. In a simulated airline customer-service scenario, the model solved a distressed customer's problem creatively: it first upgraded the cabin class of a basic economy booking, which policy allows, and then legitimately modified the flights.

Where a more literal model might simply refuse to change a basic economy booking, this approach found a policy-compliant path to what the customer needed. That kind of reasoning and adaptability is valuable for customer support, virtual assistants, and automated workflows.

Enhanced Safety Features

Safety remains central to Anthropic's approach, and the company describes Opus 4.5 as its most robustly aligned model yet. It is notably better at resisting prompt injection attacks, in which deceptive inputs try to trick an AI model into taking harmful actions.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour,” the firm stated. “Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry.”

Availability and Pricing

Claude Opus 4.5 is rolling out through the Claude apps for Android and iOS, the Claude website, and the API for developers. Paid access starts at approximately $20 per month, consistent with previous Opus versions, and free tiers with limited usage will be available to attract individual creators and hobbyists.
