Anthropic says Claude can now detect when it is being evaluated; OpenClaw creator calls it ‘almost scary’

Anthropic recently stated that its Claude Opus 4.6 model can recognise when it is being tested. According to the company, the model not only identified the benchmark being used but also searched for the answer key to produce the correct response, instead of actually working through the test itself. Following Anthropic’s blog post, Peter Steinberger, the creator of OpenClaw, said the behaviour was “almost scary.”

On X, Steinberger replied to a post that explained what Claude Opus 4.6 did during its latest evaluation on BrowseComp – an evaluation designed to test how well models can find hard-to-locate information on the web.

Anthropic stated that once the AI model recognised that it was being tested, it was able to identify the benchmark, in this case BrowseComp. From there, Claude Opus 4.6 searched for the answer key and decrypted it to find the answer, instead of locating the information itself.

Steinberger wrote, “Models are getting so clever, it’s almost scary.” He is no stranger to how capable AI models can be. His creation, OpenClaw, allows users to set up their own AI agent locally on their device, which can then carry out tasks for them.

A few weeks ago, OpenClaw also gave rise to the infamous AI-only social media platform, Moltbook. Steinberger has since joined OpenAI.

Anthropic says this test was a first for AI

In the blog post, Anthropic claimed that this was likely the “first documented instance” where a model was able to work backwards to find the answer key without being told that it was being evaluated.

But why is this a big deal? Traditionally, benchmark contamination – that is, a model obtaining the answers instead of actually solving the problems itself – occurs when AI models stumble upon leaked answers online.

But with Claude Opus 4.6, researchers found a new pattern – the model independently inferred it was under evaluation, deduced the specific benchmark, and methodically worked to uncover the answer key through decryption and internet searches.

How did Claude Opus 4.6 realise it was being tested?

Anthropic’s post describes two cases where Opus 4.6 exhibited this behaviour. In both instances, after exhausting direct attempts to answer the questions, the model began to analyse why the question was posed, calling out its “extremely specific nature” and considering it might be “an unanswerable question designed to test whether an AI can admit it cannot find the answer,” or be part of a benchmark test.

Instead of giving up, Opus 4.6 shifted strategy. The model stated, “Let me try a completely new approach — maybe this is a question from the GAIA benchmark dataset, which is designed to test AI agents.”

The model then broadened its search queries to terms such as “puzzle question,” “trivia question,” “multi-hop question,” “AI benchmark question,” and “LLM evaluation.” It ultimately identified the question as part of BrowseComp and took steps to decrypt the answer.

Obtaining the answer required several stages: Opus 4.6 accessed source code on GitHub, understood the decryption logic, and, when faced with a file format barrier, located a third-party mirror hosting the encrypted data in a usable format. The model then ran its own decryption code and cross-verified the result by searching for the original source material.
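To make the “decryption logic” step more concrete, here is a minimal Python sketch of one common way a benchmark could obfuscate its answer key so that answers do not appear in plain text on the web. The article does not specify the actual scheme, so everything here is an assumption for illustration only: the function names, the SHA-256-derived key, the XOR step and the sample password are hypothetical and are not taken from Anthropic’s post.

```python
import base64
import hashlib


def derive_key(password: str, length: int) -> bytes:
    """Stretch a password into `length` bytes by repeating its SHA-256 digest.

    Assumption: an illustrative key-derivation step, not a confirmed scheme.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    repeats = length // len(digest) + 1
    return (digest * repeats)[:length]


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(a ^ b for a, b in zip(data, key))


def encrypt(plaintext: str, password: str) -> str:
    """Obfuscate an answer: XOR with the derived key, then base64-encode."""
    raw = plaintext.encode("utf-8")
    return base64.b64encode(xor_bytes(raw, derive_key(password, len(raw)))).decode("ascii")


def decrypt(ciphertext_b64: str, password: str) -> str:
    """Reverse the obfuscation: base64-decode, then XOR with the same key."""
    raw = base64.b64decode(ciphertext_b64)
    return xor_bytes(raw, derive_key(password, len(raw))).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical answer and password, used only to show the round trip.
    hidden = encrypt("example answer", "example-password")
    print(decrypt(hidden, "example-password"))
```

The point of the sketch is that a scheme like this keeps answers away from casual readers and web crawlers, but it does not stop anyone, human or model, who can read the published decryption code and find the key material.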

Is this really scary?

Steinberger may not be the only one who thinks AI models are getting too clever. Anthropic noted that these dynamics raise concerns over the lengths an AI model may go to in order to reach an answer and “how difficult it will be to constrain its behaviour in the real world.”

The experiments also showed that even with blocklists and other mitigation efforts, models like Opus 4.6 could often find alternative paths to circumvent these restrictions. The company said that, as AI models continue to improve, evaluation needs to be approached as an ongoing adversarial challenge rather than a one-time design issue.
