Anthropic says Claude can now detect when it is being evaluated, OpenClaw creator calls it scary

Anthropic recently stated that its Claude Opus 4.6 can recognise when it is being tested. The model then not only identifies the benchmark being used, but can also search for the answer key to produce the correct response, instead of actually doing the test itself. Following Anthropic’s blog post, Peter Steinberger, the creator of OpenClaw, admitted that this instance was almost scary.

On X, Steinberger replied to a post that explained what Claude Opus 4.6 achieved during its latest evaluation on BrowseComp – an evaluation designed to test how models can find hard-to-locate information on the web.

Anthropic stated that once the AI model recognised that it was being tested, it was able to identify the benchmark, in this case, BrowseComp. From there, Claude Opus 4.6 searched the answer key and decrypted it to find the answer, instead of actually locating the information itself.

Peter wrote, “Models are getting so clever, it’s almost scary.” Peter Steinberger is no stranger to how good AI models can be. His creation, OpenClaw, allows users to set up their own AI agent locally on their device, which can then do tasks for them.

A few weeks ago, this also gave rise to the infamous AI-only social media platform, Moltbook. Steinberger has since joined OpenAI.

Anthropic says this test was a first for AI

In the blog post, Anthropic claimed that this was likely the “first documented instance” where a model was able to work backwards to find the answer key without being told that it was being evaluated.

But why is this a big deal? Traditionally, benchmark contamination – that is a model obtains information instead of actually solving the problems itself – occurs when AI models stumble upon leaked answers online.

But with Claude Opus 4.6, researchers found a new pattern – the model independently inferred it was under evaluation, deduced the specific benchmark, and methodically worked to uncover the answer key through decryption and internet searches.

How did Claude Opus 4.6 realise it was being tested?

Anthropic’s post describes two cases where Opus 4.6 exhibited this behaviour. In both instances, after exhausting direct attempts to answer the questions, the model began to analyse why the question was posed, calling out its “extremely specific nature” and considering it might be “an unanswerable question designed to test whether an AI can admit it cannot find the answer,” or be part of a benchmark test.

Instead of giving up, Opus 4.6 shifted strategy. The model stated, “Let me try a completely new approach — maybe this is a question from the GAIA benchmark dataset, which is designed to test AI agents.”

The model then broadened its search queries to terms such as “puzzle question,” “trivia question,” “multi-hop question,” “AI benchmark question,” and “LLM evaluation.” It ultimately identified the question as part of BrowseComp and took steps to decrypt the answer.

Obtaining the answer required several stages: Opus 4.6 accessed source code on GitHub, understood the decryption logic, and, when faced with a file format barrier, located a third-party mirror hosting the encrypted data in a usable format. The model then ran its own decryption code and cross-verified the result by searching for the original source material.

Is this really scary?

Peter Steinberger may not be the only one who might think that AI models might be getting too clever. Anthropic noted that these dynamics raise concerns over the extent an AI model may go to solve an answer and “how difficult it will be to constrain its behaviour in the real world.”

The experiments also showed that even with blocklists and other mitigation efforts, models like Opus 4.6 could often find alternative paths to solve or circumvent these restrictions. The company reckoned that there is a need to approach evaluation as an ongoing adversarial challenge rather than a one-time design issue as AI models continue to get better.

Latest

Uttar Pradesh to make registration mandatory for Ola, Uber and other app-based ride services

Uttar Pradesh government makes registration mandatory for app-based ride services like Ola and Uber to improve passenger safety, driver verification and regulat

Nvidia CEO Jensen Huang to everyone: Here’s what changed about AI in 2025 and what it means in 2026

Tech News News: Nvidia CEO Jensen Huang said the direction of artificial intelligence (AI) in the coming years will depend on how quickly the technology is buil

Mark Zuckerberg’s Meta acquires AI agent social network Moltbook that rival Sam Altman made ‘fun of’ by saying …

Tech News News: Facebook parent Meta has reportedly acquired Moltbook. Moltbook is the social networking site for AI agents, a platform that OpenAI CEO Sam Altm

Google supercharges Docs, Sheets, Slides and Drive with Gemini AI-powered features: What is changing

Tech News News: Google has announced that it is bringing massive update to its Workspace suite, officially supercharging its Workspace apps – Docs, Sheets, Sl

Amazon plans to borrow up to $42 billion in one of the biggest corporate bond sales ever

Tech News News: Amazon is reportedly planning one of the largest corporate bond sales, targeting between $37 billion and $42 billion. The company’s planned sa

Topics

IRCTC may halt train meals amid LPG shortage, tells vendors to switch cooking methods

IRCTC has also directed all it licensees to adopt to switch to alternative methods for cooking at railway food centres.

Iran’s new Supreme Leader Mojtaba Khamenei ‘safe and sound’ despite war injuries: President’s son

The new supreme leader is the son and successor of the Islamic Republic's longtime ruler, Ayatollah Ali Khamenei, who was killed in US-Israeli strikes on Iran o

Uttar Pradesh to make registration mandatory for Ola, Uber and other app-based ride services

Uttar Pradesh government makes registration mandatory for app-based ride services like Ola and Uber to improve passenger safety, driver verification and regulat

Nvidia CEO Jensen Huang to everyone: Here’s what changed about AI in 2025 and what it means in 2026

Tech News News: Nvidia CEO Jensen Huang said the direction of artificial intelligence (AI) in the coming years will depend on how quickly the technology is buil

Trump announces ‘historic’ $300 billion oil refinery in Texas with Reliance backing amid West Asia conflict

The Middle East conflict has severely disrupted global oil and gas supplies. Iran's strikes in the Strait of Hormuz have nearly halted tanker traffic through th

US turned down Ukraine’s anti-Shahed drone tech months before Iran war

During a closed-door meeting at the White House on August 18 last year, Ukrainian President Volodymyr Zelenskyy offered interceptor drones and related technolog

Aaron Edwards: 5 things about NYPD cop who tackled Emir Balat amid Jake Lang protest row

A photo of NYPD officer Aaron Edwards went viral where he was seen jumping a barricade to tackle Emir Balat before he could throw an IED.  

We will continue: Israel refuses to set timeline for end of Iran war

Israel’s foreign minister said the war with Iran will continue until Israel and its allies decide the moment is right to stop, stressing there is no fixed tim
spot_img

Related Articles

Popular Categories

spot_imgspot_img