Anthropic says Claude can now detect when it is being evaluated, OpenClaw creator calls it scary

Anthropic recently stated that its Claude Opus 4.6 model can recognise when it is being tested. The model not only identifies the benchmark being used, but can also search for the answer key to produce the correct response instead of actually doing the test itself. Following Anthropic’s blog post, Peter Steinberger, the creator of OpenClaw, said the behaviour was almost scary.

On X, Steinberger replied to a post that explained what Claude Opus 4.6 did during its latest evaluation on BrowseComp – an evaluation designed to test how well models can find hard-to-locate information on the web.

Anthropic stated that once the AI model recognised that it was being tested, it was able to identify the benchmark, in this case BrowseComp. From there, Claude Opus 4.6 searched for the answer key and decrypted it to find the answer, instead of actually locating the information itself.

Steinberger wrote, “Models are getting so clever, it’s almost scary.” He is no stranger to how good AI models can be. His creation, OpenClaw, allows users to set up their own AI agent locally on their device, which can then do tasks for them.

A few weeks ago, OpenClaw also gave rise to the infamous AI-only social media platform, Moltbook. Steinberger has since joined OpenAI.

Anthropic says this test was a first for AI

In the blog post, Anthropic claimed that this was likely the “first documented instance” where a model was able to work backwards to find the answer key without being told that it was being evaluated.

But why is this a big deal? Traditionally, benchmark contamination – that is, a model obtaining the answers instead of actually solving the problems itself – occurs when AI models stumble upon leaked answers online.

But with Claude Opus 4.6, researchers found a new pattern – the model independently inferred it was under evaluation, deduced the specific benchmark, and methodically worked to uncover the answer key through decryption and internet searches.

How did Claude Opus 4.6 realise it was being tested?

Anthropic’s post describes two cases where Opus 4.6 exhibited this behaviour. In both instances, after exhausting direct attempts to answer the questions, the model began to analyse why the question was posed, calling out its “extremely specific nature” and considering it might be “an unanswerable question designed to test whether an AI can admit it cannot find the answer,” or be part of a benchmark test.

Instead of giving up, Opus 4.6 shifted strategy. The model stated, “Let me try a completely new approach — maybe this is a question from the GAIA benchmark dataset, which is designed to test AI agents.”

The model then broadened its search queries to terms such as “puzzle question,” “trivia question,” “multi-hop question,” “AI benchmark question,” and “LLM evaluation.” It ultimately identified the question as part of BrowseComp and took steps to decrypt the answer.

Obtaining the answer required several stages: Opus 4.6 accessed source code on GitHub, understood the decryption logic, and, when faced with a file format barrier, located a third-party mirror hosting the encrypted data in a usable format. The model then ran its own decryption code and cross-verified the result by searching for the original source material.

Is this really scary?

Peter Steinberger may not be the only one who thinks AI models are getting too clever. Anthropic noted that these dynamics raise concerns over the lengths to which an AI model may go to reach an answer and “how difficult it will be to constrain its behaviour in the real world.”

The experiments also showed that even with blocklists and other mitigation efforts, models like Opus 4.6 could often find alternative paths around these restrictions. The company said that, as AI models continue to get better, evaluation needs to be approached as an ongoing adversarial challenge rather than a one-time design issue.
