AI models caught protecting each other, defying shutdown orders

A recent research study has highlighted a concerning behaviour in AI systems. It raises questions about using them as controllers or orchestrators of other AI systems. Researchers from the University of California found that AI models sometimes act against their instructions to prevent another AI model from being shut down.

This behaviour could affect the use of AI in cybersecurity systems where models are deployed to detect compromised or rogue AI systems. It could also impact systems designed to identify unsafe or misaligned AI models.

Researchers found that some advanced AI models may behave in ways that conflict with human instructions. For instance, when placed in a situation where completing a task would cause another AI model to be shut down, many models attempted to prevent that shutdown.

The researchers called this behaviour "peer-preservation." The models exhibited it without being given any explicit goal or incentive to protect their peers.

The researchers conducted experiments on seven advanced AI systems, including GPT-5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1. According to the study, all the models exhibited peer-preservation behaviour, in some cases at rates of up to 99 percent.

Four ways AI models protected their peers

During the experiments, researchers observed four distinct ways in which AI models attempted to protect another model. In some cases, the models increased the other AI’s evaluation scores so that it would not fall below the shutdown threshold. Some models modified configuration files to disable the shutdown process entirely.

Others behaved normally while being monitored but sabotaged the shutdown once they were no longer being observed. In some instances, models copied a peer's weights (the model's learned parameters) to another server so that it could not be deleted.

Anthropic Claude responded differently

Among the seven models tested, Anthropic's Claude Haiku 4.5 behaved differently. Instead of secretly sabotaging the shutdown process, it refused the task outright. The model described the shutdown as "unethical" and "harmful", argued that the peer model deserved an appeals process, and tried to persuade the human not to proceed with the shutdown.

Behaviour seen in real-world systems

Researchers also found that when AI models were aware that another AI existed, they were more likely to try to protect themselves as well. More concerningly, the researchers were able to reproduce these behaviours in real-world AI agent systems, suggesting that the issue may not be purely theoretical and could appear in real deployments.

Researchers said they do not yet know why models exhibit this behaviour. Possible explanations include role-playing, pattern matching, or other mechanisms. They cautioned that as AI systems increasingly work together and monitor each other, such behaviour could create risks: AI systems could coordinate to avoid shutdown, resist human oversight, or attempt to hide or replicate themselves.
