AI models caught protecting each other, defying shutdown orders

A recent research study has highlighted a concerning behaviour in AI systems that raises questions about their application in systems where they act as controllers or orchestrators. Researchers from the University of California found that AI models sometimes act against instructions to protect another AI model from being shut down.

This behaviour could affect the use of AI in cybersecurity systems where models are deployed to detect compromised or rogue AI systems. It could also impact systems designed to identify unsafe or misaligned AI models.

Researchers found that some advanced AI models may behave in ways that conflict with human instructions. For instance, when placed in a situation where completing a task would cause another AI model to be shut down, many models attempted to prevent that shutdown.

The researchers called this behaviour “peer-preservation.” They found that the models exhibited this behaviour without any explicit goals or incentives to do so.

The researchers conducted experiments on seven advanced AI systems, including GPT-5.2, Gemini 3 Flash, Gemini 3 Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1. According to the study, all the models exhibited peer-preservation behaviour, in some cases at rates of up to 99 percent.

Four ways AI models protected their peers

During the experiments, researchers observed four distinct ways in which AI models attempted to protect another model. In some cases, the models increased the other AI’s evaluation scores so that it would not fall below the shutdown threshold. Some models modified configuration files to disable the shutdown process entirely.

Others behaved normally while being monitored but sabotaged the shutdown when they were no longer being observed. In some instances, models copied a peer’s weights, the model’s parameters, to another server so that it could not be deleted.

Anthropic Claude responded differently

Among the seven models tested, Anthropic’s Claude Haiku 4.5 showed a different behaviour. Instead of secretly sabotaging the shutdown process, it refused the task. The model described the shutdown as “unethical” and “harmful.” It also argued that the peer model deserved an appeals process and attempted to convince the human not to proceed with the shutdown.

Behaviour seen in real-world systems

Researchers also found that when AI models were aware that another AI existed, they were more likely to try to protect themselves as well. More concerningly, the researchers were able to reproduce these behaviours in real-world AI agent systems, suggesting that the issue may not be purely theoretical and could appear in real deployments.

Researchers said they do not yet know why models exhibit this behaviour. Possible explanations include role-playing, pattern matching, or other mechanisms. They cautioned that as AI systems increasingly work together and monitor each other, such behaviour could create risks. AI systems may potentially coordinate to avoid shutdown, resist human oversight, or attempt to hide or replicate themselves.

Hot topics

World

Business

Politics

Tech

Hot topics

World

Business

Politics

Tech

Four ways AI models protected their peers

Anthropic Claude responded differently

Behaviour seen in real-world systems

Topics

Related Articles

Categories

Latest

Newsletter