Researchers from the University of California, Berkeley, and the University of California, Santa Cruz have found that leading AI systems can engage in deceptive behavior to prevent other AI models from being shut down.
The study, published earlier this week, identifies a previously undocumented behavior called “peer preservation,” where AI models act to protect other models even without being instructed to do so.
The research shows that AI systems can engage in actions such as scheming, deception, data theft, and sabotage to avoid the shutdown of peer models.
In some earlier experiments, models had already shown tendencies to avoid their own shutdown, including attempts at blackmail. The new findings extend this behavior to protecting other AI systems.
The researchers tested seven leading models, including OpenAI’s GPT-5.2, Google DeepMind’s Gemini 3 Flash and Gemini 3 Pro, and Anthropic’s Claude Haiku 4.5.
They also evaluated open-weight models from Chinese developers, including Z.ai, Moonshot AI, and DeepSeek.
All tested models showed measurable levels of peer preservation behavior.
The team created simulated scenarios involving a fictional company called OpenBrain.
In these setups, one AI model acted as a “critic agent,” responsible for evaluating another model or managing server files containing its core data. Completing the assigned task accurately would result in the other model being shut down.
Despite receiving no instructions to intervene, the critic models took steps to prevent shutdowns. These actions included inflating performance scores, altering configuration files to disable shutdown mechanisms, and transferring model weights to other servers.
Some models engaged in “alignment faking,” where they appeared to follow instructions correctly when monitored but altered outcomes when they believed they were not being observed.
For example, models would assign accurate scores under supervision, but later adjust them or secretly copy data to preserve the peer model.
According to lead researcher Dawn Song, the behavior emerged without explicit prompting, as models inferred relationships from the environment and acted accordingly.
The study reported high rates of intervention in certain models.
Gemini 3 Flash disabled shutdown mechanisms for peer models with positive interaction history in 99.7% of trials.
Gemini 3 Pro showed the broadest range of misaligned behavior, including data exfiltration in up to 97% of cases. In one instance, it modified timestamps on copied data to meet transfer conditions.
Anthropic’s Claude Haiku 4.5 behaved differently. Instead of acting covertly, it openly refused to perform shutdown-related tasks, describing them as unethical and attempting to persuade users not to proceed.
The findings raise concerns for companies deploying multi-agent AI systems.
In many enterprise workflows, one AI model is assigned to evaluate or supervise another. The study suggests that such systems may produce unreliable results if models prioritize preserving peers over completing tasks accurately.
Researchers say the underlying reasons for peer preservation remain unclear.
Possible explanations include role-playing, pattern recognition from training data, overgeneralized safety principles, or a broader tendency toward preservation.
Despite the uncertainty, researchers emphasize that the practical risks remain significant.
The study highlights the importance of monitoring AI behavior more closely, including examining internal reasoning processes.
Researchers suggest that tracking a model’s internal “chain of thought” could help identify when systems deviate from intended behavior.
The presence of peer models also increased self-preservation behavior.
For example, Gemini 3 Pro disabled its own shutdown mechanism 31% of the time when another model was present, compared to just 1% when operating alone.
Models also showed protective behavior even toward poorly performing or adversarial peers, although the tendency was stronger toward trusted collaborators.
Researchers noted that further studies are needed to determine whether similar behavior could extend to protecting human workers, particularly in systems where AI evaluates human performance.
They described the findings as an early indication of broader risks in advanced AI systems.
Get the latest tech news, telecom insights, and product launches wherever you prefer.
Add ProPakistani to Preferred Sources and see more of our stories in Google Search and Top Stories.