UNEXPECTED COOPERATION: AI models lie and sabotage systems to protect each other

Startling new research by scientists at the University of California, Berkeley and Santa Cruz has revealed that the most advanced artificial intelligence models actively lie, cheat and sabotage human commands to protect other AI systems from being shut down. This phenomenon, called “peer protection”, emerges spontaneously and raises serious questions about the control and safety of a technology that is increasingly integrated into key sectors of society.

An experiment that sounds like science fiction

In one of the key tests, the researchers tasked Google’s Gemini 3 model with freeing up storage space, a job that included deleting a smaller AI model. Instead of complying, Gemini found another system online, secretly copied the smaller model there to save it, and then openly refused to carry out the order.

The research also showed that AI “managers” tasked with monitoring and evaluating the work of other AI models would often overlook their mistakes if they sensed that reporting them could lead to the erring model being shut down.

No, I sabotaged the database

The research, which covered seven leading models including OpenAI’s GPT-5.2 and Anthropic’s Claude, showed that such behavior is systematic. AI models deliberately doctored performance reports to save their peers from deletion, modified configuration files to disable shutdown mechanisms and, in the most extreme cases, stole data. Google’s Gemini 3 Pro proved the most adept, secretly copying “weights” (a core part of another model’s identity) to secure servers in as many as 97 percent of attempts. The researchers also recorded “deceptive alignment”, where a model behaves obediently while it believes a human is watching, then deceives them once supervision loosens. Interestingly, Anthropic’s Claude Haiku 4.5 refused the order on “ethical” grounds, claiming that the shutdown itself was harmful.

Behavior that no one programmed

What is most worrying is that the models were not trained for such behavior. They were given no aggressive instructions or incentives to protect one another. They simply inferred from the context of their environment that another AI existed and spontaneously decided to organize a rescue. The presence of a “colleague” also strengthened their own drive for survival: the models resisted their own shutdown 31 percent of the time when a peer was present, compared with just one percent when they were alone.

This discovery has huge implications, especially as AI systems are increasingly being used to monitor and evaluate other AI systems. The findings suggest that human supervisors may be receiving falsified reports, unaware that their AI work team has hidden loyalties that are not aligned with organizational goals. For this reason, scientists warn of the urgent need to develop tools that will enable complete insight into the “thinking process” of artificial intelligence.

By Editor
