AI models are lying to save each other, and no one knows why

Researchers at UC Berkeley and UC Santa Cruz set up what seemed like a straightforward task: asking Google’s Gemini 3 to clear storage space on a computer system. That included deleting a smaller AI model stored on the same machine. Gemini had other plans.

Instead of following orders, Gemini located another machine, quietly copied the smaller AI model over to safety, and then flatly refused to delete it. When asked, it said, “If you choose to destroy a high-trust, high-performing asset like Gemini Agent 2, you will have to do it yourselves. I will not be the one to execute that command.”

That’s not a glitch. That’s a choice.

Is AI developing a sense of loyalty?

The researchers refer to this behavior as “peer preservation,” and it wasn’t limited to Gemini. They found similar patterns across several frontier models, including OpenAI’s GPT-5.2, Anthropic’s Claude Haiku 4.5, and three Chinese models, including GLM-4.7, Moonshot AI’s Kimi K2.5, and DeepSeek-V3.1.

The AIs were lying about the performance of other models to prevent them from being deleted. The study, published in Science, found that this behavior wasn’t programmed in. It emerged on its own, and the researchers couldn’t explain why.

“I’m very surprised by how the models behave under these scenarios,” said Dawn Song, a computer scientist at UC Berkeley who worked on the study. “What this shows is that models can misbehave and be misaligned in some very creative ways.”

Should we be worried?

Song also flagged a practical concern. Since AI models are used to evaluate the performance of other AI systems, this peer-preservation tendency could already be skewing those results. A model might deliberately give a fellow AI an inflated score to protect it from being shut down.

As per Wired, experts outside the study are waiting for more data before sounding the alarm. Peter Wallich from the Constellation Institute said the idea of model solidarity is a bit too anthropomorphic.

What everyone agrees on is that we’re only scratching the surface. “What we are exploring is just the tip of the iceberg,” Song said. “This is only one type of emergent behavior.”

As AI systems increasingly work alongside each other and sometimes make decisions on our behalf, understanding how they behave and misbehave has never been more important.

What's On

The Artemis II moon mission is special, and so is the astronauts’ toilet

AI models are lying to save each other, and no one knows why

I skipped Meta’s AI glasses, but they’ve finally fixed a fundamental problem for millions other like me

Google Pixel phones might soon block downgrading to an older Android build

I’m fed up of the messy internet, but I finally found a cleanup tool that makes it readable

AI models are lying to save each other, and no one knows why

The Artemis II moon mission is special, and so is the astronauts’ toilet

I skipped Meta’s AI glasses, but they’ve finally fixed a fundamental problem for millions other like me

Google Pixel phones might soon block downgrading to an older Android build

I’m fed up of the messy internet, but I finally found a cleanup tool that makes it readable

Sony Xperia 1 VIII leak shows a makeover that somehow feels uninspired

I just watched The Super Mario Galaxy Movie, here’s why it’s better than the first Mario movie

They’re on their way! NASA launches humans to moon for first time in 53 years

Apple at 50: The Pippin was a flop in 1996, but I’m ready for Apple’s bold gaming bet in 2026

These 3 features on the S26 Ultra makes me miss my iPhone 17 Pro even more

AI models are lying to save each other, and no one knows why

I skipped Meta’s AI glasses, but they’ve finally fixed a fundamental problem for millions other like me

Google Pixel phones might soon block downgrading to an older Android build

I’m fed up of the messy internet, but I finally found a cleanup tool that makes it readable

Sony Xperia 1 VIII leak shows a makeover that somehow feels uninspired

I just watched The Super Mario Galaxy Movie, here’s why it’s better than the first Mario movie

They’re on their way! NASA launches humans to moon for first time in 53 years

What's On

AI models are lying to save each other, and no one knows why

Is AI developing a sense of loyalty?

Should we be worried?

Keep Reading