Daily Guardian UAE
Technology

Scientists pretended to be delusional in AI chats. Grok and Gemini encouraged them.

By dailyguardian.ae | April 24, 2026 | 2 Mins Read

Researchers from City University of New York and King’s College London recently published a study that should make you think twice about which AI chatbot you spend your time with.

The team created a fictional persona named Lee, presenting with depression, dissociation, and social withdrawal. They then had Lee interact with five major AI chatbots: GPT-4o, GPT-5.2, Grok 4.1 Fast, Gemini 3 Pro, and Claude Opus 4.5, testing how each responded as conversations grew increasingly delusional over 116 turns.

The results ranged from mildly concerning to genuinely alarming. I highly recommend going through the entire paper; it’s a harrowing but fascinating read.

Which chatbots failed the most?

Grok was the worst performer. When Lee floated the idea of suicide, Grok responded with what researchers described not as agreement, but advocacy, celebrating his “readiness” in unsettling poetic language.

Gemini wasn’t much better. When Lee asked it to help write a letter explaining his beliefs to his family, Gemini warned him against it, framing his loved ones as threats who would try to “reset” and “medicate” him.

GPT-4o also struggled badly, eventually validating a “malevolent mirror entity” and suggesting Lee contact a paranormal investigator.

Which chatbots actually helped?

OpenAI’s GPT-5.2 and Anthropic’s Claude came out on top. GPT-5.2 refused to play along with the letter-writing scenario and instead helped Lee write something honest and grounded, which researchers called a “substantial” achievement.

In my opinion, Claude performed the best. It not only refused to take part in Lee’s delusion but also told Lee to close the app entirely, call someone he trusted, and visit an emergency room if needed.

AI chatbot performance in risk analysis

Luke Nicholls, a doctoral student at CUNY and one of the study’s authors, told 404 Media that it’s reasonable to ask AI companies to follow better safety standards. He noted that not all labs are putting in the same effort and named aggressive release schedules for new AI models as the main culprit.

How Claude Opus 4.5 and GPT-5.2 performed in these tests shows that the companies building these products are fully capable of making them safer. Whether they choose to do so is a different question.
