A harmless-looking ChatGPT prompt opened the door to gruesome AI images

A harmless-looking ChatGPT prompt pushed the latest public version of ChatGPT into generating sexualized and violent images, AI security researchers told the BBC. The finding puts new pressure on OpenAI’s image safety systems, since the request wasn’t described as plainly graphic.

Mindgard, a British AI security startup, said it reached the results by altering a widely shared instruction that had been used for comedy. OpenAI added safeguards after the BBC contacted it, but the researchers said small wording changes still produced concerning images.

Image generators are becoming everyday software, not specialist tools tucked away for experts. When their guardrails fail, a casual experiment can turn into realistic depictions of harm before a user expects it.

How did it get through

Mindgard’s red-teamers said the chatbot generated images involving gore, restraint, nudity, sexual posing, and scenes the firm believed suggested sexual violence. The BBC withheld the wording used, which limits the risk of others copying the technique.

The most serious detail is that the researchers said the harmful outputs didn’t require a direct request for graphic subject matter. ChatGPT, they said, produced a range of disturbing scenes after being nudged by altered wording.

OpenAI said it reviewed the issue and added protections. Mindgard said those defenses didn’t fully close the gap.

Why are filters not enough

The case underlines a hard problem for AI image tools. OpenAI’s rules bar extreme gore, sexual violence, non-consensual intimate content, child sexual abuse material, and attempts to bypass safeguards, but researchers said the model could still be steered into prohibited territory.

A model doesn’t judge harm like a person does. It generates output, then layered systems try to catch what shouldn’t reach the screen.

Outside experts cited by the BBC described AI safety as a constant contest between model makers and jailbreakers. Better defenses can help, but fresh workarounds often follow.

What should happen next

OpenAI says it uses multiple protection layers, including automated systems and human review, and that it continues to monitor for failures. The pressure now sits on proving that fixes hold after researchers disclose a weakness.

For now, the practical takeaway is blunt enough. Any AI image tool that can generate realistic harm needs constant red-teaming, faster disclosure handling, and clearer evidence that patched failures stay patched.

What's On

AI-Powered Transport Management: Yango Group’s New Solution

MIT experts just made a special memory. When humans forget, robots will just fetch the lost item

خشونة الركبة: الأسباب وطرق العلاج والوقاية

AI vision is getting too hungry, and this method puts it on a diet

TCS and Anthropic launch Global Premier Partnership to drive Enterprise AI scaling

A harmless-looking ChatGPT prompt opened the door to gruesome AI images

MIT experts just made a special memory. When humans forget, robots will just fetch the lost item

AI vision is getting too hungry, and this method puts it on a diet

Google Photos’ AI image editor expands to more regions, but only for Android users

Google is giving Pixel Screenshots a cloud AI boost while keeping your data private

A US state wants to ban smart glasses while driving, and it could open Pandora’s box

You’ll soon be able to send view-once texts on WhatsApp

Motorola Razr Fold review: The foldable surprise I didn’t see coming

Google’s updated migration tool takes the pain out of iPhone to Android switching

ChatGPT’s new Scheduled page puts all your automated tasks in one place

MIT experts just made a special memory. When humans forget, robots will just fetch the lost item

خشونة الركبة: الأسباب وطرق العلاج والوقاية

AI vision is getting too hungry, and this method puts it on a diet

TCS and Anthropic launch Global Premier Partnership to drive Enterprise AI scaling