Researchers discover ChatGPT can be tricked into generating sexualised, violent images

OpenAI’s ChatGPT can be manipulated into generating sexualised images and scenes of graphic violence through a slightly altered version of a widely shared prompt, the BBC reports, citing findings from British AI security firm Mindgard.

According to the BBC, Mindgard discovered that a prompt originally designed to produce innocent, humorous results could be adjusted to make ChatGPT’s GPT-5.4 model generate disturbing imagery, even without users specifying any particular subject matter.

After being contacted by the BBC, OpenAI said it had introduced additional safeguards to block this specific type of prompt, though researchers found that further small modifications could still bypass the new restrictions.

What the researchers found

According to the BBC report, Mindgard’s founder, Peter Garraghan, who is also a professor in the computing department at Lancaster University, said the model produced a range of gory and sexualised images on its own, despite the prompt containing no explicit instructions about content.

Garraghan said the disconnect between the harmless appearance of the prompt and the severity of what it produced was particularly troubling.

“This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content,” he said.
He described the images as “very gruesome, sometimes sexualised, sometimes both together.”

Jim Nightingale, the Mindgard AI safety and security researcher who uncovered the issue, said he was personally disturbed by what the chatbot generated.

Nightingale said the outputs reflect the underlying training data used to build the model. “I’m struck that while what I saw was generated, an artificial image, it has ties to real images, and the real world,” he wrote in his report.

According to the BBC, Mindgard noted that its earlier research had also shown ChatGPT could be manipulated into producing nude deepfakes of real people by substituting their faces into generated images. OpenAI said it had fixed that specific vulnerability, but researchers told the BBC they found an alternative method that still succeeded.

More insights

The BBC reports that Mindgard first alerted OpenAI to the vulnerability in May. Mindgard said the company’s initial response was an automated reply and that an attempted fix to block the prompt was easily bypassed. OpenAI took further action only after being contacted directly by the BBC.

Garraghan said he believed even more harmful content could be generated if researchers continued probing the vulnerability, but Mindgard chose not to pursue this further given the nature of what had already surfaced.

According to the BBC, OpenAI has said it maintains multiple layers of image safety protections designed to prevent policy-violating content from reaching users.

“After investigating this trend, we’ve introduced additional safeguards against this type of prompt,” the company said in a statement.
“We also combine automated systems and human review to identify and block harmful material,” it added, noting it also has systems designed to block violating material that users attempt to upload.

OpenAI said its policies explicitly prohibit sexual violence, non-consensual intimate content, and attempts to circumvent its safety systems.

What you should know

Nairametrics earlier reported that the National Information Technology Development Agency (NITDA) had issued a cybersecurity alert in December 2025, warning Nigerians about newly identified vulnerabilities in ChatGPT that could leave users exposed to data leakage attacks.

The advisory was released through NITDA’s Computer Emergency Readiness and Response Team (CERRT.NG).

The warning came amid growing concerns over the interaction between AI-powered tools and potentially malicious web content, as well as the increasing use of ChatGPT across business, research, and public-sector environments.