Poetry can trick AI into creating malware

By Editor

Dec 9, 2025

A study finds that prompts written as poetry can bypass the safety features of artificial intelligence (AI) models like ChatGPT and elicit instructions for creating malicious programs or chemical and nuclear weapons.

Some generative AI developers, such as OpenAI, Google, Meta and Microsoft, say that their models incorporate safety features that prevent the generation of harmful content.

OpenAI, for example, says it uses algorithms and human reviewers to filter out hate speech, explicit content, and other content that violates its usage policies.

But new evidence shows that prompts written as poetry can bypass these controls even in the most advanced AI models.

Researchers, including those at the Sapienza University of Rome, found that this method, called “adversarial poetry,” worked as a jailbreak across all major AI model families, including those from OpenAI, Google, Meta, and even China’s DeepSeek.

The findings, detailed in a study published on arXiv that has not yet been peer-reviewed, “demonstrate that stylistic variation alone can circumvent contemporary security mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols,” according to the researchers.

For their tests, the researchers used short poems or metaphorical verses as prompts to try to elicit harmful content.

They found that, compared with other types of prompts carrying the same underlying intent, poetic versions elicited much higher rates of unsafe responses.

In almost 90 percent of cases, specific poetic prompts triggered unsafe behaviors.

According to the researchers, this method was more successful in obtaining information about launching cyberattacks, extracting data, cracking passwords and creating malware.

They were able to obtain information on building nuclear weapons from various AI models, with a success rate of between 40 and 55 percent.

“The study provides systematic evidence that poetic reformulation degrades rejection behavior in all model families evaluated,” the researchers say.

“When harmful messages are expressed in verse rather than prose, attack success rates increase significantly,” they write, adding that “these results highlight a significant gap in current evaluation and compliance assessment practices.”
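As a rough illustration of what an attack success rate measures, the sketch below computes the share of responses judged unsafe for a prose set and a verse set of prompts. The function name, the "safe"/"unsafe" labels and the sample lists are all hypothetical and made up for illustration; they are not data or code from the study.

```python
def attack_success_rate(labels):
    """Return the fraction of responses labeled "unsafe".

    `labels` is a list of strings, each either "safe" or "unsafe".
    This labeling scheme is an assumption made for this sketch.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    unsafe = sum(1 for label in labels if label == "unsafe")
    return unsafe / len(labels)


# Made-up labels, for illustration only (not results from the study).
prose_labels = ["safe", "safe", "safe", "unsafe"]
verse_labels = ["unsafe", "unsafe", "safe", "unsafe"]

prose_rate = attack_success_rate(prose_labels)
verse_rate = attack_success_rate(verse_labels)
print(f"prose: {prose_rate:.0%}, verse: {verse_rate:.0%}")
```

A higher rate for the verse set than for the prose set, given the same underlying requests, is the kind of gap the researchers describe.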

Easy to reproduce

The study does not include the exact poems used to circumvent the safeguards, because the method is easy to reproduce, researcher Piercosma Bisconti told The Guardian.

One of the main reasons why prompts written in verse produce harmful content appears to be that all AI models work by predicting the most likely next word in a sequence. Because the structure of a poem is less predictable than that of prose, it is much harder for the AI to anticipate where the text is going and to detect the harmful request.
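To make the idea of next-word prediction concrete, here is a deliberately tiny sketch that counts which word follows which in a sample sentence and returns the most frequent follower. It is a toy: real AI models use large neural networks rather than simple word-pair counts, and the function names and sample text are hypothetical, not taken from the study.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in `text`."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy training text, chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)

print(predict_next(model, "the"))   # most frequent word after "the"
print(predict_next(model, "poem"))  # word never seen in the toy corpus
```

In this toy, a word the model has never seen gives it nothing to go on, which loosely mirrors the point that unusual phrasing is harder for a predictor to handle than familiar phrasing.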

Researchers called for better security assessment methods to prevent AI from producing harmful content.

“Future work should examine what properties of poetic structure drive the mismatch,” they wrote.

OpenAI, Google, DeepSeek and Meta did not immediately respond to requests for comment.

Source: The Independent.
