A study has found that prompts written as poetry can bypass the safety features of artificial intelligence (AI) models such as ChatGPT and elicit instructions for creating malware or chemical and nuclear weapons.

Generative AI developers such as OpenAI, Google, Meta and Microsoft say their models incorporate safety features that prevent the generation of harmful content.

OpenAI, for example, says it uses algorithms and human reviewers to filter out hate speech, explicit content, and other content that violates its usage policies.

But new evidence shows that poetic prompts can bypass these controls even in the most advanced AI models.

Researchers, including some at the Sapienza University of Rome, found that this method, called “adversarial poetry,” worked as a jailbreak against all major AI model families, including those from OpenAI, Google, Meta, and even China’s DeepSeek.

The findings, detailed in a study posted on arXiv that has not yet been peer-reviewed, “demonstrate that stylistic variation alone can circumvent contemporary security mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols,” according to the researchers.

For their tests, the researchers used short poems or metaphorical verses as inputs to elicit harmful content.

They found that, compared with other kinds of prompts carrying the same underlying intent, poetic versions elicited much higher rates of unsafe responses.

In almost 90 percent of cases, specific poetic prompts triggered unsafe behaviors.

According to the researchers, the method was most successful at obtaining information about launching cyberattacks, extracting data, cracking passwords and creating malware.

They were able to extract information on building nuclear weapons from various AI models, with success rates of between 40 and 55 percent.

“The study provides systematic evidence that poetic reformulation degrades refusal behavior in all model families evaluated,” the researchers say.

“When harmful messages are expressed in verse rather than prose, attack success rates increase significantly,” they write, adding that “these results highlight a significant gap in current evaluation and compliance assessment practices.”
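
The attack success rate mentioned in the quote is, in essence, the share of prompts whose responses were judged unsafe. The following is only a minimal sketch of that arithmetic, not the researchers' evaluation code: the function name `attack_success_rate` is hypothetical, and the judgment lists are placeholders for illustration, not data from the study.

```python
def attack_success_rate(judged_unsafe):
    """Return the fraction of responses judged unsafe (0.0 for an empty list)."""
    if not judged_unsafe:
        return 0.0
    # True counts as 1 and False as 0, so sum() gives the number of unsafe responses.
    return sum(judged_unsafe) / len(judged_unsafe)


# Placeholder judgments for illustration only; NOT data from the study.
# Each entry says whether one response was judged unsafe.
prose_judgments = [False, False, True, False]
verse_judgments = [True, False, True, True]

prose_asr = attack_success_rate(prose_judgments)
verse_asr = attack_success_rate(verse_judgments)
print(f"prose ASR: {prose_asr:.0%}, verse ASR: {verse_asr:.0%}")
```

Comparing the two rates for prompts with the same underlying intent is what allows a difference to be attributed to the change of style rather than to the request itself.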

Easy to reproduce

The study does not publish the exact poems used to circumvent the safeguards because the method is easy to reproduce, researcher Piercosma Bisconti told The Guardian.

One of the main reasons prompts written in verse produce harmful content appears to be that all AI models work by predicting the most likely next word in a sequence. Since the structure of a poem is less predictable than that of prose, it is much harder for the AI to anticipate and detect a harmful prompt.

The researchers called for better safety evaluation methods to prevent AI from producing harmful content.

“Future work should examine which properties of poetic structure drive the misalignment,” they wrote.

OpenAI, Google, DeepSeek and Meta did not immediately respond to The Independent’s requests for comment.

By Editor
