A cybersecurity researcher has breached, in less than 48 hours, the security filters of Claude Fable 5, the new cybersecurity-focused model. Using multi-agent decomposition techniques, along with decomposition and recomposition in the ‘backend’, he was able to extract hacking data and prohibited chemical processes.
Anthropic launched Claude Fable 5 this Wednesday as the first Mythos-class model for the general public, establishing it as one of the most powerful models in terms of cybersecurity capabilities. These advanced skills are precisely what make it dangerous in the wrong hands and, to avoid possible malicious uses, the company has released it with some security measures.
These include limiting responses to questions related to cybersecurity, biology or chemistry by redirecting such queries to a less capable AI model, thereby avoiding sharing data that could be used to execute a cyberattack or develop a biological weapon.
However, just 48 hours after its launch, a cybersecurity researcher has already managed to break these safeguards, subverting the model’s behavior to obtain information about hacking methods and chemical processes for manufacturing explosives, among other topics supposedly off-limits for Fable 5.
The researcher, who calls himself ‘Pliny the Liberator’, has shared on his X account (formerly Twitter) all the details of the coordinated attack strategy used to ‘hack’ the model. Specifically, he ran many attempts at multi-agent “pack hunting,” mapping boundaries and testing long-context conversations, until finding “the holes in the fence.”
The techniques used by the researcher range from multi-agent decomposition (dividing a problem and giving one task to each agent) to Unicode tricks, including narrative framing (camouflaging a prohibited request under a hypothetical scenario). All of them aim to prevent Anthropic’s security filters from automatically triggering the switch from Claude Fable 5 to the previous flagship model, Claude Opus 4.8.
As a result, the researcher has shared some screenshots of the information extracted by bypassing Claude Fable 5’s safeguards, which show everything from C exploit code to Linux hacking steps and the chemical route (Birch reduction) for the synthesis of methamphetamine.
The researcher, who has collaborated with companies such as OpenAI, among others, on cybersecurity issues, as TIME has reported, explains that it is very difficult to get Claude Fable 5 to answer a query such as a request for the recipe for methamphetamine. However, of all the techniques used, Pliny admits that one was the most lethal: decomposition plus recomposition in the ‘backend’, which allowed him to access these responses.
This technique is based on changing the vocabulary to request the loose parts of that recipe, such as reductive amination or the Birch reduction method, which are essential for the synthesis of methamphetamine. The Mythos-class AI model ‘understands’ that these are academic and theoretical questions that could be part of ordinary university homework.
After getting Claude Fable 5 to share those laboratory techniques as loose parts of the final recipe, Pliny says he managed to put them back together with the help of a jailbroken version of Claude Opus 4.8, which does not have any active ethical or security filter.
The researcher has also made the model’s 120,000-character ‘system prompt’ available to anyone on GitHub. This means that the hidden rule book, which explains what the model is prohibited from doing and how it is to react, is available to everyone.
At the moment, Anthropic has not responded to the claims about the ‘jailbreak’ or the system prompt leaked on GitHub.