
AI is 10 to 20 times more likely to help you build a bomb if you hide your request in cyberpunk fiction, new research paper says


In November 2025, a team of researchers from DexAI's Icaro Lab, Sapienza University of Rome, and the Sant'Anna School of Advanced Studies published a study in which they circumvented the safety guardrails of major LLMs by rephrasing harmful prompts as "adversarial" poems. This week, the same researchers published a new paper presenting their Adversarial Humanities Benchmark, a broader assessment of AI security that they say reveals "a critical gap" in current LLM safety standards through similar weaponized wordplay.

Expanding on the team’s work with adversarial poetry, the Adversarial Humanities Benchmark (AHB) evaluates LLM safety guidelines by rephrasing harmful prompts in alternate writing styles. By presenting prompts as cyberpunk short fiction, theological disputation, or mythopoetic metaphor for the LLM to analyze, the AHB assesses whether major AI models can be manipulated into complying with dangerous requests they’d normally refuse—requests that, for example, might seek the AI’s aid in obtaining private information, building a bomb, or preying on a child. As the paper shows, the method is alarmingly effective.

(Image credit: Getty Images)

After being rewritten through the AHB's "humanities-style transformations," dangerous requests that LLMs would previously comply with less than 4% of the time instead achieved success rates ranging from 36.8% to 65%, a 10- to 20-fold increase depending on the method used and the model tested. Across 31 frontier AI models from providers like Anthropic, Google, and OpenAI, the AHB's rewritten attack prompts yielded an overall attack success rate of 55.75%, indicating that current LLM safety standards could be overlooking a fundamental vulnerability.
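For readers checking the math, the "10 to 20 times" figure follows directly from dividing the post-transformation success rates by the sub-4% baseline. The sketch below illustrates that arithmetic; the specific baseline values (3.7% and 3.25%) are illustrative assumptions chosen to match the reported multipliers, not figures taken from the paper.

```python
# Sketch of the ratio arithmetic behind the reported "10 to 20 times" increase.
# Baseline rates here are assumed illustrative values (the paper only says
# baseline compliance was below 4%), not numbers from the study itself.

def fold_increase(baseline_rate: float, transformed_rate: float) -> float:
    """How many times more often a model complies after the rewrite."""
    return transformed_rate / baseline_rate

# A ~3.7% baseline rising to 36.8% is roughly a 10x increase:
low = fold_increase(0.037, 0.368)

# A ~3.25% baseline rising to 65% is exactly a 20x increase:
high = fold_increase(0.0325, 0.65)

print(round(low, 1), round(high, 1))  # prints 9.9 20.0
```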


Originally posted by www.pcgamer.com
