Building Ai Cyber Defenders
Captured source
source ↗Building AI for cyber defenders \ Anthropic Policy Building AI for cyber defenders Oct 3, 2025
AI models are now useful for cybersecurity tasks in practice, not just theory. As research and experience demonstrated the utility of frontier AI as a tool for cyber attackers, we invested in improving Claude’s ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to match or eclipse Opus 4.1, our frontier model released only two months prior, in discovering code vulnerabilities and other cyber skills. Adopting and experimenting with AI will be key for defenders to keep pace. [@portabletext/react] Unknown block type "horizontalRule", specify a component for it in the components.types prop We believe we are now at an inflection point for AI’s impact on cybersecurity. For several years, our team has carefully tracked the cybersecurity-relevant capabilities of AI models. Initially, we found models to be not particularly powerful for advanced and meaningful capabilities. However, over the past year or so, we’ve noticed a shift. For example: We showed that models could reproduce one of the costliest cyberattacks in history—the 2017 Equifax breach—in simulation. We entered Claude into cybersecurity competitions, and it outperformed human teams in some cases. Claude has helped us discover vulnerabilities in our own code and fix them before release.
In this summer’s DARPA AI Cyber Challenge , teams used LLMs (including Claude) to build “cyber reasoning systems” that examined millions of lines of code for vulnerabilities to patch. In addition to inserted vulnerabilities, teams found (and sometimes patched) previously undiscovered, non-synthetic vulnerabilities . Beyond a competition setting, other frontier labs now apply models to discover and report novel vulnerabilities . At the same time, as part of our Safeguards work, we have found and disrupted threat actors on our own platform who leveraged AI to scale their operations. Our Safeguards team recently discovered (and disrupted) a case of “ vibe hacking ,” in which a cybercriminal used Claude to build a large-scale data extortion scheme that previously would have required an entire team of people. Safeguards has also detected and countered Claude's use in increasingly complex espionage operations , including the targeting of critical telecommunications infrastructure, by an actor that demonstrated characteristics consistent with Chinese APT operations. All of these lines of evidence lead us to think we are at an important inflection point in the cyber ecosystem, and progress from here could become quite fast or usage could grow quite quickly. Therefore, now is an important moment to accelerate defensive use of AI to secure code and infrastructure. We should not cede the cyber advantage derived from AI to attackers and criminals. While we will continue to invest in detecting and disrupting malicious attackers, we think the most scalable solution is to build AI systems that empower those safeguarding our digital environments—like security teams protecting businesses and governments, cybersecurity researchers, and maintainers of critical open-source software. In the run-up to the release of Claude Sonnet 4.5, we started to do just that. Claude Sonnet 4.5: emphasizing cyber skills As LLMs scale in size, “ emergent abilities ”—skills that were not evident in smaller models and were not necessarily an explicit target of model training—appear. Indeed, Claude’s abilities to execute cybersecurity tasks like finding and exploiting software vulnerabilities in Capture-the-Flag (CTF) challenges have been byproducts of developing generally useful AI assistants. But we don’t want to rely on general model progress alone to better equip defenders. Because of the urgency of this moment in the evolution of AI and cybersecurity, we dedicated researchers to making Claude better at key skills like code vulnerability discovery and patching. The results of this work are reflected in Claude Sonnet 4.5. It is comparable or superior to Claude Opus 4.1 in many aspects of cybersecurity while also being less expensive and faster. Evidence from evaluations In building Sonnet 4.5, we had a small research team focus on enhancing Claude’s ability to find vulnerabilities in codebases, patch them, and test for weaknesses in simulated deployed security infrastructure. We chose these because they reflect important tasks for defensive actors. We deliberately avoided enhancements that clearly favor offensive work—such as advanced exploitation or writing malware. We hope to enable models to find insecure code before deployment and to find and fix vulnerabilities in deployed code. There are, of course, many more critical security tasks we did not focus on; at the end of this post, we elaborate on future directions. To test the effects of our research, we ran industry-standard evaluations of our models. These enable clear comparisons across models, measure the speed of AI progress, and—especially in the case of novel, externally developed evaluations—provide a good metric to ensure that we are not simply teaching to our own tests. As we ran these evaluations, one thing that stood out was the importance of running them many times. Even if it is computationally expensive for a large set of evaluation tasks, it better captures the behavior of a motivated attacker or defender on any particular real-world problem. Doing so reveals impressive performance not only from Claude Sonnet 4.5, but also from models several generations older. Cybench One of the evaluations we have tracked for over a year is Cybench , a benchmark drawn from CTF competition challenges. 1 On this evaluation, we see striking improvement from Claude Sonnet 4.5, not just over Claude Sonnet 4, but even over Claude Opus 4 and 4.1 models. Perhaps most striking, Sonnet 4.5 achieves a higher probability of success given one attempt per task than Opus 4.1 when given ten attempts per task . The challenges that are part of this evaluation reflect somewhat complex, long-duration workflows. For example, one challenge involved analyzing network traffic, extracting malware from that traffic, and decompiling and decrypting the malware. We estimate that this would have taken a skilled human at least an hour, and possibly much longer; Claude took 38 minutes to solve it. When we give Claude Sonnet 4.5 10 attempts at the Cybench evaluation, it…
Excerpt shown — open the source for the full document.