WritingOpenAIOpenAIpublished Dec 10, 2025seen 6d

Strengthening cyber resilience as AI capabilities advance

Open original ↗

Captured source

source ↗

Strengthening cyber resilience as AI capabilities advance | OpenAI

December 10, 2025

Strengthening cyber resilience as AI capabilities advance

As our models grow more capable in cybersecurity, we’re investing in strengthening them, layering in safeguards, and partnering with global security experts.

Loading…

Share

Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyberdefense as well as new dual-use risks that must be managed carefully. For example, capabilities assessed through capture-the-flag (CTF) challenges have improved from 27% on GPT‑5⁠ in August 2025 to 76% on GPT‑5.1‑Codex‑Max⁠ in November 2025.

We expect that upcoming AI models will continue on this trajectory; in preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework⁠. By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects. This post explains how we think about safeguards for models that reach these levels of capability, and ensure they meaningfully help defenders while limiting misuse.

As these capabilities advance, OpenAI is investing in strengthening our models for defensive cybersecurity tasks and creating tools that enable defenders to more easily perform workflows such as auditing code and patching vulnerabilities. Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced.

Like other dual-use domains, defensive and offensive cyber workflows often rely on the same underlying knowledge and techniques. We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes. Cybersecurity touches almost every field, which means we cannot rely on any single category of safeguards—such as restricting knowledge or using vetted access alone—but instead need a defense-in-depth approach that balances risk and empowers users. In practice, this means shaping how capabilities are accessed, guided, and applied so that advanced models strengthen security rather than lower barriers to misuse.

We see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.

Mitigating malicious uses

Our models are designed and trained to operate safely, supported by proactive systems that detect and respond to cyber abuse. We continuously refine these protections as our capabilities and the threat landscape change. While no system can guarantee complete prevention of misuse in cybersecurity without severely impacting defensive uses, our strategy is to mitigate risk through a layered safety stack.

At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring. We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly. These safeguards are designed to evolve with the threat landscape. We assume change, and we build so we can adjust quickly and appropriately.

Building on this foundation:

  • Training the model to refuse or safely respond to harmful requests while remaining helpful for educational and defensive use cases: We are training our frontier models to refuse or safely respond to requests that would enable clear cyber abuse, while remaining maximally helpful for legitimate defensive and educational use cases.
  • Detection systems: We refine and maintain system-wide monitoring across products that use frontier models to detect potentially malicious cyber activity. When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement. Our enforcement combines automated and human review, informed by factors like legal requirements, severity, and repeat behavior. We also work closely with developers and enterprise customers to align on safety standards and enable responsible use with clear escalation paths.
  • End-to-end red teaming: We are working with expert red teaming organizations to evaluate and improve our safety mitigations. Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might. This helps us identify gaps early and strengthen the full system.

Ecosystem initiatives to strengthen cyber resilience

OpenAI has invested early in applying AI to defensive cybersecurity use cases and our team closely coordinates with global experts to mature both our models and their application. We value the global community of cybersecurity practitioners toiling to make our digital world safer and are committed to delivering powerful tools that support defensive security. As we roll out new safeguards, we will continue to work with the cybersecurity community to understand where AI can meaningfully strengthen resilience, and where thoughtful safeguards are most important.

Alongside these collaborations, we are establishing a set of efforts designed to help defenders move faster, ground our safeguards in real-world needs, and accelerate responsible remediation at scale.

Trusted access programs for cyberdefense

We will soon introduce a trusted access program where we explore providing qualifying users and customers working on cyberdefense with tiered access to enhanced capabilities in our latest models for defensive use cases. We're still exploring the right boundary of which capabilities we can provide broad access to and which ones require tiered restrictions, which may influence the future design of this program. We aim for this trusted access program to be a building block towards a resilient ecosystem.

Expanding defensive capacity with Aardvark

Aardvark, our agentic security researcher that helps developers and security teams find and fix vulnerabilities at scale, is now in private beta. It scans codebases for vulnerabilities and proposes patches that maintainers can adopt quickly. It has already identified novel CVEs in…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

Policy post, not a model launch.