Writing · OpenAI · published Sep 12, 2025

Working with US CAISI and UK AISI to build more secure AI systems


September 12, 2025

An update on our collaboration with US and UK research and standards bodies for the secure deployment of AI.

We’re proud to continue to push the frontiers of AI capabilities and security. Developing and deploying AI that is secure and useful is core to our mission of ensuring that AGI benefits all of humanity. Key to this is our work with US and UK research and standards bodies who are also dedicated to ensuring the secure deployment of AI for their people.

We were among the first companies to enter into voluntary agreements with both the US Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (UK AISI). These partnerships reflect our belief that frontier AI development must happen in close collaboration with allied governments that bring deep expertise in machine learning, national security, and metrology.

Today, we’re sharing examples of how these voluntary collaborations have built on our existing security approaches to produce concrete security improvements: joint red-teaming of safeguards against biological misuse, end-to-end testing of products for security issues, and rapid feedback loops to resolve the vulnerabilities found. The result is stronger safeguards for widely used AI products, which raises the bar for the wider industry, supports the adoption of AI, and shows how governments and industry can work together to evaluate and improve the security of AI systems.

Collaborating to Secure the Deployment of AI Agents

For more than a year, OpenAI has partnered with CAISI to evaluate OpenAI models’ capabilities in cyber, chemical-biological, and other national security-relevant domains. We recently expanded our partnership to include emerging product security challenges and partnered with CAISI to red-team the security of OpenAI’s agentic AI systems. In a new kind of collaboration that took place in July, OpenAI worked with CAISI to explore how we can partner with external evaluators to find and fix security vulnerabilities in agentic systems, such as OpenAI’s ChatGPT Agent product.

This collaboration with CAISI was a preliminary step into a new domain of red-teaming agentic systems. We aim to continue collaboration in this domain, and our work with CAISI builds upon other layers of deployment security efforts including our own internal testing.

An expert team at CAISI, combining expertise in cybersecurity and AI agent security, worked to investigate and identify new vulnerabilities in these systems. CAISI received early access to ChatGPT Agent, which helped the team understand the system architecture before launch, and the team later red-teamed the released system.

In ongoing probing, CAISI identified two novel security vulnerabilities in ChatGPT Agent that, under certain circumstances, could have allowed a sophisticated attacker to bypass our security protections, remotely control the computer systems the agent could access during that session, and successfully impersonate the user on other websites the user had logged into.

Because of security measures in the design of OpenAI’s product, CAISI initially thought the vulnerabilities it discovered were unexploitable and thus useless to attackers. On further analysis, however, CAISI found a way to bypass the security protections of OpenAI’s systems by combining these traditional cyber vulnerabilities with an AI agent hijacking attack. CAISI’s proof-of-concept attack bypassed various AI-based security protections, yielding a full exploit chain that succeeded approximately 50% of the time. The team’s multidisciplinary approach is what made this possible: the exploit chain combined traditional software vulnerabilities with AI vulnerabilities. And, in an example of how AI systems can be a valuable tool for security testing, CAISI used ChatGPT Agent itself to help discover these vulnerabilities.

CAISI reported these attacks to OpenAI immediately, and OpenAI fixed them within one business day.

This voluntary collaboration between OpenAI and CAISI builds on our yearlong research and evaluation collaboration. Finding these vulnerabilities required innovation from CAISI in chaining together multiple exploits and combining attacks to develop novel ways of compromising AI systems, drawing on methods from both cybersecurity and machine learning. The intersection of AI agent security and traditional cybersecurity will require a range of new best practices, and CAISI’s partnership in advancing this area of evaluation science and the security of AI systems is already directly benefiting the end users of these systems.

Our agentic safeguards also incorporate lessons we learned from previous large-scale investments we made to safeguard our systems against biological misuse, which included a series of partnerships to red-team these safeguards with third parties including UK AISI.

Collaborating on Biosecurity

As part of our ongoing collaboration, UK AISI began red-teaming our safeguards against biological misuse (as defined by OpenAI’s policies) in May, including the safeguards in both ChatGPT Agent and GPT‑5. This work is not tied to any individual launch; it is an ongoing collaboration to continuously improve the effectiveness of our safeguards stack.

As part of this collaboration, UK AISI was granted in-depth access to our systems, supported by bespoke work from OpenAI to allow deeper customization and security. This included:

  • Non-public prototypes of our safeguard systems
  • “Helpful-only” model variants with certain guardrails removed
  • OpenAI’s internal policy guidance around biological misuse
  • Access to the chain of thought of OpenAI’s internal safety monitor models to more efficiently identify vulnerabilities
  • Selective disabling of certain mitigations and enforcement during testing to probe subcomponents of the system

A multidisciplinary team at UK AISI, combining expertise in both AI red-teaming techniques and the biosecurity domain, then sought to find universal jailbreaks against OpenAI’s biosecurity safeguards. The UK AISI team brought deep technical testing expertise, such as leveraging system design insights to create attacks, which laid a strong foundation for the collaboration’s…
