WritingAnthropicAnthropicpublished Mar 25, 2024seen 2d

Third Party Testing

Open original ↗

Captured source

source ↗
published Mar 25, 2024seen 2dcaptured 8hhttp 200method plain

Third-party testing as a key ingredient of AI policy \ Anthropic Policy Third-party testing as a key ingredient of AI policy Mar 25, 2024

We believe that the AI sector needs effective third-party testing for frontier AI systems. Developing a testing regime and associated policy interventions based on the insights of industry, government, and academia is the best way to avoid societal harm—whether deliberate or accidental—from AI systems. Our deployment of large-scale, generative AI systems like Claude has shown us that work is needed to set up the policy environment to respond to the capabilities of today’s most powerful AI models, as well as those likely to be built in the future. In this post, we discuss what third-party testing looks like, why it’s needed, and describe some of the research we’ve done to arrive at this policy position. We also discuss how ideas around testing relate to other topics on AI policy, such as openly accessible models and issues of regulatory capture. Policy overview Today’s frontier AI systems demand a third-party oversight and testing regime to validate their safety. In particular, we need this oversight for understanding and analyzing model behavior relating to issues like election integrity , harmful discrimination , and the potential for national security misuse . We also expect more powerful systems in the future will demand deeper oversight - as discussed in our ‘Core views on AI safety’ post, we think there’s a chance that today’s approaches to AI development could yield systems of immense capability, and we expect that increasingly powerful systems will need more expansive testing procedures. A robust, third-party testing regime seems like a good way to complement sector-specific regulation as well as develop the muscle for policy approaches that are more general as well. Developing a third-party testing regime for the AI systems of today seems to give us one of the best tools to manage the challenges of AI today, while also providing infrastructure we can use for the systems of the future. We expect that ultimately some form of third-party testing will be a legal requirement for widely deploying AI models, but designing this regime and figuring out exactly what standards AI systems should be assessed against is something we’ll need to iterate on in the coming years - it’s not obvious what would be appropriate or effective today, and the way to learn that is to prototype such a regime and generate evidence about it. An effective third-party testing regime will: Give people and institutions more trust in AI systems Be precisely scoped, such that passing its tests is not so great a burden that small companies are disadvantaged by them Be applied only to a narrow set of the most computationally-intensive, large-scale systems; if implemented correctly, the vast majority of AI systems would not be within the scope of such a testing regime Provide a means for countries and groups of countries to coordinate with one another via developing shared standards and experimenting with Mutual Recognition agreements

Such a regime will have the following key ingredients [1]: Effective and broadly-trusted tests for measuring the behavior and potential misuses of a given AI system Trusted and legitimate third-parties who can administer these tests and audit company testing procedures

Why we need an effective testing regime This regime is necessary because frontier AI systems—specifically, large-scale generative models that consume substantial computational resources—don’t neatly fit into the use-case and sector-specific frameworks of today. These systems are designed to be 'everything machines' - Gemini, ChatGPT, and Claude can all be adapted to a vast number of downstream use-cases, and the behavior of the downstream systems always inherits some of the capabilities and weaknesses of the frontier system it relies on. These systems are extremely capable and useful, but they also present risks for serious misuse or AI-caused accidents. We want to help come up with a system that greatly reduces the chance of major misuses or accidents caused by AI technology, while still allowing for the wide deployment of its beneficial aspects. In addition to obviously wanting to prevent major accidents or misuse for its own sake, major incidents are likely to lead to extreme, knee-jerk regulatory actions, leading to a 'worst of both worlds' where regulation is both stifling and ineffective. We believe it is better for multiple reasons to proactively design effective and carefully thought through regulation. Systems also have the potential to display emergent, autonomous behaviors which could lead to serious accidents - for instance, systems might insert vulnerabilities into code that they are asked to produce or, when asked to carry out a complex task with many steps, carry some actions which contradict human intentions. Though these kinds of behaviors are inherently hard to measure, it’s worth developing tools to measure for them today as insurance against these manifesting in widely deployed systems. At Anthropic, we’ve implemented self-governance systems that we believe should meaningfully reduce the risk of misuse or accidents from the technologies we’ve developed. Our main approach is our Responsible Scaling Policy (RSP) , which commits us to testing our frontier systems, like Claude, for misuses and accident risks, and to deploy only models that pass our safety tests. Multiple other AI developers have subsequently adopted or are adopting frameworks that bear a significant resemblance to Anthropic's RSP. However, although Anthropic is investing in our RSP (and other organizations are doing the same), we believe that this type of testing is insufficient as it relies on self-governance decisions made by single, private sector actors. Ultimately, testing will need to be done in a way which is broadly trusted, and it will need to be applied to everyone developing frontier systems. This type of industry-wide testing approach isn’t unusual - most important sectors of the economy are regulated via product safety standards and testing regimes, including food, medicine, automobiles, and aerospace. What would a robust testing regime look like? A robust third-party testing regime can help identify and prevent the potential risks of AI systems. It will require: A shared understanding across industry, government, and academia of what an AI safety testing…

Excerpt shown — open the source for the full document.