Anthropic, published Oct 31, 2024

The Case For Targeted Regulation

Increasingly powerful AI systems have the potential to accelerate scientific progress, unlock new medical treatments, and grow the economy. But along with the remarkable new capabilities of these AIs come significant risks. Governments should urgently take action on AI policy in the next eighteen months. The window for proactive risk prevention is closing fast.

Judicious, narrowly-targeted regulation can allow us to get the best of both worlds: realizing the benefits of AI while mitigating the risks. Dragging our feet might lead to the worst of both worlds: poorly-designed, knee-jerk regulation that hampers progress while also failing to be effective at preventing risks.

In this post, we suggest some principles for how governments can meaningfully reduce catastrophic risks while supporting innovation in AI's thriving scientific and commercial sectors.

Urgency

In the last year, AI systems have grown dramatically better at math, graduate-level reasoning, and computer coding, along with many other capabilities. Inside AI companies, we see continued progress on as-yet undisclosed systems and results. These advances offer many positive applications. But progress in these same broad capabilities also brings with it the potential for destructive applications, either from the misuse of AI in domains such as cybersecurity or biology, or from the accidental or autonomous behavior of the AI system itself.

In the realm of cyber capabilities, models have rapidly advanced on a broad range of coding tasks and cyber offense evaluations. On the SWE-bench software engineering task, models have improved from being able to solve 1.96% of a test set of real-world coding problems (Claude 2, October 2023) to 13.5% (Devin, March 2024) to 49% (Claude 3.5 Sonnet, October 2024).
Internally, our Frontier Red Team has found that current models can already assist on a broad range of cyber offense-related tasks, and we expect that the next generation of models, which will be able to plan over long, multi-step tasks, will be even more effective.

On the potential for AI to exacerbate CBRN (chemical, biological, radiological, and nuclear) misuse, the UK AI Safety Institute tested a range of models from industry actors (including Anthropic) and concluded that:

"...models can be used to obtain expert-level knowledge about biology and chemistry. For several models, replies to science questions were on par with those given by PhD-level experts."

AI systems have progressed dramatically in their understanding of the sciences in the last year. On the widely used GPQA benchmark, scores on the hardest section grew from 38.8% when it was released in November 2023, to 59.4% in June 2024 (Claude 3.5 Sonnet), to 77.3% in September 2024 (OpenAI o1; human experts score 81.2%).

Our Frontier Red Team has also found continued progress in CBRN capabilities. For now, the uplift of having access to a frontier model relative to existing software and internet tools is still relatively small; however, it is growing rapidly. As models advance in capabilities, the potential for misuse is likely to continue on a similar scaling trend.

About a year ago, we warned that frontier models might pose real risks in the cyber and CBRN domains within 2-3 years. Based on the progress described above, we believe we are now substantially closer to such risks. Surgical, careful regulation will soon be needed.

A year of Anthropic's Responsible Scaling Policy

Grappling with the catastrophic risks of AI systems is rife with uncertainty. We see the initial glimmers of risks that could become serious in the near future, but we don't know exactly when the real dangers will arrive. We want to make the critical preparations well in advance.
At Anthropic, we try to deal with this challenge via our Responsible Scaling Policy (RSP): an adaptive framework for identifying, evaluating, and mitigating catastrophic risks. The first principle of the RSP is that it is proportionate: the strength of our safety and security measures increases in proportion to the defined "capability thresholds" that the AI systems meet. This "if-then" structure requires safety and security measures to be applied, but only when models become capable enough to warrant them. The second key idea is that the RSP should be iterative: we regularly measure the capabilities of our models and rethink our security and safety approaches in light of how things have developed.

Anthropic has had a formal RSP in place since September 2023 (and recently released an updated version), and other frontier model labs have, to varying degrees, adopted similar plans. RSPs serve many useful purposes:

They increase a developer's investment in computer security and in safety evaluations. Both security and evaluations are typically built after encountering problems, but RSPs publicly commit a developer to develop and resource these areas ahead of time. At Anthropic, teams such as Security, Trust & Safety, Interpretability, and the Frontier Red Team have had to ramp up hiring to have a reasonable chance of achieving the safety and security prerequisites set out by our RSP. Properly implemented, RSPs drive organizational structure and priorities. They become a key part of product roadmaps, rather than just being a policy on paper.

RSPs serve as a forcing function for a developer to flesh out risks and threat models. Threat models tend to be rather abstract, but an RSP makes them interact directly with the day-to-day business of a company, forcing developers to make them as specific and well-grounded as possible, and to reassess them over time.

Having an RSP encourages developers to be transparent about their computer security and misuse mitigation practices. Our RSP commits us to internally document our findings and recommendations regarding the safeguards we've implemented.

We've also found that our RSP has naturally generated much of the substantive work we've needed to meet voluntary commitments such as the White House Voluntary AI Commitments and those made at the Bletchley Park AI Safety Summit.

Our Responsible Scaling Policy isn’t perfect, but as we’ve repeatedly deployed models with it, we’re getting better at making it run smoothly while also testing for risks. Despite the need for iteration and course-corrections, we are fundamentally convinced that RSPs are a workable policy with which AI companies can…