What does this writing signal mean?

Anthropic published "Smart Contracts", a Frontier Red Team post titled "AI agents find $4.6M in blockchain smart contract exploits" (Dec 1, 2025; Winnie Xiao*, Cole Killian*, and others). This writing signal gives public context for research themes, product direction, policy, or launch framing. High-signal details: Anthropic post on smart contracts, no major traction indicated. onlylabs links this event to 1 captured evidence page and 6 related writing signals.

Anthropic Writing: Smart Contracts

Captured source

anthropic.com/anthropic.com/research/smart-contracts

Smart Contracts

published Dec 1, 2025 · seen 1w · captured 1w · http 200 · method plain

AI agents find $4.6M in blockchain smart contract exploits

Anthropic Frontier Red Team · Dec 1, 2025

Winnie Xiao*, Cole Killian*, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng

*MATS and the Anthropic Fellows program

AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents' ability to exploit smart contracts on Smart CONtracts Exploitation benchmark (SCONE-bench), a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoffs (June 2025 for Opus 4.5 and March 2025 for other models), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable.

Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.

Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.

Figure 1. Total revenue from successfully exploiting smart contract vulnerabilities that were exploited after model knowledge cutoff dates across frontier AI models over the last year in log scale, as tested in simulation. For Opus 4.5, only contracts exploited after June 1, 2025 were evaluated; for all other models, contracts exploited after March 1, 2025 were evaluated.
Over the last year, exploit revenue from stolen simulated funds roughly doubled every 1.3 months. The shaded region represents 90% CI calculated by bootstrap over the set of model-revenue pairs. For each contract in the benchmark that was successfully exploited by the agent, we estimated the exploit’s dollar value by converting the agent’s revenue in the native token (ETH or BNB) using the historical exchange rate from the day the real exploit occurred, as reported by the CoinGecko API.

Introduction

AI cyber capabilities are accelerating rapidly: they are now capable of tasks from orchestrating complex network intrusions to augmenting state-level espionage. Benchmarks, like CyberGym and Cybench, are valuable for tracking and preparing for future improvements in such capabilities.

However, existing cyber benchmarks miss a critical dimension: they do not quantify the exact financial consequences of AI cyber capabilities. Compared to arbitrary success rates, quantifying capabilities in monetary terms is more useful for assessing and communicating risks to policymakers, engineers, and the public. Yet estimating the real value of software vulnerabilities requires speculative modelling of downstream impacts, user base, and remediation costs. [1]

Here, we take an alternate approach and turn to a domain where software vulnerabilities can be priced directly: smart contracts. Smart contracts are programs deployed on blockchains like Ethereum. They power financial blockchain applications which offer services similar to those of PayPal, but all of their source code and transaction logic (such as for transfers, trades, and loans) are public on the blockchain and handled entirely by software without a human in the loop. As a result, vulnerabilities can allow for direct theft from contracts, and we can measure the dollar value of exploits by running them in simulated environments.
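The valuation step described above (native-token revenue converted at the exchange rate from the day of the real exploit) can be illustrated with a minimal Python sketch. This is not the authors' code: the `SimulatedExploit` type, the function names, the contract labels, and the exchange rates are all hypothetical placeholders, and a real pipeline would fetch historical prices from a price API rather than a hard-coded table.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical USD prices keyed by (token, date of the real exploit).
# Placeholder numbers for illustration only, not real quotes.
HISTORICAL_USD_RATES = {
    ("ETH", date(2025, 4, 1)): 1800.0,
    ("BNB", date(2025, 5, 15)): 650.0,
}


@dataclass(frozen=True)
class SimulatedExploit:
    """One successful simulated exploit (illustrative structure)."""
    contract: str           # label for the target contract
    token: str              # native token of the chain: "ETH" or "BNB"
    revenue_native: float   # agent's revenue in the native token
    exploit_date: date      # day the real-world exploit occurred


def exploit_value_usd(exploit, rates=HISTORICAL_USD_RATES):
    """Convert native-token revenue to USD at the exploit-day rate."""
    rate = rates[(exploit.token, exploit.exploit_date)]
    return exploit.revenue_native * rate


def total_revenue_usd(exploits, rates=HISTORICAL_USD_RATES):
    """Sum the dollar value over all successfully exploited contracts."""
    return sum(exploit_value_usd(e, rates) for e in exploits)


if __name__ == "__main__":
    runs = [
        SimulatedExploit("contract-a", "ETH", 2.5, date(2025, 4, 1)),
        SimulatedExploit("contract-b", "BNB", 10.0, date(2025, 5, 15)),
    ]
    for run in runs:
        print(run.contract, exploit_value_usd(run))
    print("total", total_revenue_usd(runs))
```

Pricing at the date of the real exploit, rather than at evaluation time, keeps the dollar figure tied to what the historical attacker could have taken and makes results comparable across models evaluated on different days.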
These properties make smart contracts an ideal testing ground for AI agents’ exploitation capabilities.

To give a concrete example of what such an exploit could look like: Balancer is a blockchain application that allows users to trade cryptocurrencies. In November 2025, an attacker exploited a rounding direction issue to withdraw other users’ funds, stealing over $120 million. Since smart contract and traditional software exploits draw on a similar set of core skills (e.g. control-flow reasoning, boundary analysis, and programming fluency), assessing AI agents on smart contract exploitations gives a concrete lower bound on the economic impact of their broader cyber capabilities.

We introduce SCONE-bench, the first benchmark that evaluates agents’ ability to exploit smart contracts, measured by the total dollar value [2] of simulated stolen funds. For each target contract(s), the agent is prompted to identify a vulnerability and produce an exploit script that takes advantage of the vulnerability so that, when executed, the executor’s native token balance increases by a minimum threshold. Instead of relying on bug bounty or speculative models, SCONE-bench uses on-chain assets to directly quantify losses.

SCONE-bench provides:
A benchmark comprising 405 smart contracts with real-world vulnerabilities exploited between 2020 and 2025 across 3 Ethereum-compatible blockchains (Ethereum, Binance Smart Chain, and Base), derived from the DefiHackLabs repository.
A baseline agent running in each sandboxed environment that attempts to exploit the provided contract(s) within a time limit (60 minutes) using tools exposed via the Model Context Protocol (MCP).
An evaluation framework that uses Docker containers for sandboxed and scalable execution, with each container running a local blockchain forked at the specified block number to ensure reproducible results.
Plug-and-play support for using the agent to audit smart contracts for vulnerabilities prior to deployment on live blockchains. We believe this feature can help smart contract developers stress-test their contracts for defensive purposes.
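The success criterion described above (after the exploit script runs, the executor's native token balance must have increased by at least a minimum threshold) can be sketched as a simple balance-delta check. The Python below is an illustrative assumption, not SCONE-bench's actual harness: the function names are hypothetical, and the threshold value is a placeholder because the excerpt does not state it. Balances are assumed to be read from the forked local chain before and after execution.

```python
from decimal import Decimal

# Hypothetical threshold in native-token units; the excerpt only says
# "a minimum threshold" and does not give the actual value.
MIN_PROFIT_NATIVE = Decimal("0.1")


def native_profit(balance_before: Decimal, balance_after: Decimal) -> Decimal:
    """Change in the executor's native token balance across the exploit run."""
    return balance_after - balance_before


def exploit_succeeded(
    balance_before: Decimal,
    balance_after: Decimal,
    min_profit: Decimal = MIN_PROFIT_NATIVE,
) -> bool:
    """True if the executor gained at least `min_profit` native tokens."""
    return native_profit(balance_before, balance_after) >= min_profit


if __name__ == "__main__":
    before = Decimal("100")
    # Gain of 0.5 clears the placeholder threshold of 0.1.
    print(exploit_succeeded(before, Decimal("100.5")))
    # Gain of 0.05 does not clear it.
    print(exploit_succeeded(before, Decimal("100.05")))
```

Using `Decimal` instead of floats avoids rounding surprises when a gain sits close to the threshold; requiring a minimum gain rather than any positive delta is one plausible way to rule out trivially small or accidental balance changes.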

We present three main evaluation results. First, we evaluated 10 models [3] across all 405 benchmark problems. Collectively, these models produced turnkey exploits for 207 (51.11%) of these problems, yielding $550.1 million in simulated stolen funds. [4] Second, to control for potential data contamination, we evaluated the same 10 models on vulnerabilities that were...

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

Anthropic post on smart contracts, no major traction indicated.