cloudflare/agents agents@0.14.2
Repository: cloudflare/agents
Tag: agents@0.14.2
Published: 2026-06-05T10:52:23Z
Prerelease: no
Release notes:
Patch Changes
- #1684 `ab6dd95` Thanks @threepointone! - warn when `chatRecovery` is configured in `onStart()` (applied too late for wake recovery)
On every Durable Object wake the SDK evaluates chat-recovery budgets — and may seal an interrupted turn, firing onExhausted — before the user's onStart() runs (_checkRunFibers() is ordered ahead of onStart()). A chatRecovery config produced inside onStart() is therefore read as the built-in defaults at the moment recovery decides, so a configured maxRecoveryWork / shouldKeepRecovering / onExhausted silently never applies to the recovery that matters.
This is now documented on ChatRecoveryConfig and the chatRecovery fields of Think / AIChatAgent, and the SDK logs a one-time warning if it detects chatRecovery being reassigned during onStart(). The warning fires both for a custom config object and for chatRecovery = true (enabling recovery / its defaults too late); assigning false (disabling) in onStart() is intentionally not warned, since recovery already ran with the pre-onStart() value and disabling it afterward is a benign no-op for that wake. The fix is to assign chatRecovery as a class field or in the constructor.
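The ordering problem described above can be illustrated with a small stand-in. This is a sketch, not the SDK: `FakeAgent`, `wake()`, and `seenAtRecovery` are hypothetical names that only mimic the documented order (recovery is evaluated first, then `onStart()` runs). The one detail taken from these notes is that `maxRecoveryWork` defaults to `Infinity`.

```typescript
// Hypothetical stand-in for the wake ordering described above.
// None of these names are the real agents API.
type RecoveryConfig = { maxRecoveryWork: number };

class FakeAgent {
  // Stand-in for the built-in defaults (maxRecoveryWork defaults to Infinity).
  chatRecovery: RecoveryConfig = { maxRecoveryWork: Infinity };

  // Records the config that the recovery check saw on wake.
  seenAtRecovery: RecoveryConfig | undefined = undefined;

  onStart(): void {}

  // Mimics the documented order: recovery reads the config BEFORE onStart().
  wake(): void {
    this.seenAtRecovery = this.chatRecovery;
    this.onStart();
  }
}

// Too late: the assignment happens after recovery has already decided.
class LateConfig extends FakeAgent {
  onStart(): void {
    this.chatRecovery = { maxRecoveryWork: 500 };
  }
}

// In time: a class field is set during construction, before any wake logic.
class EarlyConfig extends FakeAgent {
  chatRecovery: RecoveryConfig = { maxRecoveryWork: 500 };
}

const late = new LateConfig();
late.wake();

const early = new EarlyConfig();
early.wake();

console.log("late saw:", late.seenAtRecovery);   // the defaults, not 500
console.log("early saw:", early.seenAtRecovery); // the configured budget
```

In this model the `LateConfig` instance does end up holding the custom config after `wake()`, but the recovery decision was made against the defaults, which is the silent failure the warning is meant to surface.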
- #1672 `f96a2ba` Thanks @threepointone! - fix(chat-recovery): a turn making forward progress now survives unbounded deploy churn; add a work budget + `shouldKeepRecovering` runaway guard
Durable chat recovery used to bound a single incident with a non-resetting 15-minute wall-clock ceiling (CHAT_RECOVERY_MAX_WINDOW_MS). That ceiling was overloaded — it served as both a recovery-duration bound and a runaway-loop guard — and it terminated _healthy, actively-progressing_ turns that simply took longer than 15 minutes of wall-clock to finish while being repeatedly interrupted by a dense deploy window, sealing them with reason="max_recovery_window_exceeded" and discarding completed work.
The two jobs are now decoupled (see design/rfc-chat-recovery-work-budget.md):
- Duration is no longer a bound for a progressing turn. The non-resetting wall-clock ceiling is removed. A turn that keeps producing content survives unbounded deploy churn. Stuck turns are still sealed by the no-progress window (5 min, resets on progress); tight no-progress alarm loops by the attempt cap.
- New runaway-loop guard, keyed to work, not time. The existing durable, monotonic, reconnect-immune progress counter is reused as a work meter. `chatRecovery.maxRecoveryWork` caps the produced content/tool units since an incident opened; exceeding it seals with `reason="work_budget_exceeded"`. Defaults to `Infinity`: the SDK ships the mechanism but imposes no implicit cap, so it never terminates a progressing turn on its own.
- New caller predicate. `chatRecovery.shouldKeepRecovering(ctx)` is consulted per recovery attempt from the second onward (only when no hard bound has already sealed the incident); returning `false` seals with `reason="recovery_aborted"`. This is where integrators express token/cost/step budgets the SDK should not hardcode. A throwing predicate is logged and treated as "keep recovering".
- The no-progress timeout is now configurable. `chatRecovery.noProgressTimeoutMs` (default 5 min, resets on progress) is the primary stuck-turn bound, now overridable per agent instead of a hardcoded constant.
New public types from agents/chat: ChatRecoveryProgressContext. New ChatRecoveryConfig fields: maxRecoveryWork, shouldKeepRecovering, noProgressTimeoutMs. ChatRecoveryExhaustedContext.reason gains work_budget_exceeded and recovery_aborted; max_recovery_window_exceeded is retained as an open-string value but is no longer emitted.
Both @cloudflare/ai-chat and @cloudflare/think (which carries its own copy of the recovery engine) are updated identically. Defaults are unchanged except that a progressing turn is no longer terminated by wall-clock age.
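The decision rules for #1672 can be sketched as a small self-contained model. This is an illustration under stated assumptions, not the SDK's engine: the three config field names come from these notes, but the `ProgressCtx` fields (`attempt`, `workUnits`, `msSinceProgress`), the `decide()` function, the `"no_progress"` label, the relative order of the two hard bounds, and the `>=` comparison on the timeout are all hypothetical. The attempt cap is not modeled.

```typescript
// Illustrative local types; not imported from agents/chat.
interface ProgressCtx {
  attempt: number;         // hypothetical: 1-based recovery attempt number
  workUnits: number;       // hypothetical: content/tool units since the incident opened
  msSinceProgress: number; // hypothetical: time since the progress counter last moved
}

interface RecoveryConfigSketch {
  maxRecoveryWork?: number;
  noProgressTimeoutMs?: number;
  shouldKeepRecovering?: (ctx: ProgressCtx) => boolean;
}

// "no_progress" is a placeholder label; the other two reasons are from the notes.
type Decision = "continue" | "no_progress" | "work_budget_exceeded" | "recovery_aborted";

function decide(cfg: RecoveryConfigSketch, ctx: ProgressCtx): Decision {
  const noProgressMs = cfg.noProgressTimeoutMs ?? 5 * 60 * 1000; // default 5 min
  const maxWork = cfg.maxRecoveryWork ?? Infinity;               // default: no cap

  // Hard bounds are checked first (their relative order here is an assumption).
  if (ctx.msSinceProgress >= noProgressMs) return "no_progress";
  if (ctx.workUnits > maxWork) return "work_budget_exceeded";

  // The predicate is consulted only from the second attempt onward,
  // and only when no hard bound has already sealed the incident.
  if (ctx.attempt >= 2 && cfg.shouldKeepRecovering) {
    try {
      if (cfg.shouldKeepRecovering(ctx) === false) return "recovery_aborted";
    } catch (err) {
      // A throwing predicate is logged and treated as "keep recovering".
      console.warn("shouldKeepRecovering threw; continuing recovery", err);
    }
  }
  return "continue";
}

// Example config: cap work at 1000 units, tighten the stuck-turn window to
// 2 minutes, and stop after 5 attempts (a stand-in for a cost or step budget).
const cfg: RecoveryConfigSketch = {
  maxRecoveryWork: 1000,
  noProgressTimeoutMs: 2 * 60 * 1000,
  shouldKeepRecovering: (ctx) => ctx.attempt <= 5,
};

console.log(decide(cfg, { attempt: 3, workUnits: 40, msSinceProgress: 1000 }));
```

Wall-clock age of the incident does not appear anywhere in `decide()`, which reflects the main change: a turn that keeps producing work is bounded only by the work budget and the caller's predicate.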
- #1668 `d40cc8a` Thanks @ghostwriternr! - Fix RPC resource leaks in workflows.
Workflows that use waitForApproval() or ThinkWorkflow.prompt() now release their RPC stubs promptly, preventing resource leaks and the associated "RPC stub was not disposed" warnings in your logs.
- #1679 `c8d1d32` Thanks @threepointone! - fix(sub-agents): a facet sub-agent no longer touches the root DO's WebSockets, fixing a production-only "Cannot perform I/O on behalf of a different Durable Object (Native)" crash (#1677)
A sub-agent (facet) that called setState(), broadcast(), or otherwise enumerated connections — directly or indirectly via the internal _broadcastProtocol() — could crash in production with Cannot perform I/O on behalf of a different Durable Object. ... (I/O type: Native). It reproduced when the root Agent held a live (hibernatable) WebSocket connection and the child facet was freshly bootstrapped; it never reproduced in wrangler dev/miniflare, which made it hard to catch.
Root cause: the Agent overrides of getConnections() and getConnection() fell through to super.getConnections() / super.getConnection() for facets too. On a facet, that resolves to the host/root DO's hibernatable WebSockets, and reading their attachments from the facet's I/O context is a cross-DO native I/O access that workerd aborts. setState() tripped it only incidentally, because _broadcastProtocol() enumerates connections to compute its exclude list before sending anything.
Fix: a facet's client connections are all virtual (real sockets owned by the root and bridged in), so getConnections()/getConnection() now return only the…