microsoft/amplifier-module-provider-anthropic
Description: Reference implementation for an Anthropic provider for the Amplifier project
Language: Python
License: MIT
Stars: 2
Forks: 10
Open issues: 8
Created: 2025-10-08T21:53:28Z
Pushed: 2026-06-10T06:17:31Z
Default branch: main
Fork: no
Archived: no
README:
Amplifier Anthropic Provider Module
Claude model integration for Amplifier via Anthropic API.
Prerequisites
- Python 3.11+
- [UV](https://github.com/astral-sh/uv) - Fast Python package manager
Installing UV
# macOS/Linux/WSL
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Purpose
Provides access to Anthropic's Claude models (Claude 4 series: Sonnet, Opus, Haiku) as an LLM provider for Amplifier.
Contract
- Module Type: Provider
- Mount Point: providers
- Entry Point: amplifier_module_provider_anthropic:mount
Supported Models
- claude-sonnet-4-5 - Claude Sonnet 4.5 (recommended, default)
- claude-opus-4-6 - Claude Opus 4.6 (most capable)
- claude-haiku-4-5 - Claude Haiku 4.5 (fastest, cheapest)
Configuration
[[providers]]
module = "provider-anthropic"
name = "anthropic"
config = {
default_model = "claude-sonnet-4-5",
max_tokens = 8192,
temperature = 1.0,
debug = false, # Enable standard debug events
raw_debug = false # Enable ultra-verbose raw API I/O logging
}
Debug Configuration
Standard Debug (debug: true):
- Emits llm:request:debug and llm:response:debug events
- Contains request/response summaries with message counts, model info, usage stats
- Moderate log volume, suitable for development
Raw Debug (debug: true, raw_debug: true):
- Emits llm:request:raw and llm:response:raw events
- Contains complete, unmodified request params and response objects
- Extreme log volume, use only for deep provider integration debugging
- Captures the exact data sent to/from Anthropic API before any processing
Example:
providers:
  - module: provider-anthropic
    config:
      debug: true # Enable debug events
      raw_debug: true # Enable raw API I/O capture
      default_model: claude-sonnet-4-5
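The relationship between the two flags can be summarised in a few lines of Python. This is only a sketch: `debug_events` is a hypothetical helper, not part of the module, and it assumes that `raw_debug` has no effect unless `debug` is also set and that raw events are emitted in addition to the standard debug events. The README lists the flags together but does not spell out either point.

```python
def debug_events(debug: bool = False, raw_debug: bool = False) -> list[str]:
    """Return the event names a given flag combination would produce."""
    events: list[str] = []
    if debug:
        events += ["llm:request:debug", "llm:response:debug"]
        # Assumption: raw capture is documented as "debug: true, raw_debug: true",
        # so this sketch treats raw_debug on its own as a no-op.
        if raw_debug:
            events += ["llm:request:raw", "llm:response:raw"]
    return events


print(debug_events(debug=True))
print(debug_events(debug=True, raw_debug=True))
```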
Retry and Error Handling
The provider disables the SDK's built-in retries (max_retries=0) and manages retries itself via amplifier_core.utils.retry.retry_with_backoff(). This gives the provider full control over backoff timing, how the retry-after header is honored, and per-error-class delay scaling.
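To illustrate what owning the retry loop involves, here is a minimal, self-contained sketch. It does not reproduce `retry_with_backoff()` (its signature is not shown in this README); `RetryableError`, `call_with_retries`, and the injectable `sleep` parameter are hypothetical names chosen for the example.

```python
import asyncio


class RetryableError(Exception):
    """Stand-in for a retryable kernel error (hypothetical)."""


async def call_with_retries(operation, max_retries=5, min_retry_delay=1.0, sleep=asyncio.sleep):
    """Await operation(), retrying on RetryableError with doubling delays."""
    attempt = 0
    while True:
        try:
            return await operation()
        except RetryableError:
            attempt += 1
            if attempt > max_retries:
                raise  # retry budget exhausted, surface the last error
            await sleep(min_retry_delay * 2 ** (attempt - 1))


async def demo():
    calls = {"count": 0}
    delays = []

    async def flaky():
        # Fails twice, then succeeds on the third call.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RetryableError("overloaded")
        return "ok"

    async def fake_sleep(seconds):
        delays.append(seconds)  # record the delay instead of waiting

    result = await call_with_retries(flaky, sleep=fake_sleep)
    return result, delays


result, delays = asyncio.run(demo())
print(result, delays)  # ok [1.0, 2.0]
```

Passing the sleep function in as a parameter keeps the loop testable without real waiting. The cap, multiplier, retry-after floor, and jitter are left out here for brevity.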
Error Translation
SDK exceptions are translated to kernel errors before the retry loop sees them. All translations preserve the original exception as __cause__ for debugging.
| SDK Exception | Condition | Kernel Error | Status | Retryable |
| --- | --- | --- | --- | --- |
| RateLimitError | 429 | RateLimitError | 429 | Yes |
| OverloadedError | 529 | ProviderUnavailableError | 529 | Yes (10× backoff) |
| InternalServerError | 5xx | ProviderUnavailableError | 5xx | Yes |
| AuthenticationError | 401 | AuthenticationError | 401 | No |
| BadRequestError | context length / too many tokens | ContextLengthError | 400 | No |
| BadRequestError | safety / content filter / blocked | ContentFilterError | 400 | No |
| BadRequestError | other | InvalidRequestError | 400 | No |
| APIStatusError | 403 | AccessDeniedError | 403 | No |
| APIStatusError | 404 | NotFoundError | 404 | No |
| APIStatusError | other non-5xx | LLMError | — | No |
| asyncio.TimeoutError | — | LLMTimeoutError | — | Yes |
| Other | — | LLMError | — | Yes |
Backoff Formula
Each retry delay is computed as follows:
base_delay = min_retry_delay × 2^(attempt - 1)
capped_delay = min(base_delay, max_retry_delay)
scaled_delay = capped_delay × delay_multiplier # 1.0 for most errors, 10.0 for 529
final_delay = max(scaled_delay, retry_after) # server retry-after as floor
sleep = final_delay ± (final_delay × jitter) # randomised ± jitter fraction
Example: 529 Overloaded (10× multiplier, defaults)
| Attempt | base_delay | capped | ×10 | Sleep |
| --- | --- | --- | --- | --- |
| 1 | 1s | 1s | 10s | 10s |
| 2 | 2s | 2s | 20s | 20s |
| 3 | 4s | 4s | 40s | 40s |
| 4 | 8s | 8s | 80s | 80s |
| 5 | 16s | 16s | 160s | 160s |
Total wait ≈ 310s (~5 min) before the request is abandoned.
Retry Configuration
providers:
  - module: provider-anthropic
    config:
      max_retries: 5
      min_retry_delay: 1.0
      max_retry_delay: 60.0
      retry_jitter: 0.2
      overloaded_delay_multiplier: 10.0
| Key | Default | Description |
| --- | --- | --- |
| max_retries | 5 | Maximum retry attempts before giving up |
| min_retry_delay | 1.0 | Base delay in seconds for the first retry |
| max_retry_delay | 60.0 | Cap on the base delay (before multiplier) |
| retry_jitter | 0.2 | Jitter fraction (0.0–1.0). Also accepts true (→ 0.2) or false (→ 0.0) for backward compatibility |
| overloaded_delay_multiplier | 10.0 | Multiplier applied to delays for 529 Overloaded errors |
Events
A provider:retry event is emitted before each retry sleep with the following fields:
| Field | Description |
| --- | --- |
| provider | Provider name ("anthropic") |
| model | Model being called |
| attempt | Current retry attempt number |
| max_retries | Configured maximum retries |
| delay | Computed sleep duration in seconds |
| retry_after | Server retry-after value (or null) |
| error_type | Kernel error class name |
| error_message | Error description |
Beta Headers
Anthropic provides experimental features through beta headers. Enable these features by adding the beta_headers configuration field.
Configuration
Single beta header:
providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5
      beta_headers: "context-1m-2025-08-07" # Enable 1M token context window
Multiple beta headers:
providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5
      beta_headers:
        - "context-1m-2025-08-07"
        - "future-feature-header"
1M Token Context Window
Claude Sonnet 4.5 supports a 1M token context window when the context-1m-2025-08-07 beta header is enabled:
providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5
      beta_headers: "context-1m-2025-08-07"
      max_tokens: 8192 # Output tokens remain separate from context window
With this configuration:
- Context window: Up to 1M tokens of input…