Anthropic
Use Claude Sonnet 4, Claude Sonnet 4.5, and other Anthropic models with docker-agent.
Setup
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."
Workload Identity Federation (no API key)
Authenticate with short-lived tokens minted from your own OIDC identity
provider instead of a long-lived API key. See Anthropic’s
Workload Identity Federation guide
to provision a Federation Rule, then configure docker-agent with a typed
auth: block:
providers:
  anthropic-wif:
    provider: anthropic
    auth:
      type: workload_identity_federation
      workload_identity_federation:
        federation_rule_id: fdrl_REPLACE_ME
        organization_id: 00000000-0000-0000-0000-000000000000
        # Optional: only required for target_type=SERVICE_ACCOUNT rules.
        service_account_id: svac_REPLACE_ME
        identity_token:
          # Pick exactly one of: file, env, command, url
          file: /var/run/secrets/anthropic.com/token
models:
  claude:
    provider: anthropic-wif
    model: claude-sonnet-4-5
identity_token accepts four mutually exclusive sources:
| Source | When to use |
|---|---|
| file | Kubernetes projected service-account tokens, SPIFFE/SPIRE helpers, Vault sidecars: anything that rotates a file on disk |
| env | The token is already exported in an environment variable |
| command | Shell out to a CLI on every refresh (gcloud auth print-identity-token, az account get-access-token, …) |
| url | Fetch from an HTTP(S) endpoint (cloud metadata servers, GitHub Actions OIDC token URL, …) |
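The "exactly one source" rule can be illustrated with a small check. This is a minimal Python sketch, not docker-agent's actual validation code: the function name `pick_identity_token_source` and the error wording are hypothetical, and only the four source keys come from the table above.

```python
# Hypothetical validator; only the four source keys are taken from the docs.
SOURCE_KINDS = ("file", "env", "command", "url")


def pick_identity_token_source(identity_token: dict) -> str:
    """Return the single configured source kind, or raise ValueError."""
    configured = [kind for kind in SOURCE_KINDS if identity_token.get(kind)]
    if len(configured) != 1:
        found = ", ".join(configured) or "none"
        raise ValueError(
            "identity_token needs exactly one of "
            f"{', '.join(SOURCE_KINDS)}; found: {found}"
        )
    return configured[0]
```

Extra keys such as headers are ignored by this check, so a url source with headers still counts as one source.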
For url, both the URL and any header values support ${VAR} expansion
against the runtime environment, which lets you wire the GitHub Actions OIDC
token endpoint without putting secrets in YAML:
identity_token:
  url: ${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=https://api.anthropic.com
  headers:
    Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}
  response_field: value
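To make the ${VAR} expansion concrete, here is a minimal Python sketch of how such a substitution could work. It is an illustration only: the helper name `expand_vars` is hypothetical, and raising on an unset variable is an assumption, since the docs do not say how docker-agent treats missing variables.

```python
import re

# Matches ${NAME}; names follow the usual environment-variable shape.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_vars(value: str, env: dict) -> str:
    """Replace every ${NAME} in value with env[NAME]."""

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(replace, value)
```

In real use, `env` would typically be `os.environ`, so the token value only ever lives in the process environment and never in the YAML file.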
auth: is mutually exclusive with --gateway. Token-refresh failures are
surfaced through the normal error path and appear in the TUI as a message of
the form:
anthropic workload identity federation: failed to refresh identity token from <kind> source (federation_rule=fdrl_…): ...
A complete walkthrough of all four sources lives in
examples/anthropic_wif.yaml.
Configuration
Inline
agents:
  root:
    model: anthropic/claude-sonnet-4-5
Named Model
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000
Available Models
| Model ID | Description |
|---|---|
| claude-opus-4-7 | Highest-capability Opus model; supports task budget |
| claude-sonnet-4-5 | Most capable Sonnet; supports extended thinking |
| claude-sonnet-4-0 | Previous Sonnet generation, still supported |
| claude-haiku-4-5 | Fast and inexpensive, good for tight loops |
Thinking Budget
Anthropic accepts either an integer token budget or a string effort value. Thinking is off unless you set thinking_budget; when set, interleaved thinking is auto-enabled.
Token budget (1024–32768; works on all extended-thinking Claude models):
models:
  claude-deep:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384 # must be < max_tokens
Adaptive / effort-based (Claude Opus 4.6+ only — every string value is sent as adaptive thinking via output_config.effort):
models:
  opus-adaptive:
    provider: anthropic
    model: claude-opus-4-6
    thinking_budget: adaptive # model decides effort (defaults to high)
  opus-effort:
    provider: anthropic
    model: claude-opus-4-6
    thinking_budget: high # low | medium | high | xhigh | max (same as adaptive/<effort>)
On models that reject token-based thinking (Opus 4.6, 4.7, 4.8), an integer budget is automatically coerced to adaptive with a logged warning. See the Thinking / Reasoning guide for the full cross-provider reference.
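The budget rules in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not docker-agent's implementation: the function `resolve_thinking`, its `rejects_token_budget` flag, and the tuple return values are hypothetical. It also assumes the validity rule for integer budgets (values below 1024 or at or above max_tokens are ignored) and treats plain adaptive as effort high, following the comment in the example above.

```python
EFFORTS = ("low", "medium", "high", "xhigh", "max")


def resolve_thinking(thinking_budget, max_tokens, rejects_token_budget=False):
    """Return ("tokens", n), ("adaptive", effort), or None if thinking stays off."""
    if thinking_budget is None:
        return None  # thinking is off unless a budget is set

    if isinstance(thinking_budget, int):
        if rejects_token_budget:
            # e.g. Opus 4.6+: integer budgets are coerced to adaptive.
            # Assumption: the coerced form uses the default effort.
            return ("adaptive", "high")
        if thinking_budget < 1024 or thinking_budget >= max_tokens:
            return None  # out-of-range budgets are ignored
        return ("tokens", thinking_budget)

    if thinking_budget == "adaptive":
        return ("adaptive", "high")
    # "high" and "adaptive/high" are treated as the same request.
    effort = thinking_budget.removeprefix("adaptive/")
    if effort in EFFORTS:
        return ("adaptive", effort)
    raise ValueError(f"unknown thinking_budget: {thinking_budget!r}")
```

The order of the checks for integer budgets (coercion before range validation) is a guess; the docs do not specify which happens first.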
Interleaved Thinking
Auto-enabled whenever a thinking budget is configured on a Claude model. Allows tool calls during model reasoning for more integrated problem-solving:
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    provider_opts:
      interleaved_thinking: false # disable if needed
Task Budget
task_budget caps the total number of tokens the model may spend across a
multi-step agentic task — combined thinking, tool calls, and final output. It
is forwarded as
output_config.task_budget
and is ideal for letting long-running agents self-regulate effort without
tightening max_tokens on every call.
docker-agent automatically attaches the required task-budgets-2026-03-13
beta header whenever this field is set. You can configure task_budget on
any Claude model — docker-agent never gates it by model name. At the time
of writing, only Claude Opus 4.7 actually honors the field; other Claude
models (Sonnet 4.5, Opus 4.5 / 4.6, etc.) are expected to reject requests
that include it. Check the Anthropic release notes linked above for the
current list of supported models.
models:
  opus:
    provider: anthropic
    model: claude-opus-4-7
    task_budget: 128000 # integer shorthand → { type: tokens, total: 128000 }
    thinking_budget: adaptive
Object form (forward-compatible with future budget types):
opus:
  provider: anthropic
  model: claude-opus-4-7
  task_budget:
    type: tokens
    total: 128000
See the full schema on the Model Configuration page.
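The relationship between the integer shorthand and the object form can be sketched in a few lines of Python. The helper `normalize_task_budget` is hypothetical and only mirrors the mapping shown in the examples above; it is not docker-agent's parser.

```python
def normalize_task_budget(task_budget):
    """Expand the integer shorthand into the object form."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(task_budget, int) and not isinstance(task_budget, bool):
        return {"type": "tokens", "total": task_budget}
    if isinstance(task_budget, dict) and {"type", "total"} <= task_budget.keys():
        return dict(task_budget)  # already in object form; return a copy
    raise ValueError(f"unsupported task_budget: {task_budget!r}")
```

Both spellings from the examples normalize to the same value, so either can be used interchangeably for a token budget.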
Server-Side Fallbacks
When the primary model refuses a request (e.g. Claude Fable 5’s safety
classifiers ending the turn with stop reason refusal), Anthropic can retry
the request with backup models in a single round trip. Set fallbacks in
provider_opts to a list of model IDs, in priority order:
models:
  fable:
    provider: anthropic
    model: claude-fable-5
    provider_opts:
      fallbacks:
        - claude-opus-4-8
        - claude-sonnet-4-6
docker-agent automatically attaches the required
server-side-fallback-2026-06-01 beta header and forwards the option as
fallbacks: [{"model": "..."}]. The response’s model field reports which
model actually served the request.
Fallback models receive the exact same request as the primary model (thinking configuration, task budget, beta features, …), so list only models that accept the same request shape. Not available on Bedrock, Vertex AI, or the Message Batches API.
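As a rough illustration of the forwarding described above, the following Python sketch turns the configured list of model IDs into the request shape. The function `build_fallback_options` is hypothetical; the beta value and the fallbacks shape come from this page, while sending the beta through the `anthropic-beta` header on its own, without merging other beta values, is a simplifying assumption.

```python
FALLBACK_BETA = "server-side-fallback-2026-06-01"


def build_fallback_options(fallback_models):
    """Return (extra_headers, extra_body) for a prioritized list of model IDs."""
    # Assumption: no other beta features are active, so the header holds one value.
    headers = {"anthropic-beta": FALLBACK_BETA}
    # Order is preserved: the first entry is tried first.
    body = {"fallbacks": [{"model": model_id} for model_id in fallback_models]}
    return headers, body
```

For the configuration above, the body would carry claude-opus-4-8 first and claude-sonnet-4-6 second, matching the priority order in the YAML.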
Thinking Display
Controls whether thinking blocks are returned in responses when thinking is enabled. Claude Opus 4.7 hides thinking content by default (omitted); earlier Claude 4 models default to summarized. Set thinking_display in provider_opts to override:
models:
  claude-opus-4-7:
    provider: anthropic
    model: claude-opus-4-7
    thinking_budget: adaptive
    provider_opts:
      thinking_display: summarized # "summarized", "display", or "omitted"
Valid values:
- summarized: thinking blocks are returned with summarized thinking text (default for Claude 4 models prior to Opus 4.7).
- display: thinking blocks are returned for display (use this to re-enable thinking output on Opus 4.7).
- omitted: thinking blocks are returned with an empty thinking field; the signature is still returned for multi-turn continuity (default for Opus 4.7). Useful to reduce time-to-first-text-token when streaming.
Note: thinking_display applies to both thinking_budget with token counts and adaptive/effort-based budgets. Full thinking tokens are billed regardless of the thinking_display value.
Anthropic thinking budget values below 1024 or greater than or equal to max_tokens are ignored (a warning is logged).