Model Configuration
Complete reference for defining models with providers, parameters, and reasoning settings.
Full Schema
models:
model_name:
first_available: [list] # Optional: candidate model refs, tried in order by available credentials.
# Mutually exclusive with other model settings.
provider: string # Required unless using first_available. One of: openai, anthropic, google, amazon-bedrock,
# dmr, mistral, xai, nebius, minimax, requesty, azure, ollama,
# github-copilot, or a named provider defined under the top-level
# `providers:` section.
model: string # Required: model identifier
temperature: float # Optional: 0.0–2.0 (provider-dependent; e.g. Anthropic caps at 1.0)
max_tokens: integer # Optional: response length limit
top_p: float # Optional: 0.0–1.0
frequency_penalty: float # Optional: -2.0–2.0
presence_penalty: float # Optional: -2.0–2.0
base_url: string # Optional: custom API endpoint
token_key: string # Optional: env var for API token
thinking_budget: string|int # Optional: reasoning effort
task_budget: int|object # Optional: total task token budget (Anthropic)
parallel_tool_calls: boolean # Optional: allow parallel tool calls
track_usage: boolean # Optional: track token usage
routing: [list] # Optional: rule-based model routing
provider_opts: # Optional: provider-specific options
key: value
Properties Reference
| Property | Type | Required | Description |
|---|---|---|---|
first_available |
array | ✗ | Candidate model references tried in order; selects the first whose credentials are configured. Mutually exclusive with other model settings. |
provider |
string | ✓/✗ | Required for regular model definitions; omitted for first_available selectors. Provider: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai, nebius, minimax, requesty, azure, ollama, github-copilot, or any named provider. |
model |
string | ✓/✗ | Required for regular model definitions; omitted for first_available selectors. Model name (e.g., gpt-4o, claude-sonnet-4-5, gemini-3.5-flash) |
temperature |
float | ✗ | Sampling randomness. Range is provider-dependent — typically 0.0–2.0 (Anthropic caps at 1.0). 0.0 is deterministic. |
max_tokens |
int | ✗ | Maximum response length in tokens |
top_p |
float | ✗ | Nucleus sampling threshold (0.0–1.0) |
frequency_penalty |
float | ✗ | Penalize repeated tokens (-2.0–2.0) |
presence_penalty |
float | ✗ | Encourage topic diversity (-2.0–2.0) |
base_url |
string | ✗ | Custom API endpoint URL (for self-hosted or proxied endpoints) |
token_key |
string | ✗ | Environment variable name containing the API token (overrides provider default) |
thinking_budget |
string/int | ✗ | Reasoning effort control |
task_budget |
int/object | ✗ | Total token budget for an agentic task (forwarded to Anthropic; see Task Budget). |
parallel_tool_calls |
boolean | ✗ | Allow model to call multiple tools at once |
track_usage |
boolean | ✗ | Track and report token usage for this model |
routing |
array | ✗ | Rule-based routing to different models. See Model Routing. |
provider_opts |
object | ✗ | Provider-specific options (see provider pages) |
title_model |
string | ✗ | Model used for session-title generation. Can be a named model from the models: section or an inline provider/model string. When omitted, the agent’s primary model generates titles. Cannot be combined with first_available. |
Delegating Session-Title Generation
The title_model field lets a heavyweight primary model hand off the cheap
title-generation call to a smaller, faster model:
model: anthropic/claude-opus-4-7
title_model: anthropic/claude-haiku-4-5
The value can be a named entry from the models stanza or an inline
provider/model string. When omitted, the agent’s primary model generates
titles.
title_model cannot be combined with first_available model selection — the combination is rejected at validation time.
First Available Models
Use first_available when the same agent should work with whichever provider credentials are available in the current environment. docker-agent checks the candidates in order at load time and replaces the selector with the first candidate whose required environment variables are configured.
models:
smart:
first_available:
- anthropic/claude-sonnet-4-6
- openai/gpt-5
- google/gemini-3.5-flash
- dmr/ai/qwen3 # local fallback; no API key required
agents:
root:
model: smart
instruction: You are a helpful assistant.
Candidates can be inline provider/model references or names from the same models: section. Local providers such as dmr and ollama do not require credentials, so they are useful as final fallbacks.
If none of the candidates has credentials configured, docker-agent reports the missing environment variables grouped by candidate. You only need to configure one group of credentials, not every provider in the list.
A first_available model is only a selector. It cannot be combined with provider, model, routing, token_key, budgets, sampling options, or other model settings. Put those settings on named candidate models instead:
models:
claude:
provider: anthropic
model: claude-sonnet-4-6
max_tokens: 64000
gpt:
provider: openai
model: gpt-5
thinking_budget: low
smart:
first_available:
- claude
- gpt
- dmr/ai/qwen3
See examples/first_available.yaml for a complete example.
Thinking Budget
Control how much reasoning the model does before responding. This varies by provider:
OpenAI
Uses effort levels as strings:
models:
gpt:
provider: openai
model: gpt-5
thinking_budget: low # minimal | low | medium | high | xhigh
Anthropic
Uses an integer token budget (1024–32768), or — on adaptive-capable models (Opus 4.6+) — adaptive, adaptive/<effort>, or a bare effort level:
models:
claude:
provider: anthropic
model: claude-sonnet-4-5
thinking_budget: 16384 # must be < max_tokens
opus:
provider: anthropic
model: claude-opus-4-6
thinking_budget: adaptive # or adaptive/<effort>, or low | medium | high | xhigh | max
Google Gemini 2.5
Uses an integer token budget. 0 disables, -1 lets the model decide:
models:
gemini:
provider: google
model: gemini-2.5-flash
thinking_budget: -1 # dynamic (default)
Google Gemini 3
Uses effort levels like OpenAI:
models:
gemini3:
provider: google
model: gemini-3-flash
thinking_budget: medium # minimal | low | medium | high
Disabling Thinking
Works for all providers:
thinking_budget: none # or 0
Models that always reason (OpenAI o-series, gpt-5, Gemini 3) fall back to the API default and still reason internally.
See the Thinking / Reasoning guide for per-provider details, including AWS Bedrock and Docker Model Runner.
Task Budget
Anthropic-only.
task_budget caps the total number of tokens the model may spend across a
multi-step agentic task — combining thinking, tool calls, and final output
tokens. It lets long-running agents self-regulate effort without having to
choose a tight per-call max_tokens.
It is forwarded to Anthropic’s
output_config.task_budget
request field. docker-agent automatically attaches the required
task-budgets-2026-03-13 beta header whenever this field is set.
You can configure task_budget on any Claude model — docker-agent never
gates it by model name. At the time of writing only Claude Opus 4.7
actually honors the field; other Claude models will reject requests that
include it. Check the Anthropic release notes linked above for the current
list of supported models.
Integer shorthand
models:
opus:
provider: anthropic
model: claude-opus-4-7
task_budget: 128000 # total tokens for the whole task
thinking_budget: adaptive # works nicely together
Object form
Equivalent, and forward-compatible with future budget types:
models:
opus:
provider: anthropic
model: claude-opus-4-7
task_budget:
type: tokens # only "tokens" is supported today
total: 128000
Setting task_budget: 0 (or omitting the field) disables the feature — the
model falls back to the provider’s default behavior.
Like other inheritable model settings, task_budget can also be declared on a
provider definition and is
inherited by every model that references that provider.
See examples/task_budget.yaml for a complete example.
Interleaved Thinking
For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during model reasoning. It is auto-enabled whenever a thinking budget is configured:
models:
claude:
provider: anthropic
model: claude-sonnet-4-5
thinking_budget: 8192
# interleaved_thinking is auto-enabled when thinking_budget is set
provider_opts:
interleaved_thinking: false # disable if needed
Thinking Display (Anthropic)
For Anthropic Claude models, thinking_display controls whether thinking blocks are returned in responses when thinking is enabled. Claude Opus 4.7 hides thinking content by default (omitted); set this provider option to receive summarized thinking:
models:
opus-4-7:
provider: anthropic
model: claude-opus-4-7
thinking_budget: adaptive
provider_opts:
thinking_display: summarized # "summarized", "display", or "omitted"
See the Anthropic provider page for details.
Custom HTTP Headers
For OpenAI-compatible providers (openai, github-copilot, mistral, xai,
nebius, minimax, ollama, and any custom provider using the OpenAI API),
provider_opts.http_headers adds arbitrary HTTP headers to every outgoing
request:
models:
my_model:
provider: openai
model: gpt-4o
provider_opts:
http_headers:
X-Request-Source: docker-agent
X-Tenant-Id: my-team
Header names are matched case-insensitively. The github-copilot provider
automatically sets Copilot-Integration-Id: copilot-developer-cli — see the
GitHub Copilot provider page
for details.
Examples by Provider
models:
# OpenAI
gpt:
provider: openai
model: gpt-5
# Anthropic
claude:
provider: anthropic
model: claude-sonnet-4-5
max_tokens: 64000
# Google Gemini
gemini:
provider: google
model: gemini-3.5-flash
temperature: 0.5
# AWS Bedrock
bedrock:
provider: amazon-bedrock
model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
provider_opts:
region: us-east-1
# Docker Model Runner (local)
local:
provider: dmr
model: ai/qwen3
max_tokens: 8192
For detailed provider setup, see the Model Providers section.
Custom Endpoints
Use base_url to point to custom or self-hosted endpoints:
models:
# Azure OpenAI
azure_gpt:
provider: openai
model: gpt-4o
base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
token_key: AZURE_OPENAI_API_KEY
# Self-hosted vLLM
local_llama:
provider: openai # vLLM is OpenAI-compatible
model: meta-llama/Llama-3.2-3B-Instruct
base_url: http://localhost:8000/v1
# Proxy or gateway
proxied:
provider: openai
model: gpt-4o
base_url: https://proxy.internal.company.com/openai/v1
token_key: INTERNAL_API_KEY
See Local Models for more examples of custom endpoints.
Inheriting from Provider Definitions
Models can reference a named provider to inherit shared defaults. Model-level settings always take precedence:
providers:
my_anthropic:
provider: anthropic
token_key: MY_ANTHROPIC_KEY
max_tokens: 16384
thinking_budget: 8192
temperature: 0.5
models:
claude:
provider: my_anthropic
model: claude-sonnet-4-5
# Inherits max_tokens, thinking_budget, temperature from provider
claude_fast:
provider: my_anthropic
model: claude-haiku-4-5
thinking_budget: 1024 # Overrides provider default
See Provider Definitions for the full list of inheritable properties.