# Model Configuration
Complete reference for defining models with providers, parameters, and reasoning settings.
## Full Schema

```yaml
models:
  model_name:
    provider: string              # Required: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai
    model: string                 # Required: model identifier
    temperature: float            # Optional: 0.0–1.0
    max_tokens: integer           # Optional: response length limit
    top_p: float                  # Optional: 0.0–1.0
    frequency_penalty: float      # Optional: 0.0–2.0
    presence_penalty: float       # Optional: 0.0–2.0
    base_url: string              # Optional: custom API endpoint
    token_key: string             # Optional: env var for API token
    thinking_budget: string|int   # Optional: reasoning effort
    task_budget: int|object       # Optional: total task token budget (Anthropic)
    parallel_tool_calls: boolean  # Optional: allow parallel tool calls
    track_usage: boolean          # Optional: track token usage
    routing: [list]               # Optional: rule-based model routing
    provider_opts:                # Optional: provider-specific options
      key: value
```
## Properties Reference

| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | string | ✓ | Provider: `openai`, `anthropic`, `google`, `amazon-bedrock`, `dmr`, `mistral`, `xai` |
| `model` | string | ✓ | Model name (e.g., `gpt-4o`, `claude-sonnet-4-0`, `gemini-2.5-flash`) |
| `temperature` | float | ✗ | Randomness. 0.0 = deterministic, 1.0 = creative |
| `max_tokens` | int | ✗ | Maximum response length in tokens |
| `top_p` | float | ✗ | Nucleus sampling threshold |
| `frequency_penalty` | float | ✗ | Penalize repeated tokens (0.0–2.0) |
| `presence_penalty` | float | ✗ | Encourage topic diversity (0.0–2.0) |
| `base_url` | string | ✗ | Custom API endpoint URL (for self-hosted or proxied endpoints) |
| `token_key` | string | ✗ | Environment variable name containing the API token (overrides provider default) |
| `thinking_budget` | string/int | ✗ | Reasoning effort control |
| `task_budget` | int/object | ✗ | Total token budget for an agentic task (forwarded to Anthropic; see Task Budget) |
| `parallel_tool_calls` | boolean | ✗ | Allow the model to call multiple tools at once |
| `track_usage` | boolean | ✗ | Track and report token usage for this model |
| `routing` | array | ✗ | Rule-based routing to different models. See Model Routing. |
| `provider_opts` | object | ✗ | Provider-specific options (see provider pages) |
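Taken together, a model entry that exercises several of the optional sampling and tracking properties might look like the following sketch (the model key and parameter values here are illustrative, not recommendations):

```yaml
models:
  tuned_gpt:
    provider: openai
    model: gpt-4o
    temperature: 0.2        # near-deterministic output
    max_tokens: 4096        # cap response length
    top_p: 0.9              # nucleus sampling threshold
    frequency_penalty: 0.5  # discourage repeated tokens
    presence_penalty: 0.3   # encourage topic diversity
    parallel_tool_calls: true
    track_usage: true
```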
## Thinking Budget

Control how much reasoning the model does before responding. The format varies by provider:
### OpenAI

Uses effort levels as strings:

```yaml
models:
  gpt:
    provider: openai
    model: gpt-5-mini
    thinking_budget: low  # minimal | low | medium | high
```
### Anthropic

Uses an integer token budget (1024–32768):

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384  # must be < max_tokens
```
### Google Gemini 2.5

Uses an integer token budget. `0` disables thinking, `-1` lets the model decide:

```yaml
models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1  # dynamic (default)
```
### Google Gemini 3

Uses effort levels like OpenAI:

```yaml
models:
  gemini3:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium  # minimal | low | medium | high
```
### Disabling Thinking

Works for all providers:

```yaml
thinking_budget: none  # or 0
```
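In context, disabling thinking on a full model entry looks like this sketch (the model key is illustrative):

```yaml
models:
  fast_claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: none  # skip extended reasoning for low-latency calls
```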
## Task Budget

Anthropic-only.

`task_budget` caps the total number of tokens the model may spend across a multi-step agentic task, combining thinking, tool-call, and final output tokens. It lets long-running agents self-regulate effort without having to choose a tight per-call `max_tokens`.

It is forwarded to Anthropic's `output_config.task_budget` request field. docker-agent automatically attaches the required `task-budgets-2026-03-13` beta header whenever this field is set.

You can configure `task_budget` on any Claude model; docker-agent never gates it by model name. At the time of writing, only Claude Opus 4.7 actually honors the field; other Claude models will reject requests that include it. Check the Anthropic release notes for the current list of supported models.
### Integer shorthand

```yaml
models:
  opus:
    provider: anthropic
    model: claude-opus-4-7
    task_budget: 128000        # total tokens for the whole task
    thinking_budget: adaptive  # works nicely together
```
### Object form

Equivalent, and forward-compatible with future budget types:

```yaml
models:
  opus:
    provider: anthropic
    model: claude-opus-4-7
    task_budget:
      type: tokens  # only "tokens" is supported today
      total: 128000
```
Setting `task_budget: 0` (or omitting the field) disables the feature; the model falls back to the provider's default behavior.

Like other inheritable model settings, `task_budget` can also be declared on a provider definition and is inherited by every model that references that provider.
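As a sketch, declaring the budget once on a provider definition gives every model referencing it the same cap, while a model-level value still wins (the provider and model keys here are illustrative):

```yaml
providers:
  budgeted_anthropic:
    provider: anthropic
    task_budget: 128000  # inherited by every model below

models:
  opus:
    provider: budgeted_anthropic
    model: claude-opus-4-7  # inherits task_budget: 128000
  opus_tight:
    provider: budgeted_anthropic
    model: claude-opus-4-7
    task_budget: 64000  # model-level setting overrides the provider default
```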
## Interleaved Thinking

For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during model reasoning. This is enabled by default:

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    # interleaved_thinking defaults to true
    provider_opts:
      interleaved_thinking: false  # disable if needed
```
## Thinking Display (Anthropic)

For Anthropic Claude models, `thinking_display` controls whether thinking blocks are returned in responses when thinking is enabled. Claude Opus 4.7 hides thinking content by default (`omitted`); set this provider option to receive summarized thinking:

```yaml
models:
  opus-4-7:
    provider: anthropic
    model: claude-opus-4-7
    thinking_budget: adaptive
    provider_opts:
      thinking_display: summarized  # "summarized", "display", or "omitted"
```

See the Anthropic provider page for details.
## Examples by Provider

```yaml
models:
  # OpenAI
  gpt:
    provider: openai
    model: gpt-5-mini

  # Anthropic
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

  # Google Gemini
  gemini:
    provider: google
    model: gemini-2.5-flash
    temperature: 0.5

  # AWS Bedrock
  bedrock:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1

  # Docker Model Runner (local)
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
```

For detailed provider setup, see the Model Providers section.
## Custom Endpoints

Use `base_url` to point to custom or self-hosted endpoints:

```yaml
models:
  # Azure OpenAI
  azure_gpt:
    provider: openai
    model: gpt-4o
    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
    token_key: AZURE_OPENAI_API_KEY

  # Self-hosted vLLM
  local_llama:
    provider: openai  # vLLM is OpenAI-compatible
    model: meta-llama/Llama-3.2-3B-Instruct
    base_url: http://localhost:8000/v1

  # Proxy or gateway
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://proxy.internal.company.com/openai/v1
    token_key: INTERNAL_API_KEY
```

See Local Models for more examples of custom endpoints.
## Inheriting from Provider Definitions

Models can reference a named provider to inherit shared defaults. Model-level settings always take precedence:

```yaml
providers:
  my_anthropic:
    provider: anthropic
    token_key: MY_ANTHROPIC_KEY
    max_tokens: 16384
    thinking_budget: high
    temperature: 0.5

models:
  claude:
    provider: my_anthropic
    model: claude-sonnet-4-5
    # Inherits max_tokens, thinking_budget, temperature from provider

  claude_fast:
    provider: my_anthropic
    model: claude-haiku-4-5
    thinking_budget: low  # Overrides provider default
```

See Provider Definitions for the full list of inheritable properties.