# Model Configuration
Complete reference for defining models with providers, parameters, and reasoning settings.
## Full Schema

```yaml
models:
  model_name:
    provider: string              # Required: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai
    model: string                 # Required: model identifier
    temperature: float            # Optional: 0.0–1.0
    max_tokens: integer           # Optional: response length limit
    top_p: float                  # Optional: 0.0–1.0
    frequency_penalty: float      # Optional: 0.0–2.0
    presence_penalty: float       # Optional: 0.0–2.0
    base_url: string              # Optional: custom API endpoint
    token_key: string             # Optional: env var for API token
    thinking_budget: string|int   # Optional: reasoning effort
    parallel_tool_calls: boolean  # Optional: allow parallel tool calls
    track_usage: boolean          # Optional: track token usage
    routing: [list]               # Optional: rule-based model routing
    provider_opts:                # Optional: provider-specific options
      key: value
```
## Properties Reference

| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | string | ✓ | Provider: `openai`, `anthropic`, `google`, `amazon-bedrock`, `dmr`, `mistral`, `xai` |
| `model` | string | ✓ | Model name (e.g., `gpt-4o`, `claude-sonnet-4-0`, `gemini-2.5-flash`) |
| `temperature` | float | ✗ | Randomness. 0.0 = deterministic, 1.0 = creative |
| `max_tokens` | int | ✗ | Maximum response length in tokens |
| `top_p` | float | ✗ | Nucleus sampling threshold |
| `frequency_penalty` | float | ✗ | Penalize repeated tokens (0.0–2.0) |
| `presence_penalty` | float | ✗ | Encourage topic diversity (0.0–2.0) |
| `base_url` | string | ✗ | Custom API endpoint URL (for self-hosted or proxied endpoints) |
| `token_key` | string | ✗ | Environment variable name containing the API token (overrides the provider default) |
| `thinking_budget` | string/int | ✗ | Reasoning effort control |
| `parallel_tool_calls` | boolean | ✗ | Allow the model to call multiple tools at once |
| `track_usage` | boolean | ✗ | Track and report token usage for this model |
| `routing` | array | ✗ | Rule-based routing to different models. See Model Routing. |
| `provider_opts` | object | ✗ | Provider-specific options (see provider pages) |
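As a quick illustration, a typical model definition combines only a handful of these properties. This is a sketch; the model key `assistant` and the parameter values are placeholders, not recommendations:

```yaml
models:
  assistant:
    provider: openai
    model: gpt-4o
    temperature: 0.2           # low randomness for consistent answers
    max_tokens: 4096           # cap response length
    token_key: OPENAI_API_KEY  # read the API token from this env var
    track_usage: true          # report token usage for this model
```

Everything beyond `provider` and `model` is optional; unset properties fall back to provider defaults.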
## Thinking Budget
Control how much reasoning the model does before responding. This varies by provider:
### OpenAI

Uses effort levels as strings:

```yaml
models:
  gpt:
    provider: openai
    model: gpt-5-mini
    thinking_budget: low # minimal | low | medium | high
```
### Anthropic

Uses an integer token budget (1024–32768):

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384 # must be < max_tokens
```
### Google Gemini 2.5

Uses an integer token budget. `0` disables thinking, `-1` lets the model decide:

```yaml
models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1 # dynamic (default)
```
### Google Gemini 3

Uses effort levels like OpenAI:

```yaml
models:
  gemini3:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium # minimal | low | medium | high
```
### Disabling Thinking

Works for all providers:

```yaml
thinking_budget: none # or 0
```
Even when thinking is disabled in the config, you can still enable it during a session with the `/think` command in the TUI.
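Putting this together, a model that ships with thinking off but remains toggleable per session might look like the following (reusing the `claude` model definition from above):

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: none # thinking off by default; /think in the TUI re-enables it for a session
```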
## Interleaved Thinking

For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during model reasoning. It is enabled by default:

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    # interleaved_thinking defaults to true
    provider_opts:
      interleaved_thinking: false # disable if needed
```
## Examples by Provider

```yaml
models:
  # OpenAI
  gpt:
    provider: openai
    model: gpt-5-mini

  # Anthropic
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

  # Google Gemini
  gemini:
    provider: google
    model: gemini-2.5-flash
    temperature: 0.5

  # AWS Bedrock
  bedrock:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1

  # Docker Model Runner (local)
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
```
For detailed provider setup, see the Model Providers section.
## Custom Endpoints

Use `base_url` to point to custom or self-hosted endpoints:

```yaml
models:
  # Azure OpenAI
  azure_gpt:
    provider: openai
    model: gpt-4o
    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
    token_key: AZURE_OPENAI_API_KEY

  # Self-hosted vLLM
  local_llama:
    provider: openai # vLLM is OpenAI-compatible
    model: meta-llama/Llama-3.2-3B-Instruct
    base_url: http://localhost:8000/v1

  # Proxy or gateway
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://proxy.internal.company.com/openai/v1
    token_key: INTERNAL_API_KEY
```
See Local Models for more examples of custom endpoints.