Model Configuration

Complete reference for defining models with providers, parameters, and reasoning settings.

Full Schema

models:
  model_name:
    provider: string # Required: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai
    model: string # Required: model identifier
    temperature: float # Optional: 0.0–1.0
    max_tokens: integer # Optional: response length limit
    top_p: float # Optional: 0.0–1.0
    frequency_penalty: float # Optional: 0.0–2.0
    presence_penalty: float # Optional: 0.0–2.0
    base_url: string # Optional: custom API endpoint
    token_key: string # Optional: env var for API token
    thinking_budget: string|int # Optional: reasoning effort
    parallel_tool_calls: boolean # Optional: allow parallel tool calls
    track_usage: boolean # Optional: track token usage
    routing: [list] # Optional: rule-based model routing
    provider_opts: # Optional: provider-specific options
      key: value
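
The only required properties are provider and model; everything else is optional. A minimal definition (assuming the provider's default API token environment variable, e.g. OPENAI_API_KEY for OpenAI, is set) looks like:

models:
  gpt:
    provider: openai
    model: gpt-4o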

Properties Reference

| Property | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai |
| model | string | Yes | Model name (e.g., gpt-4o, claude-sonnet-4-0, gemini-2.5-flash) |
| temperature | float | No | Randomness: 0.0 = deterministic, 1.0 = creative |
| max_tokens | int | No | Maximum response length in tokens |
| top_p | float | No | Nucleus sampling threshold (0.0–1.0) |
| frequency_penalty | float | No | Penalize repeated tokens (0.0–2.0) |
| presence_penalty | float | No | Encourage topic diversity (0.0–2.0) |
| base_url | string | No | Custom API endpoint URL (for self-hosted or proxied endpoints) |
| token_key | string | No | Environment variable name containing the API token (overrides the provider default) |
| thinking_budget | string/int | No | Reasoning effort or token budget; see Thinking Budget below |
| parallel_tool_calls | boolean | No | Allow the model to call multiple tools at once |
| track_usage | boolean | No | Track and report token usage for this model |
| routing | array | No | Rule-based routing to different models; see Model Routing |
| provider_opts | object | No | Provider-specific options (see provider pages) |
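
As a sketch of how the optional tuning parameters combine in one model definition (values are illustrative, not recommendations):

models:
  tuned:
    provider: openai
    model: gpt-4o
    temperature: 0.2
    max_tokens: 4096
    top_p: 0.9
    frequency_penalty: 0.3
    presence_penalty: 0.1
    parallel_tool_calls: true
    track_usage: true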

Thinking Budget

Control how much reasoning the model does before responding. This varies by provider:

OpenAI

Uses effort levels as strings:

models:
  gpt:
    provider: openai
    model: gpt-5-mini
    thinking_budget: low # minimal | low | medium | high

Anthropic

Uses an integer token budget (1024–32768):

models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384 # must be < max_tokens

Google Gemini 2.5

Uses an integer token budget. 0 disables, -1 lets the model decide:

models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1 # dynamic (default)

Google Gemini 3

Uses effort levels like OpenAI:

models:
  gemini3:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium # minimal | low | medium | high

Disabling Thinking

Works for all providers:

thinking_budget: none # or 0
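
In context, disabling thinking for a single model looks like this (a minimal sketch; the model name is illustrative):

models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: none # or 0 to disable thinking entirely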

ℹ️ Runtime Toggle: Even when thinking is disabled in the config, you can enable it during a session with the /think command in the TUI.

Interleaved Thinking

For Anthropic and Bedrock Claude models, interleaved thinking allows the model to make tool calls while it is still reasoning. It is enabled by default:

models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    # interleaved_thinking defaults to true
    provider_opts:
      interleaved_thinking: false # disable if needed
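
The same toggle should be available for Claude models on Bedrock. A sketch, assuming the amazon-bedrock provider honors the same interleaved_thinking key under provider_opts:

models:
  bedrock_claude:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1
      interleaved_thinking: false # assumption: same key as the anthropic provider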

Examples by Provider

models:
  # OpenAI
  gpt:
    provider: openai
    model: gpt-5-mini

  # Anthropic
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

  # Google Gemini
  gemini:
    provider: google
    model: gemini-2.5-flash
    temperature: 0.5

  # AWS Bedrock
  bedrock:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1

  # Docker Model Runner (local)
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192

For detailed provider setup, see the Model Providers section.

Custom Endpoints

Use base_url to point to custom or self-hosted endpoints:

models:
  # Azure OpenAI
  azure_gpt:
    provider: openai
    model: gpt-4o
    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
    token_key: AZURE_OPENAI_API_KEY

  # Self-hosted vLLM
  local_llama:
    provider: openai # vLLM is OpenAI-compatible
    model: meta-llama/Llama-3.2-3B-Instruct
    base_url: http://localhost:8000/v1

  # Proxy or gateway
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://proxy.internal.company.com/openai/v1
    token_key: INTERNAL_API_KEY

See Local Models for more examples of custom endpoints.