Docker Model Runner

Run AI models locally with Docker — no API keys, no costs, full data privacy.

Overview

Docker Model Runner (DMR) lets you run open-source AI models directly on your machine. Models run in Docker, so there’s no API key needed and no data leaves your computer.

💡 No API key needed

DMR runs models locally — your data never leaves your machine. Great for development, sensitive data, or offline use.

Prerequisites

Configuration

Inline

agents:
  root:
    model: dmr/ai/qwen3

Named Model

models:
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192

Available Models

Any model available through Docker Model Runner can be used. Common options:

Model Description
ai/qwen3 Qwen 3 — versatile, good for coding and general tasks
ai/llama3.2 Llama 3.2 — Meta’s open-source model

Runtime Flags

Pass flags to the underlying inference runtime (e.g., llama.cpp) using provider_opts.runtime_flags:

models:
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192
    provider_opts:
      runtime_flags: ["--ngl=33", "--top-p=0.9"]

Runtime flags also accept a single string:

provider_opts:
  runtime_flags: "--ngl=33 --top-p=0.9"

Parameter Mapping

cagent model config fields map to llama.cpp flags automatically:

Config llama.cpp Flag
temperature --temp
top_p --top-p
frequency_penalty --frequency-penalty
presence_penalty --presence-penalty
max_tokens --context-size

runtime_flags always take priority over derived flags on conflict.

Speculative Decoding

Use a smaller draft model to predict tokens ahead for faster inference:

models:
  fast-local:
    provider: dmr
    model: ai/qwen3:14B
    max_tokens: 8192
    provider_opts:
      speculative_draft_model: ai/qwen3:0.6B-F16
      speculative_num_tokens: 16
      speculative_acceptance_rate: 0.8

Custom Endpoint

If base_url is omitted, cagent auto-discovers the DMR endpoint. To set manually:

models:
  local:
    provider: dmr
    model: ai/qwen3
    base_url: http://127.0.0.1:12434/engines/llama.cpp/v1

Troubleshooting