Docker Model Runner
Run AI models locally with Docker — no API keys, no costs, full data privacy.
Overview
Docker Model Runner (DMR) lets you run open-source AI models directly on your machine. Models run in Docker, so there’s no API key needed and no data leaves your computer.
DMR runs models locally — your data never leaves your machine. Great for development, sensitive data, or offline use.
Prerequisites
- Docker Desktop with the Model Runner feature enabled
- Verify with:
docker model status --json
Configuration
Inline
agents:
root:
model: dmr/ai/qwen3
Named Model
models:
local:
provider: dmr
model: ai/qwen3
max_tokens: 8192
Available Models
Any model available through Docker Model Runner can be used. Common options:
| Model | Description |
|---|---|
ai/qwen3 |
Qwen 3 — versatile, good for coding and general tasks |
ai/llama3.2 |
Llama 3.2 — Meta’s open-source model |
Runtime Flags
Pass flags to the underlying inference runtime (e.g., llama.cpp) using provider_opts.runtime_flags:
models:
local:
provider: dmr
model: ai/qwen3
max_tokens: 8192
provider_opts:
runtime_flags: ["--ngl=33", "--top-p=0.9"]
Runtime flags also accept a single string:
provider_opts:
runtime_flags: "--ngl=33 --top-p=0.9"
Parameter Mapping
cagent model config fields map to llama.cpp flags automatically:
| Config | llama.cpp Flag |
|---|---|
temperature |
--temp |
top_p |
--top-p |
frequency_penalty |
--frequency-penalty |
presence_penalty |
--presence-penalty |
max_tokens |
--context-size |
runtime_flags always take priority over derived flags on conflict.
Speculative Decoding
Use a smaller draft model to predict tokens ahead for faster inference:
models:
fast-local:
provider: dmr
model: ai/qwen3:14B
max_tokens: 8192
provider_opts:
speculative_draft_model: ai/qwen3:0.6B-F16
speculative_num_tokens: 16
speculative_acceptance_rate: 0.8
Custom Endpoint
If base_url is omitted, cagent auto-discovers the DMR endpoint. To set manually:
models:
local:
provider: dmr
model: ai/qwen3
base_url: http://127.0.0.1:12434/engines/llama.cpp/v1
Troubleshooting
- Plugin not found: Ensure Docker Model Runner is enabled in Docker Desktop. cagent will fall back to the default URL.
- Endpoint empty: Verify the Model Runner is running with
docker model status --json. - Performance: Use
runtime_flagsto tune GPU layers (--ngl) and thread count (--threads).