Chat Server
Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent.
Overview
The docker agent serve chat command starts an HTTP server that exposes one or
more agents through an OpenAI-compatible Chat Completions API at
/v1/chat/completions and /v1/models. Any client that already speaks the
OpenAI protocol — for example
Open WebUI, curl, the OpenAI
Python SDK, or LangChain — can drive a docker-agent agent without any custom
integration.
# Single agent — exposed as the model `root`
$ docker agent serve chat agent.yaml
# Multi-agent config — every agent in the team becomes a model
$ docker agent serve chat ./team.yaml
# Pick a specific agent from a multi-agent config
$ docker agent serve chat ./team.yaml --agent reviewer
# Run an agent straight from the registry
$ docker agent serve chat agentcatalog/pirate --listen 127.0.0.1:9090
# Require a Bearer token, sourced from an env var
$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
Use the chat server when you want to plug docker-agent into existing OpenAI-compatible tooling (chat UIs, IDE integrations, OpenAI SDK clients). Use the API server when you want full control over sessions, agent execution, tool-call confirmations, and streamed runtime events.
Endpoints
The OpenAI-compatible endpoints live under the /v1 prefix to match the
OpenAI API surface. The OpenAPI specification is served at the top level so it
can be discovered without authentication.
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List the agents that this server exposes as models |
| POST | /v1/chat/completions | Send messages and receive a completion (regular or streaming) |
| GET | /openapi.json | OpenAPI specification for the chat server |
The model identifier in POST /v1/chat/completions is the agent name.
For a single-agent config that’s typically root; for a multi-agent config,
each named agent becomes its own selectable model.
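For example, with the team.yaml config shown earlier, the reviewer agent is selected by sending its name as the model (the agent name and prompt here are illustrative):
$ curl http://127.0.0.1:8083/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "reviewer",
    "messages": [{"role": "user", "content": "Review this diff"}]
  }'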
Quick Start
# 1. Start the server
$ docker agent serve chat agent.yaml
Listening on 127.0.0.1:8083
OpenAI-compatible chat completions endpoint: http://127.0.0.1:8083/v1/chat/completions
# 2. List exposed agents (models)
$ curl http://127.0.0.1:8083/v1/models
{"object":"list","data":[{"id":"root","object":"model","owned_by":"docker-agent"}]}
# 3. Send a chat request
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "Hello!"}]
}'
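The response is a standard OpenAI chat.completion object. A typical response looks like this (abridged and illustrative; exact field values will differ):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "root",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you?"},
      "finish_reason": "stop"
    }
  ]
}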
Streaming
Set "stream": true in the request body to receive a Server-Sent Events
(SSE) stream of OpenAI-format chat.completion.chunk deltas:
$ curl -N http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "root",
"stream": true,
"messages": [{"role": "user", "content": "Stream a poem"}]
}'
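Each event carries one delta, and the stream ends with the [DONE] sentinel, following the OpenAI streaming protocol. Abridged, illustrative output:
data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{"role":"assistant","content":"Roses"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{"content":" are red"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]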
Drive it from the OpenAI Python SDK
Because the wire format is OpenAI-compatible, point any OpenAI client at the
chat server’s base_url and use the agent name as the model:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:8083/v1",
api_key="not-needed-when-no-api-key-flag", # required by the SDK, ignored if no auth
)
resp = client.chat.completions.create(
model="root",
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
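Streaming works the same way through the SDK: pass stream=True and iterate over the chunks.
stream = client.chat.completions.create(
    model="root",
    messages=[{"role": "user", "content": "Stream a poem"}],
    stream=True,
)
for chunk in stream:
    # Each chunk is a chat.completion.chunk; delta.content can be None
    # on the role-only first chunk and the finish chunk, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()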
Server-side Conversation Caching
By default the server is stateless: every request must contain the full
message history, exactly like OpenAI’s API. Enable server-side caching by
setting --conversations-max to a positive value, then send a stable
X-Conversation-Id header on each request:
$ docker agent serve chat agent.yaml --conversations-max 100 --conversation-ttl 30m
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Conversation-Id: my-thread-1' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "Remember my name is Alice"}]
}'
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Conversation-Id: my-thread-1' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "What is my name?"}]
}'
Cached conversations are evicted after --conversation-ttl of inactivity, or
when the cache hits --conversations-max items (oldest entries are evicted
first).
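Reusing the client from the SDK example above, the header can be attached per request with the SDK's standard extra_headers option, so only the newest user message needs to be sent:
resp = client.chat.completions.create(
    model="root",
    messages=[{"role": "user", "content": "What is my name?"}],
    # The server keys its cached history on this header, so earlier
    # turns in "my-thread-1" do not need to be resent.
    extra_headers={"X-Conversation-Id": "my-thread-1"},
)
print(resp.choices[0].message.content)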
Authentication
The chat server has no authentication by default. To require a Bearer
token, pass --api-key (literal value) or --api-key-env (name of an
environment variable that holds the value):
$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
Clients must then send an Authorization: Bearer <token> header on every
request to /v1/*. Both /v1/models and /v1/chat/completions are
protected once a key is set.
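For example, assuming the same token is exported in the client's shell:
$ curl http://127.0.0.1:8083/v1/models \
  -H "Authorization: Bearer $CHAT_BEARER_TOKEN"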
The default listen address is 127.0.0.1:8083. If you bind to a non-loopback address, always set --api-key or --api-key-env — there is no other authentication layer.
CORS
CORS is disabled by default. To allow a browser-based client to call the
server, set --cors-origin to the exact origin (scheme + host + port) that
should be allowed:
$ docker agent serve chat agent.yaml --cors-origin https://my-ui.example.com
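Assuming the server answers standard CORS preflights (browsers send one before a cross-origin JSON POST), you can sanity-check the configuration with an OPTIONS request and look for Access-Control-Allow-Origin in the response headers:
$ curl -i -X OPTIONS http://127.0.0.1:8083/v1/chat/completions \
  -H 'Origin: https://my-ui.example.com' \
  -H 'Access-Control-Request-Method: POST'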
CLI Flags
docker agent serve chat <agent-file>|<registry-ref> [flags]
| Flag | Default | Description |
|---|---|---|
| -a, --agent <name> | (all agents) | Name of the agent to expose. If omitted, every agent in the config is exposed as a separate model. |
| -l, --listen <addr> | 127.0.0.1:8083 | Address to listen on. |
| --cors-origin <origin> | (none) | Allowed CORS origin (e.g. https://example.com). Empty disables CORS. |
| --api-key <token> | (none) | Required Bearer token clients must present (Authorization: Bearer <token>). Empty disables auth. |
| --api-key-env <name> | (none) | Read the API key from this environment variable instead of the command line. |
| --max-request-size <bytes> | 1048576 (1 MiB) | Maximum request body size. |
| --request-timeout <dur> | 5m | Per-request timeout (covers model + tool calls + streaming). |
| --conversations-max <n> | 0 | Cache up to N conversations server-side, keyed by X-Conversation-Id. 0 disables caching; clients must resend history. |
| --conversation-ttl <dur> | 30m | Idle TTL after which a cached conversation is evicted. |
| --max-idle-runtimes <n> | 4 | Maximum number of idle runtimes pooled per agent. 0 disables pooling. |
All runtime configuration flags
(--working-dir, --env-from-file, --models-gateway, --hook-*, …) are
also accepted.
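For example, combining chat-server flags with runtime configuration (the paths here are illustrative):
$ docker agent serve chat agent.yaml \
    --listen 127.0.0.1:8083 \
    --working-dir ./workspace \
    --env-from-file ./secrets.env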
Open WebUI Integration
Open WebUI can talk to any OpenAI-compatible endpoint. To plug docker-agent in:
1. Start the chat server, optionally with auth:

   $ docker agent serve chat agent.yaml \
       --listen 127.0.0.1:8083 \
       --cors-origin http://localhost:3000 \
       --api-key-env OPEN_WEBUI_TOKEN

2. In Open WebUI, add an OpenAI-compatible connection:
   - API Base URL: http://127.0.0.1:8083/v1
   - API Key: the value of OPEN_WEBUI_TOKEN

3. Each agent in your config appears as a selectable model.
For the docker-agent–native HTTP API (sessions, tool-call confirmation, runtime events), see the API Server. For full CLI flag documentation, see the CLI Reference.