Chat Server
Expose your agents through an OpenAI-compatible Chat Completions API so any tool that already speaks OpenAI can drive a docker-agent agent.
Overview
The docker agent serve chat command starts an HTTP server that exposes one or
more agents through an OpenAI-compatible Chat Completions API at
/v1/chat/completions and /v1/models. Any client that already speaks the
OpenAI protocol — for example
Open WebUI, curl, the OpenAI
Python SDK, or LangChain — can drive a docker-agent agent without any custom
integration.
# Single agent — exposed as the model `root`
$ docker agent serve chat agent.yaml
# Multi-agent config — every agent in the team becomes a model
$ docker agent serve chat ./team.yaml
# Pick a specific agent from a multi-agent config
$ docker agent serve chat ./team.yaml --agent reviewer
# Run an agent straight from the registry
$ docker agent serve chat agentcatalog/pirate --listen 127.0.0.1:9090
# Require a Bearer token, sourced from an env var
$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
Use the chat server when you want to plug docker-agent into existing OpenAI-compatible tooling (chat UIs, IDE integrations, OpenAI SDK clients). Use the API server when you want full control over sessions, agent execution, tool-call confirmations, and streamed runtime events.
Endpoints
The OpenAI-compatible endpoints live under the /v1 prefix to match the
OpenAI API surface. The OpenAPI specification is served at the top level so it
can be discovered without authentication.
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List the agents that this server exposes as models |
| POST | /v1/chat/completions | Send messages and receive a completion (regular or streaming) |
| GET | /openapi.json | OpenAPI specification for the chat server |
The model identifier in POST /v1/chat/completions is the agent name.
For a single-agent config that’s typically root; for a multi-agent config,
each named agent becomes its own selectable model.
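For example, with the team.yaml config shown earlier, the reviewer agent is selected by sending its name as the model (the agent name and prompt here are illustrative):
$ curl http://127.0.0.1:8083/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "reviewer",
    "messages": [{"role": "user", "content": "Review this diff"}]
  }'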
Quick Start
# 1. Start the server
$ docker agent serve chat agent.yaml
Listening on 127.0.0.1:8083
OpenAI-compatible chat completions endpoint: http://127.0.0.1:8083/v1/chat/completions
# 2. List exposed agents (models)
$ curl http://127.0.0.1:8083/v1/models
{"object":"list","data":[{"id":"root","object":"model","owned_by":"docker-agent"}]}
# 3. Send a chat request
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "Hello!"}]
}'
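The response is a standard OpenAI chat.completion object. A typical response looks like this (abridged and illustrative; exact field values will differ):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "root",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you?"},
      "finish_reason": "stop"
    }
  ]
}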
Streaming
Set "stream": true in the request body to receive a Server-Sent Events
(SSE) stream of OpenAI-format chat.completion.chunk deltas:
$ curl -N http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "root",
"stream": true,
"messages": [{"role": "user", "content": "Stream a poem"}]
}'
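Each event carries one delta, and the stream ends with the [DONE] sentinel, following the OpenAI streaming protocol. Abridged, illustrative output:
data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{"role":"assistant","content":"Roses"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{"content":" are red"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","model":"root","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]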
Drive it from the OpenAI Python SDK
Because the wire format is OpenAI-compatible, point any OpenAI client at the
chat server’s base_url and use the agent name as the model:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:8083/v1",
api_key="not-needed-when-no-api-key-flag", # required by the SDK, ignored if no auth
)
resp = client.chat.completions.create(
model="root",
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
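Streaming works the same way through the SDK: pass stream=True and iterate over the chunks.
stream = client.chat.completions.create(
    model="root",
    messages=[{"role": "user", "content": "Stream a poem"}],
    stream=True,
)
for chunk in stream:
    # Each chunk is a chat.completion.chunk; delta.content can be None
    # on the role-only first chunk and the finish chunk, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()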
Server-side Conversation Caching
By default the server is stateless: every request must contain the full
message history, exactly like OpenAI’s API. Enable server-side caching by
setting --conversations-max to a positive value, then send a stable
X-Conversation-Id header on each request:
$ docker agent serve chat agent.yaml --conversations-max 100 --conversation-ttl 30m
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Conversation-Id: my-thread-1' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "Remember my name is Alice"}]
}'
$ curl http://127.0.0.1:8083/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Conversation-Id: my-thread-1' \
-d '{
"model": "root",
"messages": [{"role": "user", "content": "What is my name?"}]
}'
Cached conversations are evicted after --conversation-ttl of inactivity, or
when the cache hits --conversations-max items (oldest entries are evicted
first).
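Reusing the client from the SDK example above, the header can be attached per request with the SDK's standard extra_headers option, so only the newest user message needs to be sent:
resp = client.chat.completions.create(
    model="root",
    messages=[{"role": "user", "content": "What is my name?"}],
    # The server keys its cached history on this header, so earlier
    # turns in "my-thread-1" do not need to be resent.
    extra_headers={"X-Conversation-Id": "my-thread-1"},
)
print(resp.choices[0].message.content)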
Authentication
The chat server has no authentication by default. To require a Bearer
token, pass --api-key (literal value) or --api-key-env (name of an
environment variable that holds the value):
$ docker agent serve chat agent.yaml --api-key-env CHAT_BEARER_TOKEN
Clients must then send an Authorization: Bearer <token> header on every
request to /v1/*. Both /v1/models and /v1/chat/completions are
protected once a key is set.
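For example, assuming the same token is exported in the client's shell:
$ curl http://127.0.0.1:8083/v1/models \
  -H "Authorization: Bearer $CHAT_BEARER_TOKEN"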
The default listen address is 127.0.0.1:8083. If you bind to a non-loopback address, always set --api-key or --api-key-env — there is no other authentication layer.
CORS
CORS is disabled by default. To allow a browser-based client to call the
server, set --cors-origin to the exact origin (scheme + host + port) that
should be allowed:
$ docker agent serve chat agent.yaml --cors-origin https://my-ui.example.com
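Assuming the server answers standard CORS preflights (browsers send one before a cross-origin JSON POST), you can sanity-check the configuration with an OPTIONS request and look for Access-Control-Allow-Origin in the response headers:
$ curl -i -X OPTIONS http://127.0.0.1:8083/v1/chat/completions \
  -H 'Origin: https://my-ui.example.com' \
  -H 'Access-Control-Request-Method: POST'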
CLI Flags
docker agent serve chat <agent-file>|<registry-ref> [flags]
| Flag | Default | Description |
|---|---|---|
| -a, --agent <name> | (all agents) | Name of the agent to expose. If omitted, every agent in the config is exposed as a separate model. |
| -l, --listen <addr> | 127.0.0.1:8083 | Address to listen on. |
| --cors-origin <origin> | (none) | Allowed CORS origin (e.g. https://example.com). Empty disables CORS. |
| --api-key <token> | (none) | Required Bearer token clients must present (Authorization: Bearer <token>). Empty disables auth. |
| --api-key-env <name> | (none) | Read the API key from this environment variable instead of the command line. |
| --max-request-size <bytes> | 1048576 (1 MiB) | Maximum request body size. |
| --request-timeout <dur> | 5m | Per-request timeout (covers model + tool calls + streaming). |
| --conversations-max <n> | 0 | Cache up to N conversations server-side, keyed by X-Conversation-Id. 0 disables caching; clients must resend history. |
| --conversation-ttl <dur> | 30m | Idle TTL after which a cached conversation is evicted. |
| --max-idle-runtimes <n> | 4 | Maximum number of idle runtimes pooled per agent. 0 disables pooling. |
All runtime configuration flags
(--working-dir, --env-from-file, --models-gateway, --hook-*, …) are
also accepted.
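For example, combining chat-server flags with runtime configuration (the paths here are illustrative):
$ docker agent serve chat agent.yaml \
    --listen 127.0.0.1:8083 \
    --working-dir ./workspace \
    --env-from-file ./secrets.env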
Open WebUI Integration
Open WebUI can talk to any OpenAI-compatible endpoint. To plug docker-agent in:
1. Start the chat server, optionally with auth:

   $ docker agent serve chat agent.yaml \
       --listen 127.0.0.1:8083 \
       --cors-origin http://localhost:3000 \
       --api-key-env OPEN_WEBUI_TOKEN

2. In Open WebUI, add an OpenAI-compatible connection:
   - API Base URL: http://127.0.0.1:8083/v1
   - API Key: the value of OPEN_WEBUI_TOKEN

3. Each agent in your config appears as a selectable model.
For the docker-agent–native HTTP API (sessions, tool-call confirmation, runtime events), see the API Server. For full CLI flag documentation, see the CLI Reference.