# Troubleshooting

Common issues and how to resolve them when working with cagent.
## Common Errors

### Context Window Exceeded

Error message: `context_length_exceeded` or similar.

- Use `/compact` in the TUI to summarize and reduce conversation history
- Set `num_history_items` in agent config to limit messages sent to the model
- Switch to a model with a larger context window (e.g., Claude 200K, Gemini 2M)
- Break large tasks into smaller conversations
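For example, a minimal config sketch that caps the history sent per request; the field placement follows the description above and may need adjusting for your setup:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    # Send only the most recent 20 messages to the model,
    # keeping each request well under the context limit.
    num_history_items: 20
```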
### Max Iterations Reached

The agent hit its `max_iterations` limit without completing the task.

- Increase `max_iterations` in agent config (default is unlimited, but many agents set 20-50)
- Check if the agent is stuck in a loop (enable `--debug` to see tool calls)
- Break complex tasks into smaller steps
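As an illustration, a sketch of raising the limit in the agent config (field placement per the description above):

```yaml
agents:
  root:
    model: openai/gpt-4o
    # Allow more reasoning/tool-call rounds before giving up.
    max_iterations: 50
```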
### Model Fallback Triggered

When the primary model fails, cagent automatically switches to fallback models. Look for log messages like "Switching to fallback model".

- **429 errors**: Rate limited — the cooldown period keeps using the fallback
- **5xx errors**: Server issues — retries with exponential backoff first, then falls back
- **4xx errors**: Client errors — skips directly to the next model
Configure fallback behavior in your agent config:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    fallback:
      models: [openai/gpt-4o, openai/gpt-4o-mini]
      retries: 2   # retries per model for 5xx errors
      cooldown: 1m # how long to stick with the fallback after a 429
```
## Debug Mode

The first step for any issue is enabling debug logging. This provides detailed information about what cagent is doing internally.

```shell
# Enable debug logging (writes to ~/.cagent/cagent.debug.log)
$ cagent run config.yaml --debug

# Write debug logs to a custom file
$ cagent run config.yaml --debug --log-file ./debug.log

# Enable OpenTelemetry tracing for deeper analysis
$ cagent run config.yaml --otel
```

Always enable `--debug` when reporting issues. The log file contains detailed traces of API calls, tool executions, and agent interactions.
## Agent Not Responding

### API keys not set

Each model provider requires its own API key as an environment variable:

| Provider | Environment Variable |
|---|---|
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Google Gemini | `GOOGLE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| xAI | `XAI_API_KEY` |
| AWS Bedrock | `AWS_BEARER_TOKEN_BEDROCK` or AWS credentials chain |

```shell
# Verify your keys are set
$ env | grep API_KEY
```
### Incorrect model name

Model names must match the provider's naming exactly. Common mistakes:

- Using `gpt-4` instead of `gpt-4o`
- Using a deprecated model name
- Model references are case-sensitive: `openai/gpt-4o` ≠ `openai/GPT-4o`
### Network connectivity

If the agent hangs or times out, check that you can reach the provider's API endpoint. Firewalls, VPNs, or proxy settings may block requests.
## Tool Execution Failures

### MCP tools not found or failing

- Ensure the MCP tool command is installed and on your `PATH`
- Check file permissions — tools need to be executable
- Test MCP tools independently before integrating with cagent
- For Docker-based MCP tools (`ref: docker:*`), ensure Docker Desktop is running
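For orientation, a sketch of the two MCP toolset shapes involved; the toolset `type` name and the server commands here are assumptions, not verbatim from this page:

```yaml
agents:
  root:
    model: openai/gpt-4o
    toolsets:
      # stdio MCP server: the command must be installed and on PATH
      - type: mcp
        command: my-mcp-server   # placeholder binary
      # Docker-based MCP tool: requires Docker Desktop to be running
      - type: mcp
        ref: docker:duckduckgo   # illustrative reference
```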
### Filesystem / shell tool errors

- Verify the agent has the correct toolset configured (`type: filesystem`, `type: shell`)
- Check that the working directory exists and is accessible
- On macOS, ensure the terminal has the necessary permissions (e.g., Full Disk Access)
### Tool lifecycle issues

MCP tools using stdio transport must complete the initialization handshake before becoming available. If tools fail silently:

- Enable `--debug` and look for MCP protocol messages in the log
- Check that the MCP server process starts and responds to `initialize`
- Verify environment variables required by the tool are set (check `env` and `env_file` in the toolset config)
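A sketch of wiring tool environment variables through the toolset config; the `type` value, the variable name, and the exact `env` entry format are assumptions:

```yaml
agents:
  root:
    model: openai/gpt-4o
    toolsets:
      - type: mcp
        command: my-mcp-server     # placeholder binary
        env:
          MY_TOOL_API_KEY: secret  # illustrative variable
        env_file: ./.env           # or load several from a file
```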
## Configuration Errors

### YAML syntax issues

cagent validates config at startup and reports errors with line numbers. Common problems:

- Incorrect indentation (YAML is whitespace-sensitive)
- Missing quotes around values containing special characters (`:`, `#`, `{`, `}`)
- Using tabs instead of spaces
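The quoting rule in practice (keys and values are illustrative):

```yaml
# Unquoted, the second colon would be a YAML parse error; quoted, it is fine:
instruction: "Answer briefly: no preamble"
# Unquoted, '#general' would be parsed as a comment, leaving a null value:
channel: "#general"
```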
### Missing references

- All agents in `sub_agents` must be defined in the `agents` section
- Named model references must exist in the `models` section (or use the inline format like `openai/gpt-4o`)
- RAG source names referenced by agents must be defined in the `rag` section
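For example, a named model defined once and referenced by an agent; the exact fields of a model entry (`provider`, `model`) are illustrative here:

```yaml
models:
  fast:
    provider: openai
    model: gpt-4o-mini
agents:
  root:
    model: fast            # named reference into the models section
  helper:
    model: openai/gpt-4o   # inline provider/model form
```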
### Toolset validation

- The `path` field is only valid for `memory` toolsets
- MCP toolsets need either `command` (stdio), `remote` (SSE/HTTP), or `ref` (Docker)
- Provider names must be one of: `openai`, `anthropic`, `google`, `amazon-bedrock`, `dmr`, etc.

Use the JSON schema in your editor for real-time config validation and autocompletion.
## Session & Connectivity Issues

### Port conflicts

When running cagent as an API server or MCP server, ensure the port is not already in use:

```shell
# Check if port 8080 is in use
$ lsof -i :8080

# Use a different port
$ cagent api config.yaml --listen :9090
```
### MCP endpoint accessibility

For remote MCP servers, verify the endpoint is reachable:

```shell
# Test SSE endpoint
$ curl -v https://mcp-server.example.com/sse
```
### Multi-tenant isolation

In API server mode, each client gets isolated sessions. If sessions are mixing up:

- Verify client IDs are unique per connection
- Check session timeouts and cleanup in debug logs
## Performance Issues

### High memory usage

- Large context windows (64K+ tokens) consume significant memory — consider reducing `max_tokens`
- Use `num_history_items` in agent config to limit conversation history
- For DMR (local models), tune `runtime_flags` for your hardware (e.g., `--ngl` for GPU layers)
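For DMR, a sketch of passing runtime flags; the model name and the list format of `runtime_flags` are illustrative assumptions:

```yaml
models:
  local:
    provider: dmr
    model: ai/qwen2.5               # illustrative local model
    runtime_flags: ["--ngl", "33"]  # offload 33 layers to the GPU
```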
### Slow responses

- Check if MCP tools are adding latency (visible in debug logs)
- Use the `/cost` command in the TUI to see token usage and identify expensive interactions
- For DMR, consider enabling speculative decoding for faster inference
### Tool resource leaks

Monitor for tools that don't clean up properly — check debug logs for MCP server start/stop lifecycle events. Orphaned tool processes can consume system resources.
## Agent Store Issues

### Pull / push failures

```shell
# Test registry connectivity
$ docker pull docker.io/username/agent:latest

# Verify the pulled agent content
$ cagent pull docker.io/username/agent:latest
```
### Agent content issues

- Ensure the pushed YAML is valid — run `cagent run` locally before pushing
- Check that referenced resources (MCP tools, files) are available on the target machine
- For auto-refresh (`--pull-interval`), verify the registry is accessible from the server
## Log Analysis

When reviewing debug logs, search for these key patterns:

| Log Pattern | What It Indicates |
|---|---|
| `"Starting runtime stream"` | Agent execution beginning |
| `"Tool call"` | A tool is being executed |
| `"Tool call result"` | Tool execution completed |
| `"Stream stopped"` | Agent finished processing |
| `HTTP 429` | Rate limiting — consider adding a fallback model |
| `context canceled` | Operation was interrupted (timeout or user cancel) |
| `[RAG Manager]` | RAG retrieval operations |
| `[Reranker]` | Reranking operations |
If these steps don't resolve your issue, file a bug on the GitHub issue tracker with your debug log attached, or ask on Slack.