Troubleshooting

Common issues and how to resolve them when working with cagent.

Common Errors

Context Window Exceeded

Error message: context_length_exceeded or similar.

Max Iterations Reached

The agent hit its max_iterations limit without completing the task.

Model Fallback Triggered

When the primary model fails, cagent automatically switches to fallback models. Look for log messages like "Switching to fallback model".

Configure fallback behavior in your agent config:

agents:
  root:
    model: anthropic/claude-sonnet-4-0
    fallback:
      models: [openai/gpt-4o, openai/gpt-4o-mini]
      retries: 2 # retries per model for 5xx errors
      cooldown: 1m # how long to stick with fallback after 429
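
To see why a fallback was triggered, it can help to read the log lines that precede each switch. The following is a minimal sketch, assuming the run was started with --debug so that the log exists at the default path ~/.cagent/cagent.debug.log. The function name fallback_events is hypothetical and not part of cagent.

```shell
# Sketch: show each fallback switch in a cagent debug log together with
# the lines leading up to it. fallback_events is a hypothetical helper.
fallback_events() {
  log="${1:-$HOME/.cagent/cagent.debug.log}"  # default path used by --debug
  before="${2:-5}"                            # context lines to show before each match
  # -n prefixes line numbers; -B prints the preceding lines, which usually
  # include the error that caused the switch.
  grep -n -B "$before" -- "Switching to fallback model" "$log"
}
```

Call it as `fallback_events ./debug.log 10` when the log was written with --log-file. No output means the log contains no fallback switches.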

Debug Mode

Start any investigation by enabling debug logging, which records in detail what cagent is doing internally.

# Enable debug logging (writes to ~/.cagent/cagent.debug.log)
$ cagent run config.yaml --debug

# Write debug logs to a custom file
$ cagent run config.yaml --debug --log-file ./debug.log

# Enable OpenTelemetry tracing for deeper analysis
$ cagent run config.yaml --otel

💡 Tip

Always enable --debug when reporting issues. The log file contains detailed traces of API calls, tool executions, and agent interactions.

Agent Not Responding

API keys not set

Each model provider requires its own API key as an environment variable:

Provider         Environment Variable
OpenAI           OPENAI_API_KEY
Anthropic        ANTHROPIC_API_KEY
Google Gemini    GOOGLE_API_KEY
Mistral          MISTRAL_API_KEY
xAI              XAI_API_KEY
AWS Bedrock      AWS_BEARER_TOKEN_BEDROCK or AWS credentials chain

# Verify your keys are set
$ env | grep -E 'API_KEY|AWS_BEARER_TOKEN_BEDROCK'
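
Listing the environment prints the key values themselves, which is awkward in a shared terminal or a bug report. As an alternative, here is a small sketch that reports only whether each variable from the table above is set. The function name check_provider_keys is hypothetical.

```shell
# Sketch: report which provider credentials are present without echoing
# their values. check_provider_keys is a hypothetical helper.
check_provider_keys() {
  for name in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY \
              MISTRAL_API_KEY XAI_API_KEY AWS_BEARER_TOKEN_BEDROCK; do
    # printenv prints nothing for a variable that is not exported, so
    # unset and empty variables are both reported as missing.
    if [ -n "$(printenv "$name")" ]; then
      echo "$name: set"
    else
      echo "$name: missing"
    fi
  done
}

check_provider_keys
```

Only the providers your config actually uses need to show as set. For AWS Bedrock, a missing AWS_BEARER_TOKEN_BEDROCK is not necessarily a problem, because the AWS credentials chain can be used instead.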

Incorrect model name

Model names must match the provider’s naming exactly. Common mistakes:

Network connectivity

If the agent hangs or times out, check that you can reach the provider’s API endpoint. Firewalls, VPNs, or proxy settings may block requests.
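
A quick way to separate network problems from configuration problems is to request the provider's endpoint directly with curl. The sketch below treats any HTTP response, even 401 or 404, as proof that the network path works. The function name probe_endpoint is hypothetical, and the URL you pass is your own choice.

```shell
# Sketch: check whether an HTTPS endpoint is reachable from this machine.
# probe_endpoint is a hypothetical helper, not part of cagent.
probe_endpoint() {
  url="$1"
  # -s silences progress output, -o /dev/null discards the body,
  # -w prints only the HTTP status code, --max-time bounds the wait.
  if code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" 2>/dev/null); then
    echo "$url reachable (HTTP $code)"
  else
    # curl exits non-zero on DNS failures, refused connections, and timeouts.
    echo "$url unreachable"
    return 1
  fi
}
```

For example, `probe_endpoint https://api.openai.com` or `probe_endpoint https://api.anthropic.com`. An "unreachable" result points at a firewall, VPN, DNS, or proxy issue rather than at cagent itself.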

Tool Execution Failures

MCP tools not found or failing

Filesystem / shell tool errors

Tool lifecycle issues

MCP tools using stdio transport must complete the initialization handshake before becoming available. If tools fail silently:

  1. Enable --debug and look for MCP protocol messages in the log
  2. Check that the MCP server process starts and responds to initialize
  3. Verify environment variables required by the tool are set (check env and env_file in the toolset config)
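
Step 2 above can also be checked outside cagent by sending an initialize request to the server by hand. The following is a rough sketch, assuming the server speaks newline-delimited JSON-RPC on stdin and stdout as the MCP stdio transport describes. The function name mcp_init_probe, the client name, and the protocol version string are illustrative assumptions; adjust the version to one your server supports.

```shell
# Sketch: send a single MCP initialize request to a stdio server and print
# the first line it writes back. mcp_init_probe is a hypothetical helper.
mcp_init_probe() {
  request='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"handshake-probe","version":"0.0.1"}}}'
  # "$@" is the server command and its arguments. Server stderr is dropped
  # so that only protocol output on stdout is shown.
  printf '%s\n' "$request" | "$@" 2>/dev/null | head -n 1
}
```

Run it with the same command and arguments that the toolset config uses, for example `mcp_init_probe my-mcp-server --stdio` (a placeholder command). A JSON line containing "result" suggests the handshake works; empty output suggests the server exited early or never answered. Some servers stop as soon as stdin closes, so an empty result is a hint rather than proof.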

Configuration Errors

YAML syntax issues

cagent validates config at startup and reports errors with line numbers. Common problems:

Missing references

Toolset validation

ℹ️ Schema Validation

Use the JSON schema in your editor for real-time config validation and autocompletion.

Session & Connectivity Issues

Port conflicts

When running cagent as an API server or MCP server, ensure the port is not already in use:

# Check if port 8080 is in use
$ lsof -i :8080

# Use a different port
$ cagent api config.yaml --listen :9090
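
If the preferred port is often taken, the check and the fallback can be combined. This is a minimal sketch built on lsof; the function names port_in_use and first_free_port are hypothetical. It assumes lsof can see the listening process, which may require elevated privileges when another user owns it.

```shell
# Sketch: find a TCP port with no listener. Both helpers are hypothetical.

# Succeeds when some process is listening on the given TCP port.
port_in_use() {
  lsof -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Prints the first port at or above the starting port that has no listener.
first_free_port() {
  port="${1:-8080}"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

A possible use is `cagent api config.yaml --listen ":$(first_free_port 8080)"`. Another process can still claim the port between the check and the start, so treat the result as a best guess.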

MCP endpoint accessibility

For remote MCP servers, verify the endpoint is reachable:

# Test SSE endpoint
$ curl -v https://mcp-server.example.com/sse

Multi-tenant isolation

In API server mode, each client gets isolated sessions. If sessions are mixing up:

Performance Issues

High memory usage

Slow responses

Tool resource leaks

Watch for tools that don’t clean up properly: check the debug logs for MCP server start/stop lifecycle events. Orphaned tool processes can consume system resources.

Agent Store Issues

Pull / push failures

# Test registry connectivity
$ docker pull docker.io/username/agent:latest

# Verify pulled agent content
$ cagent pull docker.io/username/agent:latest

Agent content issues

Log Analysis

When reviewing debug logs, search for these key patterns:

Log Pattern                  What It Indicates
"Starting runtime stream"    Agent execution beginning
"Tool call"                  A tool is being executed
"Tool call result"           Tool execution completed
"Stream stopped"             Agent finished processing
HTTP 429                     Rate limiting; consider adding a fallback model
context canceled             Operation was interrupted (timeout or user cancel)
[RAG Manager]                RAG retrieval operations
[Reranker]                   Reranking operations
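
For a first overview of a long log, the patterns above can be counted in one pass. The sketch below assumes the default log path from the Debug Mode section. The function name summarize_log is hypothetical, and the bare "429" pattern is an assumption that may also match unrelated numbers.

```shell
# Sketch: count how many log lines contain each key pattern.
# summarize_log is a hypothetical helper, not part of cagent.
summarize_log() {
  log="${1:-$HOME/.cagent/cagent.debug.log}"
  for pattern in "Starting runtime stream" "Tool call" "Tool call result" \
                 "Stream stopped" "429" "context canceled" \
                 "[RAG Manager]" "[Reranker]"; do
    # -F matches the text literally, so the brackets are not treated as a
    # character class; -c prints the number of matching lines.
    printf '%6s  %s\n' "$(grep -F -c -- "$pattern" "$log")" "$pattern"
  done
}
```

The "Tool call" count includes the "Tool call result" lines, so a tool call with no result shows up as the difference between the two counts being larger than the result count. Far more "Starting runtime stream" than "Stream stopped" lines can point to runs that were interrupted.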

⚠️ Still stuck?

If these steps don't resolve your issue, file a bug on the GitHub issue tracker with your debug log attached, or ask on Slack.