Troubleshooting

Common issues and how to resolve them.

Run Failures

Max Steps Exceeded

Symptom: Run fails with “max_steps_exceeded”

Cause: Agent couldn’t complete within step limit

Solutions:

Increase max_steps in agent constraints
Simplify the task
Break the task into multiple agents or workflows
Check for infinite loops in logic
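
One way to check for loops is to look for the same tool call repeated with identical arguments. The sketch below assumes each step can be reduced to a (tool name, serialized arguments) pair; that record shape and the `find_repeated_call` helper are hypothetical and not part of the platform API.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical step record: (tool_name, serialized_arguments).
Step = Tuple[str, str]


def find_repeated_call(steps: Iterable[Step], threshold: int = 3) -> Optional[Step]:
    """Return the first (tool, args) pair seen `threshold` times, else None."""
    counts: Counter = Counter()
    for step in steps:
        counts[step] += 1
        if counts[step] >= threshold:
            return step
    return None


# Illustrative data only: the agent keeps issuing the same search.
steps = [
    ("search", '{"q": "invoice 42"}'),
    ("fetch", '{"id": 7}'),
    ("search", '{"q": "invoice 42"}'),
    ("search", '{"q": "invoice 42"}'),
]
print(find_repeated_call(steps))
```

A repeated pair usually means the agent is not making progress, in which case raising max_steps would only delay the failure.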

Timeout

Symptom: Run fails with “timeout”

Cause: Run exceeded max_duration

Solutions:

Increase max_duration
Optimize slow tools
Check for external system delays
Review step complexity

Token Limit

Symptom: Run fails with “token_limit_exceeded”

Cause: LLM token usage exceeded limit

Solutions:

Increase token limit
Reduce context size
Simplify system prompt
Use smaller model for simple tasks
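
Reducing context size often means dropping the oldest messages first. The following is a minimal sketch under two assumptions: the history is a list of plain strings, and roughly four characters make one token. A real tokenizer will give different counts, and both function names are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose estimated tokens fit within `budget`."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(len(trim_history(history, budget=25)))
```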

Tool Failures

Tool Timeout

Symptom: Tool call fails with timeout

Cause: External system did not respond within tool_timeout

Solutions:

Increase tool_timeout
Check external system health
Implement retry logic
Use async processing
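
Retry logic for a slow tool can be sketched as a small wrapper. This assumes the tool is a zero-argument callable that raises Python's built-in `TimeoutError` when it times out; `call_with_retry` and `flaky_tool` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    tool: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `tool`, retrying on TimeoutError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return tool()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical tool that times out twice before succeeding.
_state = {"calls": 0}


def flaky_tool() -> str:
    _state["calls"] += 1
    if _state["calls"] < 3:
        raise TimeoutError("external system too slow")
    return "ok"


print(call_with_retry(flaky_tool, attempts=3, delay=0.0))
```

The `sleep` parameter is injectable so that the wrapper can be exercised in tests without real waiting.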

Authentication Errors

Symptom: Tool fails with 401/403

Cause: Invalid or expired credentials

Solutions:

Refresh integration credentials
Check permission scopes
Verify API key validity
Review access policies

Rate Limiting

Symptom: Tool fails with 429

Cause: Too many requests to external system

Solutions:

Implement backoff/retry
Reduce request frequency
Batch operations
Request rate limit increase
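
A common way to implement backoff is an exponential window with random jitter, deferring to the server's Retry-After value when one is sent with the 429. The sketch below only computes the wait time. It assumes Retry-After has already been parsed into seconds, and `backoff_delay` with its defaults is a hypothetical helper, not a platform setting.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    retry_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server said how long to wait; do not retry sooner than that.
        return max(0.0, retry_after)
    # Exponential window (base * 2^attempt), capped, with full jitter.
    window = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0.0, window)


for attempt in range(4):
    print(attempt, round(backoff_delay(attempt, rng=random.Random(attempt)), 3))
```

Jitter spreads retries from concurrent runs so that they do not all hit the external system at the same moment.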

HITL Issues

Approval Timeout

Symptom: Run cancelled due to approval timeout

Cause: No response within timeout period

Solutions:

Increase decision_timeout
Set up escalation chain
Configure backup approvers
Review notification delivery

Wrong Approver

Symptom: Approvals going to wrong people

Cause: Notification or role configuration issue

Solutions:

Review notification settings
Check role assignments
Verify channel configurations

Debugging Steps

1. Check Run Status

Query the run's status, failure reason, and per-step errors:

query GetRun($id: ID!) {
  run(id: $id) {
    status
    failureReason
    steps {
      status
      error
    }
  }
}
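
Once the query returns, the first step with an error is usually the place to start. The helper below is a sketch that assumes the standard GraphQL response envelope (`data` at the top level) and the fields requested above. The status strings in the sample are made up for illustration, and `first_failed_step` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def first_failed_step(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return index, status, and error of the first step with an error, else None."""
    run = (response.get("data") or {}).get("run") or {}
    for index, step in enumerate(run.get("steps") or []):
        if step.get("error"):
            return {"index": index, "status": step.get("status"), "error": step["error"]}
    return None


# Sample response; the status values here are illustrative assumptions.
response = {
    "data": {
        "run": {
            "status": "FAILED",
            "failureReason": "tool_not_found",
            "steps": [
                {"status": "COMPLETED", "error": None},
                {"status": "FAILED", "error": "tool_not_found"},
            ],
        }
    }
}
print(first_failed_step(response))
```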

2. Review Traces

Look at the trace timeline for:

Where execution stopped
What the agent was thinking
What tools were called
Error details

3. Verify Configuration

Check agent settings:

Constraints appropriate?
Tools available?
Model configured correctly?

4. Test Tools Individually

Verify that tools work outside the agent:

Check credentials
Test with sample inputs
Verify permissions
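
Testing with sample inputs can be as simple as calling the tool directly and recording what happens. The sketch below assumes a tool can be invoked as a plain Python callable with keyword arguments; `smoke_test` and the `lookup_order` tool are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List, Tuple


def smoke_test(
    tool: Callable[..., Any], samples: List[Dict[str, Any]]
) -> List[Tuple[Dict[str, Any], str]]:
    """Call `tool` with each sample; report 'ok' or the error for each input."""
    results: List[Tuple[Dict[str, Any], str]] = []
    for sample in samples:
        try:
            tool(**sample)
            results.append((sample, "ok"))
        except Exception as exc:  # report every failure instead of stopping
            results.append((sample, f"{type(exc).__name__}: {exc}"))
    return results


def lookup_order(order_id: int) -> Dict[str, Any]:
    """Hypothetical tool used only to demonstrate the harness."""
    if order_id <= 0:
        raise ValueError("order_id must be positive")
    return {"order_id": order_id, "state": "shipped"}


report = smoke_test(lookup_order, [{"order_id": 7}, {"order_id": -1}])
for sample, outcome in report:
    print(sample, "->", outcome)
```

If a tool fails here as well, the problem lies with the tool or its credentials and not with the agent configuration.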

Common Error Messages

Error	Meaning	Action
`max_steps_exceeded`	Hit step limit	Increase limit or simplify
`timeout`	Hit duration limit	Increase timeout
`token_limit_exceeded`	Hit token limit	Increase limit or reduce context
`tool_not_found`	Tool doesn’t exist	Check tool name
`tool_not_permitted`	Tool not granted	Add capability
`invalid_input`	Bad input data	Check input format
`chain_verification_failed`	Audit issue	Contact support
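
For alerting or log triage, the error codes described on this page can be kept in a small lookup. This is a sketch: the mapping and the `suggest_action` helper are illustrative and not part of any SDK, and the fallback text follows the Getting Help steps.

```python
# Error codes and suggested actions, taken from this page.
SUGGESTED_ACTIONS = {
    "max_steps_exceeded": "Increase limit or simplify",
    "timeout": "Increase timeout",
    "token_limit_exceeded": "Increase token limit or reduce context",
    "tool_not_found": "Check tool name",
    "tool_not_permitted": "Add capability",
    "invalid_input": "Check input format",
    "chain_verification_failed": "Contact support",
}


def suggest_action(error_code: str) -> str:
    """Return the suggested action for a known code, or a generic fallback."""
    return SUGGESTED_ACTIONS.get(
        error_code, "Export run details and contact support with run ID"
    )


print(suggest_action("tool_not_permitted"))
```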

Getting Help

If issues persist:

Export run details and traces
Check status page
Search documentation
Contact support with run ID