Monitoring Runs
Track agent execution in real time and review historical performance.
Dashboard Overview
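The monitoring dashboard shows summary counts (active runs, pending approvals, failures today), the runs currently in progress, and recent completions: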
┌─────────────────────────────────────────────────────────────────────┐
│ Monitoring Dashboard │
│ │
│ Active Runs: 3 Pending Approvals: 2 Failed Today: 1 │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ ACTIVE RUNS │
│ │
│ ● Employee Offboarding Step 4/8 ██████░░░░░░ 50% Running │
│ ● Customer Triage Step 2/5 ████░░░░░░░░ 40% Running │
│ ⏸ Invoice Processing Step 3/6 ██████░░░░░░ 50% Paused │
│ │
│ RECENT COMPLETIONS │
│ │
│ ✓ Support Ticket #1234 8 steps 45s Completed 2 min ago │
│ ✓ Data Export Request 3 steps 12s Completed 5 min ago │
│ ✗ Email Campaign 5 steps -- Failed 10 min ago │
│ │
└─────────────────────────────────────────────────────────────────────┘
Run Detail View
Click any run to see details:
┌─────────────────────────────────────────────────────────────────────┐
│ Run: Employee Offboarding - Sarah Chen │
│ Status: Running │
│ │
│ Started: 2026-01-15 14:29:55 │
│ Duration: 3m 24s │
│ Steps: 4 of ~8 │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ STEP TIMELINE │
│ │
│ Step 1: Gather Information ✓ Complete 2.1s │
│ Step 2: Assess Risk ✓ Complete 1.8s │
│ Step 3: Revoke AWS Access ✓ Complete 3.5s │
│ └─ Approval: john@company.com ✓ Approved 45s │
│ Step 4: Revoke GitHub Access ● Running ... │
│ │
│ [View Traces] [Cancel Run] [Export] │
│ │
└─────────────────────────────────────────────────────────────────────┘
Step Details
Expand any step to see its TDAO (Think, Decide, Act, Observe) phases, plus any approval wait that occurred between them:
┌─────────────────────────────────────────────────────────────────────┐
│ Step 3: Revoke AWS Access │
│ │
│ THINK 0.8s │
│ └─ "Prioritizing AWS due to high risk. Employee has admin │
│ access to production systems and is leaving for competitor." │
│ │
│ DECIDE 0.3s │
│ └─ Action: revoke_aws_access │
│ └─ Risk Score: 85 (HIGH) │
│ └─ Requires Approval: Yes │
│ │
│ ⏸ APPROVAL 45s │
│ └─ Requested: 14:32:15 │
│ └─ Approved by: john@company.com at 14:33:00 │
│ │
│ ACT 1.2s │
│ └─ Tool: okta_revoke_access │
│ └─ Result: Success (15 resources affected) │
│ │
│ OBSERVE 0.4s │
│ └─ "AWS access revoked. 15 resources no longer accessible." │
│ │
└─────────────────────────────────────────────────────────────────────┘
Performance Metrics
Run Statistics
| Metric | Description |
|---|---|
| Duration | Total wall-clock time |
| Steps | Number of TDAO iterations |
| Tokens | LLM tokens consumed |
| Tools | Number of tool invocations |
| Wait Time | Time spent waiting for approvals |
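To make the table concrete, here is a minimal sketch of a per-run record holding these five metrics. The `RunStats` class, its field names, and the derived `active_time` value are assumptions made for illustration and are not part of the product's API. The token and tool counts in the example are made up, while the duration and wait time are taken from the run detail example earlier on this page.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RunStats:
    """Hypothetical per-run record mirroring the metrics table."""

    duration: timedelta   # total wall-clock time
    steps: int            # number of TDAO iterations
    tokens: int           # LLM tokens consumed
    tools: int            # number of tool invocations
    wait_time: timedelta  # time spent waiting for approvals

    @property
    def active_time(self) -> timedelta:
        """Wall-clock time minus approval waits (derived here, not a dashboard metric)."""
        return self.duration - self.wait_time


stats = RunStats(
    duration=timedelta(minutes=3, seconds=24),
    steps=4,
    tokens=5200,  # made-up value
    tools=3,      # made-up value
    wait_time=timedelta(seconds=45),
)
print(stats.active_time)  # 0:02:39
```

Separating wait time from duration this way helps distinguish a slow agent from a slow approver.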
Historical Trends
View trends over time:
- Average run duration
- Success/failure rate
- Token consumption
- Approval wait times
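As a rough illustration of what these trends summarize, the sketch below aggregates a batch of run records into the four values listed above. The record shape (`status`, `duration_s`, `tokens`, `approval_wait_s`) and the `summarize` function are assumptions, and the sample data is invented. The dashboard's own calculation may differ, for example in how failed runs are counted toward average duration.

```python
from statistics import mean

# Hypothetical run records; field names and values are made up for illustration.
runs = [
    {"status": "completed", "duration_s": 45.0, "tokens": 3100, "approval_wait_s": 0.0},
    {"status": "completed", "duration_s": 12.0, "tokens": 900, "approval_wait_s": 0.0},
    {"status": "failed", "duration_s": 30.0, "tokens": 2000, "approval_wait_s": 45.0},
]


def summarize(runs):
    """Aggregate duration, success rate, tokens, and approval wait over a batch of runs."""
    if not runs:
        raise ValueError("need at least one run")
    completed = sum(1 for r in runs if r["status"] == "completed")
    return {
        "avg_duration_s": mean(r["duration_s"] for r in runs),
        "success_rate": completed / len(runs),
        "total_tokens": sum(r["tokens"] for r in runs),
        "avg_approval_wait_s": mean(r["approval_wait_s"] for r in runs),
    }


summary = summarize(runs)
print(summary["avg_duration_s"], summary["total_tokens"])  # 29.0 6000
```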
Alerts and Notifications
Configure alerts for:
- Run failures
- Long-running executions
- Approval timeouts
- Error patterns
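The sample configuration in this section uses three condition strings: `run_failed`, `duration > 10m`, and `approval_timeout`. The sketch below shows one way such strings might be matched against a run. It is an assumption about the semantics, not the product's evaluator: the run dictionary shape, the `condition_matches` function, and the `s`/`h` unit suffixes are hypothetical (only `m` appears in the configuration).

```python
import re

# Assumed unit suffixes; the sample configuration only shows "m" (minutes).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RULE = re.compile(r"^duration\s*>\s*(\d+)([smh])$")


def condition_matches(condition: str, run: dict) -> bool:
    """Return True if a run (hypothetical dict shape) satisfies a condition string."""
    condition = condition.strip()
    if condition == "run_failed":
        return run.get("status") == "failed"
    if condition == "approval_timeout":
        return bool(run.get("approval_timed_out", False))
    match = _DURATION_RULE.match(condition)
    if match:
        limit_s = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        return run.get("duration_s", 0) > limit_s
    # Unknown conditions fail loudly rather than silently never firing.
    raise ValueError(f"unsupported condition: {condition!r}")


run = {"status": "running", "duration_s": 725, "approval_timed_out": False}
fired = [
    c for c in ("run_failed", "duration > 10m", "approval_timeout")
    if condition_matches(c, run)
]
print(fired)  # ['duration > 10m']
```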
Example alert configuration:
alerts:
  - condition: run_failed
    notify: [slack:#agent-alerts, email:ops@company.com]
  - condition: duration > 10m
    notify: [slack:#agent-alerts]
  - condition: approval_timeout
    notify: [slack:#approvals, email:managers@company.com]
Filtering and Search
Find specific runs:
- By agent
- By status
- By date range
- By user/initiator
- By input content
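The filters above can be combined. As a minimal sketch of that behavior, assuming filters are ANDed together and input content is matched as a case-insensitive substring, the code below filters an in-memory list of runs. The `Run` class, the `find_runs` function, the initiator addresses, the input text, and the Email Campaign start time are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; the field names are assumptions."""

    agent: str
    status: str  # e.g. "running", "completed", "failed"
    started: datetime
    initiator: str
    input_text: str


def find_runs(
    runs: Iterable[Run],
    agent: Optional[str] = None,
    status: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    initiator: Optional[str] = None,
    text: Optional[str] = None,
) -> List[Run]:
    """Return runs matching every criterion that was supplied (None means "any")."""
    matches = []
    for run in runs:
        if agent is not None and run.agent != agent:
            continue
        if status is not None and run.status != status:
            continue
        if start is not None and run.started < start:
            continue
        if end is not None and run.started > end:
            continue
        if initiator is not None and run.initiator != initiator:
            continue
        if text is not None and text.lower() not in run.input_text.lower():
            continue
        matches.append(run)
    return matches


runs = [
    Run("Employee Offboarding", "running", datetime(2026, 1, 15, 14, 29, 55),
        "hr@company.com", "Offboard Sarah Chen"),
    Run("Email Campaign", "failed", datetime(2026, 1, 15, 14, 20, 0),
        "marketing@company.com", "Send January newsletter"),
]

failed = find_runs(runs, status="failed")
recent = find_runs(runs, start=datetime(2026, 1, 15, 14, 25, 0), text="sarah")
print([r.agent for r in failed])  # ['Email Campaign']
print([r.agent for r in recent])  # ['Employee Offboarding']
```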