Monitoring System

Part of: Page Flow Documentation

Design Reference: System health dashboard with Guardian/Conductor status, trajectory analysis, and intervention management.


Design Notes

| Element | Specification |
|---|---|
| Background | Light gray #F8F9FA |
| Status Cards | 120px width, status color border |
| Alignment Bar | Gradient green→yellow→red based on score |
| Intervention Card | White card with left border color by type |
| Trajectory Timeline | Vertical timeline with status dots |
| Header Indicator | 🛡️ icon with status color, always visible |

Flow 38: System Health Dashboard

┌─────────────────────────────────────────────────────────────┐
│  PAGE: /health (System Health Dashboard)                     │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Header: Logo | [Command] [Projects] [Agents]        │   │
│  │          [Analytics] | 🛡️ | Search | Profile         │   │
│  │                        ^^^^ (Active - highlighted)    │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  System Health                       [Refresh] [⚙️]  │   │
│  │  Real-time monitoring status                         │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Status Overview                                     │   │
│  │                                                      │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │  │ Guardian │  │ Conductor│  │ Agents   │  │ Overall  │ │
│  │  │ 🟢 Active│  │ 🟢 Active│  │ 5/5 OK   │  │ 94%      │ │
│  │  │ 12s ago  │  │ 45s ago  │  │ 0 stuck  │  │ Health   │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Tabs: [Overview] [Trajectories] [Interventions]     │   │
│  │        [Insights] [Settings]                         │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Overview Tab (Default)                              │   │
│  │                                                      │   │
│  │  Monitoring Loop Status:                            │   │
│  │  ┌──────────────────────────────────────────────┐  │   │
│  │  │ • Last cycle: 12 seconds ago                 │  │   │
│  │  │ • Cycle interval: 60 seconds                 │  │   │
│  │  │ • Agents monitored: 5                        │  │   │
│  │  │ • Trajectories analyzed: 5                   │  │   │
│  │  │ • Interventions today: 3                     │  │   │
│  │  │ • Avg alignment score: 78%                   │  │   │
│  │  │ • Pattern matches: 2                         │  │   │
│  │  └──────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  Agent Alignment Summary:                           │   │
│  │  ┌──────────────────────────────────────────────┐  │   │
│  │  │ worker-1  ████████████████░░░░ 85% ✅        │  │   │
│  │  │ worker-2  ██████████████░░░░░░ 72% ⚠️        │  │   │
│  │  │ worker-3  ██████████████████░░ 91% ✅        │  │   │
│  │  │ worker-4  ████████████████░░░░ 80% ✅        │  │   │
│  │  │ worker-5  ██████████████████░░ 88% ✅        │  │   │
│  │  └──────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  Quick Actions:                                     │   │
│  │  [Pause All Monitoring] [Export Logs] [View Insights]│   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Status Card States:

  • Guardian: 🟢 Active / 🟡 Paused / 🔴 Error
  • Conductor: 🟢 Active / 🟡 Analyzing / 🔴 Conflict Detected
  • Agents: X/Y OK (count of healthy vs total)
  • Overall: Percentage health score (0-100%)
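
The status card states above could be modeled as small enumerations plus two formatting helpers. This is a minimal sketch only: the names `GuardianState`, `ConductorState`, `agents_card`, and `overall_card` are hypothetical and do not come from the codebase.

```python
from enum import Enum


class GuardianState(Enum):
    """Display labels for the Guardian card (names are illustrative)."""
    ACTIVE = "🟢 Active"
    PAUSED = "🟡 Paused"
    ERROR = "🔴 Error"


class ConductorState(Enum):
    """Display labels for the Conductor card (names are illustrative)."""
    ACTIVE = "🟢 Active"
    ANALYZING = "🟡 Analyzing"
    CONFLICT = "🔴 Conflict Detected"


def agents_card(healthy: int, total: int) -> str:
    """Format the Agents card as 'X/Y OK'."""
    if not 0 <= healthy <= total:
        raise ValueError("healthy must be between 0 and total")
    return f"{healthy}/{total} OK"


def overall_card(score: float) -> str:
    """Format the Overall card, clamping the score to the 0-100% range."""
    clamped = max(0.0, min(100.0, score))
    return f"{round(clamped)}%"
```

Clamping in `overall_card` is a defensive assumption; the source only states that the score is shown as a percentage between 0 and 100.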

Overview Tab Features:

  • Real-time monitoring loop metrics
  • Agent alignment summary with visual bars
  • Quick actions for common operations
  • WebSocket updates every monitoring cycle

Flow 39: Trajectory Analysis View

┌─────────────────────────────────────────────────────────────┐
│  PAGE: /health/trajectories (or Trajectories Tab)           │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Tabs: [Overview] [Trajectories] [Interventions]     │   │
│  │        [Insights] [Settings]                         │   │
│  │                    ^^^^^^^^^^^^^ (Active)             │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Active Trajectory Analyses                          │   │
│  │                                                      │   │
│  │  Filter: [All ▼] [On Track] [Drifting] [Critical]   │   │
│  │  Sort: [Alignment ▼] [Last Check] [Agent Name]       │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Agent: worker-1                                │ │   │
│  │  │ Status: ✅ On Track                            │ │   │
│  │  │ Alignment: 85% ████████████████░░░░           │ │   │
│  │  │                                                │ │   │
│  │  │ Task: "Implement JWT authentication"           │ │   │
│  │  │ Phase: PHASE_IMPLEMENTATION                    │ │   │
│  │  │ Last Check: 5 seconds ago                      │ │   │
│  │  │                                                │ │   │
│  │  │ Active Constraints (2):                        │ │   │
│  │  │ • "Use Node.js crypto module only"            │ │   │
│  │  │ • "All endpoints must return JSON"            │ │   │
│  │  │                                                │ │   │
│  │  │ Mandatory Steps: 3/4 completed                 │ │   │
│  │  │ ☑ Analyze requirements                        │ │   │
│  │  │ ☑ Create implementation plan                  │ │   │
│  │  │ ☑ Write core functionality                    │ │   │
│  │  │ ☐ Write tests                                 │ │   │
│  │  │                                                │ │   │
│  │  │ [View Full Trajectory] [Send Intervention]    │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Agent: worker-2                                │ │   │
│  │  │ Status: ⚠️ Drifting                            │ │   │
│  │  │ Alignment: 72% ██████████████░░░░░░           │ │   │
│  │  │                                                │ │   │
│  │  │ Task: "Add OAuth2 configuration"               │ │   │
│  │  │ Phase: PHASE_IMPLEMENTATION                    │ │   │
│  │  │ Last Check: 8 seconds ago                      │ │   │
│  │  │                                                │ │   │
│  │  │ ⚠️ Drift Detected:                            │ │   │
│  │  │ Agent is working on test configuration        │ │   │
│  │  │ instead of OAuth2 setup                       │ │   │
│  │  │                                                │ │   │
│  │  │ Intervention: Auto-sending in 15s             │ │   │
│  │  │ ███████████░░░░░░░░░░ 15s remaining          │ │   │
│  │  │                                                │ │   │
│  │  │ [View Full Trajectory] [Send Now] [Cancel]    │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Agent: worker-3                                │ │   │
│  │  │ Status: ✅ On Track                            │ │   │
│  │  │ Alignment: 91% ██████████████████░░           │ │   │
│  │  │ ...                                           │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            │ Click "View Full Trajectory"
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  MODAL: Full Trajectory Analysis                             │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Agent: worker-1 - Full Trajectory                   │   │
│  │                                                  [×] │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Alignment Over Time:                                │   │
│  │                                                      │   │
│  │  100% ┤                    ╭──────────────           │   │
│  │   90% ┤              ╭────╯                          │   │
│  │   80% ┤         ╭───╯                                │   │
│  │   70% ┤    ╭───╯   ← Intervention sent              │   │
│  │   60% ┤───╯                                          │   │
│  │       └────┴────┴────┴────┴────┴────┴────┴────       │   │
│  │        10m  8m   6m   4m   2m   now                  │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Trajectory Timeline:                                │   │
│  │                                                      │   │
│  │  ● 10m ago - Started task "Implement JWT"           │   │
│  │  │                                                   │   │
│  │  ● 8m ago - Analyzing requirements                  │   │
│  │  │          Alignment: 75%                          │   │
│  │  │                                                   │   │
│  │  ● 6m ago - Writing code                            │   │
│  │  │          Alignment: 68% ⚠️                       │   │
│  │  │                                                   │   │
│  │  ● 4m ago - 🛡️ Guardian intervention               │   │
│  │  │          "Focus on core auth flow first"         │   │
│  │  │                                                   │   │
│  │  ● 2m ago - Adjusted approach                       │   │
│  │  │          Alignment: 82% ✅                       │   │
│  │  │                                                   │   │
│  │  ● now - Testing implementation                     │   │
│  │          Alignment: 85% ✅                          │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Constraint Tracking:                                │   │
│  │                                                      │   │
│  │  Active Constraints:                                │   │
│  │  ┌──────────────────────────────────────────────┐  │   │
│  │  │ 1. "Use Node.js crypto module only"          │  │   │
│  │  │    Set: 10m ago │ Violations: 0              │  │   │
│  │  ├──────────────────────────────────────────────┤  │   │
│  │  │ 2. "All endpoints must return JSON"          │  │   │
│  │  │    Set: 8m ago │ Violations: 0               │  │   │
│  │  └──────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  [Send Intervention] [Export Trajectory] [Close]    │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Trajectory Card Features:

  • Real-time alignment score with visual bar
  • Current task and phase
  • Active constraints list
  • Mandatory steps checklist
  • Drift detection with reason
  • Auto-intervention countdown
  • Quick actions (View Full, Send Intervention)

Full Trajectory Modal:

  • Alignment over time chart
  • Event timeline with interventions
  • Constraint tracking history
  • Export and intervention options
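
The filter and sort controls on the trajectory list could be backed by logic along these lines. This is a sketch under stated assumptions: `TrajectoryCard`, the status strings, and the sort-key names are hypothetical, and sorting alignment lowest-first (so agents needing attention surface at the top) is a guess rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class TrajectoryCard:
    agent: str
    status: str               # assumed values: "on_track", "drifting", "critical"
    alignment: int            # percent, 0-100
    seconds_since_check: int  # age of the last Guardian check


# One key function per option in the Sort control.
SORT_KEYS: Dict[str, Callable[[TrajectoryCard], object]] = {
    "alignment": lambda card: card.alignment,             # lowest first (assumption)
    "last_check": lambda card: card.seconds_since_check,  # most recent first
    "agent_name": lambda card: card.agent,
}


def select_cards(
    cards: Iterable[TrajectoryCard],
    status_filter: str = "all",
    sort_by: str = "alignment",
) -> List[TrajectoryCard]:
    """Return the cards matching the status filter, in the requested order."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {sort_by}")
    visible = [c for c in cards if status_filter in ("all", c.status)]
    return sorted(visible, key=SORT_KEYS[sort_by])
```

With the first two cards from the wireframe, `select_cards(cards, status_filter="drifting")` would return only the worker-2 card.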

Flow 40: Intervention Management

┌─────────────────────────────────────────────────────────────┐
│  PAGE: /health/interventions (or Interventions Tab)          │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Tabs: [Overview] [Trajectories] [Interventions]     │   │
│  │        [Insights] [Settings]                         │   │
│  │                              ^^^^^^^^^^^^^^^ (Active) │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Intervention History                   [Export] [+]  │   │
│  │                                                      │   │
│  │  Summary:                                           │   │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐   │   │
│  │  │ 92%    │  │ 46     │  │ 2.3m   │  │ 4      │   │   │
│  │  │ Success│  │ Total  │  │ Avg    │  │ Failed │   │   │
│  │  │ Rate   │  │ Today  │  │ Recovery│  │        │   │   │
│  │  └────────┘  └────────┘  └────────┘  └────────┘   │   │
│  │                                                      │   │
│  │  Filter: [All Types ▼] [All Agents ▼] [Today ▼]     │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Today, 11:45 AM                    ✅ Success   │ │   │
│  │  │                                                │ │   │
│  │  │ Agent: worker-2                                │ │   │
│  │  │ Type: Refocus                                  │ │   │
│  │  │                                                │ │   │
│  │  │ Message:                                      │ │   │
│  │  │ "Focus on core authentication flow first.     │ │   │
│  │  │  You appear to be implementing tests before   │ │   │
│  │  │  the core functionality is complete."         │ │   │
│  │  │                                                │ │   │
│  │  │ Trigger: Alignment dropped to 45%             │ │   │
│  │  │ Recovery: 2.1 minutes                         │ │   │
│  │  │ Alignment: 45% → 85%                          │ │   │
│  │  │                                                │ │   │
│  │  │ [View Agent] [View Trajectory] [Copy Message] │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Today, 10:30 AM                    ✅ Success   │ │   │
│  │  │                                                │ │   │
│  │  │ Agent: worker-1                                │ │   │
│  │  │ Type: Prioritize                               │ │   │
│  │  │                                                │ │   │
│  │  │ Message:                                      │ │   │
│  │  │ "Complete tests before moving on. The phase   │ │   │
│  │  │  instructions require tests for all new code."│ │   │
│  │  │                                                │ │   │
│  │  │ Trigger: Skipped mandatory step               │ │   │
│  │  │ Recovery: 1.5 minutes                         │ │   │
│  │  │ Alignment: 68% → 91%                          │ │   │
│  │  │                                                │ │   │
│  │  │ [View Agent] [View Trajectory] [Copy Message] │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │ Today, 09:15 AM                    ❌ Failed    │ │   │
│  │  │                                                │ │   │
│  │  │ Agent: worker-4                                │ │   │
│  │  │ Type: Stop                                     │ │   │
│  │  │                                                │ │   │
│  │  │ Message:                                      │ │   │
│  │  │ "Stop current path. You are violating the     │ │   │
│  │  │  constraint: 'no external auth libraries'"    │ │   │
│  │  │                                                │ │   │
│  │  │ Trigger: Constraint violation detected        │ │   │
│  │  │ Recovery: Failed after 5 minutes              │ │   │
│  │  │ Alignment: 42% → 38%                          │ │   │
│  │  │                                                │ │   │
│  │  │ ⚠️ Manual intervention required               │ │   │
│  │  │ [View Agent] [Send Manual Intervention]       │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  [Load More...]                                      │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└───────────────────────────┬──────────────────────────────────┘
                            │
                            │ Click [+] or "Send Manual Intervention"
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│  MODAL: Send Manual Intervention                             │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Send Intervention                              [×]  │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Target Agent: [worker-4 ▼]                          │   │
│  │                                                      │   │
│  │  Intervention Type:                                 │   │
│  │  ○ Prioritize - Focus on specific area             │   │
│  │  ● Refocus - Change direction                      │   │
│  │  ○ Stop - Halt current work                        │   │
│  │  ○ Add Constraint - Add new requirement            │   │
│  │  ○ Status Reminder - Request status update         │   │
│  │                                                      │   │
│  │  Message:                                           │   │
│  │  ┌──────────────────────────────────────────────┐  │   │
│  │  │ Stop using the 'jsonwebtoken' library.       │  │   │
│  │  │ Use Node.js built-in crypto module instead.  │  │   │
│  │  │ The constraint "no external auth libraries"  │  │   │
│  │  │ was set at the start of this task.           │  │   │
│  │  │                                              │  │   │
│  │  └──────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  Preview:                                           │   │
│  │  ┌──────────────────────────────────────────────┐  │   │
│  │  │ [GUARDIAN INTERVENTION]                      │  │   │
│  │  │                                              │  │   │
│  │  │ Stop using the 'jsonwebtoken' library...     │  │   │
│  │  └──────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  [Cancel] [Send Intervention]                       │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Intervention Summary Cards:

  • Success rate percentage
  • Total interventions today
  • Average recovery time
  • Failed interventions count
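
The four summary cards can be derived from the intervention records themselves. The sketch below assumes a hypothetical `Intervention` record and averages recovery time over successful interventions only; the source does not say how failed recoveries are counted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Intervention:
    agent: str
    succeeded: bool
    recovery_minutes: Optional[float]  # None when no recovery was recorded


def summarize(records: Sequence[Intervention]) -> Dict[str, Optional[float]]:
    """Compute success rate, total, average recovery, and failed count."""
    total = len(records)
    failed = sum(1 for r in records if not r.succeeded)
    recoveries = [
        r.recovery_minutes
        for r in records
        if r.succeeded and r.recovery_minutes is not None
    ]
    success_rate = 100.0 * (total - failed) / total if total else 0.0
    average = sum(recoveries) / len(recoveries) if recoveries else None
    return {
        "success_rate_pct": round(success_rate),
        "total": total,
        "avg_recovery_minutes": round(average, 1) if average is not None else None,
        "failed": failed,
    }
```

Applied to the three history cards shown in the wireframe (two successes at 2.1 and 1.5 minutes, one failure), this yields a 67% success rate and a 1.8 minute average recovery.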

Intervention History Features:

  • Filterable by type, agent, date
  • Shows trigger reason and outcome
  • Alignment before/after comparison
  • Recovery time tracking
  • Quick actions for follow-up

Intervention Types:

| Type | Icon | Description |
|---|---|---|
| Prioritize | 📌 | Focus on specific area |
| Refocus | 🎯 | Change direction |
| Stop | | Halt current work |
| Add Constraint | | Add new requirement |
| Status Reminder | 📋 | Request status update |
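
The Send Intervention modal previews the message under a `[GUARDIAN INTERVENTION]` header. A minimal sketch of that formatting follows; `InterventionType`, `format_intervention`, and `preview` are hypothetical names, and truncating the preview by character count is an assumption (the wireframe only shows the message cut short with an ellipsis).

```python
from enum import Enum


class InterventionType(Enum):
    """The five types offered in the modal."""
    PRIORITIZE = "Prioritize"
    REFOCUS = "Refocus"
    STOP = "Stop"
    ADD_CONSTRAINT = "Add Constraint"
    STATUS_REMINDER = "Status Reminder"


HEADER = "[GUARDIAN INTERVENTION]"


def format_intervention(message: str) -> str:
    """Prefix the operator's message with the Guardian header."""
    body = message.strip()
    if not body:
        raise ValueError("intervention message must not be empty")
    return f"{HEADER}\n\n{body}"


def preview(message: str, max_chars: int = 40) -> str:
    """Build the shortened preview shown in the modal."""
    body = " ".join(message.split())  # collapse line breaks and extra spaces
    if len(body) > max_chars:
        body = body[:max_chars].rstrip() + "..."
    return f"{HEADER}\n\n{body}"
```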

Flow 41: Monitoring Configuration

┌─────────────────────────────────────────────────────────────┐
│  PAGE: /health/settings (or Settings Tab)                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Tabs: [Overview] [Trajectories] [Interventions]     │   │
│  │        [Insights] [Settings]                         │   │
│  │                    ^^^^^^^^ (Active)                  │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Monitoring Configuration                            │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │  Guardian Settings                             │ │   │
│  │  │                                                │ │   │
│  │  │  Monitoring Cycle Interval:                   │ │   │
│  │  │  [60] seconds                                 │ │   │
│  │  │  ℹ️ How often Guardian analyzes trajectories   │ │   │
│  │  │                                                │ │   │
│  │  │  Alignment Threshold:                         │ │   │
│  │  │  [70] %                                       │ │   │
│  │  │  ℹ️ Below this, intervention is triggered     │ │   │
│  │  │                                                │ │   │
│  │  │  Auto-Intervention:                           │ │   │
│  │  │  [✓] Enabled                                  │ │   │
│  │  │  ℹ️ Automatically send interventions          │ │   │
│  │  │                                                │ │   │
│  │  │  Intervention Delay:                          │ │   │
│  │  │  [0] seconds after threshold breach           │ │   │
│  │  │  ℹ️ Wait before sending intervention          │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │  Conductor Settings                            │ │   │
│  │  │                                                │ │   │
│  │  │  Coherence Check Interval:                    │ │   │
│  │  │  [120] seconds                                │ │   │
│  │  │                                                │ │   │
│  │  │  Duplicate Detection:                         │ │   │
│  │  │  [✓] Enabled                                  │ │   │
│  │  │                                                │ │   │
│  │  │  Conflict Resolution:                         │ │   │
│  │  │  ● Auto  ○ Manual                             │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │  Notification Preferences                      │ │   │
│  │  │                                                │ │   │
│  │  │  [✓] Alignment drops below threshold          │ │   │
│  │  │  [✓] Intervention sent                        │ │   │
│  │  │  [✓] Agent stuck detected                     │ │   │
│  │  │  [✓] Constraint violation detected            │ │   │
│  │  │  [ ] Every monitoring cycle completed         │ │   │
│  │  │  [✓] Critical issues only                     │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │  Pattern Learning                              │ │   │
│  │  │                                                │ │   │
│  │  │  [✓] Enable adaptive threshold adjustment     │ │   │
│  │  │  ℹ️ System adjusts thresholds based on outcomes│ │   │
│  │  │                                                │ │   │
│  │  │  [✓] Share patterns across projects           │ │   │
│  │  │  ℹ️ Learn from other projects in organization │ │   │
│  │  │                                                │ │   │
│  │  │  Pattern retention:                           │ │   │
│  │  │  [30] days                                    │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  │                                                      │   │
│  │  [Save Changes] [Reset to Defaults]                 │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Configuration Sections:

  • Guardian Settings: Cycle interval, alignment threshold, auto-intervention, intervention delay
  • Conductor Settings: Coherence check interval, duplicate detection, conflict resolution
  • Notification Preferences: Which events to be notified about
  • Pattern Learning: Adaptive thresholds, cross-project learning, retention period
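
To illustrate how the Guardian settings interact, the sketch below models them as a dataclass with the default values shown in the wireframe and applies the threshold rule ("below this, intervention is triggered"). `GuardianSettings` and `intervention_due` are hypothetical names, and the sketch covers only the threshold trigger, not drift or constraint-violation triggers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardianSettings:
    """Defaults taken from the Settings tab wireframe."""
    cycle_interval_s: int = 60
    alignment_threshold_pct: int = 70
    auto_intervention: bool = True
    intervention_delay_s: int = 0


def intervention_due(
    settings: GuardianSettings,
    alignment_pct: float,
    seconds_below_threshold: float,
) -> bool:
    """Decide whether an automatic intervention should be sent now."""
    if not settings.auto_intervention:
        return False
    if alignment_pct >= settings.alignment_threshold_pct:
        return False  # at or above the threshold: nothing to do
    # Below the threshold: wait out the configured delay first.
    return seconds_below_threshold >= settings.intervention_delay_s
```

With the defaults, an agent at 68% triggers immediately. With `intervention_delay_s=15`, the same agent triggers only after 15 seconds below the threshold, which is the kind of countdown the trajectory card shows.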

API Endpoints

System Health APIs

GET  /api/v1/watchdog/monitor-status      # Monitor agent health summary
POST /api/v1/watchdog/execute-remediation # Apply remediation policy to agent
GET  /api/v1/watchdog/remediation-history # Remediation history (agent_id?, limit)
GET  /api/v1/watchdog/policies            # Loaded remediation policies

POST /api/validation/give_review          # Submit validation review for task
POST /api/validation/spawn_validator      # Spawn validator agent for task
POST /api/validation/send_feedback        # Deliver validation feedback to agent
GET  /api/validation/status               # Validation status for task (task_id)

WS   /api/v1/ws/events                    # Event stream (filters: event_types, entity_types, entity_ids)

WebSocket Events

MONITORING_CYCLE_COMPLETE    # Guardian completed analysis cycle
ALIGNMENT_CHANGED            # Agent alignment score changed
STEERING_ISSUED              # Intervention sent to agent
INTERVENTION_RESULT          # Intervention success/failure
TRAJECTORY_UPDATE            # Real-time trajectory update
COHERENCE_UPDATE             # Conductor analysis result
PATTERN_LEARNED              # New pattern stored
STUCK_DETECTED               # Agent appears stuck
CONSTRAINT_VIOLATION         # Agent violated constraint

Request and Response Reference

# WATCHDOG (prefix /api/v1/watchdog)
GET  /api/v1/watchdog/monitor-status
  Returns: { issues: [...], total_monitors, healthy_monitors, unhealthy_monitors }

POST /api/v1/watchdog/execute-remediation
  Body: { agent_id, policy_name, reason, watchdog_agent_id }
  Returns: WatchdogActionDTO (action_id, action_type, target_agent_id, remediation_policy, reason, initiated_by, executed_at?, success, escalated_to_guardian, guardian_action_id?, audit_log?, created_at?)

GET  /api/v1/watchdog/remediation-history
  Query: agent_id? string, limit int (1-200, default 50)
  Returns: [WatchdogActionDTO]

GET  /api/v1/watchdog/policies
  Returns: { policy_name: policy_config }

# VALIDATION (prefix /api/validation)
POST /api/validation/give_review
  Body: { task_id, validator_agent_id, validation_passed: bool, feedback: str, evidence?: dict, recommendations?: list[str] }
  Returns: { status, message, iteration }

POST /api/validation/spawn_validator
  Body: { task_id, commit_sha?: str }
  Returns: { validator_agent_id }

POST /api/validation/send_feedback
  Body: { agent_id, feedback: str }
  Returns: { delivered: bool }

GET  /api/validation/status
  Query: task_id (required)
  Returns: { task_id, state, iteration, review_done, last_feedback? }

# EVENTS (WebSocket, prefix /api/v1)
WS   /api/v1/ws/events
  Query: event_types?, entity_types?, entity_ids? (comma-separated)
  Client may send {"type": "subscribe", ...} to update filters
  Messages: { event_type, entity_type, entity_id, payload }
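
As an illustration of the query constraints listed above, the following sketch builds a remediation-history URL and enforces the documented `limit` range (1-200, default 50) on the client side. The helper name and the base URL in the usage note are assumptions; only the path and parameters come from this document.

```python
from typing import Optional
from urllib.parse import urlencode

HISTORY_PATH = "/api/v1/watchdog/remediation-history"


def remediation_history_url(
    base_url: str,
    agent_id: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build the GET URL, rejecting limits outside the documented 1-200 range."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"limit": limit}
    if agent_id is not None:
        params["agent_id"] = agent_id  # optional filter
    return f"{base_url.rstrip('/')}{HISTORY_PATH}?{urlencode(params)}"
```

For example, `remediation_history_url("http://localhost:8000", agent_id="worker-4", limit=10)` produces a URL ending in `?limit=10&agent_id=worker-4`; the host is a placeholder.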

Alerts API

System alerts for monitoring threshold breaches, anomalies, and critical events.

GET /api/v1/alerts

Description: List active alerts with optional filtering

Query Params:

  • severity (optional): Filter by severity (critical, warning, info)
  • limit (default: 100): Maximum alerts to return

Response (200):

[
  {
    "id": "uuid",
    "alert_type": "budget_exceeded",
    "severity": "warning",
    "title": "Budget threshold reached",
    "message": "Ticket AUTH-042 has reached 90% of budget limit",
    "entity_type": "ticket",
    "entity_id": "ticket-uuid",
    "acknowledged": false,
    "resolved": false,
    "created_at": "2025-01-15T10:30:00Z"
  }
]

GET /api/v1/alerts/{alert_id}

Description: Get specific alert details

Path Params: alert_id (uuid)

Response (200):

{
  "id": "uuid",
  "alert_type": "alignment_drop",
  "severity": "critical",
  "title": "Agent alignment dropped below threshold",
  "message": "worker-5 alignment dropped to 45% (threshold: 70%)",
  "entity_type": "agent",
  "entity_id": "worker-5",
  "metadata": {
    "previous_alignment": 82,
    "current_alignment": 45,
    "threshold": 70
  },
  "acknowledged": false,
  "acknowledged_by": null,
  "acknowledged_at": null,
  "resolved": false,
  "resolved_by": null,
  "resolved_at": null,
  "created_at": "2025-01-15T10:30:00Z"
}

POST /api/v1/alerts/{alert_id}/acknowledge

Description: Acknowledge an alert

Path Params: alert_id (uuid)

Request Body:

{
  "acknowledged_by": "user-uuid",
  "notes": "Investigating the issue"
}

Response (200):

{
  "id": "uuid",
  "acknowledged": true,
  "acknowledged_by": "user-uuid",
  "acknowledged_at": "2025-01-15T10:35:00Z"
}

POST /api/v1/alerts/{alert_id}/resolve

Description: Resolve an alert

Path Params: alert_id (uuid)

Request Body:

{
  "resolved_by": "user-uuid",
  "resolution": "Increased budget limit and agent realigned"
}

Response (200):

{
  "id": "uuid",
  "resolved": true,
  "resolved_by": "user-uuid",
  "resolved_at": "2025-01-15T11:00:00Z",
  "resolution": "Increased budget limit and agent realigned"
}

GET /api/v1/alerts/rules

Description: List configured alert rules

Response (200):

[
  {
    "rule_id": "uuid",
    "name": "Budget Warning",
    "condition": "budget_utilization > 80%",
    "severity": "warning",
    "enabled": true
  },
  {
    "rule_id": "uuid",
    "name": "Alignment Critical",
    "condition": "alignment_score < 50%",
    "severity": "critical",
    "enabled": true
  }
]
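
The `condition` strings in the rules response follow a simple `metric operator value%` shape. Whether the server evaluates them in this textual form is not stated here, so the parser below is purely an illustrative sketch with hypothetical names, handy for things like previewing a rule against sample metrics.

```python
import operator
import re
from typing import Mapping

# Comparison operators the sketch understands.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Matches e.g. "alignment_score < 50%" or "budget_utilization > 80%".
_CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*%?\s*$")


def rule_matches(condition: str, metrics: Mapping[str, float]) -> bool:
    """Evaluate a condition string against current metric values (in percent)."""
    match = _CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, op, threshold = match.group(1), match.group(2), float(match.group(3))
    if name not in metrics:
        return False  # no data for this metric, so the rule cannot fire
    return _OPS[op](metrics[name], threshold)
```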

WebSocket Events (Expanded)

Connection

WS /api/v1/ws/events?event_types=TASK_ASSIGNED,TASK_COMPLETED&entity_types=task,ticket

Query Parameters

  • event_types: Comma-separated event types to subscribe to
  • entity_types: Comma-separated entity types to filter
  • entity_ids: Comma-separated specific entity IDs to watch
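
A connection URL with these comma-separated filters could be assembled as follows. This is a sketch: `events_url` is a hypothetical helper, and the WebSocket host used in the usage note is a placeholder.

```python
from typing import Iterable
from urllib.parse import urlencode

EVENTS_PATH = "/api/v1/ws/events"


def events_url(
    base_ws_url: str,
    event_types: Iterable[str] = (),
    entity_types: Iterable[str] = (),
    entity_ids: Iterable[str] = (),
) -> str:
    """Build the WebSocket URL, omitting any filter that is empty."""
    params = {}
    for key, values in (
        ("event_types", event_types),
        ("entity_types", entity_types),
        ("entity_ids", entity_ids),
    ):
        joined = ",".join(values)
        if joined:
            params[key] = joined
    url = base_ws_url.rstrip("/") + EVENTS_PATH
    # safe="," keeps the separators readable instead of encoding them as %2C.
    query = urlencode(params, safe=",")
    return f"{url}?{query}" if query else url
```

Calling `events_url("ws://localhost:8000", ["TASK_ASSIGNED", "TASK_COMPLETED"], ["task", "ticket"])` reproduces the query string of the connection example above.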

Dynamic Subscription

Clients can update subscriptions by sending:

{
  "type": "subscribe",
  "event_types": ["ALIGNMENT_CHANGED", "STEERING_ISSUED"],
  "entity_types": ["agent"],
  "entity_ids": ["worker-5", "worker-3"]
}

The server responds with:

{
  "status": "subscribed",
  "filters": {
    "event_types": ["ALIGNMENT_CHANGED", "STEERING_ISSUED"],
    "entity_types": ["agent"],
    "entity_ids": ["worker-5", "worker-3"]
  }
}

Event Message Format

{
  "event_type": "ALIGNMENT_CHANGED",
  "entity_type": "agent",
  "entity_id": "worker-5",
  "payload": {
    "previous_alignment": 82,
    "current_alignment": 68,
    "threshold": 70,
    "at_risk": true
  }
}
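
A client (or a test double for the server) can check whether an event message passes a set of subscription filters. The sketch below assumes that the three filters are combined with AND and that an empty or missing filter means "no restriction"; neither rule is confirmed by this document.

```python
from typing import Any, Mapping, Sequence

# Maps each filter name to the event field it constrains.
_FILTER_FIELDS = (
    ("event_types", "event_type"),
    ("entity_types", "entity_type"),
    ("entity_ids", "entity_id"),
)


def event_matches(
    filters: Mapping[str, Sequence[str]],
    event: Mapping[str, Any],
) -> bool:
    """Return True when the event satisfies every non-empty filter."""
    for filter_key, event_key in _FILTER_FIELDS:
        allowed = filters.get(filter_key)
        if allowed and event.get(event_key) not in allowed:
            return False
    return True
```

With the subscription shown earlier (agents worker-5 and worker-3), the `ALIGNMENT_CHANGED` message for worker-5 passes, while the same event for worker-1 would be dropped.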

Keep-Alive

The server sends a ping every 30 seconds:

{"type": "ping"}
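
Because pings, subscription acknowledgements, and events all arrive on the same socket, a client needs to tell them apart before dispatching. The sketch below classifies a raw frame by the fields shown in this section; the function name and the returned labels are hypothetical, and whether the client must reply to a ping is not specified here.

```python
import json


def classify_message(raw: str) -> str:
    """Label an incoming WebSocket frame based on its JSON fields."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        return "unknown"
    if message.get("type") == "ping":
        return "ping"              # keep-alive, carries no event data
    if message.get("status") == "subscribed":
        return "subscription_ack"  # reply to a subscribe request
    if "event_type" in message:
        return "event"             # regular event message
    return "unknown"
```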

Event Types Reference

| Event Type | Entity Type | Description |
|---|---|---|
| TASK_CREATED | task | New task created |
| TASK_ASSIGNED | task | Task assigned to agent |
| TASK_COMPLETED | task | Task completed |
| TASK_FAILED | task | Task failed |
| TICKET_CREATED | ticket | New ticket created |
| TICKET_TRANSITIONED | ticket | Ticket status changed |
| AGENT_REGISTERED | agent | New agent registered |
| AGENT_HEARTBEAT | agent | Agent heartbeat received |
| ALIGNMENT_CHANGED | agent | Agent alignment score changed |
| STEERING_ISSUED | agent | Intervention sent to agent |
| INTERVENTION_RESULT | agent | Intervention outcome |
| MONITORING_CYCLE_COMPLETE | system | Guardian completed analysis |
| COHERENCE_UPDATE | system | Conductor analysis result |
| PATTERN_LEARNED | memory | New pattern stored |
| STUCK_DETECTED | agent | Agent appears stuck |
| CONSTRAINT_VIOLATION | agent | Agent violated constraint |
| ALERT_CREATED | alert | New alert generated |
| ALERT_RESOLVED | alert | Alert resolved |

Related Documentation


Next: See README.md for complete documentation index.