Activity Tracking vs. Diagnostics

Core Philosophy

Marcus employs a clear separation between activity tracking (recording what happened) and diagnostics (analyzing why it happened). This separation improves system maintainability, reduces false assumptions, and provides appropriate information to different audiences.

The Problem: Everything Mixed Together

Anti-Pattern: Diagnostic Logging Everywhere

A common mistake in distributed systems is mixing activity tracking with diagnostic analysis:

# ❌ BAD: Mixed concerns
def handle_task_request(agent_id):
    result = find_task_for_agent(agent_id)

    if not result["success"]:
        error = result.get("error", "")
        # Logging mixes WHAT with assumed WHY
        if has_dependency_keywords(error):
            logger.critical("DEPENDENCY ISSUE: ...")  # Assumption!
        elif has_busy_keywords(error):
            logger.warning("Agent busy: ...")
        else:
            logger.error("Unknown failure: ...")

        # Diagnostic analysis in logging code
        run_dependency_analysis()  # Wrong place!
        check_agent_skills()        # Wrong place!

    return result

Problems:

False assumptions: Keywords don't equal root cause
Mixed audiences: Operators need different info than agents
Maintenance burden: Changes require updating multiple locations
Poor separation: Activity tracking code becomes diagnostic code

Marcus Solution: Clear Separation

Two-Layer Design

┌─────────────────────────────────────────────────────┐
│                ACTIVITY TRACKING LAYER              │
│  Records: WHAT happened, WHEN, WHO was involved     │
│  Purpose: Index/table of contents for operations    │
│  Audience: Quick overview, correlation              │
│  Example: MCP Tool Logger                           │
│  Location: Conversation logs                        │
└─────────────────────────────────────────────────────┘
                        ↓
         Points to (when needed)
                        ↓
┌─────────────────────────────────────────────────────┐
│                 DIAGNOSTIC LAYER                    │
│  Analyzes: WHY it happened, root cause, context     │
│  Purpose: Deep investigation, problem solving       │
│  Audience: Operators fixing issues                  │
│  Example: Task Diagnostics, Dependency Analyzer     │
│  Location: Python logs, specialized reports         │
└─────────────────────────────────────────────────────┘
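The two layers only stay separate if their output goes to separate places. A minimal sketch with the standard `logging` module, assuming two hypothetical logger names (`marcus.activity` and `marcus.diagnostics`, not necessarily what Marcus uses) and in-memory streams in place of the conversation-log and Python-log files:

```python
import io
import logging

# Hypothetical logger names for the two layers; Marcus's real names may differ.
ACTIVITY_LOGGER = "marcus.activity"
DIAGNOSTIC_LOGGER = "marcus.diagnostics"


def configure_streams(activity_stream, diagnostic_stream):
    """Give each layer its own handler so the two streams never mix."""
    layers = (
        (ACTIVITY_LOGGER, activity_stream),
        (DIAGNOSTIC_LOGGER, diagnostic_stream),
    )
    configured = []
    for name, stream in layers:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.handlers.clear()
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
        # Stop records bubbling up to the root logger, which would merge the streams.
        logger.propagate = False
        configured.append(logger)
    return tuple(configured)


# In-memory streams stand in for the conversation log and the Python log files.
activity_out, diagnostic_out = io.StringIO(), io.StringIO()
activity, diagnostics = configure_streams(activity_out, diagnostic_out)

activity.warning("MCP tool 'request_next_task' returned failure")
diagnostics.info("Diagnostic Report (for operators): Dependencies blocking")
```

Setting `propagate = False` is the design choice that matters: without it, both layers would also reach any root handler and end up interleaved in one file.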

Activity Tracking Layer

Characteristics:

Simple: Records events as they happen
Factual: No interpretation or analysis
Fast: Minimal processing overhead
Consistent: Same format for all events
Indexable: Easy to search and correlate

Example: MCP Tool Logger

# ✅ GOOD: Just record the activity
def log_mcp_tool_response(tool_name, arguments, response):
    """Record WHAT failed and WHEN."""
    if response["success"]:
        logger.debug(f"Tool '{tool_name}' succeeded")
    else:
        logger.warning(
            f"MCP tool '{tool_name}' returned failure",
            tool_name=tool_name,
            arguments=arguments,
            error=response.get("error"),
            response=response,  # Full context preserved
        )

        # Point to diagnostics (don't run them here!)
        if tool_name == "request_next_task":
            logger.debug("Check Python logs for 'Diagnostic Report'")

Benefits:

✅ No assumptions about cause
✅ Consistent WARNING level
✅ Full context preserved
✅ Fast execution
✅ Easy to maintain
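The example above passes key-value context straight to the logger, which suggests a structured logger rather than the standard library. As a stdlib-only sketch (an assumption, not Marcus's actual implementation), the same "facts only" record can be built as a plain dict and written as one JSON line, shaped like the conversation log entry shown in the request_next_task example:

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical stream name; stdlib logging stands in for Marcus's structured logger.
logger = logging.getLogger("marcus.activity")


def build_activity_record(tool_name, arguments, response, now=None):
    """Record WHAT happened and WHEN. No keyword matching, no cause, no category."""
    now = now or datetime.now(timezone.utc)
    success = bool(response.get("success"))
    return {
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        # The level depends only on success/failure, never on the error text.
        "level": "debug" if success else "warning",
        "message": f"MCP tool '{tool_name}' returned {'success' if success else 'failure'}",
        "tool_name": tool_name,
        "arguments": arguments,
        "error": response.get("error"),
        "response": response,  # full context preserved for later diagnosis
    }


def log_mcp_tool_response(tool_name, arguments, response):
    record = build_activity_record(tool_name, arguments, response)
    level = logging.DEBUG if record["level"] == "debug" else logging.WARNING
    logger.log(level, json.dumps(record))  # one JSON object per line: easy to grep
    return record
```

Splitting record construction from emission keeps the "no interpretation" rule testable: an error text containing "dependency" still produces a plain `warning` record with no category field.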

Diagnostic Layer

Characteristics:

Deep: Analyzes context and relationships
Specialized: Purpose-built for specific problems
Selective: Only runs when needed
Detailed: Provides actionable insights
Separate: Runs in appropriate context

Example: Task Diagnostics

# ✅ GOOD: Separate diagnostic system
async def run_automatic_diagnostics(project_tasks, completed_ids, assigned_ids):
    """
    Deep analysis of WHY tasks can't be assigned.

    Runs automatically when request_next_task fails,
    NOT during every tool call.
    """
    # Collect comprehensive data
    collector = TaskDiagnosticCollector(project_tasks)
    stats = collector.collect_filtering_stats(completed_ids, assigned_ids)

    # Analyze dependencies
    analyzer = DependencyChainAnalyzer(project_tasks)
    dependency_issues = analyzer.analyze_chains()

    # Analyze skills
    skill_mismatches = analyzer.analyze_skill_requirements()

    # Generate actionable report
    report = DiagnosticReportGenerator(
        project_tasks, stats, dependency_issues, skill_mismatches
    )

    # Log to Python logs (separate stream)
    logger.info(f"Diagnostic Report (for operators):\n{report.format()}")

    return report

Benefits:

✅ Accurate root cause analysis
✅ Runs in appropriate context
✅ Separate log stream
✅ Actionable recommendations
✅ Purpose-built for problem
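The core of the "Filtering Results" section in the diagnostic report is a count of how many TODO tasks survive each filter. A minimal sketch of that idea, assuming a simplified, hypothetical `Task` model and a fixed filter order (dependency, then skill, then assignment); it is not the real `TaskDiagnosticCollector`:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    # Simplified, hypothetical task model; Marcus's real task type has more fields.
    id: str
    name: str
    status: str  # "todo", "in_progress" or "done"
    dependencies: list = field(default_factory=list)
    required_skills: set = field(default_factory=set)
    assigned_to: Optional[str] = None


def collect_filtering_stats(tasks, agent_skills):
    """Count how many TODO tasks survive each filter stage, in a fixed order."""
    status = {t.id: t.status for t in tasks}
    candidates = [t for t in tasks if t.status == "todo"]
    stages = [("Started with", len(candidates))]

    # A dependency is satisfied only if it is known and done.
    candidates = [t for t in candidates if all(status.get(d) == "done" for d in t.dependencies)]
    stages.append(("After dependency filter", len(candidates)))

    candidates = [t for t in candidates if t.required_skills <= agent_skills]
    stages.append(("After skill filter", len(candidates)))

    candidates = [t for t in candidates if t.assigned_to is None]
    stages.append(("After assignment filter", len(candidates)))
    return stages, candidates


# Illustrative project shaped like the request_next_task report
# (task_001 and task_999 are made up to reach five tasks).
project = [
    Task("task_001", "Create Repo", "done"),
    Task("task_123", "Setup Database", "in_progress", assigned_to="agent_001"),
    Task("task_456", "Implement API", "todo", ["task_123"]),
    Task("task_789", "Write Tests", "todo", ["task_456"]),
    Task("task_999", "Deploy", "todo", ["task_789"]),
]
stages, assignable = collect_filtering_stats(project, agent_skills={"python"})
for label, count in stages:
    print(f"- {label}: {count} tasks")
```

Because the collector is a pure function of the task list, it can be unit-tested without touching any logger, which is one practical payoff of keeping diagnostics out of the logging layer.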

Real-World Example: request_next_task

The Scenario

Agent calls request_next_task → receives {"success": false, "error": "No suitable tasks available"}

Why is it failing? (Multiple possible causes)

All tasks assigned to other agents
Dependencies blocking - tasks depend on incomplete work
Skill mismatch - agent lacks required skills
Circular dependencies - deadlock in task chain
Tasks filtered by other criteria (status, priority, etc.)

The MCP response doesn't tell us which!

Activity Tracking Records WHAT/WHEN

Conversation Log Entry:

{
  "timestamp": "2025-01-15T10:35:22.456Z",
  "level": "warning",
  "message": "MCP tool 'request_next_task' returned failure",
  "tool_name": "request_next_task",
  "arguments": {"agent_id": "agent_002"},
  "error": "No suitable tasks available"
}

What we know:

✅ WHAT: request_next_task failed
✅ WHEN: 10:35:22 on Jan 15
✅ WHO: agent_002
❌ WHY: Unknown (need diagnostics)

Diagnostics Analyze WHY

Automatic Trigger:

# In src/marcus_mcp/tools/task.py
if todo_tasks and not assignable_tasks:
    # The activity tracker records the failure on its own, without analysis
    # Now run diagnostics to understand WHY
    diagnostic_report = await run_automatic_diagnostics(...)
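The trigger condition (TODO tasks exist, but none are assignable) is what keeps diagnostics selective. A self-contained sketch of that control flow, assuming tasks are plain dicts and that the assigner and the diagnostic routine are injected as hypothetical callables `find_assignable` and `run_diagnostics`:

```python
import asyncio


async def request_next_task(agent_id, tasks, find_assignable, run_diagnostics):
    """Return an MCP-style response; run diagnostics only for the case worth explaining."""
    todo_tasks = [t for t in tasks if t["status"] == "todo"]
    assignable_tasks = find_assignable(agent_id, todo_tasks)
    if assignable_tasks:
        return {"success": True, "task": assignable_tasks[0]}
    if todo_tasks:
        # Work exists but none of it can be assigned: ask the diagnostic layer WHY.
        await run_diagnostics(tasks)
    # With no TODO work at all there is nothing to diagnose; the response is the same.
    return {"success": False, "error": "No suitable tasks available"}


# Usage with stand-ins for the real assigner and diagnostics.
diagnostic_runs = []


async def fake_diagnostics(tasks):
    diagnostic_runs.append(len(tasks))


def nothing_assignable(agent_id, todo_tasks):
    return []


tasks = [{"id": "task_456", "status": "todo"}]
response = asyncio.run(request_next_task("agent_002", tasks, nothing_assignable, fake_diagnostics))
```

Note that both failure paths return the identical error text; only the diagnostic layer, with access to the task list, can tell them apart.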

Python Log Entry:

2025-01-15 10:35:22,450 INFO - Diagnostic Report (for operators):
=== Task Assignment Diagnostics ===

Total Tasks: 5
TODO Tasks: 3
In Progress: 1
Completed: 1

Filtering Results:
- Started with: 3 TODO tasks
- After dependency filter: 0 tasks ← HERE'S WHY!
- After skill filter: 0 tasks
- After assignment filter: 0 tasks

Dependency Chain Analysis:
- Task "Implement API" (task_456) blocked by incomplete "Setup Database" (task_123)
- Task "Write Tests" (task_789) blocked by incomplete "Implement API" (task_456)

Root Cause: Dependencies blocking
Recommendation: Complete "Setup Database" to unblock chain

Now we know:

✅ WHY: Dependencies blocking
✅ WHICH tasks: Specific IDs and names
✅ WHAT to do: Complete task_123
✅ CONTEXT: Full dependency chain
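The "Root Cause" and "Recommendation" lines can be derived mechanically from the stage counts and the dependency graph. A sketch under stated assumptions: stage labels match the report above, `status` and `deps` are plain dicts keyed by task ID, and the cause names are illustrative rather than Marcus's actual categories:

```python
# Maps the filter stage that emptied the candidate list to a human-readable cause.
STAGE_CAUSES = {
    "After dependency filter": "Dependencies blocking",
    "After skill filter": "Skill mismatch",
    "After assignment filter": "All tasks assigned",
}


def identify_root_cause(stages):
    """Return the cause for the first stage that dropped the count to zero."""
    if not stages or stages[0][1] == 0:
        return "No TODO tasks"
    for (_, before), (label, after) in zip(stages, stages[1:]):
        if before > 0 and after == 0:
            return STAGE_CAUSES.get(label, "Unknown")
    return None  # some tasks survived every filter: nothing to explain


def find_root_blockers(task_id, status, deps):
    """Follow incomplete dependencies back to tasks that are not blocked themselves.

    `status` maps task id -> status, `deps` maps task id -> list of dependency ids.
    An empty result for a blocked task suggests a circular dependency.
    """
    roots, seen = set(), set()
    stack = [d for d in deps.get(task_id, []) if status.get(d) != "done"]
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        incomplete = [d for d in deps.get(dep, []) if status.get(d) != "done"]
        if incomplete:
            stack.extend(incomplete)
        else:
            roots.add(dep)
    return sorted(roots)


status = {"task_123": "in_progress", "task_456": "todo", "task_789": "todo"}
deps = {"task_456": ["task_123"], "task_789": ["task_456"]}
print(find_root_blockers("task_789", status, deps))  # ['task_123']
```

Walking to the root blocker is what turns "tasks are blocked" into "complete task_123": the recommendation names the one task whose completion unblocks the whole chain.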

Why Mixing is Bad: Real Examples

Example 1: Keyword-Based Categorization

# ❌ BAD: Activity tracker tries to diagnose
def log_failure(error_msg):
    if "dependency" in error_msg.lower():
        logger.critical("DEPENDENCY ISSUE!")  # Assumption!
        return "dependency_issue"

# Reality:
error_msg = "No suitable tasks available"
# Could be dependency issue, but also could be:
# - All tasks assigned
# - Skill mismatch
# - No tasks exist
# We can't tell from the message!

Problem: Keyword ≠ Root cause
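The problem is easy to reproduce. In this sketch a hypothetical toy assigner rejects tasks for four different reasons but always returns the same error text, so a keyword classifier sees nothing to distinguish them (and misses the real dependency case entirely):

```python
def find_task(agent_skills, tasks):
    """Toy assigner (hypothetical): every rejection path returns the same error text."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    for t in tasks:
        if t["status"] != "todo" or t.get("assigned_to"):
            continue  # not available
        if not set(t["deps"]) <= done:
            continue  # blocked by an incomplete dependency
        if not set(t["skills"]) <= agent_skills:
            continue  # agent lacks a required skill
        return {"success": True, "task_id": t["id"]}
    return {"success": False, "error": "No suitable tasks available"}


def keyword_category(error_msg):
    """The anti-pattern: guess the cause from the message text."""
    return "dependency_issue" if "dependency" in error_msg.lower() else "unknown"


scenarios = {
    "dependencies blocking": [
        {"id": "a", "status": "in_progress", "deps": [], "skills": []},
        {"id": "b", "status": "todo", "deps": ["a"], "skills": []},
    ],
    "skill mismatch": [{"id": "c", "status": "todo", "deps": [], "skills": ["rust"]}],
    "all assigned": [
        {"id": "d", "status": "todo", "deps": [], "skills": [], "assigned_to": "agent_001"}
    ],
    "no tasks": [],
}
results = {cause: find_task({"python"}, tasks) for cause, tasks in scenarios.items()}
for cause, result in results.items():
    print(f"{cause:22} -> {result['error']!r} -> {keyword_category(result['error'])}")
```

All four scenarios print the same message and the same "unknown" category; only code that inspects the task data can separate them.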

Example 2: Wrong Log Level Escalation

# ❌ BAD: Escalate based on keywords
if "blocked by" in error:
    logger.critical("Critical dependency issue!")  # Misleading!
else:
    logger.warning("Normal failure")

# Reality:
# "blocked by" might appear in retry message
# But diagnostic shows: "Actually, all tasks are assigned"
# Operator sees CRITICAL but it's not a dependency issue!

Problem: False urgency, wasted investigation time

Example 3: Analysis in Wrong Place

# ❌ BAD: Diagnostic logic in logging code
def log_mcp_failure(tool_name, response):
    logger.warning(f"{tool_name} failed")

    # Diagnostic work in logging layer!
    if tool_name == "request_next_task":
        tasks = get_all_tasks()  # Expensive!
        deps = analyze_dependencies(tasks)  # More expensive!
        logger.info(f"Dependency analysis: {deps}")

# Problems:
# 1. Runs on EVERY failure (wasteful)
# 2. Mixed in logging code (wrong place)
# 3. Duplicate work (diagnostics run elsewhere too)
# 4. Can't leverage existing diagnostic context

Problem: Wrong layer, duplicate work, performance impact

Design Principles

1. Single Responsibility

Activity Tracking:

Records events
Preserves context
Points to diagnostics

Diagnostics:

Analyzes problems
Determines root cause
Recommends actions

Don't mix them!

2. Appropriate Timing

Activity Tracking:

Runs: Always (low overhead)
When: During/after operation
Fast: < 1ms

Diagnostics:

Runs: When needed (selective)
When: After failure detected
Thorough: May take seconds

3. Correct Audience

Activity Tracking:

For: Quick overview, correlation
Format: Structured logs, searchable
Location: Conversation logs (indexed)

Diagnostics:

For: Deep investigation
Format: Detailed reports
Location: Python logs (detailed)

4. No Assumptions

Activity Tracking:

Records facts only
No interpretation
No categorization
Full context preserved

Diagnostics:

Makes informed analysis
Uses full system context
Considers relationships
Provides evidence

Practical Benefits

For Operators

Quick Investigation:

# Step 1: What failed recently?
grep 'returned failure' logs/conversations/marcus_*.log | tail -10

# Step 2: Lots of request_next_task failures?
grep 'request_next_task.*failure' logs/conversations/marcus_*.log | wc -l

# Step 3: Why? Check diagnostics near that time
grep -A 30 'Diagnostic Report' logs/marcus_*.log | tail -50

Benefits:

✅ Fast triage (activity logs)
✅ Deep dive when needed (diagnostics)
✅ Clear correlation (timestamps)
✅ No false assumptions
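Step 3 relies on matching timestamps across the two streams, which use different formats (ISO 8601 with `Z` in the conversation log, the `logging` default `YYYY-MM-DD HH:MM:SS,mmm` in the Python log). A sketch of that correlation, assuming both clocks are UTC; note that Python's `logging.Formatter` uses local time by default, so the `tz` argument may need adjusting. The 5-second window is an arbitrary choice:

```python
from datetime import datetime, timedelta, timezone


def parse_activity_ts(value):
    """Conversation-log style, e.g. 2025-01-15T10:35:22.456Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def parse_python_log_ts(value, tz=timezone.utc):
    """Python logging default asctime, e.g. 2025-01-15 10:35:22,450.

    asctime carries no zone; this sketch ASSUMES it is in `tz` (UTC by default).
    """
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=tz)


def reports_near(activity_ts, report_timestamps, window=timedelta(seconds=5)):
    """Return diagnostic report timestamps within `window` of an activity entry."""
    anchor = parse_activity_ts(activity_ts)
    return [
        ts for ts in report_timestamps
        if abs(parse_python_log_ts(ts) - anchor) <= window
    ]


matches = reports_near(
    "2025-01-15T10:35:22.456Z",
    ["2025-01-15 10:35:22,450", "2025-01-15 11:00:00,000"],
)
```

Timestamp windows work for a single agent but get ambiguous under load; an explicit correlation ID shared by both records is the sturdier link.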

For Developers

Maintainability:

# Activity tracker: Simple, stable
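def log_activity(tool, result):
    """Just record what happened."""
    succeeded = result.get("success", False)
    level = logging.DEBUG if succeeded else logging.WARNING
    message = f"MCP tool '{tool}' returned {'success' if succeeded else 'failure'}"
    # Structured logger that accepts key-value context
    logger.log(level, message, tool_name=tool, response=result)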

# Diagnostics: Complex, evolving
class TaskDiagnostics:
    """Deep analysis, can evolve independently."""
    def analyze_dependencies(self): ...
    def analyze_skills(self): ...
    def generate_report(self): ...

Benefits:

✅ Clear separation of concerns
✅ Easy to test independently
✅ Can evolve separately
✅ Diagnostic complexity doesn't affect logging

For System Performance

Efficient Resource Usage:

# Activity logging: Always on, minimal cost
log_activity()  # < 1ms, always safe

# Diagnostics: Selective, when needed
if failure_needs_investigation:
    run_diagnostics()  # 10-100ms, but selective

Benefits:

✅ Low overhead for activity tracking
✅ Expensive analysis only when needed
✅ No performance impact on happy path

Anti-Patterns to Avoid

❌ Diagnostic Logic in Activity Tracking

# BAD: Mixing concerns
def log_tool_failure(tool_name, response):
    logger.warning(f"{tool_name} failed")

    # Don't do diagnostic work here!
    if "dependency" in str(response):  # Keyword matching
        category = "dependency_issue"  # Assumption!
        logger.critical("Dependency problem detected!")  # Wrong level!
        analyze_dependencies()  # Wrong place!

❌ Activity Tracking in Diagnostic Code

# BAD: Diagnostics shouldn't do activity logging
def run_diagnostics():
    # Diagnostic code
    deps = analyze_dependencies()

    # Don't log activity here!
    logger.warning("Tool failed because dependencies")  # Wrong place!

    return deps

❌ Duplicate Information

# BAD: Logging same info in multiple places
def handle_failure():
    # Activity tracker logs it
    log_activity("tool_name", result)

    # Diagnostics also log the failure
    logger.warning("Tool failed")  # Duplicate!

    # Analysis repeats failure details
    report = f"Tool failed because..."  # More duplication!

Implementation Checklist

When building a new feature, ensure clear separation:

Activity Tracking Implementation

Records event occurrence (WHAT/WHEN)
Uses consistent log level
Preserves full context
No interpretation or analysis
Fast execution (< 1ms)
Points to diagnostics if available
Uses conversation logs (or appropriate stream)

Diagnostic Implementation

Runs selectively (when needed)
Analyzes root cause (WHY)
Provides actionable recommendations
Uses full system context
Separate log stream (Python logs, reports)
Can take time (thorough analysis)
Purpose-built for specific problems

Related Concepts

Activity Tracking in Marcus

MCP Tool Logger - Tracks MCP tool operations
Agent Event Logs - Tracks agent lifecycle
Conversation Logs - Tracks PM decisions, worker messages

Diagnostics in Marcus

Task Diagnostics - Analyzes task assignment failures
Dependency Analyzer - Analyzes dependency chains
Assignment Monitor - Detects assignment issues
Error Predictor - Predicts potential failures

Integration Patterns

Hybrid Monitoring - Activity + diagnostics working together
Correlation IDs - Linking activity to diagnostic reports
Layered Logging - Multiple log streams for different purposes
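Correlation IDs also remove the duplication problem described under Anti-Patterns: the diagnostic record cites the failure by ID instead of repeating it. A minimal sketch, assuming one ID is minted per request and shared by both layers; the field names and the list used as a log sink are hypothetical:

```python
import uuid


def record_activity(correlation_id, tool_name, response, sink):
    """Activity layer: facts only, tagged with the request's correlation ID."""
    sink.append({
        "stream": "activity",
        "correlation_id": correlation_id,
        "tool_name": tool_name,
        "error": response.get("error"),
    })


def record_diagnosis(correlation_id, root_cause, recommendation, sink):
    """Diagnostic layer: analysis only; it refers to the failure by ID, not by repeating it."""
    sink.append({
        "stream": "diagnostics",
        "correlation_id": correlation_id,
        "root_cause": root_cause,
        "recommendation": recommendation,
    })


log = []
cid = str(uuid.uuid4())  # minted once per request, shared by both layers
record_diagnosis(cid, "Dependencies blocking", 'Complete "Setup Database" to unblock chain', log)
record_activity(cid, "request_next_task", {"success": False, "error": "No suitable tasks available"}, log)
```

Minting the ID at the start of the request means the order in which the two layers write does not matter; an operator can join them on `correlation_id` either way.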

Conclusion

The separation between activity tracking and diagnostics is a fundamental design principle in Marcus. It ensures:

Appropriate information for different audiences
Clear responsibilities for each component
No false assumptions about failure causes
Efficient resource usage (selective expensive analysis)
Maintainability (concerns evolve independently)

When in doubt:

Activity tracking: Record what happened (facts only)
Diagnostics: Analyze why it happened (when needed)

Never mix them in the same code!
