Feature: Observability Timeline with Tool Spans, Model Calls, Cost, and Validation Events #11

@hoangsonww

Summary

Create an observability timeline that correlates model calls, tool spans, permission decisions, validation gates, state transitions, cost, cache hits, circuit-breaker behavior, and user prompts across a Forge task.

Problem / Opportunity

Forge has rich runtime pieces: structured logging, events, sessions, tasks, cost tracking, prompt cache, provider routing, rate limiting, circuit breakers, validation gates, and UI WebSocket streams. These are currently useful in isolation, but users need a unified task-level view to answer:

  • Where did this task spend time?
  • Which model/provider was called for each agent role?
  • Which tool action failed or retried?
  • Did validation fail because of lint, typecheck, or build?
  • How much did the task cost, and what was cached?
  • Did fallback routing or circuit breakers affect the outcome?

This feature would make Forge easier to debug, optimize, and trust.

Proposed Feature

Add an observability timeline for every task:

  • Structured spans for model calls, tool calls, permission prompts, validation runs, reviewer checks, and state transitions.
  • CLI output via forge task trace <taskId> with concise and verbose formats.
  • Dashboard timeline with filters for models, tools, validation, permissions, cost, and errors.
  • Export support using a local JSON trace format suitable for debugging and support.
  • Integration with existing logging and redaction so traces are informative but safe.
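A minimal sketch of what a span record and the concise CLI line could look like. All names here (TraceSpan, SpanKind, formatConcise, the attribute bag) are hypothetical and only illustrate the fields listed above; they are not Forge's existing types.

```typescript
// Hypothetical span shape; field names are illustrative, not Forge's actual schema.
type SpanKind =
  | "model_call"
  | "tool_call"
  | "permission_prompt"
  | "validation_run"
  | "reviewer_check"
  | "state_transition";

type SpanStatus = "ok" | "error" | "retried";

interface TraceSpan {
  schemaVersion: 1;          // versioned so future replay/export tools can detect format changes
  spanId: string;
  parentId: string | null;   // null for the root span of a task
  taskId: string;
  kind: SpanKind;
  name: string;              // e.g. a tool name or model identifier
  startMs: number;
  endMs: number;
  status: SpanStatus;
  attributes: Record<string, string | number | boolean>; // cost, cache hit, provider, ...
}

// Duration is derived from the timestamps rather than stored separately.
function durationMs(span: TraceSpan): number {
  return span.endMs - span.startMs;
}

// One line per span, indented by nesting depth, for a concise CLI view.
function formatConcise(span: TraceSpan, depth = 0): string {
  const indent = "  ".repeat(depth);
  return `${indent}${span.kind} ${span.name} ${durationMs(span)}ms ${span.status}`;
}
```

Keeping type-specific data such as cost or cache status in a flat attribute bag is one option; a discriminated union per span kind would give stricter typing at the cost of a larger schema.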

Scope

Expected implementation areas:

  • src/logging/trace.ts, src/logging/logger.ts, and src/persistence/events.ts.
  • src/models/router.ts, src/models/cost.ts, src/models/cache.ts, src/models/rate-limit.ts, and src/models/circuit-breaker.ts.
  • src/agents/executor.ts, src/core/loop.ts, and src/core/validation.ts.
  • src/tools/registry.ts and individual tool wrappers for span boundaries.
  • src/cli/commands/task.ts for trace output.
  • src/ui/server.ts and UI shell timeline views.
  • Unit tests for span emission, redaction, and timeline reconstruction.

Acceptance Criteria

  • Task traces include start/end timestamps, duration, status, and parent/child relationships for core spans.
  • Model spans include role, provider, model, cache hit status, cost, and fallback/circuit-breaker metadata where available.
  • Tool spans include tool name, risk/side effect class, permission outcome reference, duration, and success/failure.
  • Validation spans distinguish typecheck, lint, build, test, or configured validation command outcomes.
  • CLI trace output supports a concise human-readable view and a JSON export.
  • Dashboard timeline can filter by span type and highlight failures/retries.
  • Redaction is applied before traces are rendered or exported.
  • Tests verify trace reconstruction from persisted events without live provider/tool calls.
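The last criterion, reconstructing a trace from persisted events without live calls, could be approached roughly as follows. The SpanEvent shape and the reconstructTimeline function are assumptions made for illustration; the actual format in src/persistence/events.ts may differ.

```typescript
// Hypothetical persisted event shape; Forge's real event log may differ.
interface SpanEvent {
  type: "span_start" | "span_end";
  spanId: string;
  parentId?: string;
  name: string;
  atMs: number;
  status?: "ok" | "error"; // only meaningful on span_end
}

interface TimelineNode {
  spanId: string;
  name: string;
  startMs: number;
  endMs: number | null;       // null while the span has no end event
  durationMs: number | null;
  status: "ok" | "error" | "open";
  children: TimelineNode[];
}

// Rebuild a parent/child tree from a flat, ordered list of start/end events.
function reconstructTimeline(events: SpanEvent[]): TimelineNode[] {
  const nodes = new Map<string, TimelineNode>();
  const parentOf = new Map<string, string | undefined>();

  for (const ev of events) {
    if (ev.type === "span_start") {
      nodes.set(ev.spanId, {
        spanId: ev.spanId,
        name: ev.name,
        startMs: ev.atMs,
        endMs: null,
        durationMs: null,
        status: "open",
        children: [],
      });
      parentOf.set(ev.spanId, ev.parentId);
    } else {
      const node = nodes.get(ev.spanId);
      if (!node) continue; // end event without a matching start: ignore it
      node.endMs = ev.atMs;
      node.durationMs = ev.atMs - node.startMs;
      node.status = ev.status ?? "ok";
    }
  }

  // Attach each node to its parent; nodes with no known parent become roots.
  const roots: TimelineNode[] = [];
  nodes.forEach((node, id) => {
    const parentId = parentOf.get(id);
    const parent = parentId ? nodes.get(parentId) : undefined;
    (parent ? parent.children : roots).push(node);
  });
  return roots;
}
```

Because the function takes plain event objects, a unit test can feed it a hand-written fixture and assert on the resulting tree. Spans that never received an end event stay marked as open, which may help surface crashes or interrupted tasks.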

Non-Goals

  • Sending traces to a hosted observability vendor by default.
  • Capturing raw secrets, full prompts, or full command output in default traces.
  • Replacing the existing event log or logger; this should structure and correlate them.
  • Adding heavyweight telemetry dependencies without a clear need.

Dependencies / Risks

  • Span volume should be bounded so long tasks do not create excessive local storage.
  • Tool output must be summarized carefully to avoid leaking credentials or overwhelming the UI.
  • The trace schema should be versioned for compatibility with future replay/export tools.
  • Timing measurements should come from a monotonic clock so span ordering and durations stay correct even when system time changes.
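To illustrate the tool-output risk above, here is a rough sketch of redacting and then truncating output before it is attached to a span. The patterns and the names redact and summarizeToolOutput are placeholders invented for this example; a real implementation would presumably reuse Forge's existing redaction rules rather than this short list.

```typescript
// Illustrative patterns only; they are assumptions, not Forge's actual redaction rules.
const SECRET_PATTERNS: RegExp[] = [
  /\b(api[_-]?key|token|password|secret)\s*[=:]\s*\S+/gi, // key=value style assignments
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,                    // HTTP bearer tokens
];

// Replace anything matching a secret pattern with a fixed marker.
function redact(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text,
  );
}

// Redact first, then truncate, so a secret is never cut in half and left partly visible.
function summarizeToolOutput(output: string, maxChars = 200): string {
  const safe = redact(output);
  if (safe.length <= maxChars) return safe;
  const omitted = safe.length - maxChars;
  return `${safe.slice(0, maxChars)}... [${omitted} chars omitted]`;
}
```

The character limit also bounds how much each span contributes to local storage, which relates to the span-volume concern in the same list.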

Open Questions

  • Should traces be stored in the same event log or a dedicated trace file/table?
  • Should verbose traces include summarized model prompts by default, or require an explicit flag?
  • What retention policy should apply to trace data?
  • Should trace export support Chrome trace format in addition to Forge JSON?
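For the Chrome trace format question, a conversion could be fairly small. The sketch below assumes a simplified, hypothetical ForgeSpan shape and maps each span to a Trace Event Format "complete" event (ph "X"), whose ts and dur fields are expressed in microseconds.

```typescript
// Hypothetical minimal span shape used only for this example.
interface ForgeSpan {
  name: string;
  kind: string;     // e.g. "model_call", "tool_call", "validation_run"
  startMs: number;
  endMs: number;
  status: string;
}

// Subset of the Chrome Trace Event Format needed for complete events.
interface ChromeTraceEvent {
  name: string;
  cat: string;
  ph: "X";          // "X" marks a complete event with a start and a duration
  ts: number;       // start time in microseconds
  dur: number;      // duration in microseconds
  pid: number;
  tid: number;
  args: Record<string, string>;
}

function toChromeTrace(spans: ForgeSpan[]): { traceEvents: ChromeTraceEvent[] } {
  return {
    traceEvents: spans.map((span) => ({
      name: span.name,
      cat: span.kind,
      ph: "X" as const,
      ts: Math.round(span.startMs * 1000),
      dur: Math.round((span.endMs - span.startMs) * 1000),
      pid: 1, // a single synthetic process for the whole task
      tid: 1, // a single lane; assumes spans nest cleanly rather than overlap
      args: { status: span.status },
    })),
  };
}
```

Placing every span on one tid is a simplification. If model and tool calls can run concurrently, assigning a separate tid per span kind or per agent role would likely render better in trace viewers.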

Metadata

Assignees

Labels

bug, documentation, enhancement, feature, help wanted, question

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests