Feature: Observability Timeline with Tool Spans, Model Calls, Cost, and Validation Events

## Summary

Create an observability timeline that correlates model calls, tool spans, permission decisions, validation gates, state transitions, cost, cache hits, circuit-breaker behavior, and user prompts across a Forge task.

## Problem / Opportunity

Forge already has rich runtime components: structured logging, events, sessions, tasks, cost tracking, prompt cache, provider routing, rate limiting, circuit breakers, validation gates, and UI WebSocket streams. Each is useful in isolation, but users need a unified task-level view to answer questions such as:

- Where did this task spend time?
- Which model/provider was called for each agent role?
- Which tool action failed or retried?
- Did validation fail because of lint, typecheck, or build?
- How much did the task cost, and what was cached?
- Did fallback routing or circuit breakers affect the outcome?

This feature would make Forge easier to debug, optimize, and trust.

## Proposed Feature

Add an observability timeline for every task:

- Structured spans for model calls, tool calls, permission prompts, validation runs, reviewer checks, and state transitions.
- CLI output via `forge task trace <taskId>` with concise and verbose formats.
- Dashboard timeline with filters for models, tools, validation, permissions, cost, and errors.
- Export to a local JSON trace format suitable for debugging and support requests.
- Integration with existing logging and redaction so traces are informative but safe.
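
To make the proposal concrete, here is a minimal sketch of what one span record and its concise CLI line could look like. Every name in it (`TraceSpan`, `SpanKind`, `formatConcise`, the field names, and the attribute keys) is a hypothetical placeholder for discussion, not an existing Forge type.

```typescript
// Hypothetical span shape; field names are illustrative, not Forge's actual schema.
type SpanKind =
  | "model_call"
  | "tool_call"
  | "permission_prompt"
  | "validation_run"
  | "reviewer_check"
  | "state_transition";

type SpanStatus = "ok" | "error" | "retried";

interface TraceSpan {
  schemaVersion: 1;             // versioned so future export/replay tools can detect changes
  taskId: string;
  spanId: string;
  parentSpanId: string | null;  // null for the root span of a task
  kind: SpanKind;
  name: string;
  startMs: number;
  endMs: number;
  status: SpanStatus;
  // Small, already-redacted metadata only (e.g. provider, cache hit, cost).
  attributes: Record<string, string | number | boolean>;
}

function durationMs(span: TraceSpan): number {
  return span.endMs - span.startMs;
}

// One line per span, roughly what a concise CLI view might print.
function formatConcise(span: TraceSpan): string {
  return `[${span.kind}] ${span.name} ${durationMs(span)}ms ${span.status}`;
}

const example: TraceSpan = {
  schemaVersion: 1,
  taskId: "task-1",
  spanId: "span-2",
  parentSpanId: "span-1",
  kind: "model_call",
  name: "planner",
  startMs: 1000,
  endMs: 2200,
  status: "ok",
  attributes: { provider: "example-provider", cacheHit: false, costUsd: 0.0123 },
};

console.log(formatConcise(example)); // [model_call] planner 1200ms ok
```

Keeping type-specific fields in a flat `attributes` bag is one possible design choice: it would let model, tool, and validation spans share a single record shape while the dashboard filters on `kind`.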

## Scope

Expected implementation areas:

- `src/logging/trace.ts`, `src/logging/logger.ts`, and `src/persistence/events.ts`.
- `src/models/router.ts`, `src/models/cost.ts`, `src/models/cache.ts`, `src/models/rate-limit.ts`, and `src/models/circuit-breaker.ts`.
- `src/agents/executor.ts`, `src/core/loop.ts`, and `src/core/validation.ts`.
- `src/tools/registry.ts` and individual tool wrappers for span boundaries.
- `src/cli/commands/task.ts` for trace output.
- `src/ui/server.ts` and UI shell timeline views.
- Unit tests for span emission, redaction, and timeline reconstruction.

## Acceptance Criteria

- [ ] Task traces include start/end timestamps, duration, status, and parent/child relationships for core spans.
- [ ] Model spans include role, provider, model, cache hit status, cost, and fallback/circuit-breaker metadata where available.
- [ ] Tool spans include tool name, risk/side-effect class, permission outcome reference, duration, and success/failure.
- [ ] Validation spans distinguish among typecheck, lint, build, test, and other configured validation command outcomes.
- [ ] CLI trace output supports a concise human-readable view and a JSON export.
- [ ] Dashboard timeline can filter by span type and highlight failures/retries.
- [ ] Redaction is applied before traces are rendered or exported.
- [ ] Tests verify trace reconstruction from persisted events without live provider/tool calls.
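
The reconstruction criterion above could be exercised with something like the following sketch, which rebuilds a parent/child span tree purely from an in-memory list of events. The event shape (`span_start` / `span_end`) and the names `SpanEvent`, `SpanNode`, and `reconstructTrace` are assumptions made for illustration; the real persisted event format in `src/persistence/events.ts` may differ.

```typescript
// Hypothetical persisted event shapes (assumed, not Forge's actual event schema).
type SpanEvent =
  | { type: "span_start"; spanId: string; parentSpanId: string | null; name: string; atMs: number }
  | { type: "span_end"; spanId: string; atMs: number; status: "ok" | "error" };

interface SpanNode {
  spanId: string;
  name: string;
  startMs: number;
  endMs: number | null;                    // null when no span_end was persisted
  status: "ok" | "error" | "incomplete";
  children: SpanNode[];
}

function reconstructTrace(events: SpanEvent[]): SpanNode[] {
  // Object.create(null) avoids accidental hits on inherited keys such as "constructor".
  const byId: Record<string, SpanNode> = Object.create(null);
  const parentOf: Record<string, string | null> = Object.create(null);
  const ordered: SpanNode[] = [];

  for (const ev of events) {
    if (ev.type === "span_start") {
      const node: SpanNode = {
        spanId: ev.spanId,
        name: ev.name,
        startMs: ev.atMs,
        endMs: null,
        status: "incomplete",
        children: [],
      };
      byId[ev.spanId] = node;
      parentOf[ev.spanId] = ev.parentSpanId;
      ordered.push(node);
    } else {
      const node = byId[ev.spanId];
      if (node) {
        node.endMs = ev.atMs;
        node.status = ev.status;
      }
    }
  }

  // Attach each span to its parent; spans without a known parent become roots.
  const roots: SpanNode[] = [];
  for (const node of ordered) {
    const parentId = parentOf[node.spanId];
    const parent = parentId ? byId[parentId] : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Fixture data only: no live provider or tool calls are needed.
const sampleEvents: SpanEvent[] = [
  { type: "span_start", spanId: "t", parentSpanId: null, name: "task", atMs: 0 },
  { type: "span_start", spanId: "m", parentSpanId: "t", name: "model:planner", atMs: 10 },
  { type: "span_end", spanId: "m", atMs: 50, status: "ok" },
  { type: "span_start", spanId: "x", parentSpanId: "t", name: "tool:write_file", atMs: 60 },
  { type: "span_end", spanId: "x", atMs: 80, status: "error" },
  { type: "span_start", spanId: "v", parentSpanId: "t", name: "validation:lint", atMs: 90 },
  { type: "span_end", spanId: "t", atMs: 100, status: "ok" },
];

const roots = reconstructTrace(sampleEvents);
console.log(JSON.stringify(roots, null, 2));
```

In this fixture the `validation:lint` span never receives an end event, so it is surfaced as `incomplete` rather than dropped, which may be useful when a task crashes partway through.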

## Non-Goals

- Sending traces to a hosted observability vendor by default.
- Capturing raw secrets, full prompts, or full command output in default traces.
- Replacing the existing event log or logger; this should structure and correlate them.
- Adding heavyweight telemetry dependencies without a clear need.

## Dependencies / Risks

- Span volume should be bounded so long tasks do not consume excessive local storage.
- Tool output must be summarized carefully to avoid leaking credentials or overwhelming the UI.
- The trace schema should be versioned for compatibility with future replay/export tools.
- Span ordering and durations should be derived from a monotonic clock so they stay correct even when the system wall-clock time changes mid-task.
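
One way to address the timing risk is to read the wall clock once, at trace start, and derive every later timestamp from a monotonic clock. The sketch below assumes that approach; `createSpanTimer` and its injected clock functions are hypothetical names. In Node, the monotonic source would likely be something like `() => performance.now()`, but the clocks are injected here so the behavior can be shown deterministically.

```typescript
type NowFn = () => number;

interface TimedSpan {
  startedAtWallMs: number; // wall-clock start, derived from the anchor plus monotonic offset
  durationMs: number;      // measured purely on the monotonic clock
}

function createSpanTimer(monotonicNow: NowFn, wallNow: NowFn) {
  // Anchor both clocks once. After this point the wall clock is never read again,
  // so later system time changes cannot reorder spans or produce negative durations.
  const anchorWall = wallNow();
  const anchorMono = monotonicNow();

  return {
    start(): { end(): TimedSpan } {
      const startMono = monotonicNow();
      return {
        end(): TimedSpan {
          const endMono = monotonicNow();
          return {
            startedAtWallMs: anchorWall + (startMono - anchorMono),
            durationMs: endMono - startMono,
          };
        },
      };
    },
  };
}

// Deterministic demonstration with fake clocks.
let fakeMono = 0;
let fakeWall = 1000000;
const timer = createSpanTimer(() => fakeMono, () => fakeWall);

fakeMono = 100;          // 100 ms after trace start
const running = timer.start();
fakeWall = 500000;       // simulate the system clock jumping backwards
fakeMono = 350;          // 250 ms of real elapsed time
const measured = running.end();

console.log(measured);   // { startedAtWallMs: 1000100, durationMs: 250 }
```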

## Open Questions

- Should traces be stored in the same event log or a dedicated trace file/table?
- Should verbose traces include summarized model prompts by default, or require an explicit flag?
- What retention policy should apply to trace data?
- Should trace export support Chrome trace format in addition to Forge JSON?
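
To help weigh the Chrome trace question, here is a rough sketch of how small the mapping could be. It assumes a hypothetical `ForgeSpan` shape with millisecond timestamps and emits "complete" events (`ph: "X"`) in the Chrome Trace Event Format, where `ts` and `dur` are expressed in microseconds. The constant `pid`/`tid` values are placeholders; a real export might assign one `tid` per agent role.

```typescript
// Hypothetical Forge span shape, reduced to what the export needs.
interface ForgeSpan {
  name: string;
  kind: string;
  startMs: number;
  endMs: number;
}

// Subset of the Chrome Trace Event Format used for complete ("X") events.
interface ChromeTraceEvent {
  name: string;
  cat: string;
  ph: "X";
  ts: number;   // start time in microseconds
  dur: number;  // duration in microseconds
  pid: number;
  tid: number;
}

function toChromeTrace(spans: ForgeSpan[]): { traceEvents: ChromeTraceEvent[] } {
  return {
    traceEvents: spans.map((s): ChromeTraceEvent => ({
      name: s.name,
      cat: s.kind,
      ph: "X",
      ts: s.startMs * 1000,
      dur: (s.endMs - s.startMs) * 1000,
      pid: 1, // placeholder process id
      tid: 1, // placeholder thread id
    })),
  };
}

const chromeTrace = toChromeTrace([
  { name: "model:planner", kind: "model_call", startMs: 10, endMs: 50 },
  { name: "tool:write_file", kind: "tool_call", startMs: 60, endMs: 80 },
]);

console.log(JSON.stringify(chromeTrace));
```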
