
Feat/complete tool timeline #2807

Open
jaikoo wants to merge 43 commits into ultraworkers:main from deep-thinking-llc:feat/complete-tool-timeline

jaikoo commented Apr 26, 2026

No description provided.

code-yeongyu and others added 30 commits April 25, 2026 18:10
Extract 88 format/reporting functions into format/ submodules:
- format/tool_fmt.rs: tool call/result formatting, truncation
- format/status.rs: status reports, git workspace summary
- format/model.rs: model aliases, provenance, resolution
- format/permissions.rs: permission mode parsing/reporting
- format/sessions.rs: session management, history formatting
- format/cost.rs: cost and compact reports
- format/errors.rs: error classification, suggestions
- format/slash_help.rs: help rendering, completions

main.rs reduced from 13,106 to 11,119 lines (-1,987).
All 213 tests pass (179 unit + 34 e2e).
… status bar

Phase 0.2-0.6: Extracted ~7,300 lines from main.rs (13,106 → 5,776 lines)
- app.rs: LiveCli, BuiltRuntime, RuntimeMcpState, CliPermissionPrompter,
  CliToolExecutor, AnthropicRuntimeClient, build_runtime, run_repl
- args.rs: CliAction, CliOutputFormat, parse_args
- cli_commands.rs: 50+ subcommand runners (doctor, resume, export, diff, etc.)
- Made shared types pub(crate), fixed cross-module references

Phase 1: Added tui/status_bar.rs with StatusBarState and StatusBar
- Raw ANSI escape sequences for trait-object compatibility
- Renders model, permission mode, message count, tokens, cost, elapsed time
- Wired into consume_stream on MessageDelta (Usage) events
- Unit tests for truncation, formatting, render output

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…n-tree/agent-context primitives

Implements five phases of missing primitives for autonomous AI coding harness:

Phase 1: Runtime-loaded models.json config (~/.claw/models.json, .claw/models.json)
- Custom providers with base URL, API key (literal or env var), and model definitions
- Provider-prefixed lookup (e.g. ollama/llama3.1:8b) and bare ID matching
- Merged discovery: user-level + project-level providers coexist; same-key project overrides user
- Hooks into metadata_for_model, detect_provider_kind, max_tokens_for_model, model_token_limit
- Custom provider routing in ProviderClient respects api field (anthropic-messages vs openai-completions)
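A minimal sketch of the merge and lookup rules above, assuming a simplified provider entry. The hypothetical `CustomProvider` keeps only the model list (real entries also carry a base URL, API key and `api` field), and `merge_providers`/`resolve` are illustrative names, not the crate's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified provider entry: the real models.json entry also
/// has a base URL, an API key (literal or env var) and an `api` field.
#[derive(Clone, Debug, PartialEq)]
pub struct CustomProvider {
    pub models: Vec<String>,
}

/// User-level and project-level providers coexist; a project entry with the
/// same key replaces the user entry because `extend` overwrites on collision.
pub fn merge_providers(
    user: BTreeMap<String, CustomProvider>,
    project: BTreeMap<String, CustomProvider>,
) -> BTreeMap<String, CustomProvider> {
    let mut merged = user;
    merged.extend(project);
    merged
}

/// Resolve "provider/model" first, then fall back to a bare model ID
/// (first provider in key order that lists it). Returns (provider, model).
pub fn resolve<'a>(
    providers: &'a BTreeMap<String, CustomProvider>,
    model: &str,
) -> Option<(&'a str, &'a str)> {
    if let Some((prefix, id)) = model.split_once('/') {
        if let Some((key, provider)) = providers.get_key_value(prefix) {
            if let Some(m) = provider.models.iter().find(|m| m.as_str() == id) {
                return Some((key.as_str(), m.as_str()));
            }
        }
    }
    providers.iter().find_map(|(key, provider)| {
        provider
            .models
            .iter()
            .find(|m| m.as_str() == model)
            .map(|m| (key.as_str(), m.as_str()))
    })
}
```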

Phase 2: SDK crate with AgentSession, EventBus, SessionManager, ToolRegistry
- AgentSession wraps ConversationRuntime with event-driven lifecycle
- EventBus provides multi-subscriber broadcast channels for session events
- SessionManager handles CRUD for persisted sessions
- ToolRegistry and SdkToolExecutor for tool registration/execution stubs

Phase 3: Extension system with Extension trait, ExtensionRegistry, SimpleExtension

Phase 4: SessionTree with branching/forking/navigation using single-source-of-truth BTreeMap
- Children stored as ID references (not duplicated node data)
- Fork at any node, navigate between branches
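The single-source-of-truth layout can be pictured with the sketch below. It uses hypothetical names (`Node`, `append`, `navigate`, `active_path`) rather than the SDK's `SessionTree` API: nodes live only in the `BTreeMap`, children are lists of IDs, and forking amounts to navigating to an older node before appending.

```rust
use std::collections::BTreeMap;

/// Hypothetical node: children are stored as IDs, never as copies of nodes.
#[derive(Debug)]
pub struct Node {
    pub parent: Option<u64>,
    pub children: Vec<u64>,
    pub content: String,
}

#[derive(Debug, Default)]
pub struct SessionTree {
    nodes: BTreeMap<u64, Node>,
    next_id: u64,
    active: Option<u64>,
}

impl SessionTree {
    /// Append a child of the active node and make it the new active node.
    pub fn append(&mut self, content: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let parent = self.active;
        if let Some(p) = parent {
            self.nodes.get_mut(&p).expect("active node exists").children.push(id);
        }
        self.nodes.insert(id, Node { parent, children: Vec::new(), content: content.to_string() });
        self.active = Some(id);
        id
    }

    /// Move the cursor to any existing node; the next append forks from it.
    pub fn navigate(&mut self, id: u64) -> Result<(), String> {
        if self.nodes.contains_key(&id) {
            self.active = Some(id);
            Ok(())
        } else {
            Err(format!("unknown node {id}"))
        }
    }

    /// Walk parent links from the active node back to the root.
    pub fn active_path(&self) -> Vec<&str> {
        let mut path = Vec::new();
        let mut cursor = self.active;
        while let Some(id) = cursor {
            let node = &self.nodes[&id];
            path.push(node.content.as_str());
            cursor = node.parent;
        }
        path.reverse();
        path
    }

    pub fn children(&self, id: u64) -> &[u64] {
        self.nodes.get(&id).map(|n| n.children.as_slice()).unwrap_or(&[])
    }
}
```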

Phase 5: AgentContext (thread-safe KV store), AgentTask, TaskRegistry, SessionAgent
- Inter-agent communication via shared AgentContext
- Task lifecycle management with completion/failure tracking

Includes 7 e2e tests for models_file, 27 SDK unit tests, 4 models_file unit tests.
All 1,072 workspace tests pass.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
feat: add runtime models.json config, SDK crate, and extension/session-tree/agent-context primitives
Replace direct truncate_output_for_display calls with collapse_tool_output()
from tui/tool_panel module. Tool output now defaults to 10 visible lines
(down from 60) per the TUI design doc. Includes DISPLAY_TRUNCATION_NOTICE
appended to collapsed output, ToolDisplayConfig for customization.
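A rough sketch of how a line-budget collapse like `collapse_tool_output()` can work. Only the 10-line default comes from the commit; the notice text, the `max_visible_lines` field and the hidden-line count in the notice are assumptions.

```rust
/// Hypothetical notice text; the real DISPLAY_TRUNCATION_NOTICE may differ.
pub const DISPLAY_TRUNCATION_NOTICE: &str = "… output truncated for display";

/// Hypothetical shape of ToolDisplayConfig: only the line budget.
#[derive(Clone, Copy, Debug)]
pub struct ToolDisplayConfig {
    pub max_visible_lines: usize,
}

impl Default for ToolDisplayConfig {
    fn default() -> Self {
        // 10-line default, per the commit message.
        Self { max_visible_lines: 10 }
    }
}

/// Keep the first `max_visible_lines` lines and append a notice with the
/// number of hidden lines. Output that already fits is returned unchanged.
pub fn collapse_tool_output(output: &str, config: ToolDisplayConfig) -> String {
    let total = output.lines().count();
    if total <= config.max_visible_lines {
        return output.to_string();
    }
    let mut collapsed: Vec<&str> = output.lines().take(config.max_visible_lines).collect();
    let hidden = total - config.max_visible_lines;
    let notice = format!("{DISPLAY_TRUNCATION_NOTICE} ({hidden} more lines)");
    collapsed.push(&notice);
    collapsed.join("\n")
}
```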

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Reverse-engineered from the implemented codebase and pi-mono reference
research. Covers all 5 phases (models.json, SDK, extensions, session
tree, inter-agent comms), architectural comparison table, design
decisions, remaining gaps, and test coverage summary.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/permission.rs with:
- describe_tool_action() — plain-English descriptions per tool type
- format_enhanced_permission_prompt() — box-drawing borders, ANSI styling
- parse_permission_response() — parses y/n/a/v responses
- PermissionDecision enum (Allow, Deny, AllowAll, ViewInput)
- 11 unit tests

Update CliPermissionPrompter in app.rs:
- Use enhanced prompt for display
- Support 'a' (allow all) — sets approve_all flag
- Support 'v' (view input) — prints raw input, then re-prompts
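A sketch of the y/n/a/v handling, assuming that unrecognised answers re-prompt and that running out of answers denies. `decide` and its `answers`/`transcript` parameters are hypothetical stand-ins for reading stdin and printing to the terminal.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionDecision {
    Allow,
    Deny,
    AllowAll,
    ViewInput,
}

/// Map one typed answer to a decision. Accepting full words and returning
/// None for anything else (so the caller re-prompts) are assumptions.
pub fn parse_permission_response(input: &str) -> Option<PermissionDecision> {
    match input.trim().to_ascii_lowercase().as_str() {
        "y" | "yes" => Some(PermissionDecision::Allow),
        "n" | "no" => Some(PermissionDecision::Deny),
        "a" | "all" => Some(PermissionDecision::AllowAll),
        "v" | "view" => Some(PermissionDecision::ViewInput),
        _ => None,
    }
}

/// Hypothetical prompt loop: `v` shows the raw input and asks again,
/// `a` sets the sticky approve-all flag for later calls.
pub fn decide<'a>(
    approve_all: &mut bool,
    raw_input: &str,
    answers: impl IntoIterator<Item = &'a str>,
    transcript: &mut Vec<String>,
) -> bool {
    if *approve_all {
        return true;
    }
    for answer in answers {
        match parse_permission_response(answer) {
            Some(PermissionDecision::Allow) => return true,
            Some(PermissionDecision::Deny) => return false,
            Some(PermissionDecision::AllowAll) => {
                *approve_all = true;
                return true;
            }
            Some(PermissionDecision::ViewInput) => transcript.push(raw_input.to_string()),
            None => transcript.push(format!("unrecognised answer: {answer}")),
        }
    }
    false // no usable answer: deny by default
}
```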

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/diff_view.rs with:
- parse_unified_diff() — git unified diff parser (DiffLine enum)
- render_colored_diff() — green additions, red deletions, cyan hunk headers
- render_diff_summary() — file-level +/- counts
- format_colored_diff() — full colored diff with summary header
- DiffCounts, count_diff_lines(), count_diff_files()
- 9 unit tests

Wire into cli_commands.rs:
- render_diff_report_for() now uses format_colored_diff()
- Staged/unstaged sections get ANSI coloring

Update mock_parity_harness assertions for new prompt text.
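One possible shape for the parser, counters and coloring, with hypothetical variant names and a hypothetical `count_changes` helper. It tracks whether it is inside a hunk, so a removed line whose text starts with `--` (an SQL comment, for example, shows up as `---`) is not mistaken for a `---` file header. The escape codes are the standard SGR sequences for green (32), red (31) and cyan (36).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DiffLine {
    FileHeader(String),
    HunkHeader(String),
    Added(String),
    Removed(String),
    Context(String),
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct DiffCounts {
    pub files: usize,
    pub additions: usize,
    pub deletions: usize,
}

pub fn parse_unified_diff(diff: &str) -> Vec<DiffLine> {
    let mut in_hunk = false;
    diff.lines()
        .map(|line| {
            if line.starts_with("diff --git ") {
                in_hunk = false;
                DiffLine::FileHeader(line.to_string())
            } else if line.starts_with("@@") {
                in_hunk = true;
                DiffLine::HunkHeader(line.to_string())
            } else if !in_hunk {
                // index/---/+++ metadata before the first hunk of a file
                DiffLine::FileHeader(line.to_string())
            } else if let Some(rest) = line.strip_prefix('+') {
                DiffLine::Added(rest.to_string())
            } else if let Some(rest) = line.strip_prefix('-') {
                DiffLine::Removed(rest.to_string())
            } else {
                DiffLine::Context(line.strip_prefix(' ').unwrap_or(line).to_string())
            }
        })
        .collect()
}

pub fn count_changes(diff: &str) -> DiffCounts {
    let mut counts = DiffCounts::default();
    for line in parse_unified_diff(diff) {
        match line {
            DiffLine::FileHeader(h) if h.starts_with("diff --git ") => counts.files += 1,
            DiffLine::Added(_) => counts.additions += 1,
            DiffLine::Removed(_) => counts.deletions += 1,
            _ => {}
        }
    }
    counts
}

/// Green additions, red deletions, cyan hunk headers.
pub fn render_colored(lines: &[DiffLine]) -> String {
    lines
        .iter()
        .map(|l| match l {
            DiffLine::Added(t) => format!("\x1b[32m+{t}\x1b[0m"),
            DiffLine::Removed(t) => format!("\x1b[31m-{t}\x1b[0m"),
            DiffLine::HunkHeader(t) => format!("\x1b[36m{t}\x1b[0m"),
            DiffLine::FileHeader(t) => t.clone(),
            DiffLine::Context(t) => format!(" {t}"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```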

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/thinking.rs with:
- ThinkingFrames — infinite cycled dot-wave animation frames (magenta)
- format_thinking_completed() — static 'Reasoned for X.Xs' line
- render_thinking_inline() — dim-colored reasoning indicator
- 5 unit tests

Update render_thinking_block_summary() in app.rs to delegate to the
new module, adding magenta ANSI coloring to all thinking summaries.
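An illustrative version of cycled frames and the completion line. The frame glyphs are placeholders and `thinking_frames` is a hypothetical name; magenta is SGR code 35.

```rust
use std::time::Duration;

const THINKING: &str = "\x1b[35m"; // magenta
const RESET: &str = "\x1b[0m";

/// Placeholder dot-wave frames; the real ThinkingFrames set may differ.
static FRAMES: [&str; 4] = ["·  ", "·· ", "···", " ··"];

/// Infinite iterator that repeats the frames, each wrapped in magenta.
pub fn thinking_frames() -> impl Iterator<Item = String> {
    FRAMES.iter().cycle().map(|f| format!("{}{}{}", THINKING, f, RESET))
}

/// Static summary line printed once reasoning finishes.
pub fn format_thinking_completed(elapsed: Duration) -> String {
    format!("{}Reasoned for {:.1}s{}", THINKING, elapsed.as_secs_f64(), RESET)
}
```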

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace monolithic startup_banner() with BannerStyle enum and dispatch:
- BannerStyle::Full — original ASCII art (opt-in)
- BannerStyle::Compact — 2-line banner (default)
- BannerStyle::None — empty banner (opt-out)

Add BannerStyle::from_config() for future config file integration.
Add compact_banner() and full_banner() methods to LiveCli.
All 220 tests pass.
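A sketch of `BannerStyle::from_config()`, assuming the config value arrives as an optional string. The accepted spellings (`full`, `compact`, `none`/`off`) and the fallback to `Compact` for unknown values are guesses, not documented keys.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum BannerStyle {
    Full,
    #[default]
    Compact,
    None,
}

impl BannerStyle {
    /// Hypothetical mapping from a config string to a style; anything
    /// missing or unrecognised keeps the compact default.
    pub fn from_config(value: Option<&str>) -> Self {
        match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
            Some("full") => BannerStyle::Full,
            Some("none") | Some("off") => BannerStyle::None,
            _ => BannerStyle::Compact,
        }
    }
}
```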

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/terminal.rs with:
- TerminalSize — thread-safe terminal-dimension tracker
- Periodic polling (1s interval) via crossterm::terminal::size()
- invalidate() for force-refresh on next read
- AtomicU16 storage, so the hot (read) path is lock-free

StatusBar already consumes terminal_width; TerminalSize provides
a reusable shared instance for all TUI components.
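A lock-free tracker along these lines, sketched without crossterm. The hypothetical `probe` function pointer stands in for the terminal query; in a crossterm build it could wrap `crossterm::terminal::size()`, which returns `io::Result<(u16, u16)>` as (columns, rows). Columns and rows live in separate atomics, so a reader racing a refresh can briefly see a mixed pair.

```rust
use std::sync::atomic::{AtomicBool, AtomicU16, AtomicU64, Ordering};
use std::time::{Duration, Instant};

pub struct TerminalSize {
    cols: AtomicU16,
    rows: AtomicU16,
    last_poll_ms: AtomicU64,
    stale: AtomicBool,
    epoch: Instant,
    interval: Duration,
    probe: fn() -> (u16, u16), // hypothetical stand-in for the terminal query
}

impl TerminalSize {
    pub fn new(interval: Duration, probe: fn() -> (u16, u16)) -> Self {
        let (cols, rows) = probe();
        Self {
            cols: AtomicU16::new(cols),
            rows: AtomicU16::new(rows),
            last_poll_ms: AtomicU64::new(0),
            stale: AtomicBool::new(false),
            epoch: Instant::now(),
            interval,
            probe,
        }
    }

    /// Re-query when invalidated or when the polling interval has elapsed;
    /// otherwise return the cached values without taking any lock.
    pub fn get(&self) -> (u16, u16) {
        let now = self.epoch.elapsed().as_millis() as u64;
        let last = self.last_poll_ms.load(Ordering::Relaxed);
        let due = now.saturating_sub(last) >= self.interval.as_millis() as u64;
        if self.stale.swap(false, Ordering::Relaxed) || due {
            let (c, r) = (self.probe)();
            self.cols.store(c, Ordering::Relaxed);
            self.rows.store(r, Ordering::Relaxed);
            self.last_poll_ms.store(now, Ordering::Relaxed);
        }
        (self.cols.load(Ordering::Relaxed), self.rows.load(Ordering::Relaxed))
    }

    /// Force a refresh on the next read (e.g. after a resize signal).
    pub fn invalidate(&self) {
        self.stale.store(true, Ordering::Relaxed);
    }
}
```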

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/timeline.rs with:
- ToolCallTimeline — accumulator for tool events during a turn
- ToolCallEvent — per-call metadata (step, name, timing, error, truncation)
- start_tool() / complete_tool() builder API
- render() — numbered timeline with elapsed time and line counts
- 6 unit tests
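A sketch of the accumulator, assuming calls are keyed by tool-use ID (the real `start_tool()`/`complete_tool()` signatures may differ) and a plain-text render without Theme colors.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub struct ToolCallEvent {
    pub step: usize,
    pub id: String,
    pub name: String,
    pub started: Instant,
    pub elapsed: Option<Duration>,
    pub is_error: bool,
    pub truncated: bool,
    pub line_count: usize,
}

#[derive(Debug, Default)]
pub struct ToolCallTimeline {
    events: Vec<ToolCallEvent>,
}

impl ToolCallTimeline {
    /// Record the start of a call; steps are numbered from 1 in arrival order.
    pub fn start_tool(&mut self, id: &str, name: &str) -> usize {
        let step = self.events.len() + 1;
        self.events.push(ToolCallEvent {
            step,
            id: id.to_string(),
            name: name.to_string(),
            started: Instant::now(),
            elapsed: None,
            is_error: false,
            truncated: false,
            line_count: 0,
        });
        step
    }

    /// Close an open call; returns false if no open call has this ID.
    pub fn complete_tool(&mut self, id: &str, is_error: bool, truncated: bool, line_count: usize) -> bool {
        let Some(event) = self.events.iter_mut().find(|e| e.id == id && e.elapsed.is_none()) else {
            return false;
        };
        event.elapsed = Some(event.started.elapsed());
        event.is_error = is_error;
        event.truncated = truncated;
        event.line_count = line_count;
        true
    }

    /// One numbered line per call: status, elapsed time and output size.
    pub fn render(&self) -> String {
        self.events
            .iter()
            .map(|e| {
                let status = match (e.elapsed, e.is_error) {
                    (None, _) => "running",
                    (Some(_), true) => "error",
                    (Some(_), false) => "ok",
                };
                let time = e.elapsed.map_or_else(|| "-".to_string(), |d| format!("{:.1}s", d.as_secs_f64()));
                let note = if e.truncated { " (truncated)" } else { "" };
                format!("{:>2}. {} [{}] {} {} lines{}", e.step, e.name, status, time, e.line_count, note)
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```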

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Remove inherited upstream documentation (PHILOSOPHY.md, PARITY.md,
USAGE.md, ROADMAP.md, prd.json, progress.txt, container.md,
MODEL_COMPATIBILITY.md) — all still available in git history.

Replace with project-aligned docs:
- README.md: agent-first harness overview, quick start, architecture
- docs/ROADMAP.md: 6-phase plan (SDK, agent integration, human DX,
  orchestration, security, developer experience)
- docs/AGENT-INTEGRATION.md: SDK usage, CLI patterns, planned RPC mode,
  event types, model configuration
- docs/HUMAN-DX.md: review workflows, notification strategy, auto-expiring
  demo deployments, tailscale integration, orchestrator interface

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add tui/theme.rs with:
- Theme struct containing all semantic ANSI color constants
- DIM, SUCCESS, ERROR, HIGHLIGHT, THINKING, WARNING, MUTED, etc.
- Composite helpers: truncation_notice(), permission_border()
- 2 unit tests verifying non-empty constants and truncation notice format
- Single import point for all TUI modules to use consistent colors

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Wire ToolCallTimeline into consume_stream: start_tool on ContentBlockStop,
  render at stream end when tools were used
- Replace direct crossterm::terminal::size() with TerminalSize tracker
  for periodic resize checking
- Import Theme in diff_view.rs, status_bar.rs, tool_panel.rs
- Use Theme::status_bar_fg() in StatusBar::render

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Implements `claw --mode rpc` (SDK layer, CLI wiring pending):
- JSON-RPC 2.0 request/response protocol over stdin/stdout
- Methods: session.create, session.turn, session.list, session.destroy
- Session tree ops: session.tree.fork, session.tree.navigate, session.tree.path
- Event subscription: events.subscribe with notification streaming
- Lifecycle: ping, shutdown
- Session tree integration: user/assistant turns tracked as tree nodes
- 11 unit tests covering round-trip RPC, error handling, shutdown

Protocol example:
  -> {"method":"session.create","params":{"model":"claude-sonnet-4-6"},"id":1}
  <- {"result":{"sessionId":"abc123"},"id":1}
  -> {"method":"session.turn","params":{"sessionId":"abc123","input":"hello"},"id":2}
  <- {"result":{"status":"completed","tokensUsed":150},"id":2}
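Dispatch by string matching can be sketched as follows. The JSON-RPC 2.0 specification requires a `"jsonrpc":"2.0"` member in every request and response and reserves code `-32601` for "Method not found"; the abbreviated exchange above omits the `jsonrpc` member. `Route` and `error_response` are hypothetical helpers, and the JSON is assembled by hand only because the message is a fixed ASCII string.

```rust
/// Reserved JSON-RPC 2.0 error code for an unknown method.
pub const METHOD_NOT_FOUND: i64 = -32601;

/// Hypothetical grouping of the methods listed in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Lifecycle,
    Session,
    Tree,
    Events,
}

pub fn route(method: &str) -> Result<Route, i64> {
    match method {
        "ping" | "shutdown" => Ok(Route::Lifecycle),
        "session.create" | "session.turn" | "session.list" | "session.destroy" => Ok(Route::Session),
        "session.tree.fork" | "session.tree.navigate" | "session.tree.path" => Ok(Route::Tree),
        "events.subscribe" => Ok(Route::Events),
        _ => Err(METHOD_NOT_FOUND),
    }
}

/// `message` is inserted verbatim, so it must not need JSON escaping.
pub fn error_response(id: u64, code: i64, message: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","error":{{"code":{code},"message":"{message}"}},"id":{id}}}"#)
}
```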

All 1,083 workspace tests pass.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Fix tree inconsistency: only add user/assistant nodes on run_turn
  success, not before the call (prevents orphaned user nodes on failure)
- Fix events.subscribe no-op: store EventBus in ManagedSession and
  actually drain events via subscribe() instead of faking notifications
- Remove dead RpcMethod enum and default_model() (dispatch uses string
  matching, enum was never deserialized)
- Remove unused imports (Arc, AgentSessionEvent)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Adds `claw --mode rpc` (or `claw --mode=rpc`) that starts the JSON-RPC
server over stdin/stdout for agent integration. This is the primary
entry point for non-Rust consumers to integrate with Claw Code.

Usage:
  echo '{"method":"session.create","params":{"model":"claude-sonnet-4-6"},"id":1}' | claw --mode rpc

Also fixes unused import warning in models_file.rs.
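Accepting both spellings of the flag might look like this. `Mode` and `parse_mode` are hypothetical, and rejecting any mode other than `rpc` is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    Repl,
    Rpc,
}

/// Accepts `--mode rpc` and `--mode=rpc`; other arguments are skipped.
pub fn parse_mode<I, S>(args: I) -> Result<Mode, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    let mut mode = Mode::Repl;
    while let Some(arg) = args.next() {
        let arg = arg.as_ref();
        let value = if arg == "--mode" {
            match args.next() {
                Some(v) => v.as_ref().to_string(),
                None => return Err("--mode requires a value".into()),
            }
        } else if let Some(v) = arg.strip_prefix("--mode=") {
            v.to_string()
        } else {
            continue;
        };
        mode = match value.as_str() {
            "rpc" => Mode::Rpc,
            other => return Err(format!("unknown mode: {other}")),
        };
    }
    Ok(mode)
}
```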

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- status_bar.rs: content.len() -> content.chars().count() for unicode-safe truncation
- permission.rs: use Theme::WARNING, Theme::DIM, Theme::permission_border()
- diff_view.rs: use Theme::SUCCESS, Theme::ERROR, Theme::HIGHLIGHT, Theme::MUTED, Theme::DIM
- timeline.rs: use Theme::MUTED, Theme::SUCCESS_BOLD, Theme::ERROR_BRIGHT, Theme::HIGHLIGHT, Theme::DIM
- thinking.rs: use Theme::THINKING in format_thinking_completed and render_thinking_inline
- All 230 tests pass
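The first bullet matters because `len()` counts bytes: multi-byte text gets cut too early, and slicing at a byte offset inside a character panics. A hypothetical `truncate_chars` helper shows the char-based version. Note that `chars()` yields Unicode scalar values, not display columns, so wide CJK glyphs and combining sequences can still misalign a status bar.

```rust
/// Truncate to at most `max` chars, replacing the last kept char with "…"
/// when anything is cut. Never splits a multi-byte character.
pub fn truncate_chars(content: &str, max: usize) -> String {
    if content.chars().count() <= max {
        return content.to_string();
    }
    if max == 0 {
        return String::new();
    }
    let kept: String = content.chars().take(max - 1).collect();
    format!("{kept}…")
}
```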

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ble API client

- Add AgentSessionBuilder with fluent API (model, system_prompt, tools, permission_mode, api_client)
- Add BoxedApiClient type-erased wrapper for any runtime::ApiClient
- Add DummyApiClient as the default no-op client
- Refactor AgentSession to use BoxedApiClient instead of being generic
- Export new types from lib.rs
- Fix doctests to use DummyApiClient
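The builder plus type-erased client pattern, sketched with a hypothetical one-method `ApiClient` trait standing in for `runtime::ApiClient`. Requiring a model in `build()` is an assumption.

```rust
/// Hypothetical stand-in for runtime::ApiClient.
pub trait ApiClient {
    fn send(&mut self, prompt: &str) -> Result<String, String>;
}

/// Type-erased wrapper so AgentSession need not be generic over the client.
pub struct BoxedApiClient(Box<dyn ApiClient + Send>);

impl BoxedApiClient {
    pub fn new(client: impl ApiClient + Send + 'static) -> Self {
        Self(Box::new(client))
    }
    pub fn send(&mut self, prompt: &str) -> Result<String, String> {
        self.0.send(prompt)
    }
}

/// Default no-op client.
pub struct DummyApiClient;

impl ApiClient for DummyApiClient {
    fn send(&mut self, _prompt: &str) -> Result<String, String> {
        Ok(String::new())
    }
}

pub struct AgentSession {
    pub model: String,
    pub system_prompt: Option<String>,
    client: BoxedApiClient,
}

impl AgentSession {
    pub fn run_turn(&mut self, input: &str) -> Result<String, String> {
        self.client.send(input)
    }
}

#[derive(Default)]
pub struct AgentSessionBuilder {
    model: Option<String>,
    system_prompt: Option<String>,
    client: Option<BoxedApiClient>,
}

impl AgentSessionBuilder {
    pub fn model(mut self, model: impl Into<String>) -> Self {
        self.model = Some(model.into());
        self
    }
    pub fn system_prompt(mut self, prompt: impl Into<String>) -> Self {
        self.system_prompt = Some(prompt.into());
        self
    }
    pub fn api_client(mut self, client: impl ApiClient + Send + 'static) -> Self {
        self.client = Some(BoxedApiClient::new(client));
        self
    }
    /// Falls back to DummyApiClient when no client was supplied.
    pub fn build(self) -> Result<AgentSession, String> {
        let model = self.model.ok_or("model is required")?;
        Ok(AgentSession {
            model,
            system_prompt: self.system_prompt,
            client: self.client.unwrap_or_else(|| BoxedApiClient::new(DummyApiClient)),
        })
    }
}
```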

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…, abort, dispose

- steer(): inject mid-turn steering messages into session
- follow_up(): queue follow-up messages for next turn
- set_model() / cycle_model(): runtime model switching with rotation
- compact(): explicit context compaction via runtime CompactionConfig
- abort(): cooperative abort signal for mid-turn cancellation
- dispose(): clean session teardown with lifecycle event emission
- All methods guard against use-after-dispose
- 11 new unit tests (51 total in SDK crate)
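The use-after-dispose guard and cooperative abort, in miniature. `Session`, `run_turn(steps)` and the abort handle are hypothetical. The point, which the follow-up fix also makes, is that the abort flag must be the same one the turn loop polls.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    Disposed,
    Aborted,
}

/// Hypothetical miniature session: model rotation, cooperative abort and a
/// use-after-dispose guard on each public operation.
pub struct Session {
    disposed: bool,
    abort: Arc<AtomicBool>,
    models: Vec<String>,
    model_idx: usize,
}

impl Session {
    pub fn new(models: Vec<String>) -> Self {
        assert!(!models.is_empty(), "at least one model is required");
        Self { disposed: false, abort: Arc::new(AtomicBool::new(false)), models, model_idx: 0 }
    }

    fn ensure_not_disposed(&self) -> Result<(), SessionError> {
        if self.disposed { Err(SessionError::Disposed) } else { Ok(()) }
    }

    /// Clone of the flag the turn loop polls; setting it requests an abort.
    pub fn abort_handle(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.abort)
    }

    /// Rotate to the next configured model.
    pub fn cycle_model(&mut self) -> Result<&str, SessionError> {
        self.ensure_not_disposed()?;
        self.model_idx = (self.model_idx + 1) % self.models.len();
        Ok(self.models[self.model_idx].as_str())
    }

    /// `steps` stands in for the model calls and tool runs of one turn; the
    /// abort flag is checked (and reset) between steps.
    pub fn run_turn(&mut self, steps: usize) -> Result<usize, SessionError> {
        self.ensure_not_disposed()?;
        let mut completed = 0;
        for _ in 0..steps {
            if self.abort.swap(false, Ordering::SeqCst) {
                return Err(SessionError::Aborted);
            }
            completed += 1;
        }
        Ok(completed)
    }

    pub fn dispose(&mut self) {
        self.disposed = true;
    }
}
```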

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- steer() now writes to runtime's session (via session_mut()) not stale SDK clone
- compact() applies compacted_session back to runtime and syncs SDK copy
- run_turn() now checks ensure_not_disposed() before executing
- abort() uses runtime's HookAbortSignal (wired via with_hook_abort_signal) instead of disconnected Arc<AtomicBool>
- set_model() emits SessionLifecycleEvent::ModelChanged instead of misleading Created
- dispose() clears both SDK and runtime session copies
- Add ModelChanged variant to SessionLifecycleEvent

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ith Theme constants

All \x1b[...m escape sequences in tool_fmt.rs replaced with
Theme::DIM, RESET, SUCCESS_BOLD, ERROR_BRIGHT, WARNING,
HIGHLIGHT, ERROR, SUCCESS, MUTED, COMMAND_BG constants.
230 tests pass.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…d custom handlers

- ToolHandler trait for custom tool execution (Send + Sync)
- define_tool() ergonomic builder: name, description, input/output schemas, handler
- SchemaValidator: JSON Schema validation (type, required, nested properties)
- FnToolHandler: wrap closures as ToolHandler implementations
- Enhanced ToolRegistry: register builtin stubs + custom ToolDefinitions with handlers
- Upgraded SdkToolExecutor: dispatch to custom handlers with input/output validation
- 15 new unit tests (66 total in SDK crate)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- SchemaValidator now supports "integer" type (checks as_i64() for whole numbers)
- SdkToolExecutor rejects malformed JSON input when a non-trivial schema is defined
- 20 new tests covering: all JSON types (null, array, boolean, number, integer),
  empty/malformed schemas, deeply nested property paths, malformed JSON bypass,
  handler error propagation, empty tool name, builtin idempotency, builtin-after-custom
  conflicts, unregistered tool validation passthrough, minimal tool builds,
  full end-to-end pipeline, SchemaValidationError Display formatting
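A crate-free sketch of the type and required-field checks, using a hypothetical minimal `Value` enum in place of `serde_json::Value`. It follows the JSON Schema rule that a number with a zero fractional part (such as `1.0`) is an integer. A check built on serde_json's `as_i64()` rejects `1.0`, because serde_json stores that literal as a float, so the two approaches can disagree on such inputs.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value model so the sketch needs no crates.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Does `value` satisfy a JSON Schema "type" keyword?
pub fn matches_type(value: &Value, ty: &str) -> bool {
    match (ty, value) {
        ("null", Value::Null)
        | ("boolean", Value::Bool(_))
        | ("string", Value::String(_))
        | ("array", Value::Array(_))
        | ("object", Value::Object(_))
        | ("number", Value::Number(_)) => true,
        // "integer": any finite number with no fractional part.
        ("integer", Value::Number(n)) => n.is_finite() && n.fract() == 0.0,
        _ => false,
    }
}

/// Names from `required` that the object lacks (all of them for non-objects).
pub fn missing_required<'a>(value: &Value, required: &'a [&'a str]) -> Vec<&'a str> {
    match value {
        Value::Object(map) => required.iter().copied().filter(|k| !map.contains_key(*k)).collect(),
        _ => required.to_vec(),
    }
}
```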

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- TreeEntry enum with 6 typed entries: Message, Compaction, Branch, ModelChange, ThinkingLevel, Custom
- SessionTreeLog: append-only JSONL persistence with automatic tree reconstruction
- build_session_context(): walk active path and collect provider-relevant entries
- Branch labels and summaries stored alongside tree structure
- fork_to_new_file(): extract ancestor subtree into independent JSONL file
- Compaction and model-change entries create tree nodes for full audit trail
- Round-trip file persistence with serde JSON serialization
- active_id() getter on SessionTree for external module access
- 10 new unit tests (95 total in SDK crate)
- Update ROADMAP.md: mark Phase 2.3 done, clarify Phase 2.4 is Python adapter work

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ence

- P1: load_from_file now skips unparseable trailing lines (crash recovery)
  instead of failing the entire session load
- P2: ModelChange/ThinkingLevel now return error when tree has no active node
  instead of silently becoming zombie entries
- apply_entry now propagates all tree errors instead of silently ignoring them
- 16 new tests: serde round-trip all 6 variants, truncated line recovery,
  corrupted middle line, empty file, ModelChange/ThinkingLevel rejection,
  branch without label, compaction+branch reconstruction, multi-branch,
  build_context with branches and compaction, fork at root/leaf, fork with
  branch/custom entries, duplicate node_id behavior
- Add active_id() getter to SessionTree

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Resolve conflict in main.rs by:
- Removing inline CliAction, CliOutputFormat, LocalHelpTopic, parse_args
  (already extracted to args.rs by Phase 0)
- Adding CliAction::Rpc variant and --mode rpc parsing to args.rs
- Keeping all new mainline files (sdk crate, docs, API changes)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
jaikoo and others added 13 commits April 26, 2026 02:20
TUI refactor: modular tui/ module, compact banner, tool timeline, theme system
…d approval gates

- RiskLevel enum (Low/Medium/High) with ordering
- ChangeRecord/FileChange: structured change tracking with diff hunks
- ReviewGate: configurable approval gates by risk level and sensitive file patterns
- RiskClassifier: auto-classify changes based on file paths and change size
- ReviewManager: submit, approve, reject, request_changes, batch_approve
- Review history with full audit trail
- Glob matching (* and ** patterns) for sensitive file path detection
- 25 unit tests covering all components
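One way to get the glob semantics listed here and in the follow-up tests (`*`, `**`, `?`, case-sensitive), plus derived ordering for `RiskLevel`. `glob_match` is hypothetical; `**/` is treated as zero or more directories, so `**/.env` also matches a top-level `.env`. The recursion is exponential in the worst case, which is acceptable for short path patterns.

```rust
/// Derived Ord follows declaration order: Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RiskLevel {
    Low,
    Medium,
    High,
}

pub fn glob_match(pattern: &str, path: &str) -> bool {
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(p: &[u8], s: &[u8]) -> bool {
    match p {
        [] => s.is_empty(),
        // "**/" matches zero or more whole directories.
        [b'*', b'*', b'/', rest @ ..] => {
            match_bytes(rest, s)
                || s.iter().enumerate().any(|(i, &c)| c == b'/' && match_bytes(rest, &s[i + 1..]))
        }
        // "**" elsewhere matches any run of characters, '/' included.
        [b'*', b'*', rest @ ..] => (0..=s.len()).any(|i| match_bytes(rest, &s[i..])),
        // "*" matches any run of characters within one path segment.
        [b'*', rest @ ..] => {
            for i in 0..=s.len() {
                if match_bytes(rest, &s[i..]) {
                    return true;
                }
                if i < s.len() && s[i] == b'/' {
                    break;
                }
            }
            false
        }
        // "?" matches exactly one non-separator character.
        [b'?', rest @ ..] => matches!(s.first(), Some(&c) if c != b'/') && match_bytes(rest, &s[1..]),
        [c, rest @ ..] => s.first() == Some(c) && match_bytes(rest, &s[1..]),
    }
}
```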

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- RiskClassifier: empty file list, .env variants, false positives (environment.rs),
  case-insensitive paths, renamed files
- ReviewManager: sequential ID generation, double-approve rejection, reject/request_changes
  on nonexistent, request_changes→approve history accumulation, duplicate explicit ID,
  all_pending vs pending_reviews distinction, batch approve edge cases
- ReviewGate: custom low-risk gate with sensitive path, no-match gate
- Glob matching: ? wildcard, empty patterns, exact match, case sensitivity, multi-** segments
- Serde round-trips: RiskLevel, Decision, FileChangeType, ReviewGate
- Decision display formatting

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add ProviderKind::DeepSeek, Ollama, Qwen, Vllm variants
- Add ProviderClient::DeepSeek, Ollama, Qwen, Vllm variants
- Add OpenAiCompatConfig::deepseek(), ollama(), qwen(), vllm() constructors
- Add MODEL_REGISTRY entries for deepseek-chat, deepseek-reasoner, deepseek-r1
- Route deepseek*, ollama/*, qwen/*, vllm/* prefixes to new providers
- Change qwen/ prefix to route to external Qwen (non-DashScope)
- Keep bare qwen-* routing to DashScope for backward compat
- Add reasoning_content field to ChunkDelta for DeepSeek R1 thinking
- Handle reasoning/thinking blocks in StreamState (ingest_chunk, finish)
- Add deepseek-reasoner to is_reasoning_model()
- Update strip_routing_prefix() for new provider prefixes
- Add env-var-based detection in detect_provider_kind()
- Add 15 new tests covering aliases, routing, reasoning, configs
- Update exhaustive matches in app.rs and format/model.rs
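The prefix rules above, condensed into a sketch. `Provider` and `route_model` are hypothetical, and `Fallback` stands for whatever routing applied before these providers were added.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    DeepSeek,
    Ollama,
    Qwen,      // external Qwen via the "qwen/" prefix
    DashScope, // bare "qwen-*" names, kept for backward compatibility
    Vllm,
    Fallback,
}

pub fn route_model(model: &str) -> Provider {
    if model.starts_with("deepseek") {
        Provider::DeepSeek
    } else if model.starts_with("ollama/") {
        Provider::Ollama
    } else if model.starts_with("qwen/") {
        Provider::Qwen
    } else if model.starts_with("vllm/") {
        Provider::Vllm
    } else if model.starts_with("qwen-") {
        Provider::DashScope
    } else {
        Provider::Fallback
    }
}

/// Drop a routing prefix before sending the model name upstream.
pub fn strip_routing_prefix(model: &str) -> &str {
    ["ollama/", "qwen/", "vllm/"]
        .iter()
        .find_map(|p| model.strip_prefix(*p))
        .unwrap_or(model)
}
```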
…scaffolding

- Add startup_banner: Option<String> to RuntimeFeatureConfig in runtime crate
- Add parse_optional_startup_banner() parser for 'startupBanner' config key
- Add RuntimeConfig::startup_banner() accessor
- Wire BannerStyle::from_config() in run_repl() — reads settings.json
- Migrate full_banner() and compact_banner() to use Theme::DIM/Theme::RESET
- Add None arg to run_repl() call sites for new startup_banner parameter
- 230 tests pass, runtime crate compiles clean

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- 5 integration tests (provider_client_integration): DeepSeek routing,
  missing creds, Ollama prefix, vLLM prefix, Qwen external prefix
- 1 e2e streaming test (openai_compat_integration): verify
  reasoning_content produces Thinking blocks with correct indices
- Fix from_env() to skip credential check for no-auth providers
  (Ollama, vLLM) when api_key_env is empty
- New sdk::setup with provider detection, tool detection, SetupReport, session templates
- Fix TOCTOU race in DetectedProvider::check
- Add DeepSeek, Ollama, vLLM, Qwen providers
- Add 9 new edge-case tests (env var present/empty, render branches, full serde equality)
- Fix pre-existing clippy lints: map+unwrap_or -> is_ok_and, Duration::from_hours

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- NotificationSink trait: ConsoleSink, FileSink, WebhookSink, EmailSink
- Severity enum with 5 levels: Debug → Critical
- EventType enum covering agent lifecycle and review events
- NotificationDispatcher with per-sink filters (severity, event type, tags, exclusions)
- Generic Notification with builder pattern (with_tag, with_payload)
- Full serde support for all types
- 14 unit tests: filtering, severity ordering, file JSONL, sink dispatch, round-trip, edge cases
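Per-sink filtering could be expressed as below. The intermediate severity names (`Info`, `Warning`, `Error`), string event names and the `dispatch` helper that returns accepting sink names are assumptions. The commit only states five levels from Debug to Critical and filters on severity, event type, tags and exclusions.

```rust
/// Five levels, ordered by declaration (intermediate names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Debug,
    Info,
    Warning,
    Error,
    Critical,
}

#[derive(Debug, Clone)]
pub struct Notification {
    pub severity: Severity,
    pub event: String,
    pub tags: Vec<String>,
}

#[derive(Debug, Clone, Default)]
pub struct SinkFilter {
    pub min_severity: Option<Severity>,
    pub events: Vec<String>,        // empty = every event type
    pub required_tags: Vec<String>, // all must be present
    pub excluded_tags: Vec<String>, // any present = drop
}

impl SinkFilter {
    pub fn accepts(&self, n: &Notification) -> bool {
        self.min_severity.map_or(true, |min| n.severity >= min)
            && (self.events.is_empty() || self.events.contains(&n.event))
            && self.required_tags.iter().all(|t| n.tags.contains(t))
            && !self.excluded_tags.iter().any(|t| n.tags.contains(t))
    }
}

/// Names of the sinks whose filter accepts the notification.
pub fn dispatch<'a>(sinks: &'a [(&'a str, SinkFilter)], n: &Notification) -> Vec<&'a str> {
    sinks.iter().filter(|(_, f)| f.accepts(n)).map(|(name, _)| *name).collect()
}
```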

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…n validation

1. Token limits: add deepseek-chat/deepseek-reasoner to model_token_limit()
   (8,192 max output, 131,072 context window per DeepSeek API docs)
2. Pricing: add deepseek-chat ($0.27/$1.10 per M tokens) and
   deepseek-reasoner ($0.55/$2.19 per M tokens) to pricing_for_model()
3. README: add Built-in Providers table with all 8 providers, env vars,
   model prefixes, auto-detection order, and usage examples
4. models.json validation: validate the api field against known values
   (openai-completions, anthropic-messages, deepseek, ollama, qwen, vllm)
   in both load_custom_models() and load_and_merge_custom_models()
- Add tests: token limits, pricing (3 tests), api field validation (2 tests)
- Add base_url_fallback_env field to OpenAiCompatConfig for Qwen to fall
  back to OPENAI_BASE_URL when QWEN_BASE_URL is not set
- Add streaming e2e test for reasoning-only stream (thinking → tools,
  no text content) verifying correct index offsets
- Add unit tests for Qwen base URL fallback behavior
- Update all config constructors with new base_url_fallback_env field
- Update read_base_url() to check fallback env var
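The base-URL fallback chain, sketched with an injected lookup so it can be exercised without touching the process environment. The `read_base_url` signature here is hypothetical, as are treating an empty value as unset and the placeholder default URL.

```rust
/// Primary env var, then optional fallback env var, then a default.
/// `lookup` abstracts std::env::var so callers can inject test values.
pub fn read_base_url(
    primary: &str,
    fallback: Option<&str>,
    default: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> String {
    lookup(primary)
        .filter(|v| !v.trim().is_empty())
        .or_else(|| fallback.and_then(|f| lookup(f)).filter(|v| !v.trim().is_empty()))
        .unwrap_or_else(|| default.to_string())
}

/// Real-environment lookup for production use.
pub fn env_lookup(name: &str) -> Option<String> {
    std::env::var(name).ok()
}
```

Used as `read_base_url("QWEN_BASE_URL", Some("OPENAI_BASE_URL"), default, env_lookup)`, the Qwen config falls back to `OPENAI_BASE_URL` when `QWEN_BASE_URL` is unset.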
feat: startup_banner config, Theme migration in app.rs
- Add SharedToolCallTimeline (Arc<Mutex<ToolCallTimeline>>) for
  shared access between streaming client and tool executor
- Add tool_timeline field to CliToolExecutor + constructor param
- Wire complete_tool() in execute() — records duration, error,
  truncation, and line count on each tool result
- Update all CliToolExecutor::new() call sites with None arg
- Re-export SharedToolCallTimeline from tui/mod.rs
- 230 tests pass

Note: passing an active timeline from consume_stream to the executor
requires threading through build_runtime -> ConversationRuntime which
crosses the runtime crate boundary (separate PR).
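Both sides must hold clones of the same `Arc`; otherwise the executor records into a timeline nobody renders. A minimal sketch with hypothetical `StreamingClient`/`ToolExecutor` types and a simplified `Timeline` in place of `ToolCallTimeline`:

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Default)]
pub struct Timeline {
    pub started: Vec<String>,
    pub completed: Vec<String>,
}

/// Hypothetical alias mirroring SharedToolCallTimeline.
pub type SharedTimeline = Arc<Mutex<Timeline>>;

pub struct StreamingClient {
    pub timeline: SharedTimeline,
}

pub struct ToolExecutor {
    pub timeline: Option<SharedTimeline>, // None = no timeline tracking
}

impl StreamingClient {
    /// Called when a tool_use block finishes streaming.
    pub fn on_tool_use(&self, id: &str) {
        self.timeline.lock().unwrap().started.push(id.to_string());
    }
}

impl ToolExecutor {
    pub fn execute(&self, id: &str) -> String {
        if let Some(timeline) = &self.timeline {
            timeline.lock().unwrap().completed.push(id.to_string());
        }
        format!("result for {id}")
    }
}
```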

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ecutor

- Create SharedToolCallTimeline once in build_runtime_with_plugin_state()
- Pass clone to both AnthropicRuntimeClient and CliToolExecutor
- Wire start_tool() via self.tool_timeline in consume_stream() (instead of
  a local ToolCallTimeline that was invisible to the executor)
- complete_tool() already wired in CliToolExecutor::execute() from PR #4
- Remove unused set_timeline() method since timeline is now passed at
  construction time
- Remove unused ToolCallTimeline import from app.rs
- 230 tests pass

Labels

None yet

Projects

None yet

2 participants