+
+---
+
+## What this document is
+
+A hands-on tour of Forge's three surfaces — the **interactive REPL**, the **one-shot CLI**, and the **web dashboard** — all driving the same runtime (`src/core/orchestrator.ts`). Every screenshot below is real output; every video clip is an unedited screen capture.
+
+Jump to:
+
+- [Before you start](#before-you-start)
+- [The REPL](#1-the-repl--forge)
+- [The one-shot CLI](#2-the-one-shot-cli--forge-run-)
+- [The web dashboard](#3-the-web-dashboard--forge-ui-start)
+- [Common workflows](#common-workflows)
+- [Tips & gotchas](#tips--gotchas)
+
+---
+
+## Before you start
+
+```bash
+# Install
+npm install -g @hoangsonw/forge
+
+# Health check — lists reachable providers + role→model mapping
+forge doctor
+
+# Pick a local model if you don't have one yet (free, runs on your box)
+ollama pull llama3:8b # ~4.7 GB, general-purpose
+ollama pull qwen2.5:7b # ~4.4 GB, better at code
+```
+
+If `forge doctor` shows at least one green provider, you're ready.
+
+---
+
+## 1. The REPL · `forge`
+
+An interactive shell with multi-turn conversation, slash-command autocomplete, digit shortcuts in interactive prompts, streamed markdown rendering, and live file-change tracking. The REPL is the right surface when you want a **conversation** — asking follow-up questions, iterating on a plan, or exploring a codebase.
+
+### Screenshot
+
+
+
+### Video
+
+https://github.com/user-attachments/assets/550c76ed-ee05-438f-a55d-5be09e2cf78f
+
+> If your markdown viewer doesn't render the video inline, open [`images/REPL.mp4`](images/REPL.mp4) directly.
+
+### Try it
+
+```bash
+forge
+```
+
+Then at the prompt:
+
+```
+[1] forge ❯ summarize src/core/loop.ts in this project
+[2] forge ❯ what are the key state transitions it manages?
+[3] forge ❯ /mode heavy
+[4] forge ❯ add a small helper that counts step retries
+```
+
+Each turn threads the previous ones into the planner's context via `composeDescription` (see `src/core/conversation.ts`), so follow-ups resolve against real prior turns — not hallucinated history.
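+
+The same threading idea can be sketched outside the REPL. The helper below is a hypothetical stand-in for `composeDescription` (the real implementation is TypeScript in `src/core/conversation.ts`); it shows the shape of a composed description, not Forge's actual format:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Hypothetical sketch: fold prior turns into one task description,
+# newest request last, so a planner sees real history rather than
+# having to reconstruct it.
+compose_description() {
+  local composed=""
+  local turn
+  for turn in "$@"; do
+    if [[ -z "$composed" ]]; then
+      composed="$turn"
+    else
+      composed="$composed"$'\n'"Follow-up: $turn"
+    fi
+  done
+  printf '%s\n' "$composed"
+}
+
+compose_description \
+  "summarize src/core/loop.ts in this project" \
+  "what are the key state transitions it manages?"
+```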
+
+### What to look for in the demo
+
+- **Launch banner** — mode, task, and phase breadcrumbs (`classify → plan → approve → execute → verify`) print above the progress rail.
+- **Live streaming** — the model's answer reflows token-by-token with markdown formatting (headings, fenced code, lists) forming up in place.
+- **Slash-command dropdown** — type `/` and the fuzzy-ranked slash catalog appears above the prompt. Arrow keys pick, Tab accepts, digit keys jump.
+- **Status line** — shows mode, provider:model, cwd, context usage, turn number, conversation id, plus any active permission flags (`+files`, `+shell`, …).
+- **DONE block** — duration, files changed, and a final completion line after each task.
+
+---
+
+## 2. The one-shot CLI · `forge run "..."`
+
+A single task end-to-end: classify → plan → approve → execute → verify → report. Ideal for CI jobs, batch scripts, and "I know exactly what I want" invocations.
+
+### Screenshot
+
+
+
+### Video
+
+https://github.com/user-attachments/assets/9e1cbbd0-764c-46b4-a937-447ef37fe31a
+
+> If your markdown viewer doesn't render the video inline, open [`images/CLI.mp4`](images/CLI.mp4) directly.
+
+### Try it
+
+```bash
+# A read-only analysis (no mutation risk)
+forge run "summarize src/core/loop.ts"
+
+# A bugfix with auto-approve (skip the plan-approval prompt)
+forge run --yes "fix the off-by-one in pagination.ts"
+
+# Produce a plan without executing it
+forge run --plan-only "add a /health endpoint to the Express server"
+
+# Pick a mode explicitly
+forge run --mode heavy "refactor the auth middleware to use JWTs"
+
+# Deterministic output for reproducibility (temperature 0)
+forge run --deterministic "add JSDoc to every exported fn in src/types"
+```
+
+### Flags worth knowing
+
+| Flag | Effect |
+|---|---|
+| `--yes` | auto-approve plan |
+| `--plan-only` | produce plan, stop |
+| `--mode <mode>` | `fast` · `balanced` · `heavy` · `plan` · `audit` · `debug` · `architect` · `offline-safe` |
+| `--strict` | confirm every action |
+| `--allow-files` / `--allow-shell` / `--allow-network` / `--allow-web` / `--allow-mcp` | session-scoped permission grants |
+| `--skip-permissions` | skip routine prompts (high-risk actions still prompt) |
+| `--deterministic` | temperature 0 for reproducible output |
+| `--non-interactive` | deny any prompt silently (CI-safe) |
+| `--trace` | emit full trace (implies `--debug`) |
+
+See `forge run --help` for the full list.
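+
+In CI these flags compose into a simple gate. The wrapper below is illustrative, not part of Forge; the commented-out `forge` invocation shows where it would slot in:
+
+```bash
+#!/usr/bin/env bash
+set -uo pipefail
+
+# Hypothetical CI gate: run a one-shot command and translate its exit
+# code into a log line that CI dashboards can grep for.
+run_gated() {
+  local rc=0
+  "$@" || rc=$?
+  if [[ $rc -eq 0 ]]; then
+    echo "forge-gate: PASS"
+  else
+    echo "forge-gate: FAIL (exit $rc)"
+  fi
+  return $rc
+}
+
+# A real pipeline would call something like:
+#   run_gated forge run --non-interactive --yes --deterministic "add JSDoc to src/types"
+run_gated true
+```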
+
+### What to look for in the demo
+
+- **`━━━ LAUNCHING ━━━`** banner at the start (mode, task, phase pills).
+- **Plan approval prompt** with `Approve / Edit / Reject` — Edit opens `$EDITOR` with the plan JSON.
+- **Per-step execution** with spinner + tool-result echoes.
+- **`━━━ DONE ━━━`** banner at the end with duration, files changed, model cost (when billable).
+
+---
+
+## 3. The web dashboard · `forge ui start`
+
+A local HTTP + WebSocket dashboard (vanilla JS, <120 KB, no CDN). Runs on `http://127.0.0.1:7823`. Best for watching multiple tasks, browsing history, reading long outputs, or driving Forge from a browser tab.
+
+### Screenshot
+
+
+
+### Video
+
+https://github.com/user-attachments/assets/49a9e479-5be6-4cc7-ab5e-c906d0103316
+
+> If your markdown viewer doesn't render the video inline, open [`images/UI.mp4`](images/UI.mp4) directly.
+
+### Try it
+
+```bash
+forge ui start
+# open http://127.0.0.1:7823
+```
+
+Or via Docker Compose (Forge + Ollama + UI in one command):
+
+```bash
+docker compose -f docker/docker-compose.yml up -d
+```
+
+### What you can do in the dashboard
+
+- **Hero input on the Dashboard** — type a prompt, pick a project path (autocomplete from known projects, or hit **Browse…** for a server-side `$HOME`-scoped directory picker), fire the task.
+- **Chat view** — multi-turn conversations with markdown-rendered bot replies.
+- **Task detail view** — live stream of phase events, working-spinner, streamed model output, and a follow-up input that threads prior turns into the next task.
+- **Tasks view** — full history, searchable; click any row to expand/continue.
+- **Plan approval / Edit modal** — when a task hits approval, the plan viewer offers **Reject / Edit… / Approve & run**. Edit opens an inline JSON editor; save re-enters the approval loop with the new plan.
+- **Permission modal** — per-call risk-classified prompts (`Deny / Allow once / Allow for session`).
+- **Live cost + token counters** — for local providers, shows token count; for hosted (OpenAI / Anthropic), shows estimated USD.
+- **Historical-task replay** — click a past task in the history table and the dashboard replays its saved plan + summary + file list even though the WebSocket subscription only streams live tasks.
+
+### What to look for in the demo
+
+- **Project picker** under the hero input — dropdown of known projects plus a **Browse…** button.
+- **Streaming markdown** reflowing live in the task stream.
+- **Plan viewer** with per-step chips (type, risk, id, target) and three-button footer.
+- **Follow-up composer** at the bottom of each task view — continues the conversation by spawning a new task with composed prior-turn context.
+
+---
+
+## Common workflows
+
+### Analyze a file without touching it
+
+```bash
+forge run "summarize src/core/loop.ts"
+```
+
+The classifier tags this as `intent=analysis`, so the planner is forbidden from emitting mutation steps (`edit_file`, `write_file`, `run_tests`). The narrator pass turns the gathered context into a human-readable summary.
+
+### Iterate on a change in the REPL
+
+```
+[1] forge ❯ find everywhere we call `saveTask` without wrapping in try/catch
+[2] forge ❯ wrap those with a shared helper that logs the error
+[3] forge ❯ run the tests
+```
+
+Each turn's context is threaded into the next, so the model knows what the previous turns touched.
+
+### Plan-first, approve later (CI-friendly)
+
+```bash
+forge run --plan-only "add a /health endpoint" > plan.json
+# review plan.json in your PR
+forge run --yes "add a /health endpoint"
+```
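+
+A quick structural check on the emitted plan catches empty or surprising plans before anyone clicks approve. The field names below (`steps`, `type`) are assumptions about the plan JSON, not a documented schema:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Stand-in plan for illustration; `forge run --plan-only` emits the real one.
+plan="$(mktemp)"
+cat > "$plan" <<'EOF'
+{ "steps": [ { "id": "s1", "type": "create_file", "target": "src/health.ts" } ] }
+EOF
+
+# Fail the review early if the plan has no steps at all.
+grep -q '"steps"' "$plan" || { echo "plan: missing steps"; exit 1; }
+
+# Flag shell steps for closer human review.
+if grep -q '"type": *"run_command"' "$plan"; then
+  echo "plan: contains shell steps, review carefully"
+fi
+echo "plan: basic checks passed"
+```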
+
+### Drive Forge from a browser tab
+
+```bash
+forge ui start
+```
+
+Open `http://127.0.0.1:7823`, set the project path once (sticky until you change it), fire any prompt — plan approval and permissions surface as modals.
+
+### Mix surfaces in one session
+
+Same SQLite index, same tasks, same conversation files. Start a task in the REPL, watch it finish in the dashboard's Active view, continue the conversation from either side. Each surface is a view over the runtime, not a sandbox.
+
+---
+
+## Tips & gotchas
+
+- **`forge doctor`** is your friend. If something's off — provider unreachable, keychain unavailable, model role unmapped — it tells you what's wrong.
+- **First turn is slower.** Local models cold-start; Forge emits a `MODEL_WARMING` event so you can see it.
+- **`~/.forge/logs/forge.log`** is the authoritative debug log. Trace-level with `--trace` or `FORGE_LOG_LEVEL=debug`.
+- **Cancel** any running task: `Ctrl+C` in the REPL or one-shot CLI, or the **Cancel** button in the dashboard.
+- **Permission grants are scoped.** An "allow for session" only applies to that REPL / CLI invocation; it doesn't persist across runs unless you explicitly set it in `~/.forge/config.json`.
+- **Your local model matters.** Forge's planner and narrator expect a model ≥ 7B for reasonable instruction-following; 3B chat models will produce noisy plans. `ollama pull qwen2.5:7b` is a solid default.
+
+---
+
+## Where to next
+
+- [`README.md`](README.md) — full feature list, architecture, runtime metrics.
+- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — hot paths, mode caps, state machine.
+- [`docs/SETUP.md`](docs/SETUP.md) — contributor setup.
+- [`FLYWHEEL.md`](FLYWHEEL.md) — the plan → bead → code methodology.
+- [`CLAUDE.md`](CLAUDE.md) / [`AGENTS.md`](AGENTS.md) — context for AI agents working on this repo.
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..13e43a9
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,301 @@
+# Forge — local-first, multi-agent, programmable software-engineering runtime.
+#
+# This Makefile is a thin, self-documenting wrapper over npm scripts, Docker,
+# and a handful of shell one-liners that we'd otherwise retype a dozen times
+# a day. It is intentionally NOT the canonical build system — package.json
+# scripts are; this just gives them short names and groups them sensibly so
+# `make help` answers "how do I …" for new contributors.
+#
+# Invariants followed:
+# - Every target is .PHONY unless it produces the named file.
+# - Every user-facing target has a "##" doc comment on its line; `make help`
+# parses those into a categorised table.
+# - Recipes are idempotent where possible — running twice is safe.
+# - No target silently swallows errors; if a step fails, `make` fails.
+
+# ---------------------------------------------------------------------------
+# Configuration
+# ---------------------------------------------------------------------------
+
+SHELL := /usr/bin/env bash
+.SHELLFLAGS := -euo pipefail -c
+.ONESHELL:
+.DEFAULT_GOAL := help
+
+# Project metadata (derived from package.json so rename-the-package Just Works)
+PKG_NAME := $(shell node -p "require('./package.json').name" 2>/dev/null || echo @hoangsonw/forge)
+PKG_VERSION := $(shell node -p "require('./package.json').version" 2>/dev/null || echo 0.0.0)
+
+# Runtime
+NODE ?= node
+NPM ?= npm
+NPX ?= npx
+
+# Docker / OCI
+DOCKER ?= docker
+IMAGE ?= ghcr.io/hoangsonw/forge-agentic-coding-cli
+TAG ?= dev
+IMAGE_FULL := $(IMAGE):$(TAG)
+PLATFORMS ?= linux/amd64,linux/arm64
+COMPOSE_FILE ?= docker/docker-compose.yml
+
+# Where test harnesses drop throwaway state. Override via env:
+# make test FORGE_HOME=/tmp/forge-ci
+FORGE_HOME ?= $(HOME)/.forge
+
+# ---------------------------------------------------------------------------
+# Self-documenting help (parses `##` annotations from this file)
+# ---------------------------------------------------------------------------
+
+.PHONY: help
+help: ## Show this help (default target)
+ @awk 'BEGIN { \
+ FS = ":.*##"; \
+ printf "\n\033[1;36mForge\033[0m \033[2m%s@%s\033[0m · make targets\n\n", "$(PKG_NAME)", "$(PKG_VERSION)" \
+ } \
+ /^##@/ { \
+ printf "\n\033[1;35m%s\033[0m\n", substr($$0, 5); next \
+ } \
+ /^[a-zA-Z0-9_.-]+:.*##/ { \
+ printf " \033[32m%-22s\033[0m %s\n", $$1, $$2 \
+ }' $(MAKEFILE_LIST)
+ @printf "\nOverride knobs (env or \`make VAR=...\`):\n"
+ @printf " \033[2mTAG=\033[0m%-14s image tag for docker targets (default: dev)\n" "$(TAG)"
+ @printf " \033[2mPLATFORMS=\033[0m%-9s docker buildx platforms (default: linux/amd64,linux/arm64)\n" "$(PLATFORMS)"
+ @printf " \033[2mFORGE_HOME=\033[0m%-9s state dir for smoke runs (default: ~/.forge)\n" "$(FORGE_HOME)"
+ @printf "\n"
+
+##@ Setup
+
+.PHONY: install
+install: ## Install dependencies (npm ci, matches package-lock.json exactly)
+ $(NPM) ci --ignore-scripts
+
+.PHONY: install-dev
+install-dev: ## Install dependencies with devDeps (first-time contributor path)
+ $(NPM) install
+
+.PHONY: link
+link: build ## npm link — make `forge` on PATH resolve to this checkout
+ $(NPM) link
+
+.PHONY: unlink
+unlink: ## Remove the npm-linked binary (`@hoangsonw/forge`) from your PATH
+ -$(NPM) unlink -g $(PKG_NAME)
+
+.PHONY: relink
+relink: unlink link ## unlink + link in one step (after a pull / branch switch)
+
+##@ Build
+
+.PHONY: build
+build: ## Compile TypeScript + copy non-code assets into dist/
+ $(NPM) run build
+
+.PHONY: watch
+watch: ## Rebuild on every file change (tsc --watch; UI assets don't auto-copy)
+ $(NPM) run build:watch
+
+.PHONY: typecheck
+typecheck: ## Type-check without emitting files (fast; CI-safe)
+ $(NPM) run typecheck
+
+.PHONY: clean
+clean: ## Remove dist/ and any coverage output
+ rm -rf dist coverage .tsbuildinfo
+
+.PHONY: distclean
+distclean: clean ## clean + nuke node_modules (forces a fresh install next time)
+ rm -rf node_modules
+
+##@ Quality
+
+.PHONY: lint
+lint: ## ESLint over src/ (errors only; warnings OK)
+ $(NPM) run lint
+
+.PHONY: format
+format: ## Prettier write (src/ + test/)
+ $(NPM) run format
+
+.PHONY: format-check
+format-check: ## Prettier verify (fails if anything would be reformatted)
+ $(NPM) run format:check
+
+.PHONY: test
+test: ## Run the full vitest suite (97 files, 570+ tests)
+ $(NPM) test
+
+.PHONY: test-watch
+test-watch: ## Run vitest in watch mode (auto-reruns on change)
+ $(NPM) run test:watch
+
+.PHONY: test-coverage
+test-coverage: ## Run tests with v8 coverage → coverage/ + index.html
+ $(NPM) run test:coverage
+
+.PHONY: test-one
+test-one: ## Run ONE test file: make test-one FILE=test/unit/foo.test.ts
+ @if [[ -z "$${FILE:-}" ]]; then echo "usage: make test-one FILE=test/unit/foo.test.ts"; exit 2; fi
+ $(NPX) vitest run "$$FILE"
+
+.PHONY: verify
+verify: format-check lint typecheck build test ## Everything CI runs, in one shot
+
+##@ Metrics
+
+.PHONY: metrics
+metrics: ## Regenerate docs/metrics.json (counts, sizes, test count, …)
+ bash scripts/metrics.sh
+
+.PHONY: bundle
+bundle: build ## Build an offline tarball bundle (via scripts/bundle.js)
+ $(NODE) scripts/bundle.js
+
+##@ Run locally
+
+.PHONY: start
+start: build ## Run the compiled CLI (`./bin/forge.js`) with no args → REPL
+ $(NODE) ./bin/forge.js
+
+.PHONY: dev
+dev: ## Run the CLI via ts-node (no build step; slower cold start)
+ $(NPM) run dev
+
+.PHONY: doctor
+doctor: build ## Sanity-check providers + role→model mapping (<1 s cold)
+ $(NODE) ./bin/forge.js doctor --no-banner
+
+.PHONY: repl
+repl: build ## Alias: open the Forge REPL against this checkout
+ $(NODE) ./bin/forge.js
+
+.PHONY: ui
+ui: build ## Launch the local dashboard at http://127.0.0.1:7823
+ $(NODE) ./bin/forge.js ui start --bind 127.0.0.1 --port 7823
+
+.PHONY: ui-stop
+ui-stop: ## Kill any running Forge UI process bound to :7823
+ -lsof -ti tcp:7823 2>/dev/null | xargs -r kill -9
+
+##@ Docker
+
+.PHONY: docker-build
+docker-build: ## Build a single-arch image locally: $(IMAGE_FULL)
+ $(DOCKER) build -f docker/Dockerfile -t $(IMAGE_FULL) .
+
+.PHONY: docker-build-multi
+docker-build-multi: ## Multi-arch buildx build. PUSH=1 pushes; otherwise --load (works for single-arch PLATFORMS only)
+ $(DOCKER) buildx build \
+ --platform $(PLATFORMS) \
+ -f docker/Dockerfile \
+ -t $(IMAGE_FULL) \
+ $(if $(filter 1 true,$(PUSH)),--push,--load) \
+ .
+
+.PHONY: docker-run
+docker-run: docker-build ## Run the image with the current repo mounted as /workspace
+ $(DOCKER) run --rm -it \
+ -v forge-home:/data \
+ -v "$$(pwd):/workspace" \
+ $(IMAGE_FULL) forge doctor --no-banner
+
+.PHONY: docker-ui
+docker-ui: docker-build ## Run the containerised dashboard at http://127.0.0.1:7823
+ $(DOCKER) run --rm -p 7823:7823 -v forge-home:/data \
+ $(IMAGE_FULL) forge ui start --bind 0.0.0.0
+
+.PHONY: compose-up
+compose-up: ## Bring up the full stack (forge + ollama + ui) via docker-compose
+ $(DOCKER) compose -f $(COMPOSE_FILE) up -d
+
+.PHONY: compose-down
+compose-down: ## Tear down the compose stack (keeps volumes)
+ $(DOCKER) compose -f $(COMPOSE_FILE) down
+
+.PHONY: compose-nuke
+compose-nuke: ## Tear down the compose stack AND delete all named volumes
+ $(DOCKER) compose -f $(COMPOSE_FILE) down --volumes --remove-orphans
+
+.PHONY: compose-logs
+compose-logs: ## Tail logs from the compose stack
+ $(DOCKER) compose -f $(COMPOSE_FILE) logs -f --tail=200
+
+##@ Release (maintainer-only)
+
+.PHONY: pack
+pack: build ## Produce an npm tarball in the repo root (no publish)
+ $(NPM) pack
+
+.PHONY: publish-dry
+publish-dry: build ## Dry-run `npm publish --access public` (shows what would be uploaded)
+ $(NPM) publish --access public --dry-run
+
+.PHONY: tag
+tag: ## Create & push a git tag `v$(PKG_VERSION)` (triggers release.yml)
+ @echo "Tagging v$(PKG_VERSION)"
+ git tag -a "v$(PKG_VERSION)" -m "Release v$(PKG_VERSION)"
+ git push origin "v$(PKG_VERSION)"
+
+##@ Maintenance
+
+.PHONY: audit
+audit: ## npm audit (production deps, fails on high/critical)
+ $(NPM) audit --omit=dev --audit-level=high
+
+.PHONY: outdated
+outdated: ## List packages that have newer versions available
+ -$(NPM) outdated
+
+.PHONY: tree
+tree: ## Show the dep tree (production only)
+ $(NPM) ls --omit=dev --all
+
+.PHONY: locs
+locs: ## Lines of code by language (requires `cloc`; brew install cloc)
+ @command -v cloc >/dev/null || { echo "install cloc: brew install cloc"; exit 1; }
+ cloc --quiet --exclude-dir=node_modules,dist,coverage,.git .
+
+##@ Troubleshooting
+
+.PHONY: where
+where: ## Print resolved paths and versions that builds/tests will use
+ @printf "package : $(PKG_NAME)@$(PKG_VERSION)\n"
+ @printf "node : $$($(NODE) --version) (at: $$(which $(NODE)))\n"
+ @printf "npm : $$($(NPM) --version) (at: $$(which $(NPM)))\n"
+ @printf "forge (dist) : $$(ls dist/cli/index.js 2>/dev/null || echo 'not built (make build)')\n"
+ @printf "forge (bin) : ./bin/forge.js\n"
+ @printf "FORGE_HOME : $(FORGE_HOME)\n"
+ @printf "docker : $$($(DOCKER) --version 2>/dev/null || echo 'not installed')\n"
+
+.PHONY: smoke
+smoke: build ## End-to-end smoke check (doctor + test + --help) in isolated FORGE_HOME
+ @tmp=$$(mktemp -d -t forge-smoke.XXXXXX); \
+ echo "Using FORGE_HOME=$$tmp"; \
+ FORGE_HOME=$$tmp $(NODE) ./bin/forge.js --help >/dev/null; \
+ FORGE_HOME=$$tmp $(NODE) ./bin/forge.js doctor --no-banner; \
+ rm -rf "$$tmp"; \
+ echo "smoke: OK"
+
+.PHONY: kill-stale
+kill-stale: ## Kill stray forge UI / daemon processes (useful after dev crashes)
+ -pgrep -f "bin/forge.js ui start" | xargs -r kill -9
+ -pgrep -f "bin/forge.js daemon" | xargs -r kill -9
+ -lsof -ti tcp:7823 2>/dev/null | xargs -r kill -9
+ @echo "cleaned up"
+
+# ---------------------------------------------------------------------------
+# Footer: ensure every user-facing target declared above is marked .PHONY so
+# stale files with the same name can't shadow them.
+# ---------------------------------------------------------------------------
+
+.PHONY: help install install-dev link unlink relink \
+ build watch typecheck clean distclean \
+ lint format format-check test test-watch test-coverage test-one verify \
+ metrics bundle \
+ start dev doctor repl ui ui-stop \
+ docker-build docker-build-multi docker-run docker-ui \
+ compose-up compose-down compose-nuke compose-logs \
+ pack publish-dry tag \
+ audit outdated tree locs \
+ where smoke kill-stale
diff --git a/README.md b/README.md
index 6b3df5d..88ffd21 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,16 @@
# Forge
-**A local-first, multi-agent, programmable software-engineering runtime.**
+**A local-first, plan-first, multi-agent, and programmable software-engineering runtime.**
*Not an assistant. A runtime.* Forge brings its own scheduler, sandbox,
permission system, state machine, agentic loop, memory layers, and
plugin ecosystem. You pick the model. You approve the actions. Everything
is inspectable, replayable, and yours.
-**[Install](docs/INSTALL.md) · [Dev setup](docs/SETUP.md) · [Architecture](docs/ARCHITECTURE.md) · [Releases & versioning](RELEASES.md) · [Wiki Page](index.html) · [NPM Package](https://www.npmjs.com/package/@hoangsonw/forge) · [License](LICENSE)**
+
+
+**[Install](https://github.com/hoangsonww/Forge-Agentic-Coding-CLI/blob/master/docs/INSTALL.md) · [Dev setup](https://github.com/hoangsonww/Forge-Agentic-Coding-CLI/blob/master/docs/SETUP.md) · [Architecture](https://github.com/hoangsonww/Forge-Agentic-Coding-CLI/blob/master/docs/ARCHITECTURE.md) · [Releases & versioning](https://github.com/hoangsonww/Forge-Agentic-Coding-CLI/blob/master/RELEASES.md) · [Demo walkthrough](DEMO.md) · [Wiki Page](https://hoangsonww.github.io/Forge-Agentic-Coding-CLI/) · [NPM Package](https://www.npmjs.com/package/@hoangsonw/forge) · [License](LICENSE)**
@@ -42,25 +44,25 @@ is inspectable, replayable, and yours.
## At a glance
-Forge is a local-first, multi-agent, programmable software-engineering runtime. Unlike Claude Code or OpenAI Codex, Forge is local-first infrastructure, not a hosted assistant. It brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.
+Forge is a local-first, plan-first, multi-agent, and programmable software-engineering runtime. Unlike Claude Code or OpenAI Codex, Forge is local-first infrastructure, not a hosted assistant. It brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.
@@ -235,9 +237,38 @@ docker compose -f docker/docker-compose.yml up -d
# open http://127.0.0.1:7823
```
-**Requirements:** Node ≥ 20 *and/or* Docker ≥ 25. At least one LLM source
-(local runtime or API key). See [`docs/INSTALL.md`](docs/INSTALL.md) for
-per-OS notes.
+### System requirements
+
+| | Minimum | Notes |
+|---|---|---|
+| **Node.js** | **≥ 20** (22 tested) | Enforced via `package.json#engines`. Not needed if you use Docker. |
+| **OS** | macOS · Linux · Windows (WSL recommended) | `better-sqlite3` ships prebuilds for darwin-x64, darwin-arm64, linux-x64, linux-arm64, win32-x64 — no compile step. |
+| **Disk** | ~150 MB for `node_modules`; state under `~/.forge` grows with history | Override via `FORGE_HOME`. |
+| **RAM** | Forge ~100 MB; your local model consumes its own RAM/VRAM | `forge doctor` cold-starts in ~170 ms. |
+| **Docker** (alt path) | ≥ 25 | Multi-arch (amd64, arm64) image on GHCR. Zero host Node needed. |
+| **At least one model source** | Ollama · LM Studio · vLLM · llama.cpp · Anthropic · OpenAI-compatible | `forge doctor` tells you which are reachable. |
+
+**Runtime npm dependencies** (13, zero optional): `@modelcontextprotocol/sdk`, `better-sqlite3` (native, prebuilt), `chalk`, `cli-table3`, `commander`, `dotenv`, `ora`, `prompts`, `semver`, `undici`, `ws`, `yaml`, `zod`. No Python, Rust, or Go toolchain.
+
+**Recommended** (not required): `ripgrep` (fast `grep` tool path), `git` (diff/status tools + project-root detection), `$EDITOR` (used when you pick "Edit" on a plan).
+
+See [`docs/INSTALL.md`](docs/INSTALL.md) for per-OS notes and [`docs/SETUP.md`](docs/SETUP.md) for contributor setup.
+
+### See it running
+
+Three surfaces, one runtime.
+
+**REPL (Interactive Terminal) Mode**
+
+https://github.com/user-attachments/assets/eb592bbf-62a1-4d74-a540-7e066ebe56a4
+
+**CLI (Headless, One-shot run) Mode**
+
+https://github.com/user-attachments/assets/bc3b3204-fd87-436f-9467-604535edb4e2
+
+**Web UI Dashboard**
+
+https://github.com/user-attachments/assets/218cd64f-40fe-4836-9c62-c7a08538056b
---
@@ -496,6 +527,42 @@ warns once, never refuses to route.
Unknown models are accepted too — Forge rates them as generic executors
rather than refusing to route.
+### Model size & capability notes
+
+The agentic loop is cheap for the runtime but expensive for the *model*.
+Every step is a multi-turn tool-use conversation that returns strict JSON.
+Small models struggle with this in recognisable ways — please pick the
+right tool for the job.
+
+| Work you want to do | Safe local floor | What fails below the floor |
+|---|---|---|
+| Pure chat ("explain closures") | any 3B instruct (phi-3:mini, gemma-3:2b) | fine — conversation fast-path bypasses tool use entirely |
+| Summarize a file, explain a snippet | 7B instruct (qwen2.5:7b, llama3.1:8b) | summary is a line of "I read the file" instead of content |
+| Single-file edits / small features | **7B+ code specialist** (deepseek-coder:6.7b, qwen2.5-coder:7b) | picks wrong tool (run_command to write files), splits "create empty + edit" patterns, escalates to ask_user on tool errors |
+| Multi-file refactors, new features | 14B+ code specialist or a hosted frontier model | plan quality drops; step IDs get inconsistent; validation retries exhausted |
+| Architecture-level changes | hosted (Claude Opus/Sonnet, GPT-4 class) realistically | budgets blow out; changes go off-plan |
+
+Forge ships with defences so a small model fails *loudly* instead of
+silently corrupting files: the executor prompt spells out step-type →
+tool mappings, `ask_user` rejects empty/too-short questions as
+non-retryable, `edit_file` handles "create empty then fill" gracefully,
+parent directories auto-create, provider warm-up is explicit, and the
+router streams prose without `jsonMode` for narrator/conversation
+paths. The result is that a small model will often tell you it can't
+finish a task; it will rarely write the wrong code into a file.
+
+If in doubt: configure a code specialist for the `code` role, keep
+something lighter for `fast`, and set `ANTHROPIC_API_KEY` or
+`OPENAI_API_KEY` as a fallback — the router uses the hosted provider
+automatically when the local one fails or trips its circuit breaker.
+
+```bash
+forge config set models.code deepseek-coder:6.7b
+forge config set models.planner qwen2.5:7b
+forge config set models.fast phi3:mini
+export ANTHROPIC_API_KEY=sk-… # optional fallback
+```
+
---
## Safety model (not optional)
@@ -567,6 +634,8 @@ Each mode is an **enforceable budget** — not a hint to the model. See
## CLI reference
+> **▶ See each surface in action** in [DEMO.md](DEMO.md) — REPL walkthrough, `forge run` one-shots, and the web dashboard.
+
24 subcommands. Full surface:
```
@@ -697,6 +766,8 @@ API key auth. Tokens stored in the OS keychain.
Single hardened image (non-root, HEALTHCHECK, OCI labels, ~355 MB) that
serves both CLI and UI.
+> [▶ Dashboard demo](images/UI.mp4) — `forge ui start` driving a full task end-to-end (plan approval, streamed model output, follow-up thread). More in [DEMO.md](DEMO.md).
+
```bash
# Pull (multi-arch: linux/amd64 + linux/arm64):
docker pull ghcr.io/hoangsonw/forge-agentic-coding-cli:latest
diff --git a/RELEASES.md b/RELEASES.md
index 1478ebc..51a7476 100644
--- a/RELEASES.md
+++ b/RELEASES.md
@@ -6,8 +6,6 @@
# Releases & Versioning
-
-
**How Forge versions, tags, builds, signs, and ships.**
Who this is for:
@@ -15,8 +13,6 @@ Who this is for:
- **Maintainers** cutting a release or shipping a hotfix.
- **Integrators** consuming Forge from CI, Docker, or the npm registry.
-
-
---
## Table of contents
@@ -480,12 +476,12 @@ flowchart TD
H --> C1{{"match?"}}:::step
L --> C1
C1 -->|yes| OK1["layer 1 ok"]:::ok
- C1 -->|no| F1["REFUSE — retain existing binary"]:::fail
+ C1 -->|no| F1["REFUSE — retain existing binary"]:::fail
OK1 --> VER["Ed25519.verify( public_key = trusted_keys[i], message = manifest.json, signature = manifest.sig )"]:::step
VER --> C2{{"any trusted key verifies?"}}:::step
C2 -->|yes| OK2["layer 2 ok install"]:::ok
- C2 -->|no| F2["REFUSE — unless FORGE_ALLOW_UNSIGNED=1 (dev only)"]:::fail
+ C2 -->|no| F2["REFUSE — unless FORGE_ALLOW_UNSIGNED=1 (dev only)"]:::fail
```
### Verifying by hand
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 70c8d3f..54461de 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -345,6 +345,48 @@ total — into `{class, roles, contextTokens}`.
model isn't installed on the user's provider. Picks best-fit from what's
actually there, caches per process, warns once.
+### 6.1 Model-capability assumptions and the runtime guards that defend them
+
+Forge does not assume a frontier model. The agentic loop is shaped so that
+small, cheap, local-first models (down to the 7B tier) can drive it usefully
+— but not silently. Every observed small-model failure mode has a
+corresponding runtime guard so that either:
+
+- the model recovers cleanly (retry with different args, switch tool, set
+ `done:true`), or
+- the tool fails loudly and non-retryably, forcing the executor to change
+ strategy instead of looping, or
+- the task ends with an honest failure message rather than corrupted state.
+
+| Failure mode (small / mid models) | Where it manifests | Runtime guard |
+|---|---|---|
+| Wrong tool selection (e.g. `run_command` to write file contents) | `src/agents/executor.ts` | System prompt spells out `step.type → tool` mapping and forbids `run_command` for file writes |
+| Splitting "create empty file → edit to fill" across steps | planner output → `src/tools/edit-file.ts` | `edit_file` with `oldText=""` on an empty/missing file writes the full body instead of erroring |
+| Missing-parent-directory `ENOENT` on `write_file` | `src/tools/write-file.ts` | `createDirs` defaults to `true` (mkdir-p); opt out explicitly to get the old behaviour |
+| Escalating to `ask_user` on tool errors, stalling the step | `src/tools/ask-user.ts` | Rejects questions < 3 chars as non-retryable; description tells the model "tool errors are for you to recover from, not escalate" |
+| Cold-load timeout treated as a model failure and fallback to hosted | `src/models/ollama.ts`, `src/models/router.ts` | Headers-timeout floor at 300 s; proactive `warm()` with `/api/ps` preflight; explicit `MODEL_WARMING`/`MODEL_WARMED` events drive the spinner |
+| Malformed JSON breaking `{actions, summary, done}` | `src/agents/executor.ts` | Parse-through-first-fence + schema validation; per-step retry budget capped; loop detector catches thrashing |
+| Reviewer rejecting analysis tasks for "no file changes" | `src/agents/reviewer.ts` | Classifier sets `requiresReview=false` for intent=analysis; loop short-circuits the verify phase; reviewer prompt knows analysis tasks have no diff |
+| Two concurrent edits racing on the same file | `src/sandbox/file-lock.ts` | Per-process path-keyed mutex serializes read-modify-write; atomic `writeAtomic` via temp + rename prevents torn reads |
+| `create_file` step that emits an empty file | `src/agents/planner.ts` | Planner prompt requires a single `create_file` step with the full intended body; `edit_file`-on-empty safety-net if ignored |
+
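+The torn-read guard in the concurrency row is the classic temp-plus-rename pattern. A shell sketch of the idea behind `writeAtomic` (illustrative only; the real implementation is TypeScript):
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Atomic replace: write the full new content to a temp file in the SAME
+# directory (rename is only atomic within one filesystem), then rename
+# over the target. Readers see the old file or the new one, never a mix.
+write_atomic() {
+  local target="$1" content="$2"
+  local tmp
+  tmp="$(mktemp "$(dirname "$target")/.tmp.XXXXXX")"
+  printf '%s' "$content" > "$tmp"
+  mv -f "$tmp" "$target"   # rename(2) under the hood
+}
+
+dir="$(mktemp -d)"
+write_atomic "$dir/out.txt" "hello"
+write_atomic "$dir/out.txt" "world"
+cat "$dir/out.txt"   # world
+```
+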
+**Consequences for capability tiers** (measured empirically, not specced —
+expect some variance across model families):
+
+| Work | Local floor | Above the floor | Below the floor |
+|---|---|---|---|
+| Conversation / concept Q&A | 3B instruct | — | fast-path skips tool use, so even 3B works |
+| Summarize / explain | 7B instruct | clean streaming narrator output | summary reduces to "I read the file" |
+| Single-file edits | 7B code specialist (deepseek-coder, qwen2.5-coder) | reliable tool calls, minimal retries | wrong-tool selection, step retries, occasional loop-detector trips |
+| Multi-file / new feature | 14B+ code specialist OR hosted | plan quality holds; dependencies tracked | plan IDs drift; validation retries exhausted |
+| Architecture / refactor | hosted frontier | end-to-end runs without intervention | not practical today |
+
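The malformed-JSON guard in the failure-mode table above (parse-through-first-fence) comes down to stripping an optional markdown fence before parsing. A minimal sketch; `parseModelJson` is a hypothetical name, and the real executor also validates the result against a schema:

```typescript
// Models often wrap their JSON in a markdown fence despite being told
// not to. Prefer the first fenced block if one exists, else parse the
// raw text as-is.
function parseModelJson(raw: string): unknown {
  const fence = raw.match(/`{3}(?:json)?[ \t]*\n([\s\S]*?)\n?`{3}/);
  const body = fence ? fence[1] : raw;
  return JSON.parse(body.trim());
}
```
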
+When the local-first path is insufficient, the router's fallback wiring
+(circuit breaker + `fallback` field on `RoutingDecision`) transparently
+routes the next call to the hosted provider if one is configured. No code
+change, no flag — just set `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` and the
+system degrades gracefully under model failure.
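
The fallback wiring can be pictured as a small consecutive-failure breaker. This is an illustrative sketch, not Forge's `RoutingDecision` implementation; the class name and thresholds are made up:

```typescript
type Provider = "local" | "hosted";

// Trips after `threshold` consecutive local failures; while open (and a
// hosted key is configured), subsequent calls route to the hosted
// provider. A success closes it again.
class Breaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 3, private cooldownMs = 60_000) {}

  pick(hostedConfigured: boolean): Provider {
    const open =
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs;
    return open && hostedConfigured ? "hosted" : "local";
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures === this.threshold) this.openedAt = Date.now();
  }

  recordSuccess(): void {
    this.failures = 0; // close the breaker
  }
}
```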
+
---
## 7. Permission + sandbox model
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 6c9b91b..65c3298 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -5,6 +5,7 @@
## Table of contents
+- [0. System requirements](#0-system-requirements)
- [1. Choose your install path](#1-choose-your-install-path)
- [2. npm (global)](#2-npm-global)
- [3. Docker](#3-docker)
@@ -18,6 +19,25 @@
---
+## 0. System requirements
+
+Forge runs anywhere Node 20+ runs. The Docker path has no host-side Node requirement at all.
+
+| | Minimum | Notes |
+|---|---|---|
+| **Node.js** | **≥ 20** (22 tested in CI) | Enforced via `package.json#engines`. Skip if you use Docker. |
+| **OS** | macOS · Linux · Windows (native or WSL) | `better-sqlite3` ships prebuilds for darwin-x64, darwin-arm64, linux-x64, linux-arm64, win32-x64 — no toolchain needed on `npm install`. |
+| **Disk** | ~150 MB `node_modules`; state under `~/.forge` grows with history | Override via `FORGE_HOME`. |
+| **RAM** | Forge: ~100 MB resident. Your local model: whatever the model needs. | `forge doctor` cold-starts in ~170 ms. |
+| **Docker** (alt path) | ≥ 25 | Multi-arch image `ghcr.io/hoangsonw/forge-agentic-coding-cli:latest`; amd64 + arm64. |
+| **At least one model source** | Local runtime or hosted key | See [§7](#7-model-runtimes-you-can-point-forge-at). `forge doctor` probes all of them. |
+
+**Runtime npm dependencies** (13 total, **zero optional**): `@modelcontextprotocol/sdk`, `better-sqlite3`, `chalk`, `cli-table3`, `commander`, `dotenv`, `ora`, `prompts`, `semver`, `undici`, `ws`, `yaml`, `zod`. No Python, Rust, or Go required — `better-sqlite3` is the only native module and ships prebuilt binaries.
+
+**Recommended** (not required): `ripgrep` (fast path for the `grep` tool), `git` (for `git_diff`/`git_status` tools and project-root detection), `$EDITOR` (used when you pick "Edit" on a plan approval).
+
+---
+
## 1. Choose your install path
```mermaid
@@ -277,6 +297,53 @@ Granite-Code, CodeLlama, Codestral, StarCoder, Yi, Solar, Zephyr,
MiniCPM, LLaVA, TinyLlama, SmolLM, Aya, and more. Unknown models still
get a routable role rather than being refused.
+### Picking a model that fits the work
+
+Forge's agentic loop is multi-turn tool use with strict JSON output. That's
+easy for frontier hosted models and hard for small local ones. These are
+the tiers we've observed in practice — pull the right size for what you
+intend to do, and set a hosted fallback for when you hit the ceiling.
+
+| Task type | Local floor we trust | Example pulls | Notes |
+|---|---|---|---|
+| Chat / concept Q&A | 3B instruct | `phi3:mini`, `gemma3:2b`, `qwen2.5:3b` | Uses the conversation fast-path; no tool use required. |
+| Summarize / explain code | 7B instruct | `qwen2.5:7b`, `llama3.1:8b` | Narrator pass runs non-JSON and streams cleanly. |
+| Single-file edits / small features | **7B+ code specialist** | `deepseek-coder:6.7b`, `qwen2.5-coder:7b` | Multi-step tool use; general 7B models often pick the wrong tool here. |
+| Multi-file refactors / new features | 14B+ code specialist | `qwen2.5-coder:14b`, `deepseek-coder:33b` | Or route through a hosted frontier model. |
+| Architecture-level changes | hosted only, realistically | Claude Opus/Sonnet, GPT-4-class | Context windows + plan quality matter. |
+
+**Expected failure modes below the floor** (the guardrails flag these
+rather than letting them silently corrupt files):
+
+- Wrong tool selection — e.g. `run_command` to write file contents.
+ Executor prompt maps step types explicitly; unrecoverable calls surface
+ loudly instead of looping.
+- Escalating to `ask_user` on tool errors instead of retrying or switching
+ tools. `ask_user` rejects empty/too-short questions as non-retryable.
+- Splitting "create empty file, then edit to fill" across two steps.
+ `edit_file` now handles empty-oldText on an empty file as a full-body
+ write, so this legitimate pattern succeeds.
+- Malformed JSON that breaks the executor's `{actions, summary, done}`
+ contract. The run fails cleanly; no partial state is written.
+
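The empty-`oldText` behaviour above is easiest to see in isolation. A sketch of just that decision; `applyEdit` is a hypothetical helper, not the real `edit_file` implementation:

```typescript
// current === null models a missing file. With an empty oldText against
// an empty or missing file, treat the edit as a full-body write so the
// "create empty, then fill" pattern succeeds instead of erroring.
function applyEdit(current: string | null, oldText: string, newText: string): string {
  if (oldText === "") {
    if (current === null || current === "") return newText;
    throw new Error("empty oldText on a non-empty file is ambiguous");
  }
  if (current === null || !current.includes(oldText)) {
    throw new Error("oldText not found"); // surfaced to the model as retryable
  }
  return current.replace(oldText, newText);
}
```
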
+**Configuring per-role models:**
+
+```bash
+forge config set models.planner qwen2.5:7b
+forge config set models.code deepseek-coder:6.7b
+forge config set models.fast phi3:mini
+
+# Hosted fallback — router engages automatically on local failure / breaker open.
+export ANTHROPIC_API_KEY=sk-…
+# or:
+export OPENAI_API_KEY=sk-…
+```
+
+First use of a local model triggers a visible `warming` phase
+before the first call — cold-loading a 7B into RAM/VRAM can take up to a
+minute on slower machines. Subsequent calls are fast while Ollama keeps
+it resident (5 min default).
+
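The warming phase roughly corresponds to a preflight against Ollama's HTTP API: check `/api/ps` for resident models and, if the target isn't loaded, send an empty `generate` request so the cold load happens up front. A hedged sketch; `warmModel` and `isResident` are illustrative names, not Forge's actual API:

```typescript
type PsResponse = { models?: { name: string }[] };

function isResident(ps: PsResponse, model: string): boolean {
  return (ps.models ?? []).some((m) => m.name === model);
}

async function warmModel(
  model: string,
  endpoint = "http://127.0.0.1:11434",
): Promise<boolean> {
  const ps = (await fetch(`${endpoint}/api/ps`).then((r) => r.json())) as PsResponse;
  if (isResident(ps, model)) return false; // already warm, nothing to do
  // An empty prompt asks Ollama to load the model without generating text.
  await fetch(`${endpoint}/api/generate`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, prompt: "", keep_alive: "5m" }),
  });
  return true; // cold load triggered; the caller can show a warming spinner
}
```
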
### Runtime selection flow
```mermaid
diff --git a/docs/SETUP.md b/docs/SETUP.md
index 2f576ab..6b987c8 100644
--- a/docs/SETUP.md
+++ b/docs/SETUP.md
@@ -21,16 +21,64 @@
## 1. Prerequisites
-| | Version |
+### Host toolchain
+
+| | Version | Why |
+|---|---|---|
+| **Node.js** | **≥ 20** (22 tested in CI) | Enforced via `package.json#engines`. Uses async iterators on `undici` request bodies, `node:events`, and native ESM/CJS interop. |
+| **npm** | bundled with Node | For `npm ci` / `npm link`. |
+| **git** | any modern | Project-root detection, `git_diff` / `git_status` tools. |
+| **ripgrep** | any | Optional but recommended — fast path for the `grep` tool. Falls back to a Node glob walker. |
+| **Docker** | ≥ 25 | Only needed for building the image or using the compose stack. |
+| **$EDITOR** | any | Used when you pick "Edit" on a plan; falls back to `vi`. |
+
+### OS support
+
+| OS | Status |
+|---|---|
+| macOS (darwin-x64, darwin-arm64) | first-class, tested in CI |
+| Linux (linux-x64, linux-arm64) | first-class, tested in CI |
+| Windows (native + WSL) | supported via native `better-sqlite3` prebuilds; WSL recommended for POSIX symlink / ripgrep parity |
+
+### Runtime npm dependencies
+
+Forge declares **13 runtime deps** and **zero optional deps**. None require a C/C++/Rust/Go/Python toolchain at install time — `better-sqlite3` is the only native module and ships prebuilds for every supported triple.
+
+| Package | Version | What for |
+|---|---|---|
+| `@modelcontextprotocol/sdk` | ^1.0.0 | MCP bridge (stdio/http_stream/websocket transports) |
+| `better-sqlite3` | ^11.3.0 | Local index DB (`~/.forge/forge.db`), FTS5 cold memory |
+| `chalk` | ^4.1.2 | ANSI color (v4 kept for CJS) |
+| `cli-table3` | ^0.6.5 | Tables in `forge doctor`, `task list`, `model list` |
+| `commander` | ^12.1.0 | CLI argv parsing |
+| `dotenv` | ^16.4.5 | `.env` loading |
+| `ora` | ^5.4.1 | Progress spinner (v5 kept for CJS) |
+| `prompts` | ^2.4.2 | Non-TTY fallback for the numbered-select helper |
+| `semver` | ^7.6.3 | Update-check version comparison |
+| `undici` | ^6.19.2 | HTTP client for Ollama / Anthropic / OpenAI streams |
+| `ws` | ^8.18.0 | UI dashboard WebSocket |
+| `yaml` | ^2.5.0 | Skill-file frontmatter |
+| `zod` | ^3.23.8 | Runtime validation of plans and tool args |
+
+### Model source — you need at least one
+
+Local runtimes (auto-detected on standard ports with a ~1.5 s probe):
+
+| Runtime | Default endpoint | Env override |
+|---|---|---|
+| Ollama | `http://127.0.0.1:11434` | `OLLAMA_ENDPOINT` |
+| LM Studio | `http://127.0.0.1:1234/v1` | `LMSTUDIO_ENDPOINT` |
+| vLLM | `http://127.0.0.1:8000/v1` | `VLLM_ENDPOINT` |
+| llama.cpp (`server`) | `http://127.0.0.1:8080/v1` | `LLAMACPP_ENDPOINT` |
+
+Hosted runtimes (API key via env or OS keychain):
+
+| Runtime | Env var |
|---|---|
-| Node.js | ≥ 20 (22 tested) |
-| npm | bundled with Node |
-| git | any |
-| ripgrep | optional but recommended — used by tools |
-| Docker (for image work) | ≥ 25 |
+| Anthropic | `ANTHROPIC_API_KEY` |
+| OpenAI-compatible | `OPENAI_API_KEY` (+ `OPENAI_BASE_URL` for non-OpenAI endpoints) |
-Optional: Ollama / LM Studio / vLLM / llama.cpp for testing against a real
-local model. Hosted `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` also works.
+If no provider is reachable, `forge doctor` reports it explicitly and prints the exact command to start one — no silent fallbacks.
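
The ~1.5 s auto-detection probe amounts to a timed fetch against each default endpoint from the table above. An illustrative sketch; `probeRuntimes` is a hypothetical name, and the real detector also inspects response bodies:

```typescript
// Any HTTP answer (even a 404) means something is listening; connection
// refused or a timeout means the runtime isn't reachable.
async function probeRuntimes(
  endpoints: Record<string, string>,
  timeoutMs = 1500,
): Promise<string[]> {
  const up: string[] = [];
  await Promise.all(
    Object.entries(endpoints).map(async ([name, url]) => {
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        if (res.status < 500) up.push(name);
      } catch {
        // unreachable or timed out: not an error for a probe
      }
    }),
  );
  return up;
}

// Defaults mirroring the table; override with the env vars listed there.
const DEFAULT_ENDPOINTS = {
  ollama: "http://127.0.0.1:11434",
  lmstudio: "http://127.0.0.1:1234/v1",
  vllm: "http://127.0.0.1:8000/v1",
  llamacpp: "http://127.0.0.1:8080/v1",
};
```
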
---
@@ -222,6 +270,16 @@ flowchart TB
providers the router sees as up.
- **Events log:** `~/.forge/logs/events.jsonl` is append-only JSONL and
trivially `jq`-queryable.
+- **"Is this a model-capability bug or a Forge bug?"** — when tracking a
+ failing task, check the capability tier before changing code. Small
+ models (<7B, or any general 7B on multi-step edits) produce failure
+ modes that the runtime deliberately surfaces loudly rather than hides:
+ wrong-tool selection, `ask_user` escalation, split create-empty-then-fill
+ plans. See [ARCHITECTURE §6.1](ARCHITECTURE.md#61-model-capability-assumptions-and-the-runtime-guards-that-defend-them)
+ for the full table of failure modes → runtime guards. If you reproduce
+ the same failure on a hosted frontier model, it's a Forge bug. If it
+ reproduces only on a small local model, check that the guard exists and
+ that your change hasn't regressed it.
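
On the events-log point above: the same filtering `jq` does is a few lines of TypeScript. The `type` field name is an assumption about the event schema:

```typescript
type ForgeEvent = { type: string; [k: string]: unknown };

// Parse append-only JSON Lines and keep only events of one type.
function filterEvents(jsonl: string, type: string): ForgeEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ForgeEvent)
    .filter((e) => e.type === type);
}
```
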
---
diff --git a/images/CLI.mp4 b/images/CLI.mp4
new file mode 100644
index 0000000..d27b0de
Binary files /dev/null and b/images/CLI.mp4 differ
diff --git a/images/REPL.mp4 b/images/REPL.mp4
new file mode 100644
index 0000000..73f3b59
Binary files /dev/null and b/images/REPL.mp4 differ
diff --git a/images/UI.mp4 b/images/UI.mp4
new file mode 100644
index 0000000..34a99ef
Binary files /dev/null and b/images/UI.mp4 differ
diff --git a/images/cli.png b/images/cli.png
new file mode 100644
index 0000000..541153e
Binary files /dev/null and b/images/cli.png differ
diff --git a/images/logo.jpeg b/images/logo.jpeg
new file mode 100644
index 0000000..c09e5dc
Binary files /dev/null and b/images/logo.jpeg differ
diff --git a/images/repl.png b/images/repl.png
index b424798..097c894 100644
Binary files a/images/repl.png and b/images/repl.png differ
diff --git a/images/ui.png b/images/ui.png
index 65a2050..8b89e2d 100644
Binary files a/images/ui.png and b/images/ui.png differ
diff --git a/index.html b/index.html
index 8461e0e..dd84a67 100644
--- a/index.html
+++ b/index.html
@@ -128,6 +128,7 @@
Model sees every tool result (stdout / stderr / exit) and adapts within a step. Mode-capped turn budgets.
adaptive · bounded
@@ -209,6 +215,73 @@
Every capability, highlighted.
+
+
+
+
+ Live demos
+
See it running.
+
Screen captures of each Forge surface — the interactive REPL, the one-shot CLI, and the web dashboard — all driving the same runtime.
+
+
+
+
+
▶ REPL
+
Interactive session
+
Multi-turn prompts with slash-command autocomplete, status line, digit shortcuts for prompts, streamed markdown rendering, and live file-change tracking.
+
stream · slash · autocomplete
+
+
+
▶ CLI
+
One-shot runs
+
forge run "…" launches a full classify → plan → approve → execute → verify pipeline in the terminal with a progress rail and completion block.
+
--yes · --plan-only · ci-friendly
+
+
+
▶ UI
+
Web dashboard
+
Live WebSocket stream of plan approval, permission prompts, model deltas, and task results. Historical tasks replay from disk; follow-ups thread the conversation.
+
WebSocket · stream · history
+
+
+
+
+
+
What every demo is actually showing
+
The same src/core/orchestrator.ts runtime drives all three surfaces. Any task you run in one surface is a real row in the SQLite index — pickable from another surface, visible in forge sessions, cancellable from the dashboard.
+
Deltas stream token-by-token from the provider (emitDelta → event bus → WebSocket / REPL progress rail). Markdown reflows in place so headings, fences, and lists form up live instead of dumping at the end.
+
+
+
Run these for yourself
+
REPL
+
forge
+
One-shot
+
forge run "summarize src/core/loop.ts"
+
Dashboard
+
forge ui start # http://127.0.0.1:7823
+
+
+
+
+
REPL demo · forge
+
+
+
+
CLI demo · forge run
+
+
+
+
Web dashboard demo · forge ui start
+
+
+
+
+
@@ -450,6 +523,50 @@
Model families → preferred roles
+
+
Model size & capability tiers
+
+ The agentic loop is multi-turn tool use with strict JSON output. Small
+ local models can drive it, but not every kind of work is realistic at
+ every size. Pick by the work you intend to do, and set a hosted
+ fallback for when you hit the ceiling — the router degrades gracefully
+ via its circuit breaker.
+
+ Below the tier floor, models fail in recognisable ways. Forge catches
+ each so a small model fails loudly instead of corrupting
+ state.
+
+
+
+
Failure mode
Runtime guard
+
+
Picks run_command to write file contents
Executor prompt spells out step.type → tool mapping and forbids run_command for file writes.
+
Escalates to ask_user on any tool error, stalling the step
ask_user rejects empty / too-short questions as non-retryable; model has to switch tools.
+
Splits "create empty file → edit to fill"
edit_file with oldText="" on an empty/missing file writes the full body.
+
write_file ENOENT because parent dir doesn't exist
createDirs defaults to true (mkdir-p).
+
Cold-load timeout interpreted as model failure
Headers-timeout floor 300 s; proactive warm() with /api/ps preflight.
+
Reviewer rejects analysis tasks for "no file changes"
Classifier sets requiresReview=false for intent=analysis; narrator pass writes the real answer.
+
Two concurrent edits race on the same file
Per-process path-mutex + atomic temp+rename.
+
+
+
@@ -652,12 +769,89 @@
Add MCP connector
+
+
+
+ 13 · System requirements
+
Node 20+. Or just Docker.
+
+
+ Forge runs on any platform Node 20 runs on, or anywhere Docker runs. There is no host-side Python, Rust, or Go requirement. better-sqlite3 is the only native module and ships prebuilts for every supported triple — no toolchain needed on npm install.
+
+
+
+
/ host
+
Host toolchain
+
+ Node.js ≥ 20 (22 tested).
+ OS: macOS · Linux · Windows (native or WSL).
+ Architectures: x64 · arm64.
+ Docker ≥ 25 (only if you prefer the container path).
+
+
node 20+ · darwin · linux · win32 · arm64
+
+
+
/ footprint
+
Disk & RAM
+
+ Disk: ~150 MB node_modules; state under ~/.forge grows with session history (override with FORGE_HOME).
+ RAM: ~100 MB for Forge itself. Your local model uses its own RAM/VRAM on top.
+ Cold start: forge doctor ~170 ms.
+
+
~150 MB · ~100 MB RAM
+
+
+
/ model
+
Model source (pick ≥ 1)
+
+ Local: Ollama · LM Studio · vLLM · llama.cpp — auto-detected on standard ports.
+ Hosted: ANTHROPIC_API_KEY · OPENAI_API_KEY (+ OPENAI_BASE_URL for any OpenAI-compatible server).
+ forge doctor probes all of them and tells you which are reachable.
+
+
local-first · hosted fallback
+
+
+
+
Runtime npm dependencies
+
+ 13 runtime packages, zero optional dependencies. Listed below so you can audit them before npm install.
+
+
+
package.json · dependencies · 13 total
+
@modelcontextprotocol/sdk # MCP bridge (stdio / http_stream / websocket)
+better-sqlite3 # local index DB · FTS5 cold memory · native, prebuilt
+chalk # ANSI color
+cli-table3 # tables in `forge doctor`, `task list`
+commander # CLI argv parsing
+dotenv # .env loading
+ora # progress spinner
+prompts # non-TTY fallback for the numbered-select helper
+semver # update-check version comparison
+undici # HTTP client · Ollama / Anthropic / OpenAI streams
+ws # UI dashboard WebSocket
+yaml # skill-file frontmatter
+zod # runtime validation of plans & tool args
+
+
+
+
Recommended (not required)
+
+ ripgrep — fast path for the grep tool; falls back to a Node glob walker.
+ git — enables git_diff / git_status tools and project-root detection.
+ $EDITOR — used when you pick "Edit" on a plan approval; falls back to vi.
+
+
+
+
+
- 13 · Install
+ 14 · Install
Three paths. Pick one.
@@ -704,7 +898,7 @@
03 / Compose
- 14 · Container posture
+ 15 · Container posture
Single image. CLI + UI + daemon.
@@ -743,7 +937,7 @@
Single image. CLI + UI + daemon.
- 15 · CI/CD
+ 16 · CI/CD
9 jobs per PR. 6 release stages.
CI · every PR + push
@@ -819,7 +1013,7 @@
Release · on v* tag
- 16 · Runtime metrics
+ 17 · Runtime metrics
What it actually costs to run.
All measured locally — reproducers in the table at the bottom. No synthetic benchmarks, no comparisons against straw-man tools.
+ ${(() => {
+ const usd = Number(cost.totals?.usd ?? 0);
+ const toks = Number(cost.totals?.tokens ?? 0);
+ // Local providers (Ollama, llama.cpp) have no per-token pricing,
+ // so usd is always 0 even when tokens flow. Showing $0.000 in
+ // the headline makes the card look broken. When there is no
+ // billable cost but tokens are being used, promote the token
+ // count to the headline and annotate as "local · free".
+ if (usd > 0) {
+ return `
$${usd.toFixed(3)}
+
${toks.toLocaleString()} tokens
`;
+ }
+ if (toks > 0) {
+ return `
${toks.toLocaleString()}
+
tokens · local · free
`;
+ }
+ return `
$0.000
+
no tokens yet
`;
+ })()}
Provider
@@ -474,17 +497,34 @@ views.dashboard = async () => {
`);
const input = document.getElementById('hero-input');
+ const cwdInput = document.getElementById('hero-cwd');
const go = async (prompt = null) => {
const p = (prompt ?? input.value).trim();
if (!p) return;
pushPromptHistory(p);
+ const cwd = (cwdInput?.value || '').trim() || undefined;
try {
- const { taskId } = await apiPost('/api/tasks/run', { prompt: p, autoApprove: false });
+ const { taskId } = await apiPost('/api/tasks/run', { prompt: p, autoApprove: false, cwd });
toast('Task started', 'ok');
openTask(taskId);
} catch (e) { toast(String(e), 'err'); }
};
document.getElementById('hero-go').addEventListener('click', () => go());
+ // Known-projects autocomplete + Browse modal for the hero cwd input so
+ // the user can pick a dir directly from the Dashboard before running
+ // their first task, instead of having to dig into the New-task form.
+ api('/api/projects').then((ps) => {
+ const dl = document.getElementById('hero-cwd-list');
+ if (!dl) return;
+ dl.innerHTML = (ps || [])
+ .map((p) => ``)
+ .join('');
+ }).catch(() => {});
+ document.getElementById('hero-browse')?.addEventListener('click', () => {
+ openDirPicker((picked) => {
+ if (cwdInput) cwdInput.value = picked;
+ });
+ });
attachPromptHistory(input);
input.addEventListener('keydown', (e) => {
if (e.key === 'Enter' && !e.shiftKey) { e.preventDefault(); go(); }
@@ -876,7 +916,11 @@ views.run = async () => {
-
+
+
+
+
+
@@ -927,6 +971,22 @@ views.run = async () => {
});
document.getElementById('run-reset').addEventListener('click', () => setView('run'));
document.getElementById('run-prompt').focus();
+ // Populate the project datalist + wire the Browse button for the dir
+ // picker. Known projects come from Forge's project index so the user can
+ // jump to repos they've run tasks against before without retyping a path.
+ api('/api/projects').then((ps) => {
+ const dl = document.getElementById('run-cwd-list');
+ if (!dl) return;
+ dl.innerHTML = (ps || [])
+ .map((p) => ``)
+ .join('');
+ }).catch(() => {});
+ document.getElementById('run-browse')?.addEventListener('click', () => {
+ openDirPicker((picked) => {
+ const el = document.getElementById('run-cwd');
+ if (el) el.value = picked;
+ });
+ });
};
// ---------- Task detail ----------
@@ -935,7 +995,7 @@ const openTask = (taskId) => {
currentView = 'task';
document.querySelectorAll('.nav-item').forEach((b) => b.classList.remove('active'));
app.innerHTML = page(`
- ${pageHeader('Task · ' + taskId, 'Live stream from the interactive host.', `
+ ${pageHeader('Conversation · ' + taskId, 'Live stream from the interactive host. Type below to send a follow-up.', `
`)}
@@ -949,18 +1009,240 @@ const openTask = (taskId) => {
+
+
+
Follow-up
+ ready
+
+
+
+
+
+
+
+
+
+
+
+
+ 0 prior turns
+
+
+
+
+
`);
+ // Pre-populate the project path from /api/status so the user can see where
+ // tasks will run and override before sending. Without this the server-side
+ // cwd is whatever the `forge ui start` process inherited and the user has
+ // no visibility, which led to paths being joined against the wrong root
+ // (e.g. "~/Forge-Agentic-Coding-CLI/src/..." resolved under the real cwd).
+ api('/api/status').then((s) => {
+ const el = document.getElementById('followup-cwd');
+ if (el && s?.cwd) el.value = s.cwd;
+ }).catch(() => {});
+
+ // Populate the datalist with recent/known projects. Users can either type
+ // to autocomplete, pick from the dropdown, or click Browse for a full
+ // server-side directory picker.
+ api('/api/projects').then((ps) => {
+ const dl = document.getElementById('followup-cwd-list');
+ if (!dl) return;
+ dl.innerHTML = (ps || [])
+ .map((p) => ``)
+ .join('');
+ }).catch(() => {});
+
+ // Browse modal — walks directories under $HOME and lets the user click a
+ // folder to set it as the project. Server enforces the $HOME containment
+ // rule so the picker can't be used as a system enumerator.
+ document.getElementById('followup-browse')?.addEventListener('click', () => {
+ openDirPicker((picked) => {
+ const el = document.getElementById('followup-cwd');
+ if (el) el.value = picked;
+ });
+ });
const stream = document.getElementById('task-stream');
const planSec = document.getElementById('task-plan');
let currentPlanPromptId = null;
+ // Conversation state: each task under this view contributes one turn.
+ // We compose them on submit into a `description` that mirrors what
+ // `composeDescription` in the REPL does server-side — the orchestrator
+ // treats it as ground truth for follow-ups like "what did we talk
+ // about?". `activeTaskId` tracks which task's WS we're currently
+ // subscribed to (swaps on each follow-up).
+ const convoTurns = [];
+ let activeTaskId = taskId;
+ const composeDescription = (newInput) => {
+ const prior = convoTurns.filter((t) => t.summary);
+ if (!prior.length) return newInput;
+ const lines = [
+ '## Current request',
+ newInput,
+ '',
+ '## Conversation so far (earliest → latest)',
+ ];
+ prior.slice(-8).forEach((t, i) => {
+ lines.push(`${i + 1}. user: ${t.input.replace(/\s+/g, ' ').slice(0, 240)}`);
+ lines.push(` assistant: ${t.success === false ? 'FAILED' : 'OK'} — ${(t.summary || '').replace(/\s+/g, ' ').slice(0, 240)}`);
+ });
+ lines.push('', '## Notes', '- "Current request" is the user\'s latest message; prior turns are context only.');
+ return lines.join('\n');
+ };
+  const updateTurnCount = () => {
+    const el = document.getElementById('followup-turns');
+    if (!el) return;
+    const n = convoTurns.filter((t) => t.summary).length;
+    el.textContent = `${n} prior turn${n === 1 ? '' : 's'}`;
+  };
+
+ // Stream flows earliest → latest top-to-bottom (chat-style). Each line gets
+ // a right-aligned local timestamp. Auto-scroll sticks to the bottom as long
+ // as the user hasn't deliberately scrolled up to read history.
+ const isAtBottom = () => {
+ const gap = stream.scrollHeight - stream.scrollTop - stream.clientHeight;
+ return gap < 48;
+ };
+ const scrollToBottom = () => {
+ stream.scrollTop = stream.scrollHeight;
+ };
+ const tsNow = () => {
+ const d = new Date();
+ const hh = String(d.getHours()).padStart(2, '0');
+ const mm = String(d.getMinutes()).padStart(2, '0');
+ const ss = String(d.getSeconds()).padStart(2, '0');
+ return `${hh}:${mm}:${ss}`;
+ };
const push = (line) => {
const el = document.createElement('div');
el.className = line.cls;
- el.innerHTML = line.html;
- stream.insertBefore(el, stream.firstChild);
- while (stream.childElementCount > 300) stream.lastChild?.remove();
+ // Wrap message in a flex row so the right-aligned timestamp never
+ // collides with the body. `.log-body` is a div (not span) because
+ // callers pass block-level markdown such as `
` and `
` — a span around block content is invalid HTML and
+ // triggered subtle inline-baseline artifacts between code lines in
+ // some browsers.
+ el.innerHTML =
+ `
${line.html}
` +
+ `${tsNow()}`;
+ const stick = isAtBottom();
+ stream.appendChild(el);
+ while (stream.childElementCount > 300) stream.firstChild?.remove();
+ if (stick) scrollToBottom();
+ return el;
+ };
+
+ // Markdown renderer reused by the chat view. Falls back to plain-text
+ // escaping when the markdown helper script hasn't loaded for some reason.
+ const md = (s) =>
+ window.forgeMd && window.forgeMd.mdToHtml ? window.forgeMd.mdToHtml(s || '') : esc(s || '');
+
+ // Live "working" spinner lives between STARTED and DONE. REPL/CLI already
+ // show an ora spinner during this phase; without something equivalent in
+ // the UI the task just sits on "STARTED" for ~2 minutes. The spinner goes
+ // away the moment streaming starts, DONE/FAILED/ERROR arrives, or the
+ // task is cancelled.
+ let workingEl = null;
+ const showWorking = (phase) => {
+ if (workingEl) {
+ const ph = workingEl.querySelector('.working-phase');
+ if (ph) ph.textContent = phase;
+ return;
+ }
+ workingEl = document.createElement('div');
+ workingEl.className = 'log-line log-line-working';
+ workingEl.innerHTML =
+ `WORKING · ⠋${esc(phase)}` +
+ `${tsNow()}`;
+ const stick = isAtBottom();
+ stream.appendChild(workingEl);
+ if (stick) scrollToBottom();
+ // Tiny inline spinner animation — swap the braille glyph every 80ms. Kept
+ // in-element so one interval per spawned spinner, cleared on hide().
+ const frames = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
+ let i = 0;
+ const sp = workingEl.querySelector('.working-spinner');
+ workingEl._timer = setInterval(() => {
+ if (!workingEl || !sp) return;
+ i = (i + 1) % frames.length;
+ sp.textContent = frames[i];
+ }, 80);
+ };
+ const hideWorking = () => {
+ if (!workingEl) return;
+ if (workingEl._timer) clearInterval(workingEl._timer);
+ workingEl.remove();
+ workingEl = null;
+ };
+
+ // Streaming model output: the REPL/CLI render accumulated markdown live
+ // as tokens arrive (headings, fences, lists all form up in place). The UI
+ // now does the same — each delta appends to a per-stream buffer, and a
+ // requestAnimationFrame-coalesced re-render pipes the buffer through the
+ // same `mdToHtml` the rest of the UI uses. One rAF tick per frame caps
+ // work at ~60 re-renders/sec even if tokens arrive faster.
+ let deltaEl = null;
+ let deltaKey = '';
+ let deltaBuf = '';
+ let deltaRaf = 0;
+ // Track whether any delta was rendered during this task. If yes, the
+ // final `task.result` frame shouldn't repeat the full summary as a DONE
+ // block — the user already read it live.
+ let deltaStreamed = false;
+ const flushDeltaRender = () => {
+ deltaRaf = 0;
+ if (!deltaEl) return;
+ const span = deltaEl.querySelector('.stream-text');
+ if (!span) return;
+ const stick = isAtBottom();
+ // Render the whole accumulator so mid-stream markdown (headings,
+ // partially-closed fences, list sequences) reflows cleanly.
+ span.innerHTML = md(deltaBuf);
+ if (stick) scrollToBottom();
+ };
+ const appendDelta = (msg) => {
+ const key = `${msg.provider || ''}/${msg.model || ''}/${msg.role || ''}`;
+ if (msg.done) {
+ // Final render pass so the closing tokens (last list items, closing
+ // code fence, etc.) are reflected even if the last rAF tick hadn't
+ // fired yet.
+ if (deltaRaf) cancelAnimationFrame(deltaRaf);
+ flushDeltaRender();
+ if (deltaEl) deltaEl.classList.add('log-line-done');
+ deltaEl = null;
+ deltaKey = '';
+ deltaBuf = '';
+ return;
+ }
+ if (!msg.text) return;
+ if (!deltaEl || deltaKey !== key) {
+ deltaEl = document.createElement('div');
+ deltaEl.className = 'log-line log-line-stream';
+ // `.stream-text` needs to be block-level because the streamed
+ // markdown may include `
` / `
` / `` — wrapping a block
+ // like `
` in an inline `` made browsers render subtle
+ // inline-baseline artifacts between code lines (the "weird lines"
+ // users saw). Use a div container for the body with an inline head
+ // row for the model label; the markdown target is its own div.
+ deltaEl.innerHTML =
+ `
${esc(msg.model || 'model')}
` +
+ `${tsNow()}`;
+ const stick = isAtBottom();
+ stream.appendChild(deltaEl);
+ while (stream.childElementCount > 300) stream.firstChild?.remove();
+ if (stick) scrollToBottom();
+ deltaKey = key;
+ deltaBuf = '';
+ }
+ deltaBuf += msg.text;
+ deltaStreamed = true;
+ // Coalesce re-renders. Multiple deltas within the same frame collapse
+ // into one innerHTML assignment so fast providers (local Ollama can
+ // emit 100+ tokens/sec) don't thrash the layout.
+ if (!deltaRaf) deltaRaf = requestAnimationFrame(flushDeltaRender);
};
const renderPlan = (plan) => {
@@ -987,12 +1269,23 @@ const openTask = (taskId) => {
${steps}
+
`;
planSec.querySelectorAll('[data-plan-action]').forEach((b) =>
b.addEventListener('click', async () => {
if (!currentPlanPromptId) return;
+ if (b.dataset.planAction === 'edit') {
+ // Show the inline JSON editor. When the user saves, we'll
+ // receive a fresh `plan_edit` prompt from the server (because
+ // the loop re-calls confirmPlan after editPlan returns) — this
+ // click itself only needs to resolve the CURRENT approval
+ // prompt with value='edit'.
+ await apiPost('/api/prompts/respond', { promptId: currentPlanPromptId, value: 'edit' });
+ currentPlanPromptId = null;
+ return;
+ }
await apiPost('/api/prompts/respond', { promptId: currentPlanPromptId, value: b.dataset.planAction });
planSec.hidden = true;
planSec.innerHTML = '';
@@ -1003,48 +1296,219 @@ const openTask = (taskId) => {
app.querySelector('[data-action="back"]').addEventListener('click', () => setView('active'));
app.querySelector('[data-action="cancel"]').addEventListener('click', async () => {
- try { await apiPost(`/api/tasks/${taskId}/cancel`); toast('cancel requested', 'warn'); }
+ try { await apiPost(`/api/tasks/${activeTaskId}/cancel`); toast('cancel requested', 'warn'); }
catch (e) { toast(String(e), 'err'); }
});
- if (taskConnections.has(taskId)) { try { taskConnections.get(taskId).close(); } catch {} }
- const url = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}/ws/tasks/${taskId}`;
- const ws = new WebSocket(url);
- taskConnections.set(taskId, ws);
-
const meta = document.getElementById('task-meta');
- ws.onopen = () => { meta.textContent = 'live'; };
- ws.onclose = () => { meta.textContent = 'disconnected'; };
- ws.onmessage = (e) => {
- let msg;
- try { msg = JSON.parse(e.data); } catch { return; }
- if (msg.kind === 'event') {
- const ev = msg.event;
- push({
- cls: `log-line ${ev.severity ?? 'info'}`,
- html: `${esc(ev.type)} · ${esc(ev.message)}`,
- });
- } else if (msg.kind === 'prompt') {
- if (msg.promptType === 'plan_approval') {
- currentPlanPromptId = msg.promptId;
- renderPlan(msg.plan);
- } else if (msg.promptType === 'permission') {
- openPermissionModal(msg);
- } else if (msg.promptType === 'user_input') {
- openUserInputModal(msg);
+
+ // Replay a historical task's saved detail into the stream. The WS server
+ // only streams live/active tasks, so when the user opens a completed task
+ // from the Tasks / Dashboard tables we need to hydrate the view from the
+ // persisted JSON instead of leaving it blank with "disconnected". Safe to
+ // call at any point — `push` is idempotent per event, and we guard with a
+ // flag so a brief race where the WS opens + detail arrives doesn't
+ // double-render.
+ let hydrated = false;
+ const hydrateHistorical = async (id) => {
+ if (hydrated) return;
+ try {
+ const t = await api(`/api/tasks/${id}`);
+ if (!t || hydrated) return;
+ hydrated = true;
+ meta.textContent = `historical · ${String(t.status || 'done')}`;
+ push({ cls: 'log-line', html: `PROMPT · ${esc((t.title || '').slice(0, 200))}` });
+ if (t.plan) renderPlan(t.plan);
+ const ok = t.result?.success !== false;
+ const summary = t.result?.summary || '';
+ if (summary) {
+ const multiline = summary.includes('\n');
+ push({
+ cls: `log-line ${ok ? '' : 'error'}`,
+ html: multiline
+ ? `${ok ? 'DONE' : 'FAILED'}<div class="md">${md(summary)}</div>`
+ : `${ok ? 'DONE' : 'FAILED'} · ${md(summary)}`,
+ });
+ }
+ for (const f of (t.result?.filesChanged || []).slice(0, 12)) {
+ push({ cls: 'log-line', html: `FILE · ${esc(f)}` });
+ }
+ } catch (e) {
+ // Task not found or error — show a one-line note so the page isn't
+ // silently empty.
+ push({ cls: 'log-line warning', html: `HISTORY · unable to load task detail (${esc(String(e).slice(0, 120))})` });
+ }
+ };
+
+ // Attach a WebSocket to a given taskId; swaps the listener when a
+ // follow-up turn spawns a new task. Returns the socket so we can close
+ // it when swapping again.
+ const attachWs = (id) => {
+ if (taskConnections.has(id)) { try { taskConnections.get(id).close(); } catch {} }
+ const url = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}/ws/tasks/${id}`;
+ const ws = new WebSocket(url);
+ taskConnections.set(id, ws);
+ // If the WS hasn't opened within 800ms, assume this is a historical
+ // task (the server closes with 1008 'unknown task' almost instantly in
+ // that case) and hydrate from persisted state. 800ms is long enough
+ // that live tasks reliably OPEN first, short enough that the UX feels
+ // instant.
+ let openedOrErrored = false;
+ setTimeout(() => { if (!openedOrErrored && ws.readyState !== ws.OPEN) hydrateHistorical(id); }, 800);
+ ws.onopen = () => { openedOrErrored = true; meta.textContent = 'live · ' + id.slice(0, 8); };
+ ws.onclose = () => {
+ openedOrErrored = true;
+ // Close with code 1008 (policy violation) is how the server says
+ // "unknown task" for historical rows — treat that as a cue to load
+ // the detail view. For tasks that opened live and then closed, the
+ // stream already has content; leave the meta as "disconnected".
+ if (!hydrated) { hydrateHistorical(id); }
+ else { meta.textContent = 'disconnected'; }
+ };
+ ws.onmessage = (e) => {
+ let msg;
+ try { msg = JSON.parse(e.data); } catch { return; }
+ if (msg.kind === 'event') {
+ const ev = msg.event;
+ push({
+ cls: `log-line ${ev.severity ?? 'info'}`,
+ html: `${esc(ev.type)} · ${esc(ev.message)}`,
+ });
+ // Keep the "working" spinner label in sync with the latest phase
+ // event so the user sees what Forge is doing (classify / plan /
+ // step_001 reading src/foo…) instead of a static "classifying
+ // request" label for the full run.
+ if (ev?.message) showWorking(ev.message.slice(0, 80));
+ } else if (msg.kind === 'prompt') {
+ if (msg.promptType === 'plan_approval') {
+ currentPlanPromptId = msg.promptId;
+ renderPlan(msg.plan);
+ } else if (msg.promptType === 'plan_edit') {
+ openPlanEditModal(msg.promptId, msg.plan);
+ } else if (msg.promptType === 'permission') {
+ openPermissionModal(msg);
+ } else if (msg.promptType === 'user_input') {
+ openUserInputModal(msg);
+ }
+ } else if (msg.kind === 'task.started') {
+ push({ cls: 'log-line', html: `STARTED · ${esc(msg.prompt.slice(0, 120))}` });
+ showWorking('classifying request…');
+ } else if (msg.kind === 'task.result') {
+ hideWorking();
+ const ok = msg.result?.success;
+ const summary = msg.result?.summary ?? '';
+ if (deltaStreamed) {
+ // The narrator / conversation answer already streamed into the
+ // view as model deltas — rendering the same text again as a
+ // DONE block is pure duplication. Emit a single status line
+ // instead so the user sees the task finished.
+ push({
+ cls: `log-line ${ok ? '' : 'error'}`,
+ html: `${ok ? 'DONE' : 'FAILED'}${ok ? '' : ` · ${md(summary)}`}`,
+ });
+ } else {
+ // No live stream happened (planner-only task, executor used
+ // jsonMode, etc.) — render the full summary so the user has
+ // something to read.
+ const isMultiline = summary.includes('\n');
+ if (isMultiline) {
+ push({
+ cls: `log-line ${ok ? '' : 'error'}`,
+ html: `${ok ? 'DONE' : 'FAILED'}<div class="md">${md(summary)}</div>`,
+ });
+ } else {
+ push({
+ cls: `log-line ${ok ? '' : 'error'}`,
+ html: `${ok ? 'DONE' : 'FAILED'} · ${md(summary)}`,
+ });
+ }
+ }
+ toast(ok ? 'Task complete' : 'Task failed', ok ? 'ok' : 'err');
+ // Reset for the next task in this tab.
+ deltaStreamed = false;
+ // Record the summary against the most recently-sent turn so future
+ // follow-ups can thread it into the composed description.
+ const pending = convoTurns[convoTurns.length - 1];
+ if (pending && pending.taskId === id && !pending.summary) {
+ pending.summary = summary;
+ pending.success = ok;
+ updateTurnCount();
+ }
+ // Re-enable the follow-up input once the task finishes.
+ const sendBtn = document.getElementById('followup-send');
+ const statusEl = document.getElementById('followup-status');
+ if (sendBtn) sendBtn.disabled = false;
+ if (statusEl) statusEl.textContent = 'ready';
+ } else if (msg.kind === 'task.error') {
+ hideWorking();
+ push({ cls: 'log-line error', html: `ERROR · ${esc(msg.error)}` });
+ const sendBtn = document.getElementById('followup-send');
+ const statusEl = document.getElementById('followup-status');
+ if (sendBtn) sendBtn.disabled = false;
+ if (statusEl) statusEl.textContent = 'ready';
+ } else if (msg.kind === 'task.cancel_requested') {
+ hideWorking();
+ push({ cls: 'log-line warning', html: `CANCEL · requested` });
+ } else if (msg.kind === 'model.delta') {
+ // First token arriving means the model is speaking — the generic
+ // "working" spinner has served its purpose; the streamed text is
+ // the new source of motion.
+ if (msg.text) hideWorking();
+ appendDelta(msg);
}
- } else if (msg.kind === 'task.started') {
- push({ cls: 'log-line', html: `STARTED · ${esc(msg.prompt.slice(0, 120))}` });
- } else if (msg.kind === 'task.result') {
- const ok = msg.result?.success;
- push({ cls: `log-line ${ok ? '' : 'error'}`, html: `${ok ? 'DONE' : 'FAILED'} · ${esc(msg.result?.summary ?? '')}` });
- toast(ok ? 'Task complete' : 'Task failed', ok ? 'ok' : 'err');
- } else if (msg.kind === 'task.error') {
- push({ cls: 'log-line error', html: `ERROR · ${esc(msg.error)}` });
- } else if (msg.kind === 'task.cancel_requested') {
- push({ cls: 'log-line warning', html: `CANCEL · requested` });
+ };
+ return ws;
+ };
+
+ attachWs(taskId);
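The delta-dedup decision in the `task.result` handler above reduces to a pure function. A minimal sketch — `resultLine` is an illustrative name, not part of the shipped app:

```javascript
// Sketch of the DONE/FAILED rendering rule from the task.result handler
// above. If model deltas already streamed into the view, repeating the
// summary would duplicate it — emit a bare status line instead (plus the
// summary on failure, since error text may never have streamed).
const resultLine = (ok, summary, deltaStreamed) => {
  if (deltaStreamed) {
    return ok ? 'DONE' : 'FAILED · ' + summary;
  }
  return (ok ? 'DONE' : 'FAILED') + ' · ' + summary;
};
```

The asymmetry on failure is deliberate: a failed run's summary is usually diagnostic text the user has not seen yet, so it is always worth repeating.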
+
+ // Follow-up input — typed message spawns a new task with prior-turns
+ // context threaded in via `description`. The server's orchestrator uses
+ // that for the conversation fast-path; for non-conversational intents
+ // it's handed to the planner as context.
+ const input = document.getElementById('followup-input');
+ const sendBtn = document.getElementById('followup-send');
+ const autoCk = document.getElementById('followup-auto');
+ const cwdInput = document.getElementById('followup-cwd');
+ const statusEl = document.getElementById('followup-status');
+ const submitFollowup = async () => {
+ const text = (input?.value || '').trim();
+ if (!text) return;
+ sendBtn.disabled = true;
+ statusEl.textContent = 'sending…';
+ // Push an echo of the user's turn into the stream so the conversation
+ // reads linearly.
+ push({ cls: 'log-line', html: `YOU · ${esc(text)}` });
+ const description = composeDescription(text);
+ const cwd = (cwdInput?.value || '').trim() || undefined;
+ try {
+ const body = await apiPost('/api/tasks/run', {
+ prompt: text,
+ autoApprove: !!autoCk?.checked,
+ description,
+ cwd,
+ });
+ const newTaskId = body.taskId;
+ convoTurns.push({ taskId: newTaskId, input: text, summary: null });
+ updateTurnCount();
+ activeTaskId = newTaskId;
+ statusEl.textContent = 'running · ' + newTaskId.slice(0, 8);
+ input.value = '';
+ attachWs(newTaskId);
+ } catch (e) {
+ toast(String(e), 'err');
+ sendBtn.disabled = false;
+ statusEl.textContent = 'error';
}
};
+ sendBtn?.addEventListener('click', submitFollowup);
+ input?.addEventListener('keydown', (e) => {
+ if (e.key === 'Enter' && !e.shiftKey) {
+ e.preventDefault();
+ void submitFollowup();
+ }
+ });
+ input?.focus();
};
// ---------- Active / tasks ----------
@@ -1105,7 +1569,7 @@ views.tasks = async () => {
${(() => {
+ const usd = Number(t.usd);
+ const toks = Number(t.tokens);
+ // Cost view: if only local providers were used, usd is 0 by design
+ // (Ollama, llama.cpp are free). Surface that explicitly instead of
+ // a deceptive "$0.0000 estimated USD" line.
+ if (usd > 0) return `
$${usd.toFixed(4)}
estimated USD
`;
+ if (toks > 0) return `
local · free
no billable providers
`;
+ return `
$0.0000
no calls yet
`;
+ })()}
Recent calls
${rows.length
@@ -1614,6 +2090,64 @@ const mountOverlay = (innerHTML, { closeOnClickOutside = true, onClose, onKey }
return { overlay, close };
};
+// Inline plan editor. The user clicked "Edit…" on a plan approval; the
+// server's loop re-called `host.editPlan(plan)` which surfaces this
+// prompt. We show the plan JSON in a textarea; on Save we POST the new
+// plan back as the response, the loop installs it, and re-surfaces a
+// fresh plan_approval prompt so the user can approve/reject/edit-again.
+const openPlanEditModal = (promptId, plan) => {
+ const initial = JSON.stringify(plan, null, 2);
+ const { overlay, close } = mountOverlay(`
+
+
+
Edit plan
+
+
+
+
+ Edit the plan JSON directly. Steps, descriptions, targets, dependsOn,
+ risk — anything the planner produced. The loop will re-ask you to
+ approve after saving.
+
@@ -1671,6 +2205,76 @@ const openUserInputModal = (msg) => {
}));
};
+// ---------- Directory picker ----------
+//
+// Server-side browser modal. Calls `/api/dir?path=...` to list subdirs of a
+// given path (confined to $HOME on the server), and lets the user click a
+// folder to drill in or pick it. Useful when the user doesn't remember the
+// exact absolute path of the project they want to run a task against.
+
+const openDirPicker = (onPick) => {
+ const { overlay, close } = mountOverlay(`
+
+
+
Choose project directory
+
+
+
+
+
+
+
+
+
+
+
Paths outside $HOME aren't listed here — type them directly in the input.
`;
+ }
+ };
+ overlay.querySelector('#dp-up')?.addEventListener('click', () => {
+ if (currentPath) load(currentPath.replace(/\/[^/]+\/?$/, '') || '/');
+ });
+ overlay.querySelector('#dp-home')?.addEventListener('click', () => load(''));
+ overlay.querySelector('#dp-pick')?.addEventListener('click', () => {
+ onPick(pathEl.value.trim() || currentPath);
+ close();
+ });
+ pathEl.addEventListener('keydown', (e) => {
+ if (e.key === 'Enter') { e.preventDefault(); load(pathEl.value.trim()); }
+ });
+ // Start at the server's cwd so "same as when I opened" is the first view.
+ api('/api/status').then((s) => load(s?.cwd || '')).catch(() => load(''));
+};
+
// ---------- Command palette ----------
//
// Navigation + quick actions + fallback "Run task: ".
@@ -1790,20 +2394,61 @@ const openPalette = () => {
// ---------- Project event WS (for live log on dashboard, if used) ----------
+// The project event WS closes for several reasons that are not "the server
+// is down" — file rotation, fs.watch quirks on macOS when the events file
+// is rewritten, the server swapping the watched handle during project
+// changes. When that happens we verify liveness via a lightweight HTTP
+// probe before flagging offline, and schedule a reconnect. Without this,
+// the sidebar can read "offline" while a task is actively streaming.
+let projectWsReconnectTimer = null;
+const probeAndMarkStatus = async () => {
+ try {
+ const r = await fetch('/api/status', { cache: 'no-store' });
+ if (r.ok) {
+ setStatus(true);
+ return true;
+ }
+ } catch {
+ /* fall through to offline */
+ }
+ setStatus(false);
+ return false;
+};
const connectProjectWs = (projectPath) => {
if (projectWs) { try { projectWs.close(); } catch {} }
const url = `${location.protocol === 'https:' ? 'wss' : 'ws'}://${location.host}/ws?projectPath=${encodeURIComponent(projectPath)}`;
try {
projectWs = new WebSocket(url);
- projectWs.onopen = () => setStatus(true);
- projectWs.onclose = () => setStatus(false);
- } catch {}
+ projectWs.onopen = () => {
+ setStatus(true);
+ if (projectWsReconnectTimer) { clearTimeout(projectWsReconnectTimer); projectWsReconnectTimer = null; }
+ };
+ projectWs.onclose = () => {
+ // Don't flip to offline just because the event-watcher socket
+ // hiccuped. Probe the HTTP side; if the server responds, stay
+ // online and schedule a reconnect so the event feed resumes.
+ probeAndMarkStatus();
+ if (!projectWsReconnectTimer) {
+ projectWsReconnectTimer = setTimeout(() => {
+ projectWsReconnectTimer = null;
+ if (currentProject) connectProjectWs(currentProject);
+ }, 1500);
+ }
+ };
+ projectWs.onerror = () => probeAndMarkStatus();
+ } catch { probeAndMarkStatus(); }
};
const setStatus = (online) => {
statusDot.classList.toggle('off', !online);
statusText.textContent = online ? 'online' : 'offline';
};
+// Periodic liveness heartbeat. 8s is frequent enough to catch a real
+// server death within ~10s, sparse enough not to spam the log. Skips the
+// probe when the page is hidden so background tabs don't churn.
+setInterval(() => {
+ if (document.visibilityState !== 'hidden') probeAndMarkStatus();
+}, 8000);
const updateActiveBadge = (list) => {
const n = (list ?? []).filter((t) => t.status === 'running' || t.status === 'awaiting').length;
diff --git a/src/ui/public/markdown.js b/src/ui/public/markdown.js
index a9a51ce..2a2f9cb 100644
--- a/src/ui/public/markdown.js
+++ b/src/ui/public/markdown.js
@@ -86,7 +86,7 @@
const isBlockBoundary = (line) => {
if (!line.trim()) return true;
- if (/^```+|^~~~+/.test(line)) return true;
+ if (/^\s*(?:```+|~~~+)/.test(line)) return true;
if (/^#{1,6}\s/.test(line)) return true;
if (BLOCKQUOTE_RE.test(line)) return true;
if (/^\s*[-*+]\s+/.test(line)) return true;
@@ -95,10 +95,67 @@
return false;
};
+ // LLMs routinely emit `1. 1. 1.` for every item in a numbered list,
+ // trusting the renderer to auto-number. Our renderer strips the marker
+ // and emits `<ol>`, but if the items are separated by blank lines or
+ // sub-bullets, each run ends up as its own single-item `<ol>` — and the
+ // browser starts every one at 1. Pre-pass: per-indent counter that
+ // rewrites `1.` to sequential numbers whenever we're already past the
+ // first item at that indent. Mirrors src/cli/markdown.ts#renumberOrderedLists.
+ const renumberOrderedLists = (input) => {
+ const lines = input.split('\n');
+ const counters = new Map();
+ for (let i = 0; i < lines.length; i++) {
+ const line = lines[i];
+ if (/^\s*(?:#{1,6}\s|```+|~~~+)/.test(line)) {
+ counters.clear();
+ continue;
+ }
+ const m = /^(\s*)(\d+)\.\s(.*)$/.exec(line);
+ if (!m) continue;
+ const indent = m[1].length;
+ const num = parseInt(m[2], 10);
+ const body = m[3];
+ const prev = counters.get(indent) || 0;
+ if (num === 1 && prev >= 1) {
+ const next = prev + 1;
+ counters.set(indent, next);
+ lines[i] = m[1] + next + '. ' + body;
+ } else {
+ counters.set(indent, num);
+ }
+ }
+ return lines.join('\n');
+ };
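Fed typical LLM output, the pre-pass behaves like this simplified standalone version (fence-reset omitted for brevity; `renumber` is an illustrative name):

```javascript
// Simplified sketch of the renumber pre-pass above (the shipped version
// also resets counters at code fences). Per-indent counters rewrite a
// repeated "1." to the next sequential number at that indent.
const renumber = (input) => {
  const lines = input.split('\n');
  const counters = new Map(); // indent width -> last number seen
  for (let i = 0; i < lines.length; i++) {
    // Headings reset numbering, as in the real pass.
    if (/^\s*#{1,6}\s/.test(lines[i])) { counters.clear(); continue; }
    const m = /^(\s*)(\d+)\.\s(.*)$/.exec(lines[i]);
    if (!m) continue;
    const indent = m[1].length;
    const num = parseInt(m[2], 10);
    const prev = counters.get(indent) || 0;
    if (num === 1 && prev >= 1) {
      counters.set(indent, prev + 1);
      lines[i] = m[1] + (prev + 1) + '. ' + m[3];
    } else {
      counters.set(indent, num);
    }
  }
  return lines.join('\n');
};
```

So `renumber('1. first\n\n1. second\n\n1. third')` yields `'1. first\n\n2. second\n\n3. third'` — blank-line-separated items now number sequentially even if the downstream renderer splits them into separate lists.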
+
+ // Smaller LLMs (and any model whose output gets re-flowed on its way to
+ // us) emit triple-backtick fences inline:
+ // ```javascript const numbers = [1,2,3]; numbers.forEach(...); ```
+ // The block parser's fence rule requires the opener to sit on its own
+ // line, so without preprocessing we fall through to the paragraph branch,
+ // and renderInline's `[^`\n]+` span then matches the body between two of
+ // the backticks — leaving a stray pair of backticks floating in the
+ // output. Rewriting inline fences onto their own lines (open / body /
+ // close) fixes both issues. Mirrors src/cli/markdown.ts#normaliseInlineFences.
+ const normaliseInlineFences = (input) => {
+ return input.replace(/```([\w-]*)[ \t]+([^\n]*?)[ \t]*```/g, (_m, lang, body) => {
+ const trimmed = String(body).replace(/\s+$/, '');
+ return '\n```' + lang + '\n' + trimmed + '\n```\n';
+ });
+ };
+
/** Convert markdown input to a safe HTML fragment string. */
const mdToHtml = (raw) => {
if (raw == null || raw === '') return '';
- const escaped = esc(raw).replace(/\r\n?/g, '\n');
+ // Normalize inline fences BEFORE escaping — the regex matches literal
+ // backticks, not ```, and we want the rewritten newlines to survive
+ // into the line-based block parser. Renumber before splitting so the
+ // ordered-list loop below sees sequential numbers (and honors them via
+ // `<ol start>`).
+ const prepped = renumberOrderedLists(
+ normaliseInlineFences(String(raw).replace(/\r\n?/g, '\n')),
+ );
+ const escaped = esc(prepped);
const lines = escaped.split('\n');
const out = [];
let i = 0;
@@ -107,17 +164,27 @@
const line = lines[i];
// Fenced code (```), optional language tag.
- const fence = /^(```+|~~~+)\s*([\w-]*)\s*$/.exec(line);
+ //
+ // LLMs routinely nest code blocks inside bullet / numbered lists so the
+ // fence arrives with 2–6 spaces of leading indent. Match liberally on
+ // whitespace prefix, and strip the same prefix from body lines so the
+ // code doesn't inherit the list indent as visible leading whitespace.
+ const fence = /^(\s*)(```+|~~~+)\s*([\w-]*)\s*$/.exec(line);
if (fence) {
- const closer = fence[1].replace(/\s/g, '').charAt(0);
+ const openerIndent = fence[1].length;
+ const closer = fence[2].charAt(0);
+ const closerRe = new RegExp('^\\s*' + closer + '{3,}\\s*$');
const buf = [];
i++;
- while (i < lines.length && !new RegExp('^' + closer + '{3,}\\s*$').test(lines[i])) {
- buf.push(lines[i]);
+ while (i < lines.length && !closerRe.test(lines[i])) {
+ const raw = lines[i];
+ const leading = (raw.match(/^[ \t]*/) || [''])[0].length;
+ const strip = Math.min(openerIndent, leading);
+ buf.push(raw.slice(strip));
i++;
}
if (i < lines.length) i++;
- const langAttr = fence[2] ? ' data-lang="' + esc(fence[2]) + '"' : '';
+ const langAttr = fence[3] ? ' data-lang="' + esc(fence[3]) + '"' : '';
out.push(
'<pre><code' + langAttr + '>' + buf.join('\n') + '</code></pre>',
);
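The indent-stripping rule above (strip at most the opener's indent from each body line) can be sketched in isolation — `stripFenceIndent` is an illustrative name:

```javascript
// Strip up to `openerIndent` leading spaces/tabs from each code line,
// mirroring the fence rule above: code nested inside a list sheds the
// list indent, but deeper intentional indentation in the code survives,
// and lines with less leading whitespace are never sliced into content.
const stripFenceIndent = (bodyLines, openerIndent) =>
  bodyLines.map((raw) => {
    const leading = (raw.match(/^[ \t]*/) || [''])[0].length;
    return raw.slice(Math.min(openerIndent, leading));
  });
```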
@@ -155,14 +222,40 @@
continue;
}
- // Ordered list
- if (/^\s*\d+\.\s+/.test(line)) {
+ // Ordered list. LLMs routinely separate items with blank lines —
+ // without the peek-past-blanks lookahead, each item became its own
+ // single-entry `<ol>` and every one rendered as "1." since browsers
+ // auto-number each `<ol>` from 1. Fix: consume blank lines between
+ // items as long as the next non-blank line is still a list item,
+ // and emit `start="N"` so the first number from source is honored.
+ const olHead = /^\s*(\d+)\.\s+/.exec(line);
+ if (olHead) {
+ const firstNum = parseInt(olHead[1], 10);
const items = [];
- while (i < lines.length && /^\s*\d+\.\s+/.test(lines[i])) {
- items.push(lines[i].replace(/^\s*\d+\.\s+/, ''));
- i++;
+ while (i < lines.length) {
+ const cur = lines[i];
+ if (/^\s*\d+\.\s+/.test(cur)) {
+ items.push(cur.replace(/^\s*\d+\.\s+/, ''));
+ i++;
+ continue;
+ }
+ if (!cur.trim()) {
+ // Peek ahead past blank lines to see if more items follow.
+ let j = i + 1;
+ while (j < lines.length && !lines[j].trim()) j++;
+ if (j < lines.length && /^\s*\d+\.\s+/.test(lines[j])) {
+ i = j;
+ continue;
+ }
+ }
+ break;
}
- out.push('<ol>' + items.map((it) => '<li>' + renderInline(it) + '</li>').join('') + '</ol>');
+ out.push('<ol start="' + firstNum + '">' + items.map((it) => '<li>' + renderInline(it) + '</li>').join('') + '</ol>');

+/* Inside a <pre> (fenced code block) the <code> element wraps multi-line
+ text; because <code> is inline, each rendered line becomes its own
+ line-box and each line-box gets its own top + bottom border — which
+ shows up as thin horizontal lines between every code line. Strip the
+ per-line chrome anywhere <code> lives inside a <pre>. */
+pre code {
+ border: none !important;
+ background: transparent !important;
+ padding: 0 !important;
+ border-radius: 0 !important;
+}
kbd {
background: var(--bg-1);
@@ -1317,13 +1528,19 @@ kbd {
background: var(--bg-1);
border: 1px solid var(--border);
border-radius: 10px;
- padding: 10px 14px;
+ padding: 8px 14px;
max-width: min(780px, 74%);
font-size: 13.5px;
line-height: 1.5;
color: var(--fg-1);
- white-space: pre-wrap;
word-break: break-word;
+ /* NO `white-space: pre-wrap` here. The agent bubble receives
+ pre-rendered HTML and the template literal that builds it has
+ newlines + indentation between `<div class="md">`,
+ `${body}`, and `${files}` — `pre-wrap` rendered those as actual
+ blank lines, producing huge empty gaps above and below the reply
+ text. `pre-wrap` is only needed on the user bubble where the
+ literal typed input might contain intentional newlines. */
}
.chat-msg-user .chat-bubble { border: none; white-space: pre-wrap; }
@@ -1340,6 +1557,13 @@ kbd {
/* Markdown rendering in chat bubbles (both success and failure bodies) */
.md p { margin: 0 0 8px; line-height: 1.55; }
.md p:last-child { margin-bottom: 0; }
+/* Collapse top margin on the first block and bottom margin on the last
+ block inside an .md container. The bubble already provides 10px vertical
+ padding; without these resets the first heading/list added its own
+ 10–12px top margin on top of that, making bot replies look padded by
+ ~22px on the first line and ~18px after the last paragraph. */
+.md > *:first-child { margin-top: 0; }
+.md > *:last-child { margin-bottom: 0; }
.md h1, .md h2, .md h3, .md h4, .md h5, .md h6 {
margin: 12px 0 6px; font-weight: 700; line-height: 1.25; letter-spacing: -.01em;
}
@@ -1360,7 +1584,7 @@ kbd {
border-radius: 8px; padding: 10px 12px; margin: 8px 0;
overflow-x: auto; font-size: .88em; line-height: 1.5;
}
-.md pre code { background: transparent; padding: 0; color: #e2e8f0; font-size: inherit; }
+.md pre code { background: transparent; padding: 0; color: #e2e8f0; font-size: inherit; border: none; }
.md blockquote {
border-left: 3px solid rgba(148, 163, 184, .4);
padding: 2px 0 2px 10px; margin: 8px 0; color: var(--muted);
diff --git a/src/ui/server.ts b/src/ui/server.ts
index 42d6b04..9d159db 100644
--- a/src/ui/server.ts
+++ b/src/ui/server.ts
@@ -9,6 +9,7 @@
*/
import * as http from 'http';
import * as fs from 'fs';
+import * as os from 'os';
import * as path from 'path';
import { URL } from 'url';
import { WebSocketServer } from 'ws';
@@ -172,6 +173,56 @@ const router = async (
return sendJson(res, 200, listProjects());
}
+ // Directory browser for the task-view / run-form dir picker. Returns the
+ // resolved absolute path plus its immediate subdirectories. Confined to
+ // the user's home directory + common dev roots; anything outside is
+ // refused so the UI can't be used to enumerate system directories.
+ if (p === '/api/dir' && req.method === 'GET') {
+ try {
+ const homeDir = os.homedir();
+ const raw = u.searchParams.get('path') ?? homeDir;
+ // Expand a bare tilde or `~/...` the same way the task runner does.
+ let target = raw;
+ if (target === '~') target = homeDir;
+ else if (target.startsWith('~/') || target.startsWith('~' + path.sep)) {
+ target = path.join(homeDir, target.slice(2));
+ }
+ const abs = path.resolve(target);
+ // Containment: only allow paths under $HOME. Users who genuinely need
+ // a non-home workspace can still paste an absolute path into the
+ // input itself — the picker just doesn't expose it.
+ const homeReal = fs.realpathSync(homeDir);
+ const absReal = fs.existsSync(abs) ? fs.realpathSync(abs) : abs;
+ const rel = path.relative(homeReal, absReal);
+ if (rel.startsWith('..') || path.isAbsolute(rel)) {
+ return sendJson(res, 403, { error: 'path outside $HOME', path: absReal });
+ }
+ let entries: Array<{ name: string; isDir: boolean }> = [];
+ try {
+ entries = fs
+ .readdirSync(absReal, { withFileTypes: true })
+ .filter((d) => !d.name.startsWith('.'))
+ .map((d) => ({ name: d.name, isDir: d.isDirectory() }))
+ .filter((e) => e.isDir)
+ .sort((a, b) => a.name.localeCompare(b.name))
+ .slice(0, 500);
+ } catch {
+ /* unreadable dir — return empty list, frontend shows "no entries" */
+ }
+ const parent = path.dirname(absReal);
+ const parentAllowed = !path.relative(homeReal, parent).startsWith('..');
+ return sendJson(res, 200, {
+ path: absReal,
+ parent: parentAllowed && parent !== absReal ? parent : null,
+ home: homeReal,
+ entries,
+ });
+ } catch (e) {
+ const { status, body } = errorBody(e);
+ return sendJson(res, status, body);
+ }
+ }
+
if (p === '/api/config' && req.method === 'GET') {
return sendJson(res, 200, loadGlobalConfig());
}
@@ -212,6 +263,7 @@ const router = async (
autoApprove?: boolean;
flags?: Record<string, unknown>;
title?: string;
+ description?: string;
}>(req);
if (!body.prompt?.trim()) return sendJson(res, 400, { error: 'prompt required' });
const reply = startUiTask({
@@ -221,6 +273,11 @@ const router = async (
autoApprove: body.autoApprove,
flags: body.flags as Parameters<typeof startUiTask>[0]['flags'],
title: body.title,
+ // Composed multi-turn context from the UI's chat follow-up path.
+ // The orchestrator / conversation fast-path uses this so follow-up
+ // questions like "what did we just talk about?" resolve against
+ // actual prior turns rather than model hallucination.
+ description: body.description,
});
return sendJson(res, 202, reply);
} catch (e) {
@@ -235,7 +292,7 @@ const router = async (
return sendJson(res, 200, { active: listActive(), pending: listPendingPrompts() });
}
- const cancelMatch = /^\/api\/tasks\/([a-f0-9]+)\/cancel$/.exec(p);
+ const cancelMatch = /^\/api\/tasks\/([a-z0-9_]+)\/cancel$/.exec(p);
if (cancelMatch && req.method === 'POST') {
const ok = cancelTask(cancelMatch[1]);
return sendJson(res, ok ? 200 : 404, { cancelled: ok });
@@ -694,7 +751,7 @@ export const startUiServer = (
wss.on('connection', (socket, req) => {
const u = new URL(req.url ?? '/', `http://${req.headers.host ?? 'localhost'}`);
- const taskMatch = /^\/ws\/tasks\/([a-f0-9]+)$/.exec(u.pathname);
+ const taskMatch = /^\/ws\/tasks\/([a-z0-9_]+)$/.exec(u.pathname);
if (taskMatch) {
const ok = subscribe(taskMatch[1], socket as unknown as import('ws').WebSocket);
if (!ok) {
diff --git a/src/ui/task-runner.ts b/src/ui/task-runner.ts
index 5c84c7a..6437bc3 100644
--- a/src/ui/task-runner.ts
+++ b/src/ui/task-runner.ts
@@ -17,7 +17,10 @@
* @author Son Nguyen
*/
import * as crypto from 'crypto';
+import * as os from 'os';
+import * as path from 'path';
import type { WebSocket } from 'ws';
+import { newTaskId } from '../logging/trace';
import { InteractiveHost, withHost } from '../core/interactive-host';
import { ForgeEvent, PermissionDecision, PermissionRequest, Plan } from '../types';
import { PermissionFlags } from '../permissions/manager';
@@ -25,8 +28,9 @@ import { orchestrateRun } from '../core/orchestrator';
import { Mode } from '../types';
import { log } from '../logging/logger';
import { redact } from '../security/redact';
+import { eventBus, ModelDeltaEvent } from '../persistence/events';
-type PromptType = 'plan_approval' | 'permission' | 'user_input';
+type PromptType = 'plan_approval' | 'plan_edit' | 'permission' | 'user_input';
interface Pending {
id: string;
@@ -41,6 +45,14 @@ interface Pending {
interface ActiveTask {
taskId: string;
ringBuffer: Array<Record<string, unknown>>;
+ // Replay buffer for model deltas. The WS subscriber for a UI task
+ // typically connects ~50–200ms *after* startUiTask returns (round-trip
+ // of the POST + WS handshake), which is plenty of time for a local
+ // Ollama stream to have already emitted tokens. Without replay those
+ // tokens go to an empty subscriber set and the user sees no live
+ // streaming at all — only the final `task.result` frame, which makes
+ // the UI feel batch rather than streaming.
+ deltaBuffer: Array<Record<string, unknown>>;
subscribers: Set;
abortRequested: boolean;
startedAt: number;
@@ -51,6 +63,10 @@ interface ActiveTask {
}
const RING_MAX = 500;
+// Higher cap for deltas because a single 2k-token answer produces that
+// many entries; 4000 comfortably covers a full planner/narrator response
+// plus some head-room before we start shifting off the front.
+const DELTA_MAX = 4000;
const pending = new Map<string, Pending>();
const active = new Map<string, ActiveTask>();
@@ -73,6 +89,49 @@ const broadcast = (taskId: string, payload: Record<string, unknown>): void => {
}
};
+// Streaming deltas. We kept these in a separate replay buffer (distinct
+// from the event ring buffer) so late-subscribing WS clients — which is
+// the common case on the UI, since the browser opens the socket only
+// after the POST response — still see the whole stream reflow in the
+// conversation view. Without this the first tokens fire before any
+// subscriber exists and the user perceives zero streaming.
+const broadcastDelta = (taskId: string, payload: Record<string, unknown>): void => {
+ const task = active.get(taskId);
+ if (!task) return;
+ const frame = { kind: 'model.delta', ...payload };
+ task.deltaBuffer.push(frame);
+ if (task.deltaBuffer.length > DELTA_MAX) task.deltaBuffer.shift();
+ const wire = JSON.stringify({ taskId, ...frame });
+ for (const ws of task.subscribers) {
+ try {
+ ws.send(wire);
+ } catch {
+ // socket teardown races are routine during streaming; ignore
+ }
+ }
+};
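The late-subscriber replay described above is a small pattern worth isolating. A self-contained sketch of the assumed shape — `makeDeltaChannel` is an illustrative name, not the shipped API:

```javascript
// Sketch of the bounded replay buffer described above. Deltas published
// before any subscriber exists are buffered (capped) and replayed on
// subscribe, so a late WebSocket client still sees the whole stream.
const DELTA_MAX = 4000;

const makeDeltaChannel = () => {
  const buffer = [];
  const subscribers = new Set();
  return {
    publish(frame) {
      buffer.push(frame);
      if (buffer.length > DELTA_MAX) buffer.shift(); // drop oldest first
      for (const send of subscribers) send(frame);
    },
    subscribe(send) {
      for (const frame of buffer) send(frame); // replay history first
      subscribers.add(send);
      return () => subscribers.delete(send); // unsubscribe handle
    },
  };
};
```

Replaying before adding the subscriber to the live set means a frame published mid-subscribe is delivered exactly once: either it was already in the buffer, or it arrives through the live path afterwards.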
+
+// Bridge in-process events/deltas onto the per-task WebSocket channel.
+// One subscription per module, cleaned up only at process exit. Ignore events
+// that didn't originate from a task this runner knows about so we don't leak
+// events from stray CLI invocations into the UI.
+eventBus.on('event', (e: ForgeEvent) => {
+ if (!e.taskId || !active.has(e.taskId)) return;
+ // Shape matches the `kind: 'event'` handler in src/ui/public/app.js — keep
+ // those two in sync.
+ broadcast(e.taskId, { kind: 'event', event: e });
+});
+eventBus.on('delta', (d: ModelDeltaEvent) => {
+ if (!d.taskId || !active.has(d.taskId)) return;
+ broadcastDelta(d.taskId, {
+ text: d.text,
+ done: Boolean(d.done),
+ model: d.model,
+ provider: d.provider,
+ role: d.role,
+ });
+});
+
const makeHost = (taskId: string): InteractiveHost => ({
name: 'ui',
async confirmPlan(plan: Plan): Promise<'approve' | 'cancel' | 'edit'> {
@@ -92,6 +151,34 @@ const makeHost = (taskId: string): InteractiveHost => ({
broadcast(taskId, { kind: 'prompt', promptId: id, promptType: 'plan_approval', plan });
});
},
+ // UI plan editor. Surfaces an inline-edit prompt over the WebSocket;
+ // the client posts the edited plan back through /api/prompts/respond
+ // and we resolve with the new plan. Falls back to returning the
+ // original plan if the client sends something non-plan-shaped (so the
+ // loop doesn't crash on bad input).
+ async editPlan(plan: Plan): Promise<Plan> {
+ const t = active.get(taskId);
+ if (t) t.status = 'awaiting';
+ return new Promise<Plan>((resolve, reject) => {
+ const id = newId();
+ pending.set(id, {
+ id,
+ type: 'plan_edit',
+ taskId,
+ payload: plan,
+ resolve: (v) => {
+ if (v && typeof v === 'object' && Array.isArray((v as Plan).steps)) {
+ resolve(v as Plan);
+ } else {
+ resolve(plan);
+ }
+ },
+ reject,
+ createdAt: Date.now(),
+ });
+ broadcast(taskId, { kind: 'prompt', promptId: id, promptType: 'plan_edit', plan });
+ });
+ },
async requestPermission(
req: PermissionRequest,
flags: PermissionFlags,
@@ -190,17 +277,57 @@ export const onTaskResolved = (cb: (r: TaskResolution) => void): (() => void) =>
};
};
+// A user-entered path from the dashboard can come in as `~`, `~/foo`, or a
+// plain absolute path. `~` is shell syntax — Node does not expand it, so
+// without this the path is joined literally and becomes `/~/foo`,
+// which is nonsense and breaks every tool that tries to stat it.
+const expandUserPath = (p: string | undefined): string | undefined => {
+ if (!p) return p;
+ const trimmed = p.trim();
+ if (!trimmed) return undefined;
+ if (trimmed === '~') return os.homedir();
+ if (trimmed.startsWith('~/') || trimmed.startsWith('~' + path.sep)) {
+ return path.join(os.homedir(), trimmed.slice(2));
+ }
+ return trimmed;
+};
+
export const startUiTask = (req: RunRequest): RunReply => {
- const taskId = newId();
+ // Use the canonical `task_` form so the id we register in `active`
+ // matches the id the orchestrator stamps on events + deltas. Before this,
+ // the runner made up a hex id and the orchestrator made a different id
+ // internally; the eventBus bridge checked `active.has(d.taskId)` and
+ // silently dropped every delta because the ids didn't match.
+ const taskId = newTaskId();
const startedAt = Date.now();
const host = makeHost(taskId);
+ // Register the task in `active` BEFORE kicking off orchestrateRun so any
+ // synchronous events the orchestrator's setup path emits (classify /
+ // plan / project-scan) aren't dropped by the bridge's `active.has`
+ // guard. Deltas are async so they'd be fine either way, but event lines
+ // that populate the working-spinner label depend on this ordering.
+ const placeholder: ActiveTask = {
+ taskId,
+ ringBuffer: [],
+ deltaBuffer: [],
+ subscribers: new Set(),
+ abortRequested: false,
+ startedAt,
+ prompt: req.prompt,
+ mode: req.mode ?? 'balanced',
+ status: 'running',
+ resultPromise: Promise.resolve() as Promise<unknown>,
+ };
+ active.set(taskId, placeholder);
+
const resultPromise = withHost(host, () =>
orchestrateRun({
+ taskId,
input: req.prompt,
description: req.description,
mode: req.mode ?? 'balanced',
- cwd: req.cwd,
+ cwd: expandUserPath(req.cwd),
autoApprove: req.autoApprove,
flags: req.flags ?? {},
title: req.title,
@@ -257,18 +384,11 @@ export const startUiTask = (req: RunRequest): RunReply => {
throw err;
});
- const task: ActiveTask = {
- taskId,
- ringBuffer: [],
- subscribers: new Set(),
- abortRequested: false,
- startedAt,
- prompt: req.prompt,
- mode: req.mode ?? 'balanced',
- status: 'running',
- resultPromise,
- };
- active.set(taskId, task);
+ // The placeholder registered above has been accumulating the ring buffer
+ // + delta buffer + (usually empty) subscriber set while the orchestrator
+ // set itself up. Stamp the real `resultPromise` onto it so downstream
+ // lifecycle helpers (cancelTask, listActive, etc.) see the live task.
+ placeholder.resultPromise = resultPromise;
broadcast(taskId, {
kind: 'task.started',
prompt: req.prompt,
@@ -289,6 +409,18 @@ export const subscribe = (taskId: string, ws: WebSocket): boolean => {
/* ignore */
}
}
+ // Replay any model deltas that fired before the browser managed to open
+ // the WebSocket. For a fast local model the first tokens land within
+ // 50–500ms of startUiTask returning, which is typically *before* the WS
+ // handshake completes — without this replay those tokens were lost and
+ // the user saw no live streaming at all.
+ for (const payload of task.deltaBuffer) {
+ try {
+ ws.send(JSON.stringify({ taskId, ...payload }));
+ } catch {
+ /* ignore */
+ }
+ }
ws.on('close', () => {
task.subscribers.delete(ws);
});
diff --git a/test/unit/ask-user-tool.test.ts b/test/unit/ask-user-tool.test.ts
index 7468b70..2d372d3 100644
--- a/test/unit/ask-user-tool.test.ts
+++ b/test/unit/ask-user-tool.test.ts
@@ -34,4 +34,17 @@ describe('ask_user tool (non-interactive)', () => {
expect(r.success).toBe(false);
expect(r.error?.class).toBe('user_input');
});
+
+ it('rejects an empty question with a non-retryable user_input error', async () => {
+ const r = await askUserTool.execute({ question: '', nonInteractiveDefault: 'y' }, ctx);
+ expect(r.success).toBe(false);
+ expect(r.error?.class).toBe('user_input');
+ expect(r.error?.retryable).toBe(false);
+ });
+
+ it('rejects too-short questions so the executor switches tools', async () => {
+ const r = await askUserTool.execute({ question: '??', nonInteractiveDefault: 'y' }, ctx);
+ expect(r.success).toBe(false);
+ expect(r.error?.class).toBe('user_input');
+ });
});
diff --git a/test/unit/classifier.test.ts b/test/unit/classifier.test.ts
index b8dbd9b..51043f7 100644
--- a/test/unit/classifier.test.ts
+++ b/test/unit/classifier.test.ts
@@ -12,7 +12,7 @@
*/
import { describe, it, expect } from 'vitest';
-import { heuristicClassify } from '../../src/classifier/heuristics';
+import { heuristicClassify, looksConversational } from '../../src/classifier/heuristics';
describe('heuristicClassify', () => {
it('detects bugfix intent', () => {
@@ -41,3 +41,46 @@ describe('heuristicClassify', () => {
expect(r.complexity).toBe('trivial');
});
});
+
+describe('looksConversational', () => {
+ it('accepts pure concept questions', () => {
+ expect(looksConversational('what is the difference between a map and a dict?')).toBe(true);
+ expect(looksConversational('why is tail-call optimization hard in v8?')).toBe(true);
+ expect(looksConversational('how does the event loop work?')).toBe(true);
+ expect(looksConversational('explain closures')).toBe(true);
+ expect(looksConversational('compare goroutines and threads')).toBe(true);
+ });
+
+ it('rejects anything that references repo artifacts', () => {
+ expect(looksConversational('explain how src/core/loop.ts works')).toBe(false);
+ expect(looksConversational('what is this codebase doing in the auth module?')).toBe(false);
+ expect(looksConversational('summarize the README')).toBe(false);
+ expect(looksConversational('why is this function so slow?')).toBe(false);
+ });
+
+ it('rejects imperatives that imply code changes', () => {
+ expect(looksConversational('create a Map class')).toBe(false);
+ expect(looksConversational('fix the bug where X')).toBe(false);
+ expect(looksConversational('refactor to use async/await')).toBe(false);
+ expect(looksConversational('write a test for Y')).toBe(false);
+ });
+
+ it('accepts common greetings and short chat openers', () => {
+ expect(looksConversational('hi')).toBe(true);
+ expect(looksConversational('hello')).toBe(true);
+ expect(looksConversational('hey!')).toBe(true);
+ expect(looksConversational('thanks')).toBe(true);
+ expect(looksConversational('good morning')).toBe(true);
+ expect(looksConversational('ok')).toBe(true);
+ });
+
+ it('accepts short non-imperative prose without a repo reference', () => {
+ expect(looksConversational('tell me something fun')).toBe(true);
+ expect(looksConversational('recommend a book')).toBe(true);
+ expect(looksConversational('your thoughts on rust')).toBe(true);
+ });
+
+ it('rejects overly long inputs', () => {
+ expect(looksConversational('a'.repeat(500))).toBe(false);
+ });
+});
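
The tests above pin the contract tightly enough that a plausible shape of the heuristic can be sketched. This is a guess at the implementation, not the real `src/classifier/heuristics.ts`; the word lists and the 240-character cap are invented.

```typescript
// Hypothetical sketch: greetings pass, repo references and change imperatives
// fail, concept questions and short non-imperative prose pass.
const GREETINGS = /^(hi|hello|hey!?|thanks|ok|good (morning|afternoon|evening))[.!]?$/i;
const REPO_REF = /\b(src\/|\.ts\b|\.js\b|readme|codebase|module|this (function|file|code))\b/i;
const IMPERATIVE = /^(create|fix|refactor|write|add|implement|update|delete|rename)\b/i;
const CONCEPT = /^(what|why|how|explain|compare|tell me|recommend|your thoughts)\b/i;

export const looksConversational = (input: string): boolean => {
  const s = input.trim();
  if (s.length === 0 || s.length > 240) return false; // invented cap
  if (GREETINGS.test(s)) return true;
  if (REPO_REF.test(s)) return false;
  if (IMPERATIVE.test(s)) return false;
  return CONCEPT.test(s) || s.split(/\s+/).length <= 6;
};
```

Note the check order does real work: "explain how src/core/loop.ts works" starts like a concept question, so the repo-reference rejection has to run first.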
diff --git a/test/unit/edit-file.test.ts b/test/unit/edit-file.test.ts
index 21ef9e5..3736e8f 100644
--- a/test/unit/edit-file.test.ts
+++ b/test/unit/edit-file.test.ts
@@ -76,4 +76,35 @@ describe('edit_file tool', () => {
expect(r.success).toBe(false);
expect(r.error?.class).toBe('not_found');
});
+
+ it('treats empty oldText on an empty file as a full-body write (planner pattern)', async () => {
+ fs.writeFileSync(path.join(tmp, 'empty.js'), '');
+ const body =
+ '/** @param {number} n */\nexport const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));\n';
+ const r = await editFileTool.execute(
+ { path: 'empty.js', oldText: '', newText: body },
+ { ...ctx, projectRoot: tmp },
+ );
+ expect(r.success).toBe(true);
+ expect(fs.readFileSync(path.join(tmp, 'empty.js'), 'utf8')).toBe(body);
+ });
+
+ it('treats empty oldText on a missing file as a create', async () => {
+ const r = await editFileTool.execute(
+ { path: 'new.js', oldText: '', newText: 'export const x = 1;\n' },
+ { ...ctx, projectRoot: tmp },
+ );
+ expect(r.success).toBe(true);
+ expect(fs.readFileSync(path.join(tmp, 'new.js'), 'utf8')).toBe('export const x = 1;\n');
+ });
+
+ it('still rejects empty oldText on a non-empty file (ambiguous)', async () => {
+ fs.writeFileSync(path.join(tmp, 'has-content.txt'), 'pre-existing');
+ const r = await editFileTool.execute(
+ { path: 'has-content.txt', oldText: '', newText: 'anything' },
+ { ...ctx, projectRoot: tmp },
+ );
+ expect(r.success).toBe(false);
+ expect(r.error?.class).toBe('user_input');
+ });
});
diff --git a/test/unit/file-lock.test.ts b/test/unit/file-lock.test.ts
new file mode 100644
index 0000000..f9796ee
--- /dev/null
+++ b/test/unit/file-lock.test.ts
@@ -0,0 +1,136 @@
+/**
+ * Concurrency guarantees for the edit_file / write_file tools.
+ *
+ * The bug these tests pin: a naive read-modify-write is a TOCTOU race. If
+ * call A reads content X, call B reads content X, A writes X' and THEN B
+ * writes X'' — A's change is silently lost. The fix serializes callers via
+ * an in-process per-path mutex and commits via an atomic temp+rename.
+ *
+ * These tests fire off many concurrent tool calls against a single file
+ * and assert that every intended change is present in the final content.
+ *
+ * @author Son Nguyen
+ */
+
+import { describe, it, expect, beforeEach } from 'vitest';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+
+import { editFileTool } from '../../src/tools/edit-file';
+import { writeFileTool } from '../../src/tools/write-file';
+import { withFileLock, writeAtomic, _resetFileLocksForTest } from '../../src/sandbox/file-lock';
+
+const ctxFor = (root: string) => ({
+ taskId: 't',
+ projectId: 'p',
+ projectRoot: root,
+ traceId: 'r',
+ runId: 'r',
+});
+
+describe('file-lock primitive', () => {
+ beforeEach(() => _resetFileLocksForTest());
+
+ it('serializes callers on the same path (observable order)', async () => {
+ const order: number[] = [];
+ const makeJob = (n: number, delayMs: number) => async () => {
+ await new Promise((r) => setTimeout(r, delayMs));
+ order.push(n);
+ };
+ // Kick off in reverse order but all lock the same path — despite the
+ // first task sleeping longer, all three must still run one at a time
+ // in submission order.
+ await Promise.all([
+ withFileLock('/tmp/lock-demo', makeJob(1, 30)),
+ withFileLock('/tmp/lock-demo', makeJob(2, 10)),
+ withFileLock('/tmp/lock-demo', makeJob(3, 0)),
+ ]);
+ expect(order).toEqual([1, 2, 3]);
+ });
+
+ it('does NOT serialize callers on different paths', async () => {
+ const order: string[] = [];
+ await Promise.all([
+ withFileLock('/tmp/path-a', async () => {
+ await new Promise((r) => setTimeout(r, 30));
+ order.push('a');
+ }),
+ withFileLock('/tmp/path-b', async () => {
+ order.push('b');
+ }),
+ ]);
+ // b is allowed to finish before a because they're on different paths.
+ expect(order).toEqual(['b', 'a']);
+ });
+
+ it('propagates errors from the critical section', async () => {
+ await expect(
+ withFileLock('/tmp/lock-err', async () => {
+ throw new Error('boom');
+ }),
+ ).rejects.toThrow('boom');
+ // And the next acquirer still runs — the failure doesn't deadlock the key.
+ let ran = false;
+ await withFileLock('/tmp/lock-err', async () => {
+ ran = true;
+ });
+ expect(ran).toBe(true);
+ });
+});
+
+describe('writeAtomic', () => {
+ it('leaves no orphan temp files on success', () => {
+ const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'forge-atomic-'));
+ const target = path.join(tmp, 'out.txt');
+ writeAtomic(target, 'hello');
+ expect(fs.readFileSync(target, 'utf8')).toBe('hello');
+ // Nothing matching the temp pattern should remain.
+ const leftovers = fs.readdirSync(tmp).filter((f) => f.startsWith('.out.txt.forge-tmp.'));
+ expect(leftovers).toEqual([]);
+ });
+});
+
+describe('edit_file under concurrent load', () => {
+ it('loses no replacements when 20 edits target disjoint snippets in parallel', async () => {
+ const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'forge-concurrent-edit-'));
+ const target = path.join(tmp, 'a.txt');
+ // 20 unique sentinels — zero-padded ids so no sentinel is a substring
+ // of another (a substring match would trip edit_file's ambiguity guard).
+ const N = 20;
+ const oldSentinel = (i: number) => `<<OLD-${String(i).padStart(2, '0')}>>`;
+ const newSentinel = (i: number) => `<<NEW-${String(i).padStart(2, '0')}>>`;
+ const original = Array.from({ length: N }, (_, i) => oldSentinel(i)).join('\n') + '\n';
+ fs.writeFileSync(target, original);
+
+ const results = await Promise.all(
+ Array.from({ length: N }, (_, i) =>
+ editFileTool.execute(
+ { path: 'a.txt', oldText: oldSentinel(i), newText: newSentinel(i) },
+ ctxFor(tmp),
+ ),
+ ),
+ );
+
+ expect(results.every((r) => r.success)).toBe(true);
+ const final = fs.readFileSync(target, 'utf8');
+ for (let i = 0; i < N; i++) {
+ expect(final).toContain(newSentinel(i));
+ expect(final).not.toContain(oldSentinel(i));
+ }
+ });
+
+ it('two concurrent writes to the same path land one-then-other, neither is lost', async () => {
+ const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'forge-concurrent-write-'));
+ const target = path.join(tmp, 'a.txt');
+ const r1 = writeFileTool.execute({ path: 'a.txt', content: 'first' }, ctxFor(tmp));
+ const r2 = writeFileTool.execute({ path: 'a.txt', content: 'second' }, ctxFor(tmp));
+ const [a, b] = await Promise.all([r1, r2]);
+ expect(a.success).toBe(true);
+ expect(b.success).toBe(true);
+ // The second write wins because writes serialize in submission order.
+ expect(fs.readFileSync(target, 'utf8')).toBe('second');
+ // And critically: the file is never a torn mix of the two contents.
+ expect(['first', 'second']).toContain(fs.readFileSync(target, 'utf8'));
+ });
+});
diff --git a/test/unit/permission-manager-noninteractive.test.ts b/test/unit/permission-manager-noninteractive.test.ts
index 88ce8a6..e17f870 100644
--- a/test/unit/permission-manager-noninteractive.test.ts
+++ b/test/unit/permission-manager-noninteractive.test.ts
@@ -14,9 +14,30 @@
import { describe, it, expect, vi, beforeEach } from 'vitest';
+// Shared mutable fixture so individual tests can seed persisted grants
+// without redeclaring the mock per file.
+const persisted: Array<{
+ tool: string;
+ project_id: string | null;
+ scope: string;
+ granted_at: string;
+ expires_at: string | null;
+}> = [];
+
vi.mock('../../src/persistence/index-db', () => ({
- loadPermissionGrants: () => [],
- savePermissionGrant: () => undefined,
+ loadPermissionGrants: (tool: string, projectId: string | null) =>
+ persisted.filter(
+ (g) => g.tool === tool && (g.project_id === projectId || g.scope === 'global'),
+ ),
+ savePermissionGrant: (row: {
+ tool: string;
+ project_id: string | null;
+ scope: string;
+ granted_at: string;
+ expires_at: string | null;
+ }) => {
+ persisted.push(row);
+ },
getDb: () => ({
prepare: () => ({ all: () => [], get: () => null, run: () => undefined }),
exec: () => undefined,
@@ -91,3 +112,57 @@ describe('permission manager — non-interactive decisions', () => {
).rejects.toThrow(/permission denied/i);
});
});
+
+describe('permission manager — cached grants', () => {
+ beforeEach(() => {
+ clearSession();
+ persisted.length = 0;
+ });
+
+ it("honors a project grant for an execute tool (fix: don't re-prompt run_tests)", async () => {
+ persisted.push({
+ tool: 'run_tests',
+ project_id: 'proj',
+ scope: 'project',
+ granted_at: new Date().toISOString(),
+ expires_at: null,
+ });
+ const d = await requestPermission(
+ baseReq({ tool: 'run_tests', sideEffect: 'execute', risk: 'medium' }),
+ { nonInteractive: true },
+ );
+ expect(d).toBe('allow_session');
+ });
+
+ it('honors a global grant for a network tool', async () => {
+ persisted.push({
+ tool: 'web.fetch',
+ project_id: null,
+ scope: 'global',
+ granted_at: new Date().toISOString(),
+ expires_at: null,
+ });
+ const d = await requestPermission(
+ baseReq({ tool: 'web.fetch', sideEffect: 'network', risk: 'medium' }),
+ { nonInteractive: true },
+ );
+ expect(d).toBe('allow_session');
+ });
+
+ it('still re-confirms critical-risk tools regardless of grants', async () => {
+ persisted.push({
+ tool: 'dangerous_op',
+ project_id: 'proj',
+ scope: 'project',
+ granted_at: new Date().toISOString(),
+ expires_at: null,
+ });
+ const d = await requestPermission(
+ baseReq({ tool: 'dangerous_op', sideEffect: 'execute', risk: 'critical' }),
+ { nonInteractive: true },
+ );
+ // Critical risk bypasses cache and falls through to prompt; in
+ // non-interactive mode that means deny.
+ expect(d).toBe('deny');
+ });
+});
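
The decision order these tests pin down can be sketched as one small function. `decide`, `Grant`, and `Req` are invented names; the real manager also handles session grants, expiry, and interactive prompting.

```typescript
// Hypothetical sketch: critical risk always falls through to a prompt;
// otherwise a matching persisted grant (project-scoped or global)
// short-circuits to allow. In non-interactive mode, "prompt" means deny.
type Grant = { tool: string; projectId: string | null; scope: 'project' | 'global' };
type Req = { tool: string; projectId: string; risk: 'low' | 'medium' | 'high' | 'critical' };

export const decide = (req: Req, grants: Grant[], nonInteractive: boolean): string => {
  if (req.risk !== 'critical') {
    const hit = grants.some(
      (g) => g.tool === req.tool && (g.scope === 'global' || g.projectId === req.projectId),
    );
    if (hit) return 'allow_session';
  }
  return nonInteractive ? 'deny' : 'prompt';
};
```
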
diff --git a/test/unit/run-tests-tool.test.ts b/test/unit/run-tests-tool.test.ts
index ac2bc5d..2ce19ba 100644
--- a/test/unit/run-tests-tool.test.ts
+++ b/test/unit/run-tests-tool.test.ts
@@ -118,6 +118,42 @@ describe('run_tests tool', () => {
expect(mockRunCommand).not.toHaveBeenCalled();
});
+ it('falls back to `node --test` when *.test.js files exist without a package.json', async () => {
+ const root = mkdir();
+ fs.mkdirSync(path.join(root, 'test'));
+ fs.writeFileSync(path.join(root, 'test', 'fib.test.js'), "import test from 'node:test';\n");
+ mockRunCommand.mockResolvedValueOnce({
+ stdout: '',
+ stderr: '',
+ exitCode: 0,
+ signal: null,
+ timedOut: false,
+ });
+ const r = await runTestsTool.execute({}, ctxFor(root));
+ expect(r.success).toBe(true);
+ expect(r.output?.framework).toBe('node');
+ expect(mockRunCommand.mock.calls[0][0]).toBe('node --test');
+ });
+
+ it('prefers npm over node:test when both are present', async () => {
+ const root = mkdir();
+ fs.writeFileSync(
+ path.join(root, 'package.json'),
+ JSON.stringify({ scripts: { test: 'vitest' } }),
+ );
+ fs.mkdirSync(path.join(root, 'test'));
+ fs.writeFileSync(path.join(root, 'test', 'x.test.js'), '');
+ mockRunCommand.mockResolvedValueOnce({
+ stdout: '',
+ stderr: '',
+ exitCode: 0,
+ signal: null,
+ timedOut: false,
+ });
+ const r = await runTestsTool.execute({}, ctxFor(root));
+ expect(r.output?.framework).toBe('npm');
+ });
+
it('reports failure when exit code is non-zero', async () => {
const root = mkdir();
fs.writeFileSync(
diff --git a/test/unit/state-machine.test.ts b/test/unit/state-machine.test.ts
index 6d3f9f2..54eb3e2 100644
--- a/test/unit/state-machine.test.ts
+++ b/test/unit/state-machine.test.ts
@@ -36,4 +36,14 @@ describe('task state machine', () => {
it('illegal transitions are rejected', () => {
expect(isLegalTransition('draft', 'running')).toBe(false);
});
+
+ it('draft → completed / failed allowed for the conversation fast-path', () => {
+ // The conversation fast-path skips planning/execution entirely and
+ // records a terminal result directly from draft.
+ expect(isLegalTransition('draft', 'completed')).toBe(true);
+ expect(isLegalTransition('draft', 'failed')).toBe(true);
+ // Mid-lifecycle states are still not reachable from draft.
+ expect(isLegalTransition('draft', 'verifying')).toBe(false);
+ expect(isLegalTransition('draft', 'approved')).toBe(false);
+ });
});
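
A transition-table shape consistent with these assertions might look like the sketch below. Only `draft`'s row is grounded in the tests; every other row, and the `planning` entry state, is an invented placeholder.

```typescript
// Sketch only: the real table lives elsewhere in src/ and may differ.
const LEGAL: Record<string, readonly string[]> = {
  // Normal entry plus the conversation fast-path's terminal shortcuts.
  draft: ['planning', 'completed', 'failed'],
  // ...remaining states elided; 'running', 'verifying', and 'approved' are
  // deliberately absent from draft's row, matching the tests above.
};

export const isLegalTransition = (from: string, to: string): boolean =>
  (LEGAL[from] ?? []).includes(to);
```
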
diff --git a/test/unit/updater.test.ts b/test/unit/updater.test.ts
index 262fb99..9f692fe 100644
--- a/test/unit/updater.test.ts
+++ b/test/unit/updater.test.ts
@@ -128,6 +128,20 @@ describe('updater — checkForUpdate', () => {
expect(res!.latestVersion).toBe(res!.currentVersion);
});
+ it('hits the real package on the npm registry (not an unrelated @forge/cli)', async () => {
+ // Regression guard: an earlier version hardcoded `@forge/cli`, an
+ // unrelated package at 12.18.0, so `forge` told users to update to a
+ // wildly wrong version. The URL must be derived from package.json#name.
+ mockRequest.mockResolvedValueOnce(npmBody('latest', '99.0.0'));
+ await checkForUpdate({ force: true });
+ const url = String(mockRequest.mock.calls[0][0]);
+ expect(url).toContain('registry.npmjs.org');
+ expect(url).toContain('@hoangsonw');
+ // %2F is the correct npm-registry encoding for the scope separator.
+ expect(url).toContain('%2Fforge');
+ expect(url).not.toContain('@forge/cli');
+ });
+
it('honours the beta channel dist-tag', async () => {
const cfg = loadGlobalConfig();
saveGlobalConfig({ ...cfg, update: { ...cfg.update, channel: 'beta' } });
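
The fix this regression test guards comes down to one line: derive the URL from the package name and encode only the scope separator, which is how the npm registry addresses scoped packages. A sketch (the real updater also threads the dist-tag channel into the request):

```typescript
// Derive the metadata URL from package.json#name instead of a hardcoded
// package; npm's registry expects scoped names as @scope%2Fname.
export const registryUrl = (pkgName: string): string =>
  `https://registry.npmjs.org/${pkgName.replace('/', '%2F')}`;
```
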
diff --git a/test/unit/write-file-tool.test.ts b/test/unit/write-file-tool.test.ts
index 17fbd3d..2884afb 100644
--- a/test/unit/write-file-tool.test.ts
+++ b/test/unit/write-file-tool.test.ts
@@ -85,11 +85,20 @@ describe('write_file tool', () => {
expect(fs.existsSync(path.join(tmp, 'sub/nested/deep.txt'))).toBe(true);
});
- it('fails without createDirs when parent is missing', async () => {
+ it('auto-creates missing parent dirs by default (mkdir -p semantics)', async () => {
const r = await writeFileTool.execute(
{ path: 'missing/x.txt', content: 'hi' },
{ ...ctx, projectRoot: tmp },
);
+ expect(r.success).toBe(true);
+ expect(fs.existsSync(path.join(tmp, 'missing/x.txt'))).toBe(true);
+ });
+
+ it('fails when createDirs is explicitly false and parent is missing', async () => {
+ const r = await writeFileTool.execute(
+ { path: 'absent/y.txt', content: 'hi', createDirs: false },
+ { ...ctx, projectRoot: tmp },
+ );
expect(r.success).toBe(false);
});
});
diff --git a/wiki/styles.css b/wiki/styles.css
index 4b17c6f..dc1d3fc 100644
--- a/wiki/styles.css
+++ b/wiki/styles.css
@@ -411,6 +411,31 @@ ul li {
50% { transform: scale(1.6); opacity: .5; }
}
+@media (max-width: 560px) {
+ .eyebrow {
+ gap: 8px;
+ padding: 7px 14px;
+ font-size: 9px;
+ letter-spacing: .12em;
+ line-height: 1.45;
+ max-width: 100%;
+ justify-content: center;
+ text-align: center;
+ }
+ .eyebrow .pulse {
+ flex: 0 0 auto;
+ align-self: center;
+ }
+}
+
+@media (max-width: 400px) {
+ .eyebrow {
+ font-size: 8.5px;
+ letter-spacing: .08em;
+ padding: 7px 12px;
+ }
+}
+
.hero h1 {
margin-top: 22px;
font-size: clamp(34px, 5.5vw, 72px);