Project-wide guidance for any code-writing agent (Codex, OpenAI agent runtime, Cursor, Aider, Cline, Continue, …). Claude Code reads this too but also reads CLAUDE.md.

Tooling infrastructure. On top of this file, the repo ships agent-specific wiring so every major coding agent has the same rules, skills, and permissions:

- `.claude/`: Claude Code settings, path-scoped rules, skills (`/verify`, `/add-tool`, `/add-provider`, `/add-agent`, `/debug-loop`, `/release-check`), subagents (`code-reviewer`, `test-runner`, `docs-auditor`), and safe/deny permissions.
- `.codex/`: Codex `config.toml`, execpolicy `rules/default.rules`, custom agents (`forge_explorer`, `forge_reviewer`, `forge_test_runner`), guardrail hooks, and Codex-native skills.
- `.cursor/rules/`: Cursor `*.mdc` rules (always-applied core + path-scoped TypeScript/testing/tools/models/security/UI rules).
- `.agents/skills/`: portable skills following the open agentskills.io spec. Any compliant agent can read these without vendor-specific wiring.

If you update a workflow (e.g. the verify chain), update it in `.agents/skills/verify/SKILL.md` first, then mirror the change into `.claude/skills/verify/SKILL.md` and `.codex/skills/verify/SKILL.md` if they diverge.

Agentic Coding Flywheel. Forge follows a plan-heavy, bead-driven workflow for any non-trivial change. See FLYWHEEL.md for the full methodology. In brief:

- Plan space → `.flywheel/plans/` (iterated markdown plans)
- Bead space → `.beads/beads.jsonl` (self-contained work units)
- Code space → `src/` + `test/` (implementation)

Skills: `/plan`, `/plan-synthesize`, `/plan-to-beads`, `/polish-beads` (run 4–6×), `/fresh-eyes`, `/dedupe-beads`, `/idea-wizard`, `/deep-review`, `/reality-check`, `/landing`, `/de-slopify`. Subagents: `bead-polisher`, `plan-synthesizer`, `skill-refiner` (Claude); `forge_bead_polisher`, `forge_plan_synthesizer` (Codex). The 8 canonical operators live in `.flywheel/operators/`.

Post-compaction ritual. When an agent gets confused, send: "Reread AGENTS.md, CLAUDE.md, and FLYWHEEL.md so they're still fresh in your mind." This is the single most common intervention in the methodology.

Conforms to the OpenAI `AGENTS.md` convention: this file is a flat Markdown cheat-sheet that answers the questions "where am I, what can I run, and what shouldn't I break?".

Forge is a TypeScript CLI runtime for local-first agentic software engineering. Node 20+. Ships via npm and a multi-arch Docker image.

- Entry point: `bin/forge.js` → `dist/cli/index.js`
- Orchestrator: `src/core/orchestrator.ts`
- Agentic loop: `src/core/loop.ts`

Commands:

    npm ci --ignore-scripts
    npm run build          # tsc + copy-assets
    npm test               # vitest; 249 tests must pass
    npm run typecheck
    npm run lint
    npm run format         # writes
    npm run format:check   # reads, for CI
    npm run test:coverage
    ./bin/forge.js doctor

Always end a change with:

    npm run format && npm run lint && npm run build && npm test

| Path | Purpose |
|---|---|
| `src/cli/` | commander-based CLI, REPL, raw-mode input editor |
| `src/core/` | orchestrator, agentic loop, mode policy, validation gate |
| `src/agents/` | planner, architect, executor, reviewer, debugger, memory |
| `src/models/` | providers (ollama/openai/anthropic/llamacpp/vllm/lmstudio), router, adapter, catalog |
| `src/tools/` | 18 tools |
| `src/permissions/` | risk + interactive permission manager |
| `src/sandbox/` | path-safe fs + command risk classifier |
| `src/persistence/` | tasks/sessions/conversations/events + SQLite |
| `src/memory/` | hot/warm/cold/learning |
| `src/ui/` | HTTP + WS dashboard + static app |
| `src/mcp/` | MCP bridge |
| `test/unit/` | vitest unit tests |
| `docs/` | ARCHITECTURE, INSTALL, SETUP, metrics |
| `.github/workflows/` | ci, release, nightly |
| `docker/` | Dockerfile + compose |

Do:

- Keep `npm test` at 100% pass.
- Respect the state machine in `src/persistence/tasks.ts`.
- Gate every new tool through `requestPermission`.
- Classify model ids through `src/models/local-catalog.ts` (don't hand-roll regexes in a new provider).
- Add a unit test for any new logic in `src/core`, `src/agents`, or `src/tools`.

Do not:

- Bypass the permission system.
- Introduce network calls in tests; use `vi.mock` instead.
- Log credentials. Use `src/security/redact.ts`.
- Add dependencies without a clear reason.
- Rename exported APIs without updating every caller + the docs.

- Prefer `readonly` / immutable data flow.
- Prefer function modules over classes unless state really needs encapsulation (see the provider classes for the accepted shape).
- Prefer explicit `Result<T, E>`-style shapes over thrown errors for expected failures. Throw only for programmer errors.
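
To illustrate the `Result<T, E>` preference, here is a minimal sketch. The names `Result`, `PortError`, and `parsePort` are hypothetical and chosen only for this example; they are not taken from the Forge codebase, and the real shape used there may differ.

```typescript
// Hypothetical example types; not Forge APIs.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

interface PortError {
  readonly kind: "not-an-integer" | "out-of-range";
  readonly input: string;
}

// Bad user input is an expected failure, so it is returned rather than thrown.
function parsePort(input: string): Result<number, PortError> {
  const n = Number(input);
  if (!Number.isInteger(n)) {
    return { ok: false, error: { kind: "not-an-integer", input } };
  }
  if (n < 1 || n > 65535) {
    return { ok: false, error: { kind: "out-of-range", input } };
  }
  return { ok: true, value: n };
}

// Callers must check `ok` before touching `value`; the compiler enforces it.
function describePort(input: string): string {
  const result = parsePort(input);
  return result.ok
    ? `listening on ${result.value}`
    : `rejected "${result.error.input}": ${result.error.kind}`;
}
```

Because the union is discriminated on `ok`, a caller that forgets to handle the failure branch gets a type error instead of a runtime surprise, which is the main argument for this shape over `try`/`catch` on expected failures.
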

Vitest. Patterns to copy:

- Stubbing `callModel`: see `test/unit/executor-loop.test.ts`.
- Stubbing providers: see `test/unit/adapter.test.ts`.
- Tempdir + cleanup: see `test/unit/validation-gate.test.ts`.

9 parallel jobs on every PR:
- 🎨 format (prettier --check)
- 🧹 lint (eslint)
- 🧠 typecheck (tsc --noEmit)
- 🧪 test (matrix: ubuntu + macOS × Node 20 + 22)
- 📈 coverage
- 🏗️ build (full `npm run build`)
- 🐳 docker-build (catches Dockerfile drift)
- 🔐 audit (npm audit, informational)
- 📊 pipeline status (aggregates + fails if any required job failed)
Release (tag v*): 6-stage pipeline — gate, artifacts, docker publish to
GHCR, signed manifest + GH release, npm publish (provenance), status.
The product runs on personal machines, often alongside Ollama. Keep it lean:
- No UI framework (app shell is vanilla JS + CSS, < 100 KB).
- No synchronous disk reads on REPL redraw or UI poll paths.
- Default executor turn cap is set in `src/core/mode-policy.ts`.
- Watchers are ref-counted so multiple surfaces share one file watcher.
- Providers do 1.5 s availability probes, not long timeouts.
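
The ref-counted watcher idea from the list above can be sketched as follows. This is an illustration under assumptions, not Forge's implementation: `createSharedWatchers`, `WatchHandle`, and `OpenWatcher` are hypothetical names, and the only real API used is Node's `fs.watch`.

```typescript
import { watch } from "node:fs";

// Hypothetical names throughout; none of these come from the Forge source.
interface WatchHandle {
  close(): void;
}
type Listener = (path: string) => void;
type OpenWatcher = (path: string, onChange: () => void) => WatchHandle;

interface Entry {
  readonly handle: WatchHandle;
  // One wrapper object per subscription, so the same callback may subscribe twice.
  readonly subscribers: Set<{ readonly listener: Listener }>;
}

// Default opener uses fs.watch; tests can inject a fake opener instead.
const openWithFsWatch: OpenWatcher = (path, onChange) => watch(path, () => onChange());

function createSharedWatchers(open: OpenWatcher = openWithFsWatch) {
  const entries = new Map<string, Entry>();

  // Returns a release function. The underlying watcher is opened on the first
  // subscription for a path and closed when the last subscriber releases.
  function subscribe(path: string, listener: Listener): () => void {
    let entry = entries.get(path);
    if (!entry) {
      const subscribers: Entry["subscribers"] = new Set();
      const handle = open(path, () => subscribers.forEach((s) => s.listener(path)));
      entry = { handle, subscribers };
      entries.set(path, entry);
    }
    const subscription = { listener };
    entry.subscribers.add(subscription);
    const current = entry;

    return () => {
      // Set.delete returns false on a repeated release, so releasing is idempotent.
      if (!current.subscribers.delete(subscription)) return;
      if (current.subscribers.size === 0) {
        current.handle.close();
        entries.delete(path);
      }
    };
  }

  return { subscribe };
}
```

The sketch uses a factory function with closed-over state rather than a class, and injects the opener so unit tests never touch the real file system.
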
- All paths resolved to realpath + confined to project root.
- All tool invocations classified by risk × sideEffect and gated.
- Shell commands classified before execution; `critical` is hard-blocked.
- Credentials via `src/keychain/` (macOS keychain / libsecret / DPAPI).
- Prompt injection fenced by `src/security/injection.ts`.
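
The first rule in this list (resolve to realpath, then confine to the project root) can be sketched as below. This is an assumption-laden illustration, not the code in `src/sandbox/`: `confineToRoot` and `Confined` are hypothetical names, and only Node's `fs.realpathSync` and `path` helpers are real APIs.

```typescript
import { realpathSync } from "node:fs";
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical result shape for this sketch.
type Confined =
  | { readonly ok: true; readonly path: string }
  | { readonly ok: false; readonly reason: "unresolvable" | "escapes-root" };

// Resolves `candidate` against `root`, follows symlinks, and rejects any
// result whose real location is outside the real location of `root`.
function confineToRoot(root: string, candidate: string): Confined {
  const realRoot = realpathSync(root);
  let real: string;
  try {
    real = realpathSync(resolve(realRoot, candidate));
  } catch {
    // realpathSync throws when the path cannot be resolved (e.g. it does not exist).
    return { ok: false, reason: "unresolvable" };
  }
  const rel = relative(realRoot, real);
  // ".." or "../x" means above the root; an absolute `rel` means another drive on Windows.
  const escapes = rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel);
  return escapes ? { ok: false, reason: "escapes-root" } : { ok: true, path: real };
}
```

Comparing real paths rather than the strings the caller supplied is what stops a symlink inside the project from pointing a tool at a file outside it. One limitation of this sketch: paths that do not exist yet are reported as `unresolvable`, so a write path for new files would need to check the nearest existing ancestor instead.
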
- Why the agentic loop works: docs/ARCHITECTURE.md §2
- State machine diagram: docs/ARCHITECTURE.md §3
- Mode caps table: docs/ARCHITECTURE.md §4
- Provider routing: docs/ARCHITECTURE.md §6
- Dev setup: docs/SETUP.md
- Install: docs/INSTALL.md
Read the failing test first. Then read src/core/loop.ts. Then ask.
Surfacing "I don't know what this invariant is" is better than guessing
and breaking a test six commits later.