Forge

A local-first, plan-first, multi-agent, and programmable software-engineering runtime.

Not an assistant. A runtime. Forge brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick the model. You approve the actions. Everything is inspectable, replayable, and yours.

Forge logo

Install · Dev setup · Architecture · Releases & versioning · Demo walkthrough · Wiki Page · NPM Package · License


Table of contents

  1. At a glance
  2. Why Forge
  3. Quick start
  4. The agentic loop (with diagrams)
  5. Task state machine
  6. Executor — iterative tool-use loop
  7. Memory layers
  8. Provider routing & auto-adaptation
  9. Safety model
  10. Modes
  11. CLI reference
  12. Filesystem layout
  13. Skills · Instructions · MCP
  14. Run in a container
  15. CI/CD pipeline
  16. Architecture map
  17. Development
  18. License

At a glance

Forge is a local-first, plan-first, multi-agent, and programmable software-engineering runtime. Unlike Claude Code or OpenAI Codex, Forge is local-first infrastructure, not a hosted assistant. It brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.

| Metric | Value | Reproducer |
| --- | --- | --- |
| ⚡ `forge doctor` cold-start | 173 ms | `time node bin/forge.js doctor --no-banner` |
| ⚡ `forge --help` cold-start | 238 ms | `time node bin/forge.js --help` |
| 📦 UI shell · zero CDN | 89 KB uncompressed | `wc -c src/ui/public/app.js` |
| 🌐 Provider probe timeout | 1.5 s | `src/models/openai.ts#isAvailable` |
| 🔌 Model providers (auto-detected) | 6 | ollama · lmstudio · vllm · llama.cpp · openai-compat · anthropic |
| 🧠 Model families classified | 41 | Llama / Qwen / DeepSeek / Gemma / Phi / Mistral / Codestral / … |
| 🤖 Built-in agents | 6 | planner · architect · executor · reviewer · debugger · memory |
| 🛠 Tools available to agents | 18 | read · write · edit · grep · glob · run_command · git · web · … |
| 💬 CLI subcommands · slash commands | 24 · 55 | `forge --help` · `/help` in REPL |
| 🎛 Modes | 9 | fast · balanced · heavy · plan · execute · audit · debug · architect · offline-safe |
| ✅ Tests | 548 / 97 files · 100% passing · ~5.5 s wall-clock | `npx vitest run` |
| 🐳 CI jobs · release stages | 9 · 6 | `.github/workflows/` |
| 📦 Container image | ~355 MB · multi-arch · non-root · HEALTHCHECK | `docker pull ghcr.io/hoangsonww/forge-agentic-coding-cli:latest` |

Tech Stack:

TypeScript Node.js JavaScript HTML5 CSS3 Bash YAML JSON Markdown Mermaid SVG npm Vitest ESLint Prettier ts-node Commander Zod Chalk Ora Prompts Undici dotenv semver SQLite better-sqlite3 FTS5 JSONL WebSockets REST HTTP POSIX XDG MCP OAuth2 Ed25519 SHA-256 AES-GCM DPAPI libsecret macOS Keychain Docker Compose Podman Buildx QEMU tini OCI ripgrep GitHub Actions GHCR npm Provenance Sigstore Dependabot Conventional Commits SemVer Ollama LM Studio vLLM llama.cpp Anthropic OpenAI Azure OpenAI Groq Together AI LocalAI Fireworks Llama Qwen DeepSeek Gemma Phi Mistral Codestral CodeLlama StarCoder Granite Command R macOS Linux Windows linux/amd64 linux/arm64 Git GitHub VS Code EditorConfig


Why Forge

Most "AI coding tools" are thin chat wrappers over a cloud API. Forge is engineering infrastructure with first-class:

```mermaid
mindmap
  root((Forge))
    Local-first
      Auto-detect Ollama / LM Studio / vLLM / llama.cpp
      Model-family auto-adapt
      Offline-safe mode
    Agentic
      6 role-typed agents
      Iterative tool-use executor
      Validation gate (typecheck/lint)
      Bounded retries + diagnose
    Controllable
      Default-deny permissions
      Path-realpath-confined sandbox
      Risk-classified shell
      OS-keychain credentials
    Inspectable
      Tasks JSON · Sessions JSONL · Events JSONL
      Prompt-hashed, replayable
      Concurrent-writer-safe
    Extensible
      Markdown skills
      MCP connectors
      Pluggable agents + tools
    Performant
      REPL cold-start 238 ms
      UI shell 89 KB · zero CDN
      Providers probe in 1.5 s
```
  • Local-first. Forge auto-detects Ollama, LM Studio, vLLM, and llama.cpp on their default ports. Cloud (Anthropic / OpenAI / LocalAI / Together / Groq / Azure) is opt-in, not required.
  • Agentic but controllable. Every action is classified (risk Γ— side-effect Γ— sensitivity), gated by a permission system, and logged with a reproducible prompt hash.
  • Inspectable. Sessions JSONL, tasks JSON, events JSONL. Two processes can edit the same conversation concurrently (POSIX O_APPEND + mkdir lockfile).
  • Mode-driven. 9 explicit modes — each carries enforceable budgets (max executor turns, max validation retries, allowMutations, maxAutoRisk).
  • Extensible. Drop a Markdown file in ~/.forge/skills/. Add an Agent. Wire an MCP connector. No rebuild required.
  • Performant. forge doctor cold-starts in 173 ms. The UI shell is a single 89 KB JavaScript file with zero CDN dependencies. Providers are probed in parallel with a 1.5 s timeout.
  • Open source. MIT license. No telemetry, no phoning home, no hidden backdoors. You get the whole stack. Unlike hosted assistants, Forge is fully inspectable, replayable, and yours.
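The concurrent-writer guarantee mentioned above (POSIX O_APPEND plus a mkdir lockfile) is a standard pattern and can be sketched as below. The function names (`appendEvent`, `withSessionLock`) are illustrative, not Forge's actual API:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export function appendEvent(file: string, event: object): void {
  // "a" opens the file with O_APPEND: each whole-line write lands atomically
  // at the current end, so two processes appending JSONL never interleave
  // mid-line.
  fs.appendFileSync(file, JSON.stringify(event) + "\n", { flag: "a" });
}

export function withSessionLock<T>(dir: string, fn: () => T): T {
  const lock = path.join(dir, ".lock");
  // mkdir is atomic on POSIX filesystems: exactly one contender succeeds,
  // the others retry until the winner removes the directory.
  for (let attempt = 0; ; attempt++) {
    try {
      fs.mkdirSync(lock);
      break;
    } catch {
      if (attempt > 1000) throw new Error("lock timeout");
    }
  }
  try {
    return fn();
  } finally {
    fs.rmdirSync(lock);
  }
}
```

Appends stay safe without the lock; the lockfile is only needed for multi-step edits that must not interleave.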

Tip

Unlike Claude Code or OpenAI Codex, Forge is not a hosted assistant. It's local-first infrastructure. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.


Quick start

```bash
# Option 1 — npm (global):
npm install -g @hoangsonww/forge
forge doctor             # green checks + role→model mapping
forge run "explain this repo"

# Option 2 — Docker:
docker run --rm -it \
  -v forge-home:/data -v "$PWD:/workspace" \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge run "explain this repo"

# Option 3 — full stack (forge + ollama + dashboard):
docker compose -f docker/docker-compose.yml up -d
# open http://127.0.0.1:7823
```

System requirements

| Requirement | Minimum | Notes |
| --- | --- | --- |
| Node.js | ≥ 20 (22 tested) | Enforced via `package.json#engines`. Not needed if you use Docker. |
| OS | macOS · Linux · Windows (WSL recommended) | better-sqlite3 ships prebuilds for darwin-x64, darwin-arm64, linux-x64, linux-arm64, win32-x64 — no compile step. |
| Disk | ~150 MB for node_modules; state under ~/.forge grows with history | Override via FORGE_HOME. |
| RAM | Forge ~100 MB; your local model consumes its own RAM/VRAM | forge doctor cold-starts in ~170 ms. |
| Docker (alt path) | ≥ 25 | Multi-arch (amd64, arm64) image on GHCR. Zero host Node needed. |
| At least one model source | Ollama · LM Studio · vLLM · llama.cpp · Anthropic · OpenAI-compatible | forge doctor tells you which are reachable. |

Runtime npm dependencies (13, zero optional): @modelcontextprotocol/sdk, better-sqlite3 (native, prebuilt), chalk, cli-table3, commander, dotenv, ora, prompts, semver, undici, ws, yaml, zod. No Python, Rust, or Go toolchain.

Recommended (not required): ripgrep (fast grep tool path), git (diff/status tools + project-root detection), $EDITOR (used when you pick "Edit" on a plan).

See docs/INSTALL.md for per-OS notes and docs/SETUP.md for contributor setup.

See it running

Three surfaces, one runtime.

REPL (Interactive Terminal) Mode

REPL.mp4

CLI (Headless, One-shot run) Mode

CLI.mp4

Web UI Dashboard

UI.mp4

The agentic loop

Every non-trivial task flows through the same pipeline. Nothing escapes it — no hidden shortcut, no "just this once" bypass.

```mermaid
flowchart LR
  classDef step fill:#0f172a,stroke:#38bdf8,color:#f1f5f9,rx:4,ry:4
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef term fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef fail fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4

  IN([user prompt]):::step --> CLASSIFY[classify]:::step
  CLASSIFY --> PLAN[plan · DAG]:::step
  PLAN --> VALID{valid plan?}:::gate
  VALID -->|no| FIX[auto-fix]:::step --> VALID
  VALID -->|yes| APPROVE{user approves?}:::gate
  APPROVE -->|edit| PLAN
  APPROVE -->|cancel| CANCEL([cancelled]):::fail
  APPROVE -->|yes| EXEC[execute]:::step
  EXEC --> STEP[next step]:::step
  STEP --> TOOLS[iterative tool use]:::step
  TOOLS --> VGATE{validation gate?}:::gate
  VGATE -->|fail + budget| TOOLS
  VGATE -->|fail + exhausted| RETRY{retries?}:::gate
  VGATE -->|ok| DONE{more steps?}:::gate
  RETRY -->|yes| STEP
  RETRY -->|no| DIAG[diagnose]:::step --> FAIL([failed]):::fail
  DONE -->|yes| STEP
  DONE -->|no| VERIFY[reviewer]:::step
  VERIFY --> VSUM{approves?}:::gate
  VSUM -->|no| STEP
  VSUM -->|yes| COMP([completed]):::term
```

Source: src/core/loop.ts. Retry cap is 3, then the debugger agent diagnoses before the task is marked failed.
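The retry policy described above (bounded retries, then a diagnose pass before the task is marked failed) can be sketched as follows. `step` and `diagnose` are stand-ins, not the real internals of `src/core/loop.ts`:

```typescript
type StepResult = { ok: boolean; error?: string };

export async function runWithRetries(
  step: () => Promise<StepResult>,
  diagnose: (lastError: string) => Promise<void>,
  maxRetries = 3, // cap from the loop description above
): Promise<"completed" | "failed"> {
  let lastError = "unknown";
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await step();
    if (result.ok) return "completed";
    lastError = result.error ?? lastError;
  }
  // Retries exhausted: the debugger agent gets the final error context
  // before the task is marked failed.
  await diagnose(lastError);
  return "failed";
}
```

The key property is that failure is bounded and loud: the loop never spins indefinitely, and the diagnose step always sees the last concrete error.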

A concrete run

```text
forge run "fix the failing login test" --mode heavy
  → classified:   bugfix · complexity=moderate · risk=low
  → plan:         4 steps  (analyze → locate → patch → run_tests)
  → approve?      [y/n/edit]
  → executor:     turn 1 — read_file src/auth/login.ts
                  turn 2 — grep "issuedAt" in src
                  turn 3 — apply_patch src/auth/login.ts
                  turn 4 — run_command "npm test -- auth.login"
  → validate:     typecheck ✓   lint ✓
  → reviewer:     approved
  → ✔ Done. Files changed: src/auth/login.ts
```

Task state machine

Every task lives in exactly one of 10 statuses. Transitions are enforced by LEGAL_TRANSITIONS — illegal moves throw state_invalid with the legal-next list in recoveryHint.

```mermaid
stateDiagram-v2
  [*] --> draft
  draft --> planned: planner output
  draft --> cancelled

  planned --> approved: user approves
  planned --> cancelled
  planned --> blocked

  approved --> scheduled
  approved --> cancelled

  scheduled --> running
  scheduled --> cancelled
  scheduled --> blocked

  running --> verifying
  running --> failed
  running --> blocked
  running --> cancelled

  verifying --> completed
  verifying --> failed
  verifying --> running: reviewer bounces

  completed --> draft: forge resume
  failed    --> draft: forge resume
  blocked   --> draft: forge resume
  blocked   --> cancelled
  cancelled --> draft: forge resume

  completed --> [*]
  failed    --> [*]
  cancelled --> [*]
```

Source: src/persistence/tasks.ts#LEGAL_TRANSITIONS.
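The shape of such a transition table, and the throw-with-recovery-hint behaviour described above, can be sketched like this (only a few statuses shown; `src/persistence/tasks.ts` is authoritative):

```typescript
type Status = "draft" | "planned" | "approved" | "scheduled" | "running"
  | "verifying" | "completed" | "failed" | "blocked" | "cancelled";

// Illustrative subset of the legal-transition map.
const LEGAL_TRANSITIONS: Partial<Record<Status, Status[]>> = {
  draft: ["planned", "cancelled"],
  planned: ["approved", "cancelled", "blocked"],
  verifying: ["completed", "failed", "running"],
};

export function assertTransition(from: Status, to: Status): void {
  const legal = LEGAL_TRANSITIONS[from] ?? [];
  if (!legal.includes(to)) {
    // Illegal moves throw, carrying the legal-next list as a recovery hint.
    const err = new Error(`state_invalid: ${from} -> ${to}`) as Error & {
      recoveryHint?: string;
    };
    err.recoveryHint = `legal next states: ${legal.join(", ") || "(none)"}`;
    throw err;
  }
}
```

Centralising transitions in one table means every caller gets the same invariants, and an illegal move fails with an actionable message instead of silently corrupting task state.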


Executor — iterative tool-use loop

Each plan step runs a bounded model↔tool conversation, not a one-shot call. The model sees every tool result and can adapt within the same step β€” retry with different args, switch tools, or signal done.

```mermaid
sequenceDiagram
  autonumber
  participant L as loop.ts
  participant E as executor.ts
  participant M as model
  participant T as tool
  participant V as validator

  L->>E: runStep(step)
  loop up to maxExecutorTurns (mode-capped)
    E->>M: prompt + schema (JSON-mode)
    M-->>E: { actions[], summary, done? }
    alt done && no failures
      E-->>L: completed
    else has actions
      E->>T: execute each action
      T-->>E: stdout / stderr / exitCode / error
      E->>E: digest + append user turn
    end
  end
  opt step wrote files & mode enables gate
    loop up to maxValidationRetries
      E->>V: typecheck / lint / tsc
      alt passes
        E-->>L: completed
      else fails
        E->>M: VALIDATION_FAILED · <output>
        M-->>E: corrective actions
        E->>T: execute
      end
    end
  end
  E-->>L: { toolResults, summary, filesChanged, completed }
```

Mode caps — read directly from src/core/mode-policy.ts:

| Mode | maxExecutorTurns | maxValidationRetries | allowMutations | maxAutoRisk |
| --- | --- | --- | --- | --- |
| fast | 2 | 0 | ✅ | low |
| balanced | 4 | 1 | ✅ | medium |
| heavy | 8 | 2 | ✅ | high |
| plan | 0→1 | 0 | ❌ | low |
| execute | 4 | 1 | ✅ | medium |
| audit | 3 | 0 | ❌ | low |
| debug | 6 | 2 | ✅ | medium |
| architect | 3 | 1 | ✅ | medium |
| offline-safe | 3 | 1 | ✅ | medium |
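As a typed policy record, the budget table above might look like the sketch below (field names mirror the table; a subset of modes shown, and `src/core/mode-policy.ts` remains the source of truth):

```typescript
interface ModePolicy {
  maxExecutorTurns: number;
  maxValidationRetries: number;
  allowMutations: boolean;
  maxAutoRisk: "low" | "medium" | "high";
}

export const MODE_POLICIES: Record<string, ModePolicy> = {
  fast:     { maxExecutorTurns: 2, maxValidationRetries: 0, allowMutations: true,  maxAutoRisk: "low" },
  balanced: { maxExecutorTurns: 4, maxValidationRetries: 1, allowMutations: true,  maxAutoRisk: "medium" },
  heavy:    { maxExecutorTurns: 8, maxValidationRetries: 2, allowMutations: true,  maxAutoRisk: "high" },
  plan:     { maxExecutorTurns: 1, maxValidationRetries: 0, allowMutations: false, maxAutoRisk: "low" },
  audit:    { maxExecutorTurns: 3, maxValidationRetries: 0, allowMutations: false, maxAutoRisk: "low" },
};

// A mode is an enforceable budget, not a hint: the runtime checks the cap
// before every executor turn rather than asking the model to behave.
export function turnAllowed(mode: string, turn: number): boolean {
  const policy = MODE_POLICIES[mode];
  return policy !== undefined && turn < policy.maxExecutorTurns;
}
```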

Memory layers

Four tiers with distinct retention and access cost:

```mermaid
flowchart TB
  classDef hot  fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4
  classDef warm fill:#451a03,stroke:#fb923c,color:#ffedd5,rx:4,ry:4
  classDef cold fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef learn fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4

  Q[retrieve.ts · query] --> H["Hot<br/>current-session facts<br/>src/memory/hot.ts"]:::hot
  Q --> W["Warm<br/>recent tasks · SQLite<br/>src/memory/warm.ts"]:::warm
  Q --> C["Cold<br/>project files · grep · AST<br/>src/memory/cold.ts"]:::cold
  Q --> L["Learning<br/>patterns + confidence<br/>src/memory/learning.ts"]:::learn

  H -.clear on task end.-> X([evict])
  W -.age out after N days.-> X
  L -.decay if unreinforced.-> L
```
  • Hot — in-process per-task facts, cleared at task end.
  • Warm — SQLite index of recent task metadata; powers "what was I doing yesterday" queries.
  • Cold — lazy file/grep/AST index scoped to projectRoot. No background indexer; populated on demand.
  • Learning — patterns keyed by intent:scope with confidence that evolves on success/failure. The planner reads the top-K patterns before producing every plan (see src/agents/planner.ts#learnedPatternBlock).
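The learning tier's dynamics (patterns keyed by intent:scope, reinforced on success, decayed on failure, top-K read by the planner) can be sketched as below. The update rule and constants are illustrative, not Forge's actual values:

```typescript
interface Pattern {
  key: string; // e.g. "bugfix:auth" — intent:scope
  confidence: number;
}

const patterns = new Map<string, Pattern>();

export function reinforce(key: string, success: boolean): number {
  const p = patterns.get(key) ?? { key, confidence: 0.5 };
  // Move toward 1 on success and toward 0 on failure, staying in (0, 1).
  p.confidence = success
    ? p.confidence + 0.1 * (1 - p.confidence)
    : p.confidence - 0.2 * p.confidence;
  patterns.set(key, p);
  return p.confidence;
}

// The planner reads only the K highest-confidence patterns before planning.
export function topK(k: number): Pattern[] {
  return [...patterns.values()]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
}
```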

Provider routing & auto-adaptation

```mermaid
flowchart LR
  classDef local fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef hosted fill:#3f1d5c,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef route fill:#1e293b,stroke:#f1f5f9,color:#f1f5f9,rx:4,ry:4

  ROUTER[router.ts · resolveModel]:::route
  ADAPT[adapter.ts · resolveLocalModel]:::route
  CB[circuit-breaker]:::route
  RL[rate-limit]:::route
  CACHE[prompt cache]:::route
  COST[USD cost ledger]:::route

  subgraph LOCAL[Local runtimes · auto-detected]
    OLL["ollama<br/>:11434"]:::local
    LMS["lmstudio<br/>:1234"]:::local
    VLL["vllm<br/>:8000"]:::local
    LCP["llamacpp<br/>:8080"]:::local
  end
  subgraph HOSTED[Hosted · opt-in]
    ANT["anthropic"]:::hosted
    OAI["openai-compat<br/>(OpenAI / Azure / LocalAI / Together / Groq / Fireworks)"]:::hosted
  end

  ROUTER --> ADAPT --> OLL & LMS & VLL & LCP
  ROUTER --> ANT & OAI
  ROUTER --> CB & RL & CACHE & COST
```

Auto-adaptation

If your configured model isn't pulled on the provider, Forge picks the best-fit installed model for each role via src/models/local-catalog.ts + src/models/adapter.ts. Cached per process, warns once, never refuses to route.

Supported runtimes

| Runtime | Default endpoint | Override |
| --- | --- | --- |
| Ollama | http://127.0.0.1:11434 | OLLAMA_ENDPOINT |
| LM Studio | http://127.0.0.1:1234/v1 | LMSTUDIO_ENDPOINT |
| vLLM | http://127.0.0.1:8000/v1 | VLLM_ENDPOINT |
| llama.cpp server | http://127.0.0.1:8080/v1 | LLAMACPP_ENDPOINT |
| OpenAI-compatible | env-configured | OPENAI_BASE_URL + OPENAI_API_KEY |
| Anthropic | hosted | ANTHROPIC_API_KEY |
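Parallel probing of the default endpoints above, with the 1.5 s timeout noted earlier, can be sketched like this. The probe takes a fetch-like function so it can be exercised without a live runtime; this is an illustration of the pattern, not Forge's actual probe code:

```typescript
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Default endpoints from the table above.
export const DEFAULT_ENDPOINTS: Record<string, string> = {
  ollama: "http://127.0.0.1:11434",
  lmstudio: "http://127.0.0.1:1234/v1",
  vllm: "http://127.0.0.1:8000/v1",
  llamacpp: "http://127.0.0.1:8080/v1",
};

export async function probeAll(
  doFetch: FetchLike,
  timeoutMs = 1500,
): Promise<string[]> {
  // All providers are probed in parallel; a slow or dead endpoint costs at
  // most timeoutMs thanks to AbortSignal.timeout (Node 18+).
  const results = await Promise.all(
    Object.entries(DEFAULT_ENDPOINTS).map(async ([name, url]) => {
      try {
        const res = await doFetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        return res.ok ? name : null;
      } catch {
        return null; // unreachable or timed out: provider simply not listed
      }
    }),
  );
  return results.filter((n): n is string => n !== null);
}
```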

Model family classification (41 families)

| Role | Families preferred |
| --- | --- |
| architect / reviewer / debugger | Llama 3.x / 4.x, Mixtral, Command-R+, DeepSeek V3/R1, Mistral-Large |
| planner | Qwen 2.5/3, Llama 3.x, DeepSeek V3, Gemma 3, Mistral-Nemo, Command-R, Phi 4 |
| executor (code specialists) | DeepSeek-Coder, Qwen 2.5-Coder, CodeLlama, Codestral, StarCoder, Granite-Code, WizardCoder |
| fast | Phi 3/4, Gemma 2, TinyLlama, SmolLM, MiniCPM |

Unknown models are accepted too — Forge rates them as generic executors rather than refusing to route.

Model size & capability notes

The agentic loop is cheap for the runtime but expensive for the model. Every step is a multi-turn tool-use conversation that returns strict JSON. Small models struggle with this in recognisable ways — please pick the right tool for the job.

| Work you want to do | Safe local floor | What fails below the floor |
| --- | --- | --- |
| Pure chat ("explain closures") | any 3B instruct (phi-3:mini, gemma-3:2b) | fine — conversation fast-path bypasses tool use entirely |
| Summarize a file, explain a snippet | 7B instruct (qwen2.5:7b, llama3.1:8b) | summary is a line of "I read the file" instead of content |
| Single-file edits / small features | 7B+ code specialist (deepseek-coder:6.7b, qwen2.5-coder:7b) | picks wrong tool (run_command to write files), splits "create empty + edit" patterns, escalates to ask_user on tool errors |
| Multi-file refactors, new features | 14B+ code specialist or a hosted frontier model | plan quality drops; step IDs get inconsistent; validation retries exhausted |
| Architecture-level changes | hosted (Claude Opus/Sonnet, GPT-4 class) realistically | budgets blow out; changes go off-plan |

Forge ships with defences so a small model fails loudly instead of silently corrupting files: the executor prompt spells out step-type → tool mappings, ask_user rejects empty/too-short questions as non-retryable, edit_file handles "create empty then fill" gracefully, parent directories auto-create, provider warm-up is explicit, and the router streams prose without jsonMode for narrator/conversation paths. The result is that a small model will often tell you it can't finish a task; it will rarely write the wrong code into a file.

If in doubt: configure a code specialist for the code role, keep something lighter for fast, and set ANTHROPIC_API_KEY or OPENAI_API_KEY as a fallback — the router uses the hosted provider automatically when the local one fails or trips its circuit breaker.

```bash
forge config set models.code    deepseek-coder:6.7b
forge config set models.planner qwen2.5:7b
forge config set models.fast    phi3:mini
export ANTHROPIC_API_KEY=sk-…   # optional fallback
```

Safety model (not optional)

Forge treats safety as load-bearing. These invariants are enforced in code, not convention:

```mermaid
flowchart TB
  classDef ask fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef allow fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef deny  fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4

  REQ[tool invocation] --> CLASSIFY[classify risk × sideEffect × sensitivity]
  CLASSIFY --> SANDBOX{path in sandbox? / cmd allow-listed?}
  SANDBOX -->|no| BLOCK[hard-block · sandbox_violation]:::deny
  SANDBOX -->|yes| GATE{risk × sideEffect}
  GATE -->|low · read| AUTO[auto-allow]:::allow
  GATE -->|med · write| ASK[ask user]:::ask
  GATE -->|high · execute / network| STRICT[ask even with --skip-permissions]:::ask
  ASK --> FLAGS{session flags?}
  FLAGS -->|--allow-shell / --allow-files etc.| AUTO
  FLAGS -->|--non-interactive| DENY[deny silently]:::deny
  FLAGS -->|else| PROMPT[interactive prompt]
  PROMPT -->|allow| AUTO
  PROMPT -->|deny| DENY
  AUTO --> EXEC[execute] --> TRUST[trust calibration<br/>auto-allow after N confirmations<br/>src/permissions/manager.ts]
```
| Invariant | Where |
| --- | --- |
| Instruction precedence: System Safety > Page Rules > Mode Rules > Approved Plan > Project Defaults > User Preferences | src/prompts/assembler.ts |
| Permission model = default deny | src/permissions/manager.ts |
| --skip-permissions skips routine prompts only; critical/destructive still ask | src/permissions/risk.ts |
| Retry cap = 3, then debugger escalates | src/core/loop.ts |
| Hard limits: maxSteps=50 · maxToolCalls=100 · maxRuntimeSeconds=600 | src/config/schema.ts |
| Untrusted content (web / MCP / retrieved) fenced as data, never instructions | src/security/injection.ts |
| Secrets redacted before every log, session entry, and prompt | src/security/redact.ts |
| Scoped filesystem sandbox; symlink-escape-proof via realpath | src/sandbox/fs.ts |
| Destructive shell commands blocked (rm -rf /, sudo, fork bombs, curl-to-shell) | src/sandbox/shell.ts |
| Credentials in OS keychain (macOS / libsecret / DPAPI) + AES-GCM fallback | src/keychain/ |
| Release artefacts: SHA-256 + Ed25519 signature verification | src/release/ |
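The "symlink-escape-proof via realpath" invariant above is a well-known pattern: resolve symlinks first, then require the result to stay under the project root. A minimal sketch, with `ensureInSandbox` as an illustrative name rather than Forge's real API:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export function ensureInSandbox(root: string, candidate: string): string {
  const realRoot = fs.realpathSync(root);
  const resolved = path.resolve(realRoot, candidate);
  // Walk up to the nearest existing ancestor so symlink escapes are caught
  // even for paths that do not exist yet.
  let probe = resolved;
  while (!fs.existsSync(probe)) probe = path.dirname(probe);
  const real = fs.realpathSync(probe);
  // Compare against root + separator to reject sibling dirs like /root-evil.
  if (real !== realRoot && !real.startsWith(realRoot + path.sep)) {
    throw new Error(`sandbox_violation: ${candidate} escapes ${root}`);
  }
  return resolved;
}
```

A naive `startsWith` on the raw path is not enough: a symlink inside the sandbox pointing at `/etc` would pass a string check but fails the realpath check here.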

Modes

```mermaid
flowchart LR
  classDef ro fill:#1e293b,stroke:#64748b,color:#cbd5e1,rx:4,ry:4
  classDef rw fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef big fill:#3f1d5c,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4

  FAST[fast · 2 turns]:::rw
  BAL[balanced · 4 turns · default]:::rw
  HEAVY[heavy · 8 turns · 2 validate retries]:::big
  PLAN[plan · 0 turns · no mutations]:::ro
  EXEC[execute · 4 turns]:::rw
  AUDIT[audit · 3 turns · no mutations]:::ro
  DEBUG[debug · 6 turns · 2 validate retries]:::rw
  ARCH[architect · 3 turns]:::big
  OFFLINE[offline-safe · 3 turns · never hosted]:::rw
```

Each mode is an enforceable budget — not a hint to the model. See src/core/mode-policy.ts.


CLI reference

▶ See each surface in action in DEMO.md — REPL walkthrough, forge run one-shots, and the web dashboard.

24 subcommands. Full surface:

```bash
forge                          # REPL (default)
forge init                     # create ~/.forge + project .forge
forge run "<prompt>"           # full agentic loop
forge plan "<prompt>"          # plan-only
forge execute "<prompt>"       # auto-approve + execute
forge resume [taskId]          # resume any prior task (any status)
forge status                   # runtime state
forge doctor                   # health check + role→model mapping
forge task list|search|delete  # task history (SQLite-indexed); delete prompts (or -y)
forge session list|replay <id> # session JSONL inspection
forge model list               # probe all providers
forge config get|set|path      # configuration
forge mcp list|add|remove      # MCP connections
forge skills list|new          # skill management
forge agents list              # custom agents
forge permissions reset|list   # permission grants
forge daemon start|stop|status # optional background process
forge memory {hot|warm|cold}   # memory inspection
forge cost                     # USD spend ledger
forge ui start                 # local dashboard at :7823
forge bundle {pack|unpack}     # offline bundles
forge container up|down        # compose wrapper
forge update [--check|--force] # self-update (REPL also checks on start, cache-gated)
forge migrate                  # DB migrations
forge changelog                # local changelog view
forge dev                      # dev helpers
forge web {search|fetch}       # web tools
forge spec {new|show|diff}     # spec-driven development
```

Common flags (run / plan / execute)

```text
--mode <m>             fast|balanced|heavy|plan|execute|audit|debug|architect|offline-safe
--yes                  auto-approve plan
--skip-permissions     skip routine prompts (high-risk still asked)
--allow-files          pre-approve file writes for this session
--allow-shell          pre-approve shell for this session
--allow-network        pre-approve network tools
--allow-web            pre-approve web search/fetch/browse
--allow-mcp            pre-approve MCP tool calls
--strict               confirm every action
--non-interactive      deny all prompts silently (CI mode)
--deterministic        fixed temperatures for reproducibility
--trace                full trace (implies --debug)
--no-banner            omit startup banner
```

Filesystem layout

```mermaid
flowchart TB
  classDef g fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:4,ry:4
  classDef p fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4

  subgraph GLOBAL["~/.forge  (global)"]
    G1["config.json"]:::g
    G2["instructions.md"]:::g
    G3["skills/*.md"]:::g
    G4["agents/*.md"]:::g
    G5["mcp/*"]:::g
    G6["models/"]:::g
    G7["logs/forge.log"]:::g
    G8["global/index.db  ← SQLite"]:::g
    G9["projects/&lt;hash&gt;/tasks · sessions · events"]:::g
  end

  subgraph PROJECT["./.forge  (per-project)"]
    P1["config.json"]:::p
    P2["instructions.md"]:::p
    P3["skills/  (override global)"]:::p
    P4["agents/"]:::p
    P5["mcp/"]:::p
  end
```

Paths resolved via src/config/xdg.ts — respects XDG_* env vars on Linux.


Skills · Instructions · MCP

Skills — a Markdown file with YAML frontmatter

```markdown
---
name: conventional-commit
description: Enforce Conventional Commits in every commit message.
triggers: [commit, git]
---
When writing commit messages, use Conventional Commits:
  feat(scope): …
  fix(scope): …
  refactor(scope): …
```

Drop into ~/.forge/skills/ (global) or ./.forge/skills/ (project). Project skills override global.
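Reading a skill file like the one above means splitting the YAML frontmatter from the Markdown body. The sketch below hand-rolls that split for the flat keys shown; a real loader would use a proper YAML parser, and `parseSkill` is an illustrative name, not Forge's API:

```typescript
export interface Skill {
  name: string;
  description: string;
  triggers: string[];
  body: string;
}

export function parseSkill(markdown: string): Skill {
  // Frontmatter is everything between the leading and second "---" lines.
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("skill file missing YAML frontmatter");
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // Only handles the flat "[a, b]" list form used in the example above.
  const triggers = (meta.triggers ?? "[]")
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);
  return {
    name: meta.name ?? "",
    description: meta.description ?? "",
    triggers,
    body: match[2].trim(),
  };
}
```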

Instructions

Both ~/.forge/instructions.md and ./.forge/instructions.md are layered into every prompt via src/prompts/assembler.ts. Precedence is: System Safety > Page > Mode > Plan > Project > User.

MCP connections

```bash
forge mcp list
forge mcp add <name> --transport stdio --command "…"
forge mcp add <name> --transport http --url https://… --auth oauth2-pkce
forge mcp status
```

Both stdio and HTTP-stream transports supported. OAuth 2.0 + PKCE or API key auth. Tokens stored in the OS keychain.


Run in a container (Docker or Podman)

Single hardened image (non-root, HEALTHCHECK, OCI labels, ~355 MB) that serves both CLI and UI.

▶ Dashboard demo — forge ui start driving a full task end-to-end (plan approval, streamed model output, follow-up thread). More in DEMO.md.

```bash
# Pull (multi-arch: linux/amd64 + linux/arm64):
docker pull ghcr.io/hoangsonww/forge-agentic-coding-cli:latest

# One-shot CLI:
docker run --rm -it -v forge-home:/data -v "$PWD:/workspace" \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge run "explain this repo"

# Dashboard:
docker run --rm -p 7823:7823 -v forge-home:/data \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge ui start --bind 0.0.0.0

# Full stack (forge + ollama + UI):
docker compose -f docker/docker-compose.yml up -d
# or: podman-compose -f docker/docker-compose.yml up -d
```

Stack topology:

```mermaid
flowchart LR
  classDef c fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef v fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:4,ry:4

  OLLAMA["ollama<br/>:11434 · healthcheck"]:::c
  UI["forge-ui<br/>:7823 · healthcheck · restart unless-stopped"]:::c
  CORE["forge-core<br/>(on-demand via compose run)"]:::c
  FH[forge-home · named volume]:::v
  OM[ollama-models · named volume]:::v

  OLLAMA --> OM
  UI --> FH
  CORE --> FH
  UI --> OLLAMA
  CORE --> OLLAMA
```

Full install guide: docs/INSTALL.md.


CI/CD pipeline

CI (every PR + push)

```mermaid
flowchart LR
  classDef pass fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4

  PR[PR / push] --> FMT["🎨 format"]:::pass
  PR --> LINT["🧹 lint"]:::pass
  PR --> TYPE["🧠 typecheck"]:::pass
  PR --> TEST["🧪 test matrix<br/>Ubuntu + macOS × Node 20 + 22"]:::pass
  TEST --> COV["📈 coverage"]:::pass
  TYPE --> BUILD["🏗️ build"]:::pass
  BUILD --> DOCKER["🐳 docker-build"]:::pass
  PR --> AUDIT["🔍 audit"]:::pass
  FMT & LINT & TYPE & TEST & BUILD & DOCKER & AUDIT & COV --> STATUS["📊 pipeline status<br/>GH step summary · fails if any required job failed"]:::gate
```

Release (on v* tag)

```mermaid
flowchart LR
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef ship fill:#451a03,stroke:#fb923c,color:#ffedd5,rx:4,ry:4

  TAG[git tag v*] --> GATE["🧪 pre-release gate<br/>build + full test suite"]:::gate
  GATE --> ART["📦 artifacts<br/>5 tarball targets"]:::ship
  GATE --> DOCKP["🐳 docker publish<br/>multi-arch → ghcr.io"]:::ship
  ART --> MAN["📝 manifest + gh-release<br/>ed25519-signed"]:::ship
  MAN --> NPM["📤 npm publish<br/>--provenance --access public"]:::ship
  GATE & ART & DOCKP & MAN & NPM --> RSUM["📊 release status"]:::gate
```

Workflows: .github/workflows/ci.yml, .github/workflows/release.yml, .github/workflows/nightly.yml.

Full versioning & release playbook (SemVer policy, channels, signing, hotfix flow, rollback, built-in updater): RELEASES.md.


Architecture map

```mermaid
flowchart TB
  classDef surface fill:#0f172a,stroke:#38bdf8,color:#f1f5f9,rx:6,ry:6
  classDef core    fill:#082f49,stroke:#38bdf8,color:#e0f2fe,rx:6,ry:6
  classDef agent   fill:#1e293b,stroke:#a78bfa,color:#ede9fe,rx:6,ry:6
  classDef io      fill:#0f172a,stroke:#10b981,color:#d1fae5,rx:6,ry:6
  classDef store   fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:6,ry:6

  subgraph S[User surfaces]
    CLI["CLI (commander)"]:::surface
    REPL["REPL (raw-mode editor)"]:::surface
    UI["Dashboard (HTTP + WS)"]:::surface
  end

  ORCH["Orchestrator · src/core/orchestrator.ts"]:::core
  LOOP["Agentic loop · src/core/loop.ts"]:::core
  CLS["Classifier"]:::core

  subgraph A[Agents · src/agents]
    PL[planner]:::agent
    AR[architect]:::agent
    EX[executor]:::agent
    RV[reviewer]:::agent
    DB[debugger]:::agent
    ME[memory]:::agent
  end

  subgraph I[I/O surfaces]
    TOOLS["18 tools · src/tools"]:::io
    MODELS["6 providers · src/models"]:::io
    PERM["Permissions"]:::io
    SAND["Sandbox (fs + shell)"]:::io
    MCP["MCP bridge"]:::io
  end

  subgraph P[Durable state]
    TASKS[tasks/*.json]:::store
    SESS[sessions/*.jsonl]:::store
    CONV[conversations/*.jsonl]:::store
    IDX[SQLite index]:::store
    MEM["memory/{hot,warm,cold,learning}"]:::store
  end

  CLI --> ORCH
  REPL --> ORCH
  UI --> ORCH
  ORCH --> CLS --> LOOP
  LOOP --> PL --> EX --> RV
  RV --> LOOP
  LOOP --> AR & DB & ME
  EX --> TOOLS
  TOOLS --> PERM & SAND & MCP
  PL --> MODELS
  EX --> MODELS
  LOOP --> TASKS & SESS & CONV & IDX
  ME --> MEM
```

Full map with every subsystem explained: docs/ARCHITECTURE.md.

Executor turn budget per mode

```mermaid
xychart-beta
  title "Executor turns per mode (hard runtime cap)"
  x-axis ["plan", "fast", "audit", "architect", "offline-safe", "balanced", "execute", "debug", "heavy"]
  y-axis "turns" 0 --> 8
  bar [1, 2, 3, 3, 3, 4, 4, 6, 8]
```

Development

```bash
git clone https://github.com/hoangsonww/Forge-Agentic-Coding-CLI && cd Forge-Agentic-Coding-CLI
npm install
npm run build             # tsc + copy-assets
npm test                  # 548 tests across 97 files; all must pass
./bin/forge.js doctor
```
| Task | Command |
| --- | --- |
| Build | `npm run build` |
| Watch | `npm run build:watch` |
| Tests | `npm test` |
| One test file | `npx vitest run test/unit/<file>.test.ts` |
| Coverage | `npm run test:coverage` |
| Typecheck | `npm run typecheck` |
| Lint / format | `npm run lint` · `npm run format` · `npm run format:check` |
| Metrics | `bash scripts/metrics.sh` |
| Docker | `docker build -f docker/Dockerfile -t forge/core:dev .` |
| REPL | `./bin/forge.js` |
| Dashboard | `./bin/forge.js ui start` |

Full guide: docs/SETUP.md.

Measured performance (reproduce with the commands shown)

| Target | Measured | How |
| --- | --- | --- |
| `forge --help` cold-start | 238 ms | `time node bin/forge.js --help` |
| `forge doctor` cold-start | 173 ms | `time node bin/forge.js doctor --no-banner` |
| UI app.js uncompressed | 89 KB | `wc -c src/ui/public/app.js` |
| Landing index.html | 25 KB, self-contained, zero CDN | `wc -c index.html` |
| Full test suite | ~3.3 s wall-clock | `npx vitest run` |
| Container image | ~355 MB multi-arch non-root | `docker images` |

Agent-facing context

If you're a code-writing agent (Claude Code, Codex, Cursor, Aider, Cline, Continue, …) working in this repo, start here:

  • CLAUDE.md β€” Claude Code / Claude-family context
  • AGENTS.md β€” OpenAI AGENTS.md convention (used by Codex and most others)

Both files carry: canonical commands, hot paths, conventions, performance posture, security posture, and pre-completion checklist.


License

MIT. See LICENSE for more details.


Son Nguyen · sonnguyenhoang.com · github.com/hoangsonww

Thank you for checking out Forge! If you have any questions, feedback, or want to contribute, please open an issue or a pull request.

About

🦄 Forge - a local-first, multi-agent software-engineering runtime that runs Claude Code / Codex-style agentic workflows entirely on your own machine via Ollama, llama.cpp, vLLM, and LM Studio (cloud models optional). Ships 18 sandboxed tools, 6 model providers, and a full REPL + UI dashboard in a single Node CLI - no telemetry, no lock-in.
