Forge

A local-first, plan-first, multi-agent, and programmable software-engineering runtime.

Not an assistant. A runtime. Forge brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick the model. You approve the actions. Everything is inspectable, replayable, and yours.

Forge logo

Install · Dev setup · Architecture · Releases & versioning · Demo walkthrough · Wiki Page · NPM Package · License


Table of contents

  1. At a glance
  2. Why Forge
  3. Quick start
  4. The agentic loop (with diagrams)
  5. Task state machine
  6. Executor — iterative tool-use loop
  7. Memory layers
  8. Provider routing & auto-adaptation
  9. Safety model
  10. Modes
  11. CLI reference
  12. Filesystem layout
  13. Skills · Instructions · MCP
  14. Run in a container
  15. CI/CD pipeline
  16. Architecture map
  17. Development
  18. License

At a glance

Forge is a local-first, plan-first, multi-agent, and programmable software-engineering runtime. Unlike Claude Code or OpenAI Codex, Forge is local-first infrastructure, not a hosted assistant. It brings its own scheduler, sandbox, permission system, state machine, agentic loop, memory layers, and plugin ecosystem. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.

| Metric | Value | Reproducer |
| --- | --- | --- |
| ⚡ `forge doctor` cold-start | 173 ms | `time node bin/forge.js doctor --no-banner` |
| ⚡ `forge --help` cold-start | 238 ms | `time node bin/forge.js --help` |
| 📦 UI shell · zero CDN | 89 KB uncompressed | `wc -c src/ui/public/app.js` |
| 🌐 Provider probe timeout | 1.5 s | `src/models/openai.ts#isAvailable` |
| 🔌 Model providers (auto-detected) | 6 | ollama · lmstudio · vllm · llama.cpp · openai-compat · anthropic |
| 🧠 Model families classified | 41 | Llama / Qwen / DeepSeek / Gemma / Phi / Mistral / Codestral / … |
| 🤖 Built-in agents | 6 | planner · architect · executor · reviewer · debugger · memory |
| 🛠 Tools available to agents | 18 | read · write · edit · grep · glob · run_command · git · web · … |
| 💬 CLI subcommands · slash commands | 24 · 55 | `forge --help` · `/help` in REPL |
| 🎛 Modes | 9 | fast · balanced · heavy · plan · execute · audit · debug · architect · offline-safe |
| ✅ Tests | 548 / 97 files · 100% passing · ~5.5 s wall-clock | `npx vitest run` |
| 🐳 CI jobs · release stages | 9 · 6 | `.github/workflows/` |
| 📦 Container image | ~355 MB · multi-arch · non-root · HEALTHCHECK | `docker pull ghcr.io/hoangsonww/forge-agentic-coding-cli:latest` |

Tech Stack:

TypeScript Node.js JavaScript HTML5 CSS3 Bash YAML JSON Markdown Mermaid SVG npm Vitest ESLint Prettier ts-node Commander Zod Chalk Ora Prompts Undici dotenv semver SQLite better-sqlite3 FTS5 JSONL WebSockets REST HTTP POSIX XDG MCP OAuth2 Ed25519 SHA-256 AES-GCM DPAPI libsecret macOS Keychain Docker Compose Podman Buildx QEMU tini OCI ripgrep GitHub Actions GHCR npm Provenance Sigstore Dependabot Conventional Commits SemVer Ollama LM Studio vLLM llama.cpp Anthropic OpenAI Azure OpenAI Groq Together AI LocalAI Fireworks Llama Qwen DeepSeek Gemma Phi Mistral Codestral CodeLlama StarCoder Granite Command R macOS Linux Windows linux/amd64 linux/arm64 Git GitHub VS Code EditorConfig


Why Forge

Most "AI coding tools" are thin chat wrappers over a cloud API. Forge is engineering infrastructure with first-class:

```mermaid
mindmap
  root((Forge))
    Local-first
      Auto-detect Ollama / LM Studio / vLLM / llama.cpp
      Model-family auto-adapt
      Offline-safe mode
    Agentic
      6 role-typed agents
      Iterative tool-use executor
      Validation gate (typecheck/lint)
      Bounded retries + diagnose
    Controllable
      Default-deny permissions
      Path-realpath-confined sandbox
      Risk-classified shell
      OS-keychain credentials
    Inspectable
      Tasks JSON · Sessions JSONL · Events JSONL
      Prompt-hashed, replayable
      Concurrent-writer-safe
    Extensible
      Markdown skills
      MCP connectors
      Pluggable agents + tools
    Performant
      REPL cold-start 238 ms
      UI shell 89 KB · zero CDN
      Providers probe in 1.5 s
```
  • Local-first. Forge auto-detects Ollama, LM Studio, vLLM, and llama.cpp on their default ports. Cloud (Anthropic / OpenAI / LocalAI / Together / Groq / Azure) is opt-in, not required.
  • Agentic but controllable. Every action is classified (risk Γ— side-effect Γ— sensitivity), gated by a permission system, and logged with a reproducible prompt hash.
  • Inspectable. Sessions JSONL, tasks JSON, events JSONL. Two processes can edit the same conversation concurrently (POSIX O_APPEND + mkdir lockfile).
  • Mode-driven. 9 explicit modes — each carries enforceable budgets (max executor turns, max validation retries, allowMutations, maxAutoRisk).
  • Extensible. Drop a Markdown file in ~/.forge/skills/. Add an Agent. Wire an MCP connector. No rebuild required.
  • Performant. forge doctor cold-starts in 173 ms. The UI shell is a single 89 KB JavaScript file with zero CDN dependencies. Providers are probed in parallel with a 1.5 s timeout.
  • Open source. MIT license. No telemetry, no phoning home, no hidden backdoors. You get the whole stack. Unlike hosted assistants, Forge is fully inspectable, replayable, and yours.
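The concurrent-writer guarantee mentioned above (POSIX O_APPEND plus a mkdir lockfile) is a standard pattern and can be sketched as below. The function names (`appendEvent`, `withSessionLock`) are illustrative, not Forge's actual API:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export function appendEvent(file: string, event: object): void {
  // "a" opens the file with O_APPEND: each whole-line write lands atomically
  // at the current end, so two processes appending JSONL never interleave
  // mid-line.
  fs.appendFileSync(file, JSON.stringify(event) + "\n", { flag: "a" });
}

export function withSessionLock<T>(dir: string, fn: () => T): T {
  const lock = path.join(dir, ".lock");
  // mkdir is atomic on POSIX filesystems: exactly one contender succeeds,
  // the others retry until the winner removes the directory.
  for (let attempt = 0; ; attempt++) {
    try {
      fs.mkdirSync(lock);
      break;
    } catch {
      if (attempt > 1000) throw new Error("lock timeout");
    }
  }
  try {
    return fn();
  } finally {
    fs.rmdirSync(lock);
  }
}
```

Appends stay safe without the lock; the lockfile is only needed for multi-step edits that must not interleave.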

Tip

Unlike Claude Code or OpenAI Codex, Forge is not a hosted assistant. It's local-first infrastructure. You pick & host the model. You approve the actions. Everything is inspectable, replayable, and yours.


Quick start

```bash
# Option 1 — npm (global):
npm install -g @hoangsonww/forge
forge doctor             # green checks + role→model mapping
forge run "explain this repo"

# Option 2 — Docker:
docker run --rm -it \
  -v forge-home:/data -v "$PWD:/workspace" \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge run "explain this repo"

# Option 3 — full stack (forge + ollama + dashboard):
docker compose -f docker/docker-compose.yml up -d
# open http://127.0.0.1:7823
```

System requirements

| Requirement | Minimum | Notes |
| --- | --- | --- |
| Node.js | ≥ 20 (22 tested) | Enforced via `package.json#engines`. Not needed if you use Docker. |
| OS | macOS · Linux · Windows (WSL recommended) | better-sqlite3 ships prebuilds for darwin-x64, darwin-arm64, linux-x64, linux-arm64, win32-x64 — no compile step. |
| Disk | ~150 MB for node_modules; state under ~/.forge grows with history | Override via FORGE_HOME. |
| RAM | Forge ~100 MB; your local model consumes its own RAM/VRAM | forge doctor cold-starts in ~170 ms. |
| Docker (alt path) | ≥ 25 | Multi-arch (amd64, arm64) image on GHCR. Zero host Node needed. |
| At least one model source | Ollama · LM Studio · vLLM · llama.cpp · Anthropic · OpenAI-compatible | forge doctor tells you which are reachable. |

Runtime npm dependencies (13, zero optional): @modelcontextprotocol/sdk, better-sqlite3 (native, prebuilt), chalk, cli-table3, commander, dotenv, ora, prompts, semver, undici, ws, yaml, zod. No Python, Rust, or Go toolchain.

Recommended (not required): ripgrep (fast grep tool path), git (diff/status tools + project-root detection), $EDITOR (used when you pick "Edit" on a plan).

See docs/INSTALL.md for per-OS notes and docs/SETUP.md for contributor setup.

See it running

Three surfaces, one runtime.

REPL (Interactive Terminal) Mode

REPL.mp4

CLI (Headless, One-shot run) Mode

CLI.mp4

Web UI Dashboard

UI.mp4

The agentic loop

Every non-trivial task flows through the same pipeline. Nothing escapes it — no hidden shortcut, no "just this once" bypass.

```mermaid
flowchart LR
  classDef step fill:#0f172a,stroke:#38bdf8,color:#f1f5f9,rx:4,ry:4
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef term fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef fail fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4

  IN([user prompt]):::step --> CLASSIFY[classify]:::step
  CLASSIFY --> PLAN[plan · DAG]:::step
  PLAN --> VALID{valid plan?}:::gate
  VALID -->|no| FIX[auto-fix]:::step --> VALID
  VALID -->|yes| APPROVE{user approves?}:::gate
  APPROVE -->|edit| PLAN
  APPROVE -->|cancel| CANCEL([cancelled]):::fail
  APPROVE -->|yes| EXEC[execute]:::step
  EXEC --> STEP[next step]:::step
  STEP --> TOOLS[iterative tool use]:::step
  TOOLS --> VGATE{validation gate?}:::gate
  VGATE -->|fail + budget| TOOLS
  VGATE -->|fail + exhausted| RETRY{retries?}:::gate
  VGATE -->|ok| DONE{more steps?}:::gate
  RETRY -->|yes| STEP
  RETRY -->|no| DIAG[diagnose]:::step --> FAIL([failed]):::fail
  DONE -->|yes| STEP
  DONE -->|no| VERIFY[reviewer]:::step
  VERIFY --> VSUM{approves?}:::gate
  VSUM -->|no| STEP
  VSUM -->|yes| COMP([completed]):::term
```

Source: src/core/loop.ts. Retry cap is 3, then the debugger agent diagnoses before the task is marked failed.
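The retry policy described above (bounded retries, then a diagnose pass before the task is marked failed) can be sketched as follows. `step` and `diagnose` are stand-ins, not the real internals of `src/core/loop.ts`:

```typescript
type StepResult = { ok: boolean; error?: string };

export async function runWithRetries(
  step: () => Promise<StepResult>,
  diagnose: (lastError: string) => Promise<void>,
  maxRetries = 3, // cap from the loop description above
): Promise<"completed" | "failed"> {
  let lastError = "unknown";
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await step();
    if (result.ok) return "completed";
    lastError = result.error ?? lastError;
  }
  // Retries exhausted: the debugger agent gets the final error context
  // before the task is marked failed.
  await diagnose(lastError);
  return "failed";
}
```

The key property is that failure is bounded and loud: the loop never spins indefinitely, and the diagnose step always sees the last concrete error.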

A concrete run

```text
forge run "fix the failing login test" --mode heavy
  → classified:   bugfix · complexity=moderate · risk=low
  → plan:         4 steps  (analyze → locate → patch → run_tests)
  → approve?      [y/n/edit]
  → executor:     turn 1 — read_file src/auth/login.ts
                  turn 2 — grep "issuedAt" in src
                  turn 3 — apply_patch src/auth/login.ts
                  turn 4 — run_command "npm test -- auth.login"
  → validate:     typecheck ✓   lint ✓
  → reviewer:     approved
  → ✔ Done. Files changed: src/auth/login.ts
```

Task state machine

Every task lives in exactly one of 10 statuses. Transitions are enforced by LEGAL_TRANSITIONS — illegal moves throw state_invalid with the legal-next list in recoveryHint.

```mermaid
stateDiagram-v2
  [*] --> draft
  draft --> planned: planner output
  draft --> cancelled

  planned --> approved: user approves
  planned --> cancelled
  planned --> blocked

  approved --> scheduled
  approved --> cancelled

  scheduled --> running
  scheduled --> cancelled
  scheduled --> blocked

  running --> verifying
  running --> failed
  running --> blocked
  running --> cancelled

  verifying --> completed
  verifying --> failed
  verifying --> running: reviewer bounces

  completed --> draft: forge resume
  failed    --> draft: forge resume
  blocked   --> draft: forge resume
  blocked   --> cancelled
  cancelled --> draft: forge resume

  completed --> [*]
  failed    --> [*]
  cancelled --> [*]
```

Source: src/persistence/tasks.ts#LEGAL_TRANSITIONS.
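The shape of such a transition table, and the throw-with-recovery-hint behaviour described above, can be sketched like this (only a few statuses shown; `src/persistence/tasks.ts` is authoritative):

```typescript
type Status = "draft" | "planned" | "approved" | "scheduled" | "running"
  | "verifying" | "completed" | "failed" | "blocked" | "cancelled";

// Illustrative subset of the legal-transition map.
const LEGAL_TRANSITIONS: Partial<Record<Status, Status[]>> = {
  draft: ["planned", "cancelled"],
  planned: ["approved", "cancelled", "blocked"],
  verifying: ["completed", "failed", "running"],
};

export function assertTransition(from: Status, to: Status): void {
  const legal = LEGAL_TRANSITIONS[from] ?? [];
  if (!legal.includes(to)) {
    // Illegal moves throw, carrying the legal-next list as a recovery hint.
    const err = new Error(`state_invalid: ${from} -> ${to}`) as Error & {
      recoveryHint?: string;
    };
    err.recoveryHint = `legal next states: ${legal.join(", ") || "(none)"}`;
    throw err;
  }
}
```

Centralising transitions in one table means every caller gets the same invariants, and an illegal move fails with an actionable message instead of silently corrupting task state.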


Executor — iterative tool-use loop

Each plan step runs a bounded model↔tool conversation, not a one-shot call. The model sees every tool result and can adapt within the same step β€” retry with different args, switch tools, or signal done.

```mermaid
sequenceDiagram
  autonumber
  participant L as loop.ts
  participant E as executor.ts
  participant M as model
  participant T as tool
  participant V as validator

  L->>E: runStep(step)
  loop up to maxExecutorTurns (mode-capped)
    E->>M: prompt + schema (JSON-mode)
    M-->>E: { actions[], summary, done? }
    alt done && no failures
      E-->>L: completed
    else has actions
      E->>T: execute each action
      T-->>E: stdout / stderr / exitCode / error
      E->>E: digest + append user turn
    end
  end
  opt step wrote files & mode enables gate
    loop up to maxValidationRetries
      E->>V: typecheck / lint / tsc
      alt passes
        E-->>L: completed
      else fails
        E->>M: VALIDATION_FAILED · <output>
        M-->>E: corrective actions
        E->>T: execute
      end
    end
  end
  E-->>L: { toolResults, summary, filesChanged, completed }
```

Mode caps — read directly from src/core/mode-policy.ts:

| Mode | maxExecutorTurns | maxValidationRetries | allowMutations | maxAutoRisk |
| --- | --- | --- | --- | --- |
| fast | 2 | 0 | ✅ | low |
| balanced | 4 | 1 | ✅ | medium |
| heavy | 8 | 2 | ✅ | high |
| plan | 0→1 | 0 | ❌ | low |
| execute | 4 | 1 | ✅ | medium |
| audit | 3 | 0 | ❌ | low |
| debug | 6 | 2 | ✅ | medium |
| architect | 3 | 1 | ✅ | medium |
| offline-safe | 3 | 1 | ✅ | medium |
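As a typed policy record, the budget table above might look like the sketch below (field names mirror the table; a subset of modes shown, and `src/core/mode-policy.ts` remains the source of truth):

```typescript
interface ModePolicy {
  maxExecutorTurns: number;
  maxValidationRetries: number;
  allowMutations: boolean;
  maxAutoRisk: "low" | "medium" | "high";
}

export const MODE_POLICIES: Record<string, ModePolicy> = {
  fast:     { maxExecutorTurns: 2, maxValidationRetries: 0, allowMutations: true,  maxAutoRisk: "low" },
  balanced: { maxExecutorTurns: 4, maxValidationRetries: 1, allowMutations: true,  maxAutoRisk: "medium" },
  heavy:    { maxExecutorTurns: 8, maxValidationRetries: 2, allowMutations: true,  maxAutoRisk: "high" },
  plan:     { maxExecutorTurns: 1, maxValidationRetries: 0, allowMutations: false, maxAutoRisk: "low" },
  audit:    { maxExecutorTurns: 3, maxValidationRetries: 0, allowMutations: false, maxAutoRisk: "low" },
};

// A mode is an enforceable budget, not a hint: the runtime checks the cap
// before every executor turn rather than asking the model to behave.
export function turnAllowed(mode: string, turn: number): boolean {
  const policy = MODE_POLICIES[mode];
  return policy !== undefined && turn < policy.maxExecutorTurns;
}
```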

Memory layers

Four tiers with distinct retention and access cost:

```mermaid
flowchart TB
  classDef hot  fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4
  classDef warm fill:#451a03,stroke:#fb923c,color:#ffedd5,rx:4,ry:4
  classDef cold fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef learn fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4

  Q[retrieve.ts · query] --> H["Hot<br/>current-session facts<br/>src/memory/hot.ts"]:::hot
  Q --> W["Warm<br/>recent tasks · SQLite<br/>src/memory/warm.ts"]:::warm
  Q --> C["Cold<br/>project files · grep · AST<br/>src/memory/cold.ts"]:::cold
  Q --> L["Learning<br/>patterns + confidence<br/>src/memory/learning.ts"]:::learn

  H -.clear on task end.-> X([evict])
  W -.age out after N days.-> X
  L -.decay if unreinforced.-> L
```
  • Hot — in-process per-task facts, cleared at task end.
  • Warm — SQLite index of recent task metadata; powers "what was I doing yesterday" queries.
  • Cold — lazy file/grep/AST index scoped to projectRoot. No background indexer; populated on demand.
  • Learning — patterns keyed by intent:scope with confidence that evolves on success/failure. The planner reads the top-K patterns before producing every plan (see src/agents/planner.ts#learnedPatternBlock).
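The learning tier's dynamics (patterns keyed by intent:scope, reinforced on success, decayed on failure, top-K read by the planner) can be sketched as below. The update rule and constants are illustrative, not Forge's actual values:

```typescript
interface Pattern {
  key: string; // e.g. "bugfix:auth" — intent:scope
  confidence: number;
}

const patterns = new Map<string, Pattern>();

export function reinforce(key: string, success: boolean): number {
  const p = patterns.get(key) ?? { key, confidence: 0.5 };
  // Move toward 1 on success and toward 0 on failure, staying in (0, 1).
  p.confidence = success
    ? p.confidence + 0.1 * (1 - p.confidence)
    : p.confidence - 0.2 * p.confidence;
  patterns.set(key, p);
  return p.confidence;
}

// The planner reads only the K highest-confidence patterns before planning.
export function topK(k: number): Pattern[] {
  return [...patterns.values()]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
}
```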

Provider routing & auto-adaptation

```mermaid
flowchart LR
  classDef local fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef hosted fill:#3f1d5c,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef route fill:#1e293b,stroke:#f1f5f9,color:#f1f5f9,rx:4,ry:4

  ROUTER[router.ts · resolveModel]:::route
  ADAPT[adapter.ts · resolveLocalModel]:::route
  CB[circuit-breaker]:::route
  RL[rate-limit]:::route
  CACHE[prompt cache]:::route
  COST[USD cost ledger]:::route

  subgraph LOCAL[Local runtimes · auto-detected]
    OLL["ollama<br/>:11434"]:::local
    LMS["lmstudio<br/>:1234"]:::local
    VLL["vllm<br/>:8000"]:::local
    LCP["llamacpp<br/>:8080"]:::local
  end
  subgraph HOSTED[Hosted · opt-in]
    ANT["anthropic"]:::hosted
    OAI["openai-compat<br/>(OpenAI / Azure / LocalAI / Together / Groq / Fireworks)"]:::hosted
  end

  ROUTER --> ADAPT --> OLL & LMS & VLL & LCP
  ROUTER --> ANT & OAI
  ROUTER --> CB & RL & CACHE & COST
```

Auto-adaptation

If your configured model isn't pulled on the provider, Forge picks the best-fit installed model for each role via src/models/local-catalog.ts + src/models/adapter.ts. Cached per process, warns once, never refuses to route.

Supported runtimes

| Runtime | Default endpoint | Override |
| --- | --- | --- |
| Ollama | http://127.0.0.1:11434 | OLLAMA_ENDPOINT |
| LM Studio | http://127.0.0.1:1234/v1 | LMSTUDIO_ENDPOINT |
| vLLM | http://127.0.0.1:8000/v1 | VLLM_ENDPOINT |
| llama.cpp server | http://127.0.0.1:8080/v1 | LLAMACPP_ENDPOINT |
| OpenAI-compatible | env-configured | OPENAI_BASE_URL + OPENAI_API_KEY |
| Anthropic | hosted | ANTHROPIC_API_KEY |
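Parallel probing of the default endpoints above, with the 1.5 s timeout noted earlier, can be sketched like this. The probe takes a fetch-like function so it can be exercised without a live runtime; this is an illustration of the pattern, not Forge's actual probe code:

```typescript
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Default endpoints from the table above.
export const DEFAULT_ENDPOINTS: Record<string, string> = {
  ollama: "http://127.0.0.1:11434",
  lmstudio: "http://127.0.0.1:1234/v1",
  vllm: "http://127.0.0.1:8000/v1",
  llamacpp: "http://127.0.0.1:8080/v1",
};

export async function probeAll(
  doFetch: FetchLike,
  timeoutMs = 1500,
): Promise<string[]> {
  // All providers are probed in parallel; a slow or dead endpoint costs at
  // most timeoutMs thanks to AbortSignal.timeout (Node 18+).
  const results = await Promise.all(
    Object.entries(DEFAULT_ENDPOINTS).map(async ([name, url]) => {
      try {
        const res = await doFetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        return res.ok ? name : null;
      } catch {
        return null; // unreachable or timed out: provider simply not listed
      }
    }),
  );
  return results.filter((n): n is string => n !== null);
}
```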

Model family classification (41 families)

| Role | Families preferred |
| --- | --- |
| architect / reviewer / debugger | Llama 3.x / 4.x, Mixtral, Command-R+, DeepSeek V3/R1, Mistral-Large |
| planner | Qwen 2.5/3, Llama 3.x, DeepSeek V3, Gemma 3, Mistral-Nemo, Command-R, Phi 4 |
| executor (code specialists) | DeepSeek-Coder, Qwen 2.5-Coder, CodeLlama, Codestral, StarCoder, Granite-Code, WizardCoder |
| fast | Phi 3/4, Gemma 2, TinyLlama, SmolLM, MiniCPM |

Unknown models are accepted too — Forge rates them as generic executors rather than refusing to route.

Model size & capability notes

The agentic loop is cheap for the runtime but expensive for the model. Every step is a multi-turn tool-use conversation that returns strict JSON. Small models struggle with this in recognisable ways — please pick the right tool for the job.

| Work you want to do | Safe local floor | What fails below the floor |
| --- | --- | --- |
| Pure chat ("explain closures") | any 3B instruct (phi-3:mini, gemma-3:2b) | fine — conversation fast-path bypasses tool use entirely |
| Summarize a file, explain a snippet | 7B instruct (qwen2.5:7b, llama3.1:8b) | summary is a line of "I read the file" instead of content |
| Single-file edits / small features | 7B+ code specialist (deepseek-coder:6.7b, qwen2.5-coder:7b) | picks wrong tool (run_command to write files), splits "create empty + edit" patterns, escalates to ask_user on tool errors |
| Multi-file refactors, new features | 14B+ code specialist or a hosted frontier model | plan quality drops; step IDs get inconsistent; validation retries exhausted |
| Architecture-level changes | hosted (Claude Opus/Sonnet, GPT-4 class) realistically | budgets blow out; changes go off-plan |

Forge ships with defences so a small model fails loudly instead of silently corrupting files: the executor prompt spells out step-type → tool mappings, ask_user rejects empty/too-short questions as non-retryable, edit_file handles "create empty then fill" gracefully, parent directories auto-create, provider warm-up is explicit, and the router streams prose without jsonMode for narrator/conversation paths. The result is that a small model will often tell you it can't finish a task; it will rarely write the wrong code into a file.

If in doubt: configure a code specialist for the code role, keep something lighter for fast, and set ANTHROPIC_API_KEY or OPENAI_API_KEY as a fallback — the router uses the hosted provider automatically when the local one fails or trips its circuit breaker.

```bash
forge config set models.code    deepseek-coder:6.7b
forge config set models.planner qwen2.5:7b
forge config set models.fast    phi3:mini
export ANTHROPIC_API_KEY=sk-…   # optional fallback
```

Safety model (not optional)

Forge treats safety as load-bearing. These invariants are enforced in code, not convention:

```mermaid
flowchart TB
  classDef ask fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef allow fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef deny  fill:#450a0a,stroke:#f87171,color:#fee2e2,rx:4,ry:4

  REQ[tool invocation] --> CLASSIFY[classify risk × sideEffect × sensitivity]
  CLASSIFY --> SANDBOX{path in sandbox? / cmd allow-listed?}
  SANDBOX -->|no| BLOCK[hard-block · sandbox_violation]:::deny
  SANDBOX -->|yes| GATE{risk × sideEffect}
  GATE -->|low · read| AUTO[auto-allow]:::allow
  GATE -->|med · write| ASK[ask user]:::ask
  GATE -->|high · execute / network| STRICT[ask even with --skip-permissions]:::ask
  ASK --> FLAGS{session flags?}
  FLAGS -->|--allow-shell / --allow-files etc.| AUTO
  FLAGS -->|--non-interactive| DENY[deny silently]:::deny
  FLAGS -->|else| PROMPT[interactive prompt]
  PROMPT -->|allow| AUTO
  PROMPT -->|deny| DENY
  AUTO --> EXEC[execute] --> TRUST[trust calibration<br/>auto-allow after N confirmations<br/>src/permissions/manager.ts]
```
| Invariant | Where |
| --- | --- |
| Instruction precedence: System Safety > Page Rules > Mode Rules > Approved Plan > Project Defaults > User Preferences | src/prompts/assembler.ts |
| Permission model = default deny | src/permissions/manager.ts |
| --skip-permissions skips routine prompts only; critical/destructive still ask | src/permissions/risk.ts |
| Retry cap = 3, then debugger escalates | src/core/loop.ts |
| Hard limits: maxSteps=50 · maxToolCalls=100 · maxRuntimeSeconds=600 | src/config/schema.ts |
| Untrusted content (web / MCP / retrieved) fenced as data, never instructions | src/security/injection.ts |
| Secrets redacted before every log, session entry, and prompt | src/security/redact.ts |
| Scoped filesystem sandbox; symlink-escape-proof via realpath | src/sandbox/fs.ts |
| Destructive shell commands blocked (rm -rf /, sudo, fork bombs, curl-to-shell) | src/sandbox/shell.ts |
| Credentials in OS keychain (macOS / libsecret / DPAPI) + AES-GCM fallback | src/keychain/ |
| Release artefacts: SHA-256 + Ed25519 signature verification | src/release/ |
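The "symlink-escape-proof via realpath" invariant above is a well-known pattern: resolve symlinks first, then require the result to stay under the project root. A minimal sketch, with `ensureInSandbox` as an illustrative name rather than Forge's real API:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export function ensureInSandbox(root: string, candidate: string): string {
  const realRoot = fs.realpathSync(root);
  const resolved = path.resolve(realRoot, candidate);
  // Walk up to the nearest existing ancestor so symlink escapes are caught
  // even for paths that do not exist yet.
  let probe = resolved;
  while (!fs.existsSync(probe)) probe = path.dirname(probe);
  const real = fs.realpathSync(probe);
  // Compare against root + separator to reject sibling dirs like /root-evil.
  if (real !== realRoot && !real.startsWith(realRoot + path.sep)) {
    throw new Error(`sandbox_violation: ${candidate} escapes ${root}`);
  }
  return resolved;
}
```

A naive `startsWith` on the raw path is not enough: a symlink inside the sandbox pointing at `/etc` would pass a string check but fails the realpath check here.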

Modes

```mermaid
flowchart LR
  classDef ro fill:#1e293b,stroke:#64748b,color:#cbd5e1,rx:4,ry:4
  classDef rw fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef big fill:#3f1d5c,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4

  FAST[fast · 2 turns]:::rw
  BAL[balanced · 4 turns · default]:::rw
  HEAVY[heavy · 8 turns · 2 validate retries]:::big
  PLAN[plan · 0 turns · no mutations]:::ro
  EXEC[execute · 4 turns]:::rw
  AUDIT[audit · 3 turns · no mutations]:::ro
  DEBUG[debug · 6 turns · 2 validate retries]:::rw
  ARCH[architect · 3 turns]:::big
  OFFLINE[offline-safe · 3 turns · never hosted]:::rw
```

Each mode is an enforceable budget — not a hint to the model. See src/core/mode-policy.ts.


CLI reference

▶ See each surface in action in DEMO.md — REPL walkthrough, forge run one-shots, and the web dashboard.

24 subcommands. Full surface:

```bash
forge                          # REPL (default)
forge init                     # create ~/.forge + project .forge
forge run "<prompt>"           # full agentic loop
forge plan "<prompt>"          # plan-only
forge execute "<prompt>"       # auto-approve + execute
forge resume [taskId]          # resume any prior task (any status)
forge status                   # runtime state
forge doctor                   # health check + role→model mapping
forge task list|search|delete  # task history (SQLite-indexed); delete prompts (or -y)
forge session list|replay <id> # session JSONL inspection
forge model list               # probe all providers
forge config get|set|path      # configuration
forge mcp list|add|remove      # MCP connections
forge skills list|new          # skill management
forge agents list              # custom agents
forge permissions reset|list   # permission grants
forge daemon start|stop|status # optional background process
forge memory {hot|warm|cold}   # memory inspection
forge cost                     # USD spend ledger
forge ui start                 # local dashboard at :7823
forge bundle {pack|unpack}     # offline bundles
forge container up|down        # compose wrapper
forge update [--check|--force] # self-update (REPL also checks on start, cache-gated)
forge migrate                  # DB migrations
forge changelog                # local changelog view
forge dev                      # dev helpers
forge web {search|fetch}       # web tools
forge spec {new|show|diff}     # spec-driven development
```

Common flags (run / plan / execute)

```text
--mode <m>             fast|balanced|heavy|plan|execute|audit|debug|architect|offline-safe
--yes                  auto-approve plan
--skip-permissions     skip routine prompts (high-risk still asked)
--allow-files          pre-approve file writes for this session
--allow-shell          pre-approve shell for this session
--allow-network        pre-approve network tools
--allow-web            pre-approve web search/fetch/browse
--allow-mcp            pre-approve MCP tool calls
--strict               confirm every action
--non-interactive      deny all prompts silently (CI mode)
--deterministic        fixed temperatures for reproducibility
--trace                full trace (implies --debug)
--no-banner            omit startup banner
```

Filesystem layout

```mermaid
flowchart TB
  classDef g fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:4,ry:4
  classDef p fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4

  subgraph GLOBAL["~/.forge  (global)"]
    G1["config.json"]:::g
    G2["instructions.md"]:::g
    G3["skills/*.md"]:::g
    G4["agents/*.md"]:::g
    G5["mcp/*"]:::g
    G6["models/"]:::g
    G7["logs/forge.log"]:::g
    G8["global/index.db  ← SQLite"]:::g
    G9["projects/&lt;hash&gt;/tasks · sessions · events"]:::g
  end

  subgraph PROJECT["./.forge  (per-project)"]
    P1["config.json"]:::p
    P2["instructions.md"]:::p
    P3["skills/  (override global)"]:::p
    P4["agents/"]:::p
    P5["mcp/"]:::p
  end
```

Paths resolved via src/config/xdg.ts — respects XDG_* env vars on Linux.


Skills · Instructions · MCP

Skills — a Markdown file with YAML frontmatter

```markdown
---
name: conventional-commit
description: Enforce Conventional Commits in every commit message.
triggers: [commit, git]
---
When writing commit messages, use Conventional Commits:
  feat(scope): …
  fix(scope): …
  refactor(scope): …
```

Drop into ~/.forge/skills/ (global) or ./.forge/skills/ (project). Project skills override global.
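Reading a skill file like the one above means splitting the YAML frontmatter from the Markdown body. The sketch below hand-rolls that split for the flat keys shown; a real loader would use a proper YAML parser, and `parseSkill` is an illustrative name, not Forge's API:

```typescript
export interface Skill {
  name: string;
  description: string;
  triggers: string[];
  body: string;
}

export function parseSkill(markdown: string): Skill {
  // Frontmatter is everything between the leading and second "---" lines.
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("skill file missing YAML frontmatter");
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // Only handles the flat "[a, b]" list form used in the example above.
  const triggers = (meta.triggers ?? "[]")
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);
  return {
    name: meta.name ?? "",
    description: meta.description ?? "",
    triggers,
    body: match[2].trim(),
  };
}
```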

Instructions

Both ~/.forge/instructions.md and ./.forge/instructions.md are layered into every prompt via src/prompts/assembler.ts. Precedence is: System Safety > Page > Mode > Plan > Project > User.

MCP connections

```bash
forge mcp list
forge mcp add <name> --transport stdio --command "…"
forge mcp add <name> --transport http --url https://… --auth oauth2-pkce
forge mcp status
```

Both stdio and HTTP-stream transports supported. OAuth 2.0 + PKCE or API key auth. Tokens stored in the OS keychain.


Run in a container (Docker or Podman)

Single hardened image (non-root, HEALTHCHECK, OCI labels, ~355 MB) that serves both CLI and UI.

▶ Dashboard demo — forge ui start driving a full task end-to-end (plan approval, streamed model output, follow-up thread). More in DEMO.md.

```bash
# Pull (multi-arch: linux/amd64 + linux/arm64):
docker pull ghcr.io/hoangsonww/forge-agentic-coding-cli:latest

# One-shot CLI:
docker run --rm -it -v forge-home:/data -v "$PWD:/workspace" \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge run "explain this repo"

# Dashboard:
docker run --rm -p 7823:7823 -v forge-home:/data \
  ghcr.io/hoangsonww/forge-agentic-coding-cli:latest forge ui start --bind 0.0.0.0

# Full stack (forge + ollama + UI):
docker compose -f docker/docker-compose.yml up -d
# or: podman-compose -f docker/docker-compose.yml up -d
```

Stack topology:

```mermaid
flowchart LR
  classDef c fill:#0c4a6e,stroke:#38bdf8,color:#e0f2fe,rx:4,ry:4
  classDef v fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:4,ry:4

  OLLAMA["ollama<br/>:11434 · healthcheck"]:::c
  UI["forge-ui<br/>:7823 · healthcheck · restart unless-stopped"]:::c
  CORE["forge-core<br/>(on-demand via compose run)"]:::c
  FH[forge-home · named volume]:::v
  OM[ollama-models · named volume]:::v

  OLLAMA --> OM
  UI --> FH
  CORE --> FH
  UI --> OLLAMA
  CORE --> OLLAMA
```

Full install guide: docs/INSTALL.md.


CI/CD pipeline

CI (every PR + push)

```mermaid
flowchart LR
  classDef pass fill:#14532d,stroke:#10b981,color:#d1fae5,rx:4,ry:4
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4

  PR[PR / push] --> FMT["🎨 format"]:::pass
  PR --> LINT["🧹 lint"]:::pass
  PR --> TYPE["🧠 typecheck"]:::pass
  PR --> TEST["🧪 test matrix<br/>Ubuntu + macOS × Node 20 + 22"]:::pass
  TEST --> COV["📈 coverage"]:::pass
  TYPE --> BUILD["🏗️ build"]:::pass
  BUILD --> DOCKER["🐳 docker-build"]:::pass
  PR --> AUDIT["🔍 audit"]:::pass
  FMT & LINT & TYPE & TEST & BUILD & DOCKER & AUDIT & COV --> STATUS["📊 pipeline status<br/>GH step summary · fails if any required job failed"]:::gate
```

Release (on v* tag)

```mermaid
flowchart LR
  classDef gate fill:#1e1b4b,stroke:#a78bfa,color:#ede9fe,rx:4,ry:4
  classDef ship fill:#451a03,stroke:#fb923c,color:#ffedd5,rx:4,ry:4

  TAG[git tag v*] --> GATE["🧪 pre-release gate<br/>build + full test suite"]:::gate
  GATE --> ART["📦 artifacts<br/>5 tarball targets"]:::ship
  GATE --> DOCKP["🐳 docker publish<br/>multi-arch → ghcr.io"]:::ship
  ART --> MAN["📝 manifest + gh-release<br/>ed25519-signed"]:::ship
  MAN --> NPM["📤 npm publish<br/>--provenance --access public"]:::ship
  GATE & ART & DOCKP & MAN & NPM --> RSUM["📊 release status"]:::gate
```

Workflows: .github/workflows/ci.yml, .github/workflows/release.yml, .github/workflows/nightly.yml.

Full versioning & release playbook (SemVer policy, channels, signing, hotfix flow, rollback, built-in updater): RELEASES.md.


Architecture map

```mermaid
flowchart TB
  classDef surface fill:#0f172a,stroke:#38bdf8,color:#f1f5f9,rx:6,ry:6
  classDef core    fill:#082f49,stroke:#38bdf8,color:#e0f2fe,rx:6,ry:6
  classDef agent   fill:#1e293b,stroke:#a78bfa,color:#ede9fe,rx:6,ry:6
  classDef io      fill:#0f172a,stroke:#10b981,color:#d1fae5,rx:6,ry:6
  classDef store   fill:#18181b,stroke:#f59e0b,color:#fef3c7,rx:6,ry:6

  subgraph S[User surfaces]
    CLI["CLI (commander)"]:::surface
    REPL["REPL (raw-mode editor)"]:::surface
    UI["Dashboard (HTTP + WS)"]:::surface
  end

  ORCH["Orchestrator · src/core/orchestrator.ts"]:::core
  LOOP["Agentic loop · src/core/loop.ts"]:::core
  CLS["Classifier"]:::core

  subgraph A[Agents · src/agents]
    PL[planner]:::agent
    AR[architect]:::agent
    EX[executor]:::agent
    RV[reviewer]:::agent
    DB[debugger]:::agent
    ME[memory]:::agent
  end

  subgraph I[I/O surfaces]
    TOOLS["18 tools · src/tools"]:::io
    MODELS["6 providers · src/models"]:::io
    PERM["Permissions"]:::io
    SAND["Sandbox (fs + shell)"]:::io
    MCP["MCP bridge"]:::io
  end

  subgraph P[Durable state]
    TASKS[tasks/*.json]:::store
    SESS[sessions/*.jsonl]:::store
    CONV[conversations/*.jsonl]:::store
    IDX[SQLite index]:::store
    MEM["memory/{hot,warm,cold,learning}"]:::store
  end

  CLI --> ORCH
  REPL --> ORCH
  UI --> ORCH
  ORCH --> CLS --> LOOP
  LOOP --> PL --> EX --> RV
  RV --> LOOP
  LOOP --> AR & DB & ME
  EX --> TOOLS
  TOOLS --> PERM & SAND & MCP
  PL --> MODELS
  EX --> MODELS
  LOOP --> TASKS & SESS & CONV & IDX
  ME --> MEM
```

Full map with every subsystem explained: docs/ARCHITECTURE.md.

Executor turn budget per mode

```mermaid
xychart-beta
  title "Executor turns per mode (hard runtime cap)"
  x-axis ["plan", "fast", "audit", "architect", "offline-safe", "balanced", "execute", "debug", "heavy"]
  y-axis "turns" 0 --> 8
  bar [1, 2, 3, 3, 3, 4, 4, 6, 8]
```

Development

```bash
git clone https://github.com/hoangsonww/Forge-Agentic-Coding-CLI && cd Forge-Agentic-Coding-CLI
npm install
npm run build             # tsc + copy-assets
npm test                  # 548 tests across 97 files; all must pass
./bin/forge.js doctor
```
| Task | Command |
| --- | --- |
| Build | `npm run build` |
| Watch | `npm run build:watch` |
| Tests | `npm test` |
| One test file | `npx vitest run test/unit/<file>.test.ts` |
| Coverage | `npm run test:coverage` |
| Typecheck | `npm run typecheck` |
| Lint / format | `npm run lint` · `npm run format` · `npm run format:check` |
| Metrics | `bash scripts/metrics.sh` |
| Docker | `docker build -f docker/Dockerfile -t forge/core:dev .` |
| REPL | `./bin/forge.js` |
| Dashboard | `./bin/forge.js ui start` |

Full guide: docs/SETUP.md.

Measured performance (reproduce with the commands shown)

| Target | Measured | How |
| --- | --- | --- |
| `forge --help` cold-start | 238 ms | `time node bin/forge.js --help` |
| `forge doctor` cold-start | 173 ms | `time node bin/forge.js doctor --no-banner` |
| UI app.js uncompressed | 89 KB | `wc -c src/ui/public/app.js` |
| Landing index.html | 25 KB, self-contained, zero CDN | `wc -c index.html` |
| Full test suite | ~3.3 s wall-clock | `npx vitest run` |
| Container image | ~355 MB multi-arch non-root | `docker images` |

Agent-facing context

If you're a code-writing agent (Claude Code, Codex, Cursor, Aider, Cline, Continue, …) working in this repo, start here:

  • CLAUDE.md β€” Claude Code / Claude-family context
  • AGENTS.md β€” OpenAI AGENTS.md convention (used by Codex and most others)

Both files carry: canonical commands, hot paths, conventions, performance posture, security posture, and pre-completion checklist.


License

MIT. See LICENSE for more details.


Son Nguyen · sonnguyenhoang.com · github.com/hoangsonww

Thank you for checking out Forge! If you have any questions, feedback, or want to contribute, please open an issue or a pull request.

About

🦄 Forge - a local-first, multi-agent software-engineering runtime that runs Claude Code / Codex-style agentic workflows entirely on your own machine via Ollama, llama.cpp, vLLM, and LM Studio (cloud models optional). Ships 18 sandboxed tools, 6 model providers, and a full REPL + UI dashboard in a single Node CLI - no telemetry, no lock-in.
