scarson · scarson · Apr 16, 2026 · Apr 21, 2026 · Apr 21, 2026 · Apr 21, 2026
diff --git a/.github/workflows/windows-smoke.yml b/.github/workflows/windows-smoke.yml
@@ -0,0 +1,88 @@
+# Windows Smoke CI — Phase 1 of the phased rollout in docs/designs/WINDOWS_CI.md
+#
+# Answers one question per run: "does the code path through a Windows-critical
+# module actually run on Windows." That's deliberately a lower bar than "does
+# every test pass" — it catches the class of bugs where Linux/macOS CI runs
+# green but a Windows user immediately hits ENOENT / "browse binary not found"
+# / silent mislocations of ~/.gstack/ state.
+#
+# Coverage catch list (see RFC for full reasoning):
+#   - Build fails to produce .exe on Windows              (catches #1013 / #1024)
+#   - Binary-resolution probes wrong filename             (catches #1118 / #1094)
+#   - Shebang bash script spawn fails                     (catches #1119)
+#   - Sensitive files written without ACL restriction     (catches #1121)
+#   - { mode: 0o600 } silently ignored on Windows         (catches Pre-#1121 state)
+#
+# Miss: #1120-style home-directory fallback — no direct unit test. RFC
+# proposes adding one as a follow-on.
+name: windows-smoke
+on:
+  pull_request:
+    branches: [main]
+    paths:
+      - 'browse/**'
+      - 'make-pdf/**'
+      - 'design/**'
+      - 'scripts/**'
+      - 'bin/**'
+      - 'package.json'
+      - 'bun.lockb'
+      - '.github/workflows/windows-smoke.yml'
+  push:
+    branches: [main]
+    paths:
+      - 'browse/**'
+      - 'make-pdf/**'
+      - 'design/**'
+      - 'scripts/**'
+      - 'bin/**'
+      - 'package.json'
+      - 'bun.lockb'
+  workflow_dispatch:
+
+concurrency:
+  group: windows-smoke-${{ github.head_ref || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  smoke:
+    runs-on: windows-latest
+    timeout-minutes: 10
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+
+      - name: Install dependencies
+        run: bun install --frozen-lockfile
+
+      - name: Build binaries
+        run: bun run build
+
+      - name: Assert Windows binary layout
+        shell: pwsh
+        run: |
+          $missing = @()
+          foreach ($p in @(
+            'browse/dist/browse.exe',
+            'browse/dist/find-browse.exe',
+            'browse/dist/server-node.mjs',
+            'make-pdf/dist/pdf.exe',
+            'design/dist/design.exe'
+          )) { if (-not (Test-Path $p)) { $missing += $p } }
+          if ($missing.Count -gt 0) {
+            Write-Error "Missing build artifacts: $($missing -join ', ')"
+            exit 1
+          }
+
+
+      - name: Windows-specific unit tests
+        # Single bun test invocation with all files so a failure in any
+        # file correctly fails the step. Separate invocations + default
+        # PowerShell error-handling would mask all-but-the-last failure.
+        run: bun test browse/test/security.test.ts browse/test/file-permissions.test.ts browse/test/home-dir-resolution.test.ts make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts
+
+      - name: make-pdf render smoke
+        run: bun test make-pdf/test/render.test.ts
diff --git a/.gitignore b/.gitignore
@@ -13,6 +13,8 @@ bin/gstack-global-discover
 .slate/
 .cursor/
 .openclaw/
+.hermes/
+.gbrain/
 .context/
 extension/.auth.json
 .gstack-worktrees/

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -209,6 +209,8 @@ Templates contain the workflows, tips, and examples that require human judgment.
 | `{{DESIGN_SETUP}}` | `resolvers/design.ts` | Discovery pattern for `$D` design binary, mirrors `{{BROWSE_SETUP}}` |
 | `{{DESIGN_SHOTGUN_LOOP}}` | `resolvers/design.ts` | Shared comparison board feedback loop for /design-shotgun, /plan-design-review, /design-consultation |
 | `{{UX_PRINCIPLES}}` | `resolvers/design.ts` | User behavioral foundations (scanning, satisficing, goodwill reservoir, trunk test) for /design-html, /design-shotgun, /design-review, /plan-design-review |
+| `{{GBRAIN_CONTEXT_LOAD}}` | `resolvers/gbrain.ts` | Brain-first context search with keyword extraction, health awareness, and data-research routing. Injected into 10 brain-aware skills. Suppressed on non-brain hosts. |
+| `{{GBRAIN_SAVE_RESULTS}}` | `resolvers/gbrain.ts` | Post-skill brain persistence with entity enrichment, throttle handling, and per-skill save instructions. 8 skill-specific save formats. |
 
 This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,25 @@
 # Changelog
 
+## [0.18.0.0] - 2026-04-15
+
+### Added
+- **Confusion Protocol.** Every workflow skill now has an inline ambiguity gate. When Claude hits a decision that could go two ways (which architecture? which data model? destructive operation with unclear scope?), it stops and asks instead of guessing. Scoped to high-stakes decisions only, so it doesn't slow down routine coding. Addresses Karpathy's #1 AI coding failure mode.
+- **Hermes host support.** gstack now generates skill docs for [Hermes Agent](https://github.com/nousresearch/hermes-agent) with proper tool rewrites (`terminal`, `read_file`, `patch`, `delegate_task`). `./setup --host hermes` prints integration instructions.
+- **GBrain host + brain-first resolver.** GBrain is a "mod" for gstack. When installed, your coding skills become brain-aware: they search your brain for relevant context before starting and save results to your brain after finishing. 10 skills are now brain-aware: /office-hours, /investigate, /plan-ceo-review, /retro, /ship, /qa, /design-review, /plan-eng-review, /cso, and /design-consultation. Compatible with GBrain >= v0.10.0.
+- **GBrain v0.10.0 integration.** Agent instructions now use `gbrain search` (fast keyword lookup) instead of `gbrain query` (expensive hybrid). Every command shows full CLI syntax with `--title`, `--tags`, and heredoc examples. Keyword extraction guidance helps agents search effectively. Entity enrichment auto-creates stub pages for people and companies mentioned in skill output. Throttle errors are named so agents can detect and handle them. A preamble health check runs `gbrain doctor --fast --json` at session start and names failing checks when the brain is degraded.
+- **Skill triggers for GBrain router.** All 38 skill templates now include `triggers:` arrays in their frontmatter, multi-word keywords like "debug this", "ship it", "brainstorm this". These power GBrain's RESOLVER.md skill router and pass `checkResolvable()` validation. Distinct from `voice-triggers:` (speech-to-text aliases).
+- **Hermes brain support.** Hermes agents with GBrain installed as a mod now get brain features automatically. The resolver fallback logic ("if GBrain is not available, proceed without") handles non-GBrain Hermes installs gracefully.
+- **slop:diff in /review.** Every code review now runs `bun run slop:diff` as an advisory diagnostic, catching AI code quality issues (empty catches, redundant abstractions, overcomplicated patterns) before they land. Informational only, never blocking.
+- **Karpathy compatibility.** README now positions gstack as the workflow enforcement layer for [Karpathy-style CLAUDE.md rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars). Maps each failure mode to the gstack skill that addresses it.
+
+### Changed
+- **CEO review HARD GATE reinforcement.** "Do NOT make any code changes. Review only." now repeats at every STOP point (12 locations), not just the top. Prompt repetition measurably reduces the "starts implementing" failure mode.
+- **Office-hours design doc visibility.** After writing the design doc, the skill now prints the full path so downstream skills (/plan-ceo-review, /plan-eng-review) can find it.
+- **Investigate investigation history.** Each investigation now logs to the learnings system with `type: "investigation"` and affected file paths. Future investigations on the same files surface prior root causes automatically. Recurring bugs in the same area = architectural smell.
+- **Retro non-git context.** If `~/.gstack/retro-context.md` exists, the retro now reads it for meeting notes, calendar events, and decisions that don't appear in git history.
+- **Native OpenClaw skills improved.** The 4 hand-crafted ClawHub skills (office-hours, ceo-review, investigate, retro) now mirror the template improvements above.
+- **Host count: 8 to 10.** Hermes and GBrain join Claude, Codex, Factory, Kiro, OpenCode, Slate, Cursor, and OpenClaw.
+
 ## [0.17.0.0] - 2026-04-14
 
 ### Added

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -68,14 +68,15 @@ gstack/
 ├── hosts/           # Typed host configs (one per AI agent)
 │   ├── claude.ts    # Primary host config
 │   ├── codex.ts, factory.ts, kiro.ts  # Existing hosts
-│   ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts  # New hosts
+│   ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts  # IDE hosts
+│   ├── hermes.ts, gbrain.ts  # Agent runtime hosts
 │   └── index.ts     # Registry: exports all, derives Host type
 ├── scripts/         # Build + DX tooling
 │   ├── gen-skill-docs.ts  # Template → SKILL.md generator (config-driven)
 │   ├── host-config.ts     # HostConfig interface + validator
 │   ├── host-config-export.ts  # Shell bridge for setup script
 │   ├── host-adapters/     # Host-specific adapters (OpenClaw tool mapping)
-│   ├── resolvers/   # Template resolver modules (preamble, design, review, etc.)
+│   ├── resolvers/   # Template resolver modules (preamble, design, review, gbrain, etc.)
 │   ├── skill-check.ts     # Health dashboard
 │   └── dev-skill.ts       # Watch mode
 ├── test/            # Skill validation + eval tests

diff --git a/README.md b/README.md
@@ -110,7 +110,7 @@ These are conversational skills. Your OpenClaw agent runs them directly via chat
 
 ### Other AI Agents
 
-gstack works on 8 AI coding agents, not just Claude. Setup auto-detects which
+gstack works on 10 AI coding agents, not just Claude. Setup auto-detects which
 agents you have installed:
 
 ```bash
@@ -128,6 +128,8 @@ Or target a specific agent with `./setup --host <name>`:
 | Factory Droid | `--host factory` | `~/.factory/skills/gstack-*/` |
 | Slate | `--host slate` | `~/.slate/skills/gstack-*/` |
 | Kiro | `--host kiro` | `~/.kiro/skills/gstack-*/` |
+| Hermes | `--host hermes` | `~/.hermes/skills/gstack-*/` |
+| GBrain (mod) | `--host gbrain` | `~/.gbrain/skills/gstack-*/` |
 
 **Want to add support for another agent?** See [docs/ADDING_A_HOST.md](docs/ADDING_A_HOST.md).
 It's one TypeScript config file, zero code changes.
@@ -236,6 +238,10 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-
 
 **[Deep dives with examples and philosophy for every skill →](docs/skills.md)**
 
+### Karpathy's four failure modes? Already covered.
+
+Andrej Karpathy's [AI coding rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars) nail four failure modes: wrong assumptions, overcomplexity, orthogonal edits, imperative over declarative. gstack's workflow skills enforce all four. `/office-hours` forces assumptions into the open before code is written. The Confusion Protocol stops Claude from guessing on architectural decisions. `/review` catches unnecessary complexity and drive-by edits. `/ship` transforms tasks into verifiable goals with test-first execution. If you already use Karpathy-style CLAUDE.md rules, gstack is the workflow enforcement layer that makes them stick across entire sprints, not just single prompts.
+
 ## Parallel sprints
 
 gstack works well with one sprint. It gets interesting with ten running at once.

diff --git a/SKILL.md b/SKILL.md
@@ -11,6 +11,11 @@ allowed-tools:
   - Bash
   - Read
   - AskUserQuestion
+triggers:
+  - browse this page
+  - take a screenshot
+  - navigate to url
+  - inspect the page
 
 ---
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
@@ -255,6 +260,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
 - Focus on completing the task and reporting results via prose output.
 - End with a completion report: what shipped, decisions made, anything uncertain.
 
+
+
 ## Voice
 
 **Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.

diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl
@@ -11,6 +11,11 @@ allowed-tools:
   - Bash
   - Read
   - AskUserQuestion
+triggers:
+  - browse this page
+  - take a screenshot
+  - navigate to url
+  - inspect the page
 
 ---
 

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.17.0.0
+0.18.0.0
diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md
@@ -13,6 +13,10 @@ description: |
   gauntlet without answering 15-30 intermediate questions. (gstack)
   Voice triggers (speech-to-text aliases): "auto plan", "automatic review".
 benefits-from: [office-hours]
+triggers:
+  - run all reviews
+  - automatic review pipeline
+  - auto plan review
 allowed-tools:
   - Bash
   - Read
@@ -265,6 +269,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
 - Focus on completing the task and reporting results via prose output.
 - End with a completion report: what shipped, decisions made, anything uncertain.
 
+
+
 ## Voice
 
 You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -383,6 +389,19 @@ AI makes completeness near-free. Always recommend the complete option over short
 
 Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
 
+## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.
+
 ## Repo Ownership — See Something, Say Something
 
 `REPO_MODE` controls how to handle issues outside your branch:

diff --git a/autoplan/SKILL.md.tmpl b/autoplan/SKILL.md.tmpl
@@ -15,6 +15,10 @@ voice-triggers:
   - "auto plan"
   - "automatic review"
 benefits-from: [office-hours]
+triggers:
+  - run all reviews
+  - automatic review pipeline
+  - auto plan review
 allowed-tools:
   - Bash
   - Read

diff --git a/benchmark/SKILL.md b/benchmark/SKILL.md
@@ -9,6 +9,10 @@ description: |
   Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
   "bundle size", "load time". (gstack)
   Voice triggers (speech-to-text aliases): "speed test", "check performance".
+triggers:
+  - performance benchmark
+  - check page speed
+  - detect performance regression
 allowed-tools:
   - Bash
   - Read
@@ -258,6 +262,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
 - Focus on completing the task and reporting results via prose output.
 - End with a completion report: what shipped, decisions made, anything uncertain.
 
+
+
 ## Voice
 
 **Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.

diff --git a/benchmark/SKILL.md.tmpl b/benchmark/SKILL.md.tmpl
@@ -11,6 +11,10 @@ description: |
 voice-triggers:
   - "speed test"
   - "check performance"
+triggers:
+  - performance benchmark
+  - check page speed
+  - detect performance regression
 allowed-tools:
   - Bash
   - Read

diff --git a/bin/gstack-settings-hook b/bin/gstack-settings-hook
@@ -54,7 +54,7 @@ case "$ACTION" in
     " 2>/dev/null
     ;;
   remove)
-    [ -f "$SETTINGS_FILE" ] || exit 0
+    [ -f "$SETTINGS_FILE" ] || exit 1
     GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e "
       const fs = require('fs');
       const settingsPath = process.env.GSTACK_SETTINGS_PATH;

diff --git a/browse/SKILL.md b/browse/SKILL.md
@@ -9,6 +9,10 @@ description: |
   ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
   user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
   site", "take a screenshot", or "dogfood this". (gstack)
+triggers:
+  - browse a page
+  - headless browser
+  - take page screenshot
 allowed-tools:
   - Bash
   - Read
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
 - Focus on completing the task and reporting results via prose output.
 - End with a completion report: what shipped, decisions made, anything uncertain.
 
+
+
 ## Voice
 
 **Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.

diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl
@@ -9,6 +9,10 @@ description: |
   ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
   user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
   site", "take a screenshot", or "dogfood this". (gstack)
+triggers:
+  - browse a page
+  - headless browser
+  - take page screenshot
 allowed-tools:
   - Bash
   - Read

diff --git a/canary/SKILL.md b/canary/SKILL.md
@@ -14,6 +14,10 @@ allowed-tools:
   - Write
   - Glob
   - AskUserQuestion
+triggers:
+  - monitor after deploy
+  - canary check
+  - watch for errors post-deploy
 ---
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
 - Focus on completing the task and reporting results via prose output.
 - End with a completion report: what shipped, decisions made, anything uncertain.
 
+
+
 ## Voice
 
 You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -375,6 +381,19 @@ AI makes completeness near-free. Always recommend the complete option over short
 
 Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
 
+## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.
+
 ## Completion Status Protocol
 
 When completing a skill workflow, report status using one of:

diff --git a/canary/SKILL.md.tmpl b/canary/SKILL.md.tmpl
@@ -14,6 +14,10 @@ allowed-tools:
   - Write
   - Glob
   - AskUserQuestion
+triggers:
+  - monitor after deploy
+  - canary check
+  - watch for errors post-deploy
 ---
 
 {{PREAMBLE}}

diff --git a/careful/SKILL.md b/careful/SKILL.md
@@ -7,6 +7,10 @@ description: |
   User can override each warning. Use when touching prod, debugging live systems,
   or working in a shared environment. Use when asked to "be careful", "safety mode",
   "prod mode", or "careful mode". (gstack)
+triggers:
+  - be careful
+  - warn before destructive
+  - safety mode
 allowed-tools:
   - Bash
   - Read