88 changes: 88 additions & 0 deletions .github/workflows/windows-smoke.yml
@@ -0,0 +1,88 @@
# Windows Smoke CI — Phase 1 of the phased rollout in docs/designs/WINDOWS_CI.md
#
# Answers one question per run: "does the code path through a Windows-critical
# module actually run on Windows." That's deliberately a lower bar than "does
# every test pass" — it catches the class of bugs where Linux/macOS CI runs
# green but a Windows user immediately hits ENOENT / "browse binary not found"
# / silent mislocations of ~/.gstack/ state.
#
# Coverage catch list (see RFC for full reasoning):
# - Build fails to produce .exe on Windows (catches #1013 / #1024)
# - Binary-resolution probes wrong filename (catches #1118 / #1094)
# - Shebang bash script spawn fails (catches #1119)
# - Sensitive files written without ACL restriction (catches #1121)
# - { mode: 0o600 } silently ignored on Windows (catches Pre-#1121 state)
#
# Miss: #1120-style home-directory fallback — no direct unit test. RFC
# proposes adding one as a follow-on.
name: windows-smoke
on:
  pull_request:
    branches: [main]
    paths:
      - 'browse/**'
      - 'make-pdf/**'
      - 'design/**'
      - 'scripts/**'
      - 'bin/**'
      - 'package.json'
      - 'bun.lockb'
      - '.github/workflows/windows-smoke.yml'
  push:
    branches: [main]
    paths:
      - 'browse/**'
      - 'make-pdf/**'
      - 'design/**'
      - 'scripts/**'
      - 'bin/**'
      - 'package.json'
      - 'bun.lockb'
  workflow_dispatch:

concurrency:
  group: windows-smoke-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

jobs:
  smoke:
    runs-on: windows-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: latest

      - name: Install dependencies
        run: bun install --frozen-lockfile

      - name: Build binaries
        run: bun run build

      - name: Assert Windows binary layout
        shell: pwsh
        run: |
          $missing = @()
          foreach ($p in @(
            'browse/dist/browse.exe',
            'browse/dist/find-browse.exe',
            'browse/dist/server-node.mjs',
            'make-pdf/dist/pdf.exe',
            'design/dist/design.exe'
          )) { if (-not (Test-Path $p)) { $missing += $p } }
          if ($missing.Count -gt 0) {
            Write-Error "Missing build artifacts: $($missing -join ', ')"
            exit 1
          }


      - name: Windows-specific unit tests
        # Single bun test invocation with all files so a failure in any
        # file correctly fails the step. Separate invocations + default
        # PowerShell error-handling would mask all-but-the-last failure.
        run: bun test browse/test/security.test.ts browse/test/file-permissions.test.ts browse/test/home-dir-resolution.test.ts make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts

      - name: make-pdf render smoke
        run: bun test make-pdf/test/render.test.ts
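The "Assert Windows binary layout" step collects every missing path before failing, rather than stopping at the first one. A minimal TypeScript sketch of that same logic follows. The function name `findMissingArtifacts` and the injected `exists` callback are hypothetical and not part of this PR; only the path list comes from the workflow.

```typescript
// Path list copied from the workflow step above.
const EXPECTED_ARTIFACTS: readonly string[] = [
  "browse/dist/browse.exe",
  "browse/dist/find-browse.exe",
  "browse/dist/server-node.mjs",
  "make-pdf/dist/pdf.exe",
  "design/dist/design.exe",
];

// Hypothetical helper: returns every expected path for which `exists`
// reports false. `exists` is injected so the sketch does not depend on
// any particular filesystem API.
function findMissingArtifacts(
  expected: readonly string[],
  exists: (path: string) => boolean,
): string[] {
  return expected.filter((p) => !exists(p));
}

// Usage with a fake filesystem in which only one binary was built.
const built: string[] = ["browse/dist/browse.exe"];
const missing = findMissingArtifacts(
  EXPECTED_ARTIFACTS,
  (p) => built.indexOf(p) !== -1,
);
if (missing.length > 0) {
  console.log(`Missing build artifacts: ${missing.join(", ")}`);
}
```

Reporting the full list in one message means a broken build surfaces all absent binaries in a single CI run.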
2 changes: 2 additions & 0 deletions .gitignore
@@ -13,6 +13,8 @@ bin/gstack-global-discover
.slate/
.cursor/
.openclaw/
.hermes/
.gbrain/
.context/
extension/.auth.json
.gstack-worktrees/
2 changes: 2 additions & 0 deletions ARCHITECTURE.md
@@ -209,6 +209,8 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{DESIGN_SETUP}}` | `resolvers/design.ts` | Discovery pattern for `$D` design binary, mirrors `{{BROWSE_SETUP}}` |
| `{{DESIGN_SHOTGUN_LOOP}}` | `resolvers/design.ts` | Shared comparison board feedback loop for /design-shotgun, /plan-design-review, /design-consultation |
| `{{UX_PRINCIPLES}}` | `resolvers/design.ts` | User behavioral foundations (scanning, satisficing, goodwill reservoir, trunk test) for /design-html, /design-shotgun, /design-review, /plan-design-review |
| `{{GBRAIN_CONTEXT_LOAD}}` | `resolvers/gbrain.ts` | Brain-first context search with keyword extraction, health awareness, and data-research routing. Injected into 10 brain-aware skills. Suppressed on non-brain hosts. |
| `{{GBRAIN_SAVE_RESULTS}}` | `resolvers/gbrain.ts` | Post-skill brain persistence with entity enrichment, throttle handling, and per-skill save instructions. 8 skill-specific save formats. |

This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.

20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## [0.18.0.0] - 2026-04-15

### Added
- **Confusion Protocol.** Every workflow skill now has an inline ambiguity gate. When Claude hits a decision that could go two ways (which architecture? which data model? destructive operation with unclear scope?), it stops and asks instead of guessing. Scoped to high-stakes decisions only, so it doesn't slow down routine coding. Addresses Karpathy's #1 AI coding failure mode.
- **Hermes host support.** gstack now generates skill docs for [Hermes Agent](https://github.com/nousresearch/hermes-agent) with proper tool rewrites (`terminal`, `read_file`, `patch`, `delegate_task`). `./setup --host hermes` prints integration instructions.
- **GBrain host + brain-first resolver.** GBrain is a "mod" for gstack. When installed, your coding skills become brain-aware: they search your brain for relevant context before starting and save results to your brain after finishing. 10 skills are now brain-aware: /office-hours, /investigate, /plan-ceo-review, /retro, /ship, /qa, /design-review, /plan-eng-review, /cso, and /design-consultation. Compatible with GBrain >= v0.10.0.
- **GBrain v0.10.0 integration.** Agent instructions now use `gbrain search` (fast keyword lookup) instead of `gbrain query` (expensive hybrid). Every command shows full CLI syntax with `--title`, `--tags`, and heredoc examples. Keyword extraction guidance helps agents search effectively. Entity enrichment auto-creates stub pages for people and companies mentioned in skill output. Throttle errors are named so agents can detect and handle them. A preamble health check runs `gbrain doctor --fast --json` at session start and names failing checks when the brain is degraded.
- **Skill triggers for GBrain router.** All 38 skill templates now include `triggers:` arrays in their frontmatter: multi-word keywords like "debug this", "ship it", "brainstorm this". These power GBrain's RESOLVER.md skill router and pass `checkResolvable()` validation. Distinct from `voice-triggers:` (speech-to-text aliases).
- **Hermes brain support.** Hermes agents with GBrain installed as a mod now get brain features automatically. The resolver fallback logic ("if GBrain is not available, proceed without") handles non-GBrain Hermes installs gracefully.
- **slop:diff in /review.** Every code review now runs `bun run slop:diff` as an advisory diagnostic, catching AI code quality issues (empty catches, redundant abstractions, overcomplicated patterns) before they land. Informational only, never blocking.
- **Karpathy compatibility.** README now positions gstack as the workflow enforcement layer for [Karpathy-style CLAUDE.md rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars). Maps each failure mode to the gstack skill that addresses it.
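
To illustrate how multi-word `triggers:` phrases could drive skill routing, here is a minimal sketch. It assumes a case-insensitive substring match. The `SkillFrontmatter` shape and the `matchSkills` function are hypothetical and do not represent GBrain's actual router or `checkResolvable()`. The trigger phrases are taken from the skill frontmatter added in this PR.

```typescript
// Hypothetical shape: only the `triggers` field name comes from the changelog.
interface SkillFrontmatter {
  name: string;
  triggers: string[];
}

// Return the names of skills with at least one trigger phrase contained in
// the request (case-insensitive substring match). This matching rule is an
// assumption for illustration.
function matchSkills(request: string, skills: SkillFrontmatter[]): string[] {
  const normalized = request.toLowerCase();
  return skills
    .filter((s) =>
      s.triggers.some((t) => normalized.indexOf(t.toLowerCase()) !== -1),
    )
    .map((s) => s.name);
}

const skills: SkillFrontmatter[] = [
  {
    name: "canary",
    triggers: ["monitor after deploy", "canary check", "watch for errors post-deploy"],
  },
  {
    name: "benchmark",
    triggers: ["performance benchmark", "check page speed", "detect performance regression"],
  },
  {
    name: "careful",
    triggers: ["be careful", "warn before destructive", "safety mode"],
  },
];

const matched = matchSkills("Please run a canary check on staging", skills);
console.log(matched);
```

Multi-word phrases make accidental matches less likely than single keywords such as "check", which would hit both `canary` and `benchmark` here.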

### Changed
- **CEO review HARD GATE reinforcement.** "Do NOT make any code changes. Review only." now repeats at every STOP point (12 locations), not just the top. Prompt repetition measurably reduces the "starts implementing" failure mode.
- **Office-hours design doc visibility.** After writing the design doc, the skill now prints the full path so downstream skills (/plan-ceo-review, /plan-eng-review) can find it.
- **Investigate investigation history.** Each investigation now logs to the learnings system with `type: "investigation"` and affected file paths. Future investigations on the same files surface prior root causes automatically. Recurring bugs in the same area = architectural smell.
- **Retro non-git context.** If `~/.gstack/retro-context.md` exists, the retro now reads it for meeting notes, calendar events, and decisions that don't appear in git history.
- **Native OpenClaw skills improved.** The 4 hand-crafted ClawHub skills (office-hours, ceo-review, investigate, retro) now mirror the template improvements above.
- **Host count: 8 to 10.** Hermes and GBrain join Claude, Codex, Factory, Kiro, OpenCode, Slate, Cursor, and OpenClaw.
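
The investigation-history entry above describes a lookup keyed on affected file paths. A rough sketch of that idea follows. Only `type: "investigation"` and the notion of affected file paths come from the changelog; the `LearningEntry` field names, the `priorRootCauses` function, and the sample data are hypothetical.

```typescript
// Hypothetical record shape; field names other than `type` are assumed.
interface LearningEntry {
  type: string;
  files: string[];
  rootCause: string;
}

// Collect root causes from earlier investigations that touched any of `files`.
function priorRootCauses(entries: LearningEntry[], files: string[]): string[] {
  return entries
    .filter((e) => e.type === "investigation")
    .filter((e) => e.files.some((f) => files.indexOf(f) !== -1))
    .map((e) => e.rootCause);
}

// Illustrative data only; the paths and causes are made up.
const learningsLog: LearningEntry[] = [
  { type: "investigation", files: ["src/a.ts"], rootCause: "stale cache key" },
  { type: "pattern", files: ["src/a.ts"], rootCause: "not an investigation" },
  { type: "investigation", files: ["src/b.ts"], rootCause: "missing await" },
];

const recalled = priorRootCauses(learningsLog, ["src/a.ts", "src/c.ts"]);
console.log(recalled);
```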

## [0.17.0.0] - 2026-04-14

### Added
5 changes: 3 additions & 2 deletions CLAUDE.md
@@ -68,14 +68,15 @@ gstack/
├── hosts/ # Typed host configs (one per AI agent)
│ ├── claude.ts # Primary host config
│ ├── codex.ts, factory.ts, kiro.ts # Existing hosts
-│ ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts # New hosts
+│ ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts # IDE hosts
+│ ├── hermes.ts, gbrain.ts # Agent runtime hosts
│ └── index.ts # Registry: exports all, derives Host type
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator (config-driven)
│ ├── host-config.ts # HostConfig interface + validator
│ ├── host-config-export.ts # Shell bridge for setup script
│ ├── host-adapters/ # Host-specific adapters (OpenClaw tool mapping)
-│ ├── resolvers/ # Template resolver modules (preamble, design, review, etc.)
+│ ├── resolvers/ # Template resolver modules (preamble, design, review, gbrain, etc.)
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
8 changes: 7 additions & 1 deletion README.md
@@ -110,7 +110,7 @@ These are conversational skills. Your OpenClaw agent runs them directly via chat

### Other AI Agents

-gstack works on 8 AI coding agents, not just Claude. Setup auto-detects which
+gstack works on 10 AI coding agents, not just Claude. Setup auto-detects which
agents you have installed:

```bash
@@ -128,6 +128,8 @@ Or target a specific agent with `./setup --host <name>`:
| Factory Droid | `--host factory` | `~/.factory/skills/gstack-*/` |
| Slate | `--host slate` | `~/.slate/skills/gstack-*/` |
| Kiro | `--host kiro` | `~/.kiro/skills/gstack-*/` |
| Hermes | `--host hermes` | `~/.hermes/skills/gstack-*/` |
| GBrain (mod) | `--host gbrain` | `~/.gbrain/skills/gstack-*/` |

**Want to add support for another agent?** See [docs/ADDING_A_HOST.md](docs/ADDING_A_HOST.md).
It's one TypeScript config file, zero code changes.
@@ -236,6 +238,10 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-

**[Deep dives with examples and philosophy for every skill →](docs/skills.md)**

### Karpathy's four failure modes? Already covered.

Andrej Karpathy's [AI coding rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars) nail four failure modes: wrong assumptions, overcomplexity, orthogonal edits, imperative over declarative. gstack's workflow skills enforce all four. `/office-hours` forces assumptions into the open before code is written. The Confusion Protocol stops Claude from guessing on architectural decisions. `/review` catches unnecessary complexity and drive-by edits. `/ship` transforms tasks into verifiable goals with test-first execution. If you already use Karpathy-style CLAUDE.md rules, gstack is the workflow enforcement layer that makes them stick across entire sprints, not just single prompts.

## Parallel sprints

gstack works well with one sprint. It gets interesting with ten running at once.
7 changes: 7 additions & 0 deletions SKILL.md
@@ -11,6 +11,11 @@ allowed-tools:
- Bash
- Read
- AskUserQuestion
triggers:
- browse this page
- take a screenshot
- navigate to url
- inspect the page

---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
@@ -255,6 +260,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.



## Voice

**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
5 changes: 5 additions & 0 deletions SKILL.md.tmpl
@@ -11,6 +11,11 @@ allowed-tools:
- Bash
- Read
- AskUserQuestion
triggers:
- browse this page
- take a screenshot
- navigate to url
- inspect the page

---

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.17.0.0
+0.18.0.0
19 changes: 19 additions & 0 deletions autoplan/SKILL.md
@@ -13,6 +13,10 @@ description: |
gauntlet without answering 15-30 intermediate questions. (gstack)
Voice triggers (speech-to-text aliases): "auto plan", "automatic review".
benefits-from: [office-hours]
triggers:
- run all reviews
- automatic review pipeline
- auto plan review
allowed-tools:
- Bash
- Read
@@ -265,6 +269,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.



## Voice

You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -383,6 +389,19 @@ AI makes completeness near-free. Always recommend the complete option over short

Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).

## Confusion Protocol

When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly

STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.

This does NOT apply to routine coding, small features, or obvious changes.

## Repo Ownership — See Something, Say Something

`REPO_MODE` controls how to handle issues outside your branch:
4 changes: 4 additions & 0 deletions autoplan/SKILL.md.tmpl
@@ -15,6 +15,10 @@ voice-triggers:
- "auto plan"
- "automatic review"
benefits-from: [office-hours]
triggers:
- run all reviews
- automatic review pipeline
- auto plan review
allowed-tools:
- Bash
- Read
6 changes: 6 additions & 0 deletions benchmark/SKILL.md
@@ -9,6 +9,10 @@ description: |
Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
"bundle size", "load time". (gstack)
Voice triggers (speech-to-text aliases): "speed test", "check performance".
triggers:
- performance benchmark
- check page speed
- detect performance regression
allowed-tools:
- Bash
- Read
@@ -258,6 +262,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.



## Voice

**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
4 changes: 4 additions & 0 deletions benchmark/SKILL.md.tmpl
@@ -11,6 +11,10 @@ description: |
voice-triggers:
- "speed test"
- "check performance"
triggers:
- performance benchmark
- check page speed
- detect performance regression
allowed-tools:
- Bash
- Read
2 changes: 1 addition & 1 deletion bin/gstack-settings-hook
@@ -54,7 +54,7 @@ case "$ACTION" in
" 2>/dev/null
;;
remove)
-[ -f "$SETTINGS_FILE" ] || exit 0
+[ -f "$SETTINGS_FILE" ] || exit 1
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e "
const fs = require('fs');
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
6 changes: 6 additions & 0 deletions browse/SKILL.md
@@ -9,6 +9,10 @@ description: |
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this". (gstack)
triggers:
- browse a page
- headless browser
- take page screenshot
allowed-tools:
- Bash
- Read
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.



## Voice

**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
4 changes: 4 additions & 0 deletions browse/SKILL.md.tmpl
@@ -9,6 +9,10 @@ description: |
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this". (gstack)
triggers:
- browse a page
- headless browser
- take page screenshot
allowed-tools:
- Bash
- Read
19 changes: 19 additions & 0 deletions canary/SKILL.md
@@ -14,6 +14,10 @@ allowed-tools:
- Write
- Glob
- AskUserQuestion
triggers:
- monitor after deploy
- canary check
- watch for errors post-deploy
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.



## Voice

You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -375,6 +381,19 @@ AI makes completeness near-free. Always recommend the complete option over short

Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).

## Confusion Protocol

When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly

STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.

This does NOT apply to routine coding, small features, or obvious changes.

## Completion Status Protocol

When completing a skill workflow, report status using one of:
4 changes: 4 additions & 0 deletions canary/SKILL.md.tmpl
@@ -14,6 +14,10 @@ allowed-tools:
- Write
- Glob
- AskUserQuestion
triggers:
- monitor after deploy
- canary check
- watch for errors post-deploy
---

{{PREAMBLE}}
4 changes: 4 additions & 0 deletions careful/SKILL.md
@@ -7,6 +7,10 @@ description: |
User can override each warning. Use when touching prod, debugging live systems,
or working in a shared environment. Use when asked to "be careful", "safety mode",
"prod mode", or "careful mode". (gstack)
triggers:
- be careful
- warn before destructive
- safety mode
allowed-tools:
- Bash
- Read