Skip to content

Commit 28c2e8b

Browse files
garrytanclaude
andauthored
fix: security hardening + issue triage (v0.8.3) (garrytan#205)
* fix: check for bun before running setup (garrytan#147) Users without bun installed got a cryptic "command not found" error. Now prints a clear message with install instructions. Closes garrytan#147 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via URL validation in browse commands (garrytan#17) Adds validateNavigationUrl() that blocks non-HTTP(S) schemes (file://, javascript:, data:) and cloud metadata endpoints (169.254.169.254, metadata.google.internal). Applied to goto, diff, and newTab commands. Localhost and private IPs remain allowed for local dev QA. Closes garrytan#17 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace eval $(gstack-slug) with source <(...) (garrytan#133) Eliminates unnecessary use of eval across all skill templates and generated files. source <(...) has identical behavior without the shell injection surface. Also hardens gstack-diff-scope usage. Closes garrytan#133 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: rename /debug to /investigate to avoid Claude Code conflict (garrytan#190) Claude Code has a built-in /debug command that shadows the gstack skill. Renaming to /investigate which better reflects the systematic root-cause investigation methodology. Closes garrytan#190 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add unit tests for path validation helpers validateOutputPath() and validateReadPath() are security-critical functions with zero test coverage. Adds 14 tests covering safe paths, traversal attacks, and prefix collision edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.8.3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update /debug → /investigate references in docs CLAUDE.md, README.md, and docs/skills.md still referenced the old /debug skill name after the rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden URL validation against hostname bypasses (Codex P1) Codex review found that metadata IPs could be reached via hex (0xA9FEA9FE), decimal (2852039166), octal, trailing dot, and IPv6 bracket forms. Now normalizes hostnames before checking the blocklist and probes numeric IP representations via URL constructor. Also moves URL validation before page allocation in newTab() to prevent zombie tabs on rejection (Codex P3). 5 new test cases for bypass variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3528336 commit 28c2e8b

45 files changed

Lines changed: 311 additions & 89 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

CHANGELOG.md

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,14 @@
1818
- **`skip_eng_review` respected everywhere.** If you've opted out of eng review globally, the chaining recommendations won't nag you about it.
1919
- **Design review lite now tracks commits too.** The lightweight design check that runs inside `/review` and `/ship` gets the same staleness tracking as full reviews.
2020

21+
### Fixed
22+
23+
- **Browse no longer navigates to dangerous URLs.** `goto`, `diff`, and `newtab` now block `file://`, `javascript:`, `data:` schemes and cloud metadata endpoints (`169.254.169.254`, `metadata.google.internal`). Localhost and private IPs are still allowed for local QA testing. (Closes #17)
24+
- **Setup script tells you what's missing.** Running `./setup` without `bun` installed now shows a clear error with install instructions instead of a cryptic "command not found." (Closes #147)
25+
- **`/debug` renamed to `/investigate`.** Claude Code has a built-in `/debug` command that shadowed the gstack skill. The systematic root-cause debugging workflow now lives at `/investigate`. (Closes #190)
26+
- **Shell injection surface removed.** All skill templates now use `source <(gstack-slug)` instead of `eval $(gstack-slug)`. Same behavior, no `eval`. (Closes #133)
27+
- **25 new security tests.** URL validation (16 tests) and path traversal validation (14 tests) now have dedicated unit test suites covering scheme blocking, metadata IP blocking, directory escapes, and prefix collision edge cases.
28+
2129
## [0.8.2] - 2026-03-19
2230

2331
### Added
@@ -86,7 +94,6 @@ When both `/review` (Claude) and `/codex review` have run, you get a cross-model
8694
### Fixed
8795

8896
- `/debug` and `/office-hours` were completely invisible to natural language — no trigger phrases at all. Now both have full reactive + proactive triggers.
89-
>>>>>>> origin/main
9097

9198
## [0.7.0] - 2026-03-18 — YC Office Hours
9299

@@ -124,7 +131,7 @@ When something is broken and you don't know why, `/debug` is your systematic deb
124131
### Added
125132

126133
- **Every PR touching frontend code now gets a design review automatically.** `/review` and `/ship` apply a 20-item design checklist against changed CSS, HTML, JSX, and view files. Catches AI slop patterns (purple gradients, 3-column icon grids, generic hero copy), typography issues (body text < 16px, blacklisted fonts), accessibility gaps (`outline: none`), and `!important` abuse. Mechanical CSS fixes are auto-applied; design judgment calls ask you first.
127-
- **`gstack-diff-scope` categorizes what changed in your branch.** Run `eval $(gstack-diff-scope main)` and get `SCOPE_FRONTEND=true/false`, `SCOPE_BACKEND`, `SCOPE_PROMPTS`, `SCOPE_TESTS`, `SCOPE_DOCS`, `SCOPE_CONFIG`. Design review uses it to skip silently on backend-only PRs. Ship pre-flight uses it to recommend design review when frontend files are touched.
134+
- **`gstack-diff-scope` categorizes what changed in your branch.** Run `source <(gstack-diff-scope main)` and get `SCOPE_FRONTEND=true/false`, `SCOPE_BACKEND`, `SCOPE_PROMPTS`, `SCOPE_TESTS`, `SCOPE_DOCS`, `SCOPE_CONFIG`. Design review uses it to skip silently on backend-only PRs. Ship pre-flight uses it to recommend design review when frontend files are touched.
128135
- **Design review shows up in the Review Readiness Dashboard.** The dashboard now distinguishes between "LITE" (code-level, runs automatically in /review and /ship) and "FULL" (visual audit via /plan-design-review with browse binary). Both show up as Design Review entries.
129136
- **E2E eval for design review detection.** Planted CSS/HTML fixtures with 7 known anti-patterns (Papyrus font, 14px body text, `outline: none`, `!important`, purple gradient, generic hero copy, 3-column feature grid). The eval verifies `/review` catches at least 4 of 7.
130137

@@ -240,7 +247,7 @@ Read the philosophy: https://garryslist.org/posts/boil-the-ocean
240247
## 0.5.1 — 2026-03-17
241248
- **Know where you stand before you ship.** Every `/plan-ceo-review`, `/plan-eng-review`, and `/plan-design-review` now logs its result to a review tracker. At the end of each review, you see a **Review Readiness Dashboard** showing which reviews are done, when they ran, and whether they're clean — with a clear CLEARED TO SHIP or NOT READY verdict.
242249
- **`/ship` checks your reviews before creating the PR.** Pre-flight now reads the dashboard and asks if you want to continue when reviews are missing. Informational only — it won't block you, but you'll know what you skipped.
243-
- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `eval $(gstack-slug)`. If the format ever changes, fix it once.
250+
- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `source <(gstack-slug)`. If the format ever changes, fix it once.
244251
- **Screenshots are now visible during QA and browse sessions.** When gstack takes screenshots, they now show up as clickable image elements in your output — no more invisible `/tmp/browse-screenshot.png` paths you can't see. Works in `/qa`, `/qa-only`, `/plan-design-review`, `/qa-design-review`, `/browse`, and `/gstack`.
245252

246253
### For contributors

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ gstack/
5959
├── plan-ceo-review/ # /plan-ceo-review skill
6060
├── plan-eng-review/ # /plan-eng-review skill
6161
├── office-hours/ # /office-hours skill (YC Office Hours — startup diagnostic + builder brainstorm)
62-
├── debug/ # /debug skill (systematic root-cause debugging)
62+
├── investigate/ # /investigate skill (systematic root-cause debugging)
6363
├── retro/ # Retrospective skill
6464
├── document-release/ # /document-release skill (post-ship doc updates)
6565
├── setup # One-time setup: build binary + symlink skills

README.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -48,11 +48,11 @@ Expect first useful run in under 5 minutes on any repo with tests already set up
4848

4949
Open Claude Code and paste this. Claude does the rest.
5050

51-
> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.
51+
> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.
5252
5353
### Step 2: Add to your repo so teammates get it (optional)
5454

55-
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
55+
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
5656
5757
Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.
5858

@@ -117,7 +117,7 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack.
117117
| `/plan-design-review` | **Senior Designer** | Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice. |
118118
| `/design-consultation` | **Design Partner** | Build a complete design system from scratch. Knows the landscape, proposes creative risks, generates realistic product mockups. Design at the heart of all other phases. |
119119
| `/review` | **Staff Engineer** | Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps. |
120-
| `/debug` | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
120+
| `/investigate` | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
121121
| `/design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
122122
| `/qa` | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
123123
| `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. |
@@ -158,7 +158,7 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack.
158158

159159
**Multi-AI second opinion.** `/codex` gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both `/review` (Claude) and `/codex` (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.
160160

161-
**Safety guardrails on demand.** Say "be careful" and `/careful` warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. `/freeze` locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. `/guard` activates both. `/debug` auto-freezes to the module being investigated.
161+
**Safety guardrails on demand.** Say "be careful" and `/careful` warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. `/freeze` locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. `/guard` activates both. `/investigate` auto-freezes to the module being investigated.
162162

163163
**Proactive skill suggestions.** gstack notices what stage you're in — brainstorming, reviewing, debugging, testing — and suggests the right skill. Don't like it? Say "stop suggesting" and it remembers across sessions.
164164

@@ -213,7 +213,7 @@ Fifteen specialists and six power tools. All slash commands. All Markdown. All f
213213
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
214214
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
215215
/design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review,
216-
/setup-browser-cookies, /retro, /debug, /document-release, /codex, /careful,
216+
/setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful,
217217
/freeze, /guard, /unfreeze, /gstack-upgrade.
218218
```
219219

SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ description: |
1515
- Reviewing a plan (architecture) → suggest /plan-eng-review
1616
- Reviewing a plan (design) → suggest /plan-design-review
1717
- Creating a design system → suggest /design-consultation
18-
- Debugging errors → suggest /debug
18+
- Debugging errors → suggest /investigate
1919
- Testing the app → suggest /qa
2020
- Code review before merge → suggest /review
2121
- Visual design audit → suggest /design-review

SKILL.md.tmpl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ description: |
1515
- Reviewing a plan (architecture) → suggest /plan-eng-review
1616
- Reviewing a plan (design) → suggest /plan-design-review
1717
- Creating a design system → suggest /design-consultation
18-
- Debugging errors → suggest /debug
18+
- Debugging errors → suggest /investigate
1919
- Testing the app → suggest /qa
2020
- Code review before merge → suggest /review
2121
- Visual design audit → suggest /design-review

TODOS.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -512,25 +512,25 @@ Shipped as `/careful`, `/freeze`, `/guard`, and `/unfreeze` in v0.6.5. Includes
512512

513513
Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into preamble telemetry line. Analytics CLI (`bun run analytics`) for querying. /retro integration shows skills-used-this-week.
514514

515-
### /debug scoped debugging enhancements (gated on telemetry)
515+
### /investigate scoped debugging enhancements (gated on telemetry)
516516

517-
**What:** Six enhancements to /debug auto-freeze, contingent on telemetry showing the freeze hook actually fires in real debugging sessions.
517+
**What:** Six enhancements to /investigate auto-freeze, contingent on telemetry showing the freeze hook actually fires in real debugging sessions.
518518

519-
**Why:** /debug v0.7.1 auto-freezes edits to the module being debugged. If telemetry shows the hook fires often, these enhancements make the experience smarter. If it never fires, the problem wasn't real and these aren't worth building.
519+
**Why:** /investigate v0.7.1 auto-freezes edits to the module being debugged. If telemetry shows the hook fires often, these enhancements make the experience smarter. If it never fires, the problem wasn't real and these aren't worth building.
520520

521-
**Context:** All items are prose additions to `debug/SKILL.md.tmpl`. No new scripts.
521+
**Context:** All items are prose additions to `investigate/SKILL.md.tmpl`. No new scripts.
522522

523523
**Items:**
524524
1. Stack trace auto-detection for freeze directory (parse deepest app frame)
525525
2. Freeze boundary widening (ask to widen instead of hard-block when hitting boundary)
526526
3. Post-fix auto-unfreeze + full test suite run
527527
4. Debug instrumentation cleanup (tag with DEBUG-TEMP, remove before commit)
528-
5. Debug session persistence (~/.gstack/debug-sessions/ — save investigation for reuse)
528+
5. Debug session persistence (~/.gstack/investigate-sessions/ — save investigation for reuse)
529529
6. Investigation timeline in debug report (hypothesis log with timing)
530530

531531
**Effort:** M (all 6 combined)
532532
**Priority:** P3
533-
**Depends on:** Telemetry data showing freeze hook fires in real /debug sessions
533+
**Depends on:** Telemetry data showing freeze hook fires in real /investigate sessions
534534

535535
## Completed
536536

bin/gstack-diff-scope

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
#!/usr/bin/env bash
22
# gstack-diff-scope — categorize what changed in the diff against a base branch
3-
# Usage: eval $(gstack-diff-scope main) → sets SCOPE_FRONTEND=true SCOPE_BACKEND=false ...
3+
# Usage: source <(gstack-diff-scope main) → sets SCOPE_FRONTEND=true SCOPE_BACKEND=false ...
44
# Or: gstack-diff-scope main → prints SCOPE_*=... lines
55
set -euo pipefail
66

bin/gstack-slug

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
#!/usr/bin/env bash
22
# gstack-slug — output project slug and sanitized branch name
3-
# Usage: eval $(gstack-slug) → sets SLUG and BRANCH variables
3+
# Usage: source <(gstack-slug) → sets SLUG and BRANCH variables
44
# Or: gstack-slug → prints SLUG=... and BRANCH=... lines
55
set -euo pipefail
66
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')

browse/src/browser-manager.ts

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717

1818
import { chromium, type Browser, type BrowserContext, type BrowserContextOptions, type Page, type Locator, type Cookie } from 'playwright';
1919
import { addConsoleEntry, addNetworkEntry, addDialogEntry, networkBuffer, type DialogEntry } from './buffers';
20+
import { validateNavigationUrl } from './url-validation';
2021

2122
export interface RefEntry {
2223
locator: Locator;
@@ -119,6 +120,11 @@ export class BrowserManager {
119120
async newTab(url?: string): Promise<number> {
120121
if (!this.context) throw new Error('Browser not launched');
121122

123+
// Validate URL before allocating page to avoid zombie tabs on rejection
124+
if (url) {
125+
validateNavigationUrl(url);
126+
}
127+
122128
const page = await this.context.newPage();
123129
const id = this.nextTabId++;
124130
this.pages.set(id, page);

browse/src/meta-commands.ts

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,14 +6,15 @@ import type { BrowserManager } from './browser-manager';
66
import { handleSnapshot } from './snapshot';
77
import { getCleanText } from './read-commands';
88
import { READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS } from './commands';
9+
import { validateNavigationUrl } from './url-validation';
910
import * as Diff from 'diff';
1011
import * as fs from 'fs';
1112
import * as path from 'path';
1213

1314
// Security: Path validation to prevent path traversal attacks
1415
const SAFE_DIRECTORIES = ['/tmp', process.cwd()];
1516

16-
function validateOutputPath(filePath: string): void {
17+
export function validateOutputPath(filePath: string): void {
1718
const resolved = path.resolve(filePath);
1819
const isSafe = SAFE_DIRECTORIES.some(dir => resolved === dir || resolved.startsWith(dir + '/'));
1920
if (!isSafe) {
@@ -221,9 +222,11 @@ export async function handleMetaCommand(
221222
if (!url1 || !url2) throw new Error('Usage: browse diff <url1> <url2>');
222223

223224
const page = bm.getPage();
225+
validateNavigationUrl(url1);
224226
await page.goto(url1, { waitUntil: 'domcontentloaded', timeout: 15000 });
225227
const text1 = await getCleanText(page);
226228

229+
validateNavigationUrl(url2);
227230
await page.goto(url2, { waitUntil: 'domcontentloaded', timeout: 15000 });
228231
const text2 = await getCleanText(page);
229232

0 commit comments

Comments
 (0)