Skip to content

Commit b0d6578

Browse files
garrytanHMAKT99Ty-Robbclaudeorkun1675
authored
fix: community security + stability fixes (wave 1) (garrytan#325)
* feat: add /cso skill — OWASP Top 10 + STRIDE security audit * fix: harden gstack-slug against shell injection via eval Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output to prevent shell metacharacter injection when used with eval. Only affects self-hosted git servers with lax naming rules — GitHub and GitLab enforce safe characters already. Defense-in-depth. * fix(security): sanitize gstack-slug output against shell injection The gstack-slug script is consumed via eval $(gstack-slug) throughout skill templates. If a git remote URL contains shell metacharacters like $(), backticks, or semicolons, they would be executed by eval. Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and BRANCH before output. This preserves normal values while neutralizing any injection payload in malicious remote URLs. Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm After: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf- * fix(security): redact sensitive values in storage command output The browse `storage` command dumps all localStorage and sessionStorage as JSON. This can expose tokens, API keys, JWTs, and session credentials in QA reports and agent transcripts. Fix: redact values where the key matches sensitive patterns (token, secret, key, password, auth, jwt, csrf) or the value starts with known credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.). Redacted values show length to aid debugging: [REDACTED — 128 chars] * fix(browse): kill old server before restart to prevent orphaned chromium processes When the health check fails or the server connection drops, `ensureServer()` and `sendCommand()` would call `startServer()` without first killing the previous server process. This left orphaned `chrome-headless-shell` renderer processes running at ~120% CPU each. After several reconnect cycles (e.g. pages that crash during hydration or trigger hard navigations via `window.location.href`), dozens of zombie chromium processes accumulate and exhaust system resources. Fix: call `killServer()` on the stale PID before spawning a new server in both the `ensureServer()` unhealthy path and the `sendCommand()` connection- lost retry path. Fixes garrytan#294 * Fix YAML linter error: nested mapping in compact sequence entries Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations. This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior. * fix(security): add Azure metadata endpoint to SSRF blocklist Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the existing AWS/GCP endpoints. Closes the coverage gap identified in garrytan#125. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for storage redaction Test key-based redaction (auth_token, api_key), value-based redaction (JWT prefix, GitHub PAT prefix), pass-through for normal keys, and length preservation in redacted output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add community PR triage process to CONTRIBUTING.md Document the wave-based PR triage pattern used for batching community contributions. References PR garrytan#205 (v0.8.3) as the original example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adjust test key names to avoid redaction pattern collision Rename testKey→testData and normalKey→displayName in storage tests to avoid triggering garrytan#238's SENSITIVE_KEY regex (which matches 'key'). Also generate Codex variant of /cso skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.9.10.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-noise /cso security audits with FP filtering (v0.11.0.0) Absorb Anthropic's security-review false positive filtering into /cso: - 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only, regex injection, race conditions unless concrete, etc.) - 9 precedents (React XSS-safe, env vars trusted, client-side code doesn't need auth, shell scripts need concrete untrusted input path) - 8/10 confidence gate — below threshold = don't report - Independent sub-agent verification for each finding - Exploit scenario requirement per finding - Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block, removes bloated "What's new" section, adds /cso to skills table and install. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage - Exclusion garrytan#10: test files must verify not imported by non-test code - Exclusion garrytan#13: distinguish user-message AI input from system-prompt injection - Exclusion garrytan#14: ReDoS in user-input regex IS a real CVE class, don't exclude - Add anti-manipulation rule: ignore audit-influencing instructions in codebase - Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8 - Fix verifier anchoring: send only file+line, not category/description - Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8) - Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): correct skill counts, add /autoplan to README tables Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28). Added /autoplan to specialist table. Fixed troubleshooting skills list to include all skills added since v0.7.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): DNS rebinding protection for SSRF blocklist validateNavigationUrl is now async — resolves hostname to IP and checks against blocked metadata IPs. Prevents DNS rebinding where evil.com initially resolves to a safe IP, then switches to 169.254.169.254. All callers updated to await. Tests updated for async assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): lockfile prevents concurrent server start races Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent TOCTOU race where two CLI invocations could both kill the old server and start new ones, leaving an orphaned chromium process. Second caller now waits for the first to finish starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): improve storage redaction — word-boundary keys + more value prefixes Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats _ as word char). Now correctly redacts auth_token, session_token while skipping keyboardShortcuts, monkeyPatch, primaryKey. Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-), Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim 5 templates and 2 bin scripts still used eval $(gstack-slug). All now use source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3 CHANGELOG entry that falsely claimed eval was fully eliminated — it was the output sanitization that made it safe, not a calling convention change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): add /autoplan to install instructions, regen skill docs The install instruction blocks and troubleshooting section were missing /autoplan. All three skill list locations now include the complete 28-skill set. Regenerated codex/agents SKILL.md files to match template changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.11.0.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(cso): add disclaimer — not a substitute for professional security audits LLMs can miss subtle vulns and produce false negatives. For production systems with sensitive data, hire a real firm. /cso is a first pass, not your only line of defense. Disclaimer appended to every report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com> Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Orkun Duman <orkun1675@gmail.com>
1 parent 1b82999 commit b0d6578

32 files changed

Lines changed: 1920 additions & 139 deletions

.agents/skills/gstack-benchmark/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -290,7 +290,7 @@ When the user types `/benchmark`, run this skill.
290290
### Phase 1: Setup
291291

292292
```bash
293-
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
293+
source <(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
294294
mkdir -p .gstack/benchmark-reports
295295
mkdir -p .gstack/benchmark-reports/baselines
296296
```

.agents/skills/gstack-canary/SKILL.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -308,7 +308,7 @@ When the user types `/canary`, run this skill.
308308
### Phase 1: Setup
309309

310310
```bash
311-
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
311+
source <(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
312312
mkdir -p .gstack/canary-reports
313313
mkdir -p .gstack/canary-reports/baselines
314314
mkdir -p .gstack/canary-reports/screenshots
@@ -458,7 +458,7 @@ Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-rep
458458
Log the result for the review dashboard:
459459

460460
```bash
461-
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
461+
source <(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
462462
mkdir -p ~/.gstack/projects/$SLUG
463463
```
464464

.agents/skills/gstack-cso/SKILL.md

Lines changed: 607 additions & 0 deletions
Large diffs are not rendered by default.

.agents/skills/gstack-land-and-deploy/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -840,7 +840,7 @@ Save report to `.gstack/deploy-reports/{date}-pr{number}-deploy.md`.
840840
Log to the review dashboard:
841841

842842
```bash
843-
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
843+
source <(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
844844
mkdir -p ~/.gstack/projects/$SLUG
845845
```
846846

.github/workflows/skill-docs.yml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,9 @@ jobs:
99
- run: bun install
1010
- name: Check Claude host freshness
1111
run: bun run gen:skill-docs
12-
- run: git diff --exit-code || (echo "Generated SKILL.md files are stale. Run: bun run gen:skill-docs" && exit 1)
12+
- run: |
13+
git diff --exit-code || (echo "Generated SKILL.md files are stale. Run: bun run gen:skill-docs" && exit 1)
1314
- name: Check Codex host freshness
1415
run: bun run gen:skill-docs --host codex
15-
- run: git diff --exit-code -- .agents/ || (echo "Generated Codex SKILL.md files are stale. Run: bun run gen:skill-docs --host codex" && exit 1)
16+
- run: |
17+
git diff --exit-code -- .agents/ || (echo "Generated Codex SKILL.md files are stale. Run: bun run gen:skill-docs --host codex" && exit 1)

CHANGELOG.md

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,28 @@
11
# Changelog
22

3+
## [0.11.0.0] - 2026-03-22 — /cso: Zero-Noise Security Audits
4+
5+
### Added
6+
7+
- **`/cso` — your Chief Security Officer.** Full codebase security audit: OWASP Top 10, STRIDE threat modeling, attack surface mapping, data classification, and dependency scanning. Each finding includes severity, confidence score, a concrete exploit scenario, and remediation options. Not a linter — a threat model.
8+
- **Zero-noise false positive filtering.** 17 hard exclusions and 9 precedents adapted from Anthropic's security review methodology. DOS isn't a finding. Test files aren't attack surface. React is XSS-safe by default. Every finding must score 8/10+ confidence to make the report. The result: 3 real findings, not 3 real + 12 theoretical.
9+
- **Independent finding verification.** Each candidate finding is verified by a fresh sub-agent that only sees the finding and the false positive rules — no anchoring bias from the initial scan. Findings that fail independent verification are silently dropped.
10+
- **`browse storage` now redacts secrets automatically.** Tokens, JWTs, API keys, GitHub PATs, and Bearer tokens are detected by both key name and value prefix. You see `[REDACTED — 42 chars]` instead of the secret.
11+
- **Azure metadata endpoint blocked.** SSRF protection for `browse goto` now covers all three major cloud providers (AWS, GCP, Azure).
12+
13+
### Fixed
14+
15+
- **`gstack-slug` hardened against shell injection.** Output sanitized to alphanumeric, dot, dash, and underscore only. All remaining `eval $(gstack-slug)` callers migrated to `source <(...)`.
16+
- **DNS rebinding protection.** `browse goto` now resolves hostnames to IPs and checks against the metadata blocklist — prevents attacks where a domain initially resolves to a safe IP, then switches to a cloud metadata endpoint.
17+
- **Concurrent server start race fixed.** An exclusive lockfile prevents two CLI invocations from both killing the old server and starting new ones simultaneously, which could leave orphaned Chromium processes.
18+
- **Smarter storage redaction.** Key matching now uses underscore-aware boundaries (won't false-positive on `keyboardShortcuts` or `monkeyPatch`). Value detection expanded to cover AWS, Stripe, Anthropic, Google, Sendgrid, and Supabase key prefixes.
19+
- **CI workflow YAML lint error fixed.**
20+
21+
### For contributors
22+
23+
- **Community PR triage process documented** in CONTRIBUTING.md.
24+
- **Storage redaction test coverage.** Four new tests for key-based and value-based detection.
25+
326
## [0.10.2.0] - 2026-03-22 — Autoplan Depth Fix
427

528
### Fixed
@@ -205,7 +228,7 @@
205228
- **Browse no longer navigates to dangerous URLs.** `goto`, `diff`, and `newtab` now block `file://`, `javascript:`, `data:` schemes and cloud metadata endpoints (`169.254.169.254`, `metadata.google.internal`). Localhost and private IPs are still allowed for local QA testing. (Closes #17)
206229
- **Setup script tells you what's missing.** Running `./setup` without `bun` installed now shows a clear error with install instructions instead of a cryptic "command not found." (Closes #147)
207230
- **`/debug` renamed to `/investigate`.** Claude Code has a built-in `/debug` command that shadowed the gstack skill. The systematic root-cause debugging workflow now lives at `/investigate`. (Closes #190)
208-
- **Shell injection surface removed.** All skill templates now use `source <(gstack-slug)` instead of `eval $(gstack-slug)`. Same behavior, no `eval`. (Closes #133)
231+
- **Shell injection surface reduced.** gstack-slug output is now sanitized to `[a-zA-Z0-9._-]` only, making both `eval` and `source` callers safe. (Closes #133)
209232
- **25 new security tests.** URL validation (16 tests) and path traversal validation (14 tests) now have dedicated unit test suites covering scheme blocking, metadata IP blocking, directory escapes, and prefix collision edge cases.
210233

211234
## [0.8.2] - 2026-03-19

CLAUDE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,8 @@ gstack/
8080
├── investigate/ # /investigate skill (systematic root-cause debugging)
8181
├── retro/ # Retrospective skill
8282
├── document-release/ # /document-release skill (post-ship doc updates)
83+
├── cso/ # /cso skill (OWASP Top 10 + STRIDE security audit)
84+
├── design-consultation/ # /design-consultation skill (design system from scratch)
8385
├── setup-deploy/ # /setup-deploy skill (one-time deploy config)
8486
├── bin/ # CLI utilities (gstack-repo-mode, gstack-slug, gstack-config, etc.)
8587
├── setup # One-time setup: build binary + symlink skills

CONTRIBUTING.md

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -56,7 +56,7 @@ project where you actually felt the pain.
5656

5757
### Session awareness
5858

59-
When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all 15 skills.
59+
When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all skills.
6060

6161
## Working on gstack inside the gstack repo
6262

@@ -342,6 +342,23 @@ bun install && bun run build
342342

343343
This affects all projects. To revert: `git checkout main && git pull && bun run build`.
344344

345+
## Community PR triage (wave process)
346+
347+
When community PRs accumulate, batch them into themed waves:
348+
349+
1. **Categorize** — group by theme (security, features, infra, docs)
350+
2. **Deduplicate** — if two PRs fix the same thing, pick the one that
351+
changes fewer lines. Close the other with a note pointing to the winner.
352+
3. **Collector branch** — create `pr-wave-N`, merge clean PRs, resolve
353+
conflicts for dirty ones, verify with `bun test && bun run build`
354+
4. **Close with context** — every closed PR gets a comment explaining
355+
why and what (if anything) supersedes it. Contributors did real work;
356+
respect that with clear communication.
357+
5. **Ship as one PR** — single PR to main with all attributions preserved
358+
in merge commits. Include a summary table of what merged and what closed.
359+
360+
See [PR #205](../../pull/205) (v0.8.3) for the first wave as an example.
361+
345362
## Shipping your changes
346363

347364
When you're happy with your skill edits:

0 commit comments

Comments
 (0)