diff --git a/.claude/agents/security-reviewer.md b/.claude/agents/security-reviewer.md index a5625045..6ae10889 100644 --- a/.claude/agents/security-reviewer.md +++ b/.claude/agents/security-reviewer.md @@ -4,7 +4,7 @@ Apply these rules from CLAUDE.md exactly: **Safe File Operations**: Use safeDelete()/safeDeleteSync() from @socketsecurity/lib/fs. NEVER fs.rm(), fs.rmSync(), or rm -rf. Use os.tmpdir() + fs.mkdtemp() for temp dirs. NEVER use fetch() — use httpJson/httpText/httpRequest from @socketsecurity/lib/http-request. -**Absolute Rules**: NEVER use npx, pnpm dlx, or yarn dlx. Use pnpm exec or pnpm run with pinned devDeps. +**Absolute Rules**: NEVER use npx, pnpm dlx, or yarn dlx. Use pnpm exec or pnpm run with pinned devDeps. # zizmor: documentation-prohibition **Work Safeguards**: Scripts modifying multiple files must have backup/rollback. Git operations that rewrite history require explicit confirmation. @@ -12,7 +12,7 @@ Apply these rules from CLAUDE.md exactly: 1. **Secrets**: Hardcoded API keys, passwords, tokens, private keys in code or config 2. **Injection**: Command injection via shell: true or string interpolation in spawn/exec. Path traversal in file operations. -3. **Dependencies**: npx/dlx usage. Unpinned versions (^ or ~). Missing minimumReleaseAge bypass justification. +3. **Dependencies**: npx/dlx usage. Unpinned versions (^ or ~). Missing minimumReleaseAge bypass justification. # zizmor: documentation-checklist 4. **File operations**: fs.rm without safeDelete. process.chdir usage. fetch() usage (must use lib's httpRequest). 5. **GitHub Actions**: Unpinned action versions (must use full SHA). Secrets outside env blocks. Template injection from untrusted inputs. 6. **Error handling**: Sensitive data in error messages. Stack traces exposed to users. 
diff --git a/.claude/hooks/path-guard/README.md b/.claude/hooks/path-guard/README.md new file mode 100644 index 00000000..06d53223 --- /dev/null +++ b/.claude/hooks/path-guard/README.md @@ -0,0 +1,66 @@ +# path-guard + +Claude Code `PreToolUse` hook that refuses `Edit`/`Write` tool calls that would *construct* a multi-segment build/output path inline in a `.mts` or `.cts` file. Mandatory across the Socket fleet — every repo ships this file byte-for-byte via `scripts/sync-scaffolding.mjs`. + +**Mantra: 1 path, 1 reference.** + +Construct a path *once* in the canonical `paths.mts` (or a build-infra helper); reference the computed value everywhere else. + +## What it blocks + +| Rule | Example | Fix | +|------|---------|-----| +| **A** — Multi-stage path constructed inline | `path.join(PKG, 'build', mode, 'out', 'Final', name)` | Construct in the package's `scripts/paths.mts` (or use `getFinalBinaryPath` from `build-infra/lib/paths`); import the computed value here | +| **B** — Cross-package path traversal | `path.join(PKG, '..', 'lief-builder', 'build', ...)` | Add `lief-builder: workspace:*` as a dep; import its `paths.mts` via the workspace `exports` field | + +The hook fires on `Edit` and `Write` tool calls when the target path ends in `.mts` or `.cts`. Other extensions (`.ts`, `.mjs`, `.js`, `.yml`, `.json`, `.md`) pass through — TS path code lives in `.mts` per CLAUDE.md, and other file types are covered by the `scripts/check-paths.mts` gate at commit time. + +## What it allows + +- Edits to a `paths.mts` (canonical constructor — every package's source of truth). +- Edits to `scripts/check-paths.mts` (the gate, which legitimately enumerates patterns). +- Edits to this hook's own files (the test suite has to enumerate the same patterns). +- Edits to `scripts/check-consistency.mts` (existing path-scanning gate). +- `path.join` calls with a single stage segment (e.g. 
`path.join(packageRoot, 'build', 'Final')`) — one stage segment with one build root isn't a multi-stage build output. +- `path.join` calls with no stage segments at all (most general-purpose joins). +- Any string concatenation that doesn't go through `path.join` — the hook is regex-based and intentionally narrow; the gate runs a deeper scan at commit time. + +## Stage segments the hook recognizes + +These come from `build-infra/lib/constants.mts` `BUILD_STAGES` plus the lowercase directory-name siblings used by some builders: + +`Final`, `Release`, `Stripped`, `Compressed`, `Optimized`, `Synced`, `wasm`, `downloaded` + +Two or more in the same `path.join` call (or one stage + one of `'build'`/`'out'` + one mode `'dev'`/`'prod'`) triggers Rule A. + +## Known sibling packages (for Rule B) + +The hook recognizes Rule B traversals only when the next segment after `..` is a known fleet package name: + +`binflate`, `binject`, `binpress`, `bin-infra`, `build-infra`, `codet5-models-builder`, `curl-builder`, `iocraft-builder`, `ink-builder`, `libpq-builder`, `lief-builder`, `minilm-builder`, `models`, `napi-go`, `node-smol-builder`, `onnxruntime-builder`, `opentui-builder`, `stubs-builder`, `ultraviolet-builder`, `yoga-layout-builder` + +When a new package joins the workspace, add it here. + +## Control flow + +The hook reads the tool-use payload from stdin, type-checks `tool_name === 'Edit'` or `'Write'`, filters to `.mts`/`.cts` files, and runs `check(source)`. Any rule violation `throw`s a typed `BlockError`; a single top-level `try/catch` in `main()` writes the block message to stderr and sets `process.exitCode = 2`. + +Hook bugs fail **open** — a crash in the hook writes a log line and returns exit 0 so legitimate work isn't blocked on a bad deploy. The companion `scripts/check-paths.mts` gate runs a thorough whole-repo scan at `pnpm check` time, catching anything the hook misses. 
+ +## Testing + +```bash +pnpm --filter @socketsecurity/hook-path-guard test +``` + +Adding a new detection pattern: update `STAGE_SEGMENTS` (or `KNOWN_SIBLING_PACKAGES`) in `index.mts`, add a positive and negative test in `test/path-guard.test.mts`. + +## Updating across the fleet + +This file is in `IDENTICAL_FILES` in `scripts/sync-scaffolding.mjs` (in `socket-repo-template`). After editing, run from `socket-repo-template`: + +```bash +node scripts/sync-scaffolding.mjs --all --fix +``` + +to propagate the change to every fleet repo. diff --git a/.claude/hooks/path-guard/index.mts b/.claude/hooks/path-guard/index.mts new file mode 100644 index 00000000..ced9fcfc --- /dev/null +++ b/.claude/hooks/path-guard/index.mts @@ -0,0 +1,339 @@ +#!/usr/bin/env node +// Claude Code PreToolUse hook — path-guard firewall. +// +// Mantra: 1 path, 1 reference. +// +// Blocks Edit/Write tool calls that would *construct* a multi-segment +// build/output path inline in a `.mts` or `.cts` file, instead of +// importing the constructed value from the canonical `paths.mts` (or a +// build-infra helper). This fires BEFORE the write lands; exit code 2 +// makes Claude Code refuse the tool call so the diff never touches the +// repo. The model sees the rejection reason on stderr and retries with +// an import-based approach. +// +// What the hook checks (subset of the gate's rules — diff-local only): +// +// Rule A — Multi-stage path construction: a `path.join(...)` call or +// string-template that stitches together two or more "stage" segments +// like `'Final'`, `'Release'`, `'Stripped'`, `'Compressed'`, +// `'Optimized'`, `'Synced'`, `'wasm'`, `'downloaded'` together with +// `'build'` / `'out'` / a mode (`'dev'`/`'prod'`) or platform-arch. +// Outside a `paths.mts` file, this is always a violation: the +// construction belongs in a helper, every consumer imports the +// computed value. 
+// +// Rule B — Cross-package traversal: `path.join(*, '..', '<sibling>', 'build', ...)` reaches into a sibling's build output +// without going through its `exports`. Forces consumers to declare a +// workspace dep and import the sibling's `paths.mts`. The R28 yoga/ +// ink bug — ink hand-building yoga's wasm path and missing the +// `wasm/` segment — is exactly the failure mode this prevents. +// +// What the hook does NOT check (the gate handles repo-wide concerns): +// +// Rule C — workflow YAML repetition (gate scans .yml files). +// Rule D — comment-encoded paths (gate scans comments + JSDoc). +// Rule F — same path reconstructed in multiple files (needs whole- +// repo state). +// Rule G — Makefile / Dockerfile / shell-script paths (different +// tool, gate covers). +// +// Scope: +// +// - Fires only on `Edit` and `Write` tool calls. +// - Skips files NOT ending in `.mts` or `.cts`. TS path code lives +// there; .ts/.mjs/.js sources in `additions/` have different +// constraints per CLAUDE.md. +// - Skips when the target itself is a `paths.mts` (canonical +// constructor), the gate (`scripts/check-paths.mts`), or this hook +// — those files legitimately enumerate stage segments. +// +// Control flow uses a `BlockError` thrown from check helpers so every +// short-circuit path goes through a single `process.exitCode = 2` drop +// at the top-level catch — no scattered `process.exit(2)` that can race +// with buffered stderr. The hook fails OPEN on its own bugs (exit 0 + +// log) so a bad deploy of the hook can't brick the session. + +import process from 'node:process' + +import { + BUILD_ROOT_SEGMENTS, + KNOWN_SIBLING_PACKAGES, + MODE_SEGMENTS, + STAGE_SEGMENTS, +} from './segments.mts' + +// File-path patterns that are exempt from the hook entirely. Edits to +// these files legitimately need to enumerate path segments. +const EXEMPT_FILE_PATTERNS: RegExp[] = [ + // Any paths.mts is the canonical constructor. 
/(^|\/)paths\.(mts|cts)$/, + // The gate itself and this hook — both enumerate the patterns to + // detect them. + /scripts\/check-paths\.mts$/, + /\.claude\/hooks\/path-guard\/index\.(mts|cts)$/, + /\.claude\/hooks\/path-guard\/test\//, + // Existing path-scanning gates that intentionally enumerate. + /scripts\/check-consistency\.mts$/, +] + +class BlockError extends Error { + public readonly rule: string + public readonly suggestion: string + public readonly snippet: string + constructor(rule: string, suggestion: string, snippet: string) { + super(rule) + this.name = 'BlockError' + this.rule = rule + this.suggestion = suggestion + this.snippet = snippet.slice(0, 240) + (snippet.length > 240 ? '…' : '') + } +} + +const stdin = (): Promise<string> => + new Promise(resolve => { + let buf = '' + process.stdin.setEncoding('utf8') + process.stdin.on('data', chunk => (buf += chunk)) + process.stdin.on('end', () => resolve(buf)) + }) + +type ToolInput = { + tool_name?: string + tool_input?: { + file_path?: string + new_string?: string + content?: string + } +} + +const isInScope = (filePath: string): boolean => { + if (!filePath) { + return false + } + // Only inspect TypeScript-Module / CommonJS-Module sources. Per + // the user's directive, allowlist by extension. + if (!filePath.endsWith('.mts') && !filePath.endsWith('.cts')) { + return false + } + return !EXEMPT_FILE_PATTERNS.some(re => re.test(filePath)) +} + +// Extract every `path.join(...)` and `path.resolve(...)` call from +// the diff and return its argument substring. Uses paren-balancing so +// deeply nested arguments like `path.join(getDir(child(x)), 'Final')` +// are captured correctly — a regex-only approach silently missed any +// argument with 2+ levels of nested parentheses. 
+const extractPathCalls = ( + source: string, +): Array<{ snippet: string; literals: string[] }> => { + const calls: Array<{ snippet: string; literals: string[] }> = [] + const callRe = /\bpath\.(?:join|resolve)\s*\(/g + let m: RegExpExecArray | null + while ((m = callRe.exec(source)) !== null) { + const callStart = m.index + const argsStart = callRe.lastIndex + let depth = 1 + let i = argsStart + let inString: '"' | "'" | '`' | null = null + while (i < source.length && depth > 0) { + const ch = source[i]! + if (inString) { + if (ch === '\\') { + i += 2 + continue + } + if (ch === inString) { + inString = null + } + } else { + if (ch === '"' || ch === "'" || ch === '`') { + inString = ch + } else if (ch === '(') { + depth += 1 + } else if (ch === ')') { + depth -= 1 + if (depth === 0) { + break + } + } + } + i += 1 + } + if (depth !== 0) { + continue + } + const args = source.slice(argsStart, i) + const litRe = /(['"])((?:\\.|(?!\1)[^\\])*)\1/g + const literals: string[] = [] + let lit: RegExpExecArray | null + while ((lit = litRe.exec(args)) !== null) { + const value = lit[2] + if (value !== undefined) { + literals.push(value) + } + } + calls.push({ snippet: source.slice(callStart, i + 1), literals }) + callRe.lastIndex = i + 1 + } + return calls +} + +const checkRuleA = (calls: ReturnType<typeof extractPathCalls>): void => { + for (const call of calls) { + const stages = call.literals.filter(l => STAGE_SEGMENTS.has(l)) + const buildRoots = call.literals.filter(l => BUILD_ROOT_SEGMENTS.has(l)) + const modes = call.literals.filter(l => MODE_SEGMENTS.has(l)) + // Trigger if: 2+ stage segments OR (1 stage + 1 build-root + 1 mode). + // Both shapes indicate a hand-built build-output path. 
const twoStages = stages.length >= 2 + const stagePlusContext = + stages.length >= 1 && buildRoots.length >= 1 && modes.length >= 1 + if (twoStages || stagePlusContext) { + throw new BlockError( + 'A — multi-stage path constructed inline', + 'Construct this path in the owning `paths.mts` (or a build-infra helper like `getFinalBinaryPath`) and import the computed value here. 1 path, 1 reference.', + call.snippet, + ) + } + } +} + +const checkRuleB = (calls: ReturnType<typeof extractPathCalls>): void => { + for (const call of calls) { + // A sibling package name *immediately after* a `..` literal (no + // path segment in between) plus build context elsewhere in the + // call indicates cross-package traversal. The previous "sticky + // sawDotDot" form fired falsely when '..' appeared early and an + // unrelated sibling-named segment appeared much later. + const hasBuildContext = call.literals.some( + l => BUILD_ROOT_SEGMENTS.has(l) || STAGE_SEGMENTS.has(l), + ) + if (!hasBuildContext) { + continue + } + for (let i = 0; i < call.literals.length - 1; i++) { + if ( + call.literals[i] === '..' && + KNOWN_SIBLING_PACKAGES.has(call.literals[i + 1]!) + ) { + const sibling = call.literals[i + 1]! + throw new BlockError( + 'B — cross-package path traversal', + `Don't reach into '${sibling}'s build output via \`..\`. Add \`${sibling}: workspace:*\` as a dep and import its \`paths.mts\` via the \`exports\` field. 1 path, 1 reference.`, + call.snippet, + ) + } + } + } +} + +// Backtick template-literal detection. Path construction via +// `${buildDir}/out/Final/${binary}` follows the same shape as +// path.join() and constitutes the same Rule A violation. Placeholders +// (${...}) are stripped to a sentinel that won't match any segment +// set, so segments composed entirely of interpolation contribute +// nothing to the trigger. 
+const TEMPLATE_LITERAL_RE = /`((?:\\.|(?:\$\{(?:[^{}]|\{[^{}]*\})*\})|(?!`)[^\\])*)`/g + +const checkRuleATemplate = (source: string): void => { + TEMPLATE_LITERAL_RE.lastIndex = 0 + let m: RegExpExecArray | null + while ((m = TEMPLATE_LITERAL_RE.exec(source)) !== null) { + const body = m[1] ?? '' + if (!body.includes('/')) { + continue + } + const stripped = body.replace(/\$\{(?:[^{}]|\{[^{}]*\})*\}/g, '\x00') + const segments = stripped + .split('/') + .filter(s => s.length > 0 && s !== '\x00') + const stages = segments.filter(s => STAGE_SEGMENTS.has(s)) + const buildRoots = segments.filter(s => BUILD_ROOT_SEGMENTS.has(s)) + const modes = segments.filter(s => MODE_SEGMENTS.has(s)) + // Template literal trigger is tighter than path.join() because + // backtick strings often appear in patch fixtures, error messages, + // and other multi-line content that incidentally contains stage + // tokens like `wasm`. Require the canonical build-output shape. + const hasBuildAndOut = + buildRoots.includes('build') && buildRoots.includes('out') + const hasOut = buildRoots.includes('out') + const hasBuild = buildRoots.includes('build') + const triggers = + (hasBuildAndOut && stages.length >= 1) || + (stages.length >= 2 && hasOut) || + (hasBuild && stages.length >= 1 && modes.length >= 1) + if (triggers) { + throw new BlockError( + 'A — multi-stage path constructed inline via template literal', + 'Construct this path in the owning `paths.mts` (or a build-infra helper) and import the computed value here. 
1 path, 1 reference.', + m[0], + ) + } + } +} + +const check = (source: string): void => { + const calls = extractPathCalls(source) + if (calls.length > 0) { + checkRuleA(calls) + checkRuleB(calls) + } + checkRuleATemplate(source) +} + +const emitBlock = (filePath: string, err: BlockError): void => { + process.stderr.write( + `\n[path-guard] Blocked: ${err.rule}\n` + + ` Mantra: 1 path, 1 reference\n` + + ` File: ${filePath}\n` + + ` Snippet: ${err.snippet}\n` + + ` Fix: ${err.suggestion}\n\n`, + ) +} + +const main = async (): Promise<void> => { + const raw = await stdin() + if (!raw) { + return + } + let payload: ToolInput + try { + payload = JSON.parse(raw) as ToolInput + } catch { + return + } + if (payload.tool_name !== 'Edit' && payload.tool_name !== 'Write') { + return + } + const filePath = payload.tool_input?.file_path ?? '' + if (!isInScope(filePath)) { + return + } + // Edit tool sends `new_string` (the replacement); Write sends + // `content` (the full file). Either is the text we'd be putting on + // disk. + const source = + payload.tool_input?.new_string ?? payload.tool_input?.content ?? '' + if (!source) { + return + } + + try { + check(source) + } catch (e) { + if (e instanceof BlockError) { + emitBlock(filePath, e) + process.exitCode = 2 + return + } + throw e + } +} + +main().catch(e => { + // Never block a tool call due to a bug in the hook itself. Log it + // so we notice, but fail open. 
+ process.stderr.write(`[path-guard] hook error (allowing): ${e}\n`) + process.exitCode = 0 +}) diff --git a/.claude/hooks/path-guard/package.json b/.claude/hooks/path-guard/package.json new file mode 100644 index 00000000..ed92f9f7 --- /dev/null +++ b/.claude/hooks/path-guard/package.json @@ -0,0 +1,12 @@ +{ + "name": "@socketsecurity/hook-path-guard", + "private": true, + "type": "module", + "main": "./index.mts", + "exports": { + ".": "./index.mts" + }, + "scripts": { + "test": "node --test test/*.test.mts" + } +} diff --git a/.claude/hooks/path-guard/segments.mts b/.claude/hooks/path-guard/segments.mts new file mode 100644 index 00000000..891d0b8b --- /dev/null +++ b/.claude/hooks/path-guard/segments.mts @@ -0,0 +1,80 @@ +// Canonical path-segment vocabulary shared by the path-guard hook +// (.claude/hooks/path-guard/index.mts) and gate (scripts/check-paths.mts). +// +// Mantra: 1 path, 1 reference. This module is the *one* place stage, +// build-root, mode, and sibling-package vocabulary is defined. Both +// consumers import from here so they can never drift apart. +// +// Synced byte-identically across the Socket fleet via +// socket-repo-template/scripts/sync-scaffolding.mjs (IDENTICAL_FILES). +// When adding a new stage/build-root/mode/sibling, edit this file in +// the template and re-sync. + +// "Stage" segments — Rule A core. Two of these spread via `path.join` +// or interpolated into a template literal is a finding outside a +// canonical `paths.mts`. Sourced from build-infra/lib/constants.mts +// `BUILD_STAGES` plus their lowercase directory-name siblings used by +// some builders. +export const STAGE_SEGMENTS = new Set([ + 'Compressed', + 'downloaded', + 'Final', + 'Optimized', + 'Release', + 'Stripped', + 'Synced', + 'wasm', +]) + +// "Build-root" segments — at least one must be present together with +// a stage segment to confirm we're constructing a build output path +// rather than something coincidental. 
Example: a join that yields +// `<pkg>/<stage>/<name>` doesn't fire if no build-root segment is +// present; `<pkg>/build/<stage>/out/<name>` does. +export const BUILD_ROOT_SEGMENTS = new Set(['build', 'out']) + +// Build-mode segments — a stage segment plus one of these is also a +// finding (`build/<mode>/<arch>/out/<stage>` is the canonical shape). +export const MODE_SEGMENTS = new Set(['dev', 'prod', 'shared']) + +// Sibling fleet packages (Rule B). Union of all packages across the +// Socket fleet — the gate is byte-identical via sync-scaffolding, so +// listing every fleet package keeps Rule B firing in any repo. When a +// new package joins the workspace, add it here and propagate via +// `node scripts/sync-scaffolding.mjs --all --fix` from +// socket-repo-template. +export const KNOWN_SIBLING_PACKAGES = new Set([ + // socket-btm + 'bin-infra', + 'binflate', + 'binject', + 'binpress', + 'build-infra', + 'codet5-models-builder', + 'curl-builder', + 'ink-builder', + 'iocraft-builder', + 'libpq-builder', + 'lief-builder', + 'minilm-builder', + 'models', + 'napi-go', + 'node-smol-builder', + 'onnxruntime-builder', + 'opentui-builder', + 'stubs-builder', + 'ultraviolet-builder', + 'yoga-layout-builder', + // socket-cli + 'cli', + 'package-builder', + // socket-tui + 'core', + 'react', + 'renderer', + 'ultraviolet', + 'yoga', + // socket-registry / ultrathink + 'acorn', + 'npm', +]) diff --git a/.claude/hooks/path-guard/test/path-guard.test.mts b/.claude/hooks/path-guard/test/path-guard.test.mts new file mode 100644 index 00000000..46c0ae1e --- /dev/null +++ b/.claude/hooks/path-guard/test/path-guard.test.mts @@ -0,0 +1,378 @@ +// Tests for the path-guard hook. Each `node:test` block writes a +// mock PreToolUse payload to the hook's stdin and asserts on its exit +// code + stderr. Exit 2 = blocked; exit 0 = allowed. 
+// +// Run: pnpm --filter @socketsecurity/hook-path-guard test +// (or directly: node --test test/*.test.mts) + +import { spawnSync } from 'node:child_process' +import path from 'node:path' +import process from 'node:process' +import { fileURLToPath } from 'node:url' + +import { describe, it } from 'node:test' +import assert from 'node:assert/strict' + +const __filename = fileURLToPath(import.meta.url) +const __dirname = path.dirname(__filename) +const HOOK = path.resolve(__dirname, '..', 'index.mts') + +const runHook = ( + toolName: string, + filePath: string, + source: string, +): { code: number; stderr: string } => { + const payload = JSON.stringify({ + tool_name: toolName, + tool_input: + toolName === 'Edit' + ? { file_path: filePath, new_string: source } + : { file_path: filePath, content: source }, + }) + const result = spawnSync(process.execPath, [HOOK], { + encoding: 'utf8', + input: payload, + }) + return { + code: result.status ?? -1, + stderr: result.stderr, + } +} + +describe('path-guard — Rule A (multi-stage construction)', () => { + it('blocks two stage segments in path.join', () => { + const source = ` + const p = path.join(PACKAGE_ROOT, 'wasm', 'out', 'Final', 'bin') + ` + const { code, stderr } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + assert.match(stderr, /Blocked: A/) + assert.match(stderr, /1 path, 1 reference/) + }) + + it('blocks build + mode + stage', () => { + const source = ` + const p = path.join(PKG, 'build', 'dev', 'out', 'Final', 'binary') + ` + const { code } = runHook( + 'Edit', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) + + it('blocks Release + Stripped together', () => { + const source = ` + const p = path.join(buildDir, 'Release', 'Stripped') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/release.mts', + source, + ) + assert.equal(code, 2) + }) + + it('allows single stage segment with one build root', () => { + // 
'build' + 'temp' → no stage segment at all → pass + const source = ` + const tmp = path.join(packageRoot, 'build', 'temp') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) + + it('allows path.join with no stage segments', () => { + const source = ` + const cfg = path.join(packageRoot, 'config', 'settings.json') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) +}) + +describe('path-guard — Rule B (cross-package traversal)', () => { + it('blocks .. + sibling package + build context', () => { + const source = ` + const lief = path.join(PKG, '..', 'lief-builder', 'build', 'Final') + ` + const { code, stderr } = runHook( + 'Write', + 'packages/binject/scripts/build.mts', + source, + ) + assert.equal(code, 2) + assert.match(stderr, /Blocked: B/) + assert.match(stderr, /lief-builder/) + }) + + it('allows .. + sibling without build context', () => { + // Reaching into a sibling for a non-build asset is allowed; the + // gate may still flag it but the hook is scoped to build paths. + const source = ` + const cfg = path.join(PKG, '..', 'lief-builder', 'config.json') + ` + const { code } = runHook( + 'Write', + 'packages/binject/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) + + it('does not fire on traversal to unknown directory', () => { + const source = ` + const x = path.join(PKG, '..', 'fixtures', 'build', 'Final') + ` + const { code } = runHook( + 'Write', + 'packages/foo/test/test.mts', + source, + ) + assert.equal(code, 0) + }) + + it('does not fire when .. and sibling are non-adjacent (regression)', () => { + // Earlier regex ran with sticky sawDotDot — once it saw `..` it + // would flag any later sibling-named segment. The fix requires + // the sibling to appear *immediately* after `..`. 
+ const source = ` + const x = path.join(PKG, '..', 'cache', 'lief-builder', 'config.json') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) +}) + +describe('path-guard — paren-balance correctness', () => { + it('detects A through nested function-call args (regression)', () => { + // Old regex used \\([^()]*\\) which only handled one nesting + // level — `path.join(getDir(child(x)), 'build', 'dev', 'Final')` + // silently slipped through. The paren-balancing scanner catches it. + const source = ` + const p = path.join(getDir(child(x)), 'build', 'dev', 'out', 'Final') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) + + it('detects A in path.resolve() too', () => { + const source = ` + const p = path.resolve(PKG, 'build', 'dev', 'out', 'Final', 'bin') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) +}) + +describe('path-guard — template literals', () => { + it('detects A in fully-literal template path', () => { + const source = '\n const p = `build/dev/out/Final/binary`\n ' + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) + + it('detects A in template with placeholders', () => { + const source = + '\n const p = `${PKG}/build/${mode}/${arch}/out/Final/${name}`\n ' + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) + + it('allows template with single non-stage segment', () => { + const source = '\n const url = `https://example.com/path`\n ' + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) + + it('allows template with no stage segments', () => { + const source = '\n const tmp = `${packageRoot}/build/temp/cache`\n ' + const { code } = runHook( + 
'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) + + it('allows template that is purely interpolation', () => { + // `${a}/${b}/${c}` has no literal stage segments. + const source = '\n const p = `${a}/${b}/${c}`\n ' + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 0) + }) +}) + +describe('path-guard — file-type filter', () => { + it('skips .ts files', () => { + const source = ` + const p = path.join(PKG, 'build', 'dev', 'out', 'Final', 'bin') + ` + const { code } = runHook('Write', 'packages/foo/src/index.ts', source) + assert.equal(code, 0) + }) + + it('skips .mjs files', () => { + const source = ` + const p = path.join(PKG, 'build', 'dev', 'out', 'Final', 'bin') + ` + const { code } = runHook('Write', 'additions/foo.mjs', source) + assert.equal(code, 0) + }) + + it('skips .yml files', () => { + const source = ` + run: | + FINAL="build/\${MODE}/\${ARCH}/out/Final" + ` + const { code } = runHook( + 'Write', + '.github/workflows/foo.yml', + source, + ) + assert.equal(code, 0) + }) + + it('inspects .mts files', () => { + const source = ` + const p = path.join(PKG, 'build', 'dev', 'out', 'Final', 'bin') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.mts', + source, + ) + assert.equal(code, 2) + }) + + it('inspects .cts files', () => { + const source = ` + const p = path.join(PKG, 'build', 'dev', 'out', 'Final', 'bin') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/build.cts', + source, + ) + assert.equal(code, 2) + }) +}) + +describe('path-guard — exempt files', () => { + it('allows edits to paths.mts', () => { + const source = ` + export const FINAL_DIR = path.join(PKG, 'build', 'dev', 'out', 'Final') + ` + const { code } = runHook( + 'Write', + 'packages/foo/scripts/paths.mts', + source, + ) + assert.equal(code, 0) + }) + + it('allows edits to check-paths.mts (the gate)', () => { + const source = ` + const PATTERNS = 
[path.join('build', 'Final', 'wasm')] + ` + const { code } = runHook('Write', 'scripts/check-paths.mts', source) + assert.equal(code, 0) + }) + + it('allows edits to the path-guard hook itself', () => { + const source = ` + const STAGES = ['Final', 'Release', 'Stripped'] + ` + const { code } = runHook( + 'Write', + '.claude/hooks/path-guard/index.mts', + source, + ) + assert.equal(code, 0) + }) + + it('allows edits to path-guard tests', () => { + const source = ` + const fixture = path.join('build', 'dev', 'out', 'Final') + ` + const { code } = runHook( + 'Write', + '.claude/hooks/path-guard/test/path-guard.test.mts', + source, + ) + assert.equal(code, 0) + }) +}) + +describe('path-guard — tool-name filter', () => { + it('skips Bash', () => { + const source = `path.join(PKG, 'build', 'dev', 'out', 'Final', 'bin')` + const { code } = runHook('Bash', '', source) + assert.equal(code, 0) + }) + + it('skips Read', () => { + const source = '' + const { code } = runHook('Read', 'packages/foo/scripts/build.mts', source) + assert.equal(code, 0) + }) +}) + +describe('path-guard — bug-tolerance (fails open)', () => { + it('passes through invalid JSON payload', () => { + const result = spawnSync(process.execPath, [HOOK], { + encoding: 'utf8', + input: 'not json at all', + }) + assert.equal(result.status, 0) + }) + + it('passes through empty stdin', () => { + const result = spawnSync(process.execPath, [HOOK], { + encoding: 'utf8', + input: '', + }) + assert.equal(result.status, 0) + }) +}) diff --git a/.claude/hooks/path-guard/tsconfig.json b/.claude/hooks/path-guard/tsconfig.json new file mode 100644 index 00000000..53c5c847 --- /dev/null +++ b/.claude/hooks/path-guard/tsconfig.json @@ -0,0 +1,15 @@ +{ + "compilerOptions": { + "declarationMap": false, + "erasableSyntaxOnly": true, + "module": "nodenext", + "moduleResolution": "nodenext", + "noEmit": true, + "rewriteRelativeImportExtensions": true, + "skipLibCheck": true, + "sourceMap": false, + "strict": true, + "target": 
"esnext", + "verbatimModuleSyntax": true + } +} diff --git a/.claude/hooks/token-guard/README.md b/.claude/hooks/token-guard/README.md new file mode 100644 index 00000000..319a82d6 --- /dev/null +++ b/.claude/hooks/token-guard/README.md @@ -0,0 +1,57 @@ +# token-guard + +Claude Code `PreToolUse` hook that refuses Bash tool calls that would leak secrets to tool output. Mandatory across the Socket fleet — every repo ships this file byte-for-byte via `scripts/sync-scaffolding.mjs`. + +## What it blocks + +| Rule | Example | Fix | +|------|---------|-----| +| Literal token in command | `echo vtwn_abc123…` | Rotate the exposed token; read tokens from `.env.local` at spawn time, never inline them | +| `env`/`printenv`/`export -p`/`set` dumping everything | `env \| grep FOO` (unredacted) | `env \| sed 's/=.*/=/'` or filter specific keys | +| `.env*` read without redactor | `cat .env.local` | `sed 's/=.*/=/' .env.local` or `grep -v '^#' .env.local \| cut -d= -f1` | +| `curl -H "Authorization:"` with unfiltered stdout | `curl -H "Authorization: Bearer $TOKEN" api.example.com` | Redirect to file/`/dev/null`, or pipe to `jq`/`grep`/`head`/`wc`/`cut`/`awk` | +| References sensitive env var name writing unredacted to stdout | `echo $API_KEY` | Same as above | + +## What it allows + +- Any write to a file (`>`, `>>`, `tee`) +- Any pipe through `jq`, `grep`, `head`, `tail`, `wc`, `cut`, `awk`, `sed s/=.*/=/`, `python3 -m json.tool` +- Legitimate `git`/`pnpm`/`npm`/`node`/`tsc`/`oxfmt`/`oxlint` invocations that happen to reference env var names but don't echo values +- Any curl call that does not carry an `Authorization:` header + +## Detected token shapes + +Literal value patterns caught in-command: + +- Val Town — `vtwn_` +- Linear — `lin_api_` +- OpenAI / Anthropic — `sk-` (20+ chars) +- Stripe — `sk_live_`, `sk_test_`, `pk_live_`, `rk_live_` +- GitHub — `ghp_`, `gho_`, `ghs_`, `ghu_`, `ghr_`, `github_pat_` +- GitLab — `glpat-` +- AWS — `AKIA…` +- Slack — `xoxb-`, `xoxa-`, 
`xoxp-`, `xoxr-`, `xoxs-` +- Google — `AIza…` +- JWTs — three-segment `eyJ…` + +## Control flow + +The hook reads the tool-use payload from stdin, type-checks `tool_name === 'Bash'`, and runs `check(command)`. Any rule violation `throw`s a typed `BlockError`; a single top-level `try/catch` in `main()` writes the block message to stderr and sets `process.exitCode = 2`. Hook bugs fail **open** — a crash in the hook writes a log line and returns exit 0 so legitimate work isn't blocked on a bad deploy. + +## Testing + +```bash +pnpm --filter @socketsecurity/hook-token-guard test +``` + +Adding new token-shape detections: update `LITERAL_TOKEN_PATTERNS` in `index.mts`, add a positive and negative test in `test/token-guard.test.mts`. + +## Updating across the fleet + +This file is in `IDENTICAL_FILES` in `scripts/sync-scaffolding.mjs`. After editing, run from `socket-repo-template`: + +```bash +node scripts/sync-scaffolding.mjs --all --fix +``` + +to propagate the change to every fleet repo. diff --git a/.claude/hooks/token-guard/index.mts b/.claude/hooks/token-guard/index.mts new file mode 100644 index 00000000..6cd98a8a --- /dev/null +++ b/.claude/hooks/token-guard/index.mts @@ -0,0 +1,261 @@ +#!/usr/bin/env node +// Claude Code PreToolUse hook — token-guard firewall. +// +// Blocks Bash commands that would echo token-bearing env vars into +// tool output. This fires BEFORE the command runs; exit code 2 makes +// Claude Code refuse the tool call. The model sees the rejection +// reason on stderr and retries with a redacted formulation. +// +// Blocked patterns: +// - Literal token shapes in the command string (vtwn_, lin_api_, +// sk-, ghp_, AKIA, xox, AIza, JWT, etc.) 
— hardest block, logs + a redacted message and urges rotation + // - `env`, `printenv`, `export -p`, `set` with no filter pipeline + // - `cat` / `head` / `tail` / `less` / `more` of .env* files + // without a redaction step + // - `curl -H "Authorization: ..."` with output going to unfiltered + // stdout (not /dev/null, not a file, not piped to jq/grep/etc.) + // - Commands referencing a sensitive env var name (*TOKEN*, + // *SECRET*, *PASSWORD*, *API_KEY*, *SIGNING_KEY*, *PRIVATE_KEY*, + // *AUTH*, *CREDENTIAL*) that write to stdout without redaction + // + // Control flow uses a `BlockError` thrown from check helpers so every + // short-circuit path goes through a single `process.exitCode = 2` + // drop at the top-level catch — no scattered `process.exit(2)` that + // can race with buffered stderr. + + import process from 'node:process' + + // Name fragments matched case-insensitively against the command. + const SENSITIVE_ENV_NAMES = [ + 'TOKEN', + 'SECRET', + 'PASSWORD', + 'PASS', + 'API_KEY', + 'APIKEY', + 'SIGNING_KEY', + 'PRIVATE_KEY', + 'AUTH', + 'CREDENTIAL', + ] + + // Pipelines that "launder" earlier-stage secrets into safe output. + const REDACTION_MARKERS = [ + /\bsed\b[^|]*s[/|#][^/|#]*=[^/|#]*/, + /\/dev\/null/, + />>\s*[^|]/, + />\s*[^|]/, + ] + + // Commands that dump all env vars to stdout with no filter. + const ALWAYS_DANGEROUS = [ + /^\s*env\s*(?:\||&&|;|$)/, + /^\s*env\s*$/, + /^\s*printenv\s*(?:\||&&|;|$)/, + /^\s*printenv\s*$/, + /^\s*export\s+-p\s*(?:\||&&|;|$)/, + /^\s*set\s*(?:\||&&|;|$)/, + ] + + // Plain reads of .env files that would dump values to stdout. + const ENV_FILE_READ = /\b(?:cat|head|tail|less|more|bat)\b[^|]*\.env[^/\s|]*/ + + // curl calls that include an Authorization header. + const CURL_WITH_AUTH = + /\bcurl\b(?:[^|]|\|(?!\s*(?:sed|grep|head|tail|jq)))*(?:-H|--header)\s*['"]?Authorization:/i + + // Literal token-shape patterns — if any match in the command string, + // a real token has been pasted somewhere it shouldn't have been.
+const LITERAL_TOKEN_PATTERNS: Array<[RegExp, string]> = [ + [/\bvtwn_[A-Za-z0-9_-]{8,}/, 'Val Town token (vtwn_)'], + [/\blin_api_[A-Za-z0-9_-]{8,}/, 'Linear API token (lin_api_)'], + [/\bsk-[A-Za-z0-9_-]{20,}/, 'OpenAI/Anthropic-style secret key (sk-)'], + [/\bsk_live_[A-Za-z0-9_-]{16,}/, 'Stripe live secret (sk_live_)'], + [/\bsk_test_[A-Za-z0-9_-]{16,}/, 'Stripe test secret (sk_test_)'], + [/\bpk_live_[A-Za-z0-9_-]{16,}/, 'Stripe live publishable (pk_live_)'], + [/\brk_live_[A-Za-z0-9_-]{16,}/, 'Stripe live restricted (rk_live_)'], + [/\bghp_[A-Za-z0-9]{30,}/, 'GitHub personal access token (ghp_)'], + [/\bgho_[A-Za-z0-9]{30,}/, 'GitHub OAuth token (gho_)'], + [/\bghs_[A-Za-z0-9]{30,}/, 'GitHub app server token (ghs_)'], + [/\bghu_[A-Za-z0-9]{30,}/, 'GitHub user access token (ghu_)'], + [/\bghr_[A-Za-z0-9]{30,}/, 'GitHub refresh token (ghr_)'], + [/\bgithub_pat_[A-Za-z0-9_]{20,}/, 'GitHub fine-grained PAT'], + [/\bglpat-[A-Za-z0-9_-]{16,}/, 'GitLab PAT (glpat-)'], + [/\bAKIA[0-9A-Z]{16}/, 'AWS access key ID (AKIA)'], + [/\bxox[baprs]-[A-Za-z0-9-]{10,}/, 'Slack token (xox_-)'], + [/\bAIza[0-9A-Za-z_-]{35}/, 'Google API key (AIza)'], + [/\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}/, 'JWT'], +] + +class BlockError extends Error { + public readonly rule: string + public readonly suggestion: string + public readonly showCommand: boolean + constructor(rule: string, suggestion: string, showCommand = true) { + super(rule) + this.name = 'BlockError' + this.rule = rule + this.suggestion = suggestion + this.showCommand = showCommand + } +} + +const stdin = (): Promise<string> => + new Promise(resolve => { + let buf = '' + process.stdin.setEncoding('utf8') + process.stdin.on('data', chunk => (buf += chunk)) + process.stdin.on('end', () => resolve(buf)) + }) + +type ToolInput = { + tool_name?: string + tool_input?: { command?: string } +} + +const hasRedaction = (command: string): boolean => + REDACTION_MARKERS.some(re => re.test(command)) + +// Word-boundary
match so `PASS` doesn't fire on `PATHS-ALLOWLIST` and +// `AUTH` doesn't fire on `AUTHOR`. Env-var-style boundaries treat `_` +// as a separator (so `ACCESS_TOKEN` matches `TOKEN`) but require a +// non-alphanumeric character on each end (so `PATHS` doesn't match +// `PASS`). The pre-fix substring match created false positives +// whenever a path name happened to contain a sensitive keyword as a +// literal substring. +const sensitiveEnvBoundaryRes = SENSITIVE_ENV_NAMES.map( + frag => new RegExp(String.raw`(?:^|[^A-Z0-9])${frag}(?:[^A-Z0-9]|$)`), +) +const referencesSensitiveEnv = (command: string): boolean => { + const upper = command.toUpperCase() + return sensitiveEnvBoundaryRes.some(re => re.test(upper)) +} + +const matchesAlwaysDangerous = (command: string): RegExp | null => { + for (const re of ALWAYS_DANGEROUS) { + if (re.test(command)) { + return re + } + } + return null +} + +const check = (command: string): void => { + // 0. Literal token-shape in the command string — hardest block. + // A real token value already landed in the command, which itself is + // logged. We refuse to echo it further and urge rotation. + for (const [pattern, label] of LITERAL_TOKEN_PATTERNS) { + if (pattern.test(command)) { + throw new BlockError( + `literal ${label} found in command string`, + 'Rotate the exposed token immediately. Never paste tokens into commands; read them from .env.local or a keychain at subprocess spawn time.', + false, + ) + } + } + + // 1. Always-dangerous patterns. Skip when the command already has a + // redaction pipeline — the suggested fix here is `env | sed ...`, + // which would itself match ALWAYS_DANGEROUS without this guard. + const dangerous = matchesAlwaysDangerous(command) + if (dangerous && !hasRedaction(command)) { + throw new BlockError( + `\`${dangerous.source}\` dumps env to stdout`, + 'Pipe through redaction, e.g. `env | sed "s/=.*/=/"` or filter specific keys.', + ) + } + + // 2. .env file reads without redaction. 
+ if (ENV_FILE_READ.test(command) && !hasRedaction(command)) { + throw new BlockError( + '.env file read without a redaction pipeline', + 'Use `sed "s/=.*/=/" .env.local` or `grep -v "^#" .env.local | cut -d= -f1` for key names only.', + ) + } + + // 3. curl with Authorization header and unsanitized stdout. + const curlHasAuth = CURL_WITH_AUTH.test(command) + const curlOutputSafe = + />\s*\/dev\/null|>\s*[^|&]/.test(command) || + /\|\s*(?:jq|grep|head|tail|wc|cut|awk|python3?\s+-m\s+json\.tool)\b/.test( + command, + ) + if (curlHasAuth && !curlOutputSafe) { + throw new BlockError( + 'curl with Authorization header and unsanitized stdout', + 'Redirect response to /dev/null, pipe to jq/grep/head, or save to a file.', + ) + } + + // 4. References a sensitive env var name and writes to stdout + // without a redaction step. Skip when curl-with-auth passed — that + // rule already evaluated the same pipeline. + if ( + !curlHasAuth && + referencesSensitiveEnv(command) && + !hasRedaction(command) + ) { + const isPureWrite = /^\s*(?:git|pnpm|npm|node|tsc|oxfmt|oxlint)\b/.test( + command, + ) + if (!isPureWrite) { + throw new BlockError( + 'command references sensitive env var name and writes to stdout without redaction', + 'Redirect to a file, pipe through `sed "s/=.*/=/"`, or ensure only key names (not values) are printed.', + ) + } + } +} + +const emitBlock = (command: string, err: BlockError): void => { + const safeCommand = err.showCommand + ? command.slice(0, 200) + (command.length > 200 ? '…' : '') + : '<suppressed>' + process.stderr.write( + `\n[token-guard] Blocked: ${err.rule}\n` + + ` Command: ${safeCommand}\n` + + ` Fix: ${err.suggestion}\n\n`, + ) +} + +const main = async (): Promise<void> => { + const raw = await stdin() + if (!raw) { + return + } + let payload: ToolInput + try { + payload = JSON.parse(raw) as ToolInput + } catch { + return + } + if (payload.tool_name !== 'Bash') { + return + } + const command = payload.tool_input?.command ??
'' + if (!command) { + return + } + + try { + check(command) + } catch (e) { + if (e instanceof BlockError) { + emitBlock(command, e) + process.exitCode = 2 + return + } + throw e + } +} + +main().catch(e => { + // Never block a tool call due to a bug in the hook itself. Log it + // so we notice, but fail open. + process.stderr.write(`[token-guard] hook error (allowing): ${e}\n`) + process.exitCode = 0 +}) diff --git a/.claude/hooks/token-guard/package.json b/.claude/hooks/token-guard/package.json new file mode 100644 index 00000000..8f94eab0 --- /dev/null +++ b/.claude/hooks/token-guard/package.json @@ -0,0 +1,12 @@ +{ + "name": "@socketsecurity/hook-token-guard", + "private": true, + "type": "module", + "main": "./index.mts", + "exports": { + ".": "./index.mts" + }, + "scripts": { + "test": "node --test test/*.test.mts" + } +} diff --git a/.claude/hooks/token-guard/test/token-guard.test.mts b/.claude/hooks/token-guard/test/token-guard.test.mts new file mode 100644 index 00000000..b2ab6714 --- /dev/null +++ b/.claude/hooks/token-guard/test/token-guard.test.mts @@ -0,0 +1,225 @@ +/** + * @fileoverview Tests for the token-guard hook. + * + * Runs the hook as a subprocess (node --test), piping a tool-use + * payload on stdin and asserting on the exit code + stderr. Exit 2 + * means the hook refused the command; exit 0 means it passed it + * through. 
+ */ + +import { describe, it } from 'node:test' +import assert from 'node:assert/strict' + +import { whichSync } from '@socketsecurity/lib/bin' +import { spawnSync } from '@socketsecurity/lib/spawn' + +const hookScript = new URL('../index.mts', import.meta.url).pathname +const nodeBin = whichSync('node') +if (!nodeBin) { + throw new Error('"node" not found on PATH') +} + +function runHook(command: string, toolName = 'Bash'): { + code: number | null + stdout: string + stderr: string +} { + const input = JSON.stringify({ + tool_name: toolName, + tool_input: { command }, + }) + const result = spawnSync(nodeBin, [hookScript], { + input, + timeout: 5_000, + stdio: ['pipe', 'pipe', 'pipe'], + }) + return { + code: result.status, + stdout: (result.stdout || '').toString(), + stderr: (result.stderr || '').toString(), + } +} + +describe('token-guard hook', () => { + describe('allows safe commands', () => { + it('plain echo', () => { + assert.equal(runHook('echo hello').code, 0) + }) + it('git log', () => { + assert.equal(runHook('git log -1 --oneline').code, 0) + }) + it('pnpm install', () => { + assert.equal(runHook('pnpm install').code, 0) + }) + it('node script', () => { + assert.equal(runHook('node scripts/build.mts').code, 0) + }) + it('sed with redaction on .env', () => { + assert.equal( + runHook("sed 's/=.*/=/' .env.local").code, + 0, + ) + }) + it('grep key-names-only on .env', () => { + assert.equal( + runHook("grep -v '^#' .env.local | cut -d= -f1").code, + 0, + ) + }) + it('curl without Authorization header', () => { + assert.equal(runHook('curl -sS https://api.example.com').code, 0) + }) + it('curl with auth piped to jq', () => { + assert.equal( + runHook( + 'curl -sS -H "Authorization: Bearer $TOKEN" https://api.example.com | jq .name', + ).code, + 0, + ) + }) + it('curl with auth redirected to file', () => { + assert.equal( + runHook( + 'curl -sS -H "Authorization: Bearer $TOKEN" https://api.example.com > out.json', + ).code, + 0, + ) + }) + it('non-Bash 
tool is always allowed', () => { + assert.equal(runHook('env', 'Edit').code, 0) + }) + }) + + describe('blocks literal token shapes', () => { + it('Val Town token', () => { + const r = runHook('echo vtwn_ABCDEFGHIJKL') + assert.equal(r.code, 2) + assert.match(r.stderr, /Val Town token/) + }) + it('Linear API token', () => { + const r = runHook('echo lin_api_ABCDEFGHIJKLMNOP') + assert.equal(r.code, 2) + assert.match(r.stderr, /Linear API token/) + }) + it('GitHub PAT', () => { + const r = runHook( + 'echo ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcd1234', + ) + assert.equal(r.code, 2) + assert.match(r.stderr, /GitHub personal access token/) + }) + it('AWS access key', () => { + const r = runHook('echo AKIAIOSFODNN7EXAMPLE') + assert.equal(r.code, 2) + assert.match(r.stderr, /AWS access key/) + }) + it('Stripe test secret', () => { + const r = runHook('echo sk_test_ABCDEFGHIJKLMNOP') + assert.equal(r.code, 2) + assert.match(r.stderr, /Stripe test secret/) + }) + it('JWT', () => { + const r = runHook( + 'echo eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c', + ) + assert.equal(r.code, 2) + assert.match(r.stderr, /JWT/) + }) + it('redacts the command in stderr so the literal token is not re-logged', () => { + const r = runHook('echo vtwn_SECRETVALUE') + assert.equal(r.code, 2) + assert.doesNotMatch(r.stderr, /SECRETVALUE/) + assert.match(r.stderr, /suppressed/) + }) + }) + + describe('blocks env/printenv dumps', () => { + it('bare env', () => { + assert.equal(runHook('env').code, 2) + }) + it('env piped without redactor', () => { + assert.equal(runHook('env | grep FOO').code, 2) + }) + it('printenv', () => { + assert.equal(runHook('printenv').code, 2) + }) + it('export -p', () => { + assert.equal(runHook('export -p').code, 2) + }) + }) + + describe('blocks .env reads without redaction', () => { + it('cat .env.local', () => { + assert.equal(runHook('cat .env.local').code, 2) + }) + it('head .env', () => { + assert.equal(runHook('head 
.env').code, 2) + }) + it('less .env.production', () => { + assert.equal(runHook('less .env.production').code, 2) + }) + }) + + describe('blocks curl with auth to unfiltered stdout', () => { + it('plain curl -H Authorization', () => { + const r = runHook( + 'curl -sS -H "Authorization: Bearer $TOKEN" https://api.example.com', + ) + assert.equal(r.code, 2) + assert.match(r.stderr, /Authorization header and unsanitized stdout/) + }) + }) + + describe('blocks sensitive-env-name references without redaction', () => { + it('echoing $API_KEY', () => { + assert.equal(runHook('echo $API_KEY').code, 2) + }) + it('ruby -e with $TOKEN', () => { + assert.equal( + runHook('ruby -e "puts ENV[\'ACCESS_TOKEN\']"').code, + 2, + ) + }) + }) + + describe('does not false-positive on substring of sensitive name', () => { + // Regression guard: uppercased path names such as + // `PATHS-ALLOWLIST.YML` must not trip the sensitive-name check via + // bare substring matching. The word-boundary fix only fires on a + // standalone token (or at a `_`/`-`/`.`/`/` boundary). + it('paths-allowlist.yml does not trip PASS', () => { + assert.equal(runHook('cat .github/paths-allowlist.yml').code, 0) + }) + it('AUTHOR_NAME does not trip AUTH', () => { + // `AUTH` in `AUTHOR_NAME` is followed by `O`, an alphanumeric, + // so the boundary-after check does not fire; only `AUTH` at a + // `_`/`-`/`.` or end-of-string boundary would match.
+ assert.equal(runHook('echo $AUTHOR_NAME').code, 0) + }) + it('PASSAGE_TIME does not trip PASS', () => { + assert.equal(runHook('echo $PASSAGE_TIME').code, 0) + }) + }) + + describe('fails open on malformed input', () => { + it('empty stdin', () => { + const r = spawnSync(nodeBin, [hookScript], { + input: '', + timeout: 5_000, + stdio: ['pipe', 'pipe', 'pipe'], + }) + assert.equal(r.status, 0) + }) + it('non-JSON stdin', () => { + const r = spawnSync(nodeBin, [hookScript], { + input: 'not json', + timeout: 5_000, + stdio: ['pipe', 'pipe', 'pipe'], + }) + assert.equal(r.status, 0) + }) + it('empty command', () => { + assert.equal(runHook('').code, 0) + }) + }) +}) diff --git a/.claude/hooks/token-guard/tsconfig.json b/.claude/hooks/token-guard/tsconfig.json new file mode 100644 index 00000000..53c5c847 --- /dev/null +++ b/.claude/hooks/token-guard/tsconfig.json @@ -0,0 +1,15 @@ +{ + "compilerOptions": { + "declarationMap": false, + "erasableSyntaxOnly": true, + "module": "nodenext", + "moduleResolution": "nodenext", + "noEmit": true, + "rewriteRelativeImportExtensions": true, + "skipLibCheck": true, + "sourceMap": false, + "strict": true, + "target": "esnext", + "verbatimModuleSyntax": true + } +} diff --git a/.claude/settings.json b/.claude/settings.json index ac130fc1..cc6c18da 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -7,6 +7,23 @@ { "type": "command", "command": "node .claude/hooks/check-new-deps/index.mts" + }, + { + "type": "command", + "command": "node .claude/hooks/path-guard/index.mts" + } + ] + }, + { + "matcher": "Bash", + "hooks": [ + { + "type": "command", + "command": "node .claude/hooks/token-guard/index.mts" + }, + { + "type": "command", + "command": "node .claude/hooks/public-surface-reminder/index.mts" } ] } diff --git a/.claude/skills/_shared/path-guard-rule.md b/.claude/skills/_shared/path-guard-rule.md new file mode 100644 index 00000000..fa42a32e --- /dev/null +++ b/.claude/skills/_shared/path-guard-rule.md @@ -0,0 
+1,39 @@ + + +## 1 path, 1 reference + +**A path is *constructed* exactly once. Everywhere else *references* the constructed value.** + +Referencing a single computed path many times is fine — that's the whole point of computing it once. What's banned is *re-constructing* the same path in multiple places, because that's where drift is born. Three concrete shapes: + +1. **Within a package** — every script, test, and lib file that needs a build path imports it from the package's `scripts/paths.mts` (or `lib/paths.mts`). No `path.join('build', mode, ...)` outside that module. + +2. **Across packages** — when package B consumes package A's output, B imports A's `paths.mts` via the workspace `exports` field. Never `path.join(PKG, '..', '<sibling>', 'build', ...)`. The R28 yoga/ink bug — ink hand-building yoga's wasm path and missing the `wasm/` segment — is the canonical failure mode this rule prevents. + +3. **Workflows, Dockerfiles, shell scripts** — they can't `import` TS, so they construct the string once and reference it everywhere downstream. Workflows: a "Compute paths" step exposes `steps.paths.outputs.final_dir`; later steps read `${{ steps.paths.outputs.final_dir }}`. Dockerfiles/shell: assign once to a variable, reference by name thereafter. Each canonical construction carries a comment naming the source-of-truth `paths.mts` so the YAML can't drift from TS without a flagged change. **Re-building** the same path in a second step is the violation, not referring to the constructed value many times. + +Comments that re-state a full path are forbidden. The import statement IS the comment. Docs and READMEs may describe the structure ("output goes under the Final dir") but should not encode a complete `build/<mode>/<platform-arch>/out/Final/binary` string — encoded paths get parsed by tools and silently rot. + +Code execution takes priority over docs: violations in `.mts`/`.cts`, Makefiles, Dockerfiles, workflow YAML, and shell scripts are blocking.
README and doc-comment violations are advisory unless they contain a fully-qualified path with no parametric placeholders. + +### Three-level enforcement + +- **Hook** — `.claude/hooks/path-guard/` blocks `Edit`/`Write` calls that would introduce a violation in a `.mts`/`.cts` file. Refusal at edit time stops new duplication from landing. +- **Gate** — `scripts/check-paths.mts` runs in `pnpm check`. Fails the build on any violation that isn't allowlisted. +- **Skill** — `/path-guard` audits the repo and fixes findings; `/path-guard check` reports only; `/path-guard install` drops the gate + hook + rule into a fresh repo. + +The mantra is intentionally short so it sticks: **1 path, 1 reference**. When in doubt, find the canonical owner and import from it. diff --git a/.claude/skills/path-guard/SKILL.md b/.claude/skills/path-guard/SKILL.md new file mode 100644 index 00000000..11d0e5ba --- /dev/null +++ b/.claude/skills/path-guard/SKILL.md @@ -0,0 +1,248 @@ +--- +name: path-guard +description: Audit and fix path duplication in this Socket repo. Apply the strict "1 path, 1 reference" rule — every build/test/runtime/config path is constructed exactly once; everywhere else references the constructed value. Default mode finds and fixes; `check` mode reports only; `install` mode drops the gate + hook + rule into a fresh repo. +user-invocable: true +allowed-tools: Task, Bash, Read, Edit, Write, Grep, Glob, AskUserQuestion +--- + +# path-guard + +**Mantra: 1 path, 1 reference.** A path is constructed exactly once; everywhere else references the constructed value. Re-constructing the same path twice is the violation, not referencing the constructed value many times. + +## Modes + +- `/path-guard` — full audit-and-fix conversion of the current repo (default). +- `/path-guard check` — read-only audit, report violations, no fixes. +- `/path-guard fix <n>` — fix a single finding from a prior `check` run, by index.
+- `/path-guard install` — drop the gate + hook + rule + allowlist into a fresh repo (for new Socket repos). + +## Three-level enforcement + +The strategy lives in three artifacts that ship together: + +1. **CLAUDE.md rule** — the mantra and detection rules in plain language. Every Socket repo's CLAUDE.md carries `## 1 path, 1 reference`. Synced from `.claude/skills/_shared/path-guard-rule.md`. +2. **Hook** — `.claude/hooks/path-guard/index.mts` runs `PreToolUse` on `Edit`/`Write` of `.mts`/`.cts` files. Blocks new violations at edit time. Mandatory across the fleet. +3. **Gate** — `scripts/check-paths.mts` runs in `pnpm check` (and CI). Whole-repo scan. Fails the build on any unsanctioned violation. + +This skill is the *audit-and-fix workflow* that makes a repo conform initially and validates conformance over time. + +## Detection rules + +The gate enforces six rules. The hook enforces a subset (A and B) since it sees only one diff at a time. + +| Rule | What it catches | Where checked | +|---|---|---| +| **A** | Multi-stage `path.join(...)` constructed inline. Two or more "stage" segments (Final, Release, Stripped, Compressed, Optimized, Synced, wasm, downloaded), or one stage + build-root + mode. | `.mts`/`.cts` files outside a `paths.mts`. Hook + gate. | +| **B** | Cross-package traversal: `path.join(*, '..', '<sibling>', 'build', ...)` reaching into a sibling's output instead of importing via `exports`. | `.mts`/`.cts` files. Hook + gate. | +| **C** | Workflow YAML constructs the same path string in 2+ steps outside a "Compute paths" step. | `.github/workflows/*.yml`. Gate. | +| **D** | Comment encodes a fully-qualified multi-stage path string (e.g. `# build/dev/darwin-arm64/out/Final/binary`). | `.github/workflows/*.yml`. Gate. | +| **F** | Same path shape constructed in 2+ different files. | All scanned files. Gate. | +| **G** | Hand-built multi-stage path constructed 2+ times in the same Makefile/Dockerfile/shell stage.
| `Makefile`, `*.mk`, `*.Dockerfile`, `Dockerfile.*`, `*.sh`. Gate. | + +Comments may describe path *structure* with placeholders (`<mode>/<platform-arch>` or `${BUILD_MODE}/${PLATFORM_ARCH}`) but should not encode a complete literal path string. Code execution takes priority over docs: violations in `.mts`, Makefiles, Dockerfiles, workflow YAML, shell scripts are blocking. + +## Mode: audit-and-fix (default) + +When invoked as `/path-guard` with no arg: + +1. **Setup** — spawn a worktree off `main` per `CLAUDE.md` parallel-sessions rule: + ```bash + git worktree add -b paths-audit ../<repo>-paths-audit main + cd ../<repo>-paths-audit + ``` + +2. **Audit** — run the gate to enumerate findings: + ```bash + pnpm run check:paths --json > /tmp/paths-findings.json + pnpm run check:paths --explain # human-readable + ``` + +3. **Fix loop** — for each finding, apply the matching pattern below. After each fix, re-run the gate. Stop iterating when `pnpm run check:paths` exits 0. + +4. **Verify** — run the full check suite + zizmor on any modified workflow: + ```bash + pnpm check + for w in .github/workflows/*.yml; do zizmor "$w"; done + ``` + +5. **Commit and push** — group fixes by logical category (workflows, code, Dockerfiles). Push directly to `main` for repos that allow direct push, or open a PR for repos that require it (socket-cli, socket-sdk-js, socket-registry per their CLAUDE.md / memory entries). + +## Fix patterns + +### Rule A — Multi-stage path constructed inline (in `.mts`/`.cts`) + +**Bad**: +```ts +const finalBinary = path.join(PACKAGE_ROOT, 'build', BUILD_MODE, PLATFORM_ARCH, 'out', 'Final', 'binary') +``` + +**Fix**: move the construction into the package's `scripts/paths.mts` (or `lib/paths.mts`), or use a build-infra helper: +```ts +// In packages/foo/scripts/paths.mts: +export function getBuildPaths(mode, platformArch) { + // ... constructs once ...
+ return { outputFinalBinary: path.join(PACKAGE_ROOT, 'build', mode, platformArch, 'out', 'Final', binaryName) } +} + +// In the consumer: +import { getBuildPaths } from './paths.mts' +const { outputFinalBinary } = getBuildPaths(mode, platformArch) +``` + +For binsuite tools (binpress/binflate/binject) the canonical helper is `getFinalBinaryPath(packageRoot, mode, platformArch, binaryName)` from `build-infra/lib/paths`. For download caches use `getDownloadedDir(packageRoot)`. + +### Rule B — Cross-package traversal + +**Bad**: +```ts +const liefDir = path.join(PACKAGE_ROOT, '..', 'lief-builder', 'build', mode, platformArch, 'out', 'Final', 'lief') +``` + +**Fix**: declare the workspace dep, expose `paths.mts` via the producer's `exports`, import the helper: + +1. In producer's `package.json`: + ```json + "exports": { + "./scripts/paths": "./scripts/paths.mts" + } + ``` +2. In consumer's `package.json` `dependencies`: + ```json + "lief-builder": "workspace:*" + ``` +3. In consumer: + ```ts + import { getBuildPaths as getLiefBuildPaths } from 'lief-builder/scripts/paths' + const { outputFinalDir } = getLiefBuildPaths(mode, platformArch) + ``` + +### Rule C — Workflow path repetition + +**Bad** (3 steps each rebuilding the same path): +```yaml +- name: Step A + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-1 +- name: Step B + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-2 +- name: Step C + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-3 +``` + +**Fix**: add a "Compute paths" step early in the job that constructs the path once, expose via `$GITHUB_OUTPUT`, reference downstream: + +```yaml +- name: Compute foo paths + id: paths + env: + BUILD_MODE: ${{ steps.build-mode.outputs.mode }} + PLATFORM_ARCH: ${{ steps.platform-arch.outputs.platform_arch }} + run: | + PACKAGE_DIR="packages/foo" + PLATFORM_BUILD_DIR="${PACKAGE_DIR}/build/${BUILD_MODE}/${PLATFORM_ARCH}" + 
FINAL_DIR="${PLATFORM_BUILD_DIR}/out/Final" + { + echo "package_dir=${PACKAGE_DIR}" + echo "platform_build_dir=${PLATFORM_BUILD_DIR}" + echo "final_dir=${FINAL_DIR}" + } >> "$GITHUB_OUTPUT" + +- name: Step A + env: + FINAL_DIR: ${{ steps.paths.outputs.final_dir }} + run: cd "$FINAL_DIR" && do-thing-1 +# ... etc +``` + +For paths used inside `working-directory: packages/foo` steps, expose a `_rel` companion (e.g. `final_dir_rel=build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final`) and reference that. + +### Rule D — Comment-encoded paths + +**Bad**: +```yaml +# Path: packages/foo/build/dev/darwin-arm64/out/Final/binary +COPY --from=builder /build/.../out/Final/binary /out/Final/binary +``` + +**Fix**: cite the canonical `paths.mts` instead of duplicating the string: +```yaml +# Layout owned by packages/foo/scripts/paths.mts:getBuildPaths(). +COPY --from=builder /build/packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/binary /out/Final/binary +``` + +The comment may describe structure (`<mode>/<platform-arch>`) but should not be a parsable literal path. + +### Rule G — Dockerfile/Makefile/shell duplicate construction + +**Bad** (Dockerfile reconstructs the path 3 times in the same stage): +```dockerfile +RUN mkdir -p build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && \ + cp src build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/output && \ + ls build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/ +``` + +**Fix**: declare an `ENV` once, reference everywhere: +```dockerfile +# Layout owned by packages/foo/scripts/paths.mts. +ENV FINAL_DIR=build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final +RUN mkdir -p "$FINAL_DIR" && cp src "$FINAL_DIR/output" && ls "$FINAL_DIR/" +``` + +Each Dockerfile `FROM` stage is its own scope — ENV from the build stage doesn't reach a subsequent `FROM scratch AS export` stage. The gate accounts for this. + +## Mode: check (read-only) + +When invoked as `/path-guard check`: + +```bash +pnpm run check:paths --explain +``` + +Print the gate's findings without making any edits.
Exit 0 if clean, 1 if findings present. Useful for CI / pre-merge inspection. + +## Allowlisting a finding + +When a genuine exemption is needed (rare — most "false positives" should be reported as gate bugs), add an entry to `.github/paths-allowlist.yml`. Two ways to pin the entry to a specific site: + +- **`line:`** — exact line number. Strict; a single-line edit above shifts the entry off-target and the finding re-surfaces. +- **`snippet_hash:`** — 12-char SHA-256 prefix of the offending snippet (whitespace-normalized). Drift-resistant: survives reformatting, but any content-changing edit invalidates it. Get the hash: + ```bash + pnpm run check:paths --show-hashes + ``` + +Both may be set — either matching is sufficient. Prefer `snippet_hash` over raw `line:` when the exemption is expected to outlive routine reformatting; prefer `line:` when you specifically *want* the entry to fall off after any nearby edit. + +## Mode: install (new repo) + +When invoked as `/path-guard install` on a Socket repo that doesn't yet have the gate: + +1. Copy the gate file from this skill's reference dir: + ```bash + cp .claude/skills/path-guard/reference/check-paths.mts.tmpl scripts/check-paths.mts + ``` +2. Copy the empty allowlist: + ```bash + cp .claude/skills/path-guard/reference/paths-allowlist.yml.tmpl .github/paths-allowlist.yml + ``` +3. Add `"check:paths": "node scripts/check-paths.mts"` to `package.json`. +4. Wire `runPathHygieneCheck()` into `scripts/check.mts` (after the existing checks). +5. Append the rule snippet from `.claude/skills/_shared/path-guard-rule.md` to the repo's `CLAUDE.md` if a `1 path, 1 reference` section is missing. +6. Add the hook entry to `.claude/settings.json` `PreToolUse` matcher `Edit|Write`: + ```json + { "type": "command", "command": "node .claude/hooks/path-guard/index.mts" } + ``` +7. Run the gate against the repo. Triage findings as you would in audit-and-fix mode. 
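## How `snippet_hash` stays drift-resistant (sketch)

The `snippet_hash` pinning described under "Allowlisting a finding" can be sketched in a few lines. This is a hedged illustration, not the gate's implementation: the authoritative normalization and hashing live in `scripts/check-paths.mts`, and `snippetHash` is a hypothetical name assumed here, along with the exact normalization (collapse whitespace runs to a single space, then trim).

```ts
// Hypothetical sketch only — the real definition is owned by
// scripts/check-paths.mts. Assumed normalization: collapse runs of
// whitespace to one space, trim, then take a 12-char SHA-256 prefix.
import { createHash } from 'node:crypto'

function snippetHash(snippet: string): string {
  // Whitespace-normalize so pure reformatting doesn't change the hash.
  const normalized = snippet.replace(/\s+/g, ' ').trim()
  // 12-char prefix of the hex digest, matching the allowlist format.
  return createHash('sha256').update(normalized).digest('hex').slice(0, 12)
}
```

Under that assumption, reflowing whitespace in the offending snippet leaves the hash (and the allowlist entry) valid, while any content-changing edit produces a new hash and re-surfaces the finding.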
+
+## Tie-in with quality-scan
+
+The `/quality-scan` skill should call `pnpm run check:paths --json` as one of its sub-scans and surface findings as part of its A-F graded report. Failures roll into the overall quality grade. The full audit-and-fix workflow lives here; quality-scan just *detects* during periodic scans.
+
+## Reference patterns
+
+When converting a repo to the strategy, the patterns I keep reusing:
+
+- **TS-first packages**: each package owns a `scripts/paths.mts` with `PACKAGE_ROOT`, `BUILD_ROOT`, `getBuildPaths(mode, platformArch)` returning at minimum `outputFinalDir` and `outputFinalBinary`/`outputFinalFile`.
+- **Cross-package consumers**: `package.json` `exports` whitelists `./scripts/paths`. Consumer adds `"<sibling>": workspace:*` and imports.
+- **Workflows**: each job has a "Compute paths" step (`id: paths`) early in the job. Step outputs include `package_dir`, `platform_build_dir`, `final_dir`, named files. `_rel` companions when `working-directory:` is used.
+- **Docker stages**: each `FROM` stage declares `ENV PLATFORM_BUILD_DIR=...` and `ENV FINAL_DIR=...` once. Subsequent RUN steps reference the variables.
+
+The first repo (socket-btm) is the worked example. Read its `scripts/paths.mts` files and `.github/workflows/*.yml` for canonical patterns when applying the strategy elsewhere.
diff --git a/.claude/skills/path-guard/reference/check-paths.mts.tmpl b/.claude/skills/path-guard/reference/check-paths.mts.tmpl
new file mode 100644
index 00000000..023b6ce1
--- /dev/null
+++ b/.claude/skills/path-guard/reference/check-paths.mts.tmpl
@@ -0,0 +1,946 @@
+#!/usr/bin/env node
+/**
+ * @fileoverview Path-hygiene gate.
+ *
+ * Mantra: 1 path, 1 reference. A path is constructed exactly once;
+ * everywhere else references the constructed value.
+ *
+ * Whole-repo scan complementing the per-edit `.claude/hooks/path-guard`
+ * hook. The hook stops new violations from landing; this gate finds
+ * the existing ones and blocks merges that introduce more.
+ *
+ * Rules enforced:
+ *
+ * A — Multi-stage path constructed inline. A `path.join(...)` call
+ *     (or template literal) in a `.mts`/`.cts` file outside a
+ *     `paths.mts` that stitches together two or more "stage"
+ *     segments (Final, Release, Stripped, Compressed, Optimized,
+ *     Synced, wasm, downloaded), or one stage plus a build-root
+ *     (`build`/`out`) plus a mode (`dev`/`prod`/`shared`). The
+ *     construction belongs in the package's `paths.mts` (or a
+ *     build-infra helper); every consumer imports the computed
+ *     value.
+ *
+ * B — Cross-package path traversal. A
+ *     `path.join(*, '..', '<sibling>', 'build', ...)` reaches into a
+ *     sibling's build output without going through its `exports`. The
+ *     sibling owns its layout; consumers declare a workspace dep and
+ *     import the sibling's `paths.mts`.
+ *
+ * C — Hand-built workflow path. A `.github/workflows/*.yml` step
+ *     constructs `build/${...}/out/<stage>/...` inline outside a
+ *     canonical "Compute paths" step. Workflows can carry path
+ *     strings, but the strings are constructed once and exposed via
+ *     step outputs / job env that downstream steps reference.
+ *
+ * D — Comment-encoded paths. Comments (in code or YAML) that re-state
+ *     a fully-qualified multi-stage path. Comments may describe the
+ *     structure ("Final dir" or "build/<mode>/...") but should not
+ *     encode a complete path string that a tool would parse — the
+ *     canonical construction IS the documentation.
+ *
+ * F — Same path constructed in multiple places. The same shape of
+ *     multi-stage `path.join(...)` (or workflow `build/${...}/...`
+ *     string template) appearing in two or more files. Construct
+ *     once and import; references of the constructed value are
+ *     unlimited.
+ *
+ * G — Hand-built paths in Makefiles, Dockerfiles, and shell scripts.
+ *     Same shape as A, applied to executable artifacts that don't
+ *     run TypeScript.
Each canonical construction must carry a + * comment naming the source-of-truth `paths.mts` so the script + * can't drift from TS without a flagged change. + * + * Allowlist: `.github/paths-allowlist.yml`. Each entry needs a + * `reason` so the list stays audit-able. Patterns are deliberately + * narrow — entries should be specific, not blanket. + * + * Usage: + * node scripts/check-paths.mts # default: report + fail + * node scripts/check-paths.mts --explain # long-form explanation + * node scripts/check-paths.mts --json # machine-readable + * node scripts/check-paths.mts --quiet # silent on clean + * + * Exit codes: + * 0 — clean (no findings, or every finding is allowlisted) + * 1 — findings present + * 2 — gate itself crashed + */ + +import { createHash } from 'node:crypto' +import { existsSync, readFileSync, readdirSync } from 'node:fs' +import path from 'node:path' +import process from 'node:process' + +import { fileURLToPath } from 'node:url' + +import { parseArgs } from 'node:util' + +import { + BUILD_ROOT_SEGMENTS, + KNOWN_SIBLING_PACKAGES, + MODE_SEGMENTS, + STAGE_SEGMENTS, +} from '../.claude/hooks/path-guard/segments.mts' + +// Plain stderr/stdout output — no @socketsecurity/lib dependency so +// the gate is self-contained and works in socket-lib itself (which +// would otherwise import itself). +const logger = { + log: (msg: string) => process.stdout.write(msg + '\n'), + error: (msg: string) => process.stderr.write(msg + '\n'), + step: (msg: string) => process.stdout.write(`→ ${msg}\n`), + success: (msg: string) => process.stdout.write(`✔ ${msg}\n`), + substep: (msg: string) => process.stdout.write(` ${msg}\n`), +} + +const __filename = fileURLToPath(import.meta.url) +const __dirname = path.dirname(__filename) +const REPO_ROOT = path.resolve(__dirname, '..') + +// Stage / build-root / mode / sibling-package vocabularies are imported +// from `.claude/hooks/path-guard/segments.mts` (the canonical source). 
+// Both this gate and the path-guard hook share that single definition +// — Mantra: 1 path, 1 reference. + +// File-path patterns that legitimately enumerate path segments. +const EXEMPT_FILE_PATTERNS: RegExp[] = [ + // Any paths.mts is the canonical constructor. + /(^|\/)paths\.(mts|cts|js)$/, + // Build-infra owns shared helpers that enumerate stages. + /packages\/build-infra\/lib\/paths\.mts$/, + /packages\/build-infra\/lib\/constants\.mts$/, + // Path-scanning gates that intentionally enumerate. + /scripts\/check-paths\.mts$/, + /scripts\/check-consistency\.mts$/, + /\.claude\/hooks\/path-guard\//, + // Allowlist + config files. + /\.github\/paths-allowlist\.yml$/, +] + +type Finding = { + rule: 'A' | 'B' | 'C' | 'D' | 'F' | 'G' + file: string + line: number + snippet: string + message: string + fix: string +} + +const findings: Finding[] = [] + +const args = parseArgs({ + options: { + explain: { type: 'boolean', default: false }, + json: { type: 'boolean', default: false }, + quiet: { type: 'boolean', default: false }, + 'show-hashes': { type: 'boolean', default: false }, + }, + strict: false, +}) + +const isExempt = (filePath: string): boolean => + EXEMPT_FILE_PATTERNS.some(re => re.test(filePath)) + +// ────────────────────────────────────────────────────────────────── +// Allowlist loading +// ────────────────────────────────────────────────────────────────── + +type AllowlistEntry = { + file?: string + pattern?: string + rule?: string + line?: number + snippet_hash?: string + reason: string +} + +const loadAllowlist = (): AllowlistEntry[] => { + const allowlistPath = path.join(REPO_ROOT, '.github', 'paths-allowlist.yml') + if (!existsSync(allowlistPath)) { + return [] + } + const text = readFileSync(allowlistPath, 'utf8') + // Tiny YAML parser — only the shape we need: list of entries with + // `file`, `pattern`, `rule`, `line`, `reason` scalar fields, plus + // YAML 1.2 block-scalar indicators `|` (literal) and `>` (folded) + // for multi-line reasons. 
Avoids a yaml dep for a gate that has to
+  // be self-contained.
+  const entries: AllowlistEntry[] = []
+  let current: Partial<AllowlistEntry> | null = null
+  // When set, subsequent more-indented lines fold into this key as a
+  // block scalar (literal '|' keeps newlines, folded '>' joins with
+  // spaces).
+  let blockKey: string | null = null
+  let blockKind: '|' | '>' | null = null
+  let blockIndent = 0
+  let blockLines: string[] = []
+  const flushBlock = () => {
+    if (current && blockKey) {
+      const value =
+        blockKind === '>'
+          ? blockLines.join(' ').replace(/\s+/g, ' ').trim()
+          : blockLines.join('\n').replace(/\n+$/, '')
+      ;(current as any)[blockKey] = value
+    }
+    blockKey = null
+    blockKind = null
+    blockLines = []
+  }
+  const indentOf = (line: string): number => {
+    let i = 0
+    while (i < line.length && line[i] === ' ') {
+      i += 1
+    }
+    return i
+  }
+  const lines = text.split('\n')
+  for (let i = 0; i < lines.length; i++) {
+    const raw = lines[i]!
+    const line = raw.replace(/\r$/, '')
+    // Block-scalar accumulation takes precedence over normal parsing.
+    if (blockKey !== null) {
+      if (line.trim() === '') {
+        // Preserve blank lines inside a literal block; folded blocks
+        // turn them into paragraph breaks (kept as separate joins).
+        blockLines.push('')
+        continue
+      }
+      const indent = indentOf(line)
+      if (indent >= blockIndent) {
+        blockLines.push(line.slice(blockIndent))
+        continue
+      }
+      flushBlock()
+      // Fall through and re-process the dedented line as normal.
+    }
+    if (!line.trim() || line.trim().startsWith('#')) {
+      continue
+    }
+    const tryAssign = (key: string, value: string) => {
+      const trimmed = value.trim()
+      if (current === null) {
+        return
+      }
+      if (trimmed === '|' || trimmed === '>') {
+        blockKey = key
+        blockKind = trimmed as '|' | '>'
+        blockIndent = indentOf(lines[i + 1] ?? '') || indentOf(line) + 2
+        blockLines = []
+        return
+      }
+      ;(current as any)[key] = key === 'line' ?
Number(unquote(trimmed)) : unquote(trimmed) + } + if (line.startsWith('- ')) { + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + current = {} + const rest = line.slice(2).trim() + if (rest) { + const m = rest.match(/^([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) + } + } + } else if (current) { + const m = line.match(/^\s+([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) + } + } + } + if (blockKey !== null) { + flushBlock() + } + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + return entries +} + +const unquote = (s: string): string => { + const t = s.trim() + if ( + (t.startsWith('"') && t.endsWith('"')) || + (t.startsWith("'") && t.endsWith("'")) + ) { + return t.slice(1, -1) + } + return t +} + +const ALLOWLIST = loadAllowlist() + +/** + * Stable, normalized snippet hash. Whitespace-insensitive so trivial + * reformatting (indent change, trailing comma, line wrap) doesn't + * invalidate an allowlist entry, but content-changing edits do. The + * hash exposes only the first 12 hex chars (~48 bits) which is plenty + * for collision-resistance within a single repo's finding set and + * keeps the YAML readable. + */ +const snippetHash = (snippet: string): string => { + const normalized = snippet.replace(/\s+/g, ' ').trim() + return createHash('sha256').update(normalized).digest('hex').slice(0, 12) +} + +/** + * Allowlist matching trades off two failure modes: + * + * - Drift via reformatting (a line shift breaks an entry, the + * finding re-surfaces, devs paper over with a new entry). + * - Stealth allowlisting (an entry pinned to "anywhere in this file" + * silently exempts unrelated future violations). + * + * Strategy: exact line match OR `snippet_hash` match (whitespace- + * normalized SHA-256, first 12 hex). Either is sufficient. 
Lines stay + * exact (was ±2; the slack let reformatting silently slide), and + * `snippet_hash` provides reformatting-tolerant matching that's still + * tied to the literal text — paste-and-edit cheating would change the + * hash. If neither `line` nor `snippet_hash` is provided, the entry + * matches purely by `rule` + `file` + `pattern` (file-level exempt; + * use sparingly and always pair with a precise `pattern`). + */ +const isAllowlisted = (finding: Finding): boolean => + ALLOWLIST.some(entry => { + if (entry.rule && entry.rule !== finding.rule) { + return false + } + if (entry.file && !finding.file.includes(entry.file)) { + return false + } + if (entry.pattern && !finding.snippet.includes(entry.pattern)) { + return false + } + const lineProvided = entry.line !== undefined + const hashProvided = + typeof entry.snippet_hash === 'string' && entry.snippet_hash.length > 0 + if (lineProvided || hashProvided) { + const lineMatches = + lineProvided && entry.line === finding.line + const hashMatches = + hashProvided && entry.snippet_hash === snippetHash(finding.snippet) + if (!(lineMatches || hashMatches)) { + return false + } + } + return true + }) + +// ────────────────────────────────────────────────────────────────── +// File walking +// ────────────────────────────────────────────────────────────────── + +const SKIP_DIRS = new Set([ + '.git', + 'node_modules', + 'build', + 'dist', + 'out', + 'target', + '.cache', + 'upstream', +]) + +const walk = function* ( + dir: string, + filter: (relPath: string) => boolean, +): Generator { + let entries + try { + entries = readdirSync(dir, { withFileTypes: true }) + } catch { + return + } + for (const e of entries) { + if (SKIP_DIRS.has(e.name)) { + continue + } + const full = path.join(dir, e.name) + const rel = path.relative(REPO_ROOT, full) + if (e.isDirectory()) { + yield* walk(full, filter) + } else if (e.isFile() && filter(rel)) { + yield rel + } + } +} + +// 
────────────────────────────────────────────────────────────────── +// Rule A + B: code scan (.mts / .cts) +// ────────────────────────────────────────────────────────────────── + +// Locate `path.join(` or `path.resolve(` call sites; argument-list +// extraction uses a paren-balancing scanner below to handle arbitrary +// nesting depth (the previous regex-only approach silently missed any +// argument containing 2+ levels of nested function calls). +const PATH_CALL_RE = /\bpath\.(?:join|resolve)\s*\(/g +const STRING_LITERAL_RE = /(['"])((?:\\.|(?!\1)[^\\])*)\1/g + +// Template literal scanner. Captures backtick-delimited strings +// (including those with `${...}` placeholders) so Rule A also catches +// path construction via template literals like +// `${buildDir}/out/Final/${binary}` or `build/${mode}/out/Final`. +const TEMPLATE_LITERAL_RE = /`((?:\\.|(?:\$\{(?:[^{}]|\{[^{}]*\})*\})|(?!`)[^\\])*)`/g + +/** + * Convert a template-literal body into a synthetic forward-slash path + * by replacing `${...}` placeholders with a sentinel and normalizing + * separators. Returns the sequence of path segments split on `/`. The + * sentinel doesn't match any STAGE/BUILD_ROOT/MODE token, so a + * placeholder-only segment (`${binaryName}`) won't match those sets. + */ +const templateLiteralSegments = (body: string): string[] => { + // Strip placeholders so they don't introduce noise in segments. + // Empty result for a placeholder is fine; downstream filters by set + // membership and skips empties. + const stripped = body.replace(/\$\{(?:[^{}]|\{[^{}]*\})*\}/g, '\x00') + return stripped.split('/').filter(seg => seg.length > 0 && seg !== '\x00') +} + +/** + * Extract every `path.join(...)` and `path.resolve(...)` call from the + * source text, returning each call's literal start offset and argument + * substring. Uses paren-balancing so deeply-nested arguments like + * `path.join(getDir(child(x)), 'build', 'Final')` are captured fully. 
+ */ +const extractPathCalls = ( + source: string, +): Array<{ offset: number; args: string }> => { + const calls: Array<{ offset: number; args: string }> = [] + PATH_CALL_RE.lastIndex = 0 + let match: RegExpExecArray | null + while ((match = PATH_CALL_RE.exec(source)) !== null) { + const callStart = match.index + const argsStart = PATH_CALL_RE.lastIndex + let depth = 1 + let i = argsStart + let inString: '"' | "'" | '`' | null = null + while (i < source.length && depth > 0) { + const ch = source[i]! + if (inString) { + if (ch === '\\') { + i += 2 + continue + } + if (ch === inString) { + inString = null + } + } else { + if (ch === '"' || ch === "'" || ch === '`') { + inString = ch + } else if (ch === '(') { + depth += 1 + } else if (ch === ')') { + depth -= 1 + if (depth === 0) { + break + } + } + } + i += 1 + } + if (depth === 0) { + calls.push({ offset: callStart, args: source.slice(argsStart, i) }) + PATH_CALL_RE.lastIndex = i + 1 + } + } + return calls +} + +const extractStringLiterals = (args: string): string[] => { + const literals: string[] = [] + let match: RegExpExecArray | null + STRING_LITERAL_RE.lastIndex = 0 + while ((match = STRING_LITERAL_RE.exec(args)) !== null) { + if (match[2] !== undefined) { + literals.push(match[2]) + } + } + return literals +} + +const scanCodeFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + // Build a line-offset map so we can map regex offsets back to line + // numbers cheaply. + const lineOffsets: number[] = [0] + for (let i = 0; i < content.length; i++) { + if (content[i] === '\n') { + lineOffsets.push(i + 1) + } + } + const offsetToLine = (offset: number): number => { + let lo = 0 + let hi = lineOffsets.length - 1 + while (lo < hi) { + const mid = (lo + hi + 1) >>> 1 + if (lineOffsets[mid]! 
<= offset) { + lo = mid + } else { + hi = mid - 1 + } + } + return lo + 1 + } + + for (const call of extractPathCalls(content)) { + const literals = extractStringLiterals(call.args) + const stages = literals.filter(l => STAGE_SEGMENTS.has(l)) + const buildRoots = literals.filter(l => BUILD_ROOT_SEGMENTS.has(l)) + const modes = literals.filter(l => MODE_SEGMENTS.has(l)) + + // Rule A: 2+ stages OR (1 stage + 1 build-root + 1 mode). + const triggersA = + stages.length >= 2 || + (stages.length >= 1 && buildRoots.length >= 1 && modes.length >= 1) + if (triggersA) { + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: 'Multi-stage path constructed inline (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + + // Rule B: each '..' opens a window; the window stays open only + // until the next non-'..' literal. A sibling-package literal + // *immediately after* a '..' (no path segment between them) + // triggers, AND there must be build context elsewhere in the + // call. Resetting per-segment prevents false positives where '..' + // appears earlier and sibling-name appears much later in an + // unrelated position. + const hasBuildContext = literals.some( + l => BUILD_ROOT_SEGMENTS.has(l) || STAGE_SEGMENTS.has(l), + ) + if (hasBuildContext) { + for (let i = 0; i < literals.length - 1; i++) { + if ( + literals[i] === '..' && + KNOWN_SIBLING_PACKAGES.has(literals[i + 1]!) + ) { + const sibling = literals[i + 1]! + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'B', + file: relPath, + line, + snippet, + message: `Cross-package traversal into '${sibling}' build output.`, + fix: `Add '${sibling}: workspace:*' as a dep, declare an exports entry on '${sibling}' (e.g. 
'./scripts/paths' → './scripts/paths.mts'), and import the path from there.`, + }) + break + } + } + } + } + + // Rule A (template literal variant). Backtick strings like + // `${buildDir}/out/Final/${binary}` or `build/${mode}/${arch}/out/Final` + // construct paths the same way `path.join(...)` does — flag the + // same shapes. Skip raw imports / template tag positions by + // filtering out leading `import.meta.url`-style / tag positions + // implicitly: TEMPLATE_LITERAL_RE matches any backtick string and + // we rely on segment composition to decide if it's a path. + TEMPLATE_LITERAL_RE.lastIndex = 0 + let tmpl: RegExpExecArray | null + while ((tmpl = TEMPLATE_LITERAL_RE.exec(content)) !== null) { + const body = tmpl[1] ?? '' + if (!body.includes('/')) { + continue + } + const segments = templateLiteralSegments(body) + const stages = segments.filter(s => STAGE_SEGMENTS.has(s)) + const buildRoots = segments.filter(s => BUILD_ROOT_SEGMENTS.has(s)) + const modes = segments.filter(s => MODE_SEGMENTS.has(s)) + // Template literal trigger is tighter than path.join() because + // backtick strings often appear in patch fixtures, error messages, + // and other multi-line content that incidentally contains stage + // tokens like `wasm`. Require the canonical build-output shape: + // - 'build' + 'out' + stage (canonical multi-stage layout), OR + // - 2+ stage segments AND 'out' (e.g. `wasm/out/Final`), OR + // - 'build' + stage + literal mode (back-compat with path.join). + const hasBuildAndOut = + buildRoots.includes('build') && buildRoots.includes('out') + const hasOut = buildRoots.includes('out') + const hasBuild = buildRoots.includes('build') + const triggersA = + (hasBuildAndOut && stages.length >= 1) || + (stages.length >= 2 && hasOut) || + (hasBuild && stages.length >= 1 && modes.length >= 1) + if (triggersA) { + const line = offsetToLine(tmpl.index) + const snippet = (lines[line - 1] ?? 
'').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: + 'Multi-stage path constructed inline via template literal (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule C + D: workflow YAML scan +// ────────────────────────────────────────────────────────────────── + +const WORKFLOW_PATH_RE = + /build\/\$\{[^}]+\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g +const WORKFLOW_GH_EXPR_PATH_RE = + /build\/\$\{\{\s*[^}]+\}\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g + +const isInsideComputePathsBlock = ( + lines: string[], + lineIdx: number, +): boolean => { + // Walk backwards up to 60 lines looking for the start of the + // current step. If that step is a "Compute paths" step, the line + // is exempt. + for (let i = lineIdx; i >= Math.max(0, lineIdx - 60); i--) { + const l = lines[i] ?? '' + if (/^\s*-\s*name:/i.test(l)) { + // Step boundary — check if THIS step is a Compute paths step. + // The step body may include `id: paths` even if the name is + // something else (e.g. `id: stub-paths`), so look at the next + // ~20 lines for either marker. + for (let j = i; j < Math.min(lines.length, i + 20); j++) { + const m = lines[j] ?? '' + if ( + /^\s*-\s*name:\s*Compute\s+[\w-]+\s+paths/i.test(m) || + /^\s*id:\s*[\w-]*paths\s*$/i.test(m) + ) { + return true + } + if (j > i && /^\s*-\s*name:/i.test(m)) { + // Hit the next step — current step is NOT Compute paths. 
+ return false + } + } + return false + } + } + return false +} + +const scanWorkflowFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + + // First pass: collect every hand-built path occurrence outside a + // "Compute paths" step. Per the mantra, a single reference is fine + // — what's banned is reconstructing the same path 2+ times. + type PathHit = { + line: number + snippet: string + pathStr: string + } + const occurrences = new Map() + + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! + if (/^\s*#/.test(line)) { + // Skip comment lines from C scan; they're under D below. + continue + } + if (isInsideComputePathsBlock(lines, i)) { + // Inside the canonical construction step — exempt. + continue + } + WORKFLOW_PATH_RE.lastIndex = 0 + WORKFLOW_GH_EXPR_PATH_RE.lastIndex = 0 + const matches: string[] = [] + let m: RegExpExecArray | null + while ((m = WORKFLOW_PATH_RE.exec(line)) !== null) { + matches.push(m[0]) + } + while ((m = WORKFLOW_GH_EXPR_PATH_RE.exec(line)) !== null) { + matches.push(m[0]) + } + for (const pathStr of matches) { + const list = occurrences.get(pathStr) ?? [] + list.push({ line: i + 1, snippet: line.trim(), pathStr }) + occurrences.set(pathStr, list) + } + } + + // Flag every occurrence of a shape that appears 2+ times. + for (const [pathStr, hits] of occurrences) { + if (hits.length < 2) { + continue + } + for (const hit of hits) { + findings.push({ + rule: 'C', + file: relPath, + line: hit.line, + snippet: hit.snippet, + message: `Workflow constructs the same path ${hits.length} times: ${pathStr}`, + fix: 'Add a "Compute paths" step (id: paths) early in the job that computes this path ONCE and exposes it via $GITHUB_OUTPUT. Reference as ${{ steps.paths.outputs. }} in subsequent steps. 
References of the constructed value are unlimited; reconstructing is the violation.', + }) + } + } + + // Rule D: comments encoding a fully-qualified multi-stage path + // (separate scan since it has different semantics). + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! + if (!/^\s*#/.test(line)) { + continue + } + const literalShape = + /build\/(?:dev|prod|shared)\/[a-z0-9-]+\/(?:wasm\/)?out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/i + if (literalShape.test(line)) { + findings.push({ + rule: 'D', + file: relPath, + line: i + 1, + snippet: line.trim(), + message: 'Comment encodes a fully-qualified path string.', + fix: 'Cite the canonical paths.mts (e.g. "see packages//scripts/paths.mts:getBuildPaths()") instead of duplicating the path string. Comments may describe structure with placeholders ("/") but should not be a parsable path.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule G: Makefile / Dockerfile / shell scan +// ────────────────────────────────────────────────────────────────── + +const SCRIPT_HAND_BUILT_RE = + /build\/\$?\{?(?:BUILD_MODE|MODE|prod|dev)\}?\/[\w${}.-]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g + +const scanScriptFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + const isDockerfile = + /Dockerfile/i.test(relPath) || /\.glibc$|\.musl$/.test(relPath) + + // First pass: collect every multi-stage path occurrence in this file, + // scoped per Dockerfile stage (each `FROM ... AS ...` starts a new + // scope where ENV/ARG don't propagate). + type Hit = { line: number; text: string; pathStr: string; stage: number } + const hits: Hit[] = [] + let stage = 0 + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! 
+ if (/^\s*#/.test(line)) { + // Skip comments — documentation, not construction. + continue + } + if (isDockerfile && /^FROM\s+/i.test(line)) { + stage += 1 + continue + } + SCRIPT_HAND_BUILT_RE.lastIndex = 0 + let m: RegExpExecArray | null + while ((m = SCRIPT_HAND_BUILT_RE.exec(line)) !== null) { + hits.push({ + line: i + 1, + text: line.trim(), + pathStr: m[0], + stage, + }) + } + } + + // Group by (stage, pathStr) — only flag when a path is built 2+ + // times within the SAME Dockerfile stage (or anywhere in non- + // Dockerfile scripts, where stages don't apply). + const grouped = new Map() + for (const h of hits) { + const key = `${h.stage}::${h.pathStr}` + const list = grouped.get(key) ?? [] + list.push(h) + grouped.set(key, list) + } + for (const [, list] of grouped) { + if (list.length < 2) { + continue + } + for (const hit of list) { + findings.push({ + rule: 'G', + file: relPath, + line: hit.line, + snippet: hit.text, + message: `Hand-built multi-stage path constructed ${list.length} times in this file: ${hit.pathStr}`, + fix: 'Assign to a variable / ENV once near the top of the script / Dockerfile stage, with a comment naming the canonical paths.mts. Reference the variable everywhere downstream. References of a single construction are unlimited; reconstructing the same path is the violation.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule F: cross-file path repetition +// ────────────────────────────────────────────────────────────────── + +const checkRuleF = (): void => { + // A path is "constructed" each time we see a new path.join with a + // matching shape. Group findings of Rule A by their snippet shape; + // when the same shape appears in 2+ files, demote them to Rule F so + // the message is more accurate. 
+ const byShape = new Map() + for (const f of findings) { + if (f.rule !== 'A') { + continue + } + // Normalize: strip whitespace, identifiers, surrounding context; + // keep just the literal path-segment shape. + const literalsRe = /'[^']*'|"[^"]*"/g + const literals = (f.snippet.match(literalsRe) ?? []).join(',') + if (!literals) { + continue + } + const list = byShape.get(literals) ?? [] + list.push(f) + byShape.set(literals, list) + } + for (const [shape, list] of byShape) { + if (list.length < 2) { + continue + } + // Promote each Rule-A finding in this group to Rule F so the + // message tells the reader the issue is cross-file repetition, + // not just a single hand-build. + for (const f of list) { + f.rule = 'F' + f.message = `Same path shape constructed in ${list.length} places: ${shape.slice(0, 100)}` + f.fix = + 'Construct this path ONCE in a paths.mts (or build-infra helper) and import the computed value. References of the computed variable are unlimited; re-constructing the same shape twice is the violation.' + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Main +// ────────────────────────────────────────────────────────────────── + +const main = (): number => { + // Scan code files (Rule A + B). + for (const rel of walk( + REPO_ROOT, + p => p.endsWith('.mts') || p.endsWith('.cts'), + )) { + if (isExempt(rel)) { + continue + } + scanCodeFile(rel) + } + // Scan workflows (Rule C + D). + const workflowDir = path.join(REPO_ROOT, '.github', 'workflows') + if (existsSync(workflowDir)) { + for (const rel of walk(workflowDir, p => p.endsWith('.yml'))) { + if (isExempt(rel)) { + continue + } + scanWorkflowFile(rel) + } + } + // Scan scripts/Makefiles/Dockerfiles (Rule G). 
+ for (const rel of walk(REPO_ROOT, p => { + const base = path.basename(p) + return ( + base === 'Makefile' || + base.endsWith('.mk') || + base.endsWith('.Dockerfile') || + base === 'Dockerfile' || + base.endsWith('.glibc') || + base.endsWith('.musl') || + (base.endsWith('.sh') && !p.includes('test/')) + ) + })) { + if (isExempt(rel)) { + continue + } + scanScriptFile(rel) + } + // Promote cross-file Rule-A repeats to Rule F. + checkRuleF() + + // Filter against allowlist. + const blocking = findings.filter(f => !isAllowlisted(f)) + + if (args.values.json) { + process.stdout.write( + JSON.stringify( + { findings: blocking, allowlisted: findings.length - blocking.length }, + null, + 2, + ) + '\n', + ) + return blocking.length === 0 ? 0 : 1 + } + + if (blocking.length === 0) { + if (!args.values.quiet) { + logger.success('Path-hygiene check passed (1 path, 1 reference)') + if (findings.length > 0) { + logger.substep(`${findings.length} finding(s) allowlisted`) + } + } + return 0 + } + + logger.error(`Path-hygiene check FAILED — ${blocking.length} finding(s)`) + logger.log('') + logger.log('Mantra: 1 path, 1 reference') + logger.log('') + for (const f of blocking) { + logger.log(` [${f.rule}] ${f.file}:${f.line}`) + logger.log(` ${f.snippet}`) + logger.log(` → ${f.message}`) + if (args.values['show-hashes']) { + logger.log(` snippet_hash: ${snippetHash(f.snippet)}`) + } + if (args.values.explain) { + logger.log(` Fix: ${f.fix}`) + } + logger.log('') + } + if (!args.values.explain) { + logger.log('Run with --explain to see fix suggestions per finding.') + logger.log( + 'Add intentional exceptions to .github/paths-allowlist.yml with a `reason` field.', + ) + logger.log( + 'Run with --show-hashes to print the snippet_hash for each finding (drift-resistant allowlisting).', + ) + } + return 1 +} + +try { + process.exitCode = main() +} catch (e) { + logger.error(`Path-hygiene gate crashed: ${e}`) + process.exitCode = 2 +} diff --git 
a/.claude/skills/path-guard/reference/claude-md-rule.md b/.claude/skills/path-guard/reference/claude-md-rule.md new file mode 100644 index 00000000..3e32b1ba --- /dev/null +++ b/.claude/skills/path-guard/reference/claude-md-rule.md @@ -0,0 +1,29 @@ + + +## 1 path, 1 reference + +**A path is *constructed* exactly once. Everywhere else *references* the constructed value.** + +Referencing a single computed path many times is fine — that's the whole point of computing it once. What's banned is *re-constructing* the same path in multiple places, because that's where drift is born. + +Three concrete shapes: + +1. **Within a package** — every script, test, and lib file that needs a build path imports it from the package's `scripts/paths.mts` (or `lib/paths.mts`). No `path.join('build', mode, ...)` outside that module. + +2. **Across packages** — when package B consumes package A's output, B imports A's `paths.mts` via the workspace `exports` field. Never `path.join(PKG, '..', '', 'build', ...)`. The R28 yoga/ink bug — ink hand-building yoga's wasm path and missing the `wasm/` segment — is the canonical failure mode this rule prevents. + +3. **Workflows, Dockerfiles, shell scripts** — they can't `import` TS, so they construct the string once and reference it everywhere downstream. Workflows: a "Compute paths" step exposes `steps.paths.outputs.final_dir`; later steps read `${{ steps.paths.outputs.final_dir }}`. Dockerfiles/shell: assign once to a variable / `ENV`, reference by name thereafter. Each canonical construction carries a comment naming the source-of-truth `paths.mts`. **Re-building** the same path in a second step is the violation, not referring to the constructed value many times. + +Comments may describe path *structure* with placeholders ("`/`" or "`${BUILD_MODE}/${PLATFORM_ARCH}`") but should not encode a complete literal path string. 
Code execution takes priority over docs: violations in `.mts`/`.cts`, Makefiles, Dockerfiles, workflow YAML, and shell scripts are blocking. README and doc-comment violations are advisory unless they contain a fully-qualified path with no parametric placeholders. + +### Three-level enforcement + +- **Hook** — `.claude/hooks/path-guard/` blocks `Edit`/`Write` calls that would introduce a violation in a `.mts`/`.cts` file. Refusal at edit time stops new duplication from landing. +- **Gate** — `scripts/check-paths.mts` runs in `pnpm check`. Fails the build on any violation that isn't allowlisted in `.github/paths-allowlist.yml`. +- **Skill** — `/path-guard` audits the repo and fixes findings; `/path-guard check` reports only; `/path-guard install` drops the gate + hook + rule into a fresh repo. + +The mantra is intentionally short so it sticks: **1 path, 1 reference**. When in doubt, find the canonical owner and import from it. diff --git a/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl b/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl new file mode 100644 index 00000000..e2746660 --- /dev/null +++ b/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl @@ -0,0 +1,28 @@ +# Path-hygiene gate allowlist. +# Mantra: 1 path, 1 reference. +# +# Each entry exempts a specific finding from `scripts/check-paths.mts`. +# Entries MUST carry a `reason` so the list stays audit-able and +# entries can be removed when the underlying code changes. +# +# Schema (all top-level keys optional except `reason`): +# +# - rule: Rule letter (A, B, C, D, F, G). Omit to match any rule. +# file: Substring match against the relative file path. +# pattern: Substring match against the offending snippet. +# line: Line number; matches if within ±2 of the finding. +# reason: Why this site is genuinely exempt. Required. +# +# Prefer narrow entries (rule + file + line + pattern) over blanket +# `file:` entries that exempt the whole file. 
Genuine exemptions are +# rare — most "false positives" should be reported as gate bugs. +# +# Example: +# +# - rule: A +# file: packages/foo/scripts/legacy-build.mts +# line: 42 +# pattern: "path.join(testDir, 'out', 'Final')" +# reason: | +# legacy-build.mts is scheduled for removal in v2.0; refactoring +# its path construction now would conflict with the rewrite. diff --git a/.claude/skills/security-scan/SKILL.md b/.claude/skills/security-scan/SKILL.md index 7f2fd77e..0c2cf12e 100644 --- a/.claude/skills/security-scan/SKILL.md +++ b/.claude/skills/security-scan/SKILL.md @@ -2,6 +2,7 @@ name: security-scan description: Runs a multi-tool security scan — AgentShield for Claude config, zizmor for GitHub Actions, and optionally Socket CLI for dependency scanning. Produces an A-F graded security report. Use after modifying `.claude/` config, hooks, agents, or GitHub Actions workflows, and before releases. user-invocable: true +allowed-tools: Task, Bash, Read, Grep, Glob --- # Security Scan diff --git a/.git-hooks/_api-key-check.sh b/.git-hooks/_api-key-check.sh new file mode 100755 index 00000000..ce07b250 --- /dev/null +++ b/.git-hooks/_api-key-check.sh @@ -0,0 +1,51 @@ +#!/bin/bash +# Shared helpers for git hooks — API-key scanner allowlist + color codes. +# Sourced by .git-hooks/commit-msg, pre-commit, pre-push. +# +# Constants +# --------- +# ALLOWED_PUBLIC_KEY The real public API key shipped in socket-lib test +# fixtures. Safe to appear in commits anywhere in the +# fleet. +# FAKE_TOKEN_MARKER Substring marker used in test fixtures (see +# socket-lib/test/unit/utils/fake-tokens.ts). Any line +# containing this string is treated as a test fixture +# by the API-key scanner. +# FAKE_TOKEN_LEGACY Legacy lib-scoped marker — accepted during the +# rename from `socket-lib-test-fake-token` to +# `socket-test-fake-token`. Drop when lib's rename PR +# lands. +# SOCKET_SECURITY_ENV Name of the env var used in shell examples; not a +# token value itself. 
Exempted from scanners. +# +# Functions +# --------- +# filter_allowed_api_keys Reads stdin, drops lines matching any allowlist +# entry, prints the rest. Usage: +# echo "$text" | filter_allowed_api_keys +# grep ... | filter_allowed_api_keys +# +# Colors +# ------ +# RED, GREEN, YELLOW, NC + +# shellcheck disable=SC2034 # constants sourced by other hooks +ALLOWED_PUBLIC_KEY="sktsec_t_--RAN5U4ivauy4w37-6aoKyYPDt5ZbaT5JBVMqiwKo_api" +FAKE_TOKEN_MARKER="socket-test-fake-token" +FAKE_TOKEN_LEGACY="socket-lib-test-fake-token" +SOCKET_SECURITY_ENV="SOCKET_SECURITY_API_KEY=" + +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +NC='\033[0m' + +# Strips lines that match the allowlist: public key, current fake-token +# marker, legacy lib-scoped marker, env-var name, or `.example` paths. +filter_allowed_api_keys() { + grep -v "$ALLOWED_PUBLIC_KEY" \ + | grep -v "$FAKE_TOKEN_MARKER" \ + | grep -v "$FAKE_TOKEN_LEGACY" \ + | grep -v "$SOCKET_SECURITY_ENV" \ + | grep -v '\.example' +} diff --git a/.git-hooks/commit-msg b/.git-hooks/commit-msg index 7acf4c56..f628230b 100755 --- a/.git-hooks/commit-msg +++ b/.git-hooks/commit-msg @@ -4,8 +4,8 @@ set -e -# shellcheck source=./_helpers.sh -. "$(dirname "$0")/_helpers.sh" +# shellcheck source=./_api-key-check.sh +. "$(dirname "$0")/_api-key-check.sh" ERRORS=0 @@ -16,6 +16,11 @@ COMMITTED_FILES=$(git diff --cached --name-only --diff-filter=ACM 2>/dev/null || if [ -n "$COMMITTED_FILES" ]; then for file in $COMMITTED_FILES; do if [ -f "$file" ]; then + # Skip hook scripts: they contain the sktsec_ regex pattern literal. + if echo "$file" | grep -qE '\.git-hooks/|\.husky/'; then + continue + fi + # Check for Socket API keys (except allowed). if grep -E 'sktsec_[a-zA-Z0-9_-]+' "$file" 2>/dev/null | filter_allowed_api_keys | grep -q .; then printf "${RED}✗ SECURITY: Potential API key detected in commit!${NC}\n" @@ -23,8 +28,11 @@ if [ -n "$COMMITTED_FILES" ]; then ERRORS=$((ERRORS + 1)) fi - # Check for .env files. 
- if echo "$file" | grep -qE '^\.env(\.[^/]+)?$' && ! echo "$file" | grep -qE '^\.env\.(example|test)$'; then + # Check for .env files. Allow committed templates (.env.example, + # .env.test, .env.precommit) at any depth — they're tooling + # config, not secrets. Block bare .env / .env.local at any depth. + base=$(basename "$file") + if echo "$base" | grep -qE '^\.env(\.[^/]+)?$' && ! echo "$base" | grep -qE '^\.env\.(example|test|precommit)$'; then printf "${RED}✗ SECURITY: .env file in commit!${NC}\n" ERRORS=$((ERRORS + 1)) fi @@ -32,25 +40,8 @@ if [ -n "$COMMITTED_FILES" ]; then done fi -# Block Linear issue references in the commit message. -# Linear tracking lives in Linear; keep commit history tool-agnostic. -# Team keys enumerated from the Socket workspace. PATCH listed before PAT so -# the engine matches the longer prefix first on strings like "PATCH-123". -COMMIT_MSG_FILE="$1" -LINEAR_TEAM_KEYS='ASK|AUTO|BOT|CE|CORE|DAT|DES|DEV|ENG|INFRA|LAB|MAR|MET|OPS|PAR|PATCH|PAT|PLAT|REA|SALES|SBOM|SEC|SMO|SUP|TES|TI|WEB' -if [ -f "$COMMIT_MSG_FILE" ]; then - LINEAR_HITS=$(grep -vE '^#' "$COMMIT_MSG_FILE" 2>/dev/null \ - | grep -oE "(^|[^A-Za-z0-9_])($LINEAR_TEAM_KEYS)-[0-9]+($|[^A-Za-z0-9_])|linear\.app/[A-Za-z0-9/_-]+" \ - | head -5 || true) - if [ -n "$LINEAR_HITS" ]; then - printf "${RED}✗ Commit message references Linear issue(s):${NC}\n" - printf '%s\n' "$LINEAR_HITS" | sed 's/^/ /' - printf "${RED}Linear tracking lives in Linear. Remove the reference from the commit message.${NC}\n" - ERRORS=$((ERRORS + 1)) - fi -fi - # Auto-strip AI attribution from commit message. +COMMIT_MSG_FILE="$1" if [ -f "$COMMIT_MSG_FILE" ]; then # Create a temporary file to store the cleaned message. TEMP_FILE=$(mktemp) || { diff --git a/.git-hooks/pre-push b/.git-hooks/pre-push index 8f8637b8..0dcfe20d 100755 --- a/.git-hooks/pre-push +++ b/.git-hooks/pre-push @@ -15,11 +15,49 @@ set -e -# shellcheck source=./_helpers.sh -. 
"$(dirname "$0")/_helpers.sh" +# shellcheck source=./_api-key-check.sh +. "$(dirname "$0")/_api-key-check.sh" printf "${GREEN}Running mandatory pre-push validation...${NC}\n" +# ── CHECK 0: Submodule pristine check ──────────────────────────────── +# Ensures all submodules are at their expected commits with no +# uncommitted changes. Prevents accidental submodule pointer drift. +if [ -f .gitmodules ]; then + printf "Checking submodules are pristine...\n" + SUBMODULE_ERRORS=0 + while IFS= read -r line; do + # git submodule status prefixes: ' ' = clean, '+' = different commit, '-' = not initialized, 'U' = merge conflict + prefix="${line:0:1}" + rest="${line:1}" + sm_path=$(echo "$rest" | awk '{print $2}') + case "$prefix" in + +) + printf "${RED}✗ BLOCKED: Submodule has wrong commit: %s${NC}\n" "$sm_path" + printf " Run: git submodule update --init %s\n" "$sm_path" + SUBMODULE_ERRORS=$((SUBMODULE_ERRORS + 1)) + ;; + -) + # Uninitialized submodules are OK — CI shallow clones and scheduled + # triggers don't init all submodules. Only wrong-commit (+) and + # merge-conflict (U) states are errors. + ;; + U) + printf "${RED}✗ BLOCKED: Submodule has merge conflict: %s${NC}\n" "$sm_path" + SUBMODULE_ERRORS=$((SUBMODULE_ERRORS + 1)) + ;; + esac + done < <(git submodule status) + + if [ $SUBMODULE_ERRORS -gt 0 ]; then + printf "\n" + printf "${RED}✗ Push blocked: %d submodule(s) not pristine!${NC}\n" "$SUBMODULE_ERRORS" + printf "Fix submodules before pushing.\n" + exit 1 + fi + printf "${GREEN}✓ All submodules pristine${NC}\n" +fi + # Get the remote name and URL from git (passed as arguments to pre-push hooks). remote="$1" url="$2" @@ -60,7 +98,22 @@ while read local_ref local_sha remote_ref remote_sha; do fi else # Existing branch — only check commits not yet on the remote. - range="$remote_sha..$local_sha" + # Handle force-push / history rewrite: if remote_sha doesn't exist + # locally (e.g. after squash), fall back to default branch comparison. + if ! 
git cat-file -e "$remote_sha" 2>/dev/null; then + default_branch=$(git symbolic-ref "refs/remotes/$remote/HEAD" 2>/dev/null | sed "s@^refs/remotes/$remote/@@") + if [ -z "$default_branch" ]; then + default_branch="main" + fi + if git rev-parse "$remote/$default_branch" >/dev/null 2>&1; then + range="$remote/$default_branch..$local_sha" + else + printf "${GREEN}✓ Skipping validation (no baseline for force-push)${NC}\n" + continue + fi + else + range="$remote_sha..$local_sha" + fi fi # Validate the computed range before using it. @@ -129,7 +182,9 @@ while read local_ref local_sha remote_ref remote_sha; do # Check file contents for secrets and hardcoded paths. while IFS= read -r file; do - if [ -f "$file" ] && [ ! -d "$file" ]; then + # Only scan files that are tracked at HEAD (skip deleted/untracked files + # that still exist on disk — e.g. profdata after being removed from git). + if [ -f "$file" ] && [ ! -d "$file" ] && git ls-files --error-unmatch "$file" >/dev/null 2>&1; then # Skip test files, example files, and hook scripts themselves. if echo "$file" | grep -qE '\.(test|spec)\.(m?[jt]s|tsx?)$|\.example$|/test/|/tests/|fixtures/|\.git-hooks/|\.husky/'; then continue diff --git a/.github/paths-allowlist.yml b/.github/paths-allowlist.yml new file mode 100644 index 00000000..3cf4378c --- /dev/null +++ b/.github/paths-allowlist.yml @@ -0,0 +1,30 @@ +# Path-hygiene gate allowlist. +# Mantra: 1 path, 1 reference. +# +# Each entry exempts a specific finding from `scripts/check-paths.mts`. +# Entries MUST carry a `reason` so the list stays audit-able and +# entries can be removed when the underlying code changes. +# +# Schema (all top-level keys optional except `reason`): +# +# - rule: Rule letter (A, B, C, D, F, G). Omit to match any rule. +# file: Substring match against the relative file path. +# pattern: Substring match against the offending snippet. +# line: Line number; matches if within ±2 of the finding. +# reason: Why this site is genuinely exempt. Required. 
+# +# Prefer narrow entries (rule + file + line + pattern) over blanket +# `file:` entries that exempt the whole file. Genuine exemptions are +# rare — most "false positives" should be reported as gate bugs. +# +# Example: +# +# - rule: A +# file: packages/foo/scripts/legacy-build.mts +# line: 42 +# pattern: "path.join(testDir, 'out', 'Final')" +# reason: | +# legacy-build.mts is scheduled for removal in v2.0; refactoring +# its path construction now would conflict with the rewrite. + +# (No allowlist entries yet — socket-btm is meant to be clean.) diff --git a/CLAUDE.md b/CLAUDE.md index 550a72dd..326fe0d4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -73,6 +73,27 @@ Emojis allowed sparingly: 📦 💡 🚀 🎉. Prefer text-based symbols for ter --- +### 1 path, 1 reference + +**A path is _constructed_ exactly once. Everywhere else _references_ the constructed value.** + +Referencing a single computed path many times is fine — that's the whole point of computing it once. What's banned is _re-constructing_ the same path in multiple places, because that's where drift is born. + +- **Within a package**: every script imports its own `scripts/paths.mts` (or `lib/paths.mts`). No `path.join('build', mode, ...)` outside that module. +- **Across packages**: when package B consumes package A's output, B imports A's `paths.mts` via the workspace `exports` field. Never `path.join(PKG, '..', '', 'build', ...)`. +- **Workflows, Dockerfiles, shell scripts**: they can't `import` TS, so they construct the string once and reference it everywhere downstream. Workflows: a "Compute paths" step exposes `steps.paths.outputs.final_dir`; later steps read `${{ steps.paths.outputs.final_dir }}`. Dockerfiles/shell: assign once to a variable / `ENV`, reference by name thereafter. Each canonical construction carries a comment naming the source-of-truth `paths.mts`. **Re-building** the same path in a second step is the violation, not referring to the constructed value many times. 
+- **Comments**: may describe path _structure_ with placeholders ("`/`") but should not encode a complete literal path string. The import statement IS the comment. + +Code execution takes priority over docs: violations in `.mts`/`.cts`, Makefiles, Dockerfiles, workflow YAML, and shell scripts are blocking. README and doc-comment violations are advisory unless they contain a fully-qualified path with no parametric placeholders. + +**Three-level enforcement:** + +- **Hook** — `.claude/hooks/path-guard/` blocks `Edit`/`Write` calls that would introduce a violation in a `.mts`/`.cts` file at edit time. +- **Gate** — `scripts/check-paths.mts` runs in `pnpm check`. Fails the build on any violation that isn't allowlisted in `.github/paths-allowlist.yml`. +- **Skill** — `/path-guard` audits the repo and fixes findings; `/path-guard check` reports only; `/path-guard install` drops the gate + hook + rule into a fresh repo. + +The mantra is intentionally short so it sticks: **1 path, 1 reference**. When in doubt, find the canonical owner and import from it. + ## 🏗️ SDK-SPECIFIC ### Architecture diff --git a/package.json b/package.json index 9ae02d43..bd9e9160 100644 --- a/package.json +++ b/package.json @@ -45,6 +45,7 @@ "build": "node scripts/build.mts", "bump": "node scripts/bump.mts", "check": "node scripts/check.mts", + "check:paths": "node scripts/check-paths.mts", "clean": "node scripts/clean.mts", "cover": "node scripts/cover.mts", "fix": "node scripts/fix.mts", diff --git a/scripts/check-paths.mts b/scripts/check-paths.mts new file mode 100644 index 00000000..cbecc71e --- /dev/null +++ b/scripts/check-paths.mts @@ -0,0 +1,947 @@ +#!/usr/bin/env node +/** + * @fileoverview Path-hygiene gate. + * + * Mantra: 1 path, 1 reference. A path is constructed exactly once; + * everywhere else references the constructed value. + * + * Whole-repo scan complementing the per-edit `.claude/hooks/path-guard` + * hook. 
The hook stops new violations from landing; this gate finds + * the existing ones and blocks merges that introduce more. + * + * Rules enforced: + * + * A — Multi-stage path constructed inline. A `path.join(...)` call + * (or template literal) in a `.mts`/`.cts` file outside a + * `paths.mts` that stitches together two or more "stage" + * segments (Final, Release, Stripped, Compressed, Optimized, + * Synced, wasm, downloaded), or one stage plus a build-root + * (`build`/`out`) plus a mode (`dev`/`prod`/`shared`). The + * construction belongs in the package's `paths.mts` (or a + * build-infra helper); every consumer imports the computed + * value. + * + * B — Cross-package path traversal. A `path.join(*, '..', '', 'build', ...)` reaches into a sibling's build + * output without going through its `exports`. The sibling owns + * its layout; consumers declare a workspace dep and import the + * sibling's `paths.mts`. + * + * C — Hand-built workflow path. A `.github/workflows/*.yml` step + * constructs `build/${...}/out//...` inline outside a + * canonical "Compute paths" step. Workflows can carry path + * strings, but the strings are constructed once and exposed via + * step outputs / job env that downstream steps reference. + * + * D — Comment-encoded paths. Comments (in code or YAML) that re-state + * a fully-qualified multi-stage path. Comments may describe the + * structure ("Final dir" or "build//...") but should not + * encode a complete path string that a tool would parse — the + * canonical construction IS the documentation. + * + * F — Same path constructed in multiple places. The same shape of + * multi-stage `path.join(...)` (or workflow `build/${...}/...` + * string template) appearing in two or more files. Construct + * once and import; references of the constructed value are + * unlimited. + * + * G — Hand-built paths in Makefiles, Dockerfiles, and shell scripts. + * Same shape as A, applied to executable artifacts that don't + * run TypeScript. 
Each canonical construction must carry a + * comment naming the source-of-truth `paths.mts` so the script + * can't drift from TS without a flagged change. + * + * Allowlist: `.github/paths-allowlist.yml`. Each entry needs a + * `reason` so the list stays audit-able. Patterns are deliberately + * narrow — entries should be specific, not blanket. + * + * Usage: + * node scripts/check-paths.mts # default: report + fail + * node scripts/check-paths.mts --explain # long-form explanation + * node scripts/check-paths.mts --json # machine-readable + * node scripts/check-paths.mts --quiet # silent on clean + * + * Exit codes: + * 0 — clean (no findings, or every finding is allowlisted) + * 1 — findings present + * 2 — gate itself crashed + */ + +import { createHash } from 'node:crypto' +import { existsSync, readFileSync, readdirSync } from 'node:fs' +import path from 'node:path' +import process from 'node:process' + +import { fileURLToPath } from 'node:url' + +import { parseArgs } from 'node:util' + +import { + BUILD_ROOT_SEGMENTS, + KNOWN_SIBLING_PACKAGES, + MODE_SEGMENTS, + STAGE_SEGMENTS, +} from '../.claude/hooks/path-guard/segments.mts' + +// Plain stderr/stdout output — no @socketsecurity/lib dependency so +// the gate is self-contained and works in socket-lib itself (which +// would otherwise import itself). +const logger = { + log: (msg: string) => process.stdout.write(msg + '\n'), + error: (msg: string) => process.stderr.write(msg + '\n'), + step: (msg: string) => process.stdout.write(`→ ${msg}\n`), + success: (msg: string) => process.stdout.write(`✔ ${msg}\n`), + substep: (msg: string) => process.stdout.write(` ${msg}\n`), +} + +const __filename = fileURLToPath(import.meta.url) +const __dirname = path.dirname(__filename) +const REPO_ROOT = path.resolve(__dirname, '..') + +// Stage / build-root / mode / sibling-package vocabularies are imported +// from `.claude/hooks/path-guard/segments.mts` (the canonical source). 
+// Both this gate and the path-guard hook share that single definition +// — Mantra: 1 path, 1 reference. + +// File-path patterns that legitimately enumerate path segments. +const EXEMPT_FILE_PATTERNS: RegExp[] = [ + // Any paths.mts is the canonical constructor. + /(^|\/)paths\.(mts|cts|js)$/, + // Build-infra owns shared helpers that enumerate stages. + /packages\/build-infra\/lib\/paths\.mts$/, + /packages\/build-infra\/lib\/constants\.mts$/, + // Path-scanning gates that intentionally enumerate. + /scripts\/check-paths\.mts$/, + /scripts\/check-consistency\.mts$/, + /\.claude\/hooks\/path-guard\//, + // Allowlist + config files. + /\.github\/paths-allowlist\.yml$/, +] + +type Finding = { + rule: 'A' | 'B' | 'C' | 'D' | 'F' | 'G' + file: string + line: number + snippet: string + message: string + fix: string +} + +const findings: Finding[] = [] + +const args = parseArgs({ + options: { + explain: { type: 'boolean', default: false }, + json: { type: 'boolean', default: false }, + quiet: { type: 'boolean', default: false }, + 'show-hashes': { type: 'boolean', default: false }, + }, + strict: false, +}) + +const isExempt = (filePath: string): boolean => + EXEMPT_FILE_PATTERNS.some(re => re.test(filePath)) + +// ────────────────────────────────────────────────────────────────── +// Allowlist loading +// ────────────────────────────────────────────────────────────────── + +type AllowlistEntry = { + file?: string + pattern?: string + rule?: string + line?: number + snippet_hash?: string + reason: string +} + +const loadAllowlist = (): AllowlistEntry[] => { + const allowlistPath = path.join(REPO_ROOT, '.github', 'paths-allowlist.yml') + if (!existsSync(allowlistPath)) { + return [] + } + const text = readFileSync(allowlistPath, 'utf8') + // Tiny YAML parser — only the shape we need: list of entries with + // `file`, `pattern`, `rule`, `line`, `reason` scalar fields, plus + // YAML 1.2 block-scalar indicators `|` (literal) and `>` (folded) + // for multi-line reasons. 
Avoids a yaml dep for a gate that has to + // be self-contained. + const entries: AllowlistEntry[] = [] + let current: Partial<AllowlistEntry> | null = null + // When set, subsequent more-indented lines fold into this key as a + // block scalar (literal '|' keeps newlines, folded '>' joins with + // spaces). + let blockKey: string | null = null + let blockKind: '|' | '>' | null = null + let blockIndent = 0 + let blockLines: string[] = [] + const flushBlock = () => { + if (current && blockKey) { + const value = + blockKind === '>' + ? blockLines.join(' ').replace(/\s+/g, ' ').trim() + : blockLines.join('\n').replace(/\n+$/, '') + ;(current as any)[blockKey] = value + } + blockKey = null + blockKind = null + blockLines = [] + } + const indentOf = (line: string): number => { + let i = 0 + while (i < line.length && line[i] === ' ') { + i += 1 + } + return i + } + const lines = text.split('\n') + for (let i = 0; i < lines.length; i++) { + const raw = lines[i]! + const line = raw.replace(/\r$/, '') + // Block-scalar accumulation takes precedence over normal parsing. + if (blockKey !== null) { + if (line.trim() === '') { + // Preserve blank lines inside a literal block; folded blocks + // turn them into paragraph breaks (kept as separate joins). + blockLines.push('') + continue + } + const indent = indentOf(line) + if (indent >= blockIndent) { + blockLines.push(line.slice(blockIndent)) + continue + } + flushBlock() + // Fall through and re-process the dedented line as normal. + } + if (!line.trim() || line.trim().startsWith('#')) { + continue + } + const tryAssign = (key: string, value: string) => { + const trimmed = value.trim() + if (current === null) { + return + } + if (trimmed === '|' || trimmed === '>') { + blockKey = key + blockKind = trimmed as '|' | '>' + blockIndent = indentOf(lines[i + 1] ?? '') || indentOf(line) + 2 + blockLines = [] + return + } + ;(current as any)[key] = + key === 'line' ? 
Number(unquote(trimmed)) : unquote(trimmed) + } + if (line.startsWith('- ')) { + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + current = {} + const rest = line.slice(2).trim() + if (rest) { + const m = rest.match(/^([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) + } + } + } else if (current) { + const m = line.match(/^\s+([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) + } + } + } + if (blockKey !== null) { + flushBlock() + } + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + return entries +} + +const unquote = (s: string): string => { + const t = s.trim() + if ( + (t.startsWith('"') && t.endsWith('"')) || + (t.startsWith("'") && t.endsWith("'")) + ) { + return t.slice(1, -1) + } + return t +} + +const ALLOWLIST = loadAllowlist() + +/** + * Stable, normalized snippet hash. Whitespace-insensitive so trivial + * reformatting (indent change, trailing comma, line wrap) doesn't + * invalidate an allowlist entry, but content-changing edits do. The + * hash exposes only the first 12 hex chars (~48 bits) which is plenty + * for collision-resistance within a single repo's finding set and + * keeps the YAML readable. + */ +const snippetHash = (snippet: string): string => { + const normalized = snippet.replace(/\s+/g, ' ').trim() + return createHash('sha256').update(normalized).digest('hex').slice(0, 12) +} + +/** + * Allowlist matching trades off two failure modes: + * + * - Drift via reformatting (a line shift breaks an entry, the + * finding re-surfaces, devs paper over with a new entry). + * - Stealth allowlisting (an entry pinned to "anywhere in this file" + * silently exempts unrelated future violations). + * + * Strategy: exact line match OR `snippet_hash` match (whitespace- + * normalized SHA-256, first 12 hex). Either is sufficient. 
Lines stay + * exact (was ±2; the slack let reformatting silently slide), and + * `snippet_hash` provides reformatting-tolerant matching that's still + * tied to the literal text — paste-and-edit cheating would change the + * hash. If neither `line` nor `snippet_hash` is provided, the entry + * matches purely by `rule` + `file` + `pattern` (file-level exempt; + * use sparingly and always pair with a precise `pattern`). + */ +const isAllowlisted = (finding: Finding): boolean => + ALLOWLIST.some(entry => { + if (entry.rule && entry.rule !== finding.rule) { + return false + } + if (entry.file && !finding.file.includes(entry.file)) { + return false + } + if (entry.pattern && !finding.snippet.includes(entry.pattern)) { + return false + } + const lineProvided = entry.line !== undefined + const hashProvided = + typeof entry.snippet_hash === 'string' && entry.snippet_hash.length > 0 + if (lineProvided || hashProvided) { + const lineMatches = lineProvided && entry.line === finding.line + const hashMatches = + hashProvided && entry.snippet_hash === snippetHash(finding.snippet) + if (!(lineMatches || hashMatches)) { + return false + } + } + return true + }) + +// ────────────────────────────────────────────────────────────────── +// File walking +// ────────────────────────────────────────────────────────────────── + +const SKIP_DIRS = new Set([ + '.git', + 'node_modules', + 'build', + 'dist', + 'out', + 'target', + '.cache', + 'upstream', +]) + +const walk = function* ( + dir: string, + filter: (relPath: string) => boolean, +): Generator<string> { + let entries + try { + entries = readdirSync(dir, { withFileTypes: true }) + } catch { + return + } + for (const e of entries) { + if (SKIP_DIRS.has(e.name)) { + continue + } + const full = path.join(dir, e.name) + const rel = path.relative(REPO_ROOT, full) + if (e.isDirectory()) { + yield* walk(full, filter) + } else if (e.isFile() && filter(rel)) { + yield rel + } + } +} + +// 
────────────────────────────────────────────────────────────────── +// Rule A + B: code scan (.mts / .cts) +// ────────────────────────────────────────────────────────────────── + +// Locate `path.join(` or `path.resolve(` call sites; argument-list +// extraction uses a paren-balancing scanner below to handle arbitrary +// nesting depth (the previous regex-only approach silently missed any +// argument containing 2+ levels of nested function calls). +const PATH_CALL_RE = /\bpath\.(?:join|resolve)\s*\(/g +const STRING_LITERAL_RE = /(['"])((?:\\.|(?!\1)[^\\])*)\1/g + +// Template literal scanner. Captures backtick-delimited strings +// (including those with `${...}` placeholders) so Rule A also catches +// path construction via template literals like +// `${buildDir}/out/Final/${binary}` or `build/${mode}/out/Final`. +const TEMPLATE_LITERAL_RE = + /`((?:\\.|(?:\$\{(?:[^{}]|\{[^{}]*\})*\})|(?!`)[^\\])*)`/g + +/** + * Convert a template-literal body into a synthetic forward-slash path + * by replacing `${...}` placeholders with a sentinel and normalizing + * separators. Returns the sequence of path segments split on `/`. The + * sentinel doesn't match any STAGE/BUILD_ROOT/MODE token, so a + * placeholder-only segment (`${binaryName}`) won't match those sets. + */ +const templateLiteralSegments = (body: string): string[] => { + // Strip placeholders so they don't introduce noise in segments. + // Empty result for a placeholder is fine; downstream filters by set + // membership and skips empties. + const stripped = body.replace(/\$\{(?:[^{}]|\{[^{}]*\})*\}/g, '\x00') + return stripped.split('/').filter(seg => seg.length > 0 && seg !== '\x00') +} + +/** + * Extract every `path.join(...)` and `path.resolve(...)` call from the + * source text, returning each call's literal start offset and argument + * substring. Uses paren-balancing so deeply-nested arguments like + * `path.join(getDir(child(x)), 'build', 'Final')` are captured fully. 
+ */ +const extractPathCalls = ( + source: string, +): Array<{ offset: number; args: string }> => { + const calls: Array<{ offset: number; args: string }> = [] + PATH_CALL_RE.lastIndex = 0 + let match: RegExpExecArray | null + while ((match = PATH_CALL_RE.exec(source)) !== null) { + const callStart = match.index + const argsStart = PATH_CALL_RE.lastIndex + let depth = 1 + let i = argsStart + let inString: '"' | "'" | '`' | null = null + while (i < source.length && depth > 0) { + const ch = source[i]! + if (inString) { + if (ch === '\\') { + i += 2 + continue + } + if (ch === inString) { + inString = null + } + } else { + if (ch === '"' || ch === "'" || ch === '`') { + inString = ch + } else if (ch === '(') { + depth += 1 + } else if (ch === ')') { + depth -= 1 + if (depth === 0) { + break + } + } + } + i += 1 + } + if (depth === 0) { + calls.push({ offset: callStart, args: source.slice(argsStart, i) }) + PATH_CALL_RE.lastIndex = i + 1 + } + } + return calls +} + +const extractStringLiterals = (args: string): string[] => { + const literals: string[] = [] + let match: RegExpExecArray | null + STRING_LITERAL_RE.lastIndex = 0 + while ((match = STRING_LITERAL_RE.exec(args)) !== null) { + if (match[2] !== undefined) { + literals.push(match[2]) + } + } + return literals +} + +const scanCodeFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + // Build a line-offset map so we can map regex offsets back to line + // numbers cheaply. + const lineOffsets: number[] = [0] + for (let i = 0; i < content.length; i++) { + if (content[i] === '\n') { + lineOffsets.push(i + 1) + } + } + const offsetToLine = (offset: number): number => { + let lo = 0 + let hi = lineOffsets.length - 1 + while (lo < hi) { + const mid = (lo + hi + 1) >>> 1 + if (lineOffsets[mid]! 
<= offset) { + lo = mid + } else { + hi = mid - 1 + } + } + return lo + 1 + } + + for (const call of extractPathCalls(content)) { + const literals = extractStringLiterals(call.args) + const stages = literals.filter(l => STAGE_SEGMENTS.has(l)) + const buildRoots = literals.filter(l => BUILD_ROOT_SEGMENTS.has(l)) + const modes = literals.filter(l => MODE_SEGMENTS.has(l)) + + // Rule A: 2+ stages OR (1 stage + 1 build-root + 1 mode). + const triggersA = + stages.length >= 2 || + (stages.length >= 1 && buildRoots.length >= 1 && modes.length >= 1) + if (triggersA) { + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: 'Multi-stage path constructed inline (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + + // Rule B: each '..' opens a window; the window stays open only + // until the next non-'..' literal. A sibling-package literal + // *immediately after* a '..' (no path segment between them) + // triggers, AND there must be build context elsewhere in the + // call. Resetting per-segment prevents false positives where '..' + // appears earlier and sibling-name appears much later in an + // unrelated position. + const hasBuildContext = literals.some( + l => BUILD_ROOT_SEGMENTS.has(l) || STAGE_SEGMENTS.has(l), + ) + if (hasBuildContext) { + for (let i = 0; i < literals.length - 1; i++) { + if ( + literals[i] === '..' && + KNOWN_SIBLING_PACKAGES.has(literals[i + 1]!) + ) { + const sibling = literals[i + 1]! + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'B', + file: relPath, + line, + snippet, + message: `Cross-package traversal into '${sibling}' build output.`, + fix: `Add '${sibling}: workspace:*' as a dep, declare an exports entry on '${sibling}' (e.g. 
'./scripts/paths' → './scripts/paths.mts'), and import the path from there.`,
+          })
+          break
+        }
+      }
+    }
+  }
+
+  // Rule A (template literal variant). Backtick strings like
+  // `${buildDir}/out/Final/${binary}` or `build/${mode}/${arch}/out/Final`
+  // construct paths the same way `path.join(...)` does — flag the
+  // same shapes. Tagged templates and non-path backtick strings get
+  // no special-casing: TEMPLATE_LITERAL_RE matches every backtick
+  // string, and segment composition alone decides whether the
+  // contents look like a build path.
+  TEMPLATE_LITERAL_RE.lastIndex = 0
+  let tmpl: RegExpExecArray | null
+  while ((tmpl = TEMPLATE_LITERAL_RE.exec(content)) !== null) {
+    const body = tmpl[1] ?? ''
+    if (!body.includes('/')) {
+      continue
+    }
+    const segments = templateLiteralSegments(body)
+    const stages = segments.filter(s => STAGE_SEGMENTS.has(s))
+    const buildRoots = segments.filter(s => BUILD_ROOT_SEGMENTS.has(s))
+    const modes = segments.filter(s => MODE_SEGMENTS.has(s))
+    // Template literal trigger is tighter than path.join() because
+    // backtick strings often appear in patch fixtures, error messages,
+    // and other multi-line content that incidentally contains stage
+    // tokens like `wasm`. Require the canonical build-output shape:
+    //   - 'build' + 'out' + stage (canonical multi-stage layout), OR
+    //   - 2+ stage segments AND 'out' (e.g. `wasm/out/Final`), OR
+    //   - 'build' + stage + literal mode (back-compat with path.join).
+    const hasBuildAndOut =
+      buildRoots.includes('build') && buildRoots.includes('out')
+    const hasOut = buildRoots.includes('out')
+    const hasBuild = buildRoots.includes('build')
+    const triggersA =
+      (hasBuildAndOut && stages.length >= 1) ||
+      (stages.length >= 2 && hasOut) ||
+      (hasBuild && stages.length >= 1 && modes.length >= 1)
+    if (triggersA) {
+      const line = offsetToLine(tmpl.index)
+      const snippet = (lines[line - 1] ?? 
'').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: + 'Multi-stage path constructed inline via template literal (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule C + D: workflow YAML scan +// ────────────────────────────────────────────────────────────────── + +const WORKFLOW_PATH_RE = + /build\/\$\{[^}]+\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g +const WORKFLOW_GH_EXPR_PATH_RE = + /build\/\$\{\{\s*[^}]+\}\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g + +const isInsideComputePathsBlock = ( + lines: string[], + lineIdx: number, +): boolean => { + // Walk backwards up to 60 lines looking for the start of the + // current step. If that step is a "Compute paths" step, the line + // is exempt. + for (let i = lineIdx; i >= Math.max(0, lineIdx - 60); i--) { + const l = lines[i] ?? '' + if (/^\s*-\s*name:/i.test(l)) { + // Step boundary — check if THIS step is a Compute paths step. + // The step body may include `id: paths` even if the name is + // something else (e.g. `id: stub-paths`), so look at the next + // ~20 lines for either marker. + for (let j = i; j < Math.min(lines.length, i + 20); j++) { + const m = lines[j] ?? '' + if ( + /^\s*-\s*name:\s*Compute\s+[\w-]+\s+paths/i.test(m) || + /^\s*id:\s*[\w-]*paths\s*$/i.test(m) + ) { + return true + } + if (j > i && /^\s*-\s*name:/i.test(m)) { + // Hit the next step — current step is NOT Compute paths. 
+          return false
+        }
+      }
+      return false
+    }
+  }
+  return false
+}
+
+const scanWorkflowFile = (relPath: string): void => {
+  const full = path.join(REPO_ROOT, relPath)
+  let content: string
+  try {
+    content = readFileSync(full, 'utf8')
+  } catch {
+    return
+  }
+  const lines = content.split('\n')
+
+  // First pass: collect every hand-built path occurrence outside a
+  // "Compute paths" step. Per the mantra, a single reference is fine
+  // — what's banned is reconstructing the same path 2+ times.
+  type PathHit = {
+    line: number
+    snippet: string
+    pathStr: string
+  }
+  const occurrences = new Map<string, PathHit[]>()
+
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i]!
+    if (/^\s*#/.test(line)) {
+      // Skip comment lines from C scan; they're under D below.
+      continue
+    }
+    if (isInsideComputePathsBlock(lines, i)) {
+      // Inside the canonical construction step — exempt.
+      continue
+    }
+    WORKFLOW_PATH_RE.lastIndex = 0
+    WORKFLOW_GH_EXPR_PATH_RE.lastIndex = 0
+    const matches: string[] = []
+    let m: RegExpExecArray | null
+    while ((m = WORKFLOW_PATH_RE.exec(line)) !== null) {
+      matches.push(m[0])
+    }
+    while ((m = WORKFLOW_GH_EXPR_PATH_RE.exec(line)) !== null) {
+      matches.push(m[0])
+    }
+    for (const pathStr of matches) {
+      const list = occurrences.get(pathStr) ?? []
+      list.push({ line: i + 1, snippet: line.trim(), pathStr })
+      occurrences.set(pathStr, list)
+    }
+  }
+
+  // Flag every occurrence of a shape that appears 2+ times.
+  for (const [pathStr, hits] of occurrences) {
+    if (hits.length < 2) {
+      continue
+    }
+    for (const hit of hits) {
+      findings.push({
+        rule: 'C',
+        file: relPath,
+        line: hit.line,
+        snippet: hit.snippet,
+        message: `Workflow constructs the same path ${hits.length} times: ${pathStr}`,
+        fix: 'Add a "Compute paths" step (id: paths) early in the job that computes this path ONCE and exposes it via $GITHUB_OUTPUT. Reference as ${{ steps.paths.outputs.<name> }} in subsequent steps. References of the constructed value are unlimited; reconstructing is the violation.',
+      })
+    }
+  }
+
+  // Rule D: comments encoding a fully-qualified multi-stage path
+  // (separate scan since it has different semantics).
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i]!
+    if (!/^\s*#/.test(line)) {
+      continue
+    }
+    const literalShape =
+      /build\/(?:dev|prod|shared)\/[a-z0-9-]+\/(?:wasm\/)?out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/i
+    if (literalShape.test(line)) {
+      findings.push({
+        rule: 'D',
+        file: relPath,
+        line: i + 1,
+        snippet: line.trim(),
+        message: 'Comment encodes a fully-qualified path string.',
+        fix: 'Cite the canonical paths.mts (e.g. "see packages/<pkg>/scripts/paths.mts:getBuildPaths()") instead of duplicating the path string. Comments may describe structure with placeholders ("<mode>/<stage>") but should not be a parsable path.',
+      })
+    }
+  }
+}
+
+// ──────────────────────────────────────────────────────────────────
+// Rule G: Makefile / Dockerfile / shell scan
+// ──────────────────────────────────────────────────────────────────
+
+const SCRIPT_HAND_BUILT_RE =
+  /build\/\$?\{?(?:BUILD_MODE|MODE|prod|dev)\}?\/[\w${}.-]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g
+
+const scanScriptFile = (relPath: string): void => {
+  const full = path.join(REPO_ROOT, relPath)
+  let content: string
+  try {
+    content = readFileSync(full, 'utf8')
+  } catch {
+    return
+  }
+  const lines = content.split('\n')
+  const isDockerfile =
+    /Dockerfile/i.test(relPath) || /\.glibc$|\.musl$/.test(relPath)
+
+  // First pass: collect every multi-stage path occurrence in this file,
+  // scoped per Dockerfile stage (each `FROM ... AS ...` starts a new
+  // scope where ENV/ARG don't propagate).
+  type Hit = { line: number; text: string; pathStr: string; stage: number }
+  const hits: Hit[] = []
+  let stage = 0
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i]!
+    if (/^\s*#/.test(line)) {
+      // Skip comments — documentation, not construction.
+      continue
+    }
+    if (isDockerfile && /^FROM\s+/i.test(line)) {
+      stage += 1
+      continue
+    }
+    SCRIPT_HAND_BUILT_RE.lastIndex = 0
+    let m: RegExpExecArray | null
+    while ((m = SCRIPT_HAND_BUILT_RE.exec(line)) !== null) {
+      hits.push({
+        line: i + 1,
+        text: line.trim(),
+        pathStr: m[0],
+        stage,
+      })
+    }
+  }
+
+  // Group by (stage, pathStr) — only flag when a path is built 2+
+  // times within the SAME Dockerfile stage (or anywhere in non-
+  // Dockerfile scripts, where stages don't apply).
+  const grouped = new Map<string, Hit[]>()
+  for (const h of hits) {
+    const key = `${h.stage}::${h.pathStr}`
+    const list = grouped.get(key) ?? []
+    list.push(h)
+    grouped.set(key, list)
+  }
+  for (const [, list] of grouped) {
+    if (list.length < 2) {
+      continue
+    }
+    for (const hit of list) {
+      findings.push({
+        rule: 'G',
+        file: relPath,
+        line: hit.line,
+        snippet: hit.text,
+        message: `Hand-built multi-stage path constructed ${list.length} times in this file: ${hit.pathStr}`,
+        fix: 'Assign to a variable / ENV once near the top of the script / Dockerfile stage, with a comment naming the canonical paths.mts. Reference the variable everywhere downstream. References of a single construction are unlimited; reconstructing the same path is the violation.',
+      })
+    }
+  }
+}
+
+// ──────────────────────────────────────────────────────────────────
+// Rule F: cross-file path repetition
+// ──────────────────────────────────────────────────────────────────
+
+const checkRuleF = (): void => {
+  // A path is "constructed" each time we see a new path.join with a
+  // matching shape. Group findings of Rule A by their snippet shape;
+  // when the same shape appears in 2+ files, reclassify them as Rule F
+  // so the message is more accurate.
+  const byShape = new Map<string, typeof findings>()
+  for (const f of findings) {
+    if (f.rule !== 'A') {
+      continue
+    }
+    // Key on the call's string literals — the literal path-segment
+    // shape — ignoring identifiers and surrounding context.
+    const literalsRe = /'[^']*'|"[^"]*"/g
+    const literals = (f.snippet.match(literalsRe) ?? []).join(',')
+    if (!literals) {
+      continue
+    }
+    const list = byShape.get(literals) ?? []
+    list.push(f)
+    byShape.set(literals, list)
+  }
+  for (const [shape, list] of byShape) {
+    if (list.length < 2) {
+      continue
+    }
+    // Reclassify each Rule-A finding in this group as Rule F so the
+    // message tells the reader the issue is cross-file repetition,
+    // not just a single hand-build.
+    for (const f of list) {
+      f.rule = 'F'
+      f.message = `Same path shape constructed in ${list.length} places: ${shape.slice(0, 100)}`
+      f.fix =
+        'Construct this path ONCE in a paths.mts (or build-infra helper) and import the computed value. References of the computed variable are unlimited; re-constructing the same shape twice is the violation.'
+    }
+  }
+}
+
+// ──────────────────────────────────────────────────────────────────
+// Main
+// ──────────────────────────────────────────────────────────────────
+
+const main = (): number => {
+  // Scan code files (Rule A + B).
+  for (const rel of walk(
+    REPO_ROOT,
+    p => p.endsWith('.mts') || p.endsWith('.cts'),
+  )) {
+    if (isExempt(rel)) {
+      continue
+    }
+    scanCodeFile(rel)
+  }
+  // Scan workflows (Rule C + D).
+  const workflowDir = path.join(REPO_ROOT, '.github', 'workflows')
+  if (existsSync(workflowDir)) {
+    for (const rel of walk(workflowDir, p => p.endsWith('.yml'))) {
+      if (isExempt(rel)) {
+        continue
+      }
+      scanWorkflowFile(rel)
+    }
+  }
+  // Scan scripts/Makefiles/Dockerfiles (Rule G).
+ for (const rel of walk(REPO_ROOT, p => { + const base = path.basename(p) + return ( + base === 'Makefile' || + base.endsWith('.mk') || + base.endsWith('.Dockerfile') || + base === 'Dockerfile' || + base.endsWith('.glibc') || + base.endsWith('.musl') || + (base.endsWith('.sh') && !p.includes('test/')) + ) + })) { + if (isExempt(rel)) { + continue + } + scanScriptFile(rel) + } + // Promote cross-file Rule-A repeats to Rule F. + checkRuleF() + + // Filter against allowlist. + const blocking = findings.filter(f => !isAllowlisted(f)) + + if (args.values.json) { + process.stdout.write( + JSON.stringify( + { findings: blocking, allowlisted: findings.length - blocking.length }, + null, + 2, + ) + '\n', + ) + return blocking.length === 0 ? 0 : 1 + } + + if (blocking.length === 0) { + if (!args.values.quiet) { + logger.success('Path-hygiene check passed (1 path, 1 reference)') + if (findings.length > 0) { + logger.substep(`${findings.length} finding(s) allowlisted`) + } + } + return 0 + } + + logger.error(`Path-hygiene check FAILED — ${blocking.length} finding(s)`) + logger.log('') + logger.log('Mantra: 1 path, 1 reference') + logger.log('') + for (const f of blocking) { + logger.log(` [${f.rule}] ${f.file}:${f.line}`) + logger.log(` ${f.snippet}`) + logger.log(` → ${f.message}`) + if (args.values['show-hashes']) { + logger.log(` snippet_hash: ${snippetHash(f.snippet)}`) + } + if (args.values.explain) { + logger.log(` Fix: ${f.fix}`) + } + logger.log('') + } + if (!args.values.explain) { + logger.log('Run with --explain to see fix suggestions per finding.') + logger.log( + 'Add intentional exceptions to .github/paths-allowlist.yml with a `reason` field.', + ) + logger.log( + 'Run with --show-hashes to print the snippet_hash for each finding (drift-resistant allowlisting).', + ) + } + return 1 +} + +try { + process.exitCode = main() +} catch (e) { + logger.error(`Path-hygiene gate crashed: ${e}`) + process.exitCode = 2 +} diff --git a/scripts/check.mts b/scripts/check.mts 
index 5b2d3fb6..3e4b1337 100644 --- a/scripts/check.mts +++ b/scripts/check.mts @@ -96,6 +96,12 @@ async function main(): Promise { args: ['scripts/validate-file-count.mts'], command: 'node', }, + // Path-hygiene gate (1 path, 1 reference). See + // .claude/skills/path-guard/ + .claude/hooks/path-guard/. + { + args: ['scripts/check-paths.mts', '--quiet'], + command: 'node', + }, ] const exitCodes = await runParallel(checks) diff --git a/scripts/xport-emit-schema.mts b/scripts/xport-emit-schema.mts new file mode 100644 index 00000000..5bc6a1e6 --- /dev/null +++ b/scripts/xport-emit-schema.mts @@ -0,0 +1,37 @@ +/** + * @fileoverview Emit `xport.schema.json` from the TypeBox schema. + * + * The TypeBox schema in `scripts/xport-schema.mts` is the source of truth. + * TypeBox schemas are JSON Schema natively — no conversion library needed, + * just serialize the schema object and add the draft-2020-12 meta headers. + * + * Run via `pnpm run xport:emit-schema` when the schema changes. + */ + +import { writeFileSync } from 'node:fs' +import path from 'node:path' +import { fileURLToPath } from 'node:url' + +import { getDefaultLogger } from '@socketsecurity/lib/logger' + +import { XportManifestSchema } from './xport-schema.mts' + +const logger = getDefaultLogger() + +const __dirname = path.dirname(fileURLToPath(import.meta.url)) +const rootDir = path.resolve(__dirname, '..') +const outPath = path.join(rootDir, 'xport.schema.json') + +// TypeBox schemas carry JSON Schema shape directly, plus a Symbol-keyed +// [Kind] marker that JSON.stringify drops. Spreading the schema first +// then layering the canonical $schema / $id / title on top gives a clean +// draft-2020-12 document with the Socket-specific headers. 
+const enriched = {
+  ...XportManifestSchema,
+  $schema: 'https://json-schema.org/draft/2020-12/schema',
+  $id: 'https://github.com/SocketDev/xport.schema.json',
+  title: 'xport lock-step manifest',
+}
+
+writeFileSync(outPath, JSON.stringify(enriched, null, 2) + '\n', 'utf8')
+logger.success(`wrote ${path.relative(rootDir, outPath)}`)
diff --git a/scripts/xport-schema.mts b/scripts/xport-schema.mts
new file mode 100644
index 00000000..aa6c0d04
--- /dev/null
+++ b/scripts/xport-schema.mts
@@ -0,0 +1,355 @@
+/**
+ * @fileoverview TypeBox schema for xport.json — single source of truth.
+ *
+ * Everything else is derived:
+ *   - TypeScript types in scripts/xport.mts via `Static`
+ *   - xport.schema.json (draft 2020-12) via direct JSON.stringify of the
+ *     TypeBox schema, emitted by scripts/xport-emit-schema.mts
+ *   - Runtime validation at harness startup via
+ *     `validateSchema(XportManifestSchema, ...)` from
+ *     `@socketsecurity/lib/schema/validate`
+ *
+ * Byte-identical across socket-tui / socket-btm / socket-sdxgen / ultrathink /
+ * socket-registry / socket-repo-template via sync-scaffolding.mjs.
+ */
+
+import { Type, type Static } from '@sinclair/typebox'
+
+// ---------------------------------------------------------------------------
+// Shared primitives.
+// ---------------------------------------------------------------------------
+
+const IdSchema = Type.String({
+  pattern: '^[a-z0-9][A-Za-z0-9-]*$',
+  description:
+    'Stable identifier, unique within the manifest. Starts with a lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).',
+})
+
+const CriticalitySchema = Type.Integer({
+  minimum: 1,
+  maximum: 10,
+  description:
+    'Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). 
Harness surfaces high-criticality drift louder.', +}) + +const UpstreamRefSchema = Type.String({ + description: 'Key into the top-level `upstreams` map.', +}) + +const ConformanceTestSchema = Type.String({ + description: + "Path to a test that enforces behavior parity (modulo documented deviations). Strongly recommended — static checks can't catch silent behavioral drift.", +}) + +const NotesSchema = Type.String({ + description: + 'Free-form context — why this row exists, what gotchas to watch for.', +}) + +const PortStatusSchema = Type.Object( + { + status: Type.Union([Type.Literal('implemented'), Type.Literal('opt-out')]), + reason: Type.Optional( + Type.String({ + description: 'Required when status is `opt-out`. Explain why.', + }), + ), + path: Type.Optional( + Type.String({ + description: + "Optional path to the port's implementation of this row. Useful for module-inventory rows where each language points at a different directory.", + }), + ), + note: Type.Optional( + Type.String({ + description: + "Optional free-form note attached to a specific port's status.", + }), + ), + }, + { + additionalProperties: false, + description: + 'Per-port status for a lang-parity row. `implemented` = port meets assertions; `opt-out` = port consciously skips, requires non-empty `reason`.', + }, +) + +const UpstreamSchema = Type.Object( + { + submodule: Type.String({ + description: 'Submodule path, relative to repo root.', + }), + repo: Type.String({ + pattern: '^https?://', + description: 'Upstream repository URL (http:// or https://).', + }), + }, + { additionalProperties: false }, +) + +const SiteSchema = Type.Object( + { + path: Type.String({ + description: "Path to the port's root directory, relative to repo root.", + }), + language: Type.Optional( + Type.String({ description: 'Language label, for human reports.' 
}),
+    ),
+  },
+  { additionalProperties: false },
+)
+
+const FixtureCheckSchema = Type.Object(
+  {
+    fixture_path: Type.String(),
+    snapshot_path: Type.Optional(Type.String()),
+    diff_tolerance: Type.Optional(
+      Type.Union([
+        Type.Literal('exact'),
+        Type.Literal('line-by-line'),
+        Type.Literal('semantic'),
+      ]),
+    ),
+  },
+  {
+    additionalProperties: false,
+    description:
+      "Golden-input verification. Prefer snapshot-based diffs over hardcoded counts (brittleness lesson from sdxgen's lock-step-features).",
+  },
+)
+
+// ---------------------------------------------------------------------------
+// Row kinds.
+// ---------------------------------------------------------------------------
+
+const FileForkRowSchema = Type.Object(
+  {
+    kind: Type.Literal('file-fork'),
+    id: IdSchema,
+    upstream: UpstreamRefSchema,
+    criticality: Type.Optional(CriticalitySchema),
+    conformance_test: Type.Optional(ConformanceTestSchema),
+    notes: Type.Optional(NotesSchema),
+    local: Type.String({
+      description: 'Path to our ported file, relative to repo root.',
+    }),
+    upstream_path: Type.String({
+      description: 'Path to the source file within the upstream submodule.',
+    }),
+    forked_at_sha: Type.String({
+      pattern: '^[0-9a-f]{40}$',
+      description:
+        'Full 40-char SHA of the upstream commit we forked from. Harness runs `git log <forked_at_sha>..HEAD -- <upstream_path>` to surface drift.',
+    }),
+    deviations: Type.Array(Type.String(), {
+      minItems: 1,
+      description:
+        "Human-readable list of intentional differences. Zero deviations = use upstream directly; don't fork.",
+    }),
+  },
+  {
+    additionalProperties: false,
+    description:
+      'A local file derived from an upstream file with intentional modifications. 
Drift = upstream moved forward without us.', + }, +) + +const VersionPinRowSchema = Type.Object( + { + kind: Type.Literal('version-pin'), + id: IdSchema, + upstream: UpstreamRefSchema, + criticality: Type.Optional(CriticalitySchema), + conformance_test: Type.Optional(ConformanceTestSchema), + notes: Type.Optional(NotesSchema), + pinned_sha: Type.String({ + pattern: '^[0-9a-f]{40}$', + description: 'Full 40-char SHA the submodule is pinned to.', + }), + pinned_tag: Type.Optional( + Type.String({ + description: + 'Human-readable release tag (e.g., `v3.2.1`). Optional — the SHA is authoritative.', + }), + ), + upgrade_policy: Type.Union( + [ + Type.Literal('track-latest'), + Type.Literal('major-gate'), + Type.Literal('locked'), + ], + { + description: + 'track-latest: any new release is actionable; major-gate: only major bumps require review; locked: explicit decision per upgrade.', + }, + ), + }, + { + additionalProperties: false, + description: + "A submodule pinned to an upstream release. Drift = upstream cut a new release we haven't adopted.", + }, +) + +const FeatureParityRowSchema = Type.Object( + { + kind: Type.Literal('feature-parity'), + id: IdSchema, + upstream: UpstreamRefSchema, + criticality: CriticalitySchema, + conformance_test: Type.Optional(ConformanceTestSchema), + notes: Type.Optional(NotesSchema), + local_area: Type.String({ + description: + 'Path to the local module/directory implementing the feature. Code pattern scan targets this directory (excluding test files).', + }), + test_area: Type.Optional( + Type.String({ + description: + 'Optional path to the directory where tests for this feature live. When absent, the harness searches inside `local_area`.', + }), + ), + code_patterns: Type.Optional( + Type.Array(Type.String(), { + description: + 'Regex patterns the local implementation must contain. 
Prefer anchored patterns (function signatures) over loose keywords to avoid comment false positives.', + }), + ), + test_patterns: Type.Optional( + Type.Array(Type.String(), { + description: 'Regex patterns the test suite must contain.', + }), + ), + fixture_check: Type.Optional(FixtureCheckSchema), + }, + { + additionalProperties: false, + description: + 'A behavioral feature reimplemented locally to match upstream behavior. Three-pillar validation: code patterns, test patterns, fixture snapshots.', + }, +) + +const SpecConformanceRowSchema = Type.Object( + { + kind: Type.Literal('spec-conformance'), + id: IdSchema, + upstream: UpstreamRefSchema, + criticality: Type.Optional(CriticalitySchema), + conformance_test: Type.Optional(ConformanceTestSchema), + notes: Type.Optional(NotesSchema), + local_impl: Type.String(), + spec_version: Type.String(), + spec_path: Type.Optional( + Type.String({ + description: + 'Path within the upstream submodule to the spec document, if applicable.', + }), + ), + }, + { + additionalProperties: false, + description: + 'A local reimplementation of an external specification. Drift = the spec was revised.', + }, +) + +// Assertions are deliberately untyped — each matrix area defines its own +// assertion shapes. The harness ignores fields it doesn't recognize. +// Historical precedent: ultrathink's xlang-harness.mts treats this as +// `unknown[]`. +const AssertionSchema = Type.Record(Type.String(), Type.Unknown()) + +const LangParityRowSchema = Type.Object( + { + kind: Type.Literal('lang-parity'), + id: IdSchema, + name: Type.String(), + description: Type.String(), + category: Type.String({ + description: + 'Grouping tag. 
`rejected` is reserved for anti-patterns (every port must be opt-out; reintroduction exits 2).', + }), + criticality: Type.Optional(CriticalitySchema), + conformance_test: Type.Optional(ConformanceTestSchema), + notes: Type.Optional(NotesSchema), + assertions: Type.Optional( + Type.Array(AssertionSchema, { + description: + 'Open-ended assertion list. Each has a `kind` string the harness dispatches on. Unknown kinds are skipped with a log line.', + }), + ), + matrix_files: Type.Optional( + Type.Array(Type.String(), { + description: + 'For inventory rows that index other xport-lang-*.json files. Paths relative to this manifest.', + }), + ), + ports: Type.Record(Type.String(), PortStatusSchema, { + description: 'Per-site status. Keys must match top-level `sites`.', + }), + }, + { + additionalProperties: false, + description: + 'N sibling language ports of one spec within a single project. Drift = one port diverged from its siblings.', + }, +) + +export const RowSchema = Type.Union([ + FileForkRowSchema, + VersionPinRowSchema, + FeatureParityRowSchema, + SpecConformanceRowSchema, + LangParityRowSchema, +]) + +// --------------------------------------------------------------------------- +// Top-level manifest. +// --------------------------------------------------------------------------- + +export const XportManifestSchema = Type.Object( + { + $schema: Type.Optional(Type.String()), + description: Type.Optional(Type.String()), + area: Type.Optional( + Type.String({ + description: + "Optional label for this manifest file. Used as a grouping key in harness output. Defaults to 'root' for the top-level file and to the filename stem for included files.", + }), + ), + includes: Type.Optional( + Type.Array(Type.String(), { + description: + 'Relative paths to sub-manifests. 
Top-level `upstreams` and `sites` maps override any same-keyed entries in included manifests.',
+      }),
+    ),
+    upstreams: Type.Optional(
+      Type.Record(Type.String(), UpstreamSchema, {
+        description:
+          'Named upstream submodules. Referenced by rows[].upstream on file-fork, version-pin, feature-parity, spec-conformance rows. Omit when the manifest only has lang-parity rows.',
+      }),
+    ),
+    sites: Type.Optional(
+      Type.Record(Type.String(), SiteSchema, {
+        description:
+          'Named sibling ports (typically per-language). Referenced by rows[].ports.<site> on lang-parity rows. Omit when the manifest has no lang-parity rows.',
+      }),
+    ),
+    rows: Type.Array(RowSchema),
+  },
+  {
+    description:
+      'Unified lock-step manifest shared across Socket repos. One schema, all cases — `kind` discriminator on each row selects which flavor of lock-step applies.',
+  },
+)
+
+export type Row = Static<typeof RowSchema>
+export type XportManifest = Static<typeof XportManifestSchema>
+export type Upstream = Static<typeof UpstreamSchema>
+export type Site = Static<typeof SiteSchema>
+export type PortStatus = Static<typeof PortStatusSchema>
+export type FileForkRow = Static<typeof FileForkRowSchema>
+export type VersionPinRow = Static<typeof VersionPinRowSchema>
+export type FeatureParityRow = Static<typeof FeatureParityRowSchema>
+export type SpecConformanceRow = Static<typeof SpecConformanceRowSchema>
+export type LangParityRow = Static<typeof LangParityRowSchema>
diff --git a/scripts/xport.mts b/scripts/xport.mts
new file mode 100644
index 00000000..0e1d3c5d
--- /dev/null
+++ b/scripts/xport.mts
@@ -0,0 +1,989 @@
+/**
+ * @fileoverview xport lock-step harness (canonical; mirrored in
+ * socket-repo-template/template/scripts/xport.mts).
+ *
+ * Reads `xport.json` (+ any `includes[]` sub-manifests) and validates each
+ * row against its upstream or sibling ports. Every supported `kind` has a
+ * checker; a repo populates its manifest only with the kinds it needs.
+ *
+ * Kinds:
+ *   file-fork        vendored upstream file with local deviations;
+ *                    drift = upstream moved since our fork SHA.
+ *   version-pin      submodule pinned to a specific SHA/tag;
+ *                    drift = upstream cut a new release (on default ref).
+ * feature-parity local impl should match an upstream behavior; + * three-pillar score: code + test + fixture snapshot. + * spec-conformance local impl of an external spec at a known version. + * lang-parity N sibling language ports of one spec; + * drift = port diverged, or rejected anti-pattern + * reintroduced on any port. + * + * Exit codes: + * 0 — manifest valid, no drift. + * 1 — schema violation, missing file, unreachable baseline, unknown kind. + * 2 — drift (upstream moved, parity below floor, rejected anti-pattern). + * + * Output: + * Default — human-readable, compact per-area summary + detailed rows. + * `--format=json` or `--json` — single JSON object for CI tooling. + * + * Sources and learnings: + * - file-fork and version-pin semantics: socket-tui (this repo). + * - feature-parity three-pillar scoring: socket-sdxgen + * lock-step-features.json (snapshots replace the 20% tolerance). + * - lang-parity ports, rejected anti-pattern, per-area summaries, exit + * code 2 semantics: ultrathink/acorn/scripts/xlang-harness.mts. 
+ */ + +import { existsSync, readdirSync, readFileSync, statSync } from 'node:fs' +import path from 'node:path' +import process from 'node:process' +import { fileURLToPath } from 'node:url' + +import { errorMessage } from '@socketsecurity/lib/errors' +import { getDefaultLogger } from '@socketsecurity/lib/logger' +import { spawnSync } from '@socketsecurity/lib/spawn' +import { validateSchema } from '@socketsecurity/lib/schema/validate' + +import { + XportManifestSchema, + type FeatureParityRow, + type FileForkRow, + type LangParityRow, + type PortStatus, + type Row, + type Site, + type SpecConformanceRow, + type Upstream, + type VersionPinRow, + type XportManifest, +} from './xport-schema.mts' + +const logger = getDefaultLogger() + +const __dirname = path.dirname(fileURLToPath(import.meta.url)) +const rootDir = path.resolve(__dirname, '..') + +type Manifest = XportManifest + +// --------------------------------------------------------------------------- +// Report types — one per kind so dispatcher output is typed precisely. 
+// ---------------------------------------------------------------------------
+
+type Severity = 'ok' | 'drift' | 'error'
+
+interface ReportBase {
+  area: string
+  id: string
+  severity: Severity
+  messages: string[]
+}
+
+interface DriftCommit {
+  sha: string
+  summary: string
+}
+
+interface FileForkReport extends ReportBase {
+  kind: 'file-fork'
+  local: string
+  upstream: string
+  upstream_path: string
+  forked_at_sha: string
+  drift: DriftCommit[]
+}
+
+interface VersionPinReport extends ReportBase {
+  kind: 'version-pin'
+  upstream: string
+  pinned_sha: string
+  pinned_tag: string | null
+  upgrade_policy: string
+  head_sha: string | null
+  drift_count: number
+}
+
+interface FeatureParityReport extends ReportBase {
+  kind: 'feature-parity'
+  upstream: string
+  local_area: string
+  criticality: number
+  code_score: number
+  test_score: number
+  fixture_score: number
+  total_score: number
+}
+
+interface SpecConformanceReport extends ReportBase {
+  kind: 'spec-conformance'
+  upstream: string
+  local_impl: string
+  spec_version: string
+  spec_path: string | null
+}
+
+interface LangParityReport extends ReportBase {
+  kind: 'lang-parity'
+  category: string
+  ports: Record<string, PortStatus>
+}
+
+type Report =
+  | FileForkReport
+  | VersionPinReport
+  | FeatureParityReport
+  | SpecConformanceReport
+  | LangParityReport
+
+// ---------------------------------------------------------------------------
+// Generic helpers. 
+// --------------------------------------------------------------------------- + +function readManifest(manifestPath: string): Manifest { + if (!existsSync(manifestPath)) { + logger.error(`xport: manifest not found at ${manifestPath}`) + process.exit(1) + } + let raw: unknown + try { + raw = JSON.parse(readFileSync(manifestPath, 'utf8')) + } catch (e) { + logger.error(`xport: could not parse ${manifestPath}`) + logger.fail(` ${errorMessage(e)}`) + process.exit(1) + } + const result = validateSchema(XportManifestSchema, raw) + if (result.ok) { + return result.value + } + logger.error(`xport: schema validation failed for ${manifestPath}`) + for (const issue of result.errors) { + const loc = issue.path.length ? issue.path.join('.') : '' + logger.fail(` ${loc}: ${issue.message}`) + } + process.exit(1) +} + +/** + * Resolve a manifest + all its `includes[]` sub-manifests into a single + * flattened view. Each sub-manifest contributes its rows; the top-level + * upstreams/sites maps are merged (top-level wins on conflict). + */ +function loadManifestTree(rootManifestPath: string): { + areas: Array<{ area: string; manifest: Manifest }> + merged: Manifest +} { + const rootManifest = readManifest(rootManifestPath) + const rootArea = rootManifest.area ?? 'root' + const areas: Array<{ area: string; manifest: Manifest }> = [ + { area: rootArea, manifest: rootManifest }, + ] + + const includes = rootManifest.includes ?? [] + const baseDir = path.dirname(rootManifestPath) + for (const rel of includes) { + const subPath = path.resolve(baseDir, rel) + const sub = readManifest(subPath) + const area = sub.area ?? path.basename(rel, '.json').replace(/^xport-/, '') + areas.push({ area, manifest: sub }) + } + + // Null-prototype maps guard against prototype pollution via untrusted + // manifest keys. Double-cast through `unknown` so the + // `exactOptionalPropertyTypes + noUncheckedIndexedAccess` strict + // tsconfig in some repos accepts the `__proto__` sigil. 
+ const mergedUpstreams: Record<string, Upstream> = { + __proto__: null, + } as unknown as Record<string, Upstream> + const mergedSites: Record<string, Site> = { + __proto__: null, + } as unknown as Record<string, Site> + + const mergedRows: Row[] = [] + // Include order, root last so it wins on duplicate keys. + for (const { manifest } of [...areas.slice(1), ...areas.slice(0, 1)]) { + for (const [k, v] of Object.entries(manifest.upstreams ?? {})) { + mergedUpstreams[k] = v + } + for (const [k, v] of Object.entries(manifest.sites ?? {})) { + mergedSites[k] = v + } + } + for (const { manifest } of areas) { + mergedRows.push(...manifest.rows) + } + return { + areas, + merged: { + upstreams: mergedUpstreams, + sites: mergedSites, + rows: mergedRows, + }, + } +} + +function gitIn(submoduleDir: string, args: string[]): string { + const result = spawnSync('git', ['-C', submoduleDir, ...args], { + stdio: ['ignore', 'pipe', 'pipe'], + stdioString: true, + }) + if (result.error) { + throw result.error + } + if (result.status !== 0) { + throw new Error( + `git ${args.join(' ')} failed (status ${result.status}): ${String(result.stderr).trim()}`, + ) + } + return String(result.stdout) +} + +function shaIsReachable(submoduleDir: string, sha: string): boolean { + try { + gitIn(submoduleDir, ['cat-file', '-e', sha]) + return true + } catch { + return false + } +} + +function driftCommitsSince( + submoduleDir: string, + sha: string, + pathInRepo: string, +): DriftCommit[] { + try { + const out = gitIn(submoduleDir, [ + 'log', + '--pretty=format:%H%x09%s', + `${sha}..HEAD`, + '--', + pathInRepo, + ]) + const trimmed = out.trim() + if (!trimmed) { + return [] + } + return trimmed.split('\n').map(line => { + // Preserve any embedded tabs in the commit subject (rare but + // possible) — `.split` destructuring would truncate at the + // first tab inside the summary. + const [commitSha, ...summaryParts] = line.split('\t') + return { + sha: commitSha ?? '', + summary: summaryParts.join('\t') ?? 
'', + } + }) + } catch { + return [] + } +} + +function resolveUpstream( + manifest: Manifest, + alias: string, + messages: string[], +): Upstream | null { + const upstream = manifest.upstreams?.[alias] + if (!upstream) { + const known = Object.keys(manifest.upstreams ?? {}).join(', ') || '(none)' + messages.push(`unknown upstream alias '${alias}' (known: ${known})`) + return null + } + return upstream +} + +function walkDirFiles(dir: string, extRe: RegExp): string[] { + const files: string[] = [] + if (!existsSync(dir)) { + return files + } + const stack: string[] = [dir] + while (stack.length > 0) { + const current = stack.pop()! + let entries: string[] = [] + try { + entries = readdirSync(current) + } catch { + continue + } + for (const entry of entries) { + if (entry === 'node_modules' || entry === '.git' || entry === 'dist') { + continue + } + const full = path.join(current, entry) + let stat + try { + stat = statSync(full) + } catch { + continue + } + if (stat.isDirectory()) { + stack.push(full) + } else if (stat.isFile() && extRe.test(entry)) { + files.push(full) + } + } + } + return files +} + +function countPatternHits(files: string[], patterns: string[]): number { + if (patterns.length === 0) { + return 0 + } + // Manifest authors occasionally land a bad regex; surface the bad + // pattern and keep going rather than throwing a SyntaxError that + // kills the whole run. + const compiled: RegExp[] = [] + for (const p of patterns) { + try { + compiled.push(new RegExp(p)) + } catch (e) { + logger.warn( + `xport: skipping invalid regex ${JSON.stringify(p)}: ${errorMessage(e)}`, + ) + } + } + let hits = 0 + for (const pat of compiled) { + for (const file of files) { + let content: string + try { + content = readFileSync(file, 'utf8') + } catch { + continue + } + if (pat.test(content)) { + hits += 1 + break + } + } + } + return hits +} + +// --------------------------------------------------------------------------- +// Kind checkers. 
+// --------------------------------------------------------------------------- + +function checkFileFork( + row: FileForkRow, + manifest: Manifest, + area: string, +): FileForkReport { + const messages: string[] = [] + const upstream = resolveUpstream(manifest, row.upstream, messages) + const base: FileForkReport = { + kind: 'file-fork', + area, + id: row.id, + severity: 'ok', + messages, + local: row.local, + upstream: row.upstream, + upstream_path: row.upstream_path, + forked_at_sha: row.forked_at_sha, + drift: [], + } + if (!upstream) { + base.severity = 'error' + return base + } + const submoduleDir = path.join(rootDir, upstream.submodule) + const localPath = path.join(rootDir, row.local) + const upstreamFilePath = path.join(submoduleDir, row.upstream_path) + + if (!existsSync(localPath)) { + base.severity = 'error' + messages.push(`local file missing: ${row.local}`) + } + if (!existsSync(upstreamFilePath)) { + base.severity = 'error' + messages.push( + `upstream file missing — submodule out of date, or upstream_path stale`, + ) + } + if (!shaIsReachable(submoduleDir, row.forked_at_sha)) { + base.severity = 'error' + messages.push( + `forked_at_sha unreachable in submodule — submodule too shallow, or SHA typo`, + ) + } + if (base.severity === 'error') { + return base + } + const drift = driftCommitsSince( + submoduleDir, + row.forked_at_sha, + row.upstream_path, + ) + base.drift = drift + if (drift.length > 0) { + base.severity = 'drift' + messages.push( + `${drift.length} upstream commit(s) since fork — review for bugfixes/features`, + ) + } + return base +} + +function checkVersionPin( + row: VersionPinRow, + manifest: Manifest, + area: string, +): VersionPinReport { + const messages: string[] = [] + const upstream = resolveUpstream(manifest, row.upstream, messages) + const base: VersionPinReport = { + kind: 'version-pin', + area, + id: row.id, + severity: 'ok', + messages, + upstream: row.upstream, + pinned_sha: row.pinned_sha, + pinned_tag: row.pinned_tag 
?? null, + upgrade_policy: row.upgrade_policy, + head_sha: null, + drift_count: 0, + } + if (!upstream) { + base.severity = 'error' + return base + } + const submoduleDir = path.join(rootDir, upstream.submodule) + if (!existsSync(submoduleDir)) { + base.severity = 'error' + messages.push( + `submodule not checked out at ${upstream.submodule} — run \`git submodule update --init\``, + ) + return base + } + if (!shaIsReachable(submoduleDir, row.pinned_sha)) { + base.severity = 'error' + messages.push(`pinned_sha unreachable — submodule too shallow, or SHA typo`) + return base + } + let head = '' + try { + head = gitIn(submoduleDir, ['rev-parse', 'HEAD']).trim() + } catch { + base.severity = 'error' + messages.push(`could not read submodule HEAD`) + return base + } + base.head_sha = head + + if (head !== row.pinned_sha) { + base.severity = 'error' + messages.push( + `submodule HEAD (${head.slice(0, 12)}) does not match pinned_sha (${row.pinned_sha.slice(0, 12)}) — run \`git submodule update\``, + ) + return base + } + + // Count commits on the upstream default branch since pinned SHA. + let driftRef = '' + try { + const remoteRefs = gitIn(submoduleDir, [ + 'for-each-ref', + '--format=%(refname)', + 'refs/remotes/origin/', + ]) + const lines = remoteRefs.split('\n').filter(s => s.trim()) + const pref = [ + 'refs/remotes/origin/HEAD', + 'refs/remotes/origin/main', + 'refs/remotes/origin/master', + ] + for (const p of pref) { + if (lines.includes(p)) { + driftRef = p + break + } + } + } catch { + // no remotes available — drift can't be computed; report OK with a note. + } + if (!driftRef) { + messages.push(`no origin remote ref found; cannot compute upstream drift`) + return base + } + try { + const count = gitIn(submoduleDir, [ + 'rev-list', + '--count', + `${row.pinned_sha}..${driftRef}`, + ]).trim() + const n = parseInt(count, 10) + if (!Number.isNaN(n) && n > 0) { + base.drift_count = n + base.severity = 'drift' + const tagSuffix = row.pinned_tag ? 
` (from ${row.pinned_tag})` : '' + messages.push( + `${n} upstream commit(s) since pin${tagSuffix} on ${driftRef.replace('refs/remotes/', '')}`, + ) + } + } catch { + // silent — drift ref not fetched. + } + return base +} + +function checkFeatureParity( + row: FeatureParityRow, + _manifest: Manifest, + area: string, +): FeatureParityReport { + const messages: string[] = [] + const base: FeatureParityReport = { + kind: 'feature-parity', + area, + id: row.id, + severity: 'ok', + messages, + upstream: row.upstream, + local_area: row.local_area, + criticality: row.criticality, + code_score: 0, + test_score: 0, + fixture_score: 0, + total_score: 0, + } + const localAreaPath = path.join(rootDir, row.local_area) + if (!existsSync(localAreaPath)) { + base.severity = 'error' + messages.push(`local_area path missing: ${row.local_area}`) + return base + } + + const codePatterns = row.code_patterns ?? [] + const testPatterns = row.test_patterns ?? [] + const codeFiles = walkDirFiles(localAreaPath, /\.(m?[jt]sx?|json)$/).filter( + f => !/[/\\](test|tests|__tests__)[/\\]|\.test\.|\.spec\./.test(f), + ) + + const codeScore = + codePatterns.length === 0 + ? 1 + : countPatternHits(codeFiles, codePatterns) / codePatterns.length + + // Test files: by default search local_area; if test_area is set, search + // that directory instead (sdxgen-style where tests live outside the + // parser directory). + const testAreaPath = path.join(rootDir, row.test_area ?? row.local_area) + const testAreaFiles = walkDirFiles(testAreaPath, /\.(m?[jt]sx?|json)$/) + const testFiles = row.test_area + ? testAreaFiles + : testAreaFiles.filter(f => + /[/\\](test|tests|__tests__)[/\\]|\.test\.|\.spec\./.test(f), + ) + const testScore = + testPatterns.length === 0 + ? 
1 + : countPatternHits(testFiles, testPatterns) / testPatterns.length + + let fixtureScore = 1 + if (row.fixture_check) { + const fixturePath = path.join(rootDir, row.fixture_check.fixture_path) + if (!existsSync(fixturePath)) { + fixtureScore = 0 + messages.push(`fixture not found: ${row.fixture_check.fixture_path}`) + } else if (row.fixture_check.snapshot_path) { + const snapPath = path.join(rootDir, row.fixture_check.snapshot_path) + if (!existsSync(snapPath)) { + fixtureScore = 0 + messages.push( + `snapshot not found: ${row.fixture_check.snapshot_path} — run test suite to generate`, + ) + } + } + } + + base.code_score = Math.round(codeScore * 100) / 100 + base.test_score = Math.round(testScore * 100) / 100 + base.fixture_score = Math.round(fixtureScore * 100) / 100 + const total = 0.3 * codeScore + 0.3 * testScore + 0.4 * fixtureScore + base.total_score = Math.round(total * 100) / 100 + + // Floor: higher criticality = stricter. Cap at 0.85 so 10/10 criticality + // doesn't demand perfect pattern coverage (code is prose, patterns miss). + const floor = Math.min(0.85, row.criticality / 10) + if (total < floor) { + base.severity = 'drift' + messages.push( + `parity score ${base.total_score} below floor ${Math.round(floor * 100) / 100} (criticality ${row.criticality})`, + ) + } + return base +} + +function checkSpecConformance( + row: SpecConformanceRow, + manifest: Manifest, + area: string, +): SpecConformanceReport { + const messages: string[] = [] + const upstream = resolveUpstream(manifest, row.upstream, messages) + const base: SpecConformanceReport = { + kind: 'spec-conformance', + area, + id: row.id, + severity: 'ok', + messages, + upstream: row.upstream, + local_impl: row.local_impl, + spec_version: row.spec_version, + spec_path: row.spec_path ?? 
null, + } + if (!upstream) { + base.severity = 'error' + return base + } + const localImplPath = path.join(rootDir, row.local_impl) + if (!existsSync(localImplPath)) { + base.severity = 'error' + messages.push(`local_impl missing: ${row.local_impl}`) + return base + } + if (row.spec_path) { + const specPath = path.join(rootDir, upstream.submodule, row.spec_path) + if (!existsSync(specPath)) { + base.severity = 'error' + messages.push(`spec_path missing in upstream submodule: ${row.spec_path}`) + return base + } + } + return base +} + +function checkLangParity( + row: LangParityRow, + manifest: Manifest, + area: string, +): LangParityReport { + const messages: string[] = [] + const base: LangParityReport = { + kind: 'lang-parity', + area, + id: row.id, + severity: 'ok', + messages, + category: row.category, + ports: row.ports, + } + + const declaredSites = Object.keys(manifest.sites ?? {}) + if (declaredSites.length === 0) { + base.severity = 'error' + messages.push(`manifest has lang-parity rows but no top-level 'sites' map`) + return base + } + + for (const site of declaredSites) { + if (!(site in row.ports)) { + base.severity = 'error' + messages.push(`port '${site}' missing (declared in sites)`) + } + } + for (const port of Object.keys(row.ports)) { + if (!declaredSites.includes(port)) { + base.severity = 'error' + messages.push(`port '${port}' not in sites map`) + } + const state = row.ports[port]! + if (state.status === 'opt-out' && (!state.reason || !state.reason.trim())) { + base.severity = 'error' + messages.push(`port '${port}' is opt-out without a reason`) + } + } + + if (row.category === 'rejected') { + for (const port of Object.keys(row.ports)) { + const state = row.ports[port]! 
+ if (state.status !== 'opt-out') { + base.severity = 'drift' + messages.push( + `REJECTED anti-pattern reintroduced: port '${port}' is '${state.status}' (must be 'opt-out' for category=rejected)`, + ) + } + } + } + + return base +} + +// --------------------------------------------------------------------------- +// Cross-row consistency checks (beyond zod's per-row validation). +// --------------------------------------------------------------------------- + +/** + * Cross-row checks that zod validation can't express: unique ids, upstream + * refs resolve to the `upstreams` map, port keys resolve to the `sites` + * map. `validateSchema(XportManifestSchema, …)` (called from `readManifest` + * inside `loadManifestTree`) already covers per-row shape, enum values, id + * pattern, and required fields — this is the referential-integrity layer + * on top. + */ +function checkCrossRowConsistency( + rowsWithArea: Array<{ row: Row; area: string }>, + merged: Manifest, +): string[] { + const errors: string[] = [] + // Ids are unique per area, not globally. Same concept can legitimately + // appear in multiple areas (e.g. ultrathink has `transport-stdio` in both + // lsp and mcp). Scope the seen-set per area. + const seenIdsPerArea = new Map<string, Set<string>>() + const upstreamAliases = new Set(Object.keys(merged.upstreams ?? {})) + const siteKeys = new Set(Object.keys(merged.sites ?? 
{})) + + for (const { row, area } of rowsWithArea) { + const loc = `[${area}/${row.id}]` + + let areaIds = seenIdsPerArea.get(area) + if (!areaIds) { + areaIds = new Set() + seenIdsPerArea.set(area, areaIds) + } + if (areaIds.has(row.id)) { + errors.push(`${loc} duplicate id within area`) + } + areaIds.add(row.id) + + if ( + row.kind === 'file-fork' || + row.kind === 'version-pin' || + row.kind === 'feature-parity' || + row.kind === 'spec-conformance' + ) { + if (!upstreamAliases.has(row.upstream)) { + errors.push( + `${loc} upstream '${row.upstream}' not in upstreams map (known: ${[...upstreamAliases].join(', ') || '(none)'})`, + ) + } + } + + if (row.kind === 'lang-parity') { + for (const port of Object.keys(row.ports)) { + if (!siteKeys.has(port)) { + errors.push( + `${loc} port '${port}' not in sites map (known: ${[...siteKeys].join(', ') || '(none)'})`, + ) + } + } + } + } + + return errors +} + +// --------------------------------------------------------------------------- +// Dispatcher. 
+// --------------------------------------------------------------------------- + +function evaluate( + rowsWithArea: Array<{ row: Row; area: string }>, + merged: Manifest, +): Report[] { + const reports: Report[] = [] + for (const { row, area } of rowsWithArea) { + switch (row.kind) { + case 'file-fork': + reports.push(checkFileFork(row, merged, area)) + break + case 'version-pin': + reports.push(checkVersionPin(row, merged, area)) + break + case 'feature-parity': + reports.push(checkFeatureParity(row, merged, area)) + break + case 'spec-conformance': + reports.push(checkSpecConformance(row, merged, area)) + break + case 'lang-parity': + reports.push(checkLangParity(row, merged, area)) + break + default: { + const anyRow = row as { kind: string; id: string } + reports.push({ + kind: 'file-fork', + area, + id: anyRow.id, + severity: 'error', + messages: [`no checker registered for kind '${anyRow.kind}'`], + local: '', + upstream: '', + upstream_path: '', + forked_at_sha: '', + drift: [], + }) + process.exitCode = 1 + } + } + } + return reports +} + +// --------------------------------------------------------------------------- +// Per-area summary (learned from ultrathink xlang-harness). +// --------------------------------------------------------------------------- + +interface AreaSummary { + area: string + total: number + ok: number + drift: number + error: number +} + +function summarize(reports: Report[]): AreaSummary[] { + const byArea = new Map<string, AreaSummary>() + for (const r of reports) { + let s = byArea.get(r.area) + if (!s) { + s = { area: r.area, total: 0, ok: 0, drift: 0, error: 0 } + byArea.set(r.area, s) + } + s.total += 1 + s[r.severity] += 1 + } + return [...byArea.values()].sort((a, b) => a.area.localeCompare(b.area)) +} + +// --------------------------------------------------------------------------- +// Output. 
+// --------------------------------------------------------------------------- + +function emitHuman(reports: Report[], summaries: AreaSummary[]): number { + logger.info( + `xport — ${reports.length} row(s) across ${summaries.length} area(s)`, + ) + logger.info('') + for (const s of summaries) { + const label = s.area.padEnd(24) + const parts = `total=${String(s.total).padStart(3)} ok=${String(s.ok).padStart(3)} drift=${String(s.drift).padStart(3)} error=${String(s.error).padStart(3)}` + logger.info(` ${label}${parts}`) + } + logger.info('') + + let hadError = false + let hadDrift = false + for (const r of reports) { + const banner = `[${r.area}/${r.id}] (${r.kind})` + if (r.kind === 'file-fork') { + logger.info(banner) + logger.info(` local: ${r.local}`) + logger.info( + ` upstream: ${r.upstream}:${r.upstream_path} @ ${r.forked_at_sha.slice(0, 12)}`, + ) + } else if (r.kind === 'version-pin') { + logger.info(banner) + const tag = r.pinned_tag ? ` (${r.pinned_tag})` : '' + logger.info( + ` upstream: ${r.upstream} @ ${r.pinned_sha.slice(0, 12)}${tag}, policy=${r.upgrade_policy}`, + ) + } else if (r.kind === 'feature-parity') { + logger.info(banner) + logger.info( + ` upstream: ${r.upstream}, local_area: ${r.local_area}, criticality: ${r.criticality}`, + ) + logger.info( + ` scores: code=${r.code_score} test=${r.test_score} fixture=${r.fixture_score} total=${r.total_score}`, + ) + } else if (r.kind === 'spec-conformance') { + logger.info(banner) + logger.info( + ` upstream: ${r.upstream}, local_impl: ${r.local_impl}, spec_version: ${r.spec_version}`, + ) + } else if (r.kind === 'lang-parity') { + logger.info(banner) + logger.info(` category: ${r.category}`) + for (const [port, state] of Object.entries(r.ports)) { + const suffix = + state.status === 'opt-out' ? ` (${state.reason ?? 
''})` : '' + logger.info(` ${port}: ${state.status}${suffix}`) + } + } + + for (const msg of r.messages) { + if (r.severity === 'error') { + logger.fail(` ${msg}`) + } else if (r.severity === 'drift') { + logger.warn(` ${msg}`) + } else { + logger.info(` ${msg}`) + } + } + + if (r.kind === 'file-fork') { + for (const c of r.drift) { + logger.info(` ${c.sha.slice(0, 12)} ${c.summary}`) + } + } + + if (r.severity === 'ok') { + logger.success(` ok`) + } else if (r.severity === 'error') { + hadError = true + } else if (r.severity === 'drift') { + hadDrift = true + } + logger.info('') + } + + if (hadError) { + return 1 + } + if (hadDrift) { + return 2 + } + return 0 +} + +function main(): void { + const rootManifestPath = path.join(rootDir, 'xport.json') + const { areas, merged } = loadManifestTree(rootManifestPath) + + const rowsWithArea: Array<{ row: Row; area: string }> = [] + for (const { area, manifest } of areas) { + for (const row of manifest.rows) { + rowsWithArea.push({ row, area }) + } + } + + const crossRowErrors = checkCrossRowConsistency(rowsWithArea, merged) + if (crossRowErrors.length > 0) { + for (const err of crossRowErrors) { + logger.fail(err) + } + logger.error( + `xport: ${crossRowErrors.length} cross-row error(s) — fix before running drift checks`, + ) + process.exit(1) + } + + const reports = evaluate(rowsWithArea, merged) + const summaries = summarize(reports) + + const jsonMode = + process.argv.includes('--json') || process.argv.includes('--format=json') + + if (jsonMode) { + process.stdout.write(JSON.stringify({ reports, summaries }, null, 2) + '\n') + const anyError = reports.some(r => r.severity === 'error') + const anyDrift = reports.some(r => r.severity === 'drift') + if (anyError) { + process.exitCode = 1 + } else if (anyDrift) { + process.exitCode = 2 + } + return + } + + const code = emitHuman(reports, summaries) + if (code !== 0) { + process.exitCode = code + } +} + +main() diff --git a/xport.schema.json b/xport.schema.json new file 
mode 100644 index 00000000..6cbd8019 --- /dev/null +++ b/xport.schema.json @@ -0,0 +1,449 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://github.com/SocketDev/xport.schema.json", + "title": "xport lock-step manifest", + "description": "Unified lock-step manifest shared across Socket repos. One schema, all cases — `kind` discriminator on each row selects which flavor of lock-step applies.", + "type": "object", + "required": ["rows"], + "properties": { + "$schema": { + "type": "string" + }, + "description": { + "type": "string" + }, + "area": { + "description": "Optional label for this manifest file. Used as a grouping key in harness output. Defaults to 'root' for the top-level file and to the filename stem for included files.", + "type": "string" + }, + "includes": { + "description": "Relative paths to sub-manifests. Top-level `upstreams` and `sites` maps override any same-keyed entries in included manifests.", + "type": "array", + "items": { + "type": "string" + } + }, + "upstreams": { + "description": "Named upstream submodules. Referenced by rows[].upstream on file-fork, version-pin, feature-parity, spec-conformance rows. Omit when the manifest only has lang-parity rows.", + "type": "object", + "patternProperties": { + "^(.*)$": { + "additionalProperties": false, + "type": "object", + "required": ["submodule", "repo"], + "properties": { + "submodule": { + "description": "Submodule path, relative to repo root.", + "type": "string" + }, + "repo": { + "pattern": "^https?://", + "description": "Upstream repository URL (http:// or https://).", + "type": "string" + } + } + } + } + }, + "sites": { + "description": "Named sibling ports (typically per-language). Referenced by rows[].ports on lang-parity rows. 
Omit when the manifest has no lang-parity rows.", + "type": "object", + "patternProperties": { + "^(.*)$": { + "additionalProperties": false, + "type": "object", + "required": ["path"], + "properties": { + "path": { + "description": "Path to the port's root directory, relative to repo root.", + "type": "string" + }, + "language": { + "description": "Language label, for human reports.", + "type": "string" + } + } + } + } + }, + "rows": { + "type": "array", + "items": { + "anyOf": [ + { + "additionalProperties": false, + "description": "A local file derived from an upstream file with intentional modifications. Drift = upstream moved forward without us.", + "type": "object", + "required": [ + "kind", + "id", + "upstream", + "local", + "upstream_path", + "forked_at_sha", + "deviations" + ], + "properties": { + "kind": { + "const": "file-fork", + "type": "string" + }, + "id": { + "pattern": "^[a-z0-9][A-Za-z0-9-]*$", + "description": "Stable identifier, unique within the manifest. Starts with lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).", + "type": "string" + }, + "upstream": { + "description": "Key into the top-level `upstreams` map.", + "type": "string" + }, + "criticality": { + "minimum": 1, + "maximum": 10, + "description": "Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). Harness surfaces high-criticality drift louder.", + "type": "integer" + }, + "conformance_test": { + "description": "Path to a test that enforces behavior parity (modulo documented deviations). 
Strongly recommended — static checks can't catch silent behavioral drift.", + "type": "string" + }, + "notes": { + "description": "Free-form context — why this row exists, what gotchas to watch for.", + "type": "string" + }, + "local": { + "description": "Path to our ported file, relative to repo root.", + "type": "string" + }, + "upstream_path": { + "description": "Path to the source file within the upstream submodule.", + "type": "string" + }, + "forked_at_sha": { + "pattern": "^[0-9a-f]{40}$", + "description": "Full 40-char SHA of the upstream commit we forked from. Harness runs `git log <forked_at_sha>..HEAD -- <upstream_path>` to surface drift.", + "type": "string" + }, + "deviations": { + "minItems": 1, + "description": "Human-readable list of intentional differences. Zero deviations = use upstream directly; don't fork.", + "type": "array", + "items": { + "type": "string" + } + } + } + }, + { + "additionalProperties": false, + "description": "A submodule pinned to an upstream release. Drift = upstream cut a new release we haven't adopted.", + "type": "object", + "required": [ + "kind", + "id", + "upstream", + "pinned_sha", + "upgrade_policy" + ], + "properties": { + "kind": { + "const": "version-pin", + "type": "string" + }, + "id": { + "pattern": "^[a-z0-9][A-Za-z0-9-]*$", + "description": "Stable identifier, unique within the manifest. Starts with lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).", + "type": "string" + }, + "upstream": { + "description": "Key into the top-level `upstreams` map.", + "type": "string" + }, + "criticality": { + "minimum": 1, + "maximum": 10, + "description": "Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). Harness surfaces high-criticality drift louder.", + "type": "integer" + }, + "conformance_test": { + "description": "Path to a test that enforces behavior parity (modulo documented deviations). 
Strongly recommended — static checks can't catch silent behavioral drift.", + "type": "string" + }, + "notes": { + "description": "Free-form context — why this row exists, what gotchas to watch for.", + "type": "string" + }, + "pinned_sha": { + "pattern": "^[0-9a-f]{40}$", + "description": "Full 40-char SHA the submodule is pinned to.", + "type": "string" + }, + "pinned_tag": { + "description": "Human-readable release tag (e.g., `v3.2.1`). Optional — the SHA is authoritative.", + "type": "string" + }, + "upgrade_policy": { + "description": "track-latest: any new release is actionable; major-gate: only major bumps require review; locked: explicit decision per upgrade.", + "anyOf": [ + { + "const": "track-latest", + "type": "string" + }, + { + "const": "major-gate", + "type": "string" + }, + { + "const": "locked", + "type": "string" + } + ] + } + } + }, + { + "additionalProperties": false, + "description": "A behavioral feature reimplemented locally to match upstream behavior. Three-pillar validation: code patterns, test patterns, fixture snapshots.", + "type": "object", + "required": ["kind", "id", "upstream", "criticality", "local_area"], + "properties": { + "kind": { + "const": "feature-parity", + "type": "string" + }, + "id": { + "pattern": "^[a-z0-9][A-Za-z0-9-]*$", + "description": "Stable identifier, unique within the manifest. Starts with lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).", + "type": "string" + }, + "upstream": { + "description": "Key into the top-level `upstreams` map.", + "type": "string" + }, + "criticality": { + "minimum": 1, + "maximum": 10, + "description": "Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). 
Harness surfaces high-criticality drift louder.", + "type": "integer" + }, + "conformance_test": { + "description": "Path to a test that enforces behavior parity (modulo documented deviations). Strongly recommended — static checks can't catch silent behavioral drift.", + "type": "string" + }, + "notes": { + "description": "Free-form context — why this row exists, what gotchas to watch for.", + "type": "string" + }, + "local_area": { + "description": "Path to the local module/directory implementing the feature. Code pattern scan targets this directory (excluding test files).", + "type": "string" + }, + "test_area": { + "description": "Optional path to the directory where tests for this feature live. When absent, the harness searches inside `local_area`.", + "type": "string" + }, + "code_patterns": { + "description": "Regex patterns the local implementation must contain. Prefer anchored patterns (function signatures) over loose keywords to avoid comment false positives.", + "type": "array", + "items": { + "type": "string" + } + }, + "test_patterns": { + "description": "Regex patterns the test suite must contain.", + "type": "array", + "items": { + "type": "string" + } + }, + "fixture_check": { + "additionalProperties": false, + "description": "Golden-input verification. Prefer snapshot-based diffs over hardcoded counts (brittleness lesson from sdxgen's lock-step-features).", + "type": "object", + "required": ["fixture_path"], + "properties": { + "fixture_path": { + "type": "string" + }, + "snapshot_path": { + "type": "string" + }, + "diff_tolerance": { + "anyOf": [ + { + "const": "exact", + "type": "string" + }, + { + "const": "line-by-line", + "type": "string" + }, + { + "const": "semantic", + "type": "string" + } + ] + } + } + } + } + }, + { + "additionalProperties": false, + "description": "A local reimplementation of an external specification. 
Drift = the spec was revised.", + "type": "object", + "required": [ + "kind", + "id", + "upstream", + "local_impl", + "spec_version" + ], + "properties": { + "kind": { + "const": "spec-conformance", + "type": "string" + }, + "id": { + "pattern": "^[a-z0-9][A-Za-z0-9-]*$", + "description": "Stable identifier, unique within the manifest. Starts with lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).", + "type": "string" + }, + "upstream": { + "description": "Key into the top-level `upstreams` map.", + "type": "string" + }, + "criticality": { + "minimum": 1, + "maximum": 10, + "description": "Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). Harness surfaces high-criticality drift louder.", + "type": "integer" + }, + "conformance_test": { + "description": "Path to a test that enforces behavior parity (modulo documented deviations). Strongly recommended — static checks can't catch silent behavioral drift.", + "type": "string" + }, + "notes": { + "description": "Free-form context — why this row exists, what gotchas to watch for.", + "type": "string" + }, + "local_impl": { + "description": "Path to the local module implementing the spec.", + "type": "string" + }, + "spec_version": { + "description": "Version identifier of the spec the local implementation targets.", + "type": "string" + }, + "spec_path": { + "description": "Path within the upstream submodule to the spec document, if applicable.", + "type": "string" + } + } + }, + { + "additionalProperties": false, + "description": "N sibling language ports of one spec within a single project. Drift = one port diverged from its siblings.", + "type": "object", + "required": [ + "kind", + "id", + "name", + "description", + "category", + "ports" + ], + "properties": { + "kind": { + "const": "lang-parity", + "type": "string" + }, + "id": { + "pattern": "^[a-z0-9][A-Za-z0-9-]*$", + "description": "Stable identifier, unique within the manifest. 
Starts with lowercase letter or digit; remaining characters are letters/digits/hyphens. Kebab-case preferred, but camelCase segments are allowed (e.g. `export-findNodeAt` when the id mirrors an API name).", + "type": "string" + }, + "name": { + "type": "string" + }, + "description": { + "type": "string" + }, + "category": { + "description": "Grouping tag. `rejected` is reserved for anti-patterns (every port must be opt-out; reintroduction exits 2).", + "type": "string" + }, + "criticality": { + "minimum": 1, + "maximum": 10, + "description": "Stay-in-step importance (1 = cosmetic, 10 = security-sensitive). Harness surfaces high-criticality drift louder.", + "type": "integer" + }, + "conformance_test": { + "description": "Path to a test that enforces behavior parity (modulo documented deviations). Strongly recommended — static checks can't catch silent behavioral drift.", + "type": "string" + }, + "notes": { + "description": "Free-form context — why this row exists, what gotchas to watch for.", + "type": "string" + }, + "assertions": { + "description": "Open-ended assertion list. Each has a `kind` string the harness dispatches on. Unknown kinds are skipped with a log line.", + "type": "array", + "items": { + "type": "object", + "patternProperties": { + "^(.*)$": {} + } + } + }, + "matrix_files": { + "description": "For inventory rows that index other xport-lang-*.json files. Paths relative to this manifest.", + "type": "array", + "items": { + "type": "string" + } + }, + "ports": { + "description": "Per-site status. Keys must match top-level `sites`.", + "type": "object", + "patternProperties": { + "^(.*)$": { + "additionalProperties": false, + "description": "Per-port status for a lang-parity row. 
`implemented` = port meets assertions; `opt-out` = port consciously skips, requires non-empty `reason`.", + "type": "object", + "required": ["status"], + "properties": { + "status": { + "anyOf": [ + { + "const": "implemented", + "type": "string" + }, + { + "const": "opt-out", + "type": "string" + } + ] + }, + "reason": { + "description": "Required when status is `opt-out`. Explain why.", + "type": "string" + }, + "path": { + "description": "Optional path to the port's implementation of this row. Useful for module-inventory rows where each language points at a different directory.", + "type": "string" + }, + "note": { + "description": "Optional free-form note attached to a specific port's status.", + "type": "string" + } + } + } + } + } + } + } + ] + } + } + } +}