Skip to content

fix(hooks): track Bash file modifications to prevent false-positive commit blocks#1483

Merged
carlos-alm merged 7 commits into
mainfrom
fix/hooks-bash-tracking-1457
Jun 13, 2026
Merged

fix(hooks): track Bash file modifications to prevent false-positive commit blocks#1483
carlos-alm merged 7 commits into
mainfrom
fix/hooks-bash-tracking-1457

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

`guard-git.sh` was blocking commits for files legitimately modified by Bash commands (`sed -i`, `printf`, redirects, build tools) because `session-edits.log` only recorded Edit/Write tool calls and `mv`/`cp`.

Adds a pre/post Bash hook pair that snapshots `git status --porcelain` before each Bash tool call and diffs after, logging any newly modified or created files to `session-edits.log`. This is robust to any Bash command regardless of how it writes files.

What changed

  • .claude/hooks/snapshot-pre-bash.sh (new, PreToolUse Bash): captures `git status --porcelain` to a per-worktree temp file in `/tmp` before each Bash call. Skips known read-only commands (ls, git log, git status, grep, etc.) to avoid unnecessary overhead.
  • .claude/hooks/track-bash-writes.sh (new, PostToolUse Bash): reads the snapshot, diffs against current porcelain output using `comm -13`, and appends any newly dirty paths to `.claude/session-edits.log` in the same timestamped format as other tracking hooks.
  • .claude/settings.json: wires `snapshot-pre-bash.sh` as the first PreToolUse Bash entry (before `guard-git.sh`) and `track-bash-writes.sh` as the first PostToolUse Bash entry.

Key properties

  • Worktree-safe: snapshot file is keyed by a short SHA1 of the project root path, so parallel worktrees never collide
  • No false positives: `comm -13` only reports paths that were absent from the pre-snapshot — pre-existing dirt is always excluded
  • No double-logging: skips paths already present in `session-edits.log` from Edit/Write/track-moves
  • Resilient: if the snapshot file is missing (first run, or pre-hook skipped a read-only command), the post-hook exits cleanly with no effect

Closes #1457

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: fix(hooks): track Bash file modifications to prevent false-positive commit blocks

  • Gather context and read changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance
  • Post final review

Root Cause Analysis

Problem Being Solved: guard-git.sh performs commit validation by checking if staged files appear in .claude/session-edits.log. However, that log only captured files modified via Edit/Write tool calls (via track-edits.sh) and mv/cp (via track-moves.sh). Files created or modified by Bash commands — sed -i, printf >, build tool artifacts like Cargo.lock, tee, heredocs — were never logged, causing false-positive BLOCKED errors at commit time.

Why This Approach: The PR takes a git-state snapshot approach: capture git status --porcelain before each Bash call and diff it after to identify newly dirtied files. This is the correct method — it's language-agnostic, robust to any file-writing mechanism, and doesn't require enumerating all possible write-capable commands. The alternative (allowlisting/denylisting specific commands) would be brittle and incomplete.

Risk Assessment: Low. Both hooks always exit 0 (informational only). Worst case for a bug: files go un-logged (the current state) or are double-logged (benign since guard-git.sh checks for presence, not count).


Phase 2: Configuration Change Detection

No linting rules relaxed. No test timeouts increased. No TypeScript flags changed. No coverage thresholds lowered. The settings.json change purely adds two hooks without modifying any existing ones. ✓


Phase 3: Test Integrity

tests/unit/visitor.test.ts (commit ef8ea4f): Removes a // biome-ignore lint/suspicious/noExplicitAny: ... suppression comment above let parse: any. This is a cleanup — Biome now accepts the any without a suppression (or it was already passing; either way it's code hygiene). No assertions removed, no test weakened. ✓

No test coverage for the new hooks: The hooks are bash scripts exercising git status --porcelain, comm -13, and awk parsing. They don't have unit tests — but this is consistent with the existing hooks in this repo (e.g. track-edits.sh, guard-git.sh) which also have no dedicated test files. This is an acceptable gap given the infrastructure-script nature of the code.


Phase 4: Code Quality

snapshot-pre-bash.sh

L13–20 / L38–45 — Node.js JSON parsing and hashing used twice: Both scripts spawn node processes to parse JSON and compute a SHA1 hash. This works but is expensive — every Bash hook call spawns two Node processes. An all-bash alternative using jq or python3 would be faster, but node is already a hard dependency of the project and jq/python3 are not guaranteed, so this is defensible.

L29 — Read-only command denylist: The regex pattern to skip "read-only" commands is heuristic and incomplete. For example, npm test, cargo build, make, npx tsc are all omitted — these can absolutely write files (build artifacts, lockfiles) and would benefit from tracking. The current list only skips commands that are truly read-only (ls, cat, grep, git log, etc.). The denylist approach is correct in its narrowness — it skips only commands that can never write. Commands that might write (build tools) correctly fall through to snapshot. This is good defensive design.

L34 — CLAUDE_PROJECT_DIR fallback: Falls back to . if git rev-parse fails and CLAUDE_PROJECT_DIR is unset. In that case, the snapshot goes to the calling directory, which may not be the project root. This is a pre-existing pattern from other hooks, so it's an acceptable inherited limitation.

L52 — git -C "$PROJECT_DIR" status --porcelain: Correctly scopes to the project root, not CWD. ✓

track-bash-writes.sh

L60–80 — parse_paths awk function: Handles both regular paths and rename format ("old -> new").

⚠️ Edge case — quoted paths with spaces: Git porcelain v1 quotes paths with special characters (spaces, non-ASCII) using C-style quoting: M "path with spaces/file.txt". The awk strips leading/trailing " but does not handle C-escape sequences (e.g. \n, \t in filenames, which git encodes as "path\twith\ttabs"). In practice this is unlikely to matter since session-edits.log and guard-git.sh itself would face the same issue, but it's a gap.

L91 — comm -13: Requires both inputs to be sorted. BEFORE_PATHS and AFTER_PATHS are both sorted on L82–83. ✓ However, there is a subtle correctness issue: if BEFORE_PATHS is empty (perfectly clean working tree before the Bash call), comm receives an empty string from echo "" rather than an empty file/stream, which may produce a spurious empty line that parse_paths then filters to nothing. The [ -z "$AFTER_PATHS" ] guard on L85 handles the fully-clean-after case, but the empty-BEFORE case is fine since comm -13 of (empty, after) yields all of after, which is correct behavior.

L102 — awk '{print $2}' on session-edits.log: Extracts the second whitespace-delimited field (the path). This will silently fail to match log entries where the path contains spaces — but track-edits.sh L45 and track-bash-writes.sh L116 both write "$TS $REL_PATH" with no quoting, so paths with spaces break the log format across all hooks, not just here. Pre-existing issue.

L55 — Snapshot cleanup (rm -f): The snapshot is deleted after each post-hook run. This means if the post-hook is skipped (e.g., timeout), the stale snapshot from the previous command will be consumed by the next command's post-hook, potentially generating false log entries. The [ ! -f "$SNAPSHOT_FILE" ] guard (L44) mitigates this by checking for file existence — but once the pre-hook writes a new snapshot, the stale file is overwritten. The risk window is: pre-hook A writes snapshot A → post-hook A times out → pre-hook B overwrites with snapshot B → post-hook B computes (B_after - B_before) which is correct. So the stale-file scenario doesn't actually cause false positives. ✓

L113 — Double-logging check: Uses grep -qxF (fixed string, exact line match) which is correct. ✓

settings.json

L12 — snapshot-pre-bash.sh as first PreToolUse Bash entry: Correct placement — it must run before guard-git.sh (which fires on commits). Since the pre-hooks run sequentially in order, this ordering guarantees the snapshot is taken before any commit decision. ✓

L89 — track-bash-writes.sh as first PostToolUse Bash entry: Correct — runs before track-moves.sh and post-git-ops.sh. ✓

SKILL.md changes (commit a372b82)

Two documentation-only fixes:

  1. Adds close to the --start-from enum in the argument hint — it was documented in the body but missing from the hint.
  2. Adds grind to the <PHASE> replacement list — it was also missing from the list despite a dedicated grind section existing.

Both are clearly correct. ✓

.gitignore additions (commit 3892e7d)

.claude/session-edits.log
.claude/worktrees/
generated/DEPENDENCIES.md
generated/DEPENDENCIES.json

session-edits.log is runtime state (correct to ignore). worktrees/ is also runtime (correct). The generated/ entries are artifact paths. All appropriate.

Wait — .claude/session-edits.log and .claude/worktrees/ look like they were already being ignored or it's a new addition. Looking at the diff, artifacts/, pkg/, target/ are also added. These are Rust/cargo build output dirs. All appropriate. ✓


Phase 5: Backlog Compliance

  • Zero-dep: ✓ — No new dependencies added. Uses bash, awk, and Node.js (already a hard dependency). git CLI usage is already assumed.
  • Foundation-aligned: ✓ — This is pure developer tooling (hooks), not part of the core pipeline. Does not add cloud calls, Docker, external DBs, or code generation. Does not affect the programmatic API.
  • Problem-fit: 2/5 — This is hooks infrastructure that helps Claude Code sessions avoid false-positive commit blocks. It's legitimate quality-of-life work for the development workflow of this specific repo but does not directly reduce token waste, prevent hallucinated deps, or catch blast-radius breakage for AI agents using codegraph. It's internal tooling, not a feature of the codegraph product itself.
  • Breaking: No — Purely additive. Existing hooks unchanged. Existing behavior unaffected.
  • Tier: 1 — Zero-dep ✓, foundation-aligned ✓

Critical Concerns

Minor — Subprocess overhead on every Bash call: Each Bash hook call now spawns 2 additional Node.js processes (JSON parse + SHA1 hash) in both pre and post hooks, totalling 4 extra Node spawns per Bash tool call. For typical development sessions with tens or hundreds of Bash calls, this adds latency. A lighter implementation using shell-only hashing (echo "$PROJECT_DIR" | sha1sum | cut -c1-8 — available on Linux, shasum on macOS) would eliminate the Node process spawning for the hash. The JSON parsing step is harder to avoid without jq. This is a performance nit, not a blocker.

Minor — close missing from the --start-from denylist check in SKILL.md --start-from enum was fixed (commit a372b82), but this PR contains 4 commits touching different concerns (hooks, SKILL.md docs, test cleanup, .gitignore). Per repo CLAUDE.md: "One PR = one concern." The non-hook changes (commits ef8ea4f, 3892e7d, a372b82) are unrelated to the hooks fix. They are benign and arguably landed in the same branch incidentally, but they should ideally be separate PRs.


Final Recommendation

## Root Cause Analysis
Problem Being Solved: guard-git.sh false-positives on files written by Bash commands
  (sed -i, printf redirects, build artifacts) that were never tracked in session-edits.log
Why This Approach: Correct — git state diffing is the only robust, language-agnostic method
Risk Assessment: Very low — both hooks are informational-only (always exit 0)

## Backlog Compliance
- Zero-dep: ✓ — no new runtime dependencies
- Foundation-aligned: ✓ — internal developer tooling, doesn't touch product pipeline
- Problem-fit: 2/5 — workflow fix for Claude Code sessions, not a codegraph product feature
- Breaking: No — purely additive
- Tier: 1

## Concerns
1. MINOR: 4 Node.js subprocess spawns per Bash call for hashing/parsing — shell-based
   sha1sum/shasum could eliminate 2 of these
2. MINOR: PR mixes 4 separate concerns (hooks fix, SKILL.md docs, test cleanup, .gitignore) —
   violates repo's "one PR = one concern" rule from CLAUDE.md
3. MINOR: Quoted paths with C-escape sequences in git porcelain output not handled in
   parse_paths() awk — unlikely to matter in practice but technically incomplete

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: The core fix is correct and well-implemented. Root cause is real,
  approach is sound, code is defensive (always exits 0, handles missing snapshots
  gracefully, worktree-safe). The minor concerns don't block the fix. The mixed-commit
  concern (CLAUDE.md: "one PR = one concern") is worth noting but the unrelated changes
  are all small and correct — splitting retroactively would be more disruptive than helpful.
  Approve and merge.

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds a PreToolUse/PostToolUse Bash hook pair that snapshots git status --porcelain before each Bash call and diffs it after, logging newly-dirty paths to session-edits.log so guard-git.sh no longer false-positive-blocks commits for files written by sed -i, redirects, and build tools.

  • snapshot-pre-bash.sh (PreToolUse): skips provably-read-only commands, then keys a /tmp snapshot file by (project_hash, cmd_hash) to handle parallel worktrees and reduce — but not fully eliminate — concurrent-call collisions.
  • track-bash-writes.sh (PostToolUse): derives the same key, diffs BEFORE/AFTER with comm -13, and appends newly-dirty paths with deduplication against the existing log; always exits 0.
  • settings.json: registers both hooks as the first entries in their respective Bash hook lists.

Confidence Score: 4/5

Safe to merge for the common case; a residual race exists when two identical Bash commands run concurrently, but this is an edge case and the overall fix is a clear improvement over the pre-PR state.

The pre/post hook snapshot approach correctly handles the vast majority of Bash-write scenarios, and the (project, command) keying fixes the previously-flagged parallel-worktree and different-command concurrent-call races. The remaining gap is two concurrent Bash calls that happen to use the exact same command string — the second post-hook deletes the shared snapshot file and the first post-hook exits without recording its writes, leaving those paths unlogged and blocked at commit time. The tool_use_id field in hook stdin JSON would eliminate this entirely.

snapshot-pre-bash.sh — the CMD_HASH snapshot-key derivation is the only remaining concern; track-bash-writes.sh and settings.json look correct.

Important Files Changed

Filename Overview
.claude/hooks/snapshot-pre-bash.sh New PreToolUse hook that snapshots git status before each Bash call; correctly excludes read-only-only commands and uses per-call keying, but the (project, command) hash key still collides when the same command runs concurrently.
.claude/hooks/track-bash-writes.sh New PostToolUse hook; correctly diffs before/after git-status, handles renames, deduplicates against existing log entries, and cleans up the snapshot file.
.claude/settings.json Wires snapshot-pre-bash.sh as the first PreToolUse Bash entry and track-bash-writes.sh as the first PostToolUse Bash entry; ordering and timeout values are consistent with existing hooks.

Sequence Diagram

sequenceDiagram
    participant CC as Claude Code
    participant Pre as snapshot-pre-bash.sh
    participant Bash as Bash tool
    participant Post as track-bash-writes.sh
    participant Log as session-edits.log
    participant Guard as guard-git.sh

    CC->>Pre: PreToolUse (stdin JSON w/ command)
    Pre->>Pre: skip read-only commands
    Pre->>Pre: hash(PROJECT_DIR) + hash(COMMAND)
    Pre->>Pre: "git status --porcelain → /tmp/snapshot-{P}-{C}.txt"

    CC->>Bash: Execute command (e.g. sed -i, npm build)
    Bash-->>CC: Result

    CC->>Post: PostToolUse (same stdin JSON)
    Post->>Post: derive same snapshot path
    Post->>Post: git status --porcelain (AFTER)
    Post->>Post: comm -13 BEFORE AFTER → NEW_PATHS
    Post->>Log: append TIMESTAMP path for each new path
    Post->>Post: rm snapshot file

    CC->>Guard: PreToolUse (git commit ...)
    Guard->>Log: read EDITED_FILES
    Guard->>Guard: compare staged files vs EDITED_FILES
    Guard-->>CC: allow or deny commit
Loading

Reviews (6): Last reviewed commit: "fix(hooks): remove node -e/-p from read-..." | Re-trigger Greptile

Comment thread .claude/hooks/snapshot-pre-bash.sh Outdated
Comment thread .claude/hooks/snapshot-pre-bash.sh Outdated
Comment thread .claude/hooks/snapshot-pre-bash.sh Outdated
…by command hash

Three correctness gaps addressed in the snapshot-pre-bash / track-bash-writes hook pair:

1. echo, printf, find, and awk were listed as read-only skip candidates but all
   four can write files (echo/printf via redirections, find via -exec/-delete,
   awk via getline/redirection). Remove them from the skip list so their file
   writes are captured in session-edits.log.

2. The snapshot file was keyed only by project root hash, causing a race when
   Claude Code issues multiple Bash tool calls in parallel: call B's pre-hook
   would overwrite call A's snapshot before A's post-hook ran, silently dropping
   A's file writes. Fix by including a hash of the command string in the filename
   so each concurrent call gets a distinct snapshot file.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed Claude's feedback: the P1 concern (echo/printf/find/awk in the skip list) was fixed in commit 12364b6 — all four were removed from the read-only skip list since all can write files via redirections or -exec/-delete. The snapshot file path now also includes a CMD_HASH to handle concurrent parallel Bash calls (P2 concern). The minor observations (subprocess overhead, C-escape sequences in paths, mixed PR scope) are noted; the subprocess approach uses the already-required node runtime so adds no new dependency, and the C-escape edge case degrades to double-logging which is benign.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed in commit c3d4722: removed node -e and node -p from the read-only skip list in snapshot-pre-bash.sh — both can write files via the Node.js fs module, so they must fall through to the snapshot/diff path. Updated the comment block to document why they're excluded.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 9de318c into main Jun 13, 2026
22 checks passed
@carlos-alm carlos-alm deleted the fix/hooks-bash-tracking-1457 branch June 13, 2026 04:42
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hooks: session-edit tracking misses Bash file modifications (sed, printf, redirects), causing false-positive commit blocks

1 participant