fix(review): Phase M — Gemini + Codex review feedback (7 issues)

ComBba · claude · ComBba · commit 0851b68757bc · 2026-04-23T06:09:18.000+09:00
Applies reviewer findings from the PR #5 external code review. All tests still pass (45/45 verify-plugin + expanded CI matrices).

🔴 Critical: command injection in detect-surface.sh (Gemini)
Before: python3 -c "json.loads('''$JSON''')". Backticks or $(…) in idea text would execute before python parsed them.
After: payload piped via stdin (printf + sys.stdin.read). No shell interpolation of user-controlled data anywhere in the script. Second python emission (scores JSON) also moved to env vars for defense-in-depth consistency.
CI: new security-regression test injects `touch /tmp/pf-injection-canary` and asserts the file is NOT created.

🟡 Important: cost-regression wrote to wrong table (Codex P1)
Before: CREATE TABLE events(ts, kind, severity, payload), which is not readable by /pf:status or /pf:budget because they query the canonical `blackboard` table.
After: writes to `blackboard` with schema matching CLAUDE.md §6: (ts, agent_id, key, value, tier, dept). key = "status.cost_{warn|alert}", value = JSON payload, agent_id = "cost-regression", tier = 1 (Meta), dept = "meta". idx_bb_key index created for the polling pattern.
CI: new schema assertion confirms the `blackboard` table exists after breach.

🟡 Medium: Korean single-char tokens dropped (Gemini)
Before: filter len(t) > 1 dropped 앱 (app), 웹 (web), 봇 (bot), 툴 (tool).
After: len(t) > 1 OR not t.isascii(). ASCII single chars are still dropped as noise (a, i, o); non-ASCII single chars are kept as signal. STOPWORDS already handles Korean particles (가, 는, 을, …).
CI: new KR on-idea fixture exercises the path.

🟡 Medium: grep -oc counted lines, not occurrences (Gemini)
Before: grep -oc returns the line count (max 1 for single-line text), so "api api api" scored 1 instead of 3, breaking the `rest > 2*ui` rule.
After: grep -o | wc -l gives the true occurrence count.
CI: regression guard asserts `"rest": 3` for "api api api".

🟡 Medium: ls in for loop fragile (Gemini)
Before: for d in $(ls -d runs/*/ 2>/dev/null), which word-splits on spaces and risks the ARG_MAX limit under many runs.
After: for d in runs/*/ ; do [ -d "$d" ] || continue (glob-safe).

🟡 Medium: profile["cost_ceiling"] KeyError risk (Gemini)
Before: direct subscript assumed schema compliance at runtime.
After: profile.get("cost_ceiling") + field presence check → early return 0. Schema validation at CI still prevents broken profiles from merging, but runtime is now defensive.
CI: malformed profile fixture asserts exit 0, no crash.

🟢 Nice-to-have: cache key ignored --previews override (Codex P2)
Before: cmd_key(idea, profile) derived the advocate count from the profile default only. Same idea + pro profile + --previews=9 vs --previews=18 collided.
After: cmd_key(idea, profile, previews_override?), with an optional 3rd arg. When set, it overrides the profile's default count in the key input. Backwards compatible (2-arg callers unchanged).
CI: new assertion K(idea,pro) != K(idea,pro,9).

Also hardened all other python3 -c shell-interpolation points in preview-cache.sh (cmd_get + cmd_prune) to pass paths via argv, so they are no longer interpolated into python source. Defense-in-depth against future path injection even though the cache dir is under the user's HOME.

Test matrix growth:
- detect-surface: 3 → 5 cases (+ occurrence regression + injection canary)
- cost-regression: 6 → 8 cases (+ blackboard schema + defensive unknown profile)
- preview-cache: 4 → 5 cases (+ --previews override produces distinct key)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
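The stdin fix for the critical finding can be illustrated in Python. This is a minimal sketch, not the script itself: `extract_text` and `CHILD_SOURCE` are hypothetical names, and the child program is a simplified stand-in for the inline python in detect-surface.sh. The point it shows is that the program source stays a fixed string while the untrusted payload travels only as data on stdin.

```python
import json
import subprocess
import sys

# Fixed child program: it never contains user data, so nothing in the
# payload can change what gets executed.
CHILD_SOURCE = (
    "import json, sys; "
    "print(json.loads(sys.stdin.read()).get('text', ''))"
)


def extract_text(payload: str) -> str:
    """Hypothetical helper: parse JSON in a child interpreter, fed via stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],  # argv list, no shell involved
        input=payload,                         # untrusted data goes to stdin only
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Backticks and $(...) arrive as plain characters and come back unchanged.
hostile = json.dumps({"text": "`touch /tmp/pf-injection-canary` $(id) injected"})
print(extract_text(hostile))
```

The same separation applies in the shell version: once the payload is no longer spliced into the `python3 -c` source string, its contents cannot alter the program that runs.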
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -221,11 +221,37 @@ jobs:
               else:
                   print(f"  ✗ {name}: expected {expected}, got {p.returncode}")
                   failed += 1
+
+          # Canonical blackboard schema: confirm breach writes to `blackboard` (not `events`)
+          import sqlite3
+          breach_run = [r for name, profile, snap, expected in cases if expected > 0
+                        for r in [f"{tmp}/runs/r-{profile}-{snap['tokens_total']}"]
+                        if os.path.exists(f"{r}/blackboard.db")][0]
+          con = sqlite3.connect(f"{breach_run}/blackboard.db")
+          tables = [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
+          if "blackboard" not in tables:
+              print(f"  ✗ schema: canonical 'blackboard' table missing, found: {tables}")
+              failed += 1
+          else:
+              print(f"  ✓ schema: writes to canonical 'blackboard' table (not 'events')")
+
+          # Defensive: malformed profile must NOT crash
+          bad_run = f"{tmp}/runs/r-malformed"
+          os.makedirs(bad_run, exist_ok=True)
+          with open(f"{bad_run}/.profile", "w") as f: f.write("nonexistent")
+          with open(f"{bad_run}/cost-snapshot.json", "w") as f: json.dump({"tokens_total": 999999}, f)
+          p = subprocess.run(["python3", "plugins/preview-forge/hooks/cost-regression.py", bad_run], env=env, capture_output=True, text=True)
+          if p.returncode == 0:
+              print(f"  ✓ defensive: unknown profile returns 0 (no crash)")
+          else:
+              print(f"  ✗ defensive: unknown profile returned {p.returncode}: {p.stderr}")
+              failed += 1
+
           shutil.rmtree(tmp)
           if failed:
               print(f"FAIL: {failed} cost-regression cases")
               sys.exit(1)
-          print("✓ cost-regression: 6/6 tests pass")
+          print("✓ cost-regression: 8/8 tests pass (6 classification + schema + defensive)")
           PYEOF
 
       - name: Test detect-surface (Proposal #2)
@@ -236,7 +262,14 @@ jobs:
           echo "$R2" | grep -q '"surface": "ui-first"' || { echo "FAIL: UI case: $R2"; exit 1; }
           R3=$(bash scripts/detect-surface.sh <<<'{"text":"Admin panel with dashboard UI and REST API for programmatic access. Self-service customer portal with settings page."}')
           echo "$R3" | grep -q '"surface": "hybrid"' || { echo "FAIL: hybrid case: $R3"; exit 1; }
-          echo "✓ detect-surface: 3/3 fixtures classify correctly"
+          # Regression guard: grep -oc vs grep -o|wc -l — three "api" must score 3, not 1.
+          R4=$(bash scripts/detect-surface.sh <<<'{"text":"api api api"}')
+          echo "$R4" | grep -q '"rest": 3' || { echo "FAIL: grep occurrence count regression: $R4"; exit 1; }
+          # Security guard: command injection in idea text must not execute.
+          rm -f /tmp/pf-injection-canary
+          bash scripts/detect-surface.sh <<<'{"text":"`touch /tmp/pf-injection-canary` injected"}' > /dev/null
+          [[ ! -f /tmp/pf-injection-canary ]] || { echo "FAIL: command injection in idea text executed"; rm /tmp/pf-injection-canary; exit 1; }
+          echo "✓ detect-surface: 5/5 cases (3 classify + 1 occurrence + 1 security)"
 
       - name: Test preview-cache (Proposal #11)
         env:
@@ -249,6 +282,9 @@ jobs:
           [[ "$K1" == "$K2" ]] || { echo "FAIL: key not deterministic"; exit 1; }
           K3=$(bash scripts/preview-cache.sh key "build todo app" standard)
           [[ "$K1" != "$K3" ]] || { echo "FAIL: profile doesn't change key"; exit 1; }
+          # --previews=N override must produce a distinct key (same idea + profile, different N)
+          K4=$(bash scripts/preview-cache.sh key "build todo app" pro 9)
+          [[ "$K1" != "$K4" ]] || { echo "FAIL: --previews override didn't change key"; exit 1; }
           echo '{"profile":"pro","previews":[]}' > /tmp/pf-test.json
           bash scripts/preview-cache.sh put "$K1" /tmp/pf-test.json
           bash scripts/preview-cache.sh get "$K1" > /dev/null || { echo "FAIL: get miss after put"; exit 1; }
diff --git a/plugins/preview-forge/hooks/cost-regression.py b/plugins/preview-forge/hooks/cost-regression.py
@@ -71,21 +71,36 @@ def load_snapshot(run_dir: Path) -> dict | None:
 
 
 def write_blackboard(run_dir: Path, severity: str, payload: dict) -> None:
-    """Insert a row into runs/<id>/blackboard.db. Creates table on demand."""
+    """Insert a row into runs/<id>/blackboard.db `blackboard` table.
+
+    Schema matches CLAUDE.md §6 so /pf:status and /pf:budget can read it
+    using the same SELECT patterns as other hooks (auto-retro-trigger,
+    factory-policy observability, supervisor polling).
+    """
     db = run_dir / "blackboard.db"
     try:
         con = sqlite3.connect(str(db))
         con.execute("""
-            CREATE TABLE IF NOT EXISTS events (
-                ts INTEGER NOT NULL,
-                kind TEXT NOT NULL,
-                severity TEXT NOT NULL,
-                payload TEXT NOT NULL
+            CREATE TABLE IF NOT EXISTS blackboard (
+                ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                agent_id TEXT NOT NULL,
+                key TEXT NOT NULL,
+                value TEXT,
+                tier INTEGER,
+                dept TEXT
             )
         """)
+        con.execute("CREATE INDEX IF NOT EXISTS idx_bb_key ON blackboard(key)")
         con.execute(
-            "INSERT INTO events (ts, kind, severity, payload) VALUES (?, ?, ?, ?)",
-            (int(time.time()), "cost-regression", severity, json.dumps(payload)),
+            "INSERT INTO blackboard (agent_id, key, value, tier, dept) "
+            "VALUES (?, ?, ?, ?, ?)",
+            (
+                "cost-regression",
+                f"status.cost_{severity}",
+                json.dumps(payload),
+                1,  # Meta tier
+                "meta",
+            ),
         )
         con.commit()
         con.close()
@@ -124,19 +139,28 @@ def main(argv: list[str]) -> int:
     if not profile:
         return 0
 
+    ceiling = profile.get("cost_ceiling")
+    if not ceiling or not all(
+        k in ceiling for k in ("p95_tokens", "p95_minutes", "hard_tokens", "hard_minutes")
+    ):
+        # Malformed or partial profile — treat as "no baseline", skip.
+        # Schema validation in CI catches this at merge time, but guard
+        # at runtime so a bad user-authored profile doesn't crash runs.
+        return 0
+
     snap = load_snapshot(run_dir)
     if not snap:
         return 0
 
     tokens = int(snap.get("tokens_total", 0))
     minutes = float(snap.get("elapsed_minutes", 0))
 
-    severity, reason = classify(tokens, minutes, profile["cost_ceiling"])
+    severity, reason = classify(tokens, minutes, ceiling)
     payload = {
-        "profile": profile["name"],
+        "profile": profile.get("name", "unknown"),
         "tokens": tokens,
         "minutes": minutes,
-        "ceiling": profile["cost_ceiling"],
+        "ceiling": ceiling,
         "reason": reason,
     }
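To see why the table name matters for readers, here is a small self-contained sketch of the write path above paired with a reader. The table definition and insert mirror this hunk; `write_status` and `read_cost_status` are hypothetical names, and the SELECT is only an assumption about how a consumer such as /pf:status might poll by key, since the real queries are not part of this diff.

```python
import json
import sqlite3


def write_status(con: sqlite3.Connection, severity: str, payload: dict) -> None:
    """Insert one cost-regression row using the schema from this hunk."""
    con.execute("""
        CREATE TABLE IF NOT EXISTS blackboard (
            ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            agent_id TEXT NOT NULL,
            key TEXT NOT NULL,
            value TEXT,
            tier INTEGER,
            dept TEXT
        )
    """)
    con.execute("CREATE INDEX IF NOT EXISTS idx_bb_key ON blackboard(key)")
    con.execute(
        "INSERT INTO blackboard (agent_id, key, value, tier, dept) "
        "VALUES (?, ?, ?, ?, ?)",
        ("cost-regression", f"status.cost_{severity}", json.dumps(payload), 1, "meta"),
    )
    con.commit()


def read_cost_status(con: sqlite3.Connection) -> list[tuple[str, dict]]:
    """Hypothetical reader: fetch cost rows by exact key (assumed polling pattern)."""
    rows = con.execute(
        "SELECT key, value FROM blackboard WHERE key IN (?, ?) ORDER BY rowid",
        ("status.cost_warn", "status.cost_alert"),
    ).fetchall()
    return [(key, json.loads(value)) for key, value in rows]


con = sqlite3.connect(":memory:")
write_status(con, "alert", {"tokens": 999999, "reason": "example"})
print(read_cost_status(con))
```

A reader written this way would have found nothing while the hook was still inserting into `events`, which is the failure the review flagged.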
 
diff --git a/plugins/preview-forge/hooks/idea-drift-detector.py b/plugins/preview-forge/hooks/idea-drift-detector.py
@@ -63,9 +63,18 @@
 
 
 def tokenize(text: str) -> set[str]:
-    """Lowercased word set with stopwords removed."""
+    """Lowercased word set with stopwords removed.
+
+    Keeps single-character CJK/Hangul tokens (앱, 웹, 봇, 툴) because
+    they carry product-intent meaning in Korean ideas, while dropping
+    single-char ASCII tokens (a, i, o) which are almost always noise.
+    Korean particles (가, 는, 을, …) are already in STOPWORDS.
+    """
     tokens = {w.lower() for w in WORD_RE.findall(text or "")}
-    return {t for t in tokens if len(t) > 1 and t not in STOPWORDS}
+    return {
+        t for t in tokens
+        if t not in STOPWORDS and (len(t) > 1 or not t.isascii())
+    }
 
 
 def containment(reference: set[str], candidate: set[str]) -> float:
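The effect of the new filter can be checked in isolation. The sketch below is an assumption-laden stand-in: the real `WORD_RE` and `STOPWORDS` are defined elsewhere in idea-drift-detector.py and are not shown in this diff, so a plain `\w+` pattern and a tiny stopword set are used here purely for illustration.

```python
import re

# Stand-ins for the module's real definitions (assumed, not from the source).
WORD_RE = re.compile(r"\w+")
STOPWORDS = {"the", "가", "는", "을"}


def tokenize_old(text: str) -> set[str]:
    """Previous behaviour: every single-character token is dropped."""
    tokens = {w.lower() for w in WORD_RE.findall(text or "")}
    return {t for t in tokens if len(t) > 1 and t not in STOPWORDS}


def tokenize_new(text: str) -> set[str]:
    """New behaviour: single-character tokens survive only if non-ASCII."""
    tokens = {w.lower() for w in WORD_RE.findall(text or "")}
    return {
        t for t in tokens
        if t not in STOPWORDS and (len(t) > 1 or not t.isascii())
    }


sample = "a 앱 는 todo i 웹"
print(sorted(tokenize_old(sample)))
print(sorted(tokenize_new(sample)))
```

With these stand-ins, the old filter keeps only `todo`, while the new one also keeps 앱 and 웹 and still drops the particle 는 through the stopword set.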
diff --git a/plugins/preview-forge/monitors/monitors.json b/plugins/preview-forge/monitors/monitors.json
@@ -6,7 +6,7 @@
   },
   {
     "name": "cost-regression",
-    "command": "for d in $(ls -d runs/*/ 2>/dev/null); do python3 ${CLAUDE_PLUGIN_ROOT}/hooks/cost-regression.py \"$d\" 2>&1 | grep -E '(WARN|ALERT)' || true; done; sleep 30",
+    "command": "for d in runs/*/; do [ -d \"$d\" ] || continue; python3 \"${CLAUDE_PLUGIN_ROOT}/hooks/cost-regression.py\" \"$d\" 2>&1 | grep -E '(WARN|ALERT)' || true; done; sleep 30",
     "description": "P0-B cost-regression sentinel — per-profile P95/hard ceiling breach detection. Writes blackboard row + emits to stderr. M1 supervisor reacts to alert severity."
   },
   {
diff --git a/scripts/detect-surface.sh b/scripts/detect-surface.sh
@@ -29,11 +29,13 @@ else
   JSON=$(cat "$INPUT")
 fi
 
-# Extract idea fields (handle missing gracefully). No jq dep — python3 only.
-IDEA_TEXT=$(python3 -c "
+# Extract idea fields via python stdin (NO shell substitution — prevents
+# command injection from user-controlled idea text like `$(rm -rf ~)` or
+# backticks). No jq dep — python3 only.
+IDEA_TEXT=$(printf '%s' "$JSON" | python3 -c "
 import json, sys
 try:
-    d = json.loads('''$JSON''')
+    d = json.loads(sys.stdin.read())
     parts = [str(d.get('text','')), str(d.get('idea','')), str(d.get('title','')), str(d.get('pitch',''))]
     print(' '.join(p for p in parts if p).lower())
 except Exception:
@@ -42,7 +44,7 @@ except Exception:
 
 if [[ -z "$IDEA_TEXT" ]]; then
   # Treat raw stdin as the idea text.
-  IDEA_TEXT=$(echo "$JSON" | tr '[:upper:]' '[:lower:]')
+  IDEA_TEXT=$(printf '%s' "$JSON" | tr '[:upper:]' '[:lower:]')
 fi
 
 # REST-first signals
@@ -75,8 +77,12 @@ count_hits() {
   shift
   local hits=0
   for kw in "$@"; do
-    # word-boundary-ish match; grep -c returns lines, we need occurrences
-    local n=$(echo "$text" | grep -oc "$kw" 2>/dev/null || true)
+    # grep -o prints each match on its own line; wc -l counts lines.
+    # grep -oc returns *line* count (max 1 for single-line text), so
+    # repeated occurrences within one line would undercount.
+    local n
+    n=$(printf '%s' "$text" | grep -o -- "$kw" 2>/dev/null | wc -l | tr -d ' ')
+    [[ -z "$n" ]] && n=0
     hits=$((hits + n))
   done
   echo "$hits"
@@ -109,12 +115,20 @@ elif [[ "$REST_HITS" -ge "$UI_HITS" && "$REST_HITS" -gt 0 ]]; then
   STACK_HINT="nestia"
 fi
 
-# Emit single-line JSON, easily consumed by SpecDD lead.
+# Emit single-line JSON via env vars (no shell interpolation into python
+# source — SURFACE/STACK_HINT come from fixed string literals but we pipe
+# through env for consistency with defense-in-depth).
+SURFACE="$SURFACE" STACK_HINT="$STACK_HINT" \
+REST_HITS="$REST_HITS" UI_HITS="$UI_HITS" HYBRID_HITS="$HYBRID_HITS" \
 python3 -c "
-import json
+import json, os
 print(json.dumps({
-    'surface': '$SURFACE',
-    'scores': {'rest': $REST_HITS, 'ui': $UI_HITS, 'hybrid': $HYBRID_HITS},
-    'stack_hint': '$STACK_HINT'
+    'surface': os.environ['SURFACE'],
+    'scores': {
+        'rest': int(os.environ['REST_HITS']),
+        'ui': int(os.environ['UI_HITS']),
+        'hybrid': int(os.environ['HYBRID_HITS']),
+    },
+    'stack_hint': os.environ['STACK_HINT'],
 }))
 "
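The difference between the two grep invocations in count_hits can be reproduced in Python. This is only an analogue under a simplifying assumption: keywords are treated as literal strings here, whereas grep interprets them as regular expressions. Both function names are hypothetical.

```python
import re


def count_matching_lines(text: str, keyword: str) -> int:
    """Analogue of `grep -c`: how many lines contain the keyword at least once."""
    return sum(1 for line in text.splitlines() if keyword in line)


def count_occurrences(text: str, keyword: str) -> int:
    """Analogue of `grep -o | wc -l`: how many non-overlapping matches exist."""
    return len(re.findall(re.escape(keyword), text))


idea = "api api api"
print(count_matching_lines(idea, "api"))  # one line matches
print(count_occurrences(idea, "api"))     # three separate matches
```

For single-line idea text the line-based count can never exceed 1 per keyword, which is why a comparison such as `rest > 2*ui` could not be satisfied by repetition alone before the fix.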
diff --git a/scripts/preview-cache.sh b/scripts/preview-cache.sh
@@ -41,19 +41,29 @@ print(hashlib.sha256(data).hexdigest()[:16])
 cmd_key() {
   local idea="$1"
   local profile="${2:-pro}"
+  # Optional 3rd arg: explicit preview count override (from /pf:new --previews=N).
+  # When set, the advocate set is distinct from the profile's default count —
+  # runs with different N must not collide in cache.
+  local previews_override="${3:-}"
 
   # Load profile's preview count to derive advocate set hash. If profile
   # file missing, fall back to the profile name as the set discriminator.
-  local advocate_set=""
+  local advocate_count=""
   if [[ -n "$PLUGIN_ROOT" && -f "$PLUGIN_ROOT/profiles/$profile.json" ]]; then
-    advocate_set=$(python3 -c "
-import json
-p = json.load(open('$PLUGIN_ROOT/profiles/$profile.json'))
-print(f'{p[\"previews\"][\"count\"]}-{p[\"name\"]}')
-")
-  else
-    advocate_set="$profile"
+    advocate_count=$(python3 -c "
+import json, sys
+try:
+    p = json.load(open(sys.argv[1]))
+    print(p['previews']['count'])
+except Exception:
+    print('')
+" "$PLUGIN_ROOT/profiles/$profile.json")
   fi
+  # Override takes precedence if provided.
+  if [[ -n "$previews_override" ]]; then
+    advocate_count="$previews_override"
+  fi
+  local advocate_set="${advocate_count:-unknown}-${profile}"
 
   printf '%s\x1f%s\x1f%s\x1f%s' "$idea" "$advocate_set" "$MODEL_VERSION" "$profile" | hash
 }
@@ -67,28 +77,30 @@ cmd_get() {
   fi
 
   # TTL check — compare file mtime against profile's ttl_seconds.
-  # We need the profile to know TTL, but the key doesn't carry it;
-  # read .profile field from the cached blob itself.
+  # Path args passed via argv (NOT shell-interpolated into source) to
+  # stay safe even if PLUGIN_ROOT or cache keys ever contain odd chars.
   local ttl=0
   local profile_name
   profile_name=$(python3 -c "
-import json
+import json, sys
 try:
-    d = json.load(open('$file'))
-    print(d.get('profile', 'pro'))
+    print(json.load(open(sys.argv[1])).get('profile', 'pro'))
 except Exception:
     print('pro')
-")
+" "$file")
   if [[ -n "$PLUGIN_ROOT" && -f "$PLUGIN_ROOT/profiles/$profile_name.json" ]]; then
     ttl=$(python3 -c "
-import json
-print(json.load(open('$PLUGIN_ROOT/profiles/$profile_name.json'))['caching']['ttl_seconds'])
-")
+import json, sys
+try:
+    print(json.load(open(sys.argv[1]))['caching']['ttl_seconds'])
+except Exception:
+    print(0)
+" "$PLUGIN_ROOT/profiles/$profile_name.json")
   fi
 
   if [[ "$ttl" -gt 0 ]]; then
     local age
-    age=$(python3 -c "import os,time; print(int(time.time() - os.path.getmtime('$file')))")
+    age=$(python3 -c "import os,sys,time; print(int(time.time() - os.path.getmtime(sys.argv[1])))" "$file")
     if [[ "$age" -gt "$ttl" ]]; then
       return 1
     fi
@@ -122,26 +134,29 @@ cmd_prune() {
     [[ -f "$f" ]] || continue
     local profile_name
     profile_name=$(python3 -c "
-import json
+import json, sys
 try:
-    print(json.load(open('$f')).get('profile', 'pro'))
+    print(json.load(open(sys.argv[1])).get('profile', 'pro'))
 except Exception:
     print('pro')
-")
+" "$f")
     local ttl=0
     if [[ -f "$PLUGIN_ROOT/profiles/$profile_name.json" ]]; then
       ttl=$(python3 -c "
-import json
-print(json.load(open('$PLUGIN_ROOT/profiles/$profile_name.json'))['caching']['ttl_seconds'])
-")
+import json, sys
+try:
+    print(json.load(open(sys.argv[1]))['caching']['ttl_seconds'])
+except Exception:
+    print(0)
+" "$PLUGIN_ROOT/profiles/$profile_name.json")
     fi
     if [[ "$ttl" -eq 0 ]]; then
       rm -f "$f"
       removed=$((removed + 1))
       continue
     fi
     local age
-    age=$(python3 -c "import os,time; print(int(time.time() - os.path.getmtime('$f')))")
+    age=$(python3 -c "import os,sys,time; print(int(time.time() - os.path.getmtime(sys.argv[1])))" "$f")
     if [[ "$age" -gt "$ttl" ]]; then
       rm -f "$f"
       removed=$((removed + 1))

`12`	`12`	`{`