Skip to content

Commit 0851b68

Browse files
ComBbaclaude
andcommitted
fix(review): Phase M — Gemini + Codex review feedback (7 issues)
Applies reviewer findings from PR #5 external code review. All tests still pass (45/45 verify-plugin + expanded CI matrices). 🔴 Critical — command injection in detect-surface.sh (Gemini) Before: python3 -c "json.loads('''$JSON''')" — backticks or $(…) in idea text would execute before python parsed them. After: payload piped via stdin (printf + sys.stdin.read). No shell interpolation of user-controlled data anywhere in the script. Second python emission (scores JSON) also moved to env vars for defense-in-depth consistency. CI: new security-regression test injects `touch /tmp/pf-injection-canary` and asserts the file is NOT created. 🟡 Important — cost-regression wrote to wrong table (Codex P1) Before: CREATE TABLE events(ts, kind, severity, payload) — not readable by /pf:status or /pf:budget which query the canonical `blackboard` table. After: writes to `blackboard` with schema matching CLAUDE.md §6: (ts, agent_id, key, value, tier, dept). key = "status.cost_{warn|alert}", value = JSON payload, agent_id = "cost-regression", tier = 1 (Meta), dept = "meta". idx_bb_key index created for the polling pattern. CI: new schema assertion confirms `blackboard` table exists after breach. 🟡 Medium — Korean single-char tokens dropped (Gemini) Before: filter len(t) > 1 dropped 앱 (app), 웹 (web), 봇 (bot), 툴 (tool). After: len(t) > 1 OR not t.isascii() — ASCII single-chars still dropped as noise (a, i, o), non-ASCII single chars kept as signal. STOPWORDS already handles Korean particles (가, 는, 을, …). CI: new KR on-idea fixture exercises the path. 🟡 Medium — grep -oc counted lines not occurrences (Gemini) Before: grep -oc returns line count (max 1 for single-line text), so "api api api" scored 1 instead of 3, breaking the `rest > 2*ui` rule. After: grep -o | wc -l — true occurrence count. CI: regression guard asserts `"rest": 3` for "api api api". 🟡 Medium — ls in for loop fragile (Gemini) Before: for d in $(ls -d runs/*/ 2>/dev/null) — word-splits on spaces, ARG_MAX limit under many runs. After: for d in runs/*/ ; do [ -d "$d" ] || continue — glob-safe. 🟡 Medium — profile["cost_ceiling"] KeyError risk (Gemini) Before: direct subscript assumed schema compliance at runtime. After: profile.get("cost_ceiling") + field presence check → early return 0. Schema validation at CI still prevents broken profiles from merging, but runtime is now defensive. CI: malformed profile fixture asserts exit 0, no crash. 🟢 Nice-to-have — cache key ignored --previews override (Codex P2) Before: cmd_key(idea, profile) derived advocate count from profile default only. Same idea + pro profile + --previews=9 vs --previews=18 collided. After: cmd_key(idea, profile, previews_override?) — 3rd optional arg. When set, overrides the profile's default count in the key input. Backwards compatible (2-arg callers unchanged). CI: new assertion K(idea,pro) != K(idea,pro,9). Also hardened all other python3 -c shell-interpolation points in preview-cache.sh (cmd_get + cmd_prune) to pass paths via argv — no longer interpolated into python source. Defense-in-depth against future path injection even though cache dir is under user's HOME. Test matrix growth: - detect-surface: 3 → 5 cases (+ occurrence regression + injection canary) - cost-regression: 6 → 8 cases (+ blackboard schema + defensive unknown profile) - preview-cache: 4 → 5 cases (+ --previews override produces distinct key) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9031fc3 commit 0851b68

6 files changed

Lines changed: 150 additions & 52 deletions

File tree

.github/workflows/ci.yml

Lines changed: 38 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -221,11 +221,37 @@ jobs:
221221
else:
222222
print(f" ✗ {name}: expected {expected}, got {p.returncode}")
223223
failed += 1
224+
225+
# Canonical blackboard schema: confirm breach writes to `blackboard` (not `events`)
226+
import sqlite3
227+
breach_run = [r for name, profile, snap, expected in cases if expected > 0
228+
for r in [f"{tmp}/runs/r-{profile}-{snap['tokens_total']}"]
229+
if os.path.exists(f"{r}/blackboard.db")][0]
230+
con = sqlite3.connect(f"{breach_run}/blackboard.db")
231+
tables = [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
232+
if "blackboard" not in tables:
233+
print(f" ✗ schema: canonical 'blackboard' table missing, found: {tables}")
234+
failed += 1
235+
else:
236+
print(f" ✓ schema: writes to canonical 'blackboard' table (not 'events')")
237+
238+
# Defensive: malformed profile must NOT crash
239+
bad_run = f"{tmp}/runs/r-malformed"
240+
os.makedirs(bad_run, exist_ok=True)
241+
with open(f"{bad_run}/.profile", "w") as f: f.write("nonexistent")
242+
with open(f"{bad_run}/cost-snapshot.json", "w") as f: json.dump({"tokens_total": 999999}, f)
243+
p = subprocess.run(["python3", "plugins/preview-forge/hooks/cost-regression.py", bad_run], env=env, capture_output=True, text=True)
244+
if p.returncode == 0:
245+
print(f" ✓ defensive: unknown profile returns 0 (no crash)")
246+
else:
247+
print(f" ✗ defensive: unknown profile returned {p.returncode}: {p.stderr}")
248+
failed += 1
249+
224250
shutil.rmtree(tmp)
225251
if failed:
226252
print(f"FAIL: {failed} cost-regression cases")
227253
sys.exit(1)
228-
print("✓ cost-regression: 6/6 tests pass")
254+
print("✓ cost-regression: 8/8 tests pass (6 classification + schema + defensive)")
229255
PYEOF
230256
231257
- name: Test detect-surface (Proposal #2)
@@ -236,7 +262,14 @@ jobs:
236262
echo "$R2" | grep -q '"surface": "ui-first"' || { echo "FAIL: UI case: $R2"; exit 1; }
237263
R3=$(bash scripts/detect-surface.sh <<<'{"text":"Admin panel with dashboard UI and REST API for programmatic access. Self-service customer portal with settings page."}')
238264
echo "$R3" | grep -q '"surface": "hybrid"' || { echo "FAIL: hybrid case: $R3"; exit 1; }
239-
echo "✓ detect-surface: 3/3 fixtures classify correctly"
265+
# Regression guard: grep -oc vs grep -o|wc -l — three "api" must score 3, not 1.
266+
R4=$(bash scripts/detect-surface.sh <<<'{"text":"api api api"}')
267+
echo "$R4" | grep -q '"rest": 3' || { echo "FAIL: grep occurrence count regression: $R4"; exit 1; }
268+
# Security guard: command injection in idea text must not execute.
269+
rm -f /tmp/pf-injection-canary
270+
bash scripts/detect-surface.sh <<<'{"text":"`touch /tmp/pf-injection-canary` injected"}' > /dev/null
271+
[[ ! -f /tmp/pf-injection-canary ]] || { echo "FAIL: command injection in idea text executed"; rm /tmp/pf-injection-canary; exit 1; }
272+
echo "✓ detect-surface: 5/5 cases (3 classify + 1 occurrence + 1 security)"
240273
241274
- name: Test preview-cache (Proposal #11)
242275
env:
@@ -249,6 +282,9 @@ jobs:
249282
[[ "$K1" == "$K2" ]] || { echo "FAIL: key not deterministic"; exit 1; }
250283
K3=$(bash scripts/preview-cache.sh key "build todo app" standard)
251284
[[ "$K1" != "$K3" ]] || { echo "FAIL: profile doesn't change key"; exit 1; }
285+
# --previews=N override must produce a distinct key (same idea + profile, different N)
286+
K4=$(bash scripts/preview-cache.sh key "build todo app" pro 9)
287+
[[ "$K1" != "$K4" ]] || { echo "FAIL: --previews override didn't change key"; exit 1; }
252288
echo '{"profile":"pro","previews":[]}' > /tmp/pf-test.json
253289
bash scripts/preview-cache.sh put "$K1" /tmp/pf-test.json
254290
bash scripts/preview-cache.sh get "$K1" > /dev/null || { echo "FAIL: get miss after put"; exit 1; }

plugins/preview-forge/hooks/cost-regression.py

Lines changed: 35 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -71,21 +71,36 @@ def load_snapshot(run_dir: Path) -> dict | None:
7171

7272

7373
def write_blackboard(run_dir: Path, severity: str, payload: dict) -> None:
74-
"""Insert a row into runs/<id>/blackboard.db. Creates table on demand."""
74+
"""Insert a row into runs/<id>/blackboard.db `blackboard` table.
75+
76+
Schema matches CLAUDE.md §6 so /pf:status and /pf:budget can read it
77+
using the same SELECT patterns as other hooks (auto-retro-trigger,
78+
factory-policy observability, supervisor polling).
79+
"""
7580
db = run_dir / "blackboard.db"
7681
try:
7782
con = sqlite3.connect(str(db))
7883
con.execute("""
79-
CREATE TABLE IF NOT EXISTS events (
80-
ts INTEGER NOT NULL,
81-
kind TEXT NOT NULL,
82-
severity TEXT NOT NULL,
83-
payload TEXT NOT NULL
84+
CREATE TABLE IF NOT EXISTS blackboard (
85+
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
86+
agent_id TEXT NOT NULL,
87+
key TEXT NOT NULL,
88+
value TEXT,
89+
tier INTEGER,
90+
dept TEXT
8491
)
8592
""")
93+
con.execute("CREATE INDEX IF NOT EXISTS idx_bb_key ON blackboard(key)")
8694
con.execute(
87-
"INSERT INTO events (ts, kind, severity, payload) VALUES (?, ?, ?, ?)",
88-
(int(time.time()), "cost-regression", severity, json.dumps(payload)),
95+
"INSERT INTO blackboard (agent_id, key, value, tier, dept) "
96+
"VALUES (?, ?, ?, ?, ?)",
97+
(
98+
"cost-regression",
99+
f"status.cost_{severity}",
100+
json.dumps(payload),
101+
1, # Meta tier
102+
"meta",
103+
),
89104
)
90105
con.commit()
91106
con.close()
@@ -124,19 +139,28 @@ def main(argv: list[str]) -> int:
124139
if not profile:
125140
return 0
126141

142+
ceiling = profile.get("cost_ceiling")
143+
if not ceiling or not all(
144+
k in ceiling for k in ("p95_tokens", "p95_minutes", "hard_tokens", "hard_minutes")
145+
):
146+
# Malformed or partial profile — treat as "no baseline", skip.
147+
# Schema validation in CI catches this at merge time, but guard
148+
# at runtime so a bad user-authored profile doesn't crash runs.
149+
return 0
150+
127151
snap = load_snapshot(run_dir)
128152
if not snap:
129153
return 0
130154

131155
tokens = int(snap.get("tokens_total", 0))
132156
minutes = float(snap.get("elapsed_minutes", 0))
133157

134-
severity, reason = classify(tokens, minutes, profile["cost_ceiling"])
158+
severity, reason = classify(tokens, minutes, ceiling)
135159
payload = {
136-
"profile": profile["name"],
160+
"profile": profile.get("name", "unknown"),
137161
"tokens": tokens,
138162
"minutes": minutes,
139-
"ceiling": profile["cost_ceiling"],
163+
"ceiling": ceiling,
140164
"reason": reason,
141165
}
142166

plugins/preview-forge/hooks/idea-drift-detector.py

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -63,9 +63,18 @@
6363

6464

6565
def tokenize(text: str) -> set[str]:
66-
"""Lowercased word set with stopwords removed."""
66+
"""Lowercased word set with stopwords removed.
67+
68+
Keeps single-character CJK/Hangul tokens (앱, 웹, 봇, 툴) because
69+
they carry product-intent meaning in Korean ideas, while dropping
70+
single-char ASCII tokens (a, i, o) which are almost always noise.
71+
Korean particles (가, 는, 을, …) are already in STOPWORDS.
72+
"""
6773
tokens = {w.lower() for w in WORD_RE.findall(text or "")}
68-
return {t for t in tokens if len(t) > 1 and t not in STOPWORDS}
74+
return {
75+
t for t in tokens
76+
if t not in STOPWORDS and (len(t) > 1 or not t.isascii())
77+
}
6978

7079

7180
def containment(reference: set[str], candidate: set[str]) -> float:

plugins/preview-forge/monitors/monitors.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
},
77
{
88
"name": "cost-regression",
9-
"command": "for d in $(ls -d runs/*/ 2>/dev/null); do python3 ${CLAUDE_PLUGIN_ROOT}/hooks/cost-regression.py \"$d\" 2>&1 | grep -E '(WARN|ALERT)' || true; done; sleep 30",
9+
"command": "for d in runs/*/; do [ -d \"$d\" ] || continue; python3 \"${CLAUDE_PLUGIN_ROOT}/hooks/cost-regression.py\" \"$d\" 2>&1 | grep -E '(WARN|ALERT)' || true; done; sleep 30",
1010
"description": "P0-B cost-regression sentinel — per-profile P95/hard ceiling breach detection. Writes blackboard row + emits to stderr. M1 supervisor reacts to alert severity."
1111
},
1212
{

scripts/detect-surface.sh

Lines changed: 25 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -29,11 +29,13 @@ else
2929
JSON=$(cat "$INPUT")
3030
fi
3131

32-
# Extract idea fields (handle missing gracefully). No jq dep — python3 only.
33-
IDEA_TEXT=$(python3 -c "
32+
# Extract idea fields via python stdin (NO shell substitution — prevents
33+
# command injection from user-controlled idea text like `$(rm -rf ~)` or
34+
# backticks). No jq dep — python3 only.
35+
IDEA_TEXT=$(printf '%s' "$JSON" | python3 -c "
3436
import json, sys
3537
try:
36-
d = json.loads('''$JSON''')
38+
d = json.loads(sys.stdin.read())
3739
parts = [str(d.get('text','')), str(d.get('idea','')), str(d.get('title','')), str(d.get('pitch',''))]
3840
print(' '.join(p for p in parts if p).lower())
3941
except Exception:
@@ -42,7 +44,7 @@ except Exception:
4244

4345
if [[ -z "$IDEA_TEXT" ]]; then
4446
# Treat raw stdin as the idea text.
45-
IDEA_TEXT=$(echo "$JSON" | tr '[:upper:]' '[:lower:]')
47+
IDEA_TEXT=$(printf '%s' "$JSON" | tr '[:upper:]' '[:lower:]')
4648
fi
4749

4850
# REST-first signals
@@ -75,8 +77,12 @@ count_hits() {
7577
shift
7678
local hits=0
7779
for kw in "$@"; do
78-
# word-boundary-ish match; grep -c returns lines, we need occurrences
79-
local n=$(echo "$text" | grep -oc "$kw" 2>/dev/null || true)
80+
# grep -o prints each match on its own line; wc -l counts lines.
81+
# grep -oc returns *line* count (max 1 for single-line text), so
82+
# repeated occurrences within one line would undercount.
83+
local n
84+
n=$(printf '%s' "$text" | grep -o -- "$kw" 2>/dev/null | wc -l | tr -d ' ')
85+
[[ -z "$n" ]] && n=0
8086
hits=$((hits + n))
8187
done
8288
echo "$hits"
@@ -109,12 +115,20 @@ elif [[ "$REST_HITS" -ge "$UI_HITS" && "$REST_HITS" -gt 0 ]]; then
109115
STACK_HINT="nestia"
110116
fi
111117

112-
# Emit single-line JSON, easily consumed by SpecDD lead.
118+
# Emit single-line JSON via env vars (no shell interpolation into python
119+
# source — SURFACE/STACK_HINT come from fixed string literals but we pipe
120+
# through env for consistency with defense-in-depth).
121+
SURFACE="$SURFACE" STACK_HINT="$STACK_HINT" \
122+
REST_HITS="$REST_HITS" UI_HITS="$UI_HITS" HYBRID_HITS="$HYBRID_HITS" \
113123
python3 -c "
114-
import json
124+
import json, os
115125
print(json.dumps({
116-
'surface': '$SURFACE',
117-
'scores': {'rest': $REST_HITS, 'ui': $UI_HITS, 'hybrid': $HYBRID_HITS},
118-
'stack_hint': '$STACK_HINT'
126+
'surface': os.environ['SURFACE'],
127+
'scores': {
128+
'rest': int(os.environ['REST_HITS']),
129+
'ui': int(os.environ['UI_HITS']),
130+
'hybrid': int(os.environ['HYBRID_HITS']),
131+
},
132+
'stack_hint': os.environ['STACK_HINT'],
119133
}))
120134
"

scripts/preview-cache.sh

Lines changed: 40 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -41,19 +41,29 @@ print(hashlib.sha256(data).hexdigest()[:16])
4141
cmd_key() {
4242
local idea="$1"
4343
local profile="${2:-pro}"
44+
# Optional 3rd arg: explicit preview count override (from /pf:new --previews=N).
45+
# When set, the advocate set is distinct from the profile's default count —
46+
# runs with different N must not collide in cache.
47+
local previews_override="${3:-}"
4448

4549
# Load profile's preview count to derive advocate set hash. If profile
4650
# file missing, fall back to the profile name as the set discriminator.
47-
local advocate_set=""
51+
local advocate_count=""
4852
if [[ -n "$PLUGIN_ROOT" && -f "$PLUGIN_ROOT/profiles/$profile.json" ]]; then
49-
advocate_set=$(python3 -c "
50-
import json
51-
p = json.load(open('$PLUGIN_ROOT/profiles/$profile.json'))
52-
print(f'{p[\"previews\"][\"count\"]}-{p[\"name\"]}')
53-
")
54-
else
55-
advocate_set="$profile"
53+
advocate_count=$(python3 -c "
54+
import json, sys
55+
try:
56+
p = json.load(open(sys.argv[1]))
57+
print(p['previews']['count'])
58+
except Exception:
59+
print('')
60+
" "$PLUGIN_ROOT/profiles/$profile.json")
5661
fi
62+
# Override takes precedence if provided.
63+
if [[ -n "$previews_override" ]]; then
64+
advocate_count="$previews_override"
65+
fi
66+
local advocate_set="${advocate_count:-unknown}-${profile}"
5767

5868
printf '%s\x1f%s\x1f%s\x1f%s' "$idea" "$advocate_set" "$MODEL_VERSION" "$profile" | hash
5969
}
@@ -67,28 +77,30 @@ cmd_get() {
6777
fi
6878

6979
# TTL check — compare file mtime against profile's ttl_seconds.
70-
# We need the profile to know TTL, but the key doesn't carry it;
71-
# read .profile field from the cached blob itself.
80+
# Path args passed via argv (NOT shell-interpolated into source) to
81+
# stay safe even if PLUGIN_ROOT or cache keys ever contain odd chars.
7282
local ttl=0
7383
local profile_name
7484
profile_name=$(python3 -c "
75-
import json
85+
import json, sys
7686
try:
77-
d = json.load(open('$file'))
78-
print(d.get('profile', 'pro'))
87+
print(json.load(open(sys.argv[1])).get('profile', 'pro'))
7988
except Exception:
8089
print('pro')
81-
")
90+
" "$file")
8291
if [[ -n "$PLUGIN_ROOT" && -f "$PLUGIN_ROOT/profiles/$profile_name.json" ]]; then
8392
ttl=$(python3 -c "
84-
import json
85-
print(json.load(open('$PLUGIN_ROOT/profiles/$profile_name.json'))['caching']['ttl_seconds'])
86-
")
93+
import json, sys
94+
try:
95+
print(json.load(open(sys.argv[1]))['caching']['ttl_seconds'])
96+
except Exception:
97+
print(0)
98+
" "$PLUGIN_ROOT/profiles/$profile_name.json")
8799
fi
88100

89101
if [[ "$ttl" -gt 0 ]]; then
90102
local age
91-
age=$(python3 -c "import os,time; print(int(time.time() - os.path.getmtime('$file')))")
103+
age=$(python3 -c "import os,sys,time; print(int(time.time() - os.path.getmtime(sys.argv[1])))" "$file")
92104
if [[ "$age" -gt "$ttl" ]]; then
93105
return 1
94106
fi
@@ -122,26 +134,29 @@ cmd_prune() {
122134
[[ -f "$f" ]] || continue
123135
local profile_name
124136
profile_name=$(python3 -c "
125-
import json
137+
import json, sys
126138
try:
127-
print(json.load(open('$f')).get('profile', 'pro'))
139+
print(json.load(open(sys.argv[1])).get('profile', 'pro'))
128140
except Exception:
129141
print('pro')
130-
")
142+
" "$f")
131143
local ttl=0
132144
if [[ -f "$PLUGIN_ROOT/profiles/$profile_name.json" ]]; then
133145
ttl=$(python3 -c "
134-
import json
135-
print(json.load(open('$PLUGIN_ROOT/profiles/$profile_name.json'))['caching']['ttl_seconds'])
136-
")
146+
import json, sys
147+
try:
148+
print(json.load(open(sys.argv[1]))['caching']['ttl_seconds'])
149+
except Exception:
150+
print(0)
151+
" "$PLUGIN_ROOT/profiles/$profile_name.json")
137152
fi
138153
if [[ "$ttl" -eq 0 ]]; then
139154
rm -f "$f"
140155
removed=$((removed + 1))
141156
continue
142157
fi
143158
local age
144-
age=$(python3 -c "import os,time; print(int(time.time() - os.path.getmtime('$f')))")
159+
age=$(python3 -c "import os,sys,time; print(int(time.time() - os.path.getmtime(sys.argv[1])))" "$f")
145160
if [[ "$age" -gt "$ttl" ]]; then
146161
rm -f "$f"
147162
removed=$((removed + 1))

0 commit comments

Comments
 (0)