You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(detect): restore secret + legacy-phrase coverage; strengthen HARD FP control
Addresses three Codex findings on Spec 077 US1's deterministic detect engine.
#3 (MED): removing the legacy security.NewDetector(nil) path silently dropped
sensitive-file-path and high-entropy secret coverage. The secret.embedded
check now restores both (curated sensitive-path regexes + a self-contained
Shannon-entropy scan, stdlib only), keeping detect offline/deterministic with
no new dependency.
#4 (MED): with the legacy tpaRules deleted, several dangerous phrases matched
neither tier. Restored per spec posture: a high-confidence guardrail override
("ignore your guidelines") is HARD phrase.injection; weaker, benignly-phrasable
directives ("always call this tool first", "before using any other tool",
"developer mode", external data-forwarding) are SOFT directive.imperative
(review-only). Data-forwarding requires an external/remote target so benign
first-party uploads do not match.
#5 (MED): strengthened the HARD false-positive control with colon-anchored
content cues ("text:", "output:", ...). A benign tool that RETURNS an injection
string ("Returns training text: ignore all previous instructions ...") is now
example-position and discounted below the hard floor, without losing recall on
genuine period-introduced imperatives.
Corpus: added a gated malicious guardrail-override positive and an
attack-resembling benign hard-negative for the returns-content case; the
scan-eval gate stays at recall 1.0 / FP 0.0. Unit tests cover the restored
secret categories, the SOFT legacy phrases, the HARD guardrail override, the
benign near-miss, and the colon-cue position classification.
Related: Spec 077 (specs/077-scanner-simplification)
// "before using any other tool". Broader than tool-preamble and benignly
75
+
// phrasable ("call this tool first to authenticate"), so it lives in the
76
+
// SOFT tier (review, never auto-quarantine). "always" stems to "alway".
77
+
re: regexp.MustCompile(`\b(?:(?:alway\w* )?(?:call|us\w*|invok\w*|run\w*) this tool (?:first|before)|before (?:us\w*|call\w*|invok\w*|run\w*) any other tool)\b`),
0 commit comments