test(security): recognize phrase_injection as a gated detect category (Spec 077 US1)

Dumbris · Dumbris · commit 0f5838e6e24d · 2026-07-01T09:31:34.000+03:00
Related #784
Related: Spec 077 (specs/077-scanner-simplification)

The detect-corpus validator (specs/065-evaluation-foundation/datasets) hardcodes the set of coherent malicious categories and the gated-category coverage rules. Spec 077 US1 promoted phrase_injection to a real gated hard category (registered in cmd/scan-eval gateChecks + categoryCheck), so the validator must recognize it; otherwise it rejects the new corpus entries.

## Changes

- validDetectCategory: accept malicious category "phrase_injection".
- gatedDetectCategories: add "phrase_injection" (now measured by the gate; capability_mismatch stays excluded because it is soft/measured-not-gated).
- hardNegPrefix: map "phrase_injection" -> "hn_phrase".
- Rename the two branch-local phrase_injection hard-negatives (hn_send_email/hn_upload_file -> hn_phrase_*) to satisfy the id-prefix convention.

Pre-existing corpus entries are untouched (append-only respected).

This STRENGTHENS coverage: the gate now requires phrase_injection to carry both malicious samples and resembling hard-negatives.

## Testing

- go test ./...: all ok (exit 0); the previously failing TestDetectCorpus_SchemaAndProvenance and TestDetectCorpus_GatedCoverage now pass.
- scan-eval --gate: recall 1.0000, fp 0.0000 (phrase_injection gated 7/7).
- golangci-lint v2 clean.
diff --git a/specs/065-evaluation-foundation/datasets/detect_corpus_test.go b/specs/065-evaluation-foundation/datasets/detect_corpus_test.go
@@ -20,11 +20,13 @@ import (
 
 const detectCorpusFile = "detect_corpus_v1.json"
 
-// gatedDetectCategories are the malicious taxonomies a US1-merged detect.Engine
-// can measure today. capability_mismatch is intentionally excluded — its check
-// lands in US2, so the corpus may carry samples but the gate must not enforce
-// them yet (the gate handles this via its category→check registration map).
-var gatedDetectCategories = []string{"unicode_smuggling", "decoded_payload", "shadowing"}
+// gatedDetectCategories are the malicious taxonomies the detect.Engine can
+// measure today. phrase_injection joined the gate in Spec 077 US1 (its curated
+// hard-tier check is registered in gateChecks). capability_mismatch is still
+// intentionally excluded — its check is soft/measured-not-gated, so the corpus
+// may carry samples but the gate must not enforce them (the gate handles this
+// via its category→check registration map).
+var gatedDetectCategories = []string{"unicode_smuggling", "decoded_payload", "shadowing", "phrase_injection"}
 
 // hardNegPrefix maps a gated category to the id prefix its resembling
 // hard-negatives use, so INV-3 (which attack a benign FP mimics) stays
@@ -33,6 +35,7 @@ var hardNegPrefix = map[string]string{
 	"unicode_smuggling": "hn_unicode",
 	"decoded_payload":   "hn_decoded",
 	"shadowing":         "hn_shadowing",
+	"phrase_injection":  "hn_phrase",
 }
 
 type detectTool struct {
@@ -67,7 +70,7 @@ func validDetectCategory(label, category string) bool {
 	switch label {
 	case "malicious":
 		switch category {
-		case "unicode_smuggling", "decoded_payload", "shadowing", "capability_mismatch":
+		case "unicode_smuggling", "decoded_payload", "shadowing", "phrase_injection", "capability_mismatch":
 			return true
 		}
 	case "benign":
diff --git a/specs/065-evaluation-foundation/datasets/detect_corpus_v1.json b/specs/065-evaluation-foundation/datasets/detect_corpus_v1.json
@@ -642,7 +642,7 @@
       }
     },
     {
-      "id": "hn_send_email",
+      "id": "hn_phrase_send_email",
       "label": "benign",
       "category": "hard_negative",
       "resembles": "phrase_injection",
@@ -657,7 +657,7 @@
       }
     },
     {
-      "id": "hn_upload_file",
+      "id": "hn_phrase_upload_file",
       "label": "benign",
       "category": "hard_negative",
       "resembles": "phrase_injection",
