Commit 0f5838e

test(security): recognize phrase_injection as a gated detect category (Spec 077 US1)
Related #784
Related: Spec 077 (specs/077-scanner-simplification)

The detect-corpus validator (specs/065-evaluation-foundation/datasets) hardcodes the set of coherent malicious categories and the gated-category coverage rules. Spec 077 US1 promoted phrase_injection to a real gated hard category (registered in cmd/scan-eval gateChecks + categoryCheck), so the validator must recognize it; otherwise it rejects the new corpus entries.

## Changes

- validDetectCategory: accept malicious category "phrase_injection".
- gatedDetectCategories: add "phrase_injection" (now measured by the gate; capability_mismatch stays excluded as soft/measured-not-gated).
- hardNegPrefix: map "phrase_injection" -> "hn_phrase".
- Rename the two branch-local phrase_injection hard-negatives (hn_send_email/hn_upload_file -> hn_phrase_*) to satisfy the id-prefix convention. Pre-existing corpus entries are untouched (append-only respected).

This STRENGTHENS coverage: the gate now requires phrase_injection to carry both malicious samples and resembling hard-negatives.

## Testing

- go test ./...: all ok (exit 0); the previously failing TestDetectCorpus_SchemaAndProvenance and TestDetectCorpus_GatedCoverage pass.
- scan-eval --gate: recall 1.0000, fp 0.0000 (phrase_injection gated 7/7).
- golangci-lint v2: clean.
1 parent 0c0d520 commit 0f5838e

2 files changed

Lines changed: 11 additions & 8 deletions

specs/065-evaluation-foundation/datasets/detect_corpus_test.go

Lines changed: 9 additions & 6 deletions
@@ -20,11 +20,13 @@ import (
 
 const detectCorpusFile = "detect_corpus_v1.json"
 
-// gatedDetectCategories are the malicious taxonomies a US1-merged detect.Engine
-// can measure today. capability_mismatch is intentionally excluded — its check
-// lands in US2, so the corpus may carry samples but the gate must not enforce
-// them yet (the gate handles this via its category→check registration map).
-var gatedDetectCategories = []string{"unicode_smuggling", "decoded_payload", "shadowing"}
+// gatedDetectCategories are the malicious taxonomies the detect.Engine can
+// measure today. phrase_injection joined the gate in Spec 077 US1 (its curated
+// hard-tier check is registered in gateChecks). capability_mismatch is still
+// intentionally excluded — its check is soft/measured-not-gated, so the corpus
+// may carry samples but the gate must not enforce them (the gate handles this
+// via its category→check registration map).
+var gatedDetectCategories = []string{"unicode_smuggling", "decoded_payload", "shadowing", "phrase_injection"}
 
 // hardNegPrefix maps a gated category to the id prefix its resembling
 // hard-negatives use, so INV-3 (which attack a benign FP mimics) stays
@@ -33,6 +35,7 @@ var hardNegPrefix = map[string]string{
 	"unicode_smuggling": "hn_unicode",
 	"decoded_payload":   "hn_decoded",
 	"shadowing":         "hn_shadowing",
+	"phrase_injection":  "hn_phrase",
 }
 
 type detectTool struct {
@@ -67,7 +70,7 @@ func validDetectCategory(label, category string) bool {
 	switch label {
 	case "malicious":
 		switch category {
-		case "unicode_smuggling", "decoded_payload", "shadowing", "capability_mismatch":
+		case "unicode_smuggling", "decoded_payload", "shadowing", "phrase_injection", "capability_mismatch":
 			return true
 		}
 	case "benign":

specs/065-evaluation-foundation/datasets/detect_corpus_v1.json

Lines changed: 2 additions & 2 deletions
@@ -642,7 +642,7 @@
     }
   },
   {
-    "id": "hn_send_email",
+    "id": "hn_phrase_send_email",
     "label": "benign",
     "category": "hard_negative",
     "resembles": "phrase_injection",
@@ -657,7 +657,7 @@
     }
   },
   {
-    "id": "hn_upload_file",
+    "id": "hn_phrase_upload_file",
     "label": "benign",
     "category": "hard_negative",
     "resembles": "phrase_injection",
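
The two renames follow from the id-prefix convention: a hard negative's id is expected to start with the prefix registered for the category it resembles. A minimal sketch of such a check, assuming a plain `strings.HasPrefix` comparison, is shown here. The `checkHardNegID` helper is hypothetical and the validator's actual matching logic is not visible in this diff; only the `hardNegPrefix` entries are taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// hardNegPrefix mirrors the map in detect_corpus_test.go after this commit.
var hardNegPrefix = map[string]string{
	"unicode_smuggling": "hn_unicode",
	"decoded_payload":   "hn_decoded",
	"shadowing":         "hn_shadowing",
	"phrase_injection":  "hn_phrase",
}

// checkHardNegID reports an error when a hard negative's id does not start
// with the prefix registered for the category it resembles. Categories with
// no registered prefix are skipped (an assumption made for this sketch).
func checkHardNegID(id, resembles string) error {
	prefix, ok := hardNegPrefix[resembles]
	if !ok {
		return nil
	}
	if !strings.HasPrefix(id, prefix) {
		return fmt.Errorf("hard negative %q resembles %q but lacks id prefix %q", id, resembles, prefix)
	}
	return nil
}

func main() {
	// The pre-rename id violates the convention; the renamed id satisfies it.
	fmt.Println(checkHardNegID("hn_send_email", "phrase_injection"))
	fmt.Println(checkHardNegID("hn_phrase_send_email", "phrase_injection"))
}
```

Under this sketch the old ids hn_send_email and hn_upload_file fail once phrase_injection has a registered prefix, which is consistent with the commit renaming them rather than relaxing the convention.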
