fix: harden PII canary detection with portmanteau middle names and sub-pattern scanning
- Replace real-word middle names with portmanteaus (e.g. Thundaze, Lunarex) that
should never occur in legitimate output yet still survive canonicalization as scan tokens
- Extract a CANARY_DOMAIN constant and add sub_pattern_map(), which injects the email
domain and each middle name as additional Aho-Corasick patterns into both the
daemon scanner and the default (library/test) scanner; this catches leaks where an
LLM reproduces only the domain or a distinctive middle name without the full value
- Promote the canary_paraphrase advisory to a block when k_observed >= 10x k_threshold,
keeping the advisory tier for early warning and reserving block for overwhelming
n-gram accumulation (a full canary value in isolation yields 195 distinct n-grams)
- Update TC-066-06 to accept a block at high n-gram counts (this reflects the new
behavior; the exact scanner still fires first in the real pipeline)
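A minimal sketch of the sub-pattern idea, assuming canaries are "First Middle Last" names plus an email at a shared domain. CANARY_DOMAIN's value, the canary records, build_scanner() and scan() are hypothetical, and lowercasing stands in for the real canonicalization step. Full values, the domain, and each middle name all go into one Aho-Corasick automaton, and every pattern maps back to the canary ids it implicates:

```python
from collections import deque

# Hypothetical placeholder: the commit extracts CANARY_DOMAIN but does not show its value.
CANARY_DOMAIN = "canary-example.invalid"

# Hypothetical canary records using portmanteau middle names like those in the commit.
CANARIES = [
    {"id": "c1", "name": "Alice Thundaze Smith", "email": "alice.smith@" + CANARY_DOMAIN},
    {"id": "c2", "name": "Bob Lunarex Jones", "email": "bob.jones@" + CANARY_DOMAIN},
]


def sub_pattern_map(canaries):
    """Map each extra (lowercased) sub-pattern to the canary ids it implicates."""
    patterns = {}
    for c in canaries:
        # The shared domain implicates every canary that uses it.
        patterns.setdefault(CANARY_DOMAIN.lower(), set()).add(c["id"])
        # Assumes "First Middle Last" names; the middle name implicates one canary.
        middle = c["name"].split()[1].lower()
        patterns.setdefault(middle, set()).add(c["id"])
    return patterns


class AhoCorasick:
    """Minimal Aho-Corasick automaton over already-lowercased strings."""

    def __init__(self, patterns):
        self.goto, self.fail, self.out = [{}], [0], [[]]
        for p in patterns:  # build the trie
            node = 0
            for ch in p:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(p)
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:  # BFS computes failure links and merges output sets
            node = queue.popleft()
            for ch, nxt in self.goto[node].items():
                queue.append(nxt)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] += self.out[self.fail[nxt]]

    def find(self, text):
        """Yield (start_index, pattern) for every match, overlapping ones included."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for p in self.out[node]:
                yield i - len(p) + 1, p


def build_scanner(canaries):
    """Combine full values and sub-patterns, each pointing back at canary ids."""
    pattern_map = {}
    for c in canaries:
        for full in (c["name"], c["email"]):
            pattern_map.setdefault(full.lower(), set()).add(c["id"])
    for p, ids in sub_pattern_map(canaries).items():
        pattern_map.setdefault(p, set()).update(ids)
    return AhoCorasick(pattern_map), pattern_map


def scan(text, automaton, pattern_map):
    """Return ids of leaked canaries; lowercasing stands in for canonicalization."""
    leaked = set()
    for _, p in automaton.find(text.lower()):
        leaked |= pattern_map[p]
    return leaked
```

With this sketch, an output that mentions only "Thundaze" flags c1, and one that mentions only the domain flags every canary sharing it, while ordinary words such as "thunder" or "lunar" do not match. Because the middle names are portmanteaus, adding them as standalone patterns should not raise the false-positive rate the way real-word middle names would.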
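The canary_paraphrase tiering can be sketched as follows. This assumes that the advisory fires once k_observed reaches k_threshold, and that k_observed counts distinct character n-grams of the canary value found in the output. Both the n-gram definition and the helper names distinct_shared_ngrams() and paraphrase_verdict() are hypothetical. Only the 10x block multiplier comes from the commit:

```python
# Multiplier from the commit; k_threshold's actual value is not shown, so callers pass it in.
BLOCK_MULTIPLIER = 10


def distinct_shared_ngrams(canary, output, n=3):
    """Hypothetical k_observed: distinct character n-grams of the canary present in output."""
    def grams(s):
        s = s.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    return len(grams(canary) & grams(output))


def paraphrase_verdict(k_observed, k_threshold):
    """Assumed tiers: advisory from k_threshold, block from BLOCK_MULTIPLIER * k_threshold."""
    if k_observed >= BLOCK_MULTIPLIER * k_threshold:
        return "block"
    if k_observed >= k_threshold:
        return "advisory"
    return "none"
```

Under these assumptions, the wide gap between the two tiers means partial overlap still surfaces only as an early-warning advisory. A block is reserved for outputs that reproduce most of a canary value's n-grams.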