Is there an existing issue for the same bug?
Environment (optional)
memoria 0.3.3, MCP mode, macOS
Actual behavior
The phone pattern in memoria/crates/memoria-core/src/sensitivity.rs redacts any unbroken 10- or 11-digit number to [phone], because its separators are optional:
regex: r"\b\d{3}[-.]?\d{3,4}[-.]?\d{4}\b",
replacement: "[phone]",
With [-.]? being optional, a plain 10-digit sequence like 4345955198 matches and gets redacted. Real-world collateral damage I hit today:
- GitHub comment URLs:
https://github.com/org/repo/issues/123#issuecomment-4345955198 → the numeric suffix is rewritten to [phone] and the link becomes unusable.
- memory_correct on a memory containing such a URL silently loses the reference; repeated corrections can't restore it.
Likely also affected: order IDs, transaction IDs, any 10–11 digit opaque identifier pasted in context.
Expected behavior
The filter should only redact digit sequences that actually look like phone numbers, i.e. require at least one separator or some surrounding context (+, tel:, parentheses, an international prefix) before redacting.
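A minimal sketch of that rule as a per-token predicate, in std-only Rust (hand-rolled instead of the regex crate so it runs without dependencies). The names looks_like_phone, has_separated_phone_shape and is_digits are hypothetical, not memoria's API. It assumes the text has already been split on whitespace, accepts only ddd[-.]ddd(d)[-.]dddd shapes or a tel:/+ prefix, and does not handle parenthesized area codes or trailing punctuation.

```rust
// Hypothetical helpers (not memoria's API): std-only stand-ins for the regexes.

/// True if `s` is non-empty and contains only ASCII digits.
fn is_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Tightened shape with both separators required, mirroring
/// \d{3}[-.]\d{3,4}[-.]\d{4} applied to the whole token.
fn has_separated_phone_shape(s: &str) -> bool {
    let parts: Vec<&str> = s.split(|c: char| c == '-' || c == '.').collect();
    parts.len() == 3
        && parts.iter().all(|p| is_digits(p))
        && parts[0].len() == 3
        && (parts[1].len() == 3 || parts[1].len() == 4)
        && parts[2].len() == 4
}

/// Treat a whitespace-delimited token as a phone number only if it has
/// separators or explicit phone context ("tel:" prefix or a leading '+').
fn looks_like_phone(token: &str) -> bool {
    // A "tel:" prefix counts as phone context; strip it before shape checks.
    let (has_tel, rest) = match token.strip_prefix("tel:") {
        Some(r) => (true, r),
        None => (false, token),
    };
    // +<country>\d{7,}: '+' followed by at least 8 digits in total.
    if let Some(digits) = rest.strip_prefix('+') {
        return is_digits(digits) && digits.len() >= 8;
    }
    // Separated ddd-ddd(d)-dddd shape, or any bare digit run after "tel:".
    has_separated_phone_shape(rest) || (has_tel && is_digits(rest))
}
```

With this rule the bare suffix of a GitHub comment URL stays untouched, while 434-595-5198, +14345955198 and tel:4345955198 are still treated as phone numbers.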
Alternatives worth considering:
- Tighten the regex to require separators:
\b\d{3}[-.]\d{3,4}[-.]\d{4}\b (no ? quantifiers), plus a separate pattern for +<country>\d{7,}-style numbers.
- URL-aware skip: don't run MEDIUM-tier redaction inside obvious URLs (e.g. tokens that contain :// or match [a-z]+://\S+).
- Allow callers (especially memory_correct / memory_store) to opt out per field via a flag like disable_pii_redaction=true when the caller knows the content is trusted.
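The URL-aware skip can be sketched as a wrapper around whatever MEDIUM-tier redactor already exists: split the text at whitespace, pass tokens containing :// through unchanged, and redact everything else. This is a std-only sketch; redact_outside_urls and the demo stand-in redact_long_digit_runs (which mimics the current over-eager behavior) are hypothetical names, not memoria functions.

```rust
// Hypothetical names, not memoria's API.

/// Apply `redact` to every whitespace-delimited token except URL-like ones
/// (tokens containing "://"), preserving the original whitespace.
fn redact_outside_urls<F>(text: &str, redact: F) -> String
where
    F: Fn(&str) -> String,
{
    let mut out = String::with_capacity(text.len());
    // Each piece is a token followed by at most one whitespace character.
    for piece in text.split_inclusive(char::is_whitespace) {
        let token = piece.trim_end_matches(char::is_whitespace);
        let trailing = &piece[token.len()..];
        if token.is_empty() || token.contains("://") {
            out.push_str(token); // URL (or empty): leave untouched
        } else {
            out.push_str(&redact(token));
        }
        out.push_str(trailing);
    }
    out
}

/// Demo stand-in for the current over-eager rule: replaces every run of
/// 10 or more ASCII digits with "[phone]".
fn redact_long_digit_runs(token: &str) -> String {
    let mut out = String::new();
    let mut run = String::new();
    for c in token.chars() {
        if c.is_ascii_digit() {
            run.push(c);
        } else {
            flush_run(&mut out, &mut run);
            out.push(c);
        }
    }
    flush_run(&mut out, &mut run);
    out
}

/// Emit a pending digit run, masking it if it is 10+ digits long.
fn flush_run(out: &mut String, run: &mut String) {
    if run.len() >= 10 {
        out.push_str("[phone]");
    } else {
        out.push_str(run.as_str());
    }
    run.clear();
}
```

Splitting with split_inclusive keeps the original whitespace, so the output differs from the input only in redacted tokens; a standalone 4345955198 is still masked, but the same digits inside a URL are not.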
Steps to reproduce
// e.g. as an integration test for memoria-core, or anywhere that calls check_sensitivity
#[test]
fn github_comment_url_is_not_redacted_as_phone() {
    let s = "see https://github.com/foo/bar/issues/1#issuecomment-4345955198";
    let r = memoria_core::sensitivity::check_sensitivity(s);
    // Currently fails: the comment ID is redacted to [phone]
    assert_eq!(r.redacted_content, None);
}
Or via MCP:
memory_store(content="ref https://github.com/farion1231/cc-switch/issues/2423#issuecomment-4345955198")
# retrieved content shows: ref https://github.com/farion1231/cc-switch/issues/2423#issuecomment-[phone]
Additional information
The same failure mode likely applies to credit_card (\b(?:\d[ -]*?){13,19}\b): that pattern matches any run of 13–19 digits (optionally separated by spaces or hyphens) without a Luhn check, so a 15-digit order number or bank reference would also be redacted to [card]. All MEDIUM patterns are worth auditing for the same over-reach.
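Gating the [card] replacement on a Luhn checksum would filter out most order numbers and bank references. A std-only sketch, assuming the candidate string is whatever the credit_card pattern matched (digits plus optional spaces or hyphens); passes_luhn is a hypothetical helper name. About 1 in 10 random digit strings still pass a mod-10 check, so this narrows the false positives rather than eliminating them.

```rust
/// Luhn (mod 10) check for a credit_card candidate, ignoring the spaces and
/// hyphens the pattern allows. Hypothetical helper, not memoria's API.
fn passes_luhn(candidate: &str) -> bool {
    let mut sum = 0u32;
    let mut count = 0usize;
    // Walk the digits right to left, skipping the allowed separators.
    for c in candidate.chars().rev().filter(|&c| c != ' ' && c != '-') {
        let d = match c.to_digit(10) {
            Some(d) => d,
            None => return false, // any other character: not a card number
        };
        // Double every second digit from the right; fold 10..=18 down to 1..=9.
        let v = if count % 2 == 1 {
            let x = d * 2;
            if x > 9 { x - 9 } else { x }
        } else {
            d
        };
        sum += v;
        count += 1;
    }
    // Keep the 13-19 digit length bound from the existing pattern.
    (13..=19).contains(&count) && sum % 10 == 0
}
```

Under this check the 15-digit 123456789012345 is no longer treated as a card, while standard test numbers such as 4111 1111 1111 1111 still are.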
Suggested priority: low-med. The bug degrades retrieval quality but doesn't cause data loss on the primary store (the original stays intact in snapshots).