Commit 9378452

praison committed
fix(pii): remove false-positive \d{16} CC regex — matches timestamps/trace IDs
@claude: The bare \b\d{16}\b branch matched ANY 16-digit sequence (UNIX microsecond timestamps, distributed trace IDs, order/invoice numbers) causing false-positive redactions in LLM tool outputs and RAG results. Removed — the canonical 4-group pattern (?:\d{4}[ -]){3}\d{4} covers real-world formatted cards with near-zero FP rate.
1 parent d2c8922 commit 9378452

1 file changed

Lines changed: 7 additions & 2 deletions

src/praisonai-agents/praisonaiagents/trace/redact.py
@@ -131,8 +131,13 @@ def _redact_key_value(key: str, value: Any) -> Any:
 (re.compile(r"\bsk-[A-Za-z0-9]{12,}\b"), "[REDACTED]"),
 # US SSN
 (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
-# Credit card — canonical 4-group or unspaced 16-digit only
-(re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b|\b\d{16}\b"), "[REDACTED-CC]"),
+# Credit card — canonical 4-group format only (e.g. "4111 1111 1111 1111").
+# @claude: The bare \b\d{16}\b branch was removed — it matches ANY 16-digit sequence:
+# UNIX microsecond timestamps, distributed trace IDs, order/invoice numbers, phone
+# numbers — causing false-positive redactions in LLM tool outputs and RAG results.
+# Unformatted 16-digit strings are indistinguishable from non-card IDs at regex level.
+# The 4-group pattern covers real-world formatted card numbers with near-zero FP rate.
+(re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b"), "[REDACTED-CC]"),
 # Email (optional — often safe, but default-scrub for compliance)
 (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[REDACTED-EMAIL]"),
 )
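
The behavioural difference between the two patterns can be checked with a short script. This is a minimal sketch and not part of the commit: the two regexes are copied from the diff, while the names `OLD_CC`, `NEW_CC`, `redact_cc`, and the sample strings are made up for illustration.

```python
import re

# Patterns copied from the diff; the constant names are hypothetical.
OLD_CC = re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b|\b\d{16}\b")
NEW_CC = re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b")


def redact_cc(pattern: re.Pattern, text: str) -> str:
    """Replace every match of `pattern` with the credit-card placeholder."""
    return pattern.sub("[REDACTED-CC]", text)


# Illustrative inputs (assumed, not taken from the repository).
samples = {
    # 16-digit UNIX timestamp in microseconds: not a card number.
    "timestamp": "event_ts=1700000000000000 status=ok",
    # Card numbers written in four groups, separated by spaces or hyphens.
    "formatted_card": "card 4111 1111 1111 1111 on file",
    "hyphenated_card": "card 4111-1111-1111-1111 on file",
    # The same test card number with no separators.
    "unformatted_card": "card 4111111111111111 on file",
}

for name, text in samples.items():
    print(f"{name}: old={redact_cc(OLD_CC, text)!r} new={redact_cc(NEW_CC, text)!r}")
```

With the old pattern the timestamp is redacted as if it were a card. With the new pattern it passes through untouched, and both grouped card numbers are still redacted. The cost is that the unformatted card number is no longer caught, which is the trade-off the commit accepts.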
