
Commit 66c55cb

fix: redact flax msgpack evidence (#1409)
* fix: redact flax msgpack evidence
* fix: normalize Flax evidence key paths
* fix: close flax evidence redaction gaps
  Redact capability tokens embedded in URL paths and standalone secret-shaped evidence. Stringify non-scalar Flax metadata keys before redaction so extension keys serialize safely.
* fix: broaden GitHub token evidence redaction
* fix: redact remaining flax evidence fields
* fix: redact encoded evidence tokens
* fix: redact URL-safe OpenAI project keys
* fix: harden Flax evidence redaction
* fix: redact Hugging Face access tokens
* fix: harden Flax binary evidence scanning
* fix: redact encoded credential evidence
* fix: bound encoded evidence redaction
1 parent f147ebc commit 66c55cb

7 files changed

Lines changed: 1109 additions & 74 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - bound Jinja sandbox safety probes so render amplification fails closed instead of exhausting scanner resources
 - harden structured JSON/YAML/GGUF Jinja template extraction against oversized values, nested containers, and colliding template paths
 - redact capability tokens embedded in network URL path segments
+- redact Flax/JAX MessagePack scanner samples, contexts, key paths, structured fields, metadata, and errors
 - redact secret previews and URL path credentials from metadata scanner findings
 - redact secret-shaped dictionary keys from embedded-secret detector finding contexts
 - redact compound credential names and malformed userinfo URLs in scanner evidence
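The new changelog entry covers key paths and metadata in Flax/JAX MessagePack files, and the commit message notes that non-scalar metadata keys are stringified before redaction. The Flax scanner's code is not part of the hunks shown, so this is only a minimal sketch, assuming the file has already been decoded into nested dicts and lists (as a MessagePack decoder would produce). It stringifies every key (bytes, tuples, extension objects), builds `/`-joined key paths, and redacts secret-shaped text in both the path and the value. `collect_evidence`, `key_to_str`, and `SECRET_PATTERN` are hypothetical names.

```python
import re
from typing import Any

# Hypothetical secret shapes; the scanner's real evidence patterns are not shown here.
SECRET_PATTERN = re.compile(
    r"AKIA[0-9A-Z]{16}|gh[opsur]_[A-Za-z0-9]{36}|hf_[A-Za-z0-9]{30,}"
)
REDACTED = "[REDACTED]"


def key_to_str(key: Any) -> str:
    """Stringify any decoded MessagePack key (bytes, tuples, ext types)."""
    if isinstance(key, str):
        return key
    if isinstance(key, bytes):
        return key.decode("utf-8", errors="replace")
    return repr(key)  # tuples, ints, and extension objects


def collect_evidence(node: Any, path: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """Flatten a decoded tree into (key_path, value) pairs with secrets redacted."""
    if isinstance(node, dict):
        children = ((key_to_str(k), v) for k, v in node.items())
    elif isinstance(node, (list, tuple)):
        children = ((str(i), v) for i, v in enumerate(node))
    else:
        # Leaf: redact the joined key path and the full value text.
        if isinstance(node, bytes):
            text = node.decode("utf-8", errors="replace")
        else:
            text = str(node)
        key_path = SECRET_PATTERN.sub(REDACTED, "/".join(path))
        return [(key_path, SECRET_PATTERN.sub(REDACTED, text))]
    evidence: list[tuple[str, str]] = []
    for name, child in children:
        evidence.extend(collect_evidence(child, path + (name,)))
    return evidence
```

Redacting the joined path, rather than each key alone, also catches a token that lands in a key which would otherwise be echoed verbatim in a finding's location.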

modelaudit/detectors/network_comm.py

Lines changed: 12 additions & 3 deletions
@@ -21,8 +21,9 @@
 _SENSITIVE_PATH_TOKEN_PATTERN = re.compile(
     r"(?i)^(?:"
     r"AKIA[0-9A-Z]{16}|"
-    r"gh[ps]_[A-Za-z0-9]{36}|"
+    r"gh[opsur]_[A-Za-z0-9]{36}|"
     r"github_pat_[A-Za-z0-9]{22}_[A-Za-z0-9]{59}|"
+    r"hf_[A-Za-z0-9]{30,}|"
     r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_.+/=-]*|"
     r"sk-(?:proj-)?[A-Za-z0-9]{24,}|"
     r"xox[baprs]-[0-9A-Za-z-]{20,}"
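This hunk widens the GitHub alternative from `ghp_`/`ghs_` to all five GitHub token prefixes (`gho_`, `ghp_`, `ghs_`, `ghu_`, `ghr_`) and adds Hugging Face `hf_` tokens. The closing part of the real pattern lies outside the hunk, so the sketch below rebuilds the visible alternatives and, as an assumption, applies them with `fullmatch()` to a single path segment. `is_sensitive_segment` is a hypothetical helper, not the module's function.

```python
import re

# Alternatives copied from the hunk above; how the real pattern is closed and
# applied is not shown, so fullmatch() on one path segment is an assumption.
_TOKEN_ALTERNATIVES = (
    r"AKIA[0-9A-Z]{16}",
    r"gh[opsur]_[A-Za-z0-9]{36}",
    r"github_pat_[A-Za-z0-9]{22}_[A-Za-z0-9]{59}",
    r"hf_[A-Za-z0-9]{30,}",
    r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_.+/=-]*",
    r"sk-(?:proj-)?[A-Za-z0-9]{24,}",
    r"xox[baprs]-[0-9A-Za-z-]{20,}",
)
# (?i) must stay at the very start of the pattern in modern Python.
SENSITIVE_SEGMENT = re.compile("(?i)(?:" + "|".join(_TOKEN_ALTERNATIVES) + ")")


def is_sensitive_segment(segment: str) -> bool:
    """Return True when the whole path segment looks like a capability token."""
    return SENSITIVE_SEGMENT.fullmatch(segment) is not None
```

Because `(?i)` applies to every alternative, uppercase variants such as `HF_...` also match; that is consistent with the original pattern's own `(?i)` flag.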
@@ -334,6 +335,14 @@ def _is_azure_authority_container(container: str) -> bool:
     return decoded == container and _AZURE_CONTAINER_NAME_PATTERN.fullmatch(container) is not None
 
 
+def _is_azure_container_authority(scheme: str, hostname: str, authority: str) -> bool:
+    return (
+        scheme in _AZURE_AUTHORITY_CONTAINER_SCHEMES
+        and hostname.lower().endswith(_AZURE_STORAGE_HOST_SUFFIXES)
+        and _is_azure_authority_container(authority)
+    )
+
+
 def _redact_path_parameter_tokens(segment: str) -> str | None:
     token_candidate, trailing_delimiters = _split_trailing_path_delimiters(segment)
     decoded = unquote(token_candidate)
@@ -445,9 +454,9 @@ def redact_url_for_finding(url: str) -> str:
     netloc_host = f"{hostname}:{port}" if port is not None else hostname
     netloc = netloc_host
     scheme = parsed.scheme.lower()
-    if scheme in _AZURE_AUTHORITY_CONTAINER_SCHEMES and "@" in parsed.netloc:
+    if "@" in parsed.netloc:
         container, _separator, _host = parsed.netloc.rpartition("@")
-        if _is_azure_authority_container(container):
+        if _is_azure_container_authority(scheme, hostname, container):
             netloc = f"{container}@{netloc_host}"
 
     safe_path = _redact_url_path_tokens(scheme, hostname.lower(), parsed.path)
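Together, the last two hunks change when userinfo survives in a redacted URL: previously a valid-looking container name was kept for any Azure-style scheme, whatever the host; now `netloc` defaults to the bare host and the `container@` prefix is restored only when scheme, storage host suffix, and container name all check out. The diff does not show the constants' values, so this sketch fills them in as assumptions (`wasb`/`wasbs`/`abfs`/`abfss`, Blob and Data Lake host suffixes, and Azure's published container naming rules). `redacted_netloc` is a hypothetical helper that isolates the netloc logic; it is not `redact_url_for_finding` itself.

```python
import re
from urllib.parse import unquote, urlsplit

# Assumed values: the real constants are referenced but not shown in this diff.
_AZURE_AUTHORITY_CONTAINER_SCHEMES = frozenset({"wasb", "wasbs", "abfs", "abfss"})
_AZURE_STORAGE_HOST_SUFFIXES = (".blob.core.windows.net", ".dfs.core.windows.net")
# Azure container names: 3-63 chars, lowercase letters/digits, single hyphens.
_AZURE_CONTAINER_NAME_PATTERN = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")


def _is_azure_authority_container(container: str) -> bool:
    decoded = unquote(container)
    return decoded == container and _AZURE_CONTAINER_NAME_PATTERN.fullmatch(container) is not None


def _is_azure_container_authority(scheme: str, hostname: str, authority: str) -> bool:
    return (
        scheme in _AZURE_AUTHORITY_CONTAINER_SCHEMES
        and hostname.lower().endswith(_AZURE_STORAGE_HOST_SUFFIXES)
        and _is_azure_authority_container(authority)
    )


def redacted_netloc(url: str) -> str:
    """Rebuild the netloc, keeping userinfo only for Azure container authorities."""
    parsed = urlsplit(url)
    hostname = parsed.hostname or ""
    port = parsed.port
    netloc_host = f"{hostname}:{port}" if port is not None else hostname
    netloc = netloc_host  # drop userinfo by default
    scheme = parsed.scheme.lower()
    if "@" in parsed.netloc:
        container, _separator, _host = parsed.netloc.rpartition("@")
        if _is_azure_container_authority(scheme, hostname, container):
            netloc = f"{container}@{netloc_host}"
    return netloc
```

Under the old check, `wasbs://models@evil.example.com/x` would have kept `models@` because only the scheme and container name were tested; the host-suffix condition closes that gap.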
