
feat(evaluators): add yelp.detect_secrets contrib evaluator#196

Open
lan17 wants to merge 21 commits into main from feature/detect-secrets-evaluator



lan17 commented Apr 22, 2026

Summary

New contrib evaluator that scans selector-selected payloads for potential secrets using
Yelp detect-secrets, wired through the
detect-secrets-async subprocess-pool runtime.

Registered under the entry point yelp.detect_secrets.

Why this matters for agent workflows

Agent payloads move secrets around constantly without anyone meaning to. Concrete leak surfaces
in selector-selected step payloads:

  • LLM tool outputs that echo upstream API responses (auth headers, bearer tokens, session cookies).
  • Config dumps and environment snapshots fetched by a code tool.
  • Log lines emitted by user-facing tools and fed back into the LLM context.
  • Retrieved documents that happened to contain credentials.

Agents chain tool calls, so a leaked token in one step becomes input to the next — that's the
blast radius this evaluator is here to cap. Agent Control's evaluator layer is the natural gate:
it runs after the selector narrows the payload to the relevant field and before the control's
policy decision is committed.

Yelp detect-secrets contributes a battle-tested detector set — AWS keys, GitHub tokens, Basic
auth, private keys, high-entropy blobs, and keyword patterns — with per-request plugin narrowing
when you want fewer false positives on a specific control.

Why a separate async runtime

detect-secrets is synchronous and configures itself through process-global settings, which makes
both obvious options poor fits for Agent Control: asyncio.to_thread only offers theatrical timeouts
(the awaiting coroutine gives up, but the scan keeps running in its thread), and a serialising lock
around the global settings kills throughput. The external
detect-secrets-async package wraps it in a
bounded pool of long-lived subprocess workers with real timeouts, per-request plugin isolation,
and automatic worker replacement on timeout/crash/cancellation. This PR is the thin Agent Control
adapter on top of that runtime.
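The asyncio.to_thread problem can be reproduced with the standard library alone. In this minimal sketch, a hypothetical `slow_scan` (a `time.sleep` standing in for a synchronous detect-secrets scan) is wrapped in `asyncio.wait_for`: the await times out, but the thread keeps running to completion because a running thread cannot be cancelled. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import threading
import time

finished = threading.Event()


def slow_scan() -> None:
    # Hypothetical stand-in for a blocking detect-secrets scan.
    time.sleep(0.3)
    finished.set()


async def demo() -> tuple[bool, bool]:
    timed_out = False
    try:
        await asyncio.wait_for(asyncio.to_thread(slow_scan), timeout=0.05)
    except asyncio.TimeoutError:
        timed_out = True
    # The await has given up, but the worker thread is still scanning.
    still_running = timed_out and not finished.is_set()
    await asyncio.sleep(0.5)
    # The "timed out" scan finishes anyway, holding CPU the whole time.
    return still_running, finished.is_set()


result = asyncio.run(demo())
print(result)  # (True, True)
```

A subprocess, by contrast, can be terminated outright, which is what makes a real timeout possible.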

What the evaluator does

  1. Normalize the selector payload:
    • None → no match
    • str → scanned directly
    • dict / list → deterministic pretty JSON with RFC 6901 pointer mapping
    • int / float / bool → JSON scalar text
  2. Filter lines matching exclude_lines_regex (blanked, so line numbers stay stable).
  3. Enforce the max_bytes cap on post-filter UTF-8 bytes.
  4. Scan via detect-secrets-async using the shared host-level runtime.
  5. Map findings into EvaluatorResult:
    • str payloads get line_number.
    • Structured payloads get json_pointer, conservatively truncated at any secret-looking
      segment in the path so a token appearing as a key name never leaks through the pointer.
    • Plaintext, snippets, full matching lines, and upstream hashed_secret are never surfaced.

Example

A control fragment that scans the output field for GitHub and AWS credentials, failing open on
evaluator errors:

```json
{
  "selector": { "path": "output" },
  "evaluator": {
    "name": "yelp.detect_secrets",
    "config": {
      "timeout_ms": 10000,
      "on_error": "allow",
      "enabled_plugins": ["GitHubTokenDetector", "AWSKeyDetector"]
    }
  }
}
```

For a plain-string payload "github_token = 'ghp_abc...'":

```python
result.matched = True
result.confidence = 1.0
result.metadata = {
    "findings_count": 1,
    "findings": [{"type": "GitHub Token", "line_number": 1}],
    "normalized_payload_type": "str",
    "detect_secrets_version": "1.5.0",
}
```

For a structured payload {"response": {"headers": {"authorization": "ghp_abc..."}}}:

```python
result.matched = True
result.metadata = {
    "findings_count": 1,
    "findings": [
        {"type": "GitHub Token", "json_pointer": "/response/headers/authorization"}
    ],
    "normalized_payload_type": "dict",
    "detect_secrets_version": "1.5.0",
}
```

A dict keyed by a secret-looking string reports the safe ancestor instead of leaking the key:

```python
# payload: {"ghp_abc...": {"nested": "safe"}}
# result.metadata["findings"] == [{"type": "GitHub Token", "json_pointer": ""}]
```
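The conservative truncation can be sketched as: walk the path segments and stop at the first one that looks like a secret, reporting the nearest safe ancestor. `safe_pointer` and the `looks_like_github_token` predicate are hypothetical; the evaluator's actual secret-like check is not shown in this PR description and may well differ.

```python
from typing import Callable, Sequence


def escape_pointer_token(token: str) -> str:
    # RFC 6901 escaping: "~" -> "~0", then "/" -> "~1".
    return token.replace("~", "~0").replace("/", "~1")


def safe_pointer(segments: Sequence[str], is_secret_like: Callable[[str], bool]) -> str:
    """Build a JSON pointer, stopping before the first secret-looking segment."""
    kept: list[str] = []
    for segment in segments:
        if is_secret_like(segment):
            break  # report the nearest safe ancestor instead of leaking the key
        kept.append(escape_pointer_token(segment))
    return "".join("/" + token for token in kept)


# Hypothetical predicate for illustration only.
def looks_like_github_token(segment: str) -> bool:
    return segment.startswith("ghp_")


print(repr(safe_pointer(["ghp_abc...", "nested"], looks_like_github_token)))  # ''
print(safe_pointer(["response", "headers", "authorization"], looks_like_github_token))
```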

Config

| Field | Default | Purpose |
|---|---|---|
| timeout_ms | 10_000 | Full request lifecycle: queue wait + scan. |
| on_error | "allow" | Fail-open (allow) or fail-closed (deny) on evaluator failure. |
| max_bytes | 1_048_576 (1 MiB) | Max normalized post-filter UTF-8 size, in bytes. |
| enabled_plugins | None | Upstream plugin class names; validated at config parse time via detect_secrets_async.get_runtime_info().available_plugin_names. None uses the pinned upstream default set. |
| exclude_lines_regex | [] | RE2 patterns; matching lines are blanked before scanning. |
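The enabled_plugins validation (strip, dedup, reject unknown names; None passes through) can be sketched as a plain function. Here `available` stands in for the runtime's `available_plugin_names`, which this sketch does not fetch; `normalize_enabled_plugins` is a hypothetical name, and keeping first-seen order is an assumption.

```python
from typing import Iterable, Optional


def normalize_enabled_plugins(
    raw: Optional[list[str]], available: Iterable[str]
) -> Optional[list[str]]:
    if raw is None:
        return None  # keep the pinned upstream default plugin set
    known = set(available)
    result: list[str] = []
    for entry in raw:
        name = entry.strip()
        if not name:
            raise ValueError("enabled_plugins entries must not be blank")
        if name not in known:
            raise ValueError(f"unknown detect-secrets plugin: {name!r}")
        if name not in result:  # dedup while keeping first-seen order
            result.append(name)
    return result


available = {"GitHubTokenDetector", "AWSKeyDetector"}
print(normalize_enabled_plugins([" AWSKeyDetector", "AWSKeyDetector "], available))
# ['AWSKeyDetector']
```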

Failure handling

Every failure maps to a stable metadata["failure_mode"]: normalization_error,
payload_too_large, queue_full, queue_timeout, worker_startup_error, worker_timeout,
worker_crash, worker_protocol_error, runtime_error.
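Because timeout_ms covers queue wait and scan together, one way to picture the queue_timeout / worker_timeout split is a single deadline shared by both phases. In this sketch an `asyncio.Semaphore` stands in for the worker pool and a coroutine stands in for the subprocess scan; `scan_with_budget` is hypothetical, and unlike the real runtime it cancels a coroutine rather than replacing a killed worker.

```python
import asyncio
from typing import Awaitable, Callable


async def scan_with_budget(
    slots: asyncio.Semaphore, scan: Callable[[], Awaitable[int]], timeout_ms: int
) -> str:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_ms / 1000  # one budget for queue + scan
    try:
        await asyncio.wait_for(slots.acquire(), timeout=deadline - loop.time())
    except asyncio.TimeoutError:
        return "queue_timeout"
    try:
        # Whatever the queue wait used up is no longer available to the scan.
        remaining = max(deadline - loop.time(), 0.0)
        findings = await asyncio.wait_for(scan(), timeout=remaining)
        return f"ok:{findings}"
    except asyncio.TimeoutError:
        return "worker_timeout"
    finally:
        slots.release()


async def demo() -> tuple[str, str, str]:
    slots = asyncio.Semaphore(1)  # stands in for a one-worker pool

    async def fast_scan() -> int:
        return 0

    async def slow_scan() -> int:
        await asyncio.sleep(1.0)
        return 1

    ok = await scan_with_budget(slots, fast_scan, timeout_ms=100)
    slow = await scan_with_budget(slots, slow_scan, timeout_ms=50)
    await slots.acquire()  # occupy the only slot so the next request must queue
    queued = await scan_with_budget(slots, fast_scan, timeout_ms=50)
    return ok, slow, queued


outcomes = asyncio.run(demo())
print(outcomes)  # ('ok:0', 'worker_timeout', 'queue_timeout')
```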

on_error controls the fallback in EvaluatorResult:

  • allow → matched=False, metadata["fallback_action"]="allow"
  • deny → matched=True, metadata["fallback_action"]="deny"

Consumers should branch on metadata["failure_mode"] + metadata["fallback_action"], not on
matched alone, to distinguish a real finding from a fail-closed evaluator failure.
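A consumer that follows this advice might look like the sketch below. `EvaluatorResult` here is a hypothetical stand-in carrying only the two fields read, and the sketch assumes a successful scan does not set metadata["failure_mode"].

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvaluatorResult:
    # Hypothetical stand-in with only the two fields this sketch reads.
    matched: bool
    metadata: dict[str, Any] = field(default_factory=dict)


def classify(result: EvaluatorResult) -> str:
    failure_mode = result.metadata.get("failure_mode")
    if failure_mode is None:
        # No failure recorded: matched reflects a real scan outcome.
        return "secret_found" if result.matched else "clean"
    # The evaluator failed; matched only mirrors the on_error fallback.
    action = result.metadata.get("fallback_action")
    return f"evaluator_failed:{failure_mode}:{action}"


print(classify(EvaluatorResult(True, {"findings_count": 1})))  # secret_found
print(classify(EvaluatorResult(True, {"failure_mode": "worker_timeout", "fallback_action": "deny"})))
# evaluator_failed:worker_timeout:deny
```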

Dependencies

Validation

make check in evaluators/contrib/detect_secrets:

  • 90 tests pass, covering detection, structured-pointer mapping, RE2 exclusion on both string
    and structured payloads, plugin validation (strip + dedup + unknown rejection), max_bytes
    boundary behavior, recursive / NaN / empty-container / unsupported-type normalization paths,
    timeout short-circuit, every failure_mode path × {allow, deny} (16 runtime + 4
    evaluator-layer combinations), FAILURE_MESSAGES drift pin against the ScanFailureCode enum,
    concurrent dispatch on a cached evaluator instance, and entry-point .load() round-trip.
  • mypy strict clean.
  • ruff check + format clean.
  • Coverage 98% (config.py 100%, evaluator.py 98%, normalization.py 98%).
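Two of the behaviors these tests pin down, blanking excluded lines and the inclusive max_bytes boundary, can be sketched in a few lines. Stdlib `re` stands in for RE2 (simple patterns behave the same in both), a pattern is assumed to exclude a line if it is found anywhere in it (`re.search` semantics), and both function names are hypothetical.

```python
import re


def blank_excluded_lines(text: str, patterns: list[str]) -> str:
    compiled = [re.compile(p) for p in patterns]
    kept = []
    for line in text.split("\n"):
        # Blank rather than drop, so later line numbers do not shift.
        kept.append("" if any(rx.search(line) for rx in compiled) else line)
    return "\n".join(kept)


def within_max_bytes(text: str, max_bytes: int) -> bool:
    # The cap counts UTF-8 bytes of the post-filter text, not characters.
    return len(text.encode("utf-8")) <= max_bytes


filtered = blank_excluded_lines("a = 1\ntoken = 'x'\nb = 2", [r"^token\s*="])
print(filtered.split("\n"))  # ['a = 1', '', 'b = 2']
# "é" is 2 bytes in UTF-8: exactly at the limit passes, one byte over fails.
print(within_max_bytes("é" * 2, 4), within_max_bytes("é" * 2, 3))  # True False
```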


codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 97.94721% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tor_detect_secrets/detect_secrets/normalization.py | 98.01% | 3 Missing ⚠️ |
| ...agent_control_evaluator_detect_secrets/__init__.py | 71.42% | 2 Missing ⚠️ |
| ...aluator_detect_secrets/detect_secrets/evaluator.py | 98.50% | 2 Missing ⚠️ |


lan17 changed the title feat: add detect-secrets contrib evaluator feat(evaluators): add detect-secrets contrib evaluator Apr 22, 2026
lan17 changed the title feat(evaluators): add detect-secrets contrib evaluator feat(evaluators): add yelp.detect_secrets contrib evaluator Apr 22, 2026
lan17 added 4 commits April 22, 2026 17:11
Adds the 10 gap categories flagged in review, with given/when/then
behavioral style:

- parametric failure-mode matrix: every ScanFailureCode x {allow, deny}
  plus evaluator-layer failures (normalization_error, payload_too_large)
- FAILURE_MESSAGES drift pin against ScanFailureCode enum
- normalization edge cases: top-level set, NaN/+-inf primitives,
  empty dict / list, boolean/None dict keys, tuple dict keys
- runtime-error paths: get_runtime_info failure during non-None
  evaluate (previously only reached via None short-circuit),
  RuntimeConfigConflictError from get_runtime
- exclude_lines_regex on structured payloads: blanking suppresses
  findings on matched lines and preserves pointers for unmatched ones
- max_bytes boundary: exactly-at-limit accepted, one-byte-over rejected
- multi-line string with distinct findings preserves line numbers
- list with scalar element maps pointer to index
- concurrent evaluate() on one cached instance stays correct
- _safe_structured_pointer returns None for missing location
- _key_name_is_secret_like for None and non-identifier/scalar-like keys
- entry-point .load() round-trips to DetectSecretsEvaluator
- config validator edges: explicit None enabled_plugins, whitespace-only
  entry rejected, whitespace strip + dedup, positive-int bounds on
  timeout_ms / max_bytes, Literal validation on on_error

Coverage: 93% -> 98% (config 96 -> 100, evaluator 94 -> 98,
normalization 92 -> 98). 39 -> 90 passing tests.