feat(evaluators): add yelp.detect_secrets contrib evaluator by lan17 · Pull Request #196 · agentcontrol/agent-control

lan17 · 2026-04-22T02:26:50Z

Summary

New contrib evaluator that scans selector-selected payloads for potential secrets using Yelp
detect-secrets, wired through the detect-secrets-async subprocess-pool runtime.

Registered under the entry point yelp.detect_secrets.

Why this matters for agent workflows

Agent payloads move secrets around constantly without anyone meaning to. Concrete leak surfaces
in selector-selected step payloads:

- LLM tool outputs that echo upstream API responses (auth headers, bearer tokens, session cookies).
- Config dumps and environment snapshots fetched by a code tool.
- Log lines emitted by user-facing tools and fed back into the LLM context.
- Retrieved documents that happened to contain credentials.

Agents chain tool calls, so a leaked token in one step becomes input to the next — that's the
blast radius this evaluator is here to cap. Agent Control's evaluator layer is the natural gate:
it runs after the selector narrows the payload to the relevant field and before the control's
policy decision is committed.

Yelp detect-secrets contributes a battle-tested detector set — AWS keys, GitHub tokens, Basic
auth, private keys, high-entropy blobs, and keyword patterns — with per-request plugin narrowing
when you want fewer false positives on a specific control.

Why a separate async runtime

detect-secrets is synchronous and configures itself via process-global settings, so neither of the
obvious in-process workarounds fits Agent Control: asyncio.to_thread gives timeouts in name only
(the scan keeps running in its thread after the caller gives up), and a serialising lock kills
throughput. The external detect-secrets-async package wraps it in a bounded pool of long-lived
subprocess workers with real timeouts, per-request plugin isolation, and automatic worker
replacement on timeout/crash/cancellation. This PR is the thin Agent Control adapter on top of
that runtime.
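
To make the asyncio.to_thread point concrete, here is a minimal standard-library sketch. It is not code from this PR: slow_scan is a hypothetical stand-in for a blocking detect-secrets call, and the timings are arbitrary. It shows that the awaiting coroutine sees a timeout while the worker thread carries on to completion.

```python
import asyncio
import threading
import time


def slow_scan(done: threading.Event) -> None:
    """Hypothetical stand-in for a blocking, synchronous scan."""
    time.sleep(0.3)
    done.set()  # only reached if the thread was NOT actually stopped


async def demo() -> tuple:
    done = threading.Event()
    timed_out = False
    try:
        # The await is abandoned after 50 ms, but nothing interrupts the thread.
        await asyncio.wait_for(asyncio.to_thread(slow_scan, done), timeout=0.05)
    except asyncio.TimeoutError:
        timed_out = True

    # Right after the timeout the scan has not finished yet.
    finished_at_timeout = done.is_set()
    # Wait (off the event loop) until the abandoned thread completes on its own.
    finished_later = await asyncio.to_thread(done.wait, 5.0)
    return timed_out, finished_at_timeout, finished_later


timed_out, finished_at_timeout, finished_later = asyncio.run(demo())
```

A subprocess worker, by contrast, can be killed and replaced when its deadline passes, which is the property the description attributes to the detect-secrets-async pool.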

What the evaluator does

1. Normalize the selector payload:
   - None → no match
   - str → scanned directly
   - dict / list → deterministic pretty JSON with RFC 6901 pointer mapping
   - int / float / bool → JSON scalar text
2. Filter lines matching exclude_lines_regex (blanked, so line numbers stay stable).
3. Enforce the max_bytes cap on post-filter UTF-8 bytes.
4. Scan via detect-secrets-async using the shared host-level runtime.
5. Map findings into EvaluatorResult:
   - str payloads get line_number.
   - Structured payloads get json_pointer, conservatively truncated at any secret-looking
     segment in the path so a token appearing as a key name never leaks through the pointer.
   - Plaintext, snippets, full matching lines, and upstream hashed_secret are never surfaced.
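
The first three steps (normalize, blank excluded lines, enforce the byte cap) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's normalization.py: the function names are hypothetical, the standard-library re module stands in for RE2, and the edge cases covered by the PR's tests (NaN, sets, colliding keys, invalid Unicode) are ignored.

```python
import json
import re
from typing import Any, List, Optional


def normalize(payload: Any) -> Optional[str]:
    """Turn a selector payload into scannable text (None means 'no match')."""
    if payload is None:
        return None
    if isinstance(payload, str):
        return payload
    if isinstance(payload, (dict, list)):
        # sort_keys + fixed indent makes the text deterministic for a given payload
        return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False)
    if isinstance(payload, (bool, int, float)):
        return json.dumps(payload)  # JSON scalar text, e.g. True -> "true"
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


def blank_excluded_lines(text: str, patterns: List[str]) -> str:
    """Replace matching lines with empty lines so later line numbers do not shift."""
    compiled = [re.compile(p) for p in patterns]  # assumption: `re` instead of RE2
    lines = text.split("\n")
    return "\n".join(
        "" if any(rx.search(line) for rx in compiled) else line for line in lines
    )


def within_limit(text: str, max_bytes: int) -> bool:
    """The cap applies to UTF-8 bytes, not to characters."""
    return len(text.encode("utf-8")) <= max_bytes
```

Blanking rather than deleting excluded lines is what keeps a reported line_number pointing at the same line of the original payload.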

Example

A control fragment that scans the output field for GitHub and AWS credentials, failing open on
evaluator errors:

{
  "selector": { "path": "output" },
  "evaluator": {
    "name": "yelp.detect_secrets",
    "config": {
      "timeout_ms": 10000,
      "on_error": "allow",
      "enabled_plugins": ["GitHubTokenDetector", "AWSKeyDetector"]
    }
  }
}

For a plain-string payload "github_token = 'ghp_abc...'":

result.matched = True
result.confidence = 1.0
result.metadata = {
    "findings_count": 1,
    "findings": [{"type": "GitHub Token", "line_number": 1}],
    "normalized_payload_type": "str",
    "detect_secrets_version": "1.5.0",
}

For a structured payload {"response": {"headers": {"authorization": "ghp_abc..."}}}:

result.matched = True
result.metadata = {
    "findings_count": 1,
    "findings": [
        {"type": "GitHub Token", "json_pointer": "/response/headers/authorization"}
    ],
    "normalized_payload_type": "dict",
    "detect_secrets_version": "1.5.0",
}

A dict keyed by a secret-looking string reports the safe ancestor instead of leaking the key:

# payload: {"ghp_abc...": {"nested": "safe"}}
# result.metadata["findings"] == [{"type": "GitHub Token", "json_pointer": ""}]
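
The pointer behaviour in these examples can be approximated with the sketch below. It assumes a caller-supplied looks_secret predicate (hypothetical; the PR's actual key heuristic is not reproduced here) and applies the standard RFC 6901 escaping of "~" and "/".

```python
from typing import Callable, List, Sequence, Union


def escape_segment(segment: str) -> str:
    """RFC 6901 escaping: '~' becomes '~0' first, then '/' becomes '~1'."""
    return segment.replace("~", "~0").replace("/", "~1")


def safe_pointer(
    path: Sequence[Union[str, int]],
    looks_secret: Callable[[str], bool],
) -> str:
    """Build a JSON pointer, stopping before the first secret-looking segment."""
    safe: List[str] = []
    for segment in path:
        text = str(segment)
        if looks_secret(text):
            break  # report the nearest safe ancestor instead of the key itself
        safe.append(escape_segment(text))
    return "".join("/" + part for part in safe)


def is_token_like(key: str) -> bool:
    # Hypothetical predicate, for illustration only.
    return key.startswith("ghp_")


pointer_to_value = safe_pointer(["response", "headers", "authorization"], is_token_like)
pointer_to_secret_key = safe_pointer(["ghp_abc", "nested"], is_token_like)
```

With this predicate the first call yields "/response/headers/authorization", and the second yields "" (the document root), mirroring the two structured examples above.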

Config

| Field | Default | Purpose |
| --- | --- | --- |
| `timeout_ms` | `10_000` | Full request lifecycle: queue wait + scan. |
| `on_error` | `"allow"` | Fail-open (`allow`) or fail-closed (`deny`) on evaluator failure. |
| `max_bytes` | `1_048_576` | Max normalized post-filter UTF-8 size. |
| `enabled_plugins` | `None` | Upstream plugin class names; validated at config parse time via `detect_secrets_async.get_runtime_info().available_plugin_names`. `None` uses the pinned upstream default set. |
| `exclude_lines_regex` | `[]` | RE2 patterns; matching lines are blanked before scanning. |
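
As a rough illustration of the parse-time checks implied by the table and the test list (strip, dedup, unknown-plugin rejection, positive-int bounds), the following sketch uses plain functions. The function names are hypothetical, the available plugin set is passed in explicitly rather than read from the runtime, and the PR's actual validator may be structured differently.

```python
from typing import Iterable, List, Optional


def normalize_plugins(
    names: Optional[Iterable[str]], available: Iterable[str]
) -> Optional[List[str]]:
    """Strip, de-duplicate (keeping first occurrence), and reject unknown names."""
    if names is None:
        return None  # None means "use the default plugin set"
    known = set(available)
    result: List[str] = []
    for raw in names:
        name = raw.strip()
        if not name:
            raise ValueError("plugin name must not be blank")
        if name not in known:
            raise ValueError(f"unknown plugin: {name}")
        if name not in result:
            result.append(name)
    return result


def require_positive_int(field: str, value: int) -> int:
    """Bounds check in the spirit of timeout_ms / max_bytes validation."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{field} must be a positive integer")
    return value


available = {"GitHubTokenDetector", "AWSKeyDetector"}  # illustrative subset
plugins = normalize_plugins(
    [" GitHubTokenDetector ", "AWSKeyDetector", "GitHubTokenDetector"], available
)
```

Rejecting unknown names at parse time means a typo in enabled_plugins surfaces when the control is loaded, not as a silent gap in detection at scan time.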

Failure handling

Every failure maps to a stable metadata["failure_mode"]: normalization_error,
payload_too_large, queue_full, queue_timeout, worker_startup_error, worker_timeout,
worker_crash, worker_protocol_error, runtime_error.

on_error controls the fallback in EvaluatorResult:

- allow → matched=False, metadata["fallback_action"]="allow"
- deny → matched=True, metadata["fallback_action"]="deny"

Consumers should branch on metadata["failure_mode"] + metadata["fallback_action"], not on
matched alone, to distinguish a real finding from a fail-closed evaluator failure.
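
A consumer-side check along those lines might look like this sketch. It works on a plain matched flag and metadata mapping; classify and its return labels are hypothetical, and only the metadata keys named above are taken from this PR.

```python
from typing import Any, Mapping


def classify(matched: bool, metadata: Mapping[str, Any]) -> str:
    """Separate real findings from evaluator failures before acting on `matched`."""
    failure_mode = metadata.get("failure_mode")
    if failure_mode is not None:
        # The evaluator itself failed; `matched` only reflects the on_error fallback.
        action = metadata.get("fallback_action")
        return f"evaluator_failure({failure_mode}, fallback={action})"
    return "secret_found" if matched else "clean"


real_finding = classify(True, {"findings_count": 1})
fail_closed = classify(
    True, {"failure_mode": "worker_timeout", "fallback_action": "deny"}
)
fail_open = classify(
    False, {"failure_mode": "queue_full", "fallback_action": "allow"}
)
```

Here real_finding and fail_closed both arrive with matched=True, but only the first represents an actual detection.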

Dependencies

- detect-secrets-async>=0.2.0,<0.3.0: the async subprocess-pool runtime over Yelp detect-secrets.
- google-re2>=1.1: matches existing Agent Control regex policy.

Validation

make check in evaluators/contrib/detect_secrets:

- 90 tests pass, covering detection, structured-pointer mapping, RE2 exclusion on both string
  and structured payloads, plugin validation (strip + dedup + unknown rejection), max_bytes
  boundary behavior, recursive / NaN / empty-container / unsupported-type normalization paths,
  timeout short-circuit, every failure_mode path × {allow, deny} (16 runtime + 4
  evaluator-layer combinations), FAILURE_MESSAGES drift pin against the ScanFailureCode enum,
  concurrent dispatch on a cached evaluator instance, and entry-point .load() round-trip.
- mypy strict clean.
- ruff check + format clean.
- Coverage 98% (config.py 100%, evaluator.py 98%, normalization.py 98%).

codecov · 2026-04-22T02:29:23Z

Codecov Report

❌ Patch coverage is 97.94721% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...tor_detect_secrets/detect_secrets/normalization.py | 98.01% | 3 Missing ⚠️ |
| ...agent_control_evaluator_detect_secrets/__init__.py | 71.42% | 2 Missing ⚠️ |
| ...aluator_detect_secrets/detect_secrets/evaluator.py | 98.50% | 2 Missing ⚠️ |


Adds the 10 gap categories flagged in review, with given/when/then behavioral style:

- parametric failure-mode matrix: every ScanFailureCode x {allow, deny} plus evaluator-layer failures (normalization_error, payload_too_large)
- FAILURE_MESSAGES drift pin against ScanFailureCode enum
- normalization edge cases: top-level set, NaN/+-inf primitives, empty dict / list, boolean/None dict keys, tuple dict keys
- runtime-error paths: get_runtime_info failure during non-None evaluate (previously only reached via None short-circuit), RuntimeConfigConflictError from get_runtime
- exclude_lines_regex on structured payloads: blanking suppresses findings on matched lines and preserves pointers for unmatched ones
- max_bytes boundary: exactly-at-limit accepted, one-byte-over rejected
- multi-line string with distinct findings preserves line numbers
- list with scalar element maps pointer to index
- concurrent evaluate() on one cached instance stays correct
- _safe_structured_pointer returns None for missing location
- _key_name_is_secret_like for None and non-identifier/scalar-like keys
- entry-point .load() round-trips to DetectSecretsEvaluator
- config validator edges: explicit None enabled_plugins, whitespace-only entry rejected, whitespace strip + dedup, positive-int bounds on timeout_ms / max_bytes, Literal validation on on_error

Coverage: 93% -> 98% (config 96 -> 100, evaluator 94 -> 98, normalization 92 -> 98). 39 -> 90 passing tests.

feat: add detect-secrets contrib evaluator

0c93046

lan17 changed the title ~~feat: add detect-secrets contrib evaluator~~ feat(evaluators): add detect-secrets contrib evaluator Apr 22, 2026

lan17 added 16 commits April 21, 2026 19:36

fix: preserve pointers for structured key findings

b328619

fix: avoid leaking secret-bearing key paths

0eb63a7

fix: batch structured key probes

ebe870e

fix: apply timeout budget to initial scan

f585261

fix: tighten detect-secrets pointer mapping

e363e0a

fix: reuse detect-secrets runtime settings

4cbcf8a

merge: update detect-secrets evaluator branch with main

ff9634c

fix: align detect-secrets contrib metadata

aa7c61b

fix: tighten detect-secrets pointer fallback

a4dc7e5

fix: honor detect-secrets runtime failures

9314678

fix: canonicalize structured detect-secrets payloads

21b8917

fix: reject colliding detect-secrets keys

56bbc41

fix: harden detect-secrets result mapping

1f65d22

fix: honor detect-secrets fail-open semantics

03ef785

refactor: simplify detect-secrets pointer mapping

dba17d5

fix: tighten detect-secrets error handling

8035455

lan17 changed the title ~~feat(evaluators): add detect-secrets contrib evaluator~~ feat(evaluators): add yelp.detect_secrets contrib evaluator Apr 22, 2026

lan17 added 4 commits April 22, 2026 17:11

fix: handle invalid unicode in detect-secrets evaluator

655e439

fix: harden detect-secrets unicode handling

2584c78

fix: preserve detect-secrets line mapping under exclusions

abfd49d

lan17 mentioned this pull request Apr 25, 2026

Define engine semantics for evaluator fallback failures and timeouts #199

Open

lan17 wants to merge 21 commits into main from feature/detect-secrets-evaluator
