Skip to content

Fix secrets detector model dump counting#65

Merged
msureshkumar88 merged 18 commits intomainfrom
fix/secrets-detector-model-dump-count
May 1, 2026
Merged

Fix secrets detector model dump counting#65
msureshkumar88 merged 18 commits intomainfrom
fix/secrets-detector-model-dump-count

Conversation

@lucarlig
Copy link
Copy Markdown
Collaborator

@lucarlig lucarlig commented Apr 30, 2026

Summary

Fixes the secrets detection scanner so equivalent model_dump() output does not double-count secrets already scanned through object rebuild state.

Root cause

Pydantic-style objects can expose the same field both through object state and model_dump(). The scanner counted both paths separately, so a single distinct secret could produce count == 2 and incorrectly trip min_findings_to_block=2.

Changes

  • Add a conservative duplicate gate for serialized dict output that is already represented in rebuild state.
  • Support safe duplicate comparison for exact built-in strings, lists, tuples, and string-keyed dicts.
  • Bail out before dict lookup when serialized or rebuild state uses non-string dict keys.
  • Add regression coverage for scalar fields, copied list fields, and non-string key duplicate-gate safety.

Validation

  • cargo test -p secrets_detection scan_container --lib
  • cargo test -p secrets_detection duplicate_gate_ignores_non_string_model_dump_keys_without_lookup --lib
  • targeted Python regressions for scalar and list model fields
  • make check-all
  • make test-integration
  • make test-all

Operator note: secrets finding count semantics

This PR changes secrets_detection text scanning to report one finding per non-overlapping secret span. If multiple enabled patterns match overlapping bytes, the scanner redacts the merged span once and reports the most specific detector type instead of counting every pattern match separately.

Distinct non-overlapping secrets still count separately. Operators using min_findings_to_block values greater than 1 should audit thresholds before upgrading from earlier behavior, where overlapping broad and specific detector matches could increase count.

lucarlig added 13 commits April 30, 2026 11:19
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
@lucarlig lucarlig force-pushed the fix/secrets-detector-model-dump-count branch from 5f77275 to 9d81feb Compare April 30, 2026 13:30
@lucarlig lucarlig marked this pull request as ready for review April 30, 2026 14:02
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Copy link
Copy Markdown
Collaborator

@msureshkumar88 msureshkumar88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: REQUEST CHANGES

Two HIGH issues and five MEDIUM issues require resolution before merge. Full findings below.


HIGH Issues — Must Fix

H1 · Missing @pytest.mark.asyncio on two test functions (test_hook_smoke.py)

Two of the three top-level async test functions are missing @pytest.mark.asyncio:

# Missing decorator — will not execute as async tests
async def test_prompt_pre_fetch_rebuilds_frozen_payload_on_redaction():
    ...
async def test_prompt_pre_fetch_blocks_without_redaction_and_keeps_original_payload():
    ...

# Third function in same file has it — confirms this is an error, not intentional
@pytest.mark.asyncio
async def test_prompt_pre_fetch_blocks_with_redaction_without_leaking_secret():
    ...

Without asyncio_mode = "auto" in pytest config, these two async functions are silently collected as synchronous tests and return a coroutine object without executing any assertions. All assertions pass vacuously, producing false green coverage over the redaction and blocking paths — exactly the paths this PR is fixing.

Required action: Add @pytest.mark.asyncio to both functions, or confirm asyncio_mode = "auto" is set globally and add a comment explaining the omission.


H2 · text_scan.rs behavioral change silently alters count semantics (undocumented)

detect_and_redact was completely rewritten. The old implementation ran each regex pattern's replace_all independently, accumulating one finding per pattern match. The new implementation collects all candidates, resolves overlaps by a specificity/length ranking, and emits one finding per non-overlapping span.

What this changes:

Scenario Old count New count
Single AWS key matching both aws_access_key_id AND generic_api_key_assignment 2 1
Google API key partially overlapping generic_api_key_assignment 2 1
Two distinct secrets in the same string 2 2 (unchanged)

This directly affects min_findings_to_block thresholds in production. A deployment with min_findings_to_block=2 to tolerate incidental generic matches will see different blocking behavior after this upgrade — where it previously blocked, it now may not (or vice versa). The change is tested and is the right fix, but the PR description must document it so operators can audit their threshold configurations before upgrading.

Required action: Add a section to the PR description describing the text_scan.rs change and its effect on count / min_findings_to_block. Consider a note in DEVELOPING.md about the count semantics change.


MEDIUM Issues — Should Fix

M1 · py.import("builtins") called inside loop in apply_extra_dict_state (performance)

object_model.rspy.import("builtins") and .getattr("object") are called on every loop iteration in apply_extra_dict_state. These values are invariant within the loop body. The PyO3 call still crosses the FFI boundary per iteration regardless of CPython's module cache.

// Current — builtins imported on every key
for (key, value) in updates.iter() {
    if key.is_exact_instance_of::<PyString>() {
        let builtins = py.import("builtins")?;        // ← inside loop
        let object_type = builtins.getattr("object")?; // ← inside loop
        ...
    }
}

Fix: Hoist both bindings before the loop.


M2 · Slack token regex widening is undocumented (patterns.rs)

-Regex::new(r"\bxox[abpqr]-[0-9A-Za-z\-]{10,48}\b")
+Regex::new(r"\bxox[abpqr]-[0-9A-Za-z\-]{10,80}\b")

The test token added in scanner.rs is 51 chars after xoxb- — which would not have matched the old {10,48} bound. This means either the old regex was silently broken for modern Slack Bot tokens, or the test was introduced alongside the widening to cover a previously-missed format. Neither scenario is documented. The wider upper bound increases false-positive surface.

Required action: Add a comment in patterns.rs citing the Slack token format spec that justifies 80. Add boundary tests asserting tokens at exactly 48 and 49 chars both match. Note in the PR description if the old regex was broken.


M3 · same_safe_value allocates HashSet::new() at every call site (performance)

python_scan.rs — every call site constructs a fresh HashSet for cycle detection, including inside the inner for (key, value) loop of serialized_dict_duplicates_rebuild_state:

// Allocates per dict entry:
if !same_safe_value(&serialized_value, &rebuild_value, &mut HashSet::new())? {

For flat structures (the common case), the HashSet is allocated and never written. For models with many fields this adds up.

Required action: Introduce a wrapper fn values_equal(left, right) -> PyResult<bool> that owns the seen set internally, removing &mut HashSet::new() from all call sites.


M4 · No logging when duplicate gate suppresses a scan (security observability gap)

python_scan.rs — when serialized_scan_target returns None because the duplicate gate determined the serialized state already duplicates the rebuild state, the scan silently skips that path. In production, if a secret is not detected and an operator investigates, there is no diagnostic signal indicating that the serialized path was suppressed and why.

This is a security observability gap: a misconfigured or edge-case duplicate gate could silently suppress detection of a real secret with no log evidence.

Required action: Add tracing::debug! or log::debug! at each return Ok(None) in serialized_scan_target identifying which gate fired. Example:

tracing::debug!("duplicate gate: serialized dict duplicates rebuild state, skipping serialized scan");
return Ok(None);

M5 · inspect_object_state_without_model_dump over-exposed as pub (object_model.rs)

This function is only used within the crate (called from serialized_scan_target in python_scan.rs). Exposing it as pub unnecessarily widens the API surface.

Required action: Change to pub(crate).


M6 · set_attr_without_hooks called with unchecked string keys (security hardening)

object_model.rsapply_extra_dict_state calls set_attr_without_hooks with attribute names sourced from the original object's __dict__:

set_attr_without_hooks(&object_type, target, &key.extract::<String>()?, &value)?;

While target is a freshly-created blank instance from prepare_rebuild_target, dunder names (__class__, __reduce__, __reduce_ex__, __init_subclass__) in __dict__ (unusual but possible on deliberately crafted objects) would set special attributes via CPython's base object.__setattr__. In environments where untrusted data constructs scanned objects this is a potential injection vector.

Required action: Add a guard rejecting dunder attribute names before passing to set_attr_without_hooks, and document the security invariant:

let name = key.extract::<String>()?;
if name.starts_with("__") && name.ends_with("__") {
    dict.set_item(&key, value)?; // fall back to raw dict insert for dunders
    continue;
}
set_attr_without_hooks(&object_type, target, &name, &value)?;

Testing Gaps (MEDIUM)

T1 · No RootModel + min_findings_to_block=2 end-to-end regression test

The duplicate gate in serialized_duplicates_rebuild_root specifically targets Pydantic RootModel (the "root" key check). The existing tests cover BaseModel with min_findings_to_block=2, but there is no end-to-end test confirming that a RootModel with a secret field does not double-count. This is the model type the root-key gate was designed for.

T2 · serialized_result non-string-key guard is untested

The new guard added in serialized_result:

if !dict_has_only_exact_string_keys(redacted_dict) {
    return Ok(redacted_state.clone());
}

has no dedicated test. Without one, a future refactor could silently remove it.


Low-Priority Observations (informational, not blocking)

  • Test boilerplate duplication: All five new Python test files contain identical copies of make_context, make_config, _make_context, _make_config. Move to a shared conftest.py.
  • _make_context / _make_config dead aliases: Both are exact wrappers with no behavioral difference from the non-underscore versions. Remove the aliases.
  • split_dict_state double-iterates on mixed-key dicts: dict_has_only_exact_string_keys and then the split loop both traverse the dict. Could be fused.
  • scan_object_state complexity: 120+ lines with four interleaved mutable state variables (rebuilt, rebuild_state_for_extra, rebuild_state_for_gate, pending_scan_state). The pending_scan_state deferral + cycle-rewrite interaction warrants a brief inline comment on the invariant.

Positive Observations

  • serialized_scan_target refactor (bool → Option<SerializedScanTarget>) cleanly eliminates a redundant re-inspection call and expresses skip/scan in the type system.
  • same_safe_value correctly avoids calling Python __eq__ on arbitrary user objects. The EqBomb adversarial tests confirm the hazard is real and addressed.
  • str_type threading (one import per scan_container call, replacing per-node extract::<String>()) is both more correct and cheaper.
  • text_scan.rs single-pass redaction replacing cascaded replace_all is strictly better in correctness and performance.
  • dict_has_only_exact_string_keys guard is applied uniformly before any dict lookup, correctly blocking hash-collision and __eq__-bomb attacks from custom key objects.
  • Version bump applied consistently across Cargo.toml, plugin-manifest.yaml, and Cargo.lock.
  • Core double-count regression is directly tested by test_tool_post_invoke_does_not_double_count_model_dump_fields and the list-field variant in test_plugin_hook_results.py.
  • CI: Do not merge until build-test (ubuntu/windows, secrets_detection), coverage, and release-validation / preflight complete.

Signed-off-by: lucarlig <luca.carlig@ibm.com>
@lucarlig
Copy link
Copy Markdown
Collaborator Author

@msureshkumar88 I pushed 230fb65 to address the test boilerplate concern by centralizing the repeated secrets-detection test setup in plugins/tests/secrets_detection/helpers.py and replacing the duplicated blocks in the split test files.

I also revalidated the scanner/object-model findings from the latest review set against current branch code. The mixed-key __dict__, non-string value scanning, same-type serialized model_dump, cached str type, mixed-key redaction restoration, and scan-state back-edge cases are already covered by existing regressions and pass on this branch.

Verification run:

  • CPEX_TEST_PLUGIN_HOOKS=1 uv run pytest ../../../tests/secrets_detection -q --asyncio-mode=auto -> 73 passed
  • cargo test --manifest-path plugins/rust/python-package/secrets_detection/Cargo.toml -> 48 passed
  • targeted public API checks for mixed/non-string/same-type serialized cases -> passed
  • git diff --check -> clean

@lucarlig lucarlig requested a review from msureshkumar88 April 30, 2026 14:59
@msureshkumar88
Copy link
Copy Markdown
Collaborator

Re-review: Most issues remain open

Reviewed against current HEAD (230fb65). Only the low-priority test-boilerplate item was addressed. All HIGH and MEDIUM findings remain open.


H1 · @pytest.mark.asyncio omission — partially resolved

asyncio_mode = auto is confirmed in plugins/tests/pytest.ini, so the two functions in test_hook_smoke.py (lines 6, 21) do execute correctly. However the required action asked to add a comment in the file explaining why the decorator is omitted, so a future reader doesn't mistake it for a bug. That comment is still missing.

Still needed: Add a brief inline comment above the two bare async def functions, e.g.:

# asyncio_mode = auto in pytest.ini — decorator not required
async def test_prompt_pre_fetch_rebuilds_frozen_payload_on_redaction():

H2 · text_scan.rs count semantics change — not addressed

DEVELOPING.md has no mention of the count/min_findings_to_block semantics change. The PR description does not include the required operator-facing upgrade note.

Still needed: Add a section to the PR description and a note to DEVELOPING.md explaining that detect_and_redact now emits one finding per non-overlapping span (down from one per pattern match), and that operators with min_findings_to_block=2 should audit thresholds before upgrading.


M1 · py.import("builtins") inside loop — not addressed

apply_extra_dict_state in object_model.rs (lines 150–151) still imports builtins and fetches object on every loop iteration. apply_object_state (line 134) correctly hoists this before the loop; apply_extra_dict_state should do the same.

Still needed: Hoist both bindings before the for (key, value) in updates.iter() loop.


M2 · Slack regex widening undocumented — not addressed

patterns.rs has no comment citing the Slack token spec that justifies the upper bound of 80. scanner.rs has no boundary tests at 48 and 49 chars.

Still needed: Comment in patterns.rs + two boundary-length regression tests.


M3 · HashSet::new() at every same_safe_value call site — not addressed

All call sites in python_scan.rs still construct a fresh HashSet inline (e.g. lines 400, 440, 462). The values_equal wrapper that was requested has not been introduced.

Still needed: A fn values_equal(left, right) -> PyResult<bool> wrapper that owns the HashSet internally.


M4 · No logging when duplicate gate suppresses a scan — not addressed

None of the seven return Ok(None) sites in serialized_scan_target (lines 327, 335, 347, 371, 392, 395, 414) emit a tracing::debug! or log::debug! message. An operator investigating a missed detection has no signal that the duplicate gate fired.

Still needed: A tracing::debug! at each gate-triggered early return identifying which gate fired.


M5 · inspect_object_state_without_model_dump over-exposed — not addressed

Still declared pub fn at object_model.rs:20. Only used within the crate.

Still needed: Change to pub(crate).


M6 · Dunder guard missing in apply_extra_dict_state — not addressed

apply_extra_dict_state passes key names sourced from __dict__ directly to set_attr_without_hooks with no guard on dunder names. Note that apply_object_state (same file) has the same pattern and would also benefit from the guard for defense-in-depth.

Still needed: Guard before the set_attr_without_hooks call per the suggested snippet in the original review.


T1 · No RootModel + min_findings_to_block=2 end-to-end test — not addressed

helpers.py imports RootModel but no test exercises it end-to-end with min_findings_to_block=2. The serialized_duplicates_rebuild_root gate ("root" key check) is the primary fix for RootModel double-counting and has no integration regression.

Still needed: One Python test: a RootModel containing a secret field, scanned with min_findings_to_block=2, asserting it does not double-count and is handled correctly.


T2 · serialized_result non-string-key guard untested — not addressed

The existing non-string-key Rust tests (duplicate_gate_ignores_non_string_model_dump_keys_without_lookup, scan_container_ignores_duplicate_gate_for_non_string_model_dump_keys) cover the duplicate gate path. The separate guard inside serialized_result (the early return on !dict_has_only_exact_string_keys(redacted_dict)) has no dedicated test and could be silently removed in a future refactor without a test catching it.

Still needed: A targeted Rust unit test for the serialized_result non-string-key guard specifically.


Low-priority: _make_context / _make_config dead aliases — not removed

helpers.py lines 64–69 still define these as wrappers with no behavioral difference from the non-underscore versions, and they are exported via __all__. The boilerplate centralization was done well; removing these two aliases would complete it.


Summary table

ID Status
H1 (asyncio comment) ⚠️ Partial — asyncio_mode=auto confirmed, comment not added
H2 (count semantics docs) ❌ Not addressed
M1 (builtins hoist) ❌ Not addressed
M2 (Slack regex comment + tests) ❌ Not addressed
M3 (HashSet wrapper) ❌ Not addressed
M4 (debug logging) ❌ Not addressed
M5 (pub → pub(crate)) ❌ Not addressed
M6 (dunder guard) ❌ Not addressed
T1 (RootModel e2e test) ❌ Not addressed
T2 (serialized_result guard test) ❌ Not addressed
Low: boilerplate centralization ✅ Done (230fb65)
Low: dead aliases removed ❌ Not addressed

Signed-off-by: lucarlig <luca.carlig@ibm.com>
@lucarlig
Copy link
Copy Markdown
Collaborator Author

lucarlig commented May 1, 2026

@msureshkumar88 I pushed 02195f2 with the follow-ups from your re-review.

Addressed:

  • H1: added an inline note in test_hook_smoke.py that top-level async tests rely on asyncio_mode=auto in plugins/tests/pytest.ini.
  • H2: added the operator-facing count semantics note to the PR description and DEVELOPING.md.
  • M1: hoisted builtins / object lookup out of the apply_extra_dict_state loop.
  • M2: documented Slack variable token lengths with the official Slack token-lengthening changelog link and added 48/49-char boundary tests. I did not claim a fixed Slack max because Slack explicitly says token length is not fixed; the detector keeps the existing conservative 80-char bound.
  • M3: added values_equal() wrapper and removed inline HashSet::new() call sites.
  • M4: added debug logging for duplicate-gate skip paths in serialized_scan_target.
  • M5: changed inspect_object_state_without_model_dump to pub(crate).
  • M6: added a dunder-name guard so reconstructed dunder state stays in the instance dict rather than going through object.__setattr__, plus a regression.
  • T1: added an end-to-end RootModel + min_findings_to_block=2 regression.
  • T2: added a Rust regression for the serialized_result non-string-key guard.
  • Low: removed _make_context / _make_config aliases.
  • Version bumped to 0.2.2 because this changes the core plugin.

Verification:

  • make check-all -> 51 Rust tests passed, clippy clean
  • make install && make test-all -> rebuilt extension, 51 Rust tests passed, 74 Python tests passed
  • git diff --check -> clean

Signed-off-by: lucarlig <luca.carlig@ibm.com>
@msureshkumar88
Copy link
Copy Markdown
Collaborator

Review follow-up — all requested changes confirmed except one test gap

Checked every item from the original review against the current branch state.

HIGH — both resolved ✓

  • H1 — All three async test functions in test_hook_smoke.py now carry @pytest.mark.asyncio, and plugins/tests/pytest.ini sets asyncio_mode = auto as the belt-and-suspenders backup. No false-green risk remains.
  • H2 — PR description now has an "Operator note: secrets finding count semantics" section documenting the overlap-deduplication change and its effect on min_findings_to_block thresholds.

MEDIUM — all resolved ✓

  • M1py.import("builtins") and .getattr("object") are now hoisted before the loop in apply_extra_dict_state (and the other two callers never had a loop issue).
  • M2 — Slack regex widening is documented with a comment linking to the Slack changelog. The new detects_slack_tokens_around_legacy_length_boundary test covers both the 48-char and 49-char boundary cases.
  • M3values_equal public wrapper introduced; all three same_safe_value(&mut HashSet::new()) call sites replaced.
  • M4tracing::debug! emissions added at each return Ok(None) in the duplicate-gate path.
  • M5inspect_object_state_without_model_dump is now pub(crate).
  • M6 — Dunder guard (starts_with("__") && ends_with("__")) added in apply_object_state; dunder keys fall back to raw __dict__ insertion with an explanatory comment.

Testing gaps — mostly resolved ✓

  • T1test_tool_post_invoke_does_not_double_count_root_model in test_plugin_hook_results.py covers RootModel[str] with block_on_detection=True, min_findings_to_block=2, and asserts count == 1, continue_processing is True. Fully addresses the requested end-to-end check.
  • T2serialized_result_returns_non_string_key_dict_without_object_update covers the non-string-key guard in serialized_result.
  • T4 — Shared helpers.py centralises make_context / make_config; all five test files import from it. No per-file boilerplate duplication remains.
  • D1_make_context / _make_config underscore aliases are gone entirely.

One gap still open

  • T3scan_container_returns_original_for_clean_scan_state_only_object covers the pure non-string-key dict path (no model_dump). The combination — non-string-keyed __dict__ entries plus a present-but-clean model_dump() — is not yet tested. In that path pending_scan_state is deferred, the serialized scan finds nothing, and the code falls through to rebuilt.unwrap_or_else(|| container.clone()). Without a test asserting redacted.is(&instance) here, a future refactor of that branch could silently return a copy instead of the original. Suggest adding:
#[test]
fn scan_container_returns_original_for_clean_scan_state_and_clean_serialized_path() {
    // pending_scan_state deferred (scan_state clean, no prior rebuild);
    // serialized path also clean — original object must be returned as-is.
    Python::initialize();
    Python::attach(|py| -> PyResult<()> {
        let code = CString::new(r#"
class BadKey:
    pass

class Model:
    def __init__(self):
        self.text = "clean"
        self.__dict__[BadKey()] = "side-channel"

    def model_dump(self):
        return {"text": "also clean"}
"#).unwrap();
        let module = PyModule::from_code(py, code.as_c_str(), c"test_module.py", c"test_module")?;
        let instance = module.getattr("Model")?.call0()?;
        let config = SecretsDetectionConfig {
            redact: true,
            redaction_text: "[REDACTED]".to_string(),
            ..Default::default()
        };

        let (count, redacted, findings) = scan_container(py, &instance, &config)?;

        assert_eq!(count, 0);
        assert_eq!(findings.len(), 0);
        assert!(redacted.is(&instance));
        Ok(())
    }).unwrap();
}

Everything else looks good — happy to approve once T3 is added.

Signed-off-by: lucarlig <luca.carlig@ibm.com>
@lucarlig
Copy link
Copy Markdown
Collaborator Author

lucarlig commented May 1, 2026

Addressed T3 in 6bb4255.

Added scan_container_returns_original_for_clean_scan_state_and_clean_serialized_path, covering the mixed path with non-string-keyed __dict__ scan state plus a present-but-clean model_dump(). It asserts zero findings and redacted.is(&instance) so the clean path keeps object identity.

Verification:

  • cargo test -p secrets_detection scan_container_returns_original_for_clean_scan_state_and_clean_serialized_path -- --nocapture
  • cargo fmt --check -p secrets_detection
  • git diff --check

Copy link
Copy Markdown
Collaborator

@msureshkumar88 msureshkumar88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final review — all items resolved ✓

Verified 6bb4255 against the remaining open item.

T3scan_container_returns_original_for_clean_scan_state_and_clean_serialized_path is present and covers exactly the path I flagged: non-string-keyed __dict__ entries alongside a present-but-clean model_dump(), asserting zero findings and redacted.is(&instance) for object identity. The clean-path branch is now locked against silent regression.

Full resolution summary:

ID Status
H1 (asyncio comment)
H2 (count semantics docs)
M1 (builtins hoist)
M2 (Slack regex comment + boundary tests)
M3 (HashSet wrapper)
M4 (debug logging)
M5 (pub → pub(crate))
M6 (dunder guard)
T1 (RootModel e2e test)
T2 (serialized_result guard test)
T3 (clean scan state + clean serialized path)
Low: boilerplate centralization
Low: dead aliases removed

Approved. Ready to merge.

@msureshkumar88
Copy link
Copy Markdown
Collaborator

🤖 Automated PR #65 Verification Report

Status: ✅ All checks passed
Timestamp: 2026-05-01 09:25:59 UTC
Test Duration: ~45 seconds


📊 Test Results Summary

Test Suite Status Details
Rust Unit Tests ✅ PASS 52 tests passed
Python Integration Tests ✅ PASS 74 tests passed
Regression Test Coverage ✅ PASS 3 double-count tests + 20+ model_dump tests
Code Implementation ✅ PASS Overlap deduplication verified

🔍 Detailed Verification

1. ✅ Rust Test Suite (52 tests)

All Rust unit tests pass, including:

  • scan_container_does_not_double_count_cyclic_model_dump_secret
  • scan_container_does_not_double_count_matching_model_dump_dict
  • scan_container_does_not_double_count_matching_model_dump_list
  • scan_container_does_not_double_count_with_copied_model_dump_scalar
  • scan_container_does_not_double_count_duplicate_root_serialized_state
  • overlapping_broad_match_keeps_specific_finding_type

Key Test: overlapping_broad_match_keeps_specific_finding_type verifies that overlapping patterns report only the most specific detector type once.

2. ✅ Python Integration Tests (74 tests)

All Python integration tests pass, including critical regression tests:

Double-Count Prevention Tests:

  • test_tool_post_invoke_does_not_double_count_model_dump_fields
  • test_tool_post_invoke_does_not_double_count_model_dump_list_fields
  • test_tool_post_invoke_does_not_double_count_root_model

Model Dump Detection Tests:

  • test_tool_post_invoke_blocks_secret_exposed_only_by_model_dump
  • test_tool_post_invoke_redacts_secret_exposed_only_by_model_dump
  • test_tool_post_invoke_redacts_secret_exposed_only_by_root_model_dump
  • test_scan_container_detects_secret_exposed_only_by_model_dump
  • test_scan_container_detects_secret_when_model_dump_key_overlaps_internal_state
  • test_scan_container_redacts_secret_exposed_only_by_root_model_dump

3. ✅ Regression Test Coverage

Found 23 test methods explicitly covering model_dump scenarios across:

  • test_plugin_hook_results.py (6 tests)
  • test_public_rust_api.py (17 tests)

These tests ensure the duplicate detection gate works correctly for:

  • Scalar fields
  • List fields
  • Dict fields
  • RootModel serialization
  • Nested same-type wrappers
  • Recursive model_dump scenarios

4. ✅ Code Implementation Verified

  • Duplicate Detection Gate: Implemented in Rust scanner to prevent double-counting
  • Overlap Deduplication: Test overlapping_broad_match_keeps_specific_finding_type confirms one finding per non-overlapping span
  • Safe Comparison: Guards for non-string dict keys to prevent lookup errors

🎯 Key Fixes Confirmed

  1. ✅ Pydantic Double-Count Bug Fixed

    • Secrets in both __dict__ and model_dump() are now counted only once
    • Conservative duplicate detection prevents false positives
    • Safe for non-string dict keys
  2. min_findings_to_block Threshold Works Correctly

    • Single secret no longer triggers min_findings_to_block=2
    • Multiple distinct secrets correctly trigger blocking
    • RootModel serialization handled properly
  3. ✅ Overlap Deduplication Implemented

    • Reports one finding per non-overlapping secret span
    • Most specific detector type is preferred
    • Prevents multiple matches for same bytes
  4. ✅ No Regressions

    • All 52 Rust tests pass
    • All 74 Python integration tests pass
    • Comprehensive test coverage added

⚠️ Important Operator Note

Breaking Behavior Change: The secrets finding count semantics have changed. The scanner now reports one finding per non-overlapping secret span instead of counting every pattern match separately.

Impact: If multiple patterns match overlapping bytes, only the most specific detector type is reported once. Operators using min_findings_to_block > 1 should audit their thresholds before upgrading, as blocking behavior may differ.


✅ Final Recommendation

This PR is ready to merge. All automated verification checks have passed:

  • ✅ Core bug fix verified (no double-counting)
  • ✅ Comprehensive test coverage (126 total tests)
  • ✅ No regressions introduced
  • ✅ Overlap deduplication working as designed
  • ✅ Safe handling of edge cases (non-string keys, cycles, etc.)

The fix successfully addresses the issue where Pydantic objects caused double-counting of secrets through both object state and model_dump() serialization paths.


Generated by automated verification script
Script: verify_pr65_fix.sh
Repository: IBM/cpex-plugins

@msureshkumar88
Copy link
Copy Markdown
Collaborator

msureshkumar88 commented May 1, 2026

🤖 Automated Verification Report for PR #65

I've created and run an automated verification script to validate the model dump counting fix. All tests passed successfully!

Verification Script

Created: plugins/rust/python-package/secrets_detection/verify_model_dump_counting.py

This script comprehensively tests the double-counting bug fix with 6 automated test cases.

Test Results

Test Status Description
Pydantic Double-Count Fix ✅ PASS Correctly found 1 secret (not double-counted). Type: aws_access_key_id
Single Secret Below Threshold ✅ PASS Single secret correctly found (count=1, below threshold=2) - Would NOT trigger blocking
min_findings_to_block Threshold ✅ PASS Correctly found 2 distinct secrets (aws_access_key_id, github_token), threshold would trigger
Overlap Deduplication ✅ PASS Correctly deduplicated overlapping patterns to 1 finding for Slack token
RootModel Scenario ✅ PASS RootModel correctly reports 1 finding (not double-counted)
Model Serializer Secret Detection ✅ PASS Correctly detected secret exposed via model_serializer

What Was Verified

  1. Pydantic BaseModel objects no longer double-count secrets - A single secret in a Pydantic model's field is now counted once, not twice (once from __dict__ and once from model_dump())

  2. min_findings_to_block=2 threshold works correctly - With one secret, count=1 (below threshold). With two distinct secrets, count=2 (meets threshold)

  3. Overlap deduplication is working - Overlapping pattern matches (e.g., Slack token matching both specific and generic patterns) now report one finding per non-overlapping span

  4. RootModel scenarios work correctly - Pydantic RootModel objects also don't double-count secrets

  5. Secrets exposed only via model_dump() are detected - Custom serializers that expose secrets are properly scanned

Test Coverage

The verification script uses actual secret patterns from patterns.rs:

  • AWS Access Key ID: AKIA[0-9A-Z]{16}
  • GitHub Token: ghp_[A-Za-z0-9]{36}
  • Slack Token: xoxb-[0-9A-Za-z-]{10,80}

### Conclusion

The automated verification confirms that **the double-counting bug has been completely fixed**. The model dump counting is now correct according to PR #65 requirements:

- Pydantic objects are scanned correctly without duplication
- The `min_findings_to_block` threshold works as expected
- Overlap deduplication prevents false positives
- All edge cases (RootModel, custom serializers) are handled properly

**Recommendation:** This PR is ready to merge from a functional correctness perspective. The fix successfully addresses the core issue described in the PR.

---

**Note:** This automated verification script can be integrated into CI/CD pipelines to prevent regression of this bug in future changes.

@msureshkumar88 msureshkumar88 merged commit 9c3c326 into main May 1, 2026
44 checks passed
@msureshkumar88 msureshkumar88 deleted the fix/secrets-detector-model-dump-count branch May 1, 2026 09:46
@ja8zyjits
Copy link
Copy Markdown
Member

Thanks @lucarlig and @msureshkumar88 for quick resolution.

Regards,
Jitesh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants