Add secrets detection tool pre-invoke hook#94
Conversation
f6f82a4 to
ff17b39
Compare
Signed-off-by: lucarlig <luca.carlig@ibm.com>
ff17b39 to
ba6ff62
Compare
Signed-off-by: lucarlig <luca.carlig@ibm.com>
msureshkumar88
left a comment
There was a problem hiding this comment.
Review findings
1. Double-scan on dirty path — document the tradeoff (plugin.rs scan_payload_attr)
When redact=true and secrets exist, the payload is scanned twice:
let (mut count, mut findings) = scan_container_findings(py, &value, &self.config)?; // scan 1
if has_findings(count) {
let (redacted_count, redacted, redacted_findings) =
scan_container(py, &value, &self.config)?; // scan 2Old code scanned once. This trades dirty-path performance for clean-path allocation savings. Reasonable only if clean payloads dominate — add a one-line comment making the intent explicit, e.g.:
// Two-pass: skip copy allocation on clean payloads; dirty path (secrets present) scans twice.2. Unusual bool-tuple pattern match (plugin.rs)
if let (true, true) = (self.config.redact, has_findings(count)) {Idiomatic Rust: if self.config.redact && has_findings(count). No bug, but the tuple form is surprising. Appears in both scan_payload_attr and resource_post_fetch.
3. Missing clean-path test for tool_pre_invoke
No test covers tool_pre_invoke with a payload containing no secrets. All other hooks have a clean-path case. Add one asserting continue_processing=True, violation=None, modified_payload=None.
Not blocking (follow-up OK):
resource_post_fetchstill uses old single-scan path — doesn't benefit from thescan_container_findingsclean-path optimisation. Consistent with the refactor goal but worth a TODO.has_findings(count)wrapscount != 0— trivial helper adds indirection without clarity gain;count > 0inline is equally readable.
Signed-off-by: lucarlig <luca.carlig@ibm.com>
|
Addressed review Changes:
Verified locally: cargo fmt --check -p secrets_detection
cargo nextest run -p secrets_detection tool_pre_invoke
cargo nextest run -p secrets_detection
make test-integration
make check-all |
Review follow-up — all requested changes verifiedRequested changes: done
Clean-path test asserts New issues from the fix: none
Still open (pre-existing, not-blocking per original review)
Approving. |
Signed-off-by: lucarlig <luca.carlig@ibm.com>
|
@msureshkumar88 the requested changes from review Summary:
Current verification: make check-all
# 62 Rust tests passed
make test-integration
# 82 passed
PYO3_PYTHON=$(command -v python3) cargo mutants -p secrets_detection --in-diff cargo-mutants.diff
# 243 mutants tested, 23 caught, 220 unviable, 0 missedCould you re-review or submit the approval review? GitHub still shows |
msureshkumar88
left a comment
There was a problem hiding this comment.
Good progress on sharing the hook result construction across all four hooks — the refactor is clean and the two-pass comment (addressing the previous review) reads well.
One blocking issue surfaced during integration testing against a live cpex gateway that needs to be resolved before this ships.
Blocking: cast::<PyDict>() rejects Python dict subclasses — secrets go undetected
What's happening
At runtime, the cpex framework's _isolate_payload wraps ToolPreInvokePayload.args in a CopyOnWriteDict (a dict subclass, not a bare dict) before dispatching to plugins. In PyO3 0.28, Bound<PyAny>::cast::<PyDict>() is a raw pointer cast that only succeeds for exact PyDict instances — it rejects subclasses. So when scan_findings_inner (and the existing scan_container_inner) encounters a CopyOnWriteDict, the dict branch is skipped entirely and the scan returns zero findings. No violation is raised, and secrets pass through silently.
This affects tool_pre_invoke specifically because it's the first hook where the framework applies the copy-on-write isolation. prompt_pre_fetch was already affected (pre-existing), and this PR inherits that behaviour in scan_container_inner without addressing it.
Confirmed in production
We reproduced this end-to-end: an AWS_ACCESS_KEY_ID=AKIA... payload passed through the installed 0.3.2 wheel without triggering a violation. Adding type(payload.args) is not dict → model_copy(args=dict(payload.args)) in the Python shim restored correct blocking. The Rust tests pass because they construct payloads with a plain-Python class ToolPayload using a bare dict — the framework's CopyOnWriteDict is never exercised in the test harness.
Fix
Replace cast with downcast at the three container-type checks in scan_findings_inner (new in this PR) and the three matching checks in scan_container_inner (pre-existing, same file):
// Before
if let Ok(dict) = container.cast::<PyDict>() {
// After
if let Ok(dict) = container.downcast::<PyDict>() {downcast calls Python's isinstance() internally, so it accepts dict subclasses. The same applies to PyList and PyTuple casts in both functions — worth auditing those too, though CopyOnWriteDict is the one the framework actually uses today.
Test gap
The in-process Rust tests use a stub ToolPayload with a plain dict. A test that passes a CopyOnWriteDict (or any dict subclass) as args and asserts the secret is still detected would have caught this. Worth adding one that imports or mocks CopyOnWriteDict — or at minimum uses type('CowDict', (dict,), {})({'message': 'AKIA...'}) as a minimal subclass to verify the downcast path.
Everything else (two-pass tradeoff, clean-path allocation skip, the follow-up clean-path test) looks good. Happy to iterate quickly on this one.
Signed-off-by: lucarlig <luca.carlig@ibm.com>
|
@msureshkumar88 confirmed the blocking behavior is real and fixed it in One detail: Fix:
Coverage added:
Verified: make install
make check-all
# 64 Rust tests passed
make test-integration
# 83 passed
PYO3_PYTHON=$(command -v python3) cargo mutants -p secrets_detection --in-diff cargo-mutants.diff
# 344 mutants tested, 27 caught, 317 unviable, 0 missed |
msureshkumar88
left a comment
There was a problem hiding this comment.
Requested changes addressed in df70f11 — approving.
The fix is right. The clarification on the actual failure mode is helpful: PyDict_Check (not PyDict_CheckExact) does accept the subclass, but PyO3's fast PyDict iterator reads the underlying C-level dict state, which is empty for CopyOnWriteDict since the subclass stores values in a shadow layer and exposes them only through items(). The two-branch pattern — exact instance on the fast path, isinstance + call_method0("items") for subclasses — handles both correctly and avoids introducing Python-level overhead for the common plain-dict case.
Regression coverage (Rust detection + redaction for dict-subclass shape, Python tool_pre_invoke integration with CopyOnWriteDict args) closes the test gap. CI is still running on the new commit; no failures so far, and the earlier run was 34/35 passed.
Summary
tool_pre_invoketo the secrets detection manifest, Python shim, and Rust coreToolPreInvokePayload.argsbefore tool execution and block/redact with existing secrets violation behaviorcpex-secrets-detectionto 0.3.2Fixes #93
Tests
args, block+redact coverage, and clean nested container identity54 passedToolPreInvokePayload(args=...)blocked withSECRETS_DETECTEDmake installmake test-integration: 81 passedmake check-all: fmt-check, clippy, and 52 Rust tests passedmake verify-stubsReview follow-up
Addressed detailed review findings: canonical
argscontract, block+redact coverage, clean-path scanner behavior, and duplicated hook result construction.