Fix PII result copy regression by lucarlig · Pull Request #89 · IBM/cpex-plugins

lucarlig · 2026-05-08T13:51:44Z

Summary

process mapping-like payloads through the Python mapping protocol so CPEX copy-on-write wrappers are scanned correctly
add a real CPEX policy/isolation regression test for tool_post_invoke result masking
bump pii_filter from 0.3.1 to 0.3.2

Root Cause

CPEX hook payload policies now enforce which top-level fields plugins may mutate. For tool_post_invoke, policy allows result, so PII masking should survive if the plugin returns a modified result.

When policy is active, CPEX isolates payloads with copy-on-write mapping wrappers. The Rust PII scanner treated mapping input as a raw PyDict, which missed entries exposed only through the wrapper protocol. The plugin then returned modified_payload=None, so apply_policy() had no allowed result update to merge and gateway kept the original tool result.

Fix

The PII scanner now handles Python mappings through the mapping protocol instead of only reading raw PyDict storage. This makes CPEX copy-on-write wrappers visible to nested scanning and causes PII masking to return a policy-visible modified payload.

The new regression test uses real CPEX isolation and apply_policy() so it catches the gateway-enforcement path, not only direct plugin method behavior.

Follow-Up

This PR fixes the PII filter case only. Issue #90 tracks the broader audit needed for all plugins that mutate hook payloads under CPEX policy enforcement.

Validation

make test-unit
make test-integration
Gateway reproduction with latest mcp-context-forge main and local cpex-pii-filter: modified True, filtered True, Contact [REDACTED]

Note

Run the standard detailed code review workflow before marking this PR ready for review.

Signed-off-by: lucarlig <luca.carlig@ibm.com>

gandhipratik203

What's good

Real bug, clear root-cause story. The PyDict::iter() → PyMapping::items() swap is exactly the right level. CPEX's CopyOnWriteDict wrapper inherits from dict, so the C-level cast::<PyDict>() succeeds, but dict.iter() only walks the underlying PyDict storage table — which is empty for an unmodified COW wrapper because everything lives in self._original. Going through the mapping protocol picks up items() from the subclass. Surgical fix.
Test coverage is layered well. Rust unit tests pin the max_collection_items boundary (allows at limit, rejects over) against the new mapping.len() path. Python integration tests cover the actual user-visible behavior at two levels — direct plugin call on MCP-shaped content, and end-to-end through wrap_payload_for_isolation + apply_policy({"result"}). The second test is the real regression sentinel; without it this exact bug could come back silently.
Scoping is honest. The description explicitly calls out that this only fixes the PII filter and that #90 tracks the broader audit for other plugins under policy enforcement. Avoids overreach.
Manual end-to-end check noted ("Gateway reproduction with latest mcp-context-forge main and local cpex-pii-filter: modified True, filtered True, Contact [REDACTED]"). Good given the stub-vs-real-CPEX caveat below.
Version bump is appropriate (patch bump on Cargo.toml, plugin-manifest, and Cargo.lock).

Findings

F1 (low) — Mapping protocol overhead vs raw PyDict iteration on a hot path.
mapping.items()?.iter() allocates a list of tuples and dispatches each entry through cast::<PyTuple>()? + two get_item(...)? calls, where the old dict.iter() had a C-level fast path with no allocation. For typical tool results (a few dozen entries) this is almost certainly noise, but tool_post_invoke runs on every successful tool call, so worth measuring once on a representative payload — particularly nested content arrays — to make sure the safety win doesn't show up as a p99 regression. Not a blocker; just worth a criterion data point if you have one handy or want to add one.

F2 (low) — Test-stub CopyOnWriteDict differs slightly from "real" CPEX.
The stub in plugins/tests/plugin_hooks.py doesn't override __contains__, so if key in self falls back to the underlying empty dict storage — works for the current tests because they only read originals, but it's fragile if a future test writes a new key via super().__setitem__ (the override of __getitem__ would route correctly, but key in self would still miss it). Manual validation against real CPEX in the gateway covers this for the bug-fix path; just flagging that the stub is not a full COW. Worth a one-line comment in the stub noting "minimal stub for these tests, not a full COW".

F3 (nit) — Could the Rust unit tests cover a non-PyDict mapping directly?
The new Rust tests construct PyDict and exercise the mapping.len()/items() paths through it. That validates the boundary, but doesn't actually exercise the bug scenario (a mapping whose underlying PyDict storage is empty). The Python integration test is what really pins the regression — but if you want belt-and-braces in Rust, a small test that builds a CopyOnWriteDict-style wrapper in Python and feeds it to process_nested would lock the Rust side too. Optional.

Verdict

Solid PR. F1 is the only one I'd be tempted to follow up on, and only if it turns out the hot path is hot enough to care. F2/F3 are tidiness asks.

msureshkumar88

Review: Approve with Findings

Core fix is correct — switching to PyMapping protocol is the right call for CPEX copy-on-write wrappers.

Findings (non-blocking)

Rust test gap: Both new Rust tests (allows_collection_limit_boundary, rejects_over_collection_limit) pass plain PyDict, not a dict subclass with hidden entries. They test boundary logic but don't exercise the actual regression path. A follow-up test creating a Python dict subclass with overridden __iter__ would close this gap.

CopyOnWriteDict missing __contains__: The test stub doesn't override __contains__, so key in cow_dict falls through to C-level dict which checks only empty underlying storage — always returns False for keys from _original. Doesn't break current tests, but fragile for future ones.

apply_policy equality: new_value == old_value works coincidentally because CopyOnWriteDict compares as {} via dict.__eq__. Implementation-detail dependency, not a semantic guarantee.

Failing CI check: 34/35 pass — what's failing? Confirm it's unrelated before merge.

mapping.items() materializes to PyList: Minor extra allocation vs lazy PyDict::iter(). Acceptable given the max_collection_items guard.

Strengths

test_tool_post_invoke_survives_cpex_policy_with_isolated_payload correctly exercises the full gateway enforcement path
Version bump hits all three required locations
Follow-up issue #90 scoped correctly for broader audit

lucarlig mentioned this pull request May 8, 2026

Audit plugin mutations under CPEX hook payload policy enforcement #90

Open

lucarlig force-pushed the fix/pii-result-copy branch from 3db74a4 to 34056e9 Compare May 8, 2026 13:57

Fix PII result copy regression

682905a

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig force-pushed the fix/pii-result-copy branch from 34056e9 to 682905a Compare May 8, 2026 14:00

lucarlig marked this pull request as ready for review May 8, 2026 14:16

lucarlig requested review from gandhipratik203 and msureshkumar88 as code owners May 8, 2026 14:16

gandhipratik203 approved these changes May 8, 2026

View reviewed changes

msureshkumar88 approved these changes May 8, 2026

View reviewed changes

msureshkumar88 merged commit 5077ff9 into main May 8, 2026
35 checks passed

msureshkumar88 deleted the fix/pii-result-copy branch May 8, 2026 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix PII result copy regression#89

Fix PII result copy regression#89
msureshkumar88 merged 1 commit intomainfrom
fix/pii-result-copy

lucarlig commented May 8, 2026 •

edited

Loading

Uh oh!

gandhipratik203 left a comment

Uh oh!

msureshkumar88 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lucarlig commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Fix

Follow-Up

Validation

Note

Uh oh!

gandhipratik203 left a comment

Choose a reason for hiding this comment

What's good

Findings

Verdict

Uh oh!

msureshkumar88 left a comment

Choose a reason for hiding this comment

Review: Approve with Findings

Findings (non-blocking)

Strengths

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lucarlig commented May 8, 2026 •

edited

Loading