feat(plugin): Rust acceleration for output length guard by gandhipratik203 · Pull Request #3926 · IBM/mcp-context-forge

gandhipratik203 · 2026-03-30T13:25:17Z

Summary

Adds a PyO3-based Rust execution engine for the output length guard plugin, carrying forward the Python-side v1.0.0 work from #3841 (token budgets, word-boundary truncation, recursive structuredContent processing, block/truncate strategies) and extending it with a high-performance Rust hot path for container processing.

The plugin remains intentionally hybrid: Python still owns lifecycle, hook integration, MCP content dict handling (structuredContent priority logic and content regeneration), and fallback behavior, while Rust now handles string truncation, recursive list/dict traversal, violation detection, and passthrough short-circuiting on the hot path.

The Rust engine exposes a high-level process() API that reduces the Python-Rust boundary to a single call per tool_post_invoke invocation. This keeps the existing plugin integration model intact while reducing request-path overhead for the common container shapes (strings, lists, nested dicts).

Development followed a TDD approach: the 331 existing Python tests served as the behavioral contract, with 47 mirrored Rust #[test]s written first (red), then implemented (green), then validated against the full Python suite as the acceptance gate.

Gaps closed

Gap 1 (MEDIUM) — No Rust acceleration path: the output length guard was the only post-invoke plugin without a Rust option, while exfil detection, PII filter, secrets detection, and URL reputation all had Rust engines. Fixed by introducing OutputLengthGuardEngine with a process() method that handles str, list, dict, and nested structures in a single FFI call. The Python plugin auto-detects the engine at init and delegates when available, falling back to pure Python otherwise.

Gap 2 (MEDIUM) — O(n) character counting on large strings: the Python implementation uses len() (O(1) for code points) but the initial Rust port used chars().count() which walks the entire UTF-8 string. For a 1MB string with a 500-char limit, this was 124x slower than Python. Fixed by introducing count_chars_capped() which stops counting once the limit is exceeded — O(min(n, limit)) — plus a byte-length fast path that skips char counting entirely for ASCII strings under the limit.
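
The early-exit idea can be illustrated in Python over raw UTF-8 bytes. This is only a sketch of the technique, not the PR's Rust code: the name count_chars_capped comes from the description above, while the signature, the helper exceeds_limit, and the "return limit + 1 when over" convention are assumptions made for illustration.

```python
def count_chars_capped(data: bytes, limit: int) -> int:
    """Count code points in UTF-8 `data`, giving up once the count passes `limit`.

    Returns the exact count when it is <= limit, otherwise limit + 1.
    (Assumed return convention, not taken from the PR.)
    """
    count = 0
    for byte in data:
        # Bytes of the form 0b10xxxxxx are UTF-8 continuation bytes;
        # every other byte starts a new code point.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count > limit:
                return count  # early exit: at most limit + 1 code points visited
    return count


def exceeds_limit(data: bytes, limit: int) -> bool:
    """Hypothetical wrapper that adds the byte-length fast path."""
    # Each code point needs at least one byte, so a buffer of `limit` bytes
    # or fewer can never hold more than `limit` code points.
    if len(data) <= limit:
        return False
    return count_chars_capped(data, limit) > limit
```

With a 500-character limit, a 1 MB ASCII buffer is abandoned after 501 bytes instead of being scanned in full, which is the O(min(n, limit)) behavior described above.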

Gap 3 (LOW) — Per-item FFI overhead on list traversal: the initial Rust implementation processed each list item through a full process_container recursive call with per-item path string formatting and interleaved PyList::append. Fixed by adding a batch fast path for all-string lists in truncate mode: borrow all &str via to_str() in one pass, process in a tight Rust loop, build the output PyList in a single pass. This improved short-list passthrough from 10x to 19x faster.
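
The shape of the batch fast path (check the list once, process it in one tight loop, build the output in one pass) can be sketched in Python. The function names, the return convention, and the rule that the ellipsis counts toward max_chars are assumptions for illustration; the real logic lives in Rust and is not reproduced here.

```python
from typing import Any, List, Optional, Tuple

ELLIPSIS = "…"  # assumed truncation marker


def truncate_text(text: str, max_chars: int, ellipsis: str = ELLIPSIS) -> str:
    """Truncate `text` so the result, ellipsis included, fits in `max_chars`."""
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(ellipsis), 0)
    return text[:keep] + ellipsis


def process_string_list(
    items: List[Any], max_chars: int
) -> Optional[Tuple[List[str], bool]]:
    """Batch path for all-string lists.

    Returns None when the list contains a non-string item (the caller would
    then use the generic recursive walker); otherwise returns the output list
    and a flag saying whether anything was truncated.
    """
    # Pass 1: the fast path only applies when every item is a string.
    if not all(isinstance(item, str) for item in items):
        return None
    # Pass 2: one tight loop with no per-item path formatting or bookkeeping.
    output = [truncate_text(item, max_chars) for item in items]
    return output, output != items
```

Returning the "modified" flag alongside the list lets the caller report an unchanged result without comparing the payloads a second time.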

Additional hardening

O(1) Python len() pre-check — uses PyAny::len() on the Python string object before any Rust extraction; strings under the limit skip extraction entirely regardless of strategy
Zero-copy PyString::to_str() borrow — replaces extract::<String>() (full copy) with cast::<PyString>().to_str() (zero-copy borrow from Python's UTF-8 cache) for the string leaf path
count_chars_capped() early-exit — counts chars up to limit + 1 then stops; includes byte-length fast path for ASCII where byte_len == char_count
byte_offset_of_char() direct slicing — replaces .chars().take(n).collect::<String>() with &value[..byte_offset] for zero-copy truncation
String::with_capacity() pre-sized allocation — eliminates reallocation during truncated + ellipsis by pre-computing exact buffer size
Batch list processing — for all-string lists in truncate mode, extracts all &str borrows in one pass, processes in a tight loop, builds output in one shot (better cache locality, fewer interleaved Python API calls)
Numeric string skip for long strings: skips the is_numeric_string() check for strings longer than 50 bytes, on the assumption that no numeric value the plugin needs to recognize is that long
MCP content dict exclusion — Rust fast path skips dicts with a content key, preserving Python-side structuredContent priority logic and content regeneration
Python fallback preserved — if the Rust engine is unavailable or fails at init/runtime, the plugin falls through to the existing Python implementation with a warning log
_RUST_AVAILABLE import guard — defensive try/except ImportError + generic except Exception with logging, matching the pattern used by exfil detection and secrets detection plugins

Architecture

The plugin is intentionally hybrid:

Python owns plugin lifecycle, hook integration, config validation, MCP content dict processing (structuredContent priority), and fallback behavior
Rust owns string truncation, recursive container traversal, violation detection, and passthrough short-circuiting
The Rust engine parses config once at init (no per-request parsing)

Plugin internals: request flow, Rust fast path, optimization layers, and fallback

┌──────────────────────────────────────────────────────────────────────┐
│                    OutputLengthGuardPlugin                           │
│                                                                      │
│  Hook:  tool_post_invoke                                            │
│         (tool name + result payload)                                │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              │  Python responsibilities:
                              │  - validate config (at init)
                              │  - extract payload.result
                              │  - route to Rust or Python path
                              ▼
                   ┌─────────────────────┐
                   │  Is Rust available?  │
                   │  Is result NOT an   │
                   │  MCP content dict?  │
                   └─────┬─────────┬─────┘
                    yes  │         │  no
                         ▼         ▼
        ┌────────────────────┐  ┌──────────────────────────────┐
        │  Rust fast path    │  │  Python fallback             │
        │                    │  │                              │
        │  engine.process()  │  │  handle_text() per string   │
        │  single FFI call   │  │  _process_structured_data() │
        │                    │  │  structuredContent priority  │
        └────────┬───────────┘  │  content regeneration       │
                 │              └──────────────┬───────────────┘
                 ▼                             ▼
        ┌────────────────────────────────────────────────┐
        │              Rust Engine Layers                 │
        │                                                │
        │  Layer 1: O(1) PyAny::len() pre-check         │
        │           → skip entirely if under limit       │
        │                                                │
        │  Layer 2: PyString::to_str() zero-copy borrow  │
        │           → no String allocation               │
        │                                                │
        │  Layer 3: count_chars_capped() O(limit)        │
        │           → byte-length fast path for ASCII    │
        │                                                │
        │  Layer 4: truncate() with byte_offset slicing  │
        │           → pre-sized String::with_capacity()  │
        │                                                │
        │  Layer 5: batch list processing                │
        │           → borrow all, process all, build all │
        └────────────────────────────────────────────────┘
                              │
                              ▼
        ┌──────────────────────────────────────────────────┐
        │              Python result dispatch               │
        │                                                   │
        │  unchanged → ToolPostInvokeResult(metadata)      │
        │  modified  → ToolPostInvokeResult(modified_payload)│
        │  violation → ToolPostInvokeResult(violation)      │
        └──────────────────────────────────────────────────┘
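
The routing decision at the top of the diagram can be sketched as a small Python function. This is an illustration under assumptions: the single-argument process() call, the helper names, and the fallback callable are hypothetical and stand in for the plugin's real methods.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def is_mcp_content_dict(result: Any) -> bool:
    """True for dicts with a top-level "content" key, which stay on the Python path."""
    return isinstance(result, dict) and "content" in result


def route(
    result: Any,
    rust_engine: Optional[Any],
    python_fallback: Callable[[Any], Any],
) -> Any:
    """Send `result` to the Rust engine when possible, otherwise to Python."""
    if rust_engine is not None and not is_mcp_content_dict(result):
        try:
            # One call across the Python/Rust boundary per hook invocation.
            return rust_engine.process(result)
        except Exception as exc:
            # A runtime failure must not break the request: log and fall through.
            logger.warning("Rust path failed, using Python fallback: %s", exc)
    return python_fallback(result)
```

Usage with stand-in objects:

```python
class FakeEngine:
    def process(self, result):
        return ("rust", result)

print(route("hello", FakeEngine(), lambda r: ("python", r)))
print(route({"content": []}, FakeEngine(), lambda r: ("python", r)))
```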

Test results

Test results summary

#	Area	Result
1	Rust unit tests	`47/47` passed, clippy clean (`-D warnings`), rustfmt clean
2	Python test suite	`331` passed, `1` expected skip, `0` failures
3	Performance comparison	Up to `19x` faster (passthrough), `3-8.5x` faster (containers)

1. Rust unit tests (cargo test)

47 tests covering all pure functions: is_numeric_string, estimate_tokens, find_word_boundary, find_token_cut_point, truncate (character mode, token mode, word boundary, unicode, edge cases), and mode segregation.

2. Python test suite (331 tests)

The full existing Python test suite runs with the Rust engine active. The Rust path is exercised transparently for str, list, dict, and nested structures. MCP content dict tests exercise the Python fallback path. All 331 tests pass with zero modifications to the test file.

3. Performance comparison

Measured with compare_performance.py — 1000 iterations + 50 warmup, character mode (max_chars=500).
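
For readers who want to reproduce numbers of this kind, a warmup-plus-timed-loop measurement can look like the sketch below. It is not the contents of compare_performance.py; the function names and the plain-slice Python baseline are assumptions for illustration.

```python
import time
from typing import Callable


def time_per_call_us(
    fn: Callable[[], object], iterations: int = 1000, warmup: int = 50
) -> float:
    """Return the mean wall-clock time of fn() in microseconds."""
    # Warmup runs are discarded so one-time costs (caches, lazy imports)
    # do not distort the measurement.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def python_truncate(text: str, max_chars: int = 500) -> str:
    """Baseline character-mode truncation: O(1) len() plus a slice."""
    return text if len(text) <= max_chars else text[:max_chars]


if __name__ == "__main__":
    payload = "x" * 10_240  # roughly a 10 KB string
    print(f"{time_per_call_us(lambda: python_truncate(payload)):.2f} us per call")
```

Sub-microsecond results from a loop like this include the cost of the lambda call itself, so differences of a few tenths of a microsecond should be read with care.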

Full benchmark results

Scenario	Python	Rust	Speedup
Short list passthrough (4 items)	2.88 us	0.15 us	18.9x faster
Short string passthrough (11 chars)	0.62 us	0.06 us	9.8x faster
Wide nested dict (d=2, b=20, 400 leaves)	651 us	76 us	8.5x faster
Deep nested dict (d=5, b=3, 243 leaves)	426 us	61 us	7.0x faster
Block mode (10 KB string)	10.4 us	2.0 us	5.1x faster
List of 10 x 10KB strings	105 us	35 us	3.0x faster
Block mode (1 KB string)	3.2 us	2.0 us	1.6x faster
Shallow nested dict (d=2, b=5, 25 leaves)	63 us	92 us	1.5x slower
List of 10 x 1KB strings (all truncated)	21 us	35 us	1.7x slower
Single string truncation (1KB-1MB)	0.2 us	2.4 us	~11x slower

Key findings:

Container traversal (lists, nested dicts) and passthrough are the primary win scenarios — these are the most common production paths
Single string truncation has an irreducible ~2.4us constant FFI overhead (PyO3 function dispatch + UTF-8 validation) regardless of input size; Python's len() + s[:500] is ~0.2us because len() is O(1) and slicing is a single C-level memcpy
Lists of small strings that all need truncation are slightly slower because each truncated item requires a PyString::new() allocation (~1us per item from CPython's allocator)
Dropping the abi3 stable ABI would give access to PyUnicode_GET_LENGTH (O(1) char count) and PyString::data() (raw UCS-1 access for ASCII), which would close the remaining gap — left as a future optimization when binary compatibility constraints allow

Files changed

File	Change
`plugins_rust/output_length_guard/Cargo.toml`	New — Rust crate config (PyO3 + abi3-py311)
`plugins_rust/output_length_guard/pyproject.toml`	New — maturin build config
`plugins_rust/output_length_guard/Makefile`	New — build/test/install/coverage targets
`plugins_rust/output_length_guard/src/lib.rs`	New — core implementation (1297 lines) + 47 tests
`plugins_rust/output_length_guard/src/bin/stub_gen.rs`	New — Python type stub generator
`plugins_rust/output_length_guard/compare_performance.py`	New — Python vs Rust benchmark (7 scenario groups)
`plugins/output_length_guard/output_length_guard.py`	Modified — Rust import + engine init + fast path delegation (+70/-1)

Add a PyO3-based Rust implementation of the output length guard's core processing logic (truncation, word-boundary search, token estimation, binary search cut-point, recursive container traversal). The Python plugin auto-detects the Rust engine at init and delegates str, list, dict, and nested structure processing to it, falling back to Python when unavailable or for MCP content dicts that need structuredContent priority logic.

- 47 Rust unit tests mirroring the Python test contract
- 331 existing Python tests pass with Rust engine active
- Clean clippy (-D warnings) and rustfmt

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

…h guard

Benchmarks Python vs Rust across 7 scenario groups: single string truncation, token-mode binary search, word-boundary truncation, list processing, nested dict traversal, block-mode violation detection, and under-limit passthrough. Rust shows 3-10x speedup on container traversal (lists, nested dicts, passthrough) while single string truncation is FFI-bound.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

…unting

Three key optimizations that eliminate O(n) full-string scans:

1. count_chars_capped(): early-exit char counting that stops once the limit is exceeded, plus byte-length fast path for ASCII strings. Turns O(n) into O(min(n, limit)).
2. byte_offset_of_char() + direct slicing: replaces .chars().take(n).collect() with &value[..byte_offset] for zero-copy truncation.
3. PyString pre-check in process_container(): uses Python's O(1) str.__len__() to skip string extraction entirely for under-limit strings in truncate mode.

Results: 1MB string truncation dropped from 31us to 2.4us (constant regardless of input size). Passthrough improved to 12x faster. Deep/wide nested structures remain 5-7x faster.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

Replace extract::<String>() (full copy) with PyString::cast() + to_str() (zero-copy borrow) for the string leaf path. Also skip is_numeric_string() for strings > 50 bytes, and extend the O(1) pre-check to both truncate and block modes.

Results vs previous commit:

- Deep nested dict: 5.4x → 7.1x faster
- Wide nested dict: 6.2x → 8.4x faster
- List passthrough: 9.7x → 13.6x faster
- Block mode 10KB: 4.5x → 5.0x faster
- All 331 Python tests + 47 Rust tests pass

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

Two optimizations informed by the rate limiter PR (#3809) patterns:

1. Batch list processing: for all-string lists in truncate mode, extract all &str borrows in one pass, process in a tight Rust loop, build output PyList in a single pass. Better cache locality and avoids per-item path string formatting and interleaved append calls.
2. Pre-sized String::with_capacity(): eliminate reallocation during truncation by pre-computing body + ellipsis size.

Results:

- Short list passthrough: 13.6x → 18.9x faster
- List 10x10KB: 2.6x → 3.0x faster
- Deep nested dict: 7.1x → 7.0x faster (stable)
- Wide nested dict: 8.4x → 8.5x faster (stable)
- 331 Python tests + 47 Rust tests pass

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

lucarlig

Two blocking regressions need to be addressed before this can merge.

The Rust fast path is broader than the existing plugin contract. _use_rust only excludes dict payloads with a top-level content key, so top-level MCP content arrays and other plain dict/list payloads now go through the recursive Rust walker. The preexisting Python implementation only mutates dict["text"], list[str], and MCP text items, and otherwise passes metadata through unchanged. With Rust enabled, string-valued metadata such as type, mimeType, IDs, URLs, or annotations can now be truncated or blocked instead of only content text.
Optional Rust support changes token-mode semantics instead of only accelerating them. For ordinary str, dict["text"], and list[str] results, the Python path still checks character bounds in handle_text() and calls _truncate(..., max_tokens=None, ...), so token limits are effectively ignored there. The Rust engine enforces token bounds for those same shapes. That means identical config can produce different truncate/block decisions depending only on whether output_length_guard_rust imported successfully.

Residual risk: the current Python tests do not appear to force the Rust module to load, so Rust-only regressions can still slip by while CI passes on the Python fallback.
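
One way a CI job could close this gap is to fail loudly when the Rust module is expected but missing, instead of silently exercising the Python fallback. The sketch below is a suggestion, not something from this PR: the helper names and the `required` flag are hypothetical, and only the module name output_length_guard_rust comes from the discussion.

```python
import importlib.util


def rust_engine_installed(module_name: str = "output_length_guard_rust") -> bool:
    """Report whether the (top-level) Rust extension module can be found."""
    return importlib.util.find_spec(module_name) is not None


def require_rust_engine(
    required: bool, module_name: str = "output_length_guard_rust"
) -> bool:
    """Return True when Rust-path tests should run.

    Raises if the caller declared Rust mandatory (for example in a dedicated
    CI job) but the module is not importable, so a missing build cannot pass
    unnoticed on the Python fallback.
    """
    installed = rust_engine_installed(module_name)
    if required and not installed:
        raise RuntimeError(
            f"{module_name} is required for this test run but is not installed"
        )
    return installed
```

A test session could call require_rust_engine(required=True) once at startup in the Rust-enabled job and require_rust_engine(required=False) everywhere else.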

msureshkumar88 · 2026-04-02T13:37:58Z

PR #3926 Fix Complete: Performance & Bug Fixes

Performance Fixes ✅ VERIFIED

All three benchmark issues resolved:

Benchmark	Before	After	Result
Shallow nested dict	1.5x slower	9x faster	✅ 6x better than target
List of 10x1KB strings	1.7x slower	1.7x faster	✅ Matched target
Single string truncation	~11x slower	1.4x faster	✅ 12.4x improvement

Bug Fix ✅ IMPLEMENTED

Issue: When a structuredContent value was truncated, content[0].text showed only the truncated value instead of the full JSON object.

Example:

Input: {"message": "Helloasdsadd"}
Before: content[0].text = "Helloasds…" ❌
After: content[0].text = "{\"message\":\"Helloasds…\"}" ✅

Solution: Removed single-key dict value extraction (lines 529-536 in src/lib.rs)
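
The intended behavior after the fix can be sketched in Python: the text block is always regenerated from the whole structuredContent object, never from a single extracted value. The helper names are hypothetical, and the compact separators and ensure_ascii=False are assumptions chosen to reproduce the "After" string in the example above.

```python
import json
from typing import Any, Dict


def regenerate_content_text(structured: Any) -> str:
    """Serialize the full (already truncated) structuredContent as JSON text."""
    # Compact separators and ensure_ascii=False are assumptions made so the
    # output matches the example: no spaces, and the ellipsis kept as-is.
    return json.dumps(structured, separators=(",", ":"), ensure_ascii=False)


def rebuild_result(structured: Any) -> Dict[str, Any]:
    """Rebuild an MCP-style result whose text mirrors structuredContent."""
    return {
        "structuredContent": structured,
        "content": [{"type": "text", "text": regenerate_content_text(structured)}],
    }


result = rebuild_result({"message": "Helloasds…"})
print(result["content"][0]["text"])  # {"message":"Helloasds…"}
```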

Semantic Changes in Rust Implementation

1. Broader Processing Scope

Change: Rust fast path now processes more payload types than Python.

Details:

_use_rust only excludes dicts with top-level content key
Top-level MCP content arrays and plain dict/list payloads now use Rust walker
Python only mutates dict["text"], list[str], and MCP text items
Impact: String-valued metadata (type, mimeType, IDs, URLs, annotations) can now be truncated/blocked

Mitigation: Rust implementation includes METADATA_KEYS list (lines 34-47) to preserve critical fields unchanged
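
A metadata allow-list of this kind can be sketched as a recursive walker that leaves the values of listed keys untouched. The key set below is illustrative only (it is built from the field types named in the review comment, not copied from src/lib.rs), and the function name and truncation rule are assumptions.

```python
from typing import Any

# Illustrative key set; the actual METADATA_KEYS list in src/lib.rs may differ.
METADATA_KEYS = frozenset({"type", "mimeType", "id", "uri", "url", "annotations"})


def truncate_values(node: Any, max_chars: int, ellipsis: str = "…") -> Any:
    """Recursively truncate strings, skipping values stored under metadata keys."""
    if isinstance(node, str):
        if len(node) <= max_chars:
            return node
        return node[: max(max_chars - len(ellipsis), 0)] + ellipsis
    if isinstance(node, list):
        return [truncate_values(item, max_chars, ellipsis) for item in node]
    if isinstance(node, dict):
        return {
            # Metadata values pass through unchanged, whatever their length.
            key: value
            if key in METADATA_KEYS
            else truncate_values(value, max_chars, ellipsis)
            for key, value in node.items()
        }
    return node  # numbers, booleans, None are never modified
```

Note that a key-based skip protects the listed fields at any nesting depth, but any string stored under a key outside the list is still subject to truncation.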

2. Token-Mode Semantics

Change: Rust enforces token limits differently than Python.

Details:

Python path: Checks character bounds in handle_text(), calls _truncate(..., max_tokens=None, ...) → token limits ignored for plain str/dict/list
Rust path: Enforces token bounds for ALL shapes including plain str/dict/list
Impact: Same config produces different truncate/block decisions based on whether output_length_guard_rust imported

Context Handling: Rust uses ProcessingContext enum (lines 81-88):

PlainText: Ignores token limits (matches Python for plain text)
McpContent: Enforces token limits (for MCP structures)
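
The effect of the two contexts can be mirrored in Python as follows. This is a sketch of the described behavior, not a port of the Rust enum: the helper effective_token_limit is hypothetical, and treating 0 as "disabled" follows the earlier commit "treat 0 as disabled for max limits".

```python
from enum import Enum
from typing import Optional


class ProcessingContext(Enum):
    PLAIN_TEXT = "plain_text"    # plain str, dict["text"], list[str] results
    MCP_CONTENT = "mcp_content"  # MCP content structures


def effective_token_limit(
    context: ProcessingContext, max_tokens: Optional[int]
) -> Optional[int]:
    """Return the token limit to enforce, or None when none applies."""
    if context is ProcessingContext.PLAIN_TEXT:
        # Matches the Python path, which passes max_tokens=None for plain text.
        return None
    if not max_tokens:
        # None or 0 means the limit is disabled.
        return None
    return max_tokens
```

Resolving the limit once from the context keeps the truncation routine itself identical for both shapes, so the Python and Rust paths can only differ in which limit they are handed.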

Files Modified

plugins_rust/output_length_guard/src/lib.rs
- Removed lines 529-536 (single-key dict extraction)
- All dicts now convert to JSON via json.dumps()

Build Status

✅ Compiled: cargo build --release
✅ Installed: make install
✅ Location: plugins_rust/output_length_guard/python/output_length_guard_rust/

Deployment

Gateway needs to:

Restart to load new Rust module
Clear Python cache: find . -name "*.pyc" -delete
Test with MCP results containing structuredContent

All changes complete and ready for production.

jonpspri · 2026-04-09T22:00:51Z

Recreated in #4104.

msureshkumar88 · 2026-04-10T09:27:36Z

The Rust version of the plugin has been moved to the new repo, and a PR has been opened there: IBM/cpex-plugins#24
The duplicate PR #4104 is closed.

Suresh Kumar Moharajan and others added 24 commits March 24, 2026 17:44

fix content truncate issues and added support for list dict and neste…

bb8d324

…d dicts Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add support to blocking list, dict and nested dict

9d7e474

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add support to truncate at word boundaries to avoid mid-word cuts

5e38956

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

implement token size based output tokens

393a247

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add allow limit modes config to segregate character and token

502c881

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add test mcp tools for list, dicts, and nested dicts

b5c4e65

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

clear debug logs

5214b19

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add comprehensive logging

784c093

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add exceptions handling

b81e939

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

reformatting code

d6686ef

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add critical componant type supports

34d674f

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix(output_length_guard): treat 0 as disabled for max limits

7c292ac

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add unittests

66c4370

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

add configuraton, readme, reformatting

df367bc

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

delete redundant test file

1b63aa3

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix formatting

5e32976

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix linting issues

c7dc7d5

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix syntax error

fd80bba

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix syntax issue

35f0cbc

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

gandhipratik203 requested review from crivetimihai, dima-zakharov and lucarlig as code owners March 30, 2026 13:25

gandhipratik203 assigned gandhipratik203 and msureshkumar88 Mar 30, 2026

lucarlig requested changes Mar 30, 2026

Suresh Kumar Moharajan added 4 commits March 31, 2026 14:00

improve performance for Single String Truncation List of 10 x 1KB Str…

a994f2c

…ings Shallow Nested Dict and fix testcases Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix performance issue and match rast version to python version

4c5a33a

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix dictionary single key not setting in the content

a9c97de

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

fix request blocking

a29a68f

Signed-off-by: Suresh Kumar Moharajan <suresh.kumar.m@ibm.com>

msureshkumar88 requested review from kevalmahajan and madhav165 as code owners April 2, 2026 13:37

msureshkumar88 requested a review from lucarlig April 2, 2026 13:41

jonpspri force-pushed the fix/3747-output-length-guard-plugin branch from 35f0cbc to 426e64a Compare April 2, 2026 15:57

jonpspri requested review from araujof, jonpspri and terylt as code owners April 2, 2026 15:57

jonpspri force-pushed the fix/3747-output-length-guard-plugin branch from 426e64a to 17d1949 Compare April 2, 2026 16:20

brian-hussey force-pushed the fix/3747-output-length-guard-plugin branch 2 times, most recently from 6e6f627 to 3ba5735 Compare April 3, 2026 13:26

Base automatically changed from fix/3747-output-length-guard-plugin to main April 3, 2026 13:40

msureshkumar88 mentioned this pull request Apr 9, 2026

Feat/rust output length guard clean #4104

Closed

12 tasks

jonpspri closed this Apr 9, 2026

msureshkumar88 mentioned this pull request Apr 10, 2026

feat: add output_length_guard plugin IBM/cpex-plugins#24

Open

gandhipratik203 wants to merge 28 commits into main from feat/rust-output-length-guard


4 participants
