feat: add proactive context compression to conversation managers by opieter-aws · Pull Request #2239 · strands-agents/sdk-python

opieter-aws · 2026-05-01T18:20:27Z

Description

Conversation managers today are entirely reactive — they only reduce context after the model rejects a request with a ContextWindowOverflowException. This wastes a round-trip and causes output token
starvation as input tokens approach the limit.

The ConversationManager base class now accepts an optional compression_threshold parameter. When set, it registers a BeforeModelCallEvent hook that checks projected_input_tokens / context_window_limit and calls reduce_on_threshold() on the subclass before the model call. Both built-in managers implement reduce_on_threshold. Custom managers that don't override it are
unaffected — existing behavior is completely unchanged when compression_threshold is not set.

This is the Python port of strands-agents/sdk-typescript#965, implementing the design from
0008-proactive-context-compression.

# Before: only reactive overflow recovery
SlidingWindowConversationManager(window_size=50)
SummarizingConversationManager()

# After: proactive compression at 70% context usage
SlidingWindowConversationManager(window_size=50, compression_threshold=0.7)
SummarizingConversationManager(compression_threshold=0.7)

ConversationManager base class gains:

compression_threshold keyword argument in init() — float in (0, 1] or None
reduce_on_threshold(agent, model) method — subclasses override to opt in; default returns False

SlidingWindowConversationManager.reduce_on_threshold delegates to existing reduce_context() logic. SummarizingConversationManager.reduce_on_threshold calls a shared _summarize_oldest() with best-effort
error handling — errors are swallowed so the model call can still proceed.

Related Issues

Resolves: #555

Documentation PR

Will follow

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2026-05-01T18:22:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

github-actions · 2026-05-01T18:26:25Z

Assessment: Comment

This is a well-structured feature implementation that cleanly ports the TypeScript design to Python. The hook-based architecture, backward compatibility, and opt-in semantics are all sound. The test coverage is solid at ~97%.

Review Categories

API Design: The model parameter in reduce_on_threshold is unused by both implementations—consider whether it's needed or if agent.model suffices. This PR adds a new public abstract-ish method to ConversationManager that custom subclasses may want to implement; consider adding the needs-api-review label.
Global State: The module-level _context_window_limit_warned boolean introduces thread-safety concerns and test isolation issues. Instance-level state would be cleaner.
Error Handling: There's a double error-swallowing pattern between the base class hook handler and the SummarizingConversationManager.reduce_on_threshold—documenting the contract clearly would help future implementors.
Logging Severity: The logger.error in the best-effort reduce_on_threshold path may be too noisy given the explicitly non-fatal semantics.

The design is solid and aligned with the design doc. The issues raised are mostly about API hygiene and operational clarity rather than correctness.

github-actions · 2026-05-01T21:01:46Z

Assessment: Comment

The PR has been updated to address previous review feedback — the model parameter was removed from reduce_on_threshold, global state was moved to instance variables, warnings.warn was replaced with logger.warning, and the error-handling contract is now documented in the base class docstring. These are all good improvements.

Remaining Items

API Design: The new DEFAULT_CONTEXT_WINDOW_LIMIT = 200_000 fallback diverges from the design doc's stance that users must set context_window_limit. This silently masks misconfiguration for models with different context windows. Consider whether disabling compression (previous behavior) was more appropriate, or document the tradeoff.
Behavioral Clarity: The SlidingWindowConversationManager.reduce_on_threshold delegates to reduce_context which always trims at least 2 messages when within window size — this behavior under proactive compression could be surprising and warrants a docstring note.
Test Gap: No test verifies the "warn only once per instance" behavior of _context_window_limit_warned.
API Review Label: This PR introduces a new public method (reduce_on_threshold) and constructor parameter on ConversationManager that custom subclasses may implement. Per the API Bar Raising process, consider whether a needs-api-review label is warranted for this moderate-scope API addition.

The implementation is clean and well-tested overall. The main open question is whether the DEFAULT_CONTEXT_WINDOW_LIMIT fallback is the right DX choice vs. requiring explicit configuration.

agent-of-mkmeral · 2026-05-06T19:59:12Z

Aligned the Python implementation with sdk-typescript#965 in commit af2e0da. Here's a summary:

Changes Made

1. Renamed `compression_threshold` → `proactive_compression`

Now accepts bool | ProactiveCompressionConfig | None matching the TS interface:

# Before
SlidingWindowConversationManager(window_size=50, compression_threshold=0.7)

# After (matches TypeScript DX)
SlidingWindowConversationManager(window_size=50, proactive_compression=True)  # default 0.7
SummarizingConversationManager(proactive_compression={"compression_threshold": 0.8})  # custom

2. Removed `reduce_on_threshold()` — unified into `reduce_context()`

Matches TS's single reduce() method. The e parameter distinguishes reactive vs proactive:

e set → reactive overflow recovery: MUST reduce or rethrow
e is None → proactive compression: best-effort (errors swallowed)

3. Always register `BeforeModelCallEvent` hook

The hook is now always registered (check happens inside the handler with early return). Removed the subclass-override detection logic.

4. `SummarizingConversationManager.reduce_context()` error handling

Reactive (e set): rethrow on failure (unchanged)
Proactive (e None): log warning and return silently (matches TS best-effort semantics)

5. New exports

ProactiveCompressionConfig TypedDict exported from conversation_manager package
DEFAULT_COMPRESSION_THRESHOLD = 0.7 constant

API Mapping (Python ↔ TypeScript)

TypeScript	Python
`proactiveCompression: true`	`proactive_compression=True`
`proactiveCompression: { compressionThreshold: 0.8 }`	`proactive_compression={"compression_threshold": 0.8}`
`reduce({ agent, model, error })`	`reduce_context(agent, e=None)`
`error` set → reactive	`e` set → reactive
`error` undefined → proactive	`e` is None → proactive

Tests: All 100 conversation manager tests pass ✅
Agent tests: All 465 pass ✅

github-actions · 2026-05-06T20:41:40Z

+        When ``e`` is set (reactive overflow recovery), attempts to truncate large tool results
+        first before falling back to message trimming.
+
+        When ``e`` is None (proactive compression or routine management), only trims messages


Issue: Docstring/code mismatch. The docstring states: "When e is None (proactive compression or routine management), only trims messages without attempting tool result truncation." However, the code at lines 197-206 unconditionally attempts tool result truncation regardless of whether e is set or None. The e parameter only affects the error path at lines 263-264.

Suggestion: Either:

Update the docstring to match the actual behavior (tool result truncation always happens first, followed by message trimming), or

Add an if e is not None: guard around lines 197-206 to match the documented behavior of skipping truncation in proactive mode.

Fixed in the latest push — tool result truncation is now guarded by if e is not None: (line 198), and a new test test_sliding_window_proactive_compression_skips_tool_result_truncation verifies the behavior. ✅

github-actions · 2026-05-06T20:41:42Z

            **kwargs: Additional keyword arguments for future extensibility.
+        """
+        # Always subscribe — the threshold check happens inside the handler
+        registry.add_callback(BeforeModelCallEvent, self._on_before_model_call_threshold)


Issue: The hook is now always registered even when proactive_compression is not configured (checking happens inside the handler via early return). While this simplifies the registration logic, it means every BeforeModelCallEvent for every agent with a conversation manager will invoke this callback — even when proactive compression is disabled. For agents with high call frequency this adds a (small) per-call overhead.

Suggestion: This is likely acceptable for simplicity and consistency (matching the TypeScript approach), but consider adding a brief comment here explaining why "always register" was chosen over "register only when configured" — e.g., for dynamic enable/disable in the future, or to keep registration logic simple.

github-actions · 2026-05-06T20:41:47Z

Assessment: Comment

Good rework aligning the Python API with the TypeScript SDK. The unified reduce_context(agent, e=None) approach is simpler than the separate reduce_on_threshold() method — using e to distinguish reactive vs. proactive is elegant and reduces the public API surface for custom subclass implementors.

Review Items

Bug (Important): SlidingWindowConversationManager.reduce_context docstring claims tool result truncation is skipped when e is None, but the code always attempts it. Either the doc or code needs updating.
Bug (Important): NullConversationManager.reduce_context raises even when e=None, which violates the new proactive best-effort contract. Should return silently when e is None.
Suggestion: The always-register hook pattern adds a (tiny) per-call overhead for all agents even when proactive compression is disabled. Worth a brief code comment explaining the design choice.
Suggestion: The warning log at line 136 could better follow the project's structured logging format from AGENTS.md.

The DEFAULT_CONTEXT_WINDOW_LIMIT discussion from prior reviews has been acknowledged by the author (resolved via upcoming lookup table). The two behavioral bugs above should be addressed before merge.

github-actions · 2026-05-07T14:56:49Z

Assessment: Approve

All critical and important issues from previous review rounds have been addressed. The docstring/code mismatch is fixed (tool truncation guarded by if e is not None:), NullConversationManager now correctly handles the proactive contract, and new tests verify both behaviors. Code coverage is at 100%.

Minor Remaining Suggestions (non-blocking)

The warning log at line 136 of conversation_manager.py could follow the project's field=<value> | message structured format from AGENTS.md.
A brief inline comment at line 113-114 explaining why the hook is always registered (even when proactive compression is disabled) would help future readers.

Clean implementation aligned with the TypeScript SDK design. The DEFAULT_CONTEXT_WINDOW_LIMIT approach is acknowledged as a temporary measure pending the per-provider lookup table.

Match the implementation and interfaces from strands-agents/sdk-typescript#965: 1. Rename parameter: compression_threshold → proactive_compression - Accepts bool | ProactiveCompressionConfig | None - True = default threshold (0.7), dict = custom, None/False = disabled 2. Remove reduce_on_threshold() method — unified into reduce_context() - reduce_context(agent, e=None) distinguishes reactive vs proactive: - e set: reactive overflow recovery, MUST reduce or rethrow - e None: proactive/routine, best-effort (errors swallowed) 3. Always register BeforeModelCallEvent hook - Check happens inside the handler (early return when not configured) - Removed subclass-override detection logic 4. SummarizingConversationManager.reduce_context() error handling: - Reactive (e set): rethrow on failure - Proactive (e None): log warning and return silently 5. Export ProactiveCompressionConfig from conversation_manager package 6. Add DEFAULT_COMPRESSION_THRESHOLD = 0.7 constant Usage (matches TypeScript DX): SlidingWindowConversationManager(window_size=50, proactive_compression=True) SummarizingConversationManager(proactive_compression={'compression_threshold': 0.8})

github-actions · 2026-05-07T21:06:33Z

Assessment: Approve

No substantive changes since my last review — the code remains in good shape. The logging format at line 136 now follows the project's structured field=<value> | message convention, addressing the last minor suggestion.

All previously identified issues (docstring/code mismatch, NullConversationManager contract, error handling, test coverage) remain resolved. No new concerns.

github-actions Bot added the size/l label May 1, 2026

opieter-aws had a problem deploying to auto-approve May 1, 2026 18:20 — with GitHub Actions Failure

opieter-aws temporarily deployed to manual-approval May 1, 2026 18:20 — with GitHub Actions Inactive

github-actions Bot added the strands-running label May 1, 2026

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/conversation_manager.py

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/sliding_window_conversation_manager.py Outdated

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/strands/agent/conversation_manager/summarizing_conversation_manager.py Outdated

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread tests/strands/agent/test_conversation_manager.py Outdated

github-actions Bot removed the strands-running label May 1, 2026

opieter-aws force-pushed the feat/compression-threshold branch from 18c31b8 to 7a03888 Compare May 1, 2026 20:52

github-actions Bot added size/l and removed size/l labels May 1, 2026

opieter-aws temporarily deployed to manual-approval May 1, 2026 20:56 — with GitHub Actions Inactive

opieter-aws had a problem deploying to auto-approve May 1, 2026 20:56 — with GitHub Actions Failure