feat: add proactive context compression to conversation managers#2239
feat: add proactive context compression to conversation managers#2239opieter-aws wants to merge 2 commits intostrands-agents:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Assessment: Comment This is a well-structured feature implementation that cleanly ports the TypeScript design to Python. The hook-based architecture, backward compatibility, and opt-in semantics are all sound. The test coverage is solid at ~97%. Review Categories
The design is solid and aligned with the design doc. The issues raised are mostly about API hygiene and operational clarity rather than correctness. |
18c31b8 to
7a03888
Compare
|
Assessment: Comment The PR has been updated to address previous review feedback — the Remaining Items
The implementation is clean and well-tested overall. The main open question is whether the |
7a03888 to
7db57db
Compare
|
Aligned the Python implementation with sdk-typescript#965 in commit Changes Made1. Renamed
|
| TypeScript | Python |
|---|---|
proactiveCompression: true |
proactive_compression=True |
proactiveCompression: { compressionThreshold: 0.8 } |
proactive_compression={"compression_threshold": 0.8} |
reduce({ agent, model, error }) |
reduce_context(agent, e=None) |
error set → reactive |
e set → reactive |
error undefined → proactive |
e is None → proactive |
Tests: All 100 conversation manager tests pass ✅
Agent tests: All 465 pass ✅
| When ``e`` is set (reactive overflow recovery), attempts to truncate large tool results | ||
| first before falling back to message trimming. | ||
|
|
||
| When ``e`` is None (proactive compression or routine management), only trims messages |
There was a problem hiding this comment.
Issue: Docstring/code mismatch. The docstring states: "When e is None (proactive compression or routine management), only trims messages without attempting tool result truncation." However, the code at lines 197-206 unconditionally attempts tool result truncation regardless of whether e is set or None. The e parameter only affects the error path at lines 263-264.
Suggestion: Either:
- Update the docstring to match the actual behavior (tool result truncation always happens first, followed by message trimming), or
- Add an
if e is not None:guard around lines 197-206 to match the documented behavior of skipping truncation in proactive mode.
There was a problem hiding this comment.
Fixed in the latest push — tool result truncation is now guarded by if e is not None: (line 198), and a new test test_sliding_window_proactive_compression_skips_tool_result_truncation verifies the behavior. ✅
| **kwargs: Additional keyword arguments for future extensibility. | ||
| """ | ||
| # Always subscribe — the threshold check happens inside the handler | ||
| registry.add_callback(BeforeModelCallEvent, self._on_before_model_call_threshold) |
There was a problem hiding this comment.
Issue: The hook is now always registered even when proactive_compression is not configured (checking happens inside the handler via early return). While this simplifies the registration logic, it means every BeforeModelCallEvent for every agent with a conversation manager will invoke this callback — even when proactive compression is disabled. For agents with high call frequency this adds a (small) per-call overhead.
Suggestion: This is likely acceptable for simplicity and consistency (matching the TypeScript approach), but consider adding a brief comment here explaining why "always register" was chosen over "register only when configured" — e.g., for dynamic enable/disable in the future, or to keep registration logic simple.
|
Assessment: Comment Good rework aligning the Python API with the TypeScript SDK. The unified Review Items
The |
af2e0da to
5608322
Compare
|
Assessment: Approve All critical and important issues from previous review rounds have been addressed. The docstring/code mismatch is fixed (tool truncation guarded by Minor Remaining Suggestions (non-blocking)
Clean implementation aligned with the TypeScript SDK design. The |
Match the implementation and interfaces from strands-agents/sdk-typescript#965: 1. Rename parameter: compression_threshold → proactive_compression - Accepts bool | ProactiveCompressionConfig | None - True = default threshold (0.7), dict = custom, None/False = disabled 2. Remove reduce_on_threshold() method — unified into reduce_context() - reduce_context(agent, e=None) distinguishes reactive vs proactive: - e set: reactive overflow recovery, MUST reduce or rethrow - e None: proactive/routine, best-effort (errors swallowed) 3. Always register BeforeModelCallEvent hook - Check happens inside the handler (early return when not configured) - Removed subclass-override detection logic 4. SummarizingConversationManager.reduce_context() error handling: - Reactive (e set): rethrow on failure - Proactive (e None): log warning and return silently 5. Export ProactiveCompressionConfig from conversation_manager package 6. Add DEFAULT_COMPRESSION_THRESHOLD = 0.7 constant Usage (matches TypeScript DX): SlidingWindowConversationManager(window_size=50, proactive_compression=True) SummarizingConversationManager(proactive_compression={'compression_threshold': 0.8})
5608322 to
e8433c8
Compare
|
Assessment: Approve No substantive changes since my last review — the code remains in good shape. The logging format at line 136 now follows the project's structured All previously identified issues (docstring/code mismatch, NullConversationManager contract, error handling, test coverage) remain resolved. No new concerns. |
Description
Conversation managers today are entirely reactive — they only reduce context after the model rejects a request with a
ContextWindowOverflowException. This wastes a round-trip and causes output tokenstarvation as input tokens approach the limit.
The
ConversationManagerbase class now accepts an optionalcompression_thresholdparameter. When set, it registers aBeforeModelCallEventhook that checksprojected_input_tokens / context_window_limitand callsreduce_on_threshold()on the subclass before the model call. Both built-in managers implementreduce_on_threshold. Custom managers that don't override it areunaffected — existing behavior is completely unchanged when
compression_thresholdis not set.This is the Python port of strands-agents/sdk-typescript#965, implementing the design from
0008-proactive-context-compression.
ConversationManager base class gains:
SlidingWindowConversationManager.reduce_on_threshold delegates to existing reduce_context() logic. SummarizingConversationManager.reduce_on_threshold calls a shared _summarize_oldest() with best-effort
error handling — errors are swallowed so the model call can still proceed.
Related Issues
Resolves: #555
Documentation PR
Will follow
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.