feat: add proactive context compression to conversation managers#2239
feat: add proactive context compression to conversation managers#2239opieter-aws wants to merge 1 commit intostrands-agents:mainfrom
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| ratio, | ||
| self._compression_threshold, | ||
| ) | ||
| try: |
There was a problem hiding this comment.
Issue: There's redundant error handling between this base class try/except (line 157-160) and SummarizingConversationManager.reduce_on_threshold which also catches and swallows all exceptions. The base class already provides the safety net, so the subclass's try/except is effectively dead code for error propagation.
Suggestion: The design intent is clear (best-effort, non-fatal), but consider documenting in the reduce_on_threshold docstring that the base class already catches exceptions, so subclass implementations don't need to defensively swallow errors—they can let them propagate and the base class will handle them. This simplifies subclass implementations.
There was a problem hiding this comment.
This has been addressed — SummarizingConversationManager.reduce_on_threshold no longer catches exceptions itself, and the base class docstring now explicitly documents the error-handling contract (lines 80-83, 94-96). The code is cleaner and the contract is clear. ✅
|
Assessment: Comment This is a well-structured feature implementation that cleanly ports the TypeScript design to Python. The hook-based architecture, backward compatibility, and opt-in semantics are all sound. The test coverage is solid at ~97%. Review Categories
The design is solid and aligned with the design doc. The issues raised are mostly about API hygiene and operational clarity rather than correctness. |
18c31b8 to
7a03888
Compare
|
Assessment: Comment The PR has been updated to address previous review feedback — the Remaining Items
The implementation is clean and well-tested overall. The main open question is whether the |
7a03888 to
7db57db
Compare
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| DEFAULT_CONTEXT_WINDOW_LIMIT = 200_000 |
There was a problem hiding this comment.
Issue: This constant introduces implicit behavior that diverges from the design doc, which states: "Users must set contextWindowLimit on their model config until we ship per-provider default lookup tables." A model with a 32k context window using this 200k fallback would never trigger proactive compression until the real overflow occurs — defeating the purpose of the feature.
The warning at line 140 is good, but it only fires once and at WARNING level which may be missed in production logging. The previous iteration disabled compression entirely when context_window_limit was None, which was a safer fail-closed approach.
Suggestion: Consider one of:
- Fail-closed (safer): Skip the threshold check when
context_window_limitis None (log warning, return early). This matches the design doc and avoids silent miscalculation. - Keep fallback but make it explicit: Add
default_context_window_limitas a parameter to__init__so users consciously opt in to the fallback assumption. - If keeping as-is: Add a note in the class docstring explaining the 200k fallback behavior and why it was chosen over disabling.
|
Assessment: Comment The author has addressed feedback from both previous review rounds. The error-handling contract is now well-documented, the Remaining Items
The implementation is in good shape. The |
Description
Conversation managers today are entirely reactive — they only reduce context after the model rejects a request with a
ContextWindowOverflowException. This wastes a round-trip and causes output tokenstarvation as input tokens approach the limit.
The
ConversationManagerbase class now accepts an optionalcompression_thresholdparameter. When set, it registers aBeforeModelCallEventhook that checksprojected_input_tokens / context_window_limitand callsreduce_on_threshold()on the subclass before the model call. Both built-in managers implementreduce_on_threshold. Custom managers that don't override it areunaffected — existing behavior is completely unchanged when
compression_thresholdis not set.This is the Python port of strands-agents/sdk-typescript#965, implementing the design from
0008-proactive-context-compression.
ConversationManager base class gains:
SlidingWindowConversationManager.reduce_on_threshold delegates to existing reduce_context() logic. SummarizingConversationManager.reduce_on_threshold calls a shared _summarize_oldest() with best-effort
error handling — errors are swallowed so the model call can still proceed.
Related Issues
Resolves: #555
Documentation PR
Will follow
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.