Skip to content

feat: add proactive context compression to conversation managers#2239

Open
opieter-aws wants to merge 1 commit intostrands-agents:mainfrom
opieter-aws:feat/compression-threshold
Open

feat: add proactive context compression to conversation managers#2239
opieter-aws wants to merge 1 commit intostrands-agents:mainfrom
opieter-aws:feat/compression-threshold

Conversation

@opieter-aws
Copy link
Copy Markdown
Contributor

@opieter-aws opieter-aws commented May 1, 2026

Description

Conversation managers today are entirely reactive — they only reduce context after the model rejects a request with a ContextWindowOverflowException. This wastes a round-trip and causes output token
starvation as input tokens approach the limit.

The ConversationManager base class now accepts an optional compression_threshold parameter. When set, it registers a BeforeModelCallEvent hook that checks projected_input_tokens / context_window_limit and calls reduce_on_threshold() on the subclass before the model call. Both built-in managers implement reduce_on_threshold. Custom managers that don't override it are
unaffected — existing behavior is completely unchanged when compression_threshold is not set.

This is the Python port of strands-agents/sdk-typescript#965, implementing the design from
0008-proactive-context-compression.

# Before: only reactive overflow recovery
SlidingWindowConversationManager(window_size=50)
SummarizingConversationManager()

# After: proactive compression at 70% context usage
SlidingWindowConversationManager(window_size=50, compression_threshold=0.7)
SummarizingConversationManager(compression_threshold=0.7)

ConversationManager base class gains:

  • compression_threshold keyword argument in init() — float in (0, 1] or None
  • reduce_on_threshold(agent, model) method — subclasses override to opt in; default returns False

SlidingWindowConversationManager.reduce_on_threshold delegates to existing reduce_context() logic. SummarizingConversationManager.reduce_on_threshold calls a shared _summarize_oldest() with best-effort
error handling — errors are swallowed so the model call can still proceed.

Related Issues

Resolves: #555

Documentation PR

Will follow

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 98.41270% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...agent/conversation_manager/conversation_manager.py 97.14% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
ratio,
self._compression_threshold,
)
try:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There's redundant error handling between this base class try/except (line 157-160) and SummarizingConversationManager.reduce_on_threshold which also catches and swallows all exceptions. The base class already provides the safety net, so the subclass's try/except is effectively dead code for error propagation.

Suggestion: The design intent is clear (best-effort, non-fatal), but consider documenting in the reduce_on_threshold docstring that the base class already catches exceptions, so subclass implementations don't need to defensively swallow errors—they can let them propagate and the base class will handle them. This simplifies subclass implementations.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been addressed — SummarizingConversationManager.reduce_on_threshold no longer catches exceptions itself, and the base class docstring now explicitly documents the error-handling contract (lines 80-83, 94-96). The code is cleaner and the contract is clear. ✅

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/sliding_window_conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/summarizing_conversation_manager.py Outdated
Comment thread tests/strands/agent/test_conversation_manager.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Assessment: Comment

This is a well-structured feature implementation that cleanly ports the TypeScript design to Python. The hook-based architecture, backward compatibility, and opt-in semantics are all sound. The test coverage is solid at ~97%.

Review Categories
  • API Design: The model parameter in reduce_on_threshold is unused by both implementations—consider whether it's needed or if agent.model suffices. This PR adds a new public abstract-ish method to ConversationManager that custom subclasses may want to implement; consider adding the needs-api-review label.
  • Global State: The module-level _context_window_limit_warned boolean introduces thread-safety concerns and test isolation issues. Instance-level state would be cleaner.
  • Error Handling: There's a double error-swallowing pattern between the base class hook handler and the SummarizingConversationManager.reduce_on_threshold—documenting the contract clearly would help future implementors.
  • Logging Severity: The logger.error in the best-effort reduce_on_threshold path may be too noisy given the explicitly non-fatal semantics.

The design is solid and aligned with the design doc. The issues raised are mostly about API hygiene and operational clarity rather than correctness.

Comment thread src/strands/agent/conversation_manager/conversation_manager.py
Comment thread src/strands/agent/conversation_manager/conversation_manager.py
Comment thread tests/strands/agent/test_conversation_manager.py
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Assessment: Comment

The PR has been updated to address previous review feedback — the model parameter was removed from reduce_on_threshold, global state was moved to instance variables, warnings.warn was replaced with logger.warning, and the error-handling contract is now documented in the base class docstring. These are all good improvements.

Remaining Items
  • API Design: The new DEFAULT_CONTEXT_WINDOW_LIMIT = 200_000 fallback diverges from the design doc's stance that users must set context_window_limit. This silently masks misconfiguration for models with different context windows. Consider whether disabling compression (previous behavior) was more appropriate, or document the tradeoff.
  • Behavioral Clarity: The SlidingWindowConversationManager.reduce_on_threshold delegates to reduce_context which always trims at least 2 messages when within window size — this behavior under proactive compression could be surprising and warrants a docstring note.
  • Test Gap: No test verifies the "warn only once per instance" behavior of _context_window_limit_warned.
  • API Review Label: This PR introduces a new public method (reduce_on_threshold) and constructor parameter on ConversationManager that custom subclasses may implement. Per the API Bar Raising process, consider whether a needs-api-review label is warranted for this moderate-scope API addition.

The implementation is clean and well-tested overall. The main open question is whether the DEFAULT_CONTEXT_WINDOW_LIMIT fallback is the right DX choice vs. requiring explicit configuration.

@opieter-aws opieter-aws force-pushed the feat/compression-threshold branch from 7a03888 to 7db57db Compare May 1, 2026 21:19
@github-actions github-actions Bot added size/l and removed size/l labels May 1, 2026
@opieter-aws opieter-aws marked this pull request as ready for review May 1, 2026 21:21
@opieter-aws opieter-aws deployed to manual-approval May 1, 2026 21:21 — with GitHub Actions Active

logger = logging.getLogger(__name__)

DEFAULT_CONTEXT_WINDOW_LIMIT = 200_000
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This constant introduces implicit behavior that diverges from the design doc, which states: "Users must set contextWindowLimit on their model config until we ship per-provider default lookup tables." A model with a 32k context window using this 200k fallback would never trigger proactive compression until the real overflow occurs — defeating the purpose of the feature.

The warning at line 140 is good, but it only fires once and at WARNING level which may be missed in production logging. The previous iteration disabled compression entirely when context_window_limit was None, which was a safer fail-closed approach.

Suggestion: Consider one of:

  1. Fail-closed (safer): Skip the threshold check when context_window_limit is None (log warning, return early). This matches the design doc and avoids silent miscalculation.
  2. Keep fallback but make it explicit: Add default_context_window_limit as a parameter to __init__ so users consciously opt in to the fallback assumption.
  3. If keeping as-is: Add a note in the class docstring explaining the 200k fallback behavior and why it was chosen over disabling.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Assessment: Comment

The author has addressed feedback from both previous review rounds. The error-handling contract is now well-documented, the model parameter was removed, instance-level warning state is clean, and a test covers the "warn once per instance" behavior.

Remaining Items
  • DEFAULT_CONTEXT_WINDOW_LIMIT fallback (Important): The 200k fallback diverges from the design doc's fail-closed approach. A model with a smaller context window would get incorrect threshold behavior silently. Consider disabling compression when context_window_limit is None, or making the fallback explicit via a parameter.
  • per_turn + compression_threshold interaction (Suggestion): Both features register independent BeforeModelCallEvent callbacks that can both trim messages in the same cycle. A brief docstring note about execution order would help users combining them.

The implementation is in good shape. The DEFAULT_CONTEXT_WINDOW_LIMIT design choice is the last remaining substantive question — it's a tradeoff between DX convenience and correctness that would benefit from explicit team alignment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Proactive Context Compression

1 participant