Skip to content

feat: add proactive context compression to conversation managers#2239

Open
opieter-aws wants to merge 2 commits intostrands-agents:mainfrom
opieter-aws:feat/compression-threshold
Open

feat: add proactive context compression to conversation managers#2239
opieter-aws wants to merge 2 commits intostrands-agents:mainfrom
opieter-aws:feat/compression-threshold

Conversation

@opieter-aws
Copy link
Copy Markdown
Contributor

@opieter-aws opieter-aws commented May 1, 2026

Description

Conversation managers today are entirely reactive — they only reduce context after the model rejects a request with a ContextWindowOverflowException. This wastes a round-trip and causes output token
starvation as input tokens approach the limit.

The ConversationManager base class now accepts an optional compression_threshold parameter. When set, it registers a BeforeModelCallEvent hook that checks projected_input_tokens / context_window_limit and calls reduce_on_threshold() on the subclass before the model call. Both built-in managers implement reduce_on_threshold. Custom managers that don't override it are
unaffected — existing behavior is completely unchanged when compression_threshold is not set.

This is the Python port of strands-agents/sdk-typescript#965, implementing the design from
0008-proactive-context-compression.

# Before: only reactive overflow recovery
SlidingWindowConversationManager(window_size=50)
SummarizingConversationManager()

# After: proactive compression at 70% context usage
SlidingWindowConversationManager(window_size=50, compression_threshold=0.7)
SummarizingConversationManager(compression_threshold=0.7)

ConversationManager base class gains:

  • compression_threshold keyword argument in init() — float in (0, 1] or None
  • reduce_on_threshold(agent, model) method — subclasses override to opt in; default returns False

SlidingWindowConversationManager.reduce_on_threshold delegates to existing reduce_context() logic. SummarizingConversationManager.reduce_on_threshold calls a shared _summarize_oldest() with best-effort
error handling — errors are swallowed so the model call can still proceed.

Related Issues

Resolves: #555

Documentation PR

Will follow

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/conversation_manager.py
Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/sliding_window_conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/summarizing_conversation_manager.py Outdated
Comment thread tests/strands/agent/test_conversation_manager.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Assessment: Comment

This is a well-structured feature implementation that cleanly ports the TypeScript design to Python. The hook-based architecture, backward compatibility, and opt-in semantics are all sound. The test coverage is solid at ~97%.

Review Categories
  • API Design: The model parameter in reduce_on_threshold is unused by both implementations—consider whether it's needed or if agent.model suffices. This PR adds a new public abstract-ish method to ConversationManager that custom subclasses may want to implement; consider adding the needs-api-review label.
  • Global State: The module-level _context_window_limit_warned boolean introduces thread-safety concerns and test isolation issues. Instance-level state would be cleaner.
  • Error Handling: There's a double error-swallowing pattern between the base class hook handler and the SummarizingConversationManager.reduce_on_threshold—documenting the contract clearly would help future implementors.
  • Logging Severity: The logger.error in the best-effort reduce_on_threshold path may be too noisy given the explicitly non-fatal semantics.

The design is solid and aligned with the design doc. The issues raised are mostly about API hygiene and operational clarity rather than correctness.

Comment thread src/strands/agent/conversation_manager/conversation_manager.py
Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
Comment thread src/strands/agent/conversation_manager/sliding_window_conversation_manager.py Outdated
Comment thread tests/strands/agent/test_conversation_manager.py
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Assessment: Comment

The PR has been updated to address previous review feedback — the model parameter was removed from reduce_on_threshold, global state was moved to instance variables, warnings.warn was replaced with logger.warning, and the error-handling contract is now documented in the base class docstring. These are all good improvements.

Remaining Items
  • API Design: The new DEFAULT_CONTEXT_WINDOW_LIMIT = 200_000 fallback diverges from the design doc's stance that users must set context_window_limit. This silently masks misconfiguration for models with different context windows. Consider whether disabling compression (previous behavior) was more appropriate, or document the tradeoff.
  • Behavioral Clarity: The SlidingWindowConversationManager.reduce_on_threshold delegates to reduce_context which always trims at least 2 messages when within window size — this behavior under proactive compression could be surprising and warrants a docstring note.
  • Test Gap: No test verifies the "warn only once per instance" behavior of _context_window_limit_warned.
  • API Review Label: This PR introduces a new public method (reduce_on_threshold) and constructor parameter on ConversationManager that custom subclasses may implement. Per the API Bar Raising process, consider whether a needs-api-review label is warranted for this moderate-scope API addition.

The implementation is clean and well-tested overall. The main open question is whether the DEFAULT_CONTEXT_WINDOW_LIMIT fallback is the right DX choice vs. requiring explicit configuration.

@opieter-aws opieter-aws force-pushed the feat/compression-threshold branch from 7a03888 to 7db57db Compare May 1, 2026 21:19
@github-actions github-actions Bot added size/l and removed size/l labels May 1, 2026
@agent-of-mkmeral
Copy link
Copy Markdown
Contributor

Aligned the Python implementation with sdk-typescript#965 in commit af2e0da. Here's a summary:

Changes Made

1. Renamed compression_thresholdproactive_compression

Now accepts bool | ProactiveCompressionConfig | None matching the TS interface:

# Before
SlidingWindowConversationManager(window_size=50, compression_threshold=0.7)

# After (matches TypeScript DX)
SlidingWindowConversationManager(window_size=50, proactive_compression=True)  # default 0.7
SummarizingConversationManager(proactive_compression={"compression_threshold": 0.8})  # custom

2. Removed reduce_on_threshold() — unified into reduce_context()

Matches TS's single reduce() method. The e parameter distinguishes reactive vs proactive:

  • e set → reactive overflow recovery: MUST reduce or rethrow
  • e is None → proactive compression: best-effort (errors swallowed)

3. Always register BeforeModelCallEvent hook

The hook is now always registered (check happens inside the handler with early return). Removed the subclass-override detection logic.

4. SummarizingConversationManager.reduce_context() error handling

  • Reactive (e set): rethrow on failure (unchanged)
  • Proactive (e None): log warning and return silently (matches TS best-effort semantics)

5. New exports

  • ProactiveCompressionConfig TypedDict exported from conversation_manager package
  • DEFAULT_COMPRESSION_THRESHOLD = 0.7 constant

API Mapping (Python ↔ TypeScript)

TypeScript Python
proactiveCompression: true proactive_compression=True
proactiveCompression: { compressionThreshold: 0.8 } proactive_compression={"compression_threshold": 0.8}
reduce({ agent, model, error }) reduce_context(agent, e=None)
error set → reactive e set → reactive
error undefined → proactive e is None → proactive

Tests: All 100 conversation manager tests pass ✅
Agent tests: All 465 pass ✅

When ``e`` is set (reactive overflow recovery), attempts to truncate large tool results
first before falling back to message trimming.

When ``e`` is None (proactive compression or routine management), only trims messages
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Docstring/code mismatch. The docstring states: "When e is None (proactive compression or routine management), only trims messages without attempting tool result truncation." However, the code at lines 197-206 unconditionally attempts tool result truncation regardless of whether e is set or None. The e parameter only affects the error path at lines 263-264.

Suggestion: Either:

  1. Update the docstring to match the actual behavior (tool result truncation always happens first, followed by message trimming), or
  2. Add an if e is not None: guard around lines 197-206 to match the documented behavior of skipping truncation in proactive mode.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest push — tool result truncation is now guarded by if e is not None: (line 198), and a new test test_sliding_window_proactive_compression_skips_tool_result_truncation verifies the behavior. ✅

**kwargs: Additional keyword arguments for future extensibility.
"""
# Always subscribe — the threshold check happens inside the handler
registry.add_callback(BeforeModelCallEvent, self._on_before_model_call_threshold)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The hook is now always registered even when proactive_compression is not configured (checking happens inside the handler via early return). While this simplifies the registration logic, it means every BeforeModelCallEvent for every agent with a conversation manager will invoke this callback — even when proactive compression is disabled. For agents with high call frequency this adds a (small) per-call overhead.

Suggestion: This is likely acceptable for simplicity and consistency (matching the TypeScript approach), but consider adding a brief comment here explaining why "always register" was chosen over "register only when configured" — e.g., for dynamic enable/disable in the future, or to keep registration logic simple.

Comment thread src/strands/agent/conversation_manager/conversation_manager.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

Assessment: Comment

Good rework aligning the Python API with the TypeScript SDK. The unified reduce_context(agent, e=None) approach is simpler than the separate reduce_on_threshold() method — using e to distinguish reactive vs. proactive is elegant and reduces the public API surface for custom subclass implementors.

Review Items
  • Bug (Important): SlidingWindowConversationManager.reduce_context docstring claims tool result truncation is skipped when e is None, but the code always attempts it. Either the doc or code needs updating.
  • Bug (Important): NullConversationManager.reduce_context raises even when e=None, which violates the new proactive best-effort contract. Should return silently when e is None.
  • Suggestion: The always-register hook pattern adds a (tiny) per-call overhead for all agents even when proactive compression is disabled. Worth a brief code comment explaining the design choice.
  • Suggestion: The warning log at line 136 could better follow the project's structured logging format from AGENTS.md.

The DEFAULT_CONTEXT_WINDOW_LIMIT discussion from prior reviews has been acknowledged by the author (resolved via upcoming lookup table). The two behavioral bugs above should be addressed before merge.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Assessment: Approve

All critical and important issues from previous review rounds have been addressed. The docstring/code mismatch is fixed (tool truncation guarded by if e is not None:), NullConversationManager now correctly handles the proactive contract, and new tests verify both behaviors. Code coverage is at 100%.

Minor Remaining Suggestions (non-blocking)
  • The warning log at line 136 of conversation_manager.py could follow the project's field=<value> | message structured format from AGENTS.md.
  • A brief inline comment at line 113-114 explaining why the hook is always registered (even when proactive compression is disabled) would help future readers.

Clean implementation aligned with the TypeScript SDK design. The DEFAULT_CONTEXT_WINDOW_LIMIT approach is acknowledged as a temporary measure pending the per-provider lookup table.

Match the implementation and interfaces from strands-agents/sdk-typescript#965:

1. Rename parameter: compression_threshold → proactive_compression
   - Accepts bool | ProactiveCompressionConfig | None
   - True = default threshold (0.7), dict = custom, None/False = disabled

2. Remove reduce_on_threshold() method — unified into reduce_context()
   - reduce_context(agent, e=None) distinguishes reactive vs proactive:
     - e set: reactive overflow recovery, MUST reduce or rethrow
     - e None: proactive/routine, best-effort (errors swallowed)

3. Always register BeforeModelCallEvent hook
   - Check happens inside the handler (early return when not configured)
   - Removed subclass-override detection logic

4. SummarizingConversationManager.reduce_context() error handling:
   - Reactive (e set): rethrow on failure
   - Proactive (e None): log warning and return silently

5. Export ProactiveCompressionConfig from conversation_manager package

6. Add DEFAULT_COMPRESSION_THRESHOLD = 0.7 constant

Usage (matches TypeScript DX):
  SlidingWindowConversationManager(window_size=50, proactive_compression=True)
  SummarizingConversationManager(proactive_compression={'compression_threshold': 0.8})
@opieter-aws opieter-aws force-pushed the feat/compression-threshold branch from 5608322 to e8433c8 Compare May 7, 2026 14:59
@github-actions github-actions Bot added size/l and removed size/l labels May 7, 2026
@opieter-aws opieter-aws deployed to manual-approval May 7, 2026 14:59 — with GitHub Actions Active
@opieter-aws opieter-aws requested a review from lizradway May 7, 2026 19:42
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Assessment: Approve

No substantive changes since my last review — the code remains in good shape. The logging format at line 136 now follows the project's structured field=<value> | message convention, addressing the last minor suggestion.

All previously identified issues (docstring/code mismatch, NullConversationManager contract, error handling, test coverage) remain resolved. No new concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Proactive Context Compression

3 participants