feat: estimate input tokens before model calls #2221

Merged
opieter-aws merged 1 commit into strands-agents:main from opieter-aws:feat/projected-input-tokens
Apr 29, 2026

Conversation

@opieter-aws
Contributor

Description

Adds input token estimation to the agent loop, making it available on BeforeModelCallEvent before every model call. This is the Python port of strands-agents/sdk-typescript#890 and the foundation for proactive context compression.

With projected token counts available before the call, plugins and conversation managers can proactively compress context at a configurable threshold. The estimation uses a baseline-plus-delta token counting strategy: it reads inputTokens + outputTokens from the last assistant message's metadata as a known baseline, then estimates only the new messages added since then (typically tool results) via model.count_tokens(). On cold start (no metadata available), it falls back to estimating all messages. Estimation is non-fatal: if it fails, the agent proceeds without it.
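The baseline-plus-delta strategy can be sketched roughly as follows. This is a simplified illustration only: the message/metadata shape and the count_tokens callable are assumptions, not the SDK's actual internals.

```python
# Hedged sketch of baseline-plus-delta estimation; the message shape and
# the count_tokens callable are illustrative assumptions, not SDK internals.
from typing import Callable, Optional


def estimate_input_tokens(
    messages: list[dict],
    count_tokens: Callable[[list[dict]], int],
) -> Optional[int]:
    """Project the input tokens for the next model call."""
    try:
        baseline = 0
        new_since = 0  # cold start: no baseline, estimate every message
        # Walk backwards to the last assistant message with usage metadata.
        for i in range(len(messages) - 1, -1, -1):
            msg = messages[i]
            usage = msg.get("metadata", {}).get("usage")
            if msg.get("role") == "assistant" and usage:
                baseline = usage["inputTokens"] + usage["outputTokens"]
                new_since = i + 1  # only estimate messages added after it
                break
        return baseline + count_tokens(messages[new_since:])
    except Exception:
        # Estimation is non-fatal: callers proceed without a projection.
        return None
```

Counting only the delta keeps the per-call overhead small once a baseline exists, which is why the cold-start path (estimating everything) only runs until the first assistant message carries usage metadata.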

BeforeModelCallEvent now carries an optional projected_input_tokens field:

from strands.hooks import BeforeModelCallEvent

def on_before_model_call(event: BeforeModelCallEvent):
    print(event.projected_input_tokens)  # e.g. 14200

agent.hooks.add_callback(BeforeModelCallEvent, on_before_model_call)
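As a sketch of the intended consumer, a hook could compare the projection against a configurable threshold before each call. MAX_CONTEXT, the 0.8 ratio, should_compress, and FakeEvent are illustrative assumptions here, not SDK API:

```python
# Illustrative proactive-compression trigger; MAX_CONTEXT and the 0.8
# threshold are assumptions, and FakeEvent stands in for BeforeModelCallEvent.
from dataclasses import dataclass
from typing import Optional

MAX_CONTEXT = 16_000  # assumed model context window


@dataclass
class FakeEvent:
    projected_input_tokens: Optional[int]


def should_compress(event: FakeEvent) -> bool:
    projected = event.projected_input_tokens
    # The field is optional: estimation may have failed, so guard for None.
    return projected is not None and projected > 0.8 * MAX_CONTEXT
```

The None guard matters because estimation is non-fatal: a hook built on this field must behave sensibly when no projection is available.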

AgentResult and EventLoopMetrics now expose projected_context_size (inputTokens + outputTokens from the last cycle), matching the TypeScript SDK's projectedContextSize on Meter and AgentResult:

result = agent("Hello")
print(result.projected_context_size)  # e.g. 14250

Related Issues

#555

Documentation PR

Will do one docs update when proactive compression ships.

Type of Change

New feature

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

Assessment: Comment

Good foundational work for proactive context compression. The estimation strategy (baseline from metadata + delta for new messages) is sound and the non-fatal design is the right call. A few items to address before merge:

Review Categories
  • API Naming Consistency: The two new "projected" surfaces use different names (projected_input_tokens on the hook event vs projected_context_size on metrics/result) and compute different values (forward-looking estimate vs backward-looking sum). This could confuse users building on both APIs. Consider aligning naming or making the distinction more explicit.
  • Cold-Start Accuracy: _estimate_input_tokens accepts tool_specs but is never called with them, so cold-start estimates exclude tool specification tokens. Wiring this up would improve accuracy.
  • API Review Process: Three new public API surfaces warrant the needs-api-review label per the bar raising process.
  • Test Coverage: Missing branch coverage for the partial None case in projected_context_size, and missing trailing newlines in several test files.

Clean implementation with good test coverage overall.

@github-actions

Assessment: Comment

Good progress since the last review — most prior feedback has been addressed (Pythonic iteration, tool_specs forwarding, missing test coverage, internal doc reference, trailing newlines). Two prior items remain open, and two new items surfaced.

Review Categories
  • API Naming (still open): projected_input_tokens on the hook event vs projected_context_size on metrics/result compute different values at different lifecycle points. This could confuse users building on both surfaces. See thread for details.
  • API Review Process (still open): New public API surfaces warrant the needs-api-review label per bar raising guidelines.
  • Cold-Start Accuracy (new): _estimate_input_tokens doesn't forward system_prompt_content for structured/multimodal system prompts, unlike stream_messages which does. Minor accuracy gap.
  • Test Argument Verification (new): Cold-start test verifies count_tokens was called but doesn't assert the forwarded arguments — assert_called_once_with would catch regressions.

The estimation strategy is solid and the non-fatal design is well-executed.

@github-actions

Assessment: Approve

All code-level feedback from the prior two review rounds has been addressed. The naming rationale (TypeScript SDK alignment) was provided by the author and is reasonable. The metadata attachment on assistant messages is well-guarded by the message reconstruction in streaming.py and thoroughly tested in test_event_loop_metadata.py. No new issues found.

@opieter-aws opieter-aws marked this pull request as ready for review April 28, 2026 23:11
@github-actions

Assessment: Approve

The ordering concern raised by lizradway has been cleanly addressed — _estimate_input_tokens now resolves tool specs lazily only for cold-start estimation, while the actual model call's tool specs are still resolved after BeforeModelCallEvent. All prior review feedback has been incorporated. No new issues found.

@opieter-aws opieter-aws merged commit 888c98c into strands-agents:main Apr 29, 2026
20 of 21 checks passed
@opieter-aws opieter-aws deleted the feat/projected-input-tokens branch April 29, 2026 18:19