feat: estimate input tokens before model calls (#2221)
opieter-aws merged 1 commit into strands-agents:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.
Assessment: Comment. Good foundational work for proactive context compression. The estimation strategy (baseline from metadata + delta for new messages) is sound and the non-fatal design is the right call. A few items to address before merge. Clean implementation with good test coverage overall.
Force-pushed from b0cc02a to 1c21293.
Assessment: Comment. Good progress since the last review: most prior feedback has been addressed (Pythonic iteration, …). The estimation strategy is solid and the non-fatal design is well-executed.
Force-pushed from 1c21293 to ea1e163.
Assessment: Approve. All code-level feedback from the prior two review rounds has been addressed. The naming rationale (TypeScript SDK alignment) was provided by the author and is reasonable. The metadata attachment on assistant messages is well-guarded by the message reconstruction in …
Force-pushed from ea1e163 to 532696e.
Assessment: Approve. The ordering concern raised by lizradway has been cleanly addressed.
Description
Adds input token estimation to the agent loop, making it available on BeforeModelCallEvent before every model call. This is the Python port of strands-agents/sdk-typescript#890 and the foundation for proactive context compression.
With projected token counts available before the call, plugins and conversation managers can proactively compress context at a configurable threshold. The estimation uses a baseline-plus-delta strategy: it reads inputTokens + outputTokens from the last assistant message's metadata as a known baseline, then estimates only the new messages added since (typically tool results) via model.count_tokens(). On cold start (no metadata available), it falls back to estimating all messages. Estimation is non-fatal: if it fails, the agent proceeds without it.
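The baseline-plus-delta strategy described above can be sketched as follows. This is an illustrative sketch only, not the SDK's actual implementation: `Message`, the metadata layout, and `estimate_input_tokens` are assumed names, and the real code lives in the agent loop.

```python
from typing import Optional


class Message(dict):
    """Simplified message shape (assumed): {"role", "content", "metadata"}."""


def estimate_input_tokens(messages: list[Message], model) -> Optional[int]:
    """Estimate input tokens for the next model call; non-fatal on failure."""
    try:
        baseline = 0
        delta_start = 0
        # Walk backwards to the last assistant message carrying usage metadata.
        for i in range(len(messages) - 1, -1, -1):
            msg = messages[i]
            usage = msg.get("metadata", {}).get("usage")
            if msg.get("role") == "assistant" and usage:
                # Known baseline: inputTokens + outputTokens from that call.
                baseline = usage["inputTokens"] + usage["outputTokens"]
                delta_start = i + 1
                break
        # Count only the messages added since the baseline; on cold start
        # (no metadata found), delta_start is 0 and all messages are counted.
        delta = model.count_tokens(messages[delta_start:])
        return baseline + delta
    except Exception:
        # Estimation is non-fatal: the agent proceeds without a projection.
        return None
```

The non-fatal `return None` mirrors the PR's design choice: a failed estimate should never abort the model call.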
BeforeModelCallEvent now carries an optional projected_input_tokens field.
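A minimal sketch of what that field enables, assuming a dataclass-style event (the actual SDK definition may differ, and `maybe_compress` is a hypothetical plugin helper):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BeforeModelCallEvent:
    # None when estimation failed or was unavailable (hypothetical shape).
    projected_input_tokens: Optional[int] = None


def maybe_compress(event: BeforeModelCallEvent, threshold: int = 100_000) -> bool:
    # A plugin or conversation manager could compare the projection
    # against a configurable threshold before the model call.
    return (
        event.projected_input_tokens is not None
        and event.projected_input_tokens >= threshold
    )
```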
AgentResult and EventLoopMetrics now expose projected_context_size (inputTokens + outputTokens from the last cycle), matching the TypeScript SDK's projectedContextSize on Meter and AgentResult.
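A sketch of how that exposure might look, with AgentResult delegating to the metrics object; the field names follow the description above, but the concrete classes are assumptions, not the SDK's actual definitions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventLoopMetrics:
    # inputTokens + outputTokens from the last cycle (hypothetical shape).
    projected_context_size: Optional[int] = None


@dataclass
class AgentResult:
    metrics: EventLoopMetrics = field(default_factory=EventLoopMetrics)

    @property
    def projected_context_size(self) -> Optional[int]:
        # Delegates to metrics, mirroring the TypeScript SDK's
        # projectedContextSize on Meter and AgentResult.
        return self.metrics.projected_context_size
```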
Related Issues
#555
Documentation PR
Will do one docs update when proactive compression ships.
Type of Change
New feature
Testing
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.