fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639
Conversation
…oss all HTTP starter base templates (aws#808, aws#809) All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched. Refs aws#808 Refs aws#809
Package TarballHow to installgh release download pr-1639-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz |
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for the fix — the per-session memory bug is real and the overall approach (key in-process state by context.session_id) is the right design. Most of the diff looks good; flagging a few correctness/scale issues that I think need to be addressed before merge.
The core concern: the PR description explicitly justifies the 128-entry cap for the openaiagents/autogen/strands templates ("caps process memory and evicted sessions restart fresh"), but the googleadk and langchain_langgraph templates use module-level in-memory stores (InMemorySessionService, InMemorySaver) with no bound at all. That just trades the original cross-session bleed (#809-style) for an unbounded memory leak in any long-running process serving many sessions (e.g. agentcore dev over a long session, or non-Runtime deployments which the PR description specifically calls out as the failure mode of #809). The fix should be consistent across all five templates.
The FIFO-vs-LRU comments inline are smaller but worth correcting since the docstrings claim LRU.
| session_service = InMemorySessionService() | ||
| session = await session_service.create_session( | ||
| # Module-level session service and runner (preserves history across invocations) | ||
| _session_service = InMemorySessionService() |
There was a problem hiding this comment.
Unbounded session growth. _session_service = InMemorySessionService() at module scope retains every (app_name, user_id, session_id) triple forever — there is no eviction path. The PR description argues that the 128-entry cap on the other templates "caps process memory and evicted sessions restart fresh", but this template has no cap, so a long-running process (notably agentcore dev over time, and any non-Runtime deployment) will grow without bound as new session ids arrive.
The ADK InMemorySessionService doesn't expose a public eviction API as far as I can see, so options are roughly:
- Wrap session creation in a bounded FIFO/LRU and call
_session_service.delete_session(...)on eviction (ADK does exposedelete_session). - Keep a
collections.OrderedDictof recently-touched(user_id, session_id)keys alongside the service and prune viadelete_sessiononce it exceeds N (e.g. 128, to match the other templates). - Subclass / wrap to enforce the cap at create-time.
Whichever you pick, please match the 128-entry behavior already documented for openaiagents/autogen/strands so the fix is consistent across templates.
| tools = [add_numbers] | ||
|
|
||
| # Module-level checkpointer preserves conversation history across invocations | ||
| _checkpointer = InMemorySaver() |
There was a problem hiding this comment.
Unbounded checkpoint growth. _checkpointer = InMemorySaver() is module-level with no eviction, so every distinct thread_id (= session_id) keeps its checkpoint forever. Same correctness concern as the googleadk template: the 128-cap rationale in the PR description doesn't apply here, so this template will leak memory in any long-running process.
A couple of options:
- Switch to a bounded saver (e.g. wrap
InMemorySaverin something that tracks insertion order and calls_checkpointer.delete_thread(thread_id)past 128 entries —InMemorySaverexposes astoragedict you can prune). - Use
SqliteSaver/AsyncSqliteSaverwith a temp file path for the base template (durable, doesn't grow process memory). - At minimum, document the unboundedness in a comment and explicitly tell the reader to swap in a durable checkpointer for production — but a 128-cap to match the other templates is the cleanest fix.
|
|
||
| # Reuses one AssistantAgent per session_id so each session keeps its own | ||
| # in-process conversation history (best-effort; resets on cold start). Caches up | ||
| # to 128 active sessions; the oldest is evicted and its history reset. |
There was a problem hiding this comment.
The eviction policy is FIFO, not LRU, despite the docstring. Plain dict preserves insertion order, and next(iter(_agents)) returns the oldest-inserted key. Reading an existing entry via _agents[session_id] does not move it to the end, so an actively-used session can be evicted while an idle one sticks around.
Two ways to fix:
- Either drop the "LRU" claim from the comment (the policy is FIFO/insertion-order, which is fine if intended), or
- Use
collections.OrderedDictand call_agents.move_to_end(session_id)on every hit so it's actually LRU.
Same issue applies to the new strands base-template cache (src/assets/python/http/strands/base/main.py ~line 433-438), since the PR description there also describes it as "LRU-style eviction".
(Separate, smaller concern on this template: get_or_create_agent isn't concurrency-safe — two concurrent first-invocations for the same new session_id will both fall through the if session_id not in _agents check, both await get_streamable_http_mcp_tools(), and both build an AssistantAgent; the loser's agent is discarded mid-flight. Not catastrophic, but worth an asyncio.Lock around the create path if you want it tight.)
| def get_or_create_agent(session_id{{#if hasSkillsFetcher}}, skill_plugins=None{{/if}}): | ||
| if session_id not in cache: | ||
| if len(cache) >= 128: | ||
| cache.pop(next(iter(cache))) |
There was a problem hiding this comment.
FIFO, not LRU — same as the autogen template. next(iter(cache)) returns the oldest-inserted key, and reading cache[session_id] doesn't update its position. Either change the comment to say "FIFO eviction (oldest by insertion order)" or switch to collections.OrderedDict + cache.move_to_end(session_id) on hit for real LRU behavior.
This affects the no-memory branch only — the hasMemory branch (lines ~385-423) is unchanged here and was already unbounded; you may want to consider applying a cap there too for consistency, but that's out of scope of #808/#809 so feel free to defer.
| # Caches up to 128 active sessions; LRU eviction silently resets history for | ||
| # the oldest session. For production use, replace with a durable session store | ||
| # (e.g. SQLiteSession with a file path). | ||
| @lru_cache(maxsize=128) |
There was a problem hiding this comment.
Not blocking, just confirming: SQLiteSession(session_id) with no db_path uses an in-memory SQLite DB per-instance, so the lru_cache retention semantics here are: "keep up to 128 in-memory SQLite DBs alive; evicted DBs are GC'd and their history is lost on next access." That matches the docstring and is the correct knob for a starter template. ✅
Unlike the autogen/strands dict-based caches, functools.lru_cache is true LRU — accessing an existing entry moves it to most-recently-used — so no FIFO concern here.
… caches true LRU (aws#808, aws#809) The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver) templates kept per-session state at module scope with no eviction, trading the cross-session leak for an unbounded memory leak in long-running processes. Bound both to 128 active sessions with LRU eviction (delete_session / delete_thread on the least-recently-used key), matching the other three templates. The autogen and strands no-memory caches were plain dicts evicting by insertion order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are accurate. openaiagents already uses functools.lru_cache (true LRU) and is unchanged. Regenerated the assets snapshot. Refs aws#808 Refs aws#809
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Looks good to merge — all the issues raised in the prior review have been addressed in commit 08bfb25:
- googleadk:
_session_serviceis now bounded to 128 active sessions via_session_keysOrderedDict with LRU eviction callingdelete_sessionon the evicted entry. ✅ - langchain_langgraph:
_checkpointeris now bounded to 128 active threads via_thread_idsOrderedDict with LRU eviction callingdelete_thread. ✅ - autogen: Switched from plain
dicttoOrderedDictwithmove_to_endon hit — now true LRU rather than FIFO. ✅ - strands: Same OrderedDict +
move_to_endtreatment for the no-memory cache. ✅ - openaiagents: Unchanged from the prior review —
functools.lru_cachewas already correct.
All five HTTP templates are now consistent: a 128-entry per-process cap, true LRU eviction, and context.session_id threaded through. The snapshot is regenerated to match.
A couple of minor things I noticed but won't block on (carried over from the prior review or out of scope):
- The previously-noted concurrency window in autogen's
get_or_create_agent(two concurrent first-invocations for the same newsession_idboth falling through the miss check) still exists, and the same pattern is now present in strands and googleadk. For a starter template this seems acceptable. - The strands
hasMemorybranch (lines ~387-425) is still an unbounded plain-dictcache, but as the prior reviewer noted that's outside the #808/#809 scope.
No new serious issues found.
aws#808) OrderedDict.popitem() returns (key, value) where key is the (user_id, session_id) tuple and value is True. The previous unpack assigned the key tuple to old_user_id and True to old_session_id, so delete_session() was called with garbage and the real session was never deleted -- the InMemorySessionService grew unbounded past the 128-session cap, defeating the aws#808 bound for GoogleADK (LangChain unpacked correctly). Destructure the key tuple and discard the value so the real ids reach delete_session. Add an asset test asserting the store caps at 128 with correct eviction ids, and regenerate the asset snapshot.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Re-reviewed the new commit f3dabe9 (added after the prior approval). Looks good to merge.
f3dabe9 is a real fix. Confirmed by reading the rendered template at src/assets/python/http/googleadk/base/main.py:
- The key stored in
_session_keysis a 2-tuple(user_id, session_id)(line 150), with the value alwaysTrue. OrderedDict.popitem(last=False)returns(key, value), so the outer 2-tuple is(tuple_key, True).- The previous form
old_user_id, old_session_id = popitem(...)unpacked the outer pair, assigning the whole inner key tuple toold_user_idandTruetoold_session_id—delete_session(user_id=<tuple>, session_id=True)would have errored or no-op'd, leaving the underlyingInMemorySessionServiceto grow past 128 unbounded. The new form(old_user_id, old_session_id), _ = popitem(...)destructures correctly. - LangChain (line 75:
evicted, _ = _thread_ids.popitem(last=False)) uses a single-string key, so its unpack was already correct — the commit message's parenthetical is accurate. ✅ - The other LRU caches (autogen line 107, strands line 441) just call
cache.popitem(last=False)and discard the result, which is fine since they don't need an async cleanup callback likedelete_session. ✅
Tests: The new googleadk-session-eviction.test.ts has two effective regression guards — the substring assertion for the fixed form and the negative-match guard against the buggy form (lines 59 and 61) — which is exactly what's needed to prevent this from regressing. The simulateEviction portion (lines 27-51, 68-85) is a JS re-implementation of the Python eviction algorithm, so the caps the store at 128 assertion isn't actually exercising the template; it's documenting the intended algorithm. Not blocking, but if you wanted that assertion to mean something, you'd need to actually run the rendered template under Python — not worth the infra for a starter-template asset test, the two string guards are sufficient.
No new serious issues. The unaddressed-but-out-of-scope items from the prior review (concurrency window in get_or_create_agent, unbounded hasMemory strands branch) still stand and are fine to defer.
|
|
||
| # Define an AssistantAgent with the model and tools | ||
| agent = AssistantAgent( | ||
| _agents[session_id] = AssistantAgent( |
There was a problem hiding this comment.
Potential race condition here given the await between the existence check and the dict write. If two concurrent requests arrive for the same session_id, both pass the first check, both yield at the await block, both potentially could end up creating a fresh AssistantAgent. The second write would orphan the request who got the first agent.
Suggested fix: recheck after the await
mcp_tools = await get_streamable_http_mcp_tools()
if session_id not in _agents:
_agents[session_id] = AssistantAgent(...)
There was a problem hiding this comment.
Good catch — fixed in 6ed3afc. Added a re-check after the await get_streamable_http_mcp_tools(): if a concurrent first-invocation for the same session_id already stored an agent, we keep theirs instead of overwriting (then move_to_end + return). No lock needed since the only shared mutation is the dict write, which is now idempotent per session.
There was a problem hiding this comment.
This this intentionally just a binary file?
There was a problem hiding this comment.
Not intentional — the test used literal NUL bytes (\0) as the key separator in the eviction simulation, which made git classify the file as binary. Fixed in 6ed3afc by switching the separator to ::, so it's now plain UTF-8 text and diffs/reviews normally.
…m eviction test - autogen get_or_create_agent: re-check session_id after the await so two concurrent first-invocations don't both build an AssistantAgent (the loser was orphaned). Re-check + move_to_end before returning. - googleadk-session-eviction.test.ts: the key separator used literal NUL bytes, which made git treat the file as binary. Switched to '::' so it is plain text. Addresses review comments from @nborges-aws on aws#1639.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Looks good to merge — all serious issues have already been raised in the prior reviews and addressed in this branch:
- Bounded stores in all 5 templates (googleadk/langchain previously unbounded) — addressed in 08bfb25 ✅
- True LRU semantics (was FIFO in autogen/strands despite the docstring) — addressed in 08bfb25 with
OrderedDict+move_to_end✅ - googleadk composite-key unpack bug (
(old_user_id, old_session_id), _ = popitem(...)) — addressed in f3dabe9 ✅ - autogen create-race (two concurrent first-invocations for the same new session_id) — addressed in 6ed3afc with a re-check after the await ✅
- Test file binary classification (NUL bytes as separator) — addressed in 6ed3afc ✅
I confirmed the upstream APIs the templates rely on:
langgraph.checkpoint.memory.InMemorySaver.delete_threadexists inlanggraph-checkpoint>=2.1.0,<4.0.0(the range thatlanggraph>=1.0.2inpyproject.tomlresolves to) and is a no-op on missing thread_ids, so the eviction call is safe.google.adk.sessions.InMemorySessionService.delete_sessionis async (correctlyawaited) andcreate_sessionis also async (correctlyawaited).- The strands template's
hasMemorybranch is still an unbounded plain-dictcache, but as the prior reviews noted that's out of #808/#809 scope.
Carried-over deferred items that are fine to leave as-is:
- The same kind of concurrency window the autogen fix addressed still exists for strands and googleadk (
get_or_create_session/get_or_create_agentin strands). For googleadk the loser will getAlreadyExistsErrorfrom_create_session_impl's existence check, which is slightly worse than autogen's silent-discard, but only triggers on a true first-invocation race for the same brand-new session_id — unlikely under AgentCore Runtime microVM stickiness for the same session. Acceptable for a starter template. - Unbounded
hasMemorystrands cache as noted above.
Description
The HTTP starter templates rebuild agent/session state on every invocation, so within a single session there is no multi-turn recall (turn 2 cannot see turn 1) — which contradicts
agentcore invoke --session-id. The Strands no-memory template is worse: it shares one module-level global_agentacross every invocation in a process, so when one warm microVM serves more than one session, conversation history bleeds between sessions (issue #809). AgentCore Runtime routes a given session ID to the same microVM (microVM stickiness) and keeps that compute warm across invocations, so module-level Python state persists for the session lifecycle — but only if the template reuses the same StrandsAgentobject per session. A StrandsAgentaccumulates history onself.messages; recall happens only when the same Agent instance is reused, and a leak happens when one Agent is shared across sessions.This PR keys in-process agent/session state by
context.session_idin all five HTTP templates, bounded to 128 sessions with LRU eviction so a long-lived microVM serving many sessions keeps per-session isolation without growing memory without limit:OrderedDictcache (move_to_endon hit, evict least-recently-used at the cap).@lru_cache(maxsize=128)over a per-sessionSQLiteSession.InMemorySessionServicebounded via an LRU key set, evicting the oldest viadelete_session.InMemorySaverbounded via an LRU thread set, evicting the oldest viadelete_thread.In-memory recall is best-effort: it survives within a session's warm compute but resets on cold start (idle/stop), and
NullConversationManagerdoes not truncate. For durable, cross-restart history, attach AgentCore Memory (thehasMemorytemplate variant) or a managed conversation manager. This was bug-bashed with real deploy + invoke per framework: per-session recall works and there is no cross-session leak on all frameworks.Related Issue
Closes #808
Closes #809
Documentation PR
N/A — starter-template behavior fix; no devguide change required.
Type of Change
Testing
How have you tested the change?
npm run test:unitandnpm run test:integnpm run typechecknpm run lintsrc/assets/, I rannpm run test:update-snapshotsand committed the updated snapshotsAdditional validation: rendered each template across flag combinations and confirmed valid Python via
py_compile; multi-agent bug bash with real deploy + invoke per framework verified within-session recall and cross-session isolation.Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.