fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809) by aidandaly24 · Pull Request #1639 · aws/agentcore-cli

aidandaly24 · 2026-06-25T15:06:57Z

Description

The HTTP starter templates rebuild agent/session state on every invocation, so within a single session there is no multi-turn recall (turn 2 cannot see turn 1) — which contradicts agentcore invoke --session-id. The Strands no-memory template is worse: it shares one module-level global _agent across every invocation in a process, so when one warm microVM serves more than one session, conversation history bleeds between sessions (issue #809). AgentCore Runtime routes a given session ID to the same microVM (microVM stickiness) and keeps that compute warm across invocations, so module-level Python state persists for the session lifecycle — but only if the template reuses the same Strands Agent object per session. A Strands Agent accumulates history on self.messages; recall happens only when the same Agent instance is reused, and a leak happens when one Agent is shared across sessions.

This PR keys in-process agent/session state by context.session_id in all five HTTP templates, bounded to 128 sessions with LRU eviction so a long-lived microVM serving many sessions keeps per-session isolation without growing memory without limit:

strands, autogen — per-session OrderedDict cache (move_to_end on hit, evict least-recently-used at the cap).
openaiagents — @lru_cache(maxsize=128) over a per-session SQLiteSession.
googleadk — module-scoped InMemorySessionService bounded via an LRU key set, evicting the oldest via delete_session.
langchain_langgraph — module-scoped InMemorySaver bounded via an LRU thread set, evicting the oldest via delete_thread.

In-memory recall is best-effort: it survives within a session's warm compute but resets on cold start (idle/stop), and NullConversationManager does not truncate. For durable, cross-restart history, attach AgentCore Memory (the hasMemory template variant) or a managed conversation manager. This was bug-bashed with real deploy + invoke per framework: per-session recall works and there is no cross-session leak on all frameworks.

Related Issue

Closes #808
Closes #809

Documentation PR

N/A — starter-template behavior fix; no devguide change required.

Type of Change

Testing

How have you tested the change?

I ran npm run test:unit and npm run test:integ
I ran npm run typecheck
I ran npm run lint
If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Additional validation: rendered each template across flag combinations and confirmed valid Python via py_compile; multi-agent bug bash with real deploy + invoke per framework verified within-session recall and cross-session isolation.

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

…oss all HTTP starter base templates (aws#808, aws#809) All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched. Refs aws#808 Refs aws#809

github-actions · 2026-06-25T15:08:02Z

Package Tarball

aws-agentcore-0.21.0.tgz

How to install

gh release download pr-1639-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz

agentcore-devx-automation · 2026-06-25T15:08:23Z

Claude Security Review: no high-confidence findings. (run)

agentcore-cli-automation

Thanks for the fix — the per-session memory bug is real and the overall approach (key in-process state by context.session_id) is the right design. Most of the diff looks good; flagging a few correctness/scale issues that I think need to be addressed before merge.

The core concern: the PR description explicitly justifies the 128-entry cap for the openaiagents/autogen/strands templates ("caps process memory and evicted sessions restart fresh"), but the googleadk and langchain_langgraph templates use module-level in-memory stores (InMemorySessionService, InMemorySaver) with no bound at all. That just trades the original cross-session bleed (#809-style) for an unbounded memory leak in any long-running process serving many sessions (e.g. agentcore dev over a long session, or non-Runtime deployments which the PR description specifically calls out as the failure mode of #809). The fix should be consistent across all five templates.

The FIFO-vs-LRU comments inline are smaller but worth correcting since the docstrings claim LRU.

agentcore-cli-automation · 2026-06-25T15:10:29Z

-    session_service = InMemorySessionService()
-    session = await session_service.create_session(
+# Module-level session service and runner (preserves history across invocations)
+_session_service = InMemorySessionService()


Unbounded session growth. _session_service = InMemorySessionService() at module scope retains every (app_name, user_id, session_id) triple forever — there is no eviction path. The PR description argues that the 128-entry cap on the other templates "caps process memory and evicted sessions restart fresh", but this template has no cap, so a long-running process (notably agentcore dev over time, and any non-Runtime deployment) will grow without bound as new session ids arrive.

The ADK InMemorySessionService doesn't expose a public eviction API as far as I can see, so options are roughly:

Wrap session creation in a bounded FIFO/LRU and call _session_service.delete_session(...) on eviction (ADK does expose delete_session).

Keep a collections.OrderedDict of recently-touched (user_id, session_id) keys alongside the service and prune via delete_session once it exceeds N (e.g. 128, to match the other templates).

Subclass / wrap to enforce the cap at create-time.

Whichever you pick, please match the 128-entry behavior already documented for openaiagents/autogen/strands so the fix is consistent across templates.

agentcore-cli-automation · 2026-06-25T15:10:29Z

 tools = [add_numbers]

+# Module-level checkpointer preserves conversation history across invocations
+_checkpointer = InMemorySaver()


Unbounded checkpoint growth. _checkpointer = InMemorySaver() is module-level with no eviction, so every distinct thread_id (= session_id) keeps its checkpoint forever. Same correctness concern as the googleadk template: the 128-cap rationale in the PR description doesn't apply here, so this template will leak memory in any long-running process.

A couple of options:

Switch to a bounded saver (e.g. wrap InMemorySaver in something that tracks insertion order and calls _checkpointer.delete_thread(thread_id) past 128 entries — InMemorySaver exposes a storage dict you can prune).

Use SqliteSaver/AsyncSqliteSaver with a temp file path for the base template (durable, doesn't grow process memory).

At minimum, document the unboundedness in a comment and explicitly tell the reader to swap in a durable checkpointer for production — but a 128-cap to match the other templates is the cleanest fix.

agentcore-cli-automation · 2026-06-25T15:10:29Z


+# Reuses one AssistantAgent per session_id so each session keeps its own
+# in-process conversation history (best-effort; resets on cold start). Caches up
+# to 128 active sessions; the oldest is evicted and its history reset.


The eviction policy is FIFO, not LRU, despite the docstring. Plain dict preserves insertion order, and next(iter(_agents)) returns the oldest-inserted key. Reading an existing entry via _agents[session_id] does not move it to the end, so an actively-used session can be evicted while an idle one sticks around.

Two ways to fix:

Either drop the "LRU" claim from the comment (the policy is FIFO/insertion-order, which is fine if intended), or

Use collections.OrderedDict and call _agents.move_to_end(session_id) on every hit so it's actually LRU.

Same issue applies to the new strands base-template cache (src/assets/python/http/strands/base/main.py ~line 433-438), since the PR description there also describes it as "LRU-style eviction".

(Separate, smaller concern on this template: get_or_create_agent isn't concurrency-safe — two concurrent first-invocations for the same new session_id will both fall through the if session_id not in _agents check, both await get_streamable_http_mcp_tools(), and both build an AssistantAgent; the loser's agent is discarded mid-flight. Not catastrophic, but worth an asyncio.Lock around the create path if you want it tight.)

agentcore-cli-automation · 2026-06-25T15:10:29Z

+    def get_or_create_agent(session_id{{#if hasSkillsFetcher}}, skill_plugins=None{{/if}}):
+        if session_id not in cache:
+            if len(cache) >= 128:
+                cache.pop(next(iter(cache)))


FIFO, not LRU — same as the autogen template. next(iter(cache)) returns the oldest-inserted key, and reading cache[session_id] doesn't update its position. Either change the comment to say "FIFO eviction (oldest by insertion order)" or switch to collections.OrderedDict + cache.move_to_end(session_id) on hit for real LRU behavior.

This affects the no-memory branch only — the hasMemory branch (lines ~385-423) is unchanged here and was already unbounded; you may want to consider applying a cap there too for consistency, but that's out of scope of #808/#809 so feel free to defer.

agentcore-cli-automation · 2026-06-25T15:10:29Z

+# Caches up to 128 active sessions; LRU eviction silently resets history for
+# the oldest session. For production use, replace with a durable session store
+# (e.g. SQLiteSession with a file path).
+@lru_cache(maxsize=128)


Not blocking, just confirming: SQLiteSession(session_id) with no db_path uses an in-memory SQLite DB per-instance, so the lru_cache retention semantics here are: "keep up to 128 in-memory SQLite DBs alive; evicted DBs are GC'd and their history is lost on next access." That matches the docstring and is the correct knob for a starter template. ✅

Unlike the autogen/strands dict-based caches, functools.lru_cache is true LRU — accessing an existing entry moves it to most-recently-used — so no FIFO concern here.

… caches true LRU (aws#808, aws#809) The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver) templates kept per-session state at module scope with no eviction, trading the cross-session leak for an unbounded memory leak in long-running processes. Bound both to 128 active sessions with LRU eviction (delete_session / delete_thread on the least-recently-used key), matching the other three templates. The autogen and strands no-memory caches were plain dicts evicting by insertion order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are accurate. openaiagents already uses functools.lru_cache (true LRU) and is unchanged. Regenerated the assets snapshot. Refs aws#808 Refs aws#809

agentcore-devx-automation · 2026-06-25T15:23:31Z

Claude Security Review: no high-confidence findings. (run)

agentcore-cli-automation

Looks good to merge — all the issues raised in the prior review have been addressed in commit 08bfb25:

googleadk: _session_service is now bounded to 128 active sessions via _session_keys OrderedDict with LRU eviction calling delete_session on the evicted entry. ✅
langchain_langgraph: _checkpointer is now bounded to 128 active threads via _thread_ids OrderedDict with LRU eviction calling delete_thread. ✅
autogen: Switched from plain dict to OrderedDict with move_to_end on hit — now true LRU rather than FIFO. ✅
strands: Same OrderedDict + move_to_end treatment for the no-memory cache. ✅
openaiagents: Unchanged from the prior review — functools.lru_cache was already correct.

All five HTTP templates are now consistent: a 128-entry per-process cap, true LRU eviction, and context.session_id threaded through. The snapshot is regenerated to match.

A couple of minor things I noticed but won't block on (carried over from the prior review or out of scope):

The previously-noted concurrency window in autogen's get_or_create_agent (two concurrent first-invocations for the same new session_id both falling through the miss check) still exists, and the same pattern is now present in strands and googleadk. For a starter template this seems acceptable.
The strands hasMemory branch (lines ~387-425) is still an unbounded plain-dict cache, but as the prior reviewer noted that's outside the #808/#809 scope.

No new serious issues found.

aws#808) OrderedDict.popitem() returns (key, value) where key is the (user_id, session_id) tuple and value is True. The previous unpack assigned the key tuple to old_user_id and True to old_session_id, so delete_session() was called with garbage and the real session was never deleted -- the InMemorySessionService grew unbounded past the 128-session cap, defeating the aws#808 bound for GoogleADK (LangChain unpacked correctly). Destructure the key tuple and discard the value so the real ids reach delete_session. Add an asset test asserting the store caps at 128 with correct eviction ids, and regenerate the asset snapshot.

agentcore-devx-automation · 2026-06-25T16:13:41Z

Claude Security Review: no high-confidence findings. (run)

agentcore-cli-automation

Re-reviewed the new commit f3dabe9 (added after the prior approval). Looks good to merge.

f3dabe9 is a real fix. Confirmed by reading the rendered template at src/assets/python/http/googleadk/base/main.py:

The key stored in _session_keys is a 2-tuple (user_id, session_id) (line 150), with the value always True.
OrderedDict.popitem(last=False) returns (key, value), so the outer 2-tuple is (tuple_key, True).
The previous form old_user_id, old_session_id = popitem(...) unpacked the outer pair, assigning the whole inner key tuple to old_user_id and True to old_session_id — delete_session(user_id=<tuple>, session_id=True) would have errored or no-op'd, leaving the underlying InMemorySessionService to grow past 128 unbounded. The new form (old_user_id, old_session_id), _ = popitem(...) destructures correctly.
LangChain (line 75: evicted, _ = _thread_ids.popitem(last=False)) uses a single-string key, so its unpack was already correct — the commit message's parenthetical is accurate. ✅
The other LRU caches (autogen line 107, strands line 441) just call cache.popitem(last=False) and discard the result, which is fine since they don't need an async cleanup callback like delete_session. ✅

Tests: The new googleadk-session-eviction.test.ts has two effective regression guards — the substring assertion for the fixed form and the negative-match guard against the buggy form (lines 59 and 61) — which is exactly what's needed to prevent this from regressing. The simulateEviction portion (lines 27-51, 68-85) is a JS re-implementation of the Python eviction algorithm, so the caps the store at 128 assertion isn't actually exercising the template; it's documenting the intended algorithm. Not blocking, but if you wanted that assertion to mean something, you'd need to actually run the rendered template under Python — not worth the infra for a starter-template asset test, the two string guards are sufficient.

No new serious issues. The unaddressed-but-out-of-scope items from the prior review (concurrency window in get_or_create_agent, unbounded hasMemory strands branch) still stand and are fine to defer.

nborges-aws · 2026-06-26T17:49:36Z

-
-    # Define an AssistantAgent with the model and tools
-    agent = AssistantAgent(
+    _agents[session_id] = AssistantAgent(


Potential race condition here given the await between the existence check and the dict write. If two concurrent requests arrive for the same session_id, both pass the first check, both yield at the await block, both potentially could end up creating a fresh AssistantAgent. The second write would orphan the request who got the first agent.

Suggested fix: recheck after the await

mcp_tools = await get_streamable_http_mcp_tools() if session_id not in _agents: _agents[session_id] = AssistantAgent(...)

Good catch — fixed in 6ed3afc. Added a re-check after the await get_streamable_http_mcp_tools(): if a concurrent first-invocation for the same session_id already stored an agent, we keep theirs instead of overwriting (then move_to_end + return). No lock needed since the only shared mutation is the dict write, which is now idempotent per session.

nborges-aws · 2026-06-26T17:49:39Z

This this intentionally just a binary file?

Not intentional — the test used literal NUL bytes (\0) as the key separator in the eviction simulation, which made git classify the file as binary. Fixed in 6ed3afc by switching the separator to ::, so it's now plain UTF-8 text and diffs/reviews normally.

@nborges-aws

…m eviction test - autogen get_or_create_agent: re-check session_id after the await so two concurrent first-invocations don't both build an AssistantAgent (the loser was orphaned). Re-check + move_to_end before returning. - googleadk-session-eviction.test.ts: the key separator used literal NUL bytes, which made git treat the file as binary. Switched to '::' so it is plain text. Addresses review comments from @nborges-aws on aws#1639.

agentcore-devx-automation · 2026-06-26T18:11:06Z

Claude Security Review: no high-confidence findings. (run)

agentcore-cli-automation

Looks good to merge — all serious issues have already been raised in the prior reviews and addressed in this branch:

Bounded stores in all 5 templates (googleadk/langchain previously unbounded) — addressed in 08bfb25 ✅
True LRU semantics (was FIFO in autogen/strands despite the docstring) — addressed in 08bfb25 with OrderedDict + move_to_end ✅
googleadk composite-key unpack bug ((old_user_id, old_session_id), _ = popitem(...)) — addressed in f3dabe9 ✅
autogen create-race (two concurrent first-invocations for the same new session_id) — addressed in 6ed3afc with a re-check after the await ✅
Test file binary classification (NUL bytes as separator) — addressed in 6ed3afc ✅

I confirmed the upstream APIs the templates rely on:

langgraph.checkpoint.memory.InMemorySaver.delete_thread exists in langgraph-checkpoint>=2.1.0,<4.0.0 (the range that langgraph>=1.0.2 in pyproject.toml resolves to) and is a no-op on missing thread_ids, so the eviction call is safe.
google.adk.sessions.InMemorySessionService.delete_session is async (correctly awaited) and create_session is also async (correctly awaited).
The strands template's hasMemory branch is still an unbounded plain-dict cache, but as the prior reviews noted that's out of #808/#809 scope.

Carried-over deferred items that are fine to leave as-is:

The same kind of concurrency window the autogen fix addressed still exists for strands and googleadk (get_or_create_session / get_or_create_agent in strands). For googleadk the loser will get AlreadyExistsError from _create_session_impl's existence check, which is slightly worse than autogen's silent-discard, but only triggers on a true first-invocation race for the same brand-new session_id — unlikely under AgentCore Runtime microVM stickiness for the same session. Acceptable for a starter template.
Unbounded hasMemory strands cache as noted above.

github-actions Bot added the size/m PR size: M label Jun 25, 2026

aidandaly24 temporarily deployed to e2e-testing June 25, 2026 15:07 — with GitHub Actions Inactive

github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

agentcore-cli-automation suggested changes Jun 25, 2026

View reviewed changes

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

aidandaly24 changed the title ~~fix(templates): Make each HTTP framework template reuse per-session conversa (#808, #809)~~ fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809) Jun 25, 2026

github-actions Bot removed the size/m PR size: M label Jun 25, 2026

aidandaly24 temporarily deployed to e2e-testing June 25, 2026 15:22 — with GitHub Actions Inactive

github-actions Bot added the size/m PR size: M label Jun 25, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

agentcore-cli-automation approved these changes Jun 25, 2026

View reviewed changes

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 25, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

aidandaly24 temporarily deployed to e2e-testing June 25, 2026 16:12 — with GitHub Actions Inactive

github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

agentcore-cli-automation approved these changes Jun 25, 2026

View reviewed changes

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

aidandaly24 marked this pull request as ready for review June 25, 2026 16:42

aidandaly24 requested a review from a team June 25, 2026 16:42

aidandaly24 mentioned this pull request Jun 25, 2026

feat(templates): add conversation history persistence to HTTP agent templates #794

Closed

14 tasks

nborges-aws reviewed Jun 26, 2026

View reviewed changes

github-actions Bot removed the size/m PR size: M label Jun 26, 2026

aidandaly24 temporarily deployed to e2e-testing June 26, 2026 18:09 — with GitHub Actions Inactive

github-actions Bot added size/m PR size: M agentcore-harness-reviewing AgentCore Harness review in progress labels Jun 26, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026

agentcore-cli-automation approved these changes Jun 26, 2026

View reviewed changes

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 26, 2026

nborges-aws approved these changes Jun 26, 2026

View reviewed changes

aidandaly24 merged commit d791bfa into aws:main Jun 26, 2026
32 checks passed

Uh oh!

Conversation

aidandaly24 commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Documentation PR

Type of Change

Testing

Checklist

Uh oh!

github-actions Bot commented Jun 25, 2026

Package Tarball

How to install

Uh oh!

agentcore-devx-automation Bot commented Jun 25, 2026

Uh oh!

agentcore-cli-automation left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agentcore-devx-automation Bot commented Jun 25, 2026

Uh oh!

agentcore-cli-automation left a comment

Choose a reason for hiding this comment

Uh oh!

agentcore-devx-automation Bot commented Jun 25, 2026

Uh oh!

agentcore-cli-automation left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agentcore-devx-automation Bot commented Jun 26, 2026

Uh oh!

agentcore-cli-automation left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aidandaly24 commented Jun 25, 2026 •

edited

Loading