Skip to content

fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639

Merged
aidandaly24 merged 4 commits into
aws:mainfrom
aidandaly24:fix/808-809
Jun 26, 2026
Merged

fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639
aidandaly24 merged 4 commits into
aws:mainfrom
aidandaly24:fix/808-809

Conversation

@aidandaly24

@aidandaly24 aidandaly24 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Description

The HTTP starter templates rebuild agent/session state on every invocation, so within a single session there is no multi-turn recall (turn 2 cannot see turn 1) — which contradicts agentcore invoke --session-id. The Strands no-memory template is worse: it shares one module-level global _agent across every invocation in a process, so when one warm microVM serves more than one session, conversation history bleeds between sessions (issue #809). AgentCore Runtime routes a given session ID to the same microVM (microVM stickiness) and keeps that compute warm across invocations, so module-level Python state persists for the session lifecycle — but only if the template reuses the same Strands Agent object per session. A Strands Agent accumulates history on self.messages; recall happens only when the same Agent instance is reused, and a leak happens when one Agent is shared across sessions.

This PR keys in-process agent/session state by context.session_id in all five HTTP templates, bounded to 128 sessions with LRU eviction so a long-lived microVM serving many sessions keeps per-session isolation without growing memory without limit:

  • strands, autogen — per-session OrderedDict cache (move_to_end on hit, evict least-recently-used at the cap).
  • openaiagents@lru_cache(maxsize=128) over a per-session SQLiteSession.
  • googleadk — module-scoped InMemorySessionService bounded via an LRU key set, evicting the oldest via delete_session.
  • langchain_langgraph — module-scoped InMemorySaver bounded via an LRU thread set, evicting the oldest via delete_thread.

In-memory recall is best-effort: it survives within a session's warm compute but resets on cold start (idle/stop), and NullConversationManager does not truncate. For durable, cross-restart history, attach AgentCore Memory (the hasMemory template variant) or a managed conversation manager. This was bug-bashed with real deploy + invoke per framework: per-session recall works and there is no cross-session leak on all frameworks.

Related Issue

Closes #808
Closes #809

Documentation PR

N/A — starter-template behavior fix; no devguide change required.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Additional validation: rendered each template across flag combinations and confirmed valid Python via py_compile; multi-agent bug bash with real deploy + invoke per framework verified within-session recall and cross-session isolation.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

…oss all HTTP starter base templates (aws#808, aws#809)

All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched.

Refs aws#808
Refs aws#809
@github-actions github-actions Bot added the size/m PR size: M label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.21.0.tgz

How to install

gh release download pr-1639-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz

@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix — the per-session memory bug is real and the overall approach (key in-process state by context.session_id) is the right design. Most of the diff looks good; flagging a few correctness/scale issues that I think need to be addressed before merge.

The core concern: the PR description explicitly justifies the 128-entry cap for the openaiagents/autogen/strands templates ("caps process memory and evicted sessions restart fresh"), but the googleadk and langchain_langgraph templates use module-level in-memory stores (InMemorySessionService, InMemorySaver) with no bound at all. That just trades the original cross-session bleed (#809-style) for an unbounded memory leak in any long-running process serving many sessions (e.g. agentcore dev over a long session, or non-Runtime deployments which the PR description specifically calls out as the failure mode of #809). The fix should be consistent across all five templates.

The FIFO-vs-LRU comments inline are smaller but worth correcting since the docstrings claim LRU.

session_service = InMemorySessionService()
session = await session_service.create_session(
# Module-level session service and runner (preserves history across invocations)
_session_service = InMemorySessionService()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded session growth. _session_service = InMemorySessionService() at module scope retains every (app_name, user_id, session_id) triple forever — there is no eviction path. The PR description argues that the 128-entry cap on the other templates "caps process memory and evicted sessions restart fresh", but this template has no cap, so a long-running process (notably agentcore dev over time, and any non-Runtime deployment) will grow without bound as new session ids arrive.

The ADK InMemorySessionService doesn't expose a public eviction API as far as I can see, so options are roughly:

  1. Wrap session creation in a bounded FIFO/LRU and call _session_service.delete_session(...) on eviction (ADK does expose delete_session).
  2. Keep a collections.OrderedDict of recently-touched (user_id, session_id) keys alongside the service and prune via delete_session once it exceeds N (e.g. 128, to match the other templates).
  3. Subclass / wrap to enforce the cap at create-time.

Whichever you pick, please match the 128-entry behavior already documented for openaiagents/autogen/strands so the fix is consistent across templates.

tools = [add_numbers]

# Module-level checkpointer preserves conversation history across invocations
_checkpointer = InMemorySaver()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded checkpoint growth. _checkpointer = InMemorySaver() is module-level with no eviction, so every distinct thread_id (= session_id) keeps its checkpoint forever. Same correctness concern as the googleadk template: the 128-cap rationale in the PR description doesn't apply here, so this template will leak memory in any long-running process.

A couple of options:

  1. Switch to a bounded saver (e.g. wrap InMemorySaver in something that tracks insertion order and calls _checkpointer.delete_thread(thread_id) past 128 entries — InMemorySaver exposes a storage dict you can prune).
  2. Use SqliteSaver/AsyncSqliteSaver with a temp file path for the base template (durable, doesn't grow process memory).
  3. At minimum, document the unboundedness in a comment and explicitly tell the reader to swap in a durable checkpointer for production — but a 128-cap to match the other templates is the cleanest fix.


# Reuses one AssistantAgent per session_id so each session keeps its own
# in-process conversation history (best-effort; resets on cold start). Caches up
# to 128 active sessions; the oldest is evicted and its history reset.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eviction policy is FIFO, not LRU, despite the docstring. Plain dict preserves insertion order, and next(iter(_agents)) returns the oldest-inserted key. Reading an existing entry via _agents[session_id] does not move it to the end, so an actively-used session can be evicted while an idle one sticks around.

Two ways to fix:

  1. Either drop the "LRU" claim from the comment (the policy is FIFO/insertion-order, which is fine if intended), or
  2. Use collections.OrderedDict and call _agents.move_to_end(session_id) on every hit so it's actually LRU.

Same issue applies to the new strands base-template cache (src/assets/python/http/strands/base/main.py ~line 433-438), since the PR description there also describes it as "LRU-style eviction".

(Separate, smaller concern on this template: get_or_create_agent isn't concurrency-safe — two concurrent first-invocations for the same new session_id will both fall through the if session_id not in _agents check, both await get_streamable_http_mcp_tools(), and both build an AssistantAgent; the loser's agent is discarded mid-flight. Not catastrophic, but worth an asyncio.Lock around the create path if you want it tight.)

def get_or_create_agent(session_id{{#if hasSkillsFetcher}}, skill_plugins=None{{/if}}):
if session_id not in cache:
if len(cache) >= 128:
cache.pop(next(iter(cache)))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIFO, not LRU — same as the autogen template. next(iter(cache)) returns the oldest-inserted key, and reading cache[session_id] doesn't update its position. Either change the comment to say "FIFO eviction (oldest by insertion order)" or switch to collections.OrderedDict + cache.move_to_end(session_id) on hit for real LRU behavior.

This affects the no-memory branch only — the hasMemory branch (lines ~385-423) is unchanged here and was already unbounded; you may want to consider applying a cap there too for consistency, but that's out of scope of #808/#809 so feel free to defer.

# Caches up to 128 active sessions; LRU eviction silently resets history for
# the oldest session. For production use, replace with a durable session store
# (e.g. SQLiteSession with a file path).
@lru_cache(maxsize=128)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, just confirming: SQLiteSession(session_id) with no db_path uses an in-memory SQLite DB per-instance, so the lru_cache retention semantics here are: "keep up to 128 in-memory SQLite DBs alive; evicted DBs are GC'd and their history is lost on next access." That matches the docstring and is the correct knob for a starter template. ✅

Unlike the autogen/strands dict-based caches, functools.lru_cache is true LRU — accessing an existing entry moves it to most-recently-used — so no FIFO concern here.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@aidandaly24 aidandaly24 changed the title fix(templates): Make each HTTP framework template reuse per-session conversa (#808, #809) fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809) Jun 25, 2026
… caches true LRU (aws#808, aws#809)

The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver)
templates kept per-session state at module scope with no eviction, trading the
cross-session leak for an unbounded memory leak in long-running processes. Bound
both to 128 active sessions with LRU eviction (delete_session / delete_thread on
the least-recently-used key), matching the other three templates.

The autogen and strands no-memory caches were plain dicts evicting by insertion
order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with
move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are
accurate. openaiagents already uses functools.lru_cache (true LRU) and is
unchanged.

Regenerated the assets snapshot.

Refs aws#808
Refs aws#809
@github-actions github-actions Bot removed the size/m PR size: M label Jun 25, 2026
@github-actions github-actions Bot added the size/m PR size: M label Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge — all the issues raised in the prior review have been addressed in commit 08bfb25:

  • googleadk: _session_service is now bounded to 128 active sessions via _session_keys OrderedDict with LRU eviction calling delete_session on the evicted entry. ✅
  • langchain_langgraph: _checkpointer is now bounded to 128 active threads via _thread_ids OrderedDict with LRU eviction calling delete_thread. ✅
  • autogen: Switched from plain dict to OrderedDict with move_to_end on hit — now true LRU rather than FIFO. ✅
  • strands: Same OrderedDict + move_to_end treatment for the no-memory cache. ✅
  • openaiagents: Unchanged from the prior review — functools.lru_cache was already correct.

All five HTTP templates are now consistent: a 128-entry per-process cap, true LRU eviction, and context.session_id threaded through. The snapshot is regenerated to match.

A couple of minor things I noticed but won't block on (carried over from the prior review or out of scope):

  • The previously-noted concurrency window in autogen's get_or_create_agent (two concurrent first-invocations for the same new session_id both falling through the miss check) still exists, and the same pattern is now present in strands and googleadk. For a starter template this seems acceptable.
  • The strands hasMemory branch (lines ~387-425) is still an unbounded plain-dict cache, but as the prior reviewer noted that's outside the #808/#809 scope.

No new serious issues found.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
aws#808)

OrderedDict.popitem() returns (key, value) where key is the (user_id,
session_id) tuple and value is True. The previous unpack assigned the key
tuple to old_user_id and True to old_session_id, so delete_session() was
called with garbage and the real session was never deleted -- the
InMemorySessionService grew unbounded past the 128-session cap, defeating
the aws#808 bound for GoogleADK (LangChain unpacked correctly).

Destructure the key tuple and discard the value so the real ids reach
delete_session. Add an asset test asserting the store caps at 128 with
correct eviction ids, and regenerate the asset snapshot.
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the new commit f3dabe9 (added after the prior approval). Looks good to merge.

f3dabe9 is a real fix. Confirmed by reading the rendered template at src/assets/python/http/googleadk/base/main.py:

  • The key stored in _session_keys is a 2-tuple (user_id, session_id) (line 150), with the value always True.
  • OrderedDict.popitem(last=False) returns (key, value), so the outer 2-tuple is (tuple_key, True).
  • The previous form old_user_id, old_session_id = popitem(...) unpacked the outer pair, assigning the whole inner key tuple to old_user_id and True to old_session_iddelete_session(user_id=<tuple>, session_id=True) would have errored or no-op'd, leaving the underlying InMemorySessionService to grow past 128 unbounded. The new form (old_user_id, old_session_id), _ = popitem(...) destructures correctly.
  • LangChain (line 75: evicted, _ = _thread_ids.popitem(last=False)) uses a single-string key, so its unpack was already correct — the commit message's parenthetical is accurate. ✅
  • The other LRU caches (autogen line 107, strands line 441) just call cache.popitem(last=False) and discard the result, which is fine since they don't need an async cleanup callback like delete_session. ✅

Tests: The new googleadk-session-eviction.test.ts has two effective regression guards — the substring assertion for the fixed form and the negative-match guard against the buggy form (lines 59 and 61) — which is exactly what's needed to prevent this from regressing. The simulateEviction portion (lines 27-51, 68-85) is a JS re-implementation of the Python eviction algorithm, so the caps the store at 128 assertion isn't actually exercising the template; it's documenting the intended algorithm. Not blocking, but if you wanted that assertion to mean something, you'd need to actually run the rendered template under Python — not worth the infra for a starter-template asset test, the two string guards are sufficient.

No new serious issues. The unaddressed-but-out-of-scope items from the prior review (concurrency window in get_or_create_agent, unbounded hasMemory strands branch) still stand and are fine to defer.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@aidandaly24 aidandaly24 marked this pull request as ready for review June 25, 2026 16:42
@aidandaly24 aidandaly24 requested a review from a team June 25, 2026 16:42

# Define an AssistantAgent with the model and tools
agent = AssistantAgent(
_agents[session_id] = AssistantAgent(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race condition here given the await between the existence check and the dict write. If two concurrent requests arrive for the same session_id, both pass the first check, both yield at the await block, both potentially could end up creating a fresh AssistantAgent. The second write would orphan the request who got the first agent.

Suggested fix: recheck after the await

  mcp_tools = await get_streamable_http_mcp_tools()
  if session_id not in _agents:
      _agents[session_id] = AssistantAgent(...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 6ed3afc. Added a re-check after the await get_streamable_http_mcp_tools(): if a concurrent first-invocation for the same session_id already stored an agent, we keep theirs instead of overwriting (then move_to_end + return). No lock needed since the only shared mutation is the dict write, which is now idempotent per session.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This this intentionally just a binary file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intentional — the test used literal NUL bytes (\0) as the key separator in the eviction simulation, which made git classify the file as binary. Fixed in 6ed3afc by switching the separator to ::, so it's now plain UTF-8 text and diffs/reviews normally.

…m eviction test

- autogen get_or_create_agent: re-check session_id after the await so two
  concurrent first-invocations don't both build an AssistantAgent (the loser
  was orphaned). Re-check + move_to_end before returning.
- googleadk-session-eviction.test.ts: the key separator used literal NUL bytes,
  which made git treat the file as binary. Switched to '::' so it is plain text.

Addresses review comments from @nborges-aws on aws#1639.
@github-actions github-actions Bot removed the size/m PR size: M label Jun 26, 2026
@github-actions github-actions Bot added size/m PR size: M agentcore-harness-reviewing AgentCore Harness review in progress labels Jun 26, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 26, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge — all serious issues have already been raised in the prior reviews and addressed in this branch:

  • Bounded stores in all 5 templates (googleadk/langchain previously unbounded) — addressed in 08bfb25
  • True LRU semantics (was FIFO in autogen/strands despite the docstring) — addressed in 08bfb25 with OrderedDict + move_to_end
  • googleadk composite-key unpack bug ((old_user_id, old_session_id), _ = popitem(...)) — addressed in f3dabe9
  • autogen create-race (two concurrent first-invocations for the same new session_id) — addressed in 6ed3afc with a re-check after the await ✅
  • Test file binary classification (NUL bytes as separator) — addressed in 6ed3afc

I confirmed the upstream APIs the templates rely on:

  • langgraph.checkpoint.memory.InMemorySaver.delete_thread exists in langgraph-checkpoint>=2.1.0,<4.0.0 (the range that langgraph>=1.0.2 in pyproject.toml resolves to) and is a no-op on missing thread_ids, so the eviction call is safe.
  • google.adk.sessions.InMemorySessionService.delete_session is async (correctly awaited) and create_session is also async (correctly awaited).
  • The strands template's hasMemory branch is still an unbounded plain-dict cache, but as the prior reviews noted that's out of #808/#809 scope.

Carried-over deferred items that are fine to leave as-is:

  • The same kind of concurrency window the autogen fix addressed still exists for strands and googleadk (get_or_create_session / get_or_create_agent in strands). For googleadk the loser will get AlreadyExistsError from _create_session_impl's existence check, which is slightly worse than autogen's silent-discard, but only triggers on a true first-invocation race for the same brand-new session_id — unlikely under AgentCore Runtime microVM stickiness for the same session. Acceptable for a starter template.
  • Unbounded hasMemory strands cache as noted above.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 26, 2026
@aidandaly24 aidandaly24 merged commit d791bfa into aws:main Jun 26, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

3 participants