feat(binding-mcp): mcp · proxy cache option (#1737) #1774
Merged
Adds the test-first scaffold for the upcoming mcp cache binding: McpCacheIT skeleton with all 22 planned test methods (Group A warmup tests active, Groups B/C/D/F/G/I marked @ignore until their scripts land), four IT zilla.yaml configs, schema patch entries for kind: cache and options.warmup/ttl, and six fully-written Group A warmup .rpt scripts (lifecycle, tools/list, resources/list, prompts/list, lifecycle-persists, guarded-credentials). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds the missing client.rpt for each Group A cache warmup scenario and a CacheIT class in the spec project that runs every script pair peer-to-peer without Zilla. Verifies the scripts are self-consistent before any cache binding implementation exists. Verified locally: ./mvnw -pl specs/binding-mcp.spec verify -Dit.test=CacheIT runs all 6 tests to green. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…pts (#1737) Splits the monolithic CacheIT/McpCacheIT into per-group classes (*WarmupIT, *ListIT, etc.) to keep each group focused as the script set grows. Group A (warmup) renamed to *WarmupIT. Adds Group B (list operations served from cache) — 4 scenarios: agent initialize, tools/list, resources/list, prompts/list. Each scenario carries paired client.rpt (agent at app0) and server.rpt (cache facade) so CacheListIT can verify the agent↔cache contract peer-to-peer. McpCacheListIT pairs the agent client.rpt with the Group A warmup server.rpt at app1 so the cache is populated before the agent's list arrives. B5 (list-before-warmup) stays @ignored with a script TODO. Verified locally: CacheWarmupIT 6/6 + CacheListIT 4/4 green peer-to-peer. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…#1737) The cache binding is one hop above the proxy; proxy fanout across exits is already covered by McpProxyIT (shouldList*WithToolkitMulti). The only cache-specific concern in the original Group D was resilience to a downstream error during warmup, which is more naturally a Group A (warmup) scenario than a fanout one. Adds cache.warmup.session.downstream.error: downstream ABORTs the tools/list stream during warmup; the lifecycle session must survive so the cache can continue with other list types or retry later. CacheWarmupIT 7/7 green peer-to-peer. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames/splits the CacheWarmupIT and CacheListIT pairs (peer-to-peer
and engine-driven) into method-scoped ITs aligned with the issue's
Responsibilities-by-MCP-method table:
CacheLifecycleIT — warmup session open/persist/error/guard,
agent initialize-from-cache
CacheToolsListIT — tools/list warmup + served-from-cache
CacheResourcesListIT — resources/list warmup + served-from-cache
CachePromptsListIT — prompts/list warmup + served-from-cache
Same applies to the McpCache* counterparts. No script files moved;
only the IT-class references changed. All 11 peer-to-peer tests
verified green.
Remaining roster slots (CacheToolsCallIT/ResourcesReadIT/PromptsGetIT
for pass-through invocations, and the per-list-method store/refresh
coverage) will be added as their scripts come online.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Replaces the separate kind: cache binding with options.cache on the
existing kind: proxy binding. Since all original two-binding topologies
have an exact equivalent in the folded model, no expressiveness is lost
and the engine "kind: cache" prerequisite goes away.
Also:
- Renames "warmup" → "hydrate" throughout (more precise terminology
for cache population).
- Flattens the warmup wrapper: authorization/store/ttl now live
directly under options.cache instead of options.cache.warmup.
- Drops the guarded vs. unguarded test split — the .rpt scripts don't
observe authorization on BEGIN so the scenarios produce identical
transcripts.
- Renames IT classes: CacheXIT → ProxyCacheXIT and McpCacheXIT →
McpProxyCacheXIT to align with the existing McpProxyIT neighbour.
- Configs rewritten: cache.yaml/cache.multi.yaml/cache.refresh.yaml
→ proxy.cache.yaml/proxy.cache.multi.yaml/proxy.cache.refresh.yaml;
each declares kind: proxy with options.cache and a stores: memory0
reference.
- Schema patch: drop "cache" from kind enum; replace flat
options.warmup/options.ttl with options.cache {store, ttl,
authorization}; store is required.
Verified: 10/10 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Renames scenarios and test methods to use "hydrate" / "serve" as verbs rather than as noun-modifiers.
Examples:
- cache.hydrate.session.tools.list → cache.hydrate.tools
- cache.agent.tools.list.from.cache → cache.serve.tools.list
- shouldPopulateToolsViaHydrate → shouldHydrateTools
- shouldServeAgentToolsListFromCache → shouldServeToolsList
- shouldKeepHydrateSessionOpenAfter... → shouldHydratePersist
Verified: 10/10 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds per-method refresh scenarios that model the cache re-issuing a list call on the same hydrate session after a TTL elapses, plus a refresh-error case where the refresh attempt aborts and the cache must retain its prior cached entry.
Scenarios:
- cache.refresh.tools / cache.refresh.resources / cache.refresh.prompts
- cache.refresh.tools.error
Tests:
- ProxyCacheToolsListIT.shouldRefreshTools
- ProxyCacheToolsListIT.shouldRefreshToolsError
- ProxyCacheResourcesListIT.shouldRefreshResources
- ProxyCachePromptsListIT.shouldRefreshPrompts
(engine-driven counterparts added too)
Also renames cache.hydrate.downstream.error -> cache.hydrate.error (and shouldHydrateDownstreamError -> shouldHydrateError) for the single-qualifier convention.
Lease-contention coverage (cache.refresh.tools.contended) is deferred: the lease behavior is store-level, only meaningfully testable engine-driven via either a TestStore seeding hook or a multi-worker EngineRule. Both are downstream work from the cache binding implementation; the wire-level refresh tests above cover the protocol shape.
Verified: 14/14 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds three scenarios where the agent's list request arrives while the cache is still hydrating. The cache facade holds the request read in full, then waits for a per-method hydrate barrier to fire before writing the cached response. True timing isn't observable peer-to-peer, but the scripts document the required wire ordering for engine-driven tests to enforce.
Scenarios + tests:
- cache.serve.tools.list.hydrating: shouldServeToolsListHydrating
- cache.serve.resources.list.hydrating: shouldServeResourcesListHydrating
- cache.serve.prompts.list.hydrating: shouldServePromptsListHydrating
Verified: 17/17 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds TestStoreOptionsConfig with a Map<String,String> entries
field, exposed via the standard options-config builder/adapter
pattern, and wires it through TestStoreContext so each new
TestStoreHandler is pre-populated with the configured entries.
Enables tests to set up store state declaratively before a binding
that uses the store begins operating — e.g., seeding a lease lock
key so a binding observes a held lease and exercises its
already-locked code path deterministically.
Example:
    stores:
      memory0:
        type: test
        options:
          entries:
            tools.lock: "worker-0"
Verified: engine unit tests (320/320), engine.spec ITs (39/39),
and binding-mcp.spec ProxyCache*IT (17/17) all pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds proxy.cache.contended.yaml — a TestStore configured with tools.lock pre-seeded to a foreign worker id — and a corresponding McpProxyCacheToolsListIT.shouldRefreshToolsContended test method (currently @ignore'd until the cache binding implementation lands). When enabled, the test verifies that the cache binding consults the lease before issuing a refresh tools/list: with the lock held in the store, putIfAbsent returns the seeded value and the refresh path is skipped. The downstream server.rpt models only the hydrate exchange, so any spurious refresh tools/list would hit an unmatched stream and fail the test. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
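The lease check described above can be sketched roughly as follows. This is a minimal illustration only: `LeaseSketch`, `tryAcquire` and `holder` are hypothetical names backed by a plain in-memory map, not the Zilla store API; the only behaviour taken from the description is that `putIfAbsent` returns the seeded value when the lock is already held, so the refresh path is skipped.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a store-backed lease; not the Zilla StoreHandler API.
final class LeaseSketch
{
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    // Seeded entries play the role of options.entries on the test store.
    LeaseSketch(Map<String, String> seeded)
    {
        entries.putAll(seeded);
    }

    // Returns true only when this worker installed the lock key itself.
    // A non-null result from putIfAbsent means another holder already owns the lease.
    boolean tryAcquire(String key, String workerId)
    {
        String holder = entries.putIfAbsent(key, workerId);
        return holder == null;
    }

    String holder(String key)
    {
        return entries.get(key);
    }
}
```

With `tools.lock` pre-seeded to a foreign worker id, `tryAcquire` returns false and the caller would skip the refresh tools/list.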
…#1737)
Replaces the TestStore-seeded contention approach with a more faithful multi-worker engine test:
- cache.refresh.tools.contended/{client,server}.rpt - models two hydrate sessions, exactly two tools/list calls on the wire (one initial hydrate by lease-winner, one refresh by lease-winner). Second worker's lifecycle is observed but its tools/list never hits the wire because the lease was lost.
- McpProxyCacheContentionIT - new engine-driven IT class configured with ENGINE_WORKERS=2 so both cache binding instances genuinely race for the hydrate / refresh leases against the shared store-memory. @Ignore'd until cache binding implementation lands.
- ProxyCacheToolsListIT.shouldRefreshToolsContended - peer-to-peer counterpart added to the existing list IT.
- proxy.cache.contended.yaml dropped - no longer needed (the test uses proxy.cache.refresh.yaml which already references store-memory).
Verified: 18/18 ProxyCache*IT peer-to-peer tests pass (was 17, added shouldRefreshToolsContended).
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
jfallows
commented
May 16, 2026
- proxy.cache.yaml / proxy.cache.refresh.yaml: use binding-level exit instead of single-element routes for the single-exit case.
- proxy.cache.multi.yaml → proxy.cache.toolkit.yaml.
- Remove explanatory # comments from scripts; scenarios + script bodies are the documentation.
- Remove class-level comment from McpProxyCacheContentionIT; same reasoning.
Verified: 18/18 ProxyCache*IT peer-to-peer tests pass.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…tentionIT The static import of ENGINE_WORKERS was placed after java/org statics with a blank-line separator. Per the project convention static imports go in a single block alphabetically sorted by full path, so io.aklivity.* comes first. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The engine-driven McpProxyCache*IT tests fail at engine bring-up because the proxy binding does not yet recognize options.cache. This is the expected test-first state, but CI cannot distinguish "expected failure until impl" from "regression". Add class-level @ignore("TODO: enable when proxy cache option lands") to all 5 McpProxyCache*IT classes: McpProxyCacheLifecycleIT, McpProxyCacheToolsListIT, McpProxyCacheResourcesListIT, McpProxyCachePromptsListIT, McpProxyCacheContentionIT. The peer-to-peer ProxyCache*IT tests in specs/binding-mcp.spec remain active and green — they verify script self-consistency independent of the binding implementation. Verified: runtime/binding-mcp clean verify → BUILD SUCCESS, 146 tests run, 5 skipped (the ignored cache ITs). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
First step of the proxy cache implementation: add the config-layer types that map to options.cache in the schema. No runtime behavior yet - McpProxyFactory still ignores the parsed config.
- McpCacheConfig: immutable POJO with store, per-method ttl, authorization map (guard-name → credentials)
- McpCacheConfigBuilder: fluent builder mirroring the existing McpAuthorizationConfigBuilder pattern
- McpOptionsConfig: add cache field + 4-arg constructor
- McpOptionsConfigBuilder: add cache() method (nested-builder pattern matching authorization())
- McpOptionsConfigAdapter: serialize/deserialize the cache block with ttl + per-guard authorization credentials
Verified: binding-mcp tests pass, checkstyle clean, binding-mcp.spec SchemaTest still green.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…1737)
Phase B of the proxy cache implementation. When options.cache is present, attach() now:
- resolves the unconditional exit route
- allocates a HydrateSession (initialId/replyId etc.)
- schedules a signaler tick that fires immediately
- the tick handler issues a lifecycle BEGIN downstream with sessionId="hydrate-1", modelled after KafkaGrpcRemoteServerFactory
The session sends reply WINDOW on receipt of the downstream's BEGIN reply, and END on detach. No list-method enumeration yet; tools/resources/prompts hydrate, store integration, lease coordination, and refresh land in later phases.
McpProxyCacheLifecycleIT remains @Ignore'd until all four scenarios in the class pass; verified locally that the shouldHydrate scenario itself now runs to green when temporarily un-Ignored.
Also adds store-memory test dependency (the cache configs reference type: memory for their backing store).
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase C — after the hydrate lifecycle reply arrives, the cache now chains tools/list, resources/list, prompts/list as three sequential sub-streams on the same hydrate session. Each list stream sends BEGIN+END (close write), receives BEGIN reply, DATA and END, then signals the parent HydrateSession to start the next. Response bodies are discarded for now; store integration arrives in Phase D. McpProxyCacheLifecycleIT now passes all 4 tests with the engine (shouldHydrate, shouldHydratePersist, shouldHydrateError, shouldServeInitialize) — class-level @ignore removed. McpProxyCache{ToolsList,ResourcesList,PromptsList,Contention}IT remain @ignore'd until serve-from-cache / refresh / lease land. Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass, 4 ITs skipped (the still-Ignored cache list / contention ITs). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Phase D — resolve the configured store at attach() time via
context.supplyStore(resolveId.applyAsLong(cache.store)), thread the
handle through HydrateSession into each HydrateListStream, and on
each list reply END write the accumulated body to the store under
the per-method key ("tools" / "resources" / "prompts") with no
expiry (Long.MAX_VALUE).
Response bodies are buffered byte-by-byte into a per-list-stream
byte[] that grows as needed. On END the accumulated bytes are
decoded as UTF-8 and pushed to the StoreHandler via put().
No serve-from-cache yet — agent list requests still pass through
to the existing proxy code path. Phase E will intercept them and
respond from the store.
Verified: ./mvnw -pl runtime/binding-mcp clean verify → 149 pass,
4 list/contention ITs still @ignore'd.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
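The Phase D buffering step can be pictured with a small sketch. It is an illustration under assumptions, not the binding's code: `ListBodyAccumulator`, `onData` and `onEnd` are hypothetical names, and it copies whole chunks where the commit describes byte-by-byte buffering. What it keeps from the description is a per-stream byte[] that grows as needed and is decoded as UTF-8 on END.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical per-list-stream accumulator; names are illustrative only.
final class ListBodyAccumulator
{
    private byte[] body = new byte[16];
    private int length;

    // Append one DATA payload, growing the backing array when it would overflow.
    void onData(byte[] chunk, int offset, int size)
    {
        if (length + size > body.length)
        {
            int capacity = Math.max(body.length * 2, length + size);
            body = Arrays.copyOf(body, capacity);
        }
        System.arraycopy(chunk, offset, body, length, size);
        length += size;
    }

    // On END, decode the accumulated bytes as UTF-8 and reset for reuse.
    // The caller would then push the returned value to the store.
    String onEnd()
    {
        String value = new String(body, 0, length, StandardCharsets.UTF_8);
        length = 0;
        return value;
    }
}
```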
Adds McpCacheListServer that intercepts tools/resources/prompts list streams when options.cache is configured on a proxy binding, looks up the cached envelope via StoreHandler.get(), and emits the cached bytes as DATA followed by END without forwarding to upstream. Un-ignores the hydrate + serve tests across the per-kind cache IT classes; periodic refresh and hydrating-wait tests remain ignored pending later phases.
…lel (#1737) Move the per-kind store-key plumbing out of HydrateSession into a new McpListCache attached to McpBindingConfig, dropping the redundant cacheStores map. HydrateSession becomes a populator only; the cache (backed by StoreHandler) is the source of truth for "is kind X ready?" via an async get. With per-kind status independent of session sequencing, the three list-stream round-trips dispatch on the same worker tick - wall-clock hydration drops to a single round-trip, and an error on one kind no longer delays the others or blocks a retry on reconnect. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
jfallows
commented
May 16, 2026
- Rename `HydrateSession` to `McpHydrateSession` to follow the binding's type-prefixed inner class convention.
- Make `McpOptionsConfig` constructor package-private; construction is via `McpOptionsConfig.builder()`.
- Replace the hardcoded `"hydrate-1"` constant in `McpProxyFactory` with a `Supplier<String>` obtained from `McpConfiguration.sessionIdSupplier()`. Cache ITs configure `MCP_SESSION_ID_NAME` to a static method returning `"hydrate-1"`, mirroring the `McpServerIT` override pattern.
- Replace the three flat `Duration ttlTools/ttlResources/ttlPrompts` fields on `McpCacheConfig` with a nested `McpCacheTtlConfig` (built via `McpCacheConfigBuilder.ttl()` returning `McpCacheTtlConfigBuilder`). `McpOptionsConfigAdapter` serializes and deserializes the nested form.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…ngConfig
Replace the two factory-level maps on `McpProxyFactory` (`sessions: Map<String, McpLifecycleServer>` shared across all bindings, and `hydrateSessions: Long2ObjectHashMap<McpHydrateSession>`) with per-binding fields on `McpBindingConfig`:
- `sessions: Map<String, McpProxySession>` - now correctly scoped per binding rather than per worker. Session ids are only meaningful within one binding's namespace.
- `hydrate: McpProxyHydrate` - the per-binding hydrate session.
`McpProxySession` and `McpProxyHydrate` are package-visible interfaces declared in `internal.config` so `McpBindingConfig` can type the fields without depending on the inner classes inside `McpProxyFactory`. The inner classes implement the interfaces - no behavior change.
`McpLifecycleServer` gains a constructor reference to its `McpBindingConfig` so `cleanup` can call `binding.sessions.remove(sessionId)`. This sets up the per-binding hook that the upcoming per-kind factory extraction will route state through.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Move McpLifecycleServer and McpLifecycleClient from McpProxyFactory into a new McpProxyLifecycleFactory as inner classes - same enclosing-instance pattern, just a smaller enclosing factory. McpLifecycleServer (and the cross-class accessors `sender`, `originId`, `routedId`, `sessionId`, `supplyClient`) is package-private so the still-inline call/list dispatch in McpProxyFactory can pattern-match against it. McpLifecycleClient is also package-private because McpClient and McpListClient hold typed references to it. Introduce `Int2ObjectHashMap<BindingHandler> factories` on McpProxyFactory and delegate KIND_LIFECYCLE to the new factory via the dispatch table; other kinds remain inline until subsequent phases extract them. Mirrors KafkaClientFactory's per-kind factory pattern. McpProxyFactory drops 560 lines. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
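The dispatch-table delegation can be sketched as below. All names and constant values here are hypothetical, and a plain `HashMap<Integer, ...>` stands in for the `Int2ObjectHashMap<BindingHandler>` named in the commit so the example stays dependency-free; it only illustrates routing extracted kinds through the table while other kinds stay inline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-kind dispatch; not the McpProxyFactory code.
final class KindDispatchSketch
{
    static final int KIND_LIFECYCLE = 1;   // hypothetical value
    static final int KIND_TOOLS_CALL = 2;  // hypothetical value

    // Simplified stand-in for a per-kind stream factory.
    interface StreamFactory
    {
        String newStream(String sessionId);
    }

    private final Map<Integer, StreamFactory> factories = new HashMap<>();

    void register(int kind, StreamFactory factory)
    {
        factories.put(kind, factory);
    }

    // Extracted kinds go through the table; anything else falls back to inline handling.
    String newStream(int kind, String sessionId)
    {
        StreamFactory factory = factories.get(kind);
        return factory != null
            ? factory.newStream(sessionId)
            : "inline:" + kind + ":" + sessionId;
    }
}
```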
…ompts/get, resources/read
Introduce an abstract McpProxyItemFactory implementing BindingHandler plus three concrete per-kind subclasses (McpProxyToolsCallFactory, McpProxyPromptsGetFactory, McpProxyResourcesReadFactory). The base owns the shared stream state machine - the McpServer and McpClient inner classes move verbatim out of McpProxyFactory - and exposes three hooks for the kind-specific bits:
protected abstract int kind();
protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid, String identifier);
protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid, McpBeginExFW upstream);
Each subclass implements ~30 lines: its KIND constant and the tools-call/prompts-get/resources-read variants of the BEGIN extension builder. Naming "Item" reflects that the three operations are different verbs (call/get/read) acting on a single identified MCP item - opposite to the List kinds.
McpProxyFactory registers all three subclasses in its existing Int2ObjectHashMap<BindingHandler> factories map next to the Phase 2 McpProxyLifecycleFactory entry; the newStream dispatcher's else branch collapses to factories.get(kind).newStream(...). The local McpServer, McpClient, and rewriteReplyBeginEx are removed from McpProxyFactory along with three now-unused flyweight fields (flushRO, challengeRO, mcpChallengeExRW) and the McpChallengeExFW import.
Net change: McpProxyFactory shrinks 735 lines (3071 -> 2336); McpProxyItemFactory adds 1229 lines. Each subclass receives the same (McpConfiguration, EngineContext, LongFunction<McpBindingConfig>) constructor signature as the Phase 2 lifecycle factory, and the per-factory do-helpers (doBegin, doData, doEnd, doAbort, doFlush, doChallenge, doReset, doWindow) are local copies rather than parent-shared - matches Phase 2 precedent. No visibility widening was needed beyond what Phase 2 already opened on McpLifecycleServer/McpLifecycleClient.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…e hydrate ITs
The parallel-hydrate path in McpHydrateSession.onBegin iterates over
[KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST] and dispatches
all three list streams on the same worker tick. The per-kind cache ITs
(McpProxyCache{Resources,Prompts}ListIT.shouldHydrate{Resources,Prompts})
each only `read` their own kind's BEGIN ext, so the other two streams
emit unexpected BEGINs that fail the script assertion. Updating each
script to accept three BEGINs would push parallel-hydrate setup into
tests whose intent is the single-kind hydrate behavior in isolation.
Add MCP_HYDRATE_KIND_FILTER (IntPredicate, default `k -> true`) to
McpConfiguration, exposed as `hydrateKindFilter()`. Mirrors the existing
MCP_SESSION_ID supplier pattern: tests resolve a `Class::method` static
reference; the decoder loads it via MethodHandle. Production keeps the
default — all three kinds hydrate in parallel — so the contention IT
exercises the real behavior unchanged. Each per-kind IT now configures
the filter to its single KIND_*_LIST so only that kind's BEGIN is
emitted and the existing per-kind scripts pass unmodified.
Result locally: 3 hydrate failures fixed
(McpProxyCacheToolsListIT had timed out, also resolved). Three
`shouldServe*` failures remain — cache-serve reads empty payload — and
are independent of hydrate dispatch.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
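A rough sketch of how an IntPredicate kind filter narrows the parallel hydrate set. The class name, the `kindsToHydrate` helper and the numeric KIND values are assumptions for illustration; only the `k -> true` default and the single-kind filter used by the per-kind ITs come from the description above.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical sketch; constant values are placeholders, not the binding's real kinds.
final class HydrateKindFilterSketch
{
    static final int KIND_TOOLS_LIST = 1;
    static final int KIND_RESOURCES_LIST = 2;
    static final int KIND_PROMPTS_LIST = 3;

    static final int[] LIST_KINDS =
    {
        KIND_TOOLS_LIST, KIND_RESOURCES_LIST, KIND_PROMPTS_LIST
    };

    // Default filter: every kind hydrates.
    static final IntPredicate ALL_KINDS = k -> true;

    // Returns the kinds whose list streams would be dispatched on this tick.
    static int[] kindsToHydrate(IntPredicate filter)
    {
        return IntStream.of(LIST_KINDS).filter(filter).toArray();
    }
}
```

A per-kind IT would pass something like `k -> k == KIND_RESOURCES_LIST`, so only that kind's BEGIN reaches the script.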
…ompts/list, resources/list
Mirror the Phase 3 item-factory shape for the list slice. Introduce an
abstract McpProxyListFactory implementing BindingHandler plus three
concrete per-kind subclasses (McpProxyToolsListFactory,
McpProxyPromptsListFactory, McpProxyResourcesListFactory). The base
owns the shared state machine - McpListClient, McpListClientDecoder
with its ten decode states, McpListServer (passthrough), and
McpCacheListServer (cache-serve variant), plus the indexOfByte JSON
helper - and exposes seven hooks for the kind-specific bits:
protected abstract int kind();
protected abstract void injectInitialBeginEx(McpBeginExFW.Builder, String sid);
protected abstract void injectReplyBeginEx(McpBeginExFW.Builder, String sid);
protected abstract DirectBuffer listReplyOpenPrelude();
protected abstract JsonParserFactory listItemParserFactory();
protected abstract String arrayKey();
protected abstract String idKey();
Each subclass owns its per-kind prelude bytes (the JSON envelope-open
literal), JsonParserFactory for the streaming item parser, array key
("tools" / "prompts" / "resources"), and id key ("name" or "uri" for
resources). The kind field is removed from McpListClient, McpListServer,
and McpCacheListServer - each factory instance is bound to one kind, so
the seven `switch (kind)` blocks in those classes collapse to direct
hook calls.
Cache-vs-passthrough is now dispatched internally in
McpProxyListFactory.newStream: when binding.cache is non-null an
McpCacheListServer is constructed, otherwise an McpListServer with its
McpListClient via lifecycle.supplyClient. The McpProxyFactory factories
map gets three new entries (KIND_TOOLS_LIST / KIND_PROMPTS_LIST /
KIND_RESOURCES_LIST) and the newStream dispatcher collapses to
factories.get(kind).newStream(...) for every dispatched kind, including
lifecycle - the per-kind factories enforce their own session/route
preconditions.
HydrateListStream, McpHydrateSession, and SIGNAL_INITIATE_HYDRATE stay
in McpProxyFactory because they coordinate multi-kind hydrate from one
session and aren't request-time dispatch paths; the small kind-switch
in HydrateListStream.initiate persists.
McpProxyFactory shrinks 1777 lines (2343 -> 566), shedding all list
machinery plus six now-unused do* helpers (doBegin, doData, doAbort,
two doFlush overloads, doChallenge, doReset), unused flyweight RO/RW
fields and imports, and the no-longer-called sessionId(McpBeginExFW)
helper. doEnd, doWindow, and a local newStream remain because
HydrateListStream still calls them. McpProxyListFactory adds 2016 lines.
No further visibility widening was needed beyond what Phase 2 opened
on the lifecycle accessors.
ITs: 156 pass / 3 fail / 8 skipped, identical to the pre-refactor
baseline. The three `shouldServe*` failures are a pre-existing
cache-serve empty-payload bug unrelated to this refactor.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
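The hook-based shape can be illustrated with a much-reduced sketch covering only two of the seven hooks. Everything here is hypothetical except the array keys ("tools", "resources") and id keys ("name", "uri") quoted above; in particular the envelope-open string is a placeholder, not the real prelude bytes.

```java
// Hypothetical base: shared logic is written once and calls hooks instead of switching on kind.
abstract class ListFactorySketch
{
    protected abstract String arrayKey();

    protected abstract String idKey();

    // Placeholder envelope-open literal built from the per-kind array key.
    final String envelopeOpen()
    {
        return "{\"" + arrayKey() + "\":[";
    }

    // Renders one list item keyed by the per-kind id key.
    final String item(String id)
    {
        return "{\"" + idKey() + "\":\"" + id + "\"}";
    }
}

final class ToolsListSketch extends ListFactorySketch
{
    @Override
    protected String arrayKey()
    {
        return "tools";
    }

    @Override
    protected String idKey()
    {
        return "name";
    }
}

final class ResourcesListSketch extends ListFactorySketch
{
    @Override
    protected String arrayKey()
    {
        return "resources";
    }

    @Override
    protected String idKey()
    {
        return "uri";
    }
}
```

Because each instance is bound to one kind, the base never needs a kind field to decide which key to use.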
The three shouldServe*List runtime ITs raced against hydrate: the agent sent its lifecycle BEGIN, McpProxyLifecycleFactory replied immediately (it has no hydrate awareness), the agent then sent its list BEGIN, and McpCacheListServer hit an empty cache because hydrate was still in flight - the agent received an empty payload.
Make these ITs deterministic by pre-seeding the cache instead of racing against hydrate. Mid-hydrate is a distinct scenario that lifecycle gating will cover later (the .hydrating scripts and @Ignore'd shouldServe*ListHydrating methods are removed here; when lifecycle gating lands we'll re-introduce a clearer cache.serve.<kind>.refreshing flavor rather than overload "hydrating").
Changes:
- New config proxy.cache.seeded.yaml using TestStore with options.entries carrying the tools/resources/prompts JSON. The existing proxy.cache.yaml is unchanged - hydrate ITs continue to verify an empty memory store.
- Runtime shouldServe{Tools,Resources,Prompts}List ITs switch to proxy.cache.seeded.yaml and reuse the existing cache.hydrate/server script for the downstream (it only handles the hydrate-1 lifecycle accept + reply, which is exactly what a fully-cached binding produces on the wire when McpHydrateSession.onBegin sees every cache.get returning non-null and spawns no list streams).
- Delete the three .hydrating script directories and the @Ignore'd shouldServe*ListHydrating test methods (both runtime and peer ITs).
Runtime: 156 pass / 0 fail / 5 skipped (was 156p / 3f / 8s).
Peer: 15 pass / 0 fail / 0 skipped.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
McpCacheContext no longer holds back-reference to McpProxyCacheHydrater. Signal scheduling sites use small context-capturing lambdas to invoke hydrater methods directly (sigId -> beginLifecycle(context) and sigId -> refresh(context)), so the trampoline methods onInitiateLifecycle and onRefresh on the context are gone too. McpCacheContext now takes the engine Signaler at construction time (provided via McpBindingConfig from EngineContext) which it uses directly for handle dispatch in register() and markComplete(). The detached guard moves into beginLifecycle and refresh on the hydrater where the orchestration actually happens. bind(hydrater, expected) becomes prepare(expected) — just session state reset, no hydrater stored. The lambda allocation per signal scheduling fires at most a handful of times per binding lifetime (attach, lease retry, cache TTL refresh) — nothing on the hot path.
McpCacheContext takes the per-worker hydrater as a final ctor field and owns the full lifecycle state machine. McpProxyFactory.attach calls context.start() (no args) and context.detach() directly; the hydrater's attach/detach/beginLifecycle/onAcquireLifecycleComplete methods are gone. The signaler also lives on context (final ctor field, used directly by register/markComplete and by the state-machine methods). Hydrater no longer holds a signaler reference. Signal callbacks are bare method refs targeting context methods — this::beginLifecycle, this::onRefresh — so zero allocation per signal scheduling. The detached guard and lifecycle orchestration read top-to-bottom on context, where the session state lives. Hydrater becomes pure plumbing: flyweights, per-kind strategies, suppliers, plus accessors (activeHydraterCount, supplyTraceId, supplySessionId), a lifecycle-stream factory (newLifecycleStream), a list-hydrater dispatcher (initiateListHydraters), and the per-kind refresh dispatcher (refresh). The list-hydrater refresh-timing also flows through context: strategies call context.scheduleRefresh(signalId()) instead of touching signaler directly. cacheTtl gating is on context where cacheTtl already lives. McpHydrateLifecycleStream becomes package-private so context can hold a typed reference (replacing the prior Runnable cleanup hook). On stream BEGIN the stream notifies context.onLifecycleOpened so context can dispatch list hydraters; on stream close paths the stream notifies context.onLifecycleClosed so context can drop the typed reference and release the lock. The future reconnect hook will live in onLifecycleClosed. McpBindingConfig accepts the hydrater as a 4th ctor param and threads it to McpCacheContext. McpServerFactory and McpClientFactory pass null since they never construct a cache.
…eanup
Three related cleanups to the cache hydrate lifecycle:
1. Defer expensive work until lock acquired. beginLifecycle now just calls acquireLifecycle; the traceId, sessionId, and guard reauthorization happen inside onAcquireLifecycleComplete only when acquired == true. Avoids minting a sessionId / reauthorizing the guard on every failed lease attempt.
2. Cancellable signal tracking. McpCacheContext stores the cancel id returned by signaler.signalAt in a per-signalId slot, and detach cancels all pending signals before tearing down the lifecycle stream. The defensive detached checks at beginLifecycle and onRefresh entry points (signal targets) are gone - cancellation makes them dead code. Store-callback boundaries (acquire complete handlers) still need detached checks because store ops are not cancellable.
3. Lifecycle stream owns list streams. McpHydrateLifecycleStream tracks active McpListHydrateStream instances via a List; list streams register on construction and unregister on terminal. The three close paths (onLifecycleEnd / onLifecycleAbort / onLifecycleReset) and doLifecycleEnd all call cleanupListStreams to cascade an END to in-flight list streams. Matches the parent-owns-children pattern used by McpLifecycleServer for its McpLifecycleClient children in McpProxyLifecycleFactory.
McpListHydrater and McpListHydrateStream drop their private modifiers so the lifecycle stream and context can reference these types directly across the nested class boundary.
Readiness now derives from cache state rather than an explicit populated/expected counter. McpListCache tracks a populated flag that flips on get (re-evaluated each call: true when value non-null, false otherwise) and on put (always true). Both callbacks fire McpCacheContext checkReady which walks the active-cache list (built at start from hydrater.activeCaches) and calls markComplete when all are populated. Awaiters wait for actual cache population rather than the prior optimistic-on-acquire-fail signal, which makes readiness honest. Strategies drop their explicit markReady calls; readiness flows from cache state.
initiate (initial entry + contention poll) does cache.get first; refresh (periodic, cacheTtl-paced) does cache.acquire first to force a fresh hydrate. onRefresh dispatches between the two based on backoff state: non-zero backoff means we are in polling mode after losing a recent acquire race.
scheduleBackoffRetry doubles the delay from leaseRetry on each acquire-fail, capped at leaseTtl. Reset to zero on successful hydrate or cache-hit. At defaults (leaseRetry=100ms, leaseTtl=30s) the successive delays are 100ms, 200ms, 400ms, ... 12.8s, so polls land at cumulative offsets of 100ms, 300ms, 700ms, 1.5s, 3.1s, 6.3s, 12.7s, 25.5s, cap thereafter. Bounded above by leaseTtl because per-kind lock expires at that point and our acquire must succeed.
McpProxyCacheHydrater.refresh gains a polling flag; activeCaches accessor returns the filter-active per-kind cache list for the context.
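The doubling backoff can be checked with a small sketch. `BackoffSketch`, `nextDelay` and `pollOffsets` are hypothetical names, and the exact handling at the cap boundary is an assumption; the sketch only shows that doubling from leaseRetry yields the cumulative poll offsets listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of doubling backoff capped at leaseTtl; not the binding's code.
final class BackoffSketch
{
    // Zero means "not polling"; the first retry starts at leaseRetry, later ones double.
    static long nextDelay(long currentMillis, long leaseRetryMillis, long leaseTtlMillis)
    {
        return currentMillis == 0L
            ? leaseRetryMillis
            : Math.min(currentMillis * 2, leaseTtlMillis);
    }

    // Cumulative offsets (ms) of the first `polls` retries after the initial acquire-fail.
    static List<Long> pollOffsets(long leaseRetryMillis, long leaseTtlMillis, int polls)
    {
        List<Long> offsets = new ArrayList<>();
        long delay = 0L;
        long elapsed = 0L;
        for (int i = 0; i < polls; i++)
        {
            delay = nextDelay(delay, leaseRetryMillis, leaseTtlMillis);
            elapsed += delay;
            offsets.add(elapsed);
        }
        return offsets;
    }
}
```

Calling `BackoffSketch.pollOffsets(100, 30_000, 8)` returns 100, 300, 700, 1500, 3100, 6300, 12700, 25500, matching the sequence quoted in the commit.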
jfallows
commented
May 20, 2026
- McpListCache.get/put use BiConsumer/Consumer .andThen() to chain state-check (checkGet, checkPut) onto downstream completion - drops the inline lambda blocks.
- McpBindingConfig: introduce a 3-arg ctor delegating to the 4-arg with null hydrater, so McpServerFactory and McpClientFactory call sites no longer need to pass null explicitly.
- McpBindingConfig: collapse the store-then-McpCacheContext chain into one Optional pipeline ending in map(store -> new McpCacheContext(...)).orElse(null). The intermediate StoreHandler variable goes away.
- McpProxyCacheHydrater lifecycle stream: drop the if (!McpState.replyClosed(state)) and !initialClosed guards from onLifecycleEnd / onLifecycleAbort / onLifecycleReset. The engine does not deliver an END / ABORT / RESET frame on a direction already closed, so the guards are dead code per the canonical pattern (only do* outbound senders need state guards).
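The andThen() chaining in the first bullet might look roughly like this. `ListCacheSketch` is a hypothetical, synchronous stand-in (the real get is described elsewhere as async), and running the state-check after the caller's completion is an assumed ordering; `BiConsumer.andThen` and `Consumer.andThen` are the standard java.util.function methods.

```java
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical synchronous sketch of chaining a state-check onto a completion callback.
final class ListCacheSketch
{
    private final String key;
    private String value;
    private boolean populated;

    ListCacheSketch(String key)
    {
        this.key = key;
    }

    // populated is re-evaluated on every get: true only when a value is present.
    void get(BiConsumer<String, String> completion)
    {
        completion.andThen(this::checkGet).accept(key, value);
    }

    // populated is always true after a put.
    void put(String newValue, Consumer<String> completion)
    {
        value = newValue;
        completion.andThen(this::checkPut).accept(newValue);
    }

    boolean populated()
    {
        return populated;
    }

    private void checkGet(String k, String v)
    {
        populated = v != null;
    }

    private void checkPut(String v)
    {
        populated = true;
    }
}
```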
jfallows
commented
May 20, 2026
McpCacheContext ctor signature becomes (binding, config, context,
cache, hydrater) and resolves store/guard/credentials/leaseTtl/leaseRetry
internally from the supplied configs. McpBindingConfig drops the
intermediate cacheGuard/cacheCredentials/cacheTtl locals and the
Optional chain reduces to:
    this.cache = Optional.ofNullable(options)
        .map(o -> o.cache)
        .map(cache -> new McpCacheContext(
            binding, config, context, cache, hydrater))
        .orElse(null);
Field renamed McpBindingConfig.cacheContext -> McpBindingConfig.cache
(still typed McpCacheContext). Consumers in McpProxyFactory,
McpProxyLifecycleFactory, and the three per-kind list factories
follow the rename.
Carve cache machinery into a new internal.stream.cache subpackage:
- McpProxyCache (data + populated arbiter + awaiters)
- McpProxyCacheManager + nested Factory (per-binding lifecycle)
- McpProxyCacheHandler (one lifecycle-stream lifetime)
- McpProxyCacheHydrater (per-worker, hidden behind Manager.Factory)
- McpProxyCacheListener (Handler -> Manager escalation)
McpProxyCache owns awaiter registration and the populated transition; McpProxyLifecycleFactory now calls binding.cache.register() against the cache directly, not a Manager indirection. The Manager owns refresh, per-kind retry, and lifecycle-reconnect timing; the Handler owns the lifecycle stream plus per-kind state machines and escalates via listener.onError(kind) for stream failures and listener.onClosed() for lifecycle loss. On lifecycle loss the Manager pre-emptively purges each kind so awaiters arriving during the outage wait rather than see stale data, then schedules a backoff reconnect.
McpProxyFactory holds a Long2ObjectHashMap of managers keyed by bindingId, mirroring McpServerFactory.sessions, and creates per-binding Managers via Manager.Factory which hides the per-worker hydrater.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds two ITs for behaviours newly introduced by the cache Handler/Manager split that the existing suite did not exercise end-to-end:
- shouldRetryAfterToolsRefreshError: a cacheTtl-driven tools-list refresh aborts, then the Manager's onError(kind) escalation drives a per-kind retry within leaseRetry. The third tools-list arrives within ~100ms, succeeds, and re-populates the cache.
- shouldServeLifecycleAfterAwaiterQueued: a north MCP client connects while the binding's hydrate is in flight, so the awaiter registers on cache.populated=false and is queued. After the upstream completes the tools-list hydrate the cache transitions to populated, the awaiter fires, and the client receives its lifecycle BEGIN reply.
Also fixes a regression in the Handler/Manager split where the lifecycle lock was no longer released on the populated transition (today's markComplete behaviour). Without this fix, multi-worker engines would have one worker hold the lock indefinitely and starve every other worker's awaiters. McpProxyCacheManager.onCacheReady now calls cache.releaseLifecycle exactly as McpCacheContext.markComplete did before the refactor.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
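The awaiter behaviour exercised by the second IT can be pictured with a small model. This is a sketch under assumptions: AwaiterSketch, register and onPopulated are hypothetical names that do not mirror the real McpProxyCache API, and the real code dispatches through a Signaler rather than calling the Runnable inline.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the awaiter queue; not the real McpProxyCache.
final class AwaiterSketch
{
    private final Queue<Runnable> awaiters = new ArrayDeque<>();
    private boolean populated;

    // Runs the awaiter immediately when already populated,
    // otherwise queues it until the populated transition.
    void register(Runnable awaiter)
    {
        if (populated)
        {
            awaiter.run();
        }
        else
        {
            awaiters.add(awaiter);
        }
    }

    // Called once the last pending kind has been hydrated: flips the flag,
    // then drains the queue in registration order.
    void onPopulated()
    {
        populated = true;
        Runnable awaiter;
        while ((awaiter = awaiters.poll()) != null)
        {
            awaiter.run();
        }
    }
}
```

A client that registers during the hydrate sees nothing until onPopulated runs; one that registers afterwards is served at once.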
…drate
Renames the per-kind retry scheduling on McpProxyCacheManager to match the Handler API it ultimately drives: the timer fires handler.hydrate(kind), so the scheduler is scheduleHydrate(kind). Co-renames the companion helpers and state for consistency:
scheduleKindRetry -> scheduleHydrate
onKindRetryFire -> onHydrateFire
cancelKindRetry -> cancelHydrate
kindRetryCancelIds -> hydrateCancelIds
kindBackoffMs -> hydrateBackoffMs
Also renames shouldServeLifecycleAfterAwaiterQueued to shouldServeToolsListDuringHydrate so the test method name matches its scenario directory cache.serve.tools.list.during.hydrate.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
onRefreshFire -> onRefreshed, onReconnectFire -> onReconnected, matching the onHydrated rename. All three timer callbacks now use the event-completed naming. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Adds cache.hydrate.lifecycle.reconnect spec scripts + binding-side IT shouldReconnectAfterLifecycleAbort and spec-side peer-to-peer entry. Block A on the lifecycle stream uses write await ALL_HYDRATED then write abort; block D on prompts-list notifies via write notify ALL_HYDRATED after its data write. The test currently fails by timing out: block A's write abort never lands in the trace, so the binding never observes a lifecycle abort and never reconnects. Iterating with k3po guidance on whether write await at the tail of an accepted block (no further writes after) is honoured. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
McpProxyLifecycleFactory had asymmetric propagation on terminal frames:
* Client to server (1-to-1, upward): McpLifecycleClient.onClientEnd / onClientAbort / onClientReset only removed the one downstream leg from server.clients without propagating the terminal back through the server-side reply. The cache hydrate session and any other lifecycle proxy session therefore never observed upstream shutdowns. Adds the matching server.doServer<Terminal> call after the remove.
* Server to clients (1-to-many, downward): the shared cleanup() helper always issued doClientEnd to every client regardless of how the server-side terminated, so a north ABORT or RESET was downgraded to a graceful END for every upstream, losing the failure signal. Inlines the fan-out into each onServer<Terminal> with the matching doClient<Terminal>; cleanup() is removed.
Adds the missing doServerReset on McpLifecycleServer (mirror of doServerAbort but on the initial side).
The cache.hydrate.lifecycle.reconnect IT scenario from the previous WIP commit is rolled back here pending a follow-up design decision on whether McpProxyCache.onPurged should also invalidate the per-kind store data so a reconnect triggers a fresh upstream hydrate cycle (and thus a new lifecycle BEGIN to the exit binding observable in the script). With this commit alone, the cache reconnects internally but no new traffic reaches the exit binding because the store has cached data and McpListHydrater.initiate's get-first path returns it without opening a new list stream.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
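The two propagation fixes above reduce to "forward the terminal you received, in kind". The following is a simplified sketch, not the factory's code: TerminalFanoutSketch and its nested types are hypothetical, and the model deliberately ignores the initial/reply direction distinction that the real doServer<Terminal> / doClient<Terminal> methods carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of terminal propagation; not McpProxyLifecycleFactory.
final class TerminalFanoutSketch
{
    enum Terminal { END, ABORT, RESET }

    // Downstream leg that records whichever terminal it was sent.
    static final class Client
    {
        final List<Terminal> sent = new ArrayList<>();

        void doTerminal(Terminal terminal)
        {
            sent.add(terminal);
        }
    }

    static final class Server
    {
        final List<Client> clients = new ArrayList<>();
        final List<Terminal> replied = new ArrayList<>();

        // Downward (1-to-many): every client leg receives the same terminal the
        // server side observed, instead of always being downgraded to END.
        void onServerTerminal(Terminal terminal)
        {
            for (Client client : clients)
            {
                client.doTerminal(terminal);
            }
            clients.clear();
        }

        // Upward (1-to-1): remove the leg, then propagate the terminal back
        // through the server-side reply rather than swallowing it.
        void onClientTerminal(Client client, Terminal terminal)
        {
            clients.remove(client);
            replied.add(terminal);
        }
    }
}
```

In this model a north ABORT reaches each upstream as ABORT, and an upstream ABORT becomes visible on the server reply, which is the signal the cache hydrate session needs in order to notice a lost lifecycle.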
Covers the recently-landed McpProxyLifecycleFactory terminal passthrough end-to-end without any cache involvement, using the existing proxy.yaml config (single route, app0 -> app1). Each scenario opens the lifecycle stream from north (client.rpt to app0) and a tools-list to force the proxy to lazily open its client leg to app1 via supplyClient. After the tools-list completes, the terminal event happens on the lifecycle stream. The 6 scenarios cover both directions of every fix path:
* lifecycle.server.write.abort - proxy doServerAbort on reply
* lifecycle.server.write.close - proxy doServerEnd on reply
* lifecycle.server.read.abort - proxy onServerAbort on initial
* lifecycle.client.write.abort - proxy doClientAbort on initial
* lifecycle.client.write.close - proxy doClientEnd on initial
* lifecycle.client.read.abort - proxy onClientAbort on reply
Pairs cover the same end-to-end flow from opposite observation points (server.write.abort <-> client.read.abort for upward ABORT; server.read.abort <-> client.write.abort for downward ABORT). END is asymmetric: server.write.close locks down upward END, client.write.close locks down downward END.
Binding-side IT: new McpProxyLifecycleIT class, no cache config. Spec-side peer-to-peer: 6 entries added to ApplicationIT.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…drate
Closes the last gap from the cache refactor: when the cache's upstream
lifecycle session aborts, the cache should reconnect AND pull fresh data
on the next hydrate cycle, all without invalidating the cached value
during the outage window so north clients keep being served from the
last good value.
McpProxyCache.populated flag drops to package-private so the hydrate
strategy can read it. checkReady fires onReady on every aggregate
populated state (not just on the false to true transition), so the
Manager's refresh schedule is renewed each time a kind's value is
overwritten via cache.put. Manager.onClosed stops calling
cache.onPurged; the cached value stays visible across the reconnect.
McpListHydrater.initiate now picks get-first vs acquire-direct based on
cache.populated. On the initial cycle (cache empty) it stays get-first
to coordinate across workers/nodes that may have populated the same
key. Once populated, it goes acquire-direct: we are the cache trying
to pull a fresher value into ourselves, so get-first would short-circuit
on our own cached value and never refresh. refresh stays acquire-direct
unconditionally.
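The get-first versus acquire-direct choice can be sketched as a single branch on the local populated flag. This is an illustration under assumptions: HydrateDecisionSketch, its Step enum, the Map-backed store and the lockAvailable parameter are hypothetical stand-ins, not the McpListHydrater API.

```java
import java.util.Map;

// Hypothetical model of one worker's hydrate decision; not McpListHydrater.
final class HydrateDecisionSketch
{
    enum Step { SERVED_FROM_STORE, HYDRATED_UPSTREAM, LOST_ACQUIRE }

    private final Map<String, String> store; // stand-in for the shared store
    boolean populated;                       // this worker's populated flag

    HydrateDecisionSketch(Map<String, String> store)
    {
        this.store = store;
    }

    Step initiate(String key, boolean lockAvailable, String upstreamValue)
    {
        if (!populated)
        {
            // Initial cycle: get-first, so a value seeded by another
            // worker or node is reused without opening a list stream.
            if (store.get(key) != null)
            {
                populated = true;
                return Step.SERVED_FROM_STORE;
            }
        }
        // Already populated (or nothing seeded): acquire-direct, so our own
        // cached value cannot short-circuit the refresh.
        if (!lockAvailable)
        {
            return Step.LOST_ACQUIRE;
        }
        store.put(key, upstreamValue);
        populated = true;
        return Step.HYDRATED_UPSTREAM;
    }
}
```

A second worker sharing the same map takes the get-first path on its first call, while a worker that is already populated always goes upstream and overwrites the stored value.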
The Gap 1 spec scripts cache.hydrate.lifecycle.reconnect/{client,server}
exercise the full cycle: 4 streams on app1 for the initial hydrate, an
abort on the lifecycle reply once all kinds populate, then 4 fresh
streams on app1 with new payloads after the proxy reconnects. The
binding-side IT shouldReconnectAfterLifecycleAbort uses just server.rpt
with proxy.cache.yaml (no override). The spec-side peer-to-peer test
on ProxyCacheLifecycleIT runs both halves.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
…eature-ww0Lv # Conflicts: # specs/binding-mcp.spec/src/test/java/io/aklivity/zilla/specs/binding/mcp/streams/application/ApplicationIT.java
…CacheIT
Five runtime-side McpProxyCache*IT classes (Lifecycle, ToolsList, ResourcesList, PromptsList, Contention) collapse into a single McpProxyCacheIT alongside McpProxyIT / McpClientIT / McpServerIT. Four spec-side ProxyCache*IT classes collapse similarly into ProxyCacheIT alongside ApplicationIT / NetworkIT.
The per-IT-class MCP_HYDRATE_FILTER override is replaced by per-test @Configure(name = MCP_HYDRATE_FILTER_NAME, value = "tools") (or "resources" / "prompts"). The default in the class-wide engine rule is the unfiltered all-kinds predicate. The contention test additionally pins ENGINE_WORKERS=2 and uses a rotating session-id supplier, both via @Configure.
McpConfiguration.decodeHydrateFilter now accepts a space-separated list of kind names ("tools", "resources", "prompts") instead of a method reference. Each name maps to its KIND_*_LIST constant, and the resulting Set<Integer> is exposed as an IntPredicate via Set::contains. Drops the reflective findStatic lookup and the per-IT-class hydrate*Only static helpers.
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
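The name-list decoding described above might look roughly like the following. This is a sketch, not McpConfiguration.decodeHydrateFilter itself: the class name is hypothetical, the KIND_*_LIST values are placeholders (the real constants are not shown in this PR text), and rejecting unknown names is an assumption.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical decoder for illustration; constant values are placeholders.
final class HydrateFilterSketch
{
    static final int KIND_TOOLS_LIST = 1;
    static final int KIND_RESOURCES_LIST = 2;
    static final int KIND_PROMPTS_LIST = 3;

    // Parses a space-separated list of kind names into a predicate over kind ids.
    static IntPredicate decode(String value)
    {
        Set<Integer> kinds = new HashSet<>();
        for (String name : value.trim().split("\\s+"))
        {
            switch (name)
            {
            case "tools":
                kinds.add(KIND_TOOLS_LIST);
                break;
            case "resources":
                kinds.add(KIND_RESOURCES_LIST);
                break;
            case "prompts":
                kinds.add(KIND_PROMPTS_LIST);
                break;
            default:
                // Assumption: unknown names are rejected rather than ignored.
                throw new IllegalArgumentException("unknown kind: " + name);
            }
        }
        // Set::contains boxes the int argument, matching the commit description.
        return kinds::contains;
    }
}
```

A test that sets the filter to "tools" would then see only the tools-list kind hydrated, which is how the per-test annotations scope each refresh scenario.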
jfallows
commented
May 21, 2026
…e retry timing in Manager
McpProxyCache replaces the tools/resources/prompts fields with a Map<Integer, McpListCache> keyed by kind; the hydrate filter is applied once at construction and the map then drives every downstream consumer. The list factories look up via cache.cacheOf(kind), and McpProxyCacheManager iterates cache.caches().keySet() rather than holding its own active-kinds projection. McpProxyCacheHydrater stores its kind-to-strategy mapping in an Int2ObjectHashMap, replacing the switch in hydraterOf.
HandlerImpl no longer owns per-kind backoff state, per-kind retry signals, or lifecycle-acquire retry signals; on per-kind acquire/stream error it escalates via listener.onError(kind), and on lifecycle-acquire failure it escalates via listener.onClosed(). All retry timing (per-kind retry, refresh cadence, lifecycle reconnect) now lives in Manager. McpProxyCacheListener gains onOpened() so the Manager can dispatch initial per-kind hydrate immediately after the lifecycle stream opens; McpProxyCacheManager registers cache.onReady as today.
McpProxyCache no longer holds a Signaler reference. Awaiters are registered as plain Runnable; McpProxyLifecycleFactory captures a Signaler at construction and calls signaler.signalNow(...) directly from the registered lambda. McpSignalHandle is removed.
The lifecycle stream and list-hydrate streams cache originId/routedId fields (both equal to cache.bindingId) and use them throughout. The lifecycle stream's child-stream registry is renamed streams. The lock-key constant is renamed STORE_LOCK_KEY_LIFECYCLE for symmetry with the other lock keys.
McpListHydrater.hydrate retains the cache.populated branch: get-first on initial dispatch (to read through seeded data without opening a list stream) and acquire-direct once populated (so refresh actually overwrites).
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Switch McpProxyCache.caches from Map<Integer, McpListCache> to org.agrona.collections.Int2ObjectHashMap<McpListCache> to avoid the Integer boxing on every cacheOf(kind) lookup and keySet() iteration. Int2ObjectHashMap iteration order is hash-bucket order, not insertion order, but it is deterministic for a fixed set of int keys at a fixed initial capacity. With KIND_TOOLS_LIST / KIND_RESOURCES_LIST / KIND_PROMPTS_LIST the dispatch order is prompts -> tools -> resources; update every multi-kind cache.hydrate* spec script (both server.rpt and client.rpt sides) to that order so peer-to-peer ProxyCacheIT and the binding-side McpProxyCacheIT both pass. cache.hydrate, cache.hydrate.credentials, cache.hydrate.error, cache.hydrate.toolkit, cache.hydrate.lifecycle.reconnect all reorder their three accepted blocks (or the corresponding chain of client connects via barriers) from tools/resources/prompts to prompts/tools/resources. cache.hydrate.error keeps tools as the kind that aborts (now second in the chain rather than first). The lifecycle-reconnect script moves the write-notify ALL_HYDRATED to the new last block (resources) on both cycles. The toolkit script applies the reorder to both the app1 and app2 cycles. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
jfallows
commented
May 21, 2026
… field McpProxyFactory previously held an McpProxyCacheManager.Factory instance just to call create(cache) once per binding attach. Replace with a Function<McpProxyCache, McpProxyCacheManager> captured as a method reference (Factory::create) so the attach call site reads supplyManager.apply(cache) and the factory class no longer leaks the nested Factory type into its field declarations. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Add shouldHydrate10k, shouldHydrate100k, shouldServeToolsList10k, and
shouldServeToolsList100k to McpProxyCacheIT, each pinning the engine
buffer slot capacity to 8192 so the payloads cross the slot boundary.
The 10k variant uses core:randomBase64(10000) for ~13KB of body; the
100k variant uses core:randomBase64(100000) for ~133KB.
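The payload sizes quoted above follow from Base64's 4-characters-per-3-bytes expansion. A quick check, assuming core:randomBase64(n) encodes n random bytes with standard padding (the commit's "~13KB" and "~133KB" figures are consistent with that reading); Base64SizeSketch is a hypothetical name.

```java
import java.util.Base64;

// Hypothetical helper for illustration; checks the encoded sizes only.
final class Base64SizeSketch
{
    // Padded Base64 emits 4 characters for every 3 input bytes, rounded up.
    static int encodedLength(int rawBytes)
    {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Cross-check against the JDK encoder using a zero-filled buffer.
    static int measuredLength(int rawBytes)
    {
        return Base64.getEncoder().encode(new byte[rawBytes]).length;
    }

    public static void main(String[] args)
    {
        System.out.println(encodedLength(10_000));   // 13336
        System.out.println(encodedLength(100_000));  // 133336
    }
}
```

Both sizes exceed the pinned 8192-byte slot capacity, so each payload necessarily spans more than one slot.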
The seeded yaml for the serve variants references the new randomBase64
template handled by TestResolverSpi -- e.g.
'{"tools":[{"name":"big_tool","description":"${{test.randomBase64.10000}}"}]}'
-- so the config stays compact while the resolver expands the template
at config-load time to the same deterministic Base64 string that
core:randomBase64 produces inside the .rpt scripts.
McpListHydrateStream was missing the receive-side flow-control update:
onListHydrateBegin now seeds replySeq/replyAck from the begin frame,
and onListHydrateData advances replySeq by data.reserved() and acks
replyAck = replySeq before re-emitting WINDOW. Without this, upstream
stalled once its in-flight reached replyMax (= slotCapacity) for any
single list-hydrate payload larger than one slot.
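The stall described above can be reproduced with a toy window model. This is a sketch under assumptions: ReplyWindowSketch is hypothetical, and the credit formula replyMax - (replySeq - replyAck) is a simplification that ignores padding and any other detail of the engine's real WINDOW accounting.

```java
// Hypothetical model of receive-side flow control; not McpListHydrateStream.
final class ReplyWindowSketch
{
    private final int replyMax; // receiver window, e.g. one buffer slot
    private long replySeq;      // bytes the sender has reserved so far
    private long replyAck;      // bytes the receiver has acknowledged

    ReplyWindowSketch(int replyMax)
    {
        this.replyMax = replyMax;
    }

    // Seeds both counters from the BEGIN frame.
    void onBegin(long sequence, long acknowledge)
    {
        replySeq = sequence;
        replyAck = acknowledge;
    }

    // Advances replySeq by the frame's reserved bytes; when acknowledging, moves
    // replyAck up to replySeq. Returns the credit a WINDOW would then advertise.
    long onData(int reserved, boolean acknowledge)
    {
        replySeq += reserved;
        if (acknowledge)
        {
            replyAck = replySeq;
        }
        return credit();
    }

    long credit()
    {
        return replyMax - (replySeq - replyAck);
    }
}
```

Without the acknowledgement, one slot's worth of data drives the credit to zero and the sender can never deliver the rest of a larger payload; with it, the full window is restored after every frame.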
https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
jfallows
commented
May 21, 2026
…ateLifecycleStream methods HandlerImpl.lifecycleStream -> lifecycle (already typed McpHydrateLifecycleStream, the suffix was redundant). McpHydrateLifecycleStream.registerListStream / unregisterListStream / cleanupListStreams -> register / unregister / cleanupStreams; the "List" qualifier is implicit from the registry's element type (McpListHydrater.McpListHydrateStream). https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
The kind-switch in sessionId(McpBeginExFW) duplicated the binding-kind dispatch already performed when selecting the factory. Promote sessionId to a protected abstract method on McpProxyItemFactory; each kind-specific subclass (tools-call, prompts-get, resources-read) returns the sessionId from its corresponding extension. McpProxyListFactory already followed this pattern. https://claude.ai/code/session_01WNsipAt3RGwQoeFYVxwfL8
Closes #1737.
Summary
Adds options.cache to the mcp · proxy binding. When configured, the proxy hydrates a shared store with tools/list, resources/list, and prompts/list payloads aggregated across the binding's configured routes, and serves those list calls from cache without round-tripping downstream. Folds what was originally proposed as a separate mcp · cache kind into an option on the existing proxy: every original topology has an exact equivalent in the folded model, and the engine kind: cache prerequisite drops out.
Config
Binding options
Engine configuration
- zilla.binding.mcp.hydrate.filter: Class::method reference returning an IntPredicate; restricts which list kinds the binding hydrates (mainly used by per-kind ITs to scope refresh scenarios)
- zilla.binding.mcp.lease.ttl: default PT30S
- zilla.binding.mcp.lease.retry: default PT0.1S
Runtime behaviour
- McpProxyCacheHydrater opens one lifecycle stream to the binding itself (originId == routedId == binding.id) and issues per-kind list BEGINs (gated by MCP_HYDRATE_FILTER) via McpProxyCacheListHydrater subclasses, one per kind. The binding's own dispatch (McpProxyListFactory.newStream) detects the self-loop (originId == routedId), skips the cache check, and creates an McpListServer that fans out across all matching routes via binding.resolveAll(...). Each kind's aggregated response body is written to the configured store under keys tools / resources / prompts.
- […] signaler.signalNow (delivered to the agent's reply-direction stream; the engine registers binding consumers in throttles[replyId], where SignalFW is dispatched) until every configured kind has settled. By the time an agent sees lifecycle initialized, every list call hits a populated cache. The self-loop hydrate session bypasses this gating to avoid deadlocking against itself.
- […] cache.hydrate scenario, whose app1-side script asserts exactly one lifecycle BEGIN and three list BEGINs (not four).
- […] binding.resolveAll(...). Each route returns its own list, McpListServer injects the per-route prefix (e.g., bluesky__get_weather, quartz+file:///doc.txt) and merges them into the single per-kind cache value, verified by cache.hydrate.toolkit against proxy.cache.toolkit.yaml.
- When options.cache.authorization is set, attach resolves the single guard, calls guard.reauthorize(traceId, binding.id, 0L, credentials) once, and stamps the resulting token onto every outbound BEGIN/END/WINDOW on the hydrate session and its list streams. Guard and credentials live on McpLifecycleCache so the hydrater doesn't reach back into the binding config for them.
- […] tools/list, resources/list, prompts/list requests hit McpCacheListServer, which reads the cached payload from the store and emits it as DATA + END without forwarding.
- […] McpProxyCacheListHydrater.scheduleRefresh schedules signaler.signalAt(Instant.now().plus(ttl), kindSignalId, …) for that kind. On expiry, a new list stream is issued and routed through the same self-loop path; the existing lifecycle is reused via McpLifecycleClient.supplyClient. Aborts during refresh preserve the prior cached entry.
- […] lifecycle.lock (TTL = MCP_LEASE_TTL) arbitrates which worker opens the hydrate session (loser polls at MCP_LEASE_RETRY intervals); a per-kind .lock arbitrates which worker issues each list call on the wire. Winners release on hydrate complete (lifecycle lease) or after cache.put / abort terminal (per-kind lease). Settle paths split: cache hits funnel through markSettled so only the worker that actually did the list-stream work arms a refresh signal.
- […] notifications/{tools,resources,prompts}/list_changed): deferred per the issue description.
Refactor
McpProxyFactory ballooned absorbing cache support. Extracted into four per-kind-family factories, each implementing BindingHandler and dispatched from McpProxyFactory.factories: Int2ObjectHashMap:
- McpProxyLifecycleFactory: KIND_LIFECYCLE. Detects self-loop in onServerBegin and bypasses the hydrate gating so the hydrater's own lifecycle reply doesn't deadlock.
- McpProxyItemFactory (abstract base + McpProxyToolsCallFactory / McpProxyPromptsGetFactory / McpProxyResourcesReadFactory): KIND_TOOLS_CALL / KIND_PROMPTS_GET / KIND_RESOURCES_READ. Each subclass passes its kind to the base via the constructor and overrides injectInitialClientBeginEx.
- McpProxyListFactory (abstract base + McpProxyToolsListFactory / McpProxyPromptsListFactory / McpProxyResourcesListFactory): KIND_TOOLS_LIST / KIND_PROMPTS_LIST / KIND_RESOURCES_LIST. newStream checks cache != null && originId != routedId: agent calls hit McpCacheListServer (serve-from-cache); hydrate self-loop falls through to McpListServer (passthrough/aggregation).
Hydrate coordination extracted into dedicated top-level classes:
- McpProxyCacheHydrater: multi-kind coordinator; owns the lifecycle stream via inner McpHydrateLifecycleStream. Self-targets binding.id (originId == routedId).
- McpProxyCacheListHydrater (abstract base + McpProxyCacheToolsListHydrater / McpProxyCacheResourcesListHydrater / McpProxyCachePromptsListHydrater): per-kind list stream via inner McpListHydrateStream. Subclasses override signalId() and injectInitialBeginEx(); the inner stream uses ExpandableArrayBuffer for body accumulation. The initial stream is closed deferred: doListHydrateBegin marks closingInitial, and onListHydrateWindow issues doListHydrateEnd once the peer credit lands.
Stream construction follows the standard "ctor for state init, do*Begin/do*End post-construction" split, with no behavioural work in stream constructors. Both stream classes handle the full close protocol (onListHydrateAbort → doListHydrateAbort on initial, onListHydrateReset → doListHydrateReset on reply).
McpProxyFactory shrinks from ~3631 lines to ~134 lines and now owns only the dispatch map and attach/detach.
McpBindingConfig owns the hydrater (hydrater.start() on attach, cleanup on detach), the per-kind McpListCache instances, and the McpLifecycleCache (which carries its own guard + credentials).
McpAuthorizationConfig is reused under both options.authorization (server/client kinds) and options.cache.authorization (proxy kind): same shape, the latter additionally carries credentials.
Engine SPI change
runtime/engine: Signaler gains two signalAt(Instant, …) abstract overloads alongside the existing long-based methods. The single in-tree implementation (EngineWorker.EngineSignaler) converts via Instant.toEpochMilli(); KafkaClientConnectionPool.KafkaClientSignaler and TlsWorker.TlsSignaler (test bench) provide matching delegations. Used by binding-mcp for lease retry and cache refresh scheduling so callers don't have to mix currentTimeMillis() with Duration.toMillis() arithmetic.
Spec scripts
- cache.hydrate
- cache.hydrate.toolkit
- cache.hydrate.error
- cache.hydrate.credentials
- cache.serve.initialize
- cache.serve.tools.list / .resources.list / .prompts.list
- cache.refresh.tools / .resources / .prompts
- cache.refresh.tools.error
- cache.refresh.tools.contended
ITs
| specs/binding-mcp.spec/.../streams/cache/ | runtime/binding-mcp/... |
| --- | --- |
| ProxyCacheLifecycleIT (5 tests) | McpProxyCacheLifecycleIT (5 tests) |
| ProxyCacheToolsListIT (4 tests) | McpProxyCacheToolsListIT (3 tests) |
| ProxyCacheResourcesListIT (2 tests) | McpProxyCacheResourcesListIT (2 tests) |
| ProxyCachePromptsListIT (2 tests) | McpProxyCachePromptsListIT (2 tests) |
| | McpProxyCacheContentionIT (1 test, ENGINE_WORKERS=2) |
cache.hydrate (multi-kind single-route) is the canonical hydrate scenario; per-kind hydrate scripts dropped since they were strict subsets of the multi-kind one. cache.hydrate.toolkit covers the multi-route aggregation case: the peer-to-peer client connects directly to app1 and app2 (sequenced with read notify / connect await) to mirror the server's two accepts, since the proxy is absent in peer-to-peer mode. The per-kind ITs retain MCP_HYDRATE_FILTER configuration to scope the cache.refresh.* scenarios to one kind.
Test-only configs
- proxy.cache.yaml: empty test store, single worker, single downstream route; used by hydrate ITs.
- proxy.cache.toolkit.yaml: empty test store, two toolkit routes (bluesky → app1, quartz → app2); multi-route hydrate IT.
- proxy.cache.refresh.yaml: empty test store with ttl: PT1S; refresh + contention ITs.
- proxy.cache.seeded.yaml: test store pre-seeded with cached JSON; serve ITs.
- proxy.cache.credentials.yaml: test store + engine test guard with credentials; credentials IT.
TestStore enhancement (runtime/engine test sources)
TestStore now owns ConcurrentMap<String, Map<String, Entry>> keyed by storeId; workers attached to the same store share entries. options.entries seeds via putIfAbsent so the first attach wins. Previously the entries map was per-handler, so putIfAbsent-based leases were trivially won by every worker; the shared map is required for McpProxyCacheContentionIT to observe a single winner. Transparent for single-worker scenarios. binding-mcp no longer test-depends on store-memory; all cache ITs run on type: test.
Test plan
- specs/binding-mcp.spec (ProxyCache*IT) → 13 pass / 0 fail.
- runtime/binding-mcp → 155 pass / 0 fail / 0 skipped.
- runtime/engine unit tests → 320 pass / 0 fail.
- specs/engine.spec → 39 pass / 0 fail after the TestStore sharing change.
🤖 Generated with Claude Code