perf(sdk): DeltaChannel + add_messages fast-path + no-inline Sends#2910
Closed
Sydney Runkle (sydney-runkle) wants to merge 12 commits into
Closed
perf(sdk): DeltaChannel + add_messages fast-path + no-inline Sends#2910Sydney Runkle (sydney-runkle) wants to merge 12 commits into
DeltaChannel + add_messages fast-path + no-inline Sends#2910Sydney Runkle (sydney-runkle) wants to merge 12 commits into
Conversation
- AgentState.messages now uses DeltaChannel(add_messages) (via langchain sr/delta-channel-messages) — checkpoint storage drops from O(N²) to O(N) for long-running threads - FilesystemState.files now uses DeltaChannel(_file_data_reducer) directly; removes the now-redundant DeltaFilesystemMiddleware - pyproject.toml sources langchain, langchain-core, langsmith, langgraph, and langgraph-sdk from their respective dev branches via git URLs so testers can install without local clones Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop langsmith git override (PyPI is fine). Point langgraph + sdk to diff-channel-incremental-checkpointing instead of sr/deferred-imports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also drops the langchain-core override-dependencies workaround — delta-channel-writes-based uses langchain-core>=1.3.0,<2 (no exact pin). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e Sends
Repoints all langchain/langgraph sources to the combined `sr/deepagents-perf-combo`
branches, which stack three perf improvements:
1. DeltaChannel for `messages` + `files` (langgraph + langchain AgentState):
checkpoint storage drops from O(N²) to O(N) for long threads.
2. `add_messages` fast-path (langgraph): skip left-side conversion and
fast-path pure appends on the hot add_messages call.
3. No-state-inline tool dispatch (langchain + langgraph ToolNode):
`Send("tools", [tool_call])` no longer carries a serialized snapshot of
the full messages list; ToolNode hydrates state from channels via
CONFIG_KEY_READ at execution time. Eliminates O(N²) __pregel_tasks growth.
Also drops `typ=dict` from the FilesystemState DeltaChannel — the new
DeltaChannel infers type from the Annotated outer type.
Combo branches:
- langchain-ai/langchain#sr/deepagents-perf-combo
- langchain-ai/langgraph#sr/deepagents-perf-combo
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging this PR will not alter performance
Comparing Footnotes
|
Standalone deterministic benchmark (mock model, no API calls) that runs
create_deep_agent across multiple (N, durability) configs and writes a
bench_results.json with wall-clock, tracemalloc peak, and per-store
checkpoint storage for each config.
Usage:
uv run --project libs/deepagents python bench_perf_combo.py
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeltaChannel + add_messages fast-path + no-inline Sends
Upstream langgraph moved channels/delta.py → channels/_delta.py and removed the public re-export to signal the API is still experimental. Update the FilesystemState import and refresh pinned commits on both combo branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ate API The ToolNode hydrate-from-channels change added a required `tools` arg to `ToolRuntime.__init__`; update all test fixtures and the four production middlewares that build a synthetic ToolRuntime (summarization / filesystem / memory / skills) to pass `tools=[]`. Memory + skills default-backend tests now use `agent.get_state(config).values` instead of reading `checkpoint['channel_values'][...]` directly — DeltaChannel stores a sentinel in the raw checkpoint, and get_state resolves it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Run make lock across libs/* and examples/* so every workspace pulls in the renamed langgraph.channels._delta module (otherwise cli / runloop / example lockfiles import from the old channels.delta path and CI fails with ModuleNotFoundError). - Apply ruff format to files touched by the automated tools=[] injection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeltaChannel(add_messages) replay re-applies each step delta through add_messages, but across-step ID dedup isn't preserved — so an evicted HumanMessage and its replacement both survive reconstruction. The test asserts the replacement alone should win, which is correct behaviour against the baseline reducer. Mark xfail strict=True so we notice once the upstream fix lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also relax the multi_turn_eviction xfail to strict=False — the bug only reproduces on macOS; Linux CI passes the test under DeltaChannel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous params (30KB×2 writes + 80KB tool result every turn) accumulated ~10M tokens of state at N=200 — an unrealistic stress test. Resized to ~500k tokens at N=200: 1 KB file write every turn, small (2 KB) tool results most turns, with a larger 82 KB result every 10th turn that still triggers FilesystemMiddleware eviction into the `files` channel so the eviction path stays exercised. Also enables N=200 on async mode now that baseline peak fits (was OOMing at the old payload size). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ould_summarize _truncate_args already calls token_counter(messages, tools=...) to decide whether to truncate. The subsequent _should_summarize check was re-running the same count on the same messages and tools — ~10s of duplicate work per 100-turn async run. Have _truncate_args return (messages, modified, total_tokens) and reuse that count downstream. If truncation modifies messages, recount. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bundles multiple perf improvements against the latest PyPI releases of langchain / langgraph. Deepagents is pinned to combined
sr/deepagents-perf-combobranches on both upstream repos.Each optimization is a standalone change — the catalog further down lists every contributing commit and which upstream branch it lives on.
Companion branches
langchain-ai/langchain#sr/deepagents-perf-combolangchain-ai/langgraph#sr/deepagents-perf-comboUpdated benchmark: heavier realistic workload (LG 1.2.0a1 + LC PR #37101)
Workload:
create_deep_agent+InMemorySaver, parallel tool dispatch (all tools in one model call). Per turn: 2 × 30 KB file writes + 1 × 4 KB tool result (stays in messages) + 1 × 20 KB AI response; every 10th turn an 80 KB tool result is evicted tofilesbyFilesystemMiddleware. ~6,144 tokens/turn → ~1.23M tokens at N=200.Envs tested:
N=200 head-to-head
The 28 GB baseline breaks down as:
__pregel_tasks9.8 GB (54%) +messages6.7 GB (37%) +files1.7 GB (9%) in blobs, plus ~10 GB of the same in the writes table. Two distinct O(N²) problems:__pregel_tasksstop-inliningmessagesfilesstop-inliningfix addresses the dominant 54% of baseline storage independently of delta channel.messages+filesfrom O(N²) to O(N) — the remaining 46%._create_subset_model_v2was rebuilding 9 unique Pydantic models from scratch ~10,800 times per 50 turns.Profiling: what's in the remaining 5.4s (delta + schema cached, N=200)
ormsgpack.unpackb)builtins.repr(LangSmith tracing)FilesystemMiddlewareNo single hotspot dominates — this is the LangGraph execution floor for the current architecture.
Snapshot frequency (delta + schema cached, N=200, ~1.23M tokens)
snapshot_frequencyis in pregel steps; ~8 steps/turn with parallel dispatch.snap=Nonewins on storage (O(N) — only sentinels in blobs, deltas in writes table). Snapshots are still O(N²) total because each snapshot blob captures the full accumulated state. Read time differences are ~2 ms onInMemorySaver— real gap requires Postgres to quantify.Original benchmark setup (lighter workload)
create_deep_agent+InMemorySaver, deterministic mock model. Per turn: 1 × 1 KB file write + 1 tool call. Tool results are 2 KB most turns; every 10th turn returns 82 KB which exceeds the 20k-token threshold and is evicted into thefileschannel byFilesystemMiddleware. Per-turn state growth ≈ 11 KB ≈ ~2.75k tokens (4 chars/token). At N=200 that accumulates to ~550k tokens — a realistic long-running agent thread.uv run --project libs/deepagents python bench_perf_combo.py(script + results in this PR).time.perf_counter()around the N-turn invoke loop (construction excluded).tracemalloc.get_traced_memory()[1]— Python allocator peak across the full loop.InMemorySaver.{blobs, writes, storage}at end of run (what a durable saver would persist).Headline: async — baseline → full combo across N
Wall clock
Peak memory
Checkpoint storage
durability="exit"Wall clock
(Peak memory and storage reductions are even larger in
exitmode because there's only one checkpoint per invoke — seebench_results.json.)Ablation: who contributes what (async wall-clock)
Five configs, each adds one change cumulatively. Every perf commit is isolated to exactly one step — no bundling.
sr/perf-delta-channel-onlydelta-channel-writes-basedsr/delta-channel-messagesadd_messagesfast-pathsr/deepagents-perf-delta-plus-addmsgssr/deepagents-perf-combosr/deepagents-perf-comboWall clock (s)
Surprising finding: DeltaChannel alone (B1) is slower than baseline at high N in
asyncmode — 124s vs 79s at N=200. DeltaChannel saves per-step serialization cost but pays an O(N²) load-time cost (each invoke'schannels_from_checkpointwalks the delta write history and replays throughadd_messages). Without the other optimizations to offset that,asyncbecomes net-negative for wall clock. Memory and storage still drop by ~40% in B1 — the slowdown is pure CPU.Which step delivers what (async elapsed):
A→B1(DeltaChannel alone): noisy / negative at N≥100 on wall clock. Saves ~40% memory and storage.B1→B2(tool_call_schema caching, 5 commits): dominant wall-clock win — N=200 goes 124s → 73s (−41%). Zero memory/storage effect.B2→C(add_messages fast-path): matters at high N — N=200 goes 73s → 35s (−52%). Zero memory/storage effect.C→D(Sends/hydrate + today's 3 fixes): big jump — N=200 goes 35s → 20s (−42%). Also kills the remaining 60% of memory/storage.Peak memory (MB)
Checkpoint storage (MB)
Memory/storage story: DeltaChannel (B1) takes ~40% off; Sends/hydrate (C→D) handles the remaining ~60%. tool_call_schema caching and add_messages are pure CPU — zero effect on these axes.
Individual optimizations (catalog with links)
Every perf change is self-contained on its own upstream branch.
langgraph
delta-channel-writes-basedafec98f3(rename tochannels/_delta)add_messagesfast-path (skip left-side conversion, append-only short-circuit)optimize/add-messages-fast-path2a974d1b+ 3 follow-upsToolNodehydrate state from channels viaCONFIG_KEY_READsr/tool-call-no-state-inline18cbe46blangchain
AgentState.messagesannotated with DeltaChannelsr/perf-delta-channel-only5f2da29aBaseTool.tool_call_schema+.argsascached_propertysr/delta-channel-messages8669f027tool_call_schema/argson field mutationf54de971_create_subset_model→lru_cache63dd915atool_call_schemaaccess in_format_tool_to_openai_function29979178runnables/base.py21237799Send("tools", [call])no longer inlines full statesr/tool-call-context-fix22753f68+state_keys=drop fixup_format_tool_to_openai_function(tool)dict on tool instancesr/summarization-tool-schema-cache-spike06e351a4len(json.dumps(tool_dict))on tool instance (avoid re-dumping)43faee1fdeepagents (this PR)
1cdebbdctyp=dict(new DeltaChannel infers fromAnnotated); track_deltarename9a7e71aftools=[]toToolRuntimein production middlewares + testse5817dba_truncate_argsand_should_summarize268b8558Files touched in this PR
libs/deepagents/pyproject.toml,uv.lock(+ sibling workspace locks): pin langchain/langgraph packages tosr/deepagents-perf-combo.libs/deepagents/deepagents/middleware/filesystem.py: droptyp=dictkwarg; trackchannels/_deltarename.libs/deepagents/deepagents/middleware/summarization.py: return total_tokens from_truncate_args, reuse for_should_summarize.tools=[]toToolRuntime(new required arg introduced by the ToolNode hydrate change).bench_perf_combo.py: reproducer for the tables above.Test plan
libs/deepagentsunit tests (1181 passed / 0 failed, 1 xfail tracking upstream DeltaChannel +add_messagesdedup edge case)libs/langchain_v1agents unit tests (2 failures tracked — upstream_DeltaSentinelmsgpack serde gap in a test-only saver)bench_perf_combo.pyreproduces the tables above locally🤖 Generated with Claude Code