perf(reader): trim allocation hotspots beyond serde_json::Value#418
Conversation
Six independent reader-module hot paths flagged in #345: - `stable_stringify` now writes JSON directly via a custom `Serializer` with sorted-key behavior, avoiding the intermediate `serde_json::Value` tree for every hash input. - `short_sha256` hex-encodes only the 8 bytes it keeps instead of encoding the full 64-char digest and truncating. - `measure_content_bytes` uses a counting `io::Write` shim instead of materializing the JSON string just to read `.len()`. For tool results this avoids allocating payload-sized strings. - `relationship_key` returns a borrowed tuple of `&str` for the `has_relationship` linear scan (zero allocs per comparison). The cross-line dedup `HashSet` is now keyed by an owned `RelationshipKey` tuple. The duplicate kind/rt match has been collapsed onto the existing `wire_str()` methods. - Tool-result counter increments at four sites in claude.rs / codex.rs use the `entry` API (one lookup, no clone) instead of get-then-insert. - `ProjectResolver::resolve` now holds the cache lock across `resolve_uncached` so concurrent callers with the same `cwd` only walk the filesystem once. Refs #345 https://claude.ai/code/session_01HWW8moqvoV2oEbAZ1RBC1C
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReduces per-record allocations and races in the reader: stable JSON serialization with sorted keys, shorter SHA truncation, streamed byte counting, typed relationship dedup keys, HashMap entry-based counters, and holding the resolver cache lock during filesystem resolution. ChangesReader allocation hotspot optimizations
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Resolver as ProjectResolver
participant FS as Filesystem
Caller->>Resolver: resolve(cwd)
Resolver->>Resolver: hold cache Mutex
Resolver->>FS: resolve_uncached(cwd) (walk & parse)
Resolver->>Resolver: insert resolved project into cache
Resolver->>Caller: return ResolvedProject
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 38a1321376
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| fn finalize_object(out: &mut String, entries: &mut [(String, String)]) { | ||
| entries.sort_by(|a, b| a.0.cmp(&b.0)); |
There was a problem hiding this comment.
Sort map keys by raw string value
stable_stringify now sorts StableMap.entries using already-serialized key buffers, so ordering is based on escaped JSON text rather than the original key strings. This diverges from both the previous implementation (keys.sort() on raw Strings) and TS stableStringify, and it changes hashes for valid objects whose keys contain escapes/control chars (for example "\n" vs "\u0001"). That breaks the byte-identical cross-port hash contract called out in this module.
Useful? React with 👍 / 👎.
| fn serialize_key<T: Serialize + ?Sized>(&mut self, key: &T) -> Result<(), StableError> { | ||
| let mut buf = String::new(); | ||
| key.serialize(StableSerializer { out: &mut buf })?; | ||
| self.current_key = Some(buf); |
There was a problem hiding this comment.
Enforce JSON string keys in map serialization
serialize_key accepts any serializable key and stores its direct JSON rendering as the object key token. For non-string map keys this produces unquoted keys (e.g. 1, true) and finalize_object emits invalid JSON like {1:...} instead of a valid canonical form. This is a behavioral regression from the previous serde_json::to_value(...).unwrap_or(Value::Null) path and can corrupt hash inputs whenever a caller passes a map keyed by non-strings.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
CHANGELOG.md (1)
9-16: ⚡ Quick winTrim this bullet to the shipped impact.
This reads like implementation notes. The root changelog entry should summarize the practical effect for SDK users, not enumerate internal helpers/types or include the issue reference.
✍️ Suggested wording
-- `relayburn-sdk`: reader allocation hotspots trimmed — `stable_stringify` now - writes JSON directly via a custom `Serializer` (no intermediate `Value` - tree), `measure_content_bytes` uses a counting writer instead of - materializing the JSON string, `short_sha256` hex-encodes only the 8 bytes - it keeps, relationship dedup uses an owned-tuple `HashSet` key, tool-result - counter increments use the `entry` API, and `ProjectResolver::resolve` - holds the cache lock across `resolve_uncached` so concurrent callers don't - duplicate the filesystem walk. (`#345`) +- `relayburn-sdk`: reader ingest and hashing now avoid several temporary + allocations in JSON hashing, tool-result sizing, relationship dedup, and + project resolution, reducing overhead on large session imports.As per coding guidelines, "Changelog entries should be concise and impact-first... Drop issue/PR links, internal review notes, implementation backstory."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` around lines 9 - 16, The changelog bullet for relayburn-sdk is too implementation-focused; rewrite the `relayburn-sdk` entry to a concise, impact-first summary for users (e.g., "improves performance and reduces memory allocations when reading and resolving projects"), remove internal details like `stable_stringify`, `measure_content_bytes`, `short_sha256`, `ProjectResolver::resolve`, owned-tuple/HashSet, `entry` API mentions and drop the PR/issue reference; keep one short sentence describing the shipped user-visible benefit and any measurable effect (faster reads, lower memory) only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/relayburn-sdk/src/reader/hash.rs`:
- Around line 337-340: serialize_key currently writes the raw serialized key
into current_key via StableSerializer, producing invalid JSON for non-string map
keys; change serialize_key (the method named serialize_key in this file) to
produce a JSON-quoted string like struct fields do by running the serialized key
through serde_json::to_string (or otherwise ensure it is a quoted JSON string)
before assigning to self.current_key, or alternatively detect non-string keys
and return an error to match serde_json behavior; keep use of StableSerializer
for the inner serialization if needed but wrap its output with
serde_json::to_string (or reject non-string) so current_key holds a proper JSON
string.
---
Nitpick comments:
In `@CHANGELOG.md`:
- Around line 9-16: The changelog bullet for relayburn-sdk is too
implementation-focused; rewrite the `relayburn-sdk` entry to a concise,
impact-first summary for users (e.g., "improves performance and reduces memory
allocations when reading and resolving projects"), remove internal details like
`stable_stringify`, `measure_content_bytes`, `short_sha256`,
`ProjectResolver::resolve`, owned-tuple/HashSet, `entry` API mentions and drop
the PR/issue reference; keep one short sentence describing the shipped
user-visible benefit and any measurable effect (faster reads, lower memory)
only.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 16cedaae-6e1e-43bb-9bef-8f8ed899c0d5
📒 Files selected for processing (6)
CHANGELOG.mdcrates/relayburn-sdk/src/reader/claude.rscrates/relayburn-sdk/src/reader/codex.rscrates/relayburn-sdk/src/reader/git.rscrates/relayburn-sdk/src/reader/hash.rscrates/relayburn-sdk/src/reader/user_turn.rs
Two regressions in the custom Serializer flagged in PR review:
1. Sort map keys by raw string value, not their JSON-encoded form.
Encoding-time keys diverge when keys contain characters JSON escapes
(control chars, quotes); the previous Value-roundtrip path sorted on
raw `String`s, and the cross-port hash contract requires the same
ordering.
2. Coerce non-string map keys to their string form (or error), mirroring
serde_json's MapKeySerializer. Previously a HashMap<u32, _> emitted
invalid `{1:...}`; now it emits `{"1":...}` as serde_json's
Value-roundtrip path would.
Also trim the CHANGELOG entry to impact-only per project guidance.
Refs #345
https://claude.ai/code/session_01HWW8moqvoV2oEbAZ1RBC1C
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/relayburn-sdk/src/reader/hash.rs (1)
215-230: ⚡ Quick winPreallocate buffered entries from the serializer length hints.
These paths buffer every field before sorting, but they always start with
Vec::new(). Using the providedlen/Option<usize>avoids growth churn on larger objects, which is exactly the hot path this PR is optimizing.Suggested change
- fn serialize_map(self, _: Option<usize>) -> Result<StableMap<'a>, StableError> { + fn serialize_map(self, len: Option<usize>) -> Result<StableMap<'a>, StableError> { Ok(StableMap { out: self.out, - entries: Vec::new(), + entries: Vec::with_capacity(len.unwrap_or(0)), current_key: None, }) } fn serialize_struct( self, _: &'static str, - _: usize, + len: usize, ) -> Result<StableMap<'a>, StableError> { Ok(StableMap { out: self.out, - entries: Vec::new(), + entries: Vec::with_capacity(len), current_key: None, }) } fn serialize_struct_variant( self, _: &'static str, _: u32, variant: &'static str, - _: usize, + len: usize, ) -> Result<StableStructVariant<'a>, StableError> { self.out.push('{'); write_primitive(self.out, variant)?; self.out.push(':'); Ok(StableStructVariant { out: self.out, - entries: Vec::new(), + entries: Vec::with_capacity(len), }) }Also applies to: 233-246
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/reader/hash.rs` around lines 215 - 230, The serializer currently creates StableMap with entries: Vec::new(), causing realloc churn; change both serialize_map (use the Option<usize> parameter) and serialize_struct (use the usize parameter) to preallocate entries with Vec::with_capacity(len.unwrap_or(0)) / Vec::with_capacity(len) respectively; also update the other identical path (the alternate serialize_* implementation around lines 233-246) to use the same preallocation pattern so buffered fields reserve the hinted capacity before pushing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/relayburn-sdk/src/reader/hash.rs`:
- Around line 22-25: stable_stringify currently ignores the Result from
value.serialize(StableSerializer { out: &mut out }) which can leave truncated
output on failure; change stable_stringify to return a Result<String, E> (e.g.,
Result<String, serde_json::Error> or your crate's error type), propagate the
serialization result instead of discarding it (e.g.,
value.serialize(StableSerializer { out: &mut out })?; Ok(out)), and update
callers to handle the error; keep references to the StableSerializer type and
the stable_stringify function when making the change.
---
Nitpick comments:
In `@crates/relayburn-sdk/src/reader/hash.rs`:
- Around line 215-230: The serializer currently creates StableMap with entries:
Vec::new(), causing realloc churn; change both serialize_map (use the
Option<usize> parameter) and serialize_struct (use the usize parameter) to
preallocate entries with Vec::with_capacity(len.unwrap_or(0)) /
Vec::with_capacity(len) respectively; also update the other identical path (the
alternate serialize_* implementation around lines 233-246) to use the same
preallocation pattern so buffered fields reserve the hinted capacity before
pushing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ef61cb8b-aed8-4534-b4c0-2828ada683fb
📒 Files selected for processing (2)
CHANGELOG.mdcrates/relayburn-sdk/src/reader/hash.rs
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
- Use the `Option<usize>` / `usize` length hints in `serialize_map`, `serialize_struct`, and `serialize_struct_variant` to seed `Vec::with_capacity`, avoiding regrow churn for buffered fields on larger objects. - Propagate the inner serializer result via `.expect`. Previously a rejected map key or non-finite float would silently truncate the hash input instead of failing; now it panics like the missing-key path. https://claude.ai/code/session_01HWW8moqvoV2oEbAZ1RBC1C
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/relayburn-sdk/src/reader/hash.rs (1)
335-335: 💤 Low valueSilent fallback is inconsistent with error-surfacing goal.
serde_json::to_stringon aStringcannot realistically fail, but if it did, theunwrap_or_elsesilently produces""— corrupting the hash input. The commit message states the intent to "propagate inner serializer errors via expect so failures … surface instead of silently truncating." Align this with that goal.Suggested change
- let encoded = serde_json::to_string(k).unwrap_or_else(|_| String::from("\"\"")); + let encoded = serde_json::to_string(k).expect("string is always JSON-encodable");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/reader/hash.rs` at line 335, The current assignment let encoded = serde_json::to_string(k).unwrap_or_else(|_| String::from("\"\"")); silently falls back to an empty string on serialization errors; change it to propagate the error instead by replacing the unwrap_or_else with expect (or ?/map_err where appropriate) so failures surface — i.e. call serde_json::to_string(k).expect("failed to serialize key for hashing") (or return the error) so the variable encoded (from serde_json::to_string and the input k) is never silently corrupted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/relayburn-sdk/src/reader/hash.rs`:
- Line 335: The current assignment let encoded =
serde_json::to_string(k).unwrap_or_else(|_| String::from("\"\"")); silently
falls back to an empty string on serialization errors; change it to propagate
the error instead by replacing the unwrap_or_else with expect (or ?/map_err
where appropriate) so failures surface — i.e. call
serde_json::to_string(k).expect("failed to serialize key for hashing") (or
return the error) so the variable encoded (from serde_json::to_string and the
input k) is never silently corrupted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: f4fe9ef1-a878-4ea8-9e31-d2568334c4cb
📒 Files selected for processing (1)
crates/relayburn-sdk/src/reader/hash.rs
`serde_json::to_string` on a `String` cannot fail in practice, so the `unwrap_or_else(|_| "\"\"")` fallback was both dead and a silent corruption hatch. Switch to `.expect` to match the error-surfacing posture in the rest of the file. https://claude.ai/code/session_01HWW8moqvoV2oEbAZ1RBC1C
|
@copilot resolve the merge conflicts in this pull request |
…-issue-5DYEC # Conflicts: # CHANGELOG.md # crates/relayburn-sdk/src/reader/claude.rs Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
Agent-Logs-Url: https://github.com/AgentWorkforce/burn/sessions/d35c510c-543f-4b3a-9304-c4b56c942c3c Co-authored-by: willwashburn <957608+willwashburn@users.noreply.github.com>
Summary
Closes #345. Fixes the six independent reader-module allocation hotspots flagged in the May 2026 Rust review:
stable_stringify(reader/hash.rs) — replaced theserde_json::to_valueround-trip with a customSerializerthat writes JSON directly with sorted-key behavior. No intermediateValuetree.short_sha256(reader/hash.rs) —hex::encode(&hasher.finalize()[..8])instead of encoding the full 64-char digest and truncating to 16.measure_content_bytes(reader/user_turn.rs) — uses a countingio::Writeshim that sumsbuf.len()instead of allocating the JSON string just to read.len().reader/claude.rs) —relationship_keynow returns a borrowed&strtuple for thehas_relationshiplinear scan (zero allocs per comparison). Cross-lineHashSetdedup is keyed by an ownedRelationshipKeytuple. The duplicate kind/rtmatchblocks were folded onto the existingwire_str()methods.reader/claude.rs,reader/codex.rs) — four call sites switched fromget-then-insert(key.clone(), ...)to theentryAPI (one lookup, no clone).ProjectResolver::resolve(reader/git.rs) — now holds the cache lock acrossresolve_uncached, so concurrent callers with the samecwdonly walk the filesystem once.Test plan
cargo build --workspacecargo test --workspace(654 + 2 + 13 SDK/integration/node tests pass)cargo clippy --workspace --all-targets(no new warnings from changed files)stable_stringify/args_hashtests still pass — same byte-for-byte outputreader::hash::testshttps://claude.ai/code/session_01HWW8moqvoV2oEbAZ1RBC1C
Generated by Claude Code