Commit aa67ed2

fix(memory): prevent infinite loop in chunk_text when overlap rewinds start (#3027)
When `rfind("\n\n")` found a match near the beginning of the sliding window, `end` advanced by fewer bytes than CHUNK_OVERLAP_CHARS. The subsequent `end.saturating_sub(CHUNK_OVERLAP_CHARS)` produced a value less than the current `start`, and `ceil_char_boundary` returned a position behind it. The old safeguard (`if start >= end`) did not fire because new_start < start < end, so `start` regressed and the loop ran forever.

Fix: guarantee forward progress by taking `end` when `new_start <= start`.

This was the root cause of 100% CPU / GB RAM consumption at startup: the embed backfill spawned background tasks for unembedded messages (including an 85 KB tool_result), all of which entered the infinite loop simultaneously, saturating all tokio worker threads and blocking log flushing.
1 parent aa79adf commit aa67ed2

1 file changed

Lines changed: 5 additions & 5 deletions

crates/zeph-memory/src/semantic/recall.rs

@@ -49,12 +49,12 @@ fn chunk_text(text: &str) -> Vec<&str> {
         if end >= text.len() {
             break;
         }
-        // Next chunk starts with overlap.
+        // Next chunk starts with overlap, but must always advance past the
+        // current position to prevent infinite loops when rfind finds a match
+        // very early in the slice (end barely advances, overlap rewinds start).
         let next = end.saturating_sub(CHUNK_OVERLAP_CHARS);
-        start = text.ceil_char_boundary(next);
-        if start >= end {
-            start = end; // safeguard against infinite loop
-        }
+        let new_start = text.ceil_char_boundary(next);
+        start = if new_start > start { new_start } else { end };
     }
 
     chunks
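
The hunk shows only the tail of the loop, so the following is a minimal sketch of the old and new update rules side by side, not the crate's actual code. The constants `CHUNK_SIZE` and `CHUNK_OVERLAP`, the helpers `ceil_boundary`, `floor_boundary`, `chunk_end`, `next_start_old` and `next_start_fixed`, and the rule that `end` lands just after the last `"\n\n"` in the window are all assumptions made for illustration. `ceil_boundary` is a hand-written stand-in for `str::ceil_char_boundary`.

```rust
// Hypothetical sizes for illustration; the real constants are not shown in the diff.
const CHUNK_SIZE: usize = 40;
const CHUNK_OVERLAP: usize = 10;

/// Stand-in for `str::ceil_char_boundary`: smallest char boundary >= `idx`.
fn ceil_boundary(text: &str, idx: usize) -> usize {
    let mut i = idx.min(text.len());
    while !text.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Largest char boundary <= `idx`, clamped to the text length.
fn floor_boundary(text: &str, idx: usize) -> usize {
    let mut i = idx.min(text.len());
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Assumed window rule: cut just after the last "\n\n" in the window,
/// otherwise at the window edge.
fn chunk_end(text: &str, start: usize) -> usize {
    let hard_end = floor_boundary(text, start + CHUNK_SIZE);
    if hard_end >= text.len() {
        return text.len();
    }
    match text[start..hard_end].rfind("\n\n") {
        Some(pos) => start + pos + 2,
        None => hard_end,
    }
}

/// Old rule: only guards against `start >= end`, so `start` may stall or move backwards.
fn next_start_old(text: &str, end: usize) -> usize {
    let candidate = ceil_boundary(text, end.saturating_sub(CHUNK_OVERLAP));
    if candidate >= end { end } else { candidate }
}

/// New rule: accept the overlapped position only if it is strictly past `start`.
fn next_start_fixed(text: &str, start: usize, end: usize) -> usize {
    let new_start = ceil_boundary(text, end.saturating_sub(CHUNK_OVERLAP));
    if new_start > start { new_start } else { end }
}

/// Chunker built on the new rule; `start` strictly increases on every iteration.
fn chunk_text(text: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let end = chunk_end(text, start);
        chunks.push(&text[start..end]);
        if end >= text.len() {
            break;
        }
        start = next_start_fixed(text, start, end);
    }
    chunks
}
```

With an input such as `"ab\n\n"` followed by 100 `x` characters, the first window ends at byte 4. Under these assumed sizes the old rule sends `start` back to 0 and the same window repeats, while the new rule moves `start` to 4 and the loop finishes.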
