
Commit 00a7de0

am17an and claude committed
hybrid-memory: cleanup after slot-rollback ship
- Strip stale "phase 1/2" and HYBRID_PARTIAL_SEQRM_PLAN markers from comments now that the work has landed.
- Untrack the planning doc; the rationale is captured in the commits and code comments. (File kept on disk locally, just not in repo.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6069ace commit 00a7de0

6 files changed

Lines changed: 24 additions & 362 deletions

docs/development/HYBRID_PARTIAL_SEQRM_PLAN.md

Lines changed: 0 additions & 334 deletions
This file was deleted.

src/llama-memory-recurrent.cpp

Lines changed: 9 additions & 9 deletions
@@ -171,9 +171,9 @@ bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos
     // partial intersection that includes the final pos: try slot
     // select if a per-token snapshot is available within rollback
     // distance. Recurrent kernels write state-after-(T-s) tokens to
-    // slot `s` during a multi-token decode (HYBRID_PARTIAL_SEQRM_PLAN);
-    // here we just look up which slot corresponds to the requested
-    // truncation and arm `active_slots[seq]` for the next graph build.
+    // slot `s` during a multi-token decode; here we just look up
+    // which slot corresponds to the requested truncation and arm
+    // `active_slots[seq]` for the next graph build.
     if (0 < p0 && p0 <= cell.pos && p1 > cell.pos) {
         const llama_pos rollback = cell.pos - (p0 - 1);
         if (rollback >= 1 && rollback <= (llama_pos) n_spec) {
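Aside (not part of the commit): a minimal sketch of how the armed branch plausibly completes, assuming the slot numbering described in the comment: slot `s` holds the state after (T - s) tokens, so a rollback of `r` tokens maps straight to slot index `r`. The body of the inner `if` is an assumption; only the guard conditions appear in the hunk above.

    // Hypothetical completion of the partial-intersection path (sketch only).
    if (0 < p0 && p0 <= cell.pos && p1 > cell.pos) {
        // distance from the committed final state back to the requested cut
        const llama_pos rollback = cell.pos - (p0 - 1);
        if (rollback >= 1 && rollback <= (llama_pos) n_spec) {
            // state after (T - rollback) tokens lives in slot `rollback`
            active_slots[seq_id] = (uint32_t) rollback; // armed; consumed by next graph build
            return true; // truncation served by slot select, no restore or reverify
        }
    }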
@@ -1196,12 +1196,12 @@ int32_t llama_memory_recurrent_context::s_copy(int i) const {
         return src0;
     }
 
-    // Slot widening (HYBRID_PARTIAL_SEQRM_PLAN phase 1). active_slots[seq]
-    // holds the rollback slot index set by seq_rm's partial path; when
-    // non-zero, the graph reads from row (slot * mem_size + cell_idx) so the
-    // next decode resumes from the per-token snapshot rather than the
-    // committed state. One-shot consume: clear after read so subsequent
-    // graph builds see slot 0 unless another seq_rm partial fires.
+    // active_slots[seq] holds the rollback slot index set by seq_rm's
+    // partial path; when non-zero, the graph reads from row
+    // (slot * mem_size + cell_idx) so the next decode resumes from the
+    // per-token snapshot rather than the committed state. One-shot
+    // consume: cleared after read so subsequent graph builds see slot 0
+    // unless another seq_rm partial fires.
     uint32_t slot = 0;
     if (!mem->cells[cell_idx].seq_id.empty()) {
         const llama_seq_id seq = *mem->cells[cell_idx].seq_id.begin();
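Aside (not part of the commit): a sketch of the one-shot consume the rewritten comment describes, assuming `active_slots` is a per-sequence array on the memory object and that `mem_size` names the cell count; both are inferred from the comment, not checked against the tree.

    // Hypothetical tail of s_copy() (sketch only).
    uint32_t slot = 0;
    if (!mem->cells[cell_idx].seq_id.empty()) {
        const llama_seq_id seq = *mem->cells[cell_idx].seq_id.begin();
        slot = mem->active_slots[seq];  // rollback slot armed by seq_rm, or 0
        mem->active_slots[seq] = 0;     // one-shot: later builds see slot 0 again
    }
    // a non-zero slot widens the read to the per-token snapshot row
    return (int32_t) (slot * mem_size + cell_idx);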

src/models/qwen35.cpp

Lines changed: 4 additions & 4 deletions
@@ -242,10 +242,10 @@ ggml_tensor * llm_build_qwen35::build_layer_attn_linear(
     GGML_ASSERT(ubatch.n_tokens == n_seq_tokens * n_seqs);
 
     // Emit per-token state snapshots into recurrent slots when the verify
-    // batch fits in the configured slot capacity (HYBRID_PARTIAL_SEQRM_PLAN).
-    // Slot 0 holds the final (post-T-tokens) state — matches existing
-    // semantics. Slots 1..T-1 hold per-token intermediates so a partial
-    // seq_rm can roll back via slot select instead of restoring or reverify.
+    // batch fits in the configured slot capacity. Slot 0 holds the final
+    // (post-T-tokens) state — matches existing semantics. Slots 1..T-1
+    // hold per-token intermediates so a partial seq_rm can roll back via
+    // slot select instead of restoring or reverify.
     const uint32_t mem_size = mctx_cur->get_size();
     const bool emit_states = (cparams.n_spec_max > 0)
         && (n_seq_tokens > 1)
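Aside (not part of the commit): given the slot layout in these comments, token `t` of a T-token decode would land in slot `T - 1 - t`, so the final token (t == T - 1) falls into slot 0 and preserves the old single-slot behavior. Below is a sketch of the emission this gate presumably guards; `cell_idx` and `write_state_row` are made-up placeholders for whatever row/copy machinery the model code actually uses.

    // Hypothetical per-token snapshot emission (sketch only).
    if (emit_states) {
        for (uint32_t t = 0; t < n_seq_tokens; ++t) {
            const uint32_t slot = n_seq_tokens - 1 - t;       // final token -> slot 0
            const uint32_t row  = slot * mem_size + cell_idx; // destination row
            write_state_row(row, t);                          // placeholder copy op
        }
    }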
