Commit 48d6628

fix(gguf): restore 32-byte scaling on data_offset (chatgpt-codex catch)
The 1.95 clippy sweep in PR #367 (sprint-log-8 agent W4) flagged `(pos + 31) / 32 * 32` as `manual_div_ceil` and rewrote it to `(pos + 31).div_ceil(32)`. That dropped the `* 32` byte-scaling and turned a byte-rounded offset into a block COUNT (off by ×32). chatgpt-codex P2 catch on the PR #367 review thread.

Symptom: every GGUF header parsed by `gguf_thinking_styles.rs` returned data_offset ≈ N/32 where it should have returned N rounded up to the next 32-byte boundary. `SeekFrom::Start(h.data_offset + t.offset)` then landed in the header/metadata area and decoded garbage.

Fix: `pos.div_ceil(32) * 32`. This matches gguf_euler_fold.rs:373, gguf_families.rs:337 and gamma_phi_gguf.rs:356, which all kept the `* 32` scaling correctly.

Audit summary across the workspace's GGUF parsers post-PR #367:
- gguf_euler_fold.rs:373 `pos.div_ceil(32) * 32` ✓
- gguf_families.rs:337 `pos.div_ceil(align) * align` ✓
- gguf_thinking_styles.rs:360 was buggy (this PR) → ✓ fix
- gamma_phi_gguf.rs:356 `pos.div_ceil(32) * 32` ✓

Workspace clippy on 1.95.0 still clean; fmt still clean. No CI behavior change for non-GGUF code paths.

Process note: the 12-agent fleet's per-file isolation was good for parallelism but missed the cross-file invariant that "all GGUF parsers in the workspace must produce the same data_offset". A future linter sweep that touches a pattern present in multiple sibling files should either cross-check the results against each other or be reviewed by a single reviewer who can spot the divergence. The codex review was the safety net here.
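The arithmetic behind the regression can be checked in isolation. The following is a minimal sketch, not code from the workspace: the function names `original_offset`, `buggy_offset` and `align_up` are hypothetical and only the three expressions themselves come from the commit message. It assumes a toolchain where `u64::div_ceil` is stable.

```rust
/// Pre-sweep expression: round `pos` up to a multiple of 32 using integer division.
/// (Hypothetical name; the expression is quoted from the commit message.)
fn original_offset(pos: u64) -> u64 {
    (pos + 31) / 32 * 32
}

/// Expression produced by the clippy rewrite: the `* 32` is gone, so the
/// result is roughly a count of 32-byte blocks, not a byte offset.
fn buggy_offset(pos: u64) -> u64 {
    (pos + 31).div_ceil(32)
}

/// Fixed expression, generalised over the alignment (hypothetical helper).
fn align_up(pos: u64, align: u64) -> u64 {
    pos.div_ceil(align) * align
}

fn main() {
    for pos in [0u64, 1, 31, 32, 33, 1000, 4096] {
        // The fix must agree with the pre-sweep expression for every input.
        assert_eq!(align_up(pos, 32), original_offset(pos));
        println!(
            "pos={pos:>5}  fixed={:>5}  buggy={:>4}",
            align_up(pos, 32),
            buggy_offset(pos)
        );
    }
}
```

For a header ending at byte 1000, the fixed expression yields 1024 while the rewritten one yields 33, which is why the subsequent seek ended up inside the header.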
1 parent f6c63d9 commit 48d6628

1 file changed

Lines changed: 9 additions & 1 deletion

crates/bgz-tensor/examples/gguf_thinking_styles.rs

@@ -357,7 +357,15 @@ fn parse_gguf_header<R: Read + Seek>(r: &mut R) -> Result<GgufHeader, String> {
     let pos = r.stream_position().map_err(|e| e.to_string())?;
     Ok(GgufHeader {
         tensors,
-        data_offset: (pos + 31).div_ceil(32),
+        // data_offset is the BYTE OFFSET of the tensor payload, rounded
+        // up to a 32-byte boundary. `pos.div_ceil(32) * 32` matches the
+        // GGUF spec (align-32) and the parallel parsers in
+        // gguf_euler_fold.rs / gguf_families.rs / gamma_phi_gguf.rs.
+        // (chatgpt-codex catch on PR #367: the earlier `(pos + 31).div_ceil(32)`
+        // dropped the `* 32` byte-scaling — that returned the 32-byte
+        // block COUNT, off by a factor of 32, causing tensor reads to
+        // seek into the header.)
+        data_offset: pos.div_ceil(32) * 32,
     })
 }
 
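To illustrate why the unscaled value makes tensor reads land in the header, here is a small self-contained sketch against a synthetic in-memory file. It is not a GGUF parser: `build_file` and `read_at` are hypothetical helpers, and the 0xAA filler merely stands in for header and metadata bytes. Only the two offset expressions are taken from the diff.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

const ALIGN: u64 = 32;

/// Build a synthetic file: `header_len` filler bytes (0xAA), zero padding up
/// to the next 32-byte boundary, then the payload. (Hypothetical layout.)
fn build_file(header_len: usize, payload: &[u8]) -> Vec<u8> {
    let mut buf = vec![0xAA; header_len];
    let padded = (header_len as u64).div_ceil(ALIGN) * ALIGN;
    buf.resize(padded as usize, 0x00);
    buf.extend_from_slice(payload);
    buf
}

/// Seek to an absolute byte offset and read exactly `len` bytes.
fn read_at<R: Read + Seek>(r: &mut R, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    r.seek(SeekFrom::Start(offset))?;
    let mut out = vec![0u8; len];
    r.read_exact(&mut out)?;
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let header_len = 1000u64;
    let payload = [1u8, 2, 3, 4];
    let mut file = Cursor::new(build_file(header_len as usize, &payload));

    let fixed = header_len.div_ceil(ALIGN) * ALIGN; // byte offset of the payload
    let buggy = (header_len + 31).div_ceil(ALIGN); // block-count-like value

    // The fixed offset reads the payload back.
    assert_eq!(read_at(&mut file, fixed, 4)?, payload);
    // The buggy offset points into the header filler.
    assert_eq!(read_at(&mut file, buggy, 4)?, [0xAA; 4]);
    Ok(())
}
```

A check of this shape, run against each sibling parser, is one possible way to enforce the cross-file invariant mentioned in the process note.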
