feat(vault.read): faithful-by-default reads with char-budget pagination (ADR-203, #133)#205
Conversation
Refined through design dialogue: - paginate/no-paginate boundary is a CHARACTER budget (READ_PAGE_CHARS, default 50000), not a line count (line count is an invalid proxy for context cost — 1500 short vs 1500 long lines differ by orders of magnitude) - bookends stay line-based (lineStart/lineEnd/totalLines) so edit.at_line on a large file still works; a page = longest whole-line run within the char budget - fits budget → whole file verbatim, one load (common case, no ceremony) - exceeds → bookended page 1 (single contiguous block, not chunk array) + nextPage hint; sequential paging via new `page` param - returnFullFile:true RETAINED and repurposed as the explicit whole-large-file override (not retired) - hard invariant recorded: a default read must never hand the agent a context-breaking raw dump Alternatives added: no-guard full dump, line-count budget, chunk-array — all rejected with rationale.
…on (ADR-203, #133) Default vault.read no longer fragments-and-flattens. New contract: - fits READ_PAGE_CHARS (50000) → whole file, byte-exact, one load (common case) - exceeds → verbatim page 1, single contiguous block, with line bookends (lineStart/lineEnd/totalLines + nextPage); page=N to continue. Absolute line numbers preserved so edit.at_line still works on large files. - returnFullFile:true → entire file verbatim (explicit large override; param retained & repurposed, not retired) - query/strategy/maxFragments → semantic fragments (unchanged) - structured envelope no longer double-encodes the body (metadata sans body) Budget is char-based on purpose: line count is an invalid proxy for context cost; only bookends are line-based (for at_line). Hard invariant: a default read must never hand the agent a context-breaking raw dump. Also fixes the latent formatFileRead crash (_Formatter error_) on full-file shapes; formatter now renders verbatim + a Pagination section. src/utils/file-reader.ts rewritten; vault.ts threads `page`; formatter hardened; tool description + CHANGELOG (breaking) updated. make check green (0 errors, baseline 5 warnings, 243/243 incl. 8 new ADR-203 round-trip/pagination/fidelity tests).
|
✅ Build succeeded! Artifacts are available in the Actions tab. |
Code Review — ADR-203 faithful-by-default paginated readsReviewed against the amended ADR-203 (decision of record), #133, and the full diff. Forensic verification done on Verdict: Request changes — one blocking issueBLOCKING
Location: const contentLines = content.split('\n');
const previewLines = contentLines.slice(0, 50);
lines.push('```markdown');
lines.push(previewLines.join('\n'));
if (contentLines.length > 50) {
lines.push(`\n... (${contentLines.length - 50} more lines)`);
}Problem: ADR-203 §1 places But The PR description claims "formatter now renders verbatim + a Pagination section." The Pagination section is implemented; the verbatim rendering is not — the Why it matters: This silently negates the core correctness guarantee of the ADR for the common (non-raw) call path. It is the exact failure class the brief defines as blocking ("behaviour-fidelity bugs in a byte-exact feature"). Suggestion: For the whole-file and paginated branches, emit the content whole inside the fenced block — drop the 50-line slice. This is safe by construction: the page is already bounded to ≤ Also add a fidelity test asserting the formatted (non-raw) output contains the byte-exact content for both the whole-file and paginated branches. The current Verified correct (no action needed)
Pathological single-line > budget — Returned whole as its own page, Page edge cases — ADR contract — four branches match exactly: fragments (query/strategy/maxFragments) unchanged; De-dup (ADR §3) — Body is correctly stripped from Image / buffer interactions — Image files are intercepted by the Non-blocking notes
Once the truncation is fixed and a byte-exact formatted-output test is added, this is sound: the data layer is genuinely byte-faithful and the pagination math is correct. The blocking issue is purely the presentation layer silently undercutting the default-path guarantee. AI-assisted review via Claude |
…blocking review) formatFileRead truncated the string branch to a 50-line preview; since raw defaults false, the DEFAULT vault.read was that truncated view — an agent reading any >50-line file could not derive a byte-matching edit.window oldText without raw:true, the exact #133 friction ADR-203 retires. The data layer was already byte-faithful; the presentation default undercut it. - string branch: emit full content verbatim, no 50-line slice, no wrapping code fence (body may contain fences — a wrapper would corrupt round-trip) - add non-text/binary passthrough guard (closes a pre-existing latent formatter-error path; reviewer non-blocking note) - test now asserts the FORMATTED (raw:false) output is byte-exact, not just not.toThrow make check green: 0 errors, baseline 5 warnings, 244/244.
|
✅ Build succeeded! Artifacts are available in the Actions tab. |
Implements ADR-203 (Accepted, amended through design dialogue). Resolves the #133 intent + the large-raw-context guard.
Contract
READ_PAGE_CHARS(50000 ≈ ~12k tok)lineStart/lineEnd/totalLines/nextPage);page=NcontinuesreturnFullFile: truequery/strategy/maxFragmentsedit.at_linestill works on a large file.metadatasans body — ~halves raw read payload).formatFileReadcrash (_Formatter error_) on full-file shapes; formatter now renders verbatim + aPaginationsection.Files
src/utils/file-reader.ts— rewritten to the contract (READ_PAGE_CHARS,buildPage, the four branches).src/semantic/operations/vault.ts— threadspageinto the read options.src/formatters/vault.ts—FileReadResponseextended;formatFileReadhardened (guarded size/modified, pagination block, no crash).src/tools/semantic-tools.ts—returnFullFile/pagedescriptions reflect the new semantics.CHANGELOG.md— breaking-change entry.tests/vault-read-fidelity.test.ts— 8 tests: byte-exact small read, code-fence/tab round-trip, bookended page 1, contiguous page 2,returnFullFileoverride, page-past-EOF, fragments-unchanged, formatter-no-crash.Verification
make check: 0 errors, baseline 5 warnings (no new), 243/243 (21 suites).tscclean.Rollout
Per ADR-203 + the standing process: a prerelease will be cut for a BRAT functional test of a real read→
edit.windowround-trip and a large-file bookended-page-1 →at_linebefore this is considered field-validated. #133 stays open until that confirms, then is closed/reframed.Refs ADR-203, #133.