Commit 607fbfd

critesjosh and claude committed
fix(error-lookup): simplify isUsefulSemanticChunk — shape-only check
Codex review feedback. Two related issues:

1. Sourceish set used `match.source` and `match.title` to detect a rendered file-path heading line. But `/api/search` rewrites `source` to a public URL (`_aztec_source_url` produces e.g. `https://github.com/.../foo.nr`), so the bare-path heading `aztec-nr/.../foo.nr` never matched the URL. The heading was never stripped, and the chunk fell through to the path-shape check, which also missed because `# foo/bar.nr` contains whitespace from the markdown marker. Result: a class of empty chunks slipping through both gates.

2. The mitigation (strip a leading `#+ ` from each line before the path-shape predicate) makes the metadata coupling unnecessary. Drop the sourceish comparison entirely.

New helper `lineIsPathShaped` strips heading markers, then checks "contains `/` and no whitespace". Real signature lines always have whitespace (`pub fn ...`, `struct ...`, `pub use a::b;`), so they never trip the predicate.

Equivalent fix on the docsgpt side: AztecProtocol/honk-ai#66 gets the same shape-only simplification.

New regression test: a chunk with a `#`-prefixed heading body and a URL-rewritten source field (the exact failure mode codex described) is correctly identified as "no useful results".

283/283 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9d41359 commit 607fbfd
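
To make the predicate described in the commit message concrete, here is a minimal sketch that runs the helper from this commit over a few sample lines. The helper body is taken from the diff; the sample strings (including the full path) are illustrative assumptions, not data from the repository.

```typescript
// Helper as it appears in this commit's diff: strip a leading markdown
// heading marker, then require a "/" and no whitespace.
function lineIsPathShaped(line: string): boolean {
  const cleaned = line.replace(/^#+\s*/, "").trim();
  return cleaned.length > 0 && cleaned.includes("/") && !/\s/.test(cleaned);
}

// Hypothetical sample lines, chosen to mirror the cases the message names.
const samples: string[] = [
  "aztec-nr/aztec/src/context/foo.nr",   // bare path
  "# aztec-nr/aztec/src/context/foo.nr", // same path behind a heading marker
  "pub fn foo(x: Field) -> Field",       // signature line: has whitespace
  "pub use a::b;",                       // re-export line: has whitespace, no "/"
  "",                                    // empty line
];

// One boolean per sample, in the same order as `samples`.
const verdicts: boolean[] = samples.map(lineIsPathShaped);

samples.forEach((s, i) => {
  console.log(JSON.stringify(s), "->", verdicts[i]);
});
```

Only the first two samples come out path-shaped. Note that the rule is purely about shape: any single whitespace-free token containing `/`, such as a bare URL on its own line, is also treated as path-shaped.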

2 files changed

Lines changed: 48 additions & 23 deletions

src/tools/error-lookup.ts

Lines changed: 24 additions & 23 deletions
@@ -31,6 +31,20 @@ import { checkVersionGate, formatMismatchMessage } from "../utils/version-check.
  */
 const STRONG_MATCH_THRESHOLD = 70;
 
+/**
+ * A line is "path-shaped" if it looks like a filesystem path rather
+ * than a code/docs line. Strips a leading markdown heading marker so
+ * ``# aztec-nr/.../foo.nr`` is recognized as path-shaped just like
+ * the bare ``aztec-nr/.../foo.nr``. Path-shaped means: contains ``/``
+ * and has no whitespace. Real signature lines (``pub fn foo(...)``,
+ * ``struct Bar { ... }``, ``pub use a::b;``) always have whitespace,
+ * so they never trip this predicate.
+ */
+function lineIsPathShaped(line: string): boolean {
+  const cleaned = line.replace(/^#+\s*/, "").trim();
+  return cleaned.length > 0 && cleaned.includes("/") && !/\s/.test(cleaned);
+}
+
 /**
  * Drop semantic chunks whose body is empty or just the file path.
  *
@@ -43,6 +57,14 @@ const STRONG_MATCH_THRESHOLD = 70;
  *
  * Mirrors the Python helper in ``application/api/answer/routes/search.py``
  * (``_is_empty_apiref_chunk``) — same content-shape predicate.
+ *
+ * The predicate is deliberately metadata-free. An earlier draft used
+ * ``match.source`` / ``match.title`` as a "heading-equivalent" set
+ * to strip a rendered file heading before checking the rest, but
+ * docsgpt's ``/api/search`` rewrites ``source`` to a public URL via
+ * ``_aztec_source_url`` — so the heading string never matches the
+ * post-rewrite source field. The shape-only check below works
+ * regardless of metadata transformations.
  */
 function isUsefulSemanticChunk(match: SemanticSearchResult): boolean {
   const text = (match.text ?? "").trim();
@@ -54,29 +76,8 @@ function isUsefulSemanticChunk(match: SemanticSearchResult): boolean {
     .filter((l) => l.length > 0);
   if (lines.length === 0) return false;
 
-  // Sourceish: any of the strings the rendered file-heading might have
-  // matched (source path / filename-equivalent title). Strip a leading
-  // "#" / "##" markdown heading marker before comparing. Deliberately
-  // does NOT include match.text — that would create a fixed point
-  // (text === sourceish[0]) that filters single-line legitimate
-  // chunks that happen to be one line long.
-  const sourceish = new Set(
-    [match.source, match.title]
-      .map((s) => (s ?? "").trim())
-      .filter(Boolean)
-  );
-  const firstStripped = lines[0].replace(/^#+\s*/, "").trim();
-  let body = lines;
-  if (sourceish.has(firstStripped) || sourceish.has(lines[0])) {
-    body = body.slice(1);
-  }
-
-  if (body.length === 0) return false;
-
-  // Body still looks like a file path: every remaining line is path-
-  // shaped (contains "/" and no whitespace). A real signature line
-  // ("pub fn ..., struct Foo, ...") always has whitespace.
-  if (body.every((l) => l.includes("/") && !/\s/.test(l))) return false;
+  // All non-empty lines are path-shaped → no real API content.
+  if (lines.every(lineIsPathShaped)) return false;
 
   return true;
 }
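
The behavioral difference between the removed check and the new one can be sketched side by side. This is an illustration under stated assumptions, not the file's actual code: `Chunk` is a hypothetical, simplified stand-in for `SemanticSearchResult`, `nonEmptyLines` guesses at the line-splitting step that the diff does not show, and `oldIsUseful` / `newIsUseful` are names invented here. The old logic is reconstructed from the removed lines of the diff; the first sample chunk copies the fields used in this commit's regression test.

```typescript
// Hypothetical, simplified stand-in for SemanticSearchResult.
interface Chunk {
  text?: string;
  title?: string;
  source?: string;
}

// Helper as added by this commit.
function lineIsPathShaped(line: string): boolean {
  const cleaned = line.replace(/^#+\s*/, "").trim();
  return cleaned.length > 0 && cleaned.includes("/") && !/\s/.test(cleaned);
}

// Assumption: the part of the function not shown in the diff splits the
// text on newlines, trims each line, and drops empty ones.
function nonEmptyLines(text: string): string[] {
  return text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
}

// Removed logic, reconstructed from the "-" lines of the diff.
function oldIsUseful(match: Chunk): boolean {
  const lines = nonEmptyLines((match.text ?? "").trim());
  if (lines.length === 0) return false;
  const sourceish = new Set(
    [match.source, match.title].map((s) => (s ?? "").trim()).filter(Boolean)
  );
  const first = lines[0] ?? "";
  const firstStripped = first.replace(/^#+\s*/, "").trim();
  let body = lines;
  if (sourceish.has(firstStripped) || sourceish.has(first)) {
    body = body.slice(1);
  }
  if (body.length === 0) return false;
  // "# path" contains a space, so this check misses heading-only chunks.
  if (body.every((l) => l.includes("/") && !/\s/.test(l))) return false;
  return true;
}

// Shape-only logic after this commit: no metadata involved.
function newIsUseful(match: Chunk): boolean {
  const lines = nonEmptyLines((match.text ?? "").trim());
  if (lines.length === 0) return false;
  return !lines.every(lineIsPathShaped);
}

// Heading-only chunk with a URL-rewritten source, as in the regression test.
const headingOnly: Chunk = {
  text: "# aztec-nr/aztec/src/context/foo.nr\n",
  title: "foo.nr",
  source:
    "https://github.com/AztecProtocol/aztec-packages/blob/v4.2.0/noir-projects/aztec-nr/aztec/src/context/foo.nr",
};

// Hypothetical chunk that carries a real signature line after the heading.
const withSignature: Chunk = {
  text: "# aztec-nr/aztec/src/context/foo.nr\npub fn foo() -> Field",
  title: "foo.nr",
  source: headingOnly.source,
};

console.log("old, heading only:", oldIsUseful(headingOnly));
console.log("new, heading only:", newIsUseful(headingOnly));
console.log("new, with signature:", newIsUseful(withSignature));
```

Under these assumptions the old filter keeps the heading-only chunk, because neither the URL nor `foo.nr` equals the stripped heading and the `# ` prefix defeats the whitespace test, while the shape-only filter drops it and still keeps the chunk that has a signature line.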

tests/tools/error-lookup.test.ts

Lines changed: 24 additions & 0 deletions
@@ -304,6 +304,30 @@ describe("lookupAztecError — content-thin chunk filter", () => {
     return { text, title: "foo.nr", source };
   }
 
+  it("drops chunks with `#`-prefixed path heading even when source field is a public URL", async () => {
+    /**
+     * Regression for codex review: `/api/search` rewrites the chunk's
+     * `source` field to a public URL via `_aztec_source_url`. A chunk
+     * whose body is `# aztec-nr/.../foo.nr` (path heading only) won't
+     * match the URL-rewritten source field by string equality. The
+     * earlier filter would fail to strip the heading, then fall through
+     * to the path-shape check — which also failed because `# ...` has
+     * whitespace from the markdown marker. The new shape-only filter
+     * catches this directly.
+     */
+    const client = makeClient({
+      search: vi.fn().mockResolvedValue([
+        {
+          text: "# aztec-nr/aztec/src/context/foo.nr\n",
+          title: "foo.nr",
+          source: "https://github.com/AztecProtocol/aztec-packages/blob/v4.2.0/noir-projects/aztec-nr/aztec/src/context/foo.nr",
+        },
+      ]),
+    });
+    const result = await lookupAztecError({ query: "obscure" }, client);
+    expect(result.semanticHealth).toBe("no_results");
+  });
+
   it("treats raw output of all path-only chunks as 'no_results'", async () => {
     const client = makeClient({
       search: vi.fn().mockResolvedValue([
