Commit f700db5

andreinknv and claude committed
feat(mcp+cli): close 33+ alignment gaps; ship 7 tooling-gap items + 5 server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket: `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag: opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status: `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier: file/directory/bare/unresolvable filters
  #5 dynamic imports: extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs: new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag: column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding): `cg.summarizeAll()` chains `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar, biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs, module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync, codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit, codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model / --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts: add `files.is_test`
  024-build-context-refs.ts: add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8, undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
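The "JSONC URL strip via state machine" fix can be illustrated with a minimal sketch. This is not the code in `module-format.ts`, which this page does not show; `stripJsonComments` is a hypothetical helper name. The idea is that a naive regex for `//` comments also removes the tail of a URL inside a string, whereas tracking whether the scanner is inside a string avoids that.

```typescript
// Hypothetical helper (an assumption, not the commit's implementation).
// Removes // and /* */ comments from JSONC text while leaving string
// contents, including URLs such as "https://example.com", untouched.
type ScanState = 'code' | 'string' | 'lineComment' | 'blockComment';

function stripJsonComments(text: string): string {
  let state = 'code' as ScanState;
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const next = text[i + 1];
    switch (state) {
      case 'code':
        if (ch === '"') {
          state = 'string';
          out += ch;
        } else if (ch === '/' && next === '/') {
          state = 'lineComment';
          i++; // skip the second slash
        } else if (ch === '/' && next === '*') {
          state = 'blockComment';
          i++; // skip the asterisk
        } else {
          out += ch;
        }
        break;
      case 'string':
        out += ch;
        if (ch === '\\' && next !== undefined) {
          out += next; // keep the escaped character verbatim
          i++;
        } else if (ch === '"') {
          state = 'code';
        }
        break;
      case 'lineComment':
        if (ch === '\n') {
          state = 'code';
          out += ch; // keep the newline so line structure is preserved
        }
        break;
      case 'blockComment':
        if (ch === '*' && next === '/') {
          state = 'code';
          i++; // skip the closing slash
        }
        break;
    }
  }
  return out;
}
```

The sketch only handles comments; trailing commas, which JSONC files such as tsconfig.json may also contain, would need a separate pass.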
1 parent 4736a42 commit f700db5

65 files changed

Lines changed: 4288 additions & 74 deletions

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+/**
+ * Tooling-gaps item #6 (doc gap #3): build-context references.
+ *
+ * `__dirname`, `__filename`, `import.meta.dirname`, `import.meta.url`,
+ * `import.meta.filename` are surface-area for any module-format
+ * migration. During the ESM migration the agent had to grep for these;
+ * they were not first-class in codegraph.
+ *
+ * Expected: a new `build_context_refs` table modelled exactly on
+ * `config_refs` (per-site occurrences with optional source_node_id),
+ * a new `extractBuildContextRefs` extractor, and a way to query them
+ * (either via codegraph_imports with a flag, or a dedicated tool).
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { CodeGraph } from '../src/index.js';
+
+describe('Tooling-gaps #6: build-context refs (__dirname / import.meta.*)', () => {
+  let testDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(async () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-build-ctx-'));
+    fs.mkdirSync(path.join(testDir, 'src'));
+    fs.writeFileSync(path.join(testDir, 'src', 'cjs-style.ts'),
+      `export function loadAsset(){\n` +
+      ` const root = __dirname;\n` +
+      ` const me = __filename;\n` +
+      ` return { root, me };\n` +
+      `}\n`);
+    fs.writeFileSync(path.join(testDir, 'src', 'esm-style.ts'),
+      `export function loadAssetEsm(){\n` +
+      ` const root = import.meta.dirname;\n` +
+      ` const me = import.meta.filename;\n` +
+      ` const url = import.meta.url;\n` +
+      ` return { root, me, url };\n` +
+      `}\n`);
+    fs.writeFileSync(path.join(testDir, 'package.json'),
+      JSON.stringify({ name: 'x', version: '0.0.0', type: 'module' }));
+    cg = await CodeGraph.init(testDir, { config: { llm: { endpoint: '' } } });
+    await cg.indexAll({ summarize: false });
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+  });
+
+  it('the build_context_refs table exists', () => {
+    const q = (cg as any).queries;
+    const row = q.db.prepare(
+      `SELECT name FROM sqlite_master WHERE type='table' AND name='build_context_refs'`
+    ).get();
+    expect(row).toBeDefined();
+  });
+
+  it('captures __dirname and __filename in the CJS-style file', () => {
+    const q = (cg as any).queries;
+    const refs = q.db.prepare(
+      `SELECT ref_kind, file_path FROM build_context_refs WHERE file_path LIKE '%cjs-style%'`
+    ).all() as Array<{ ref_kind: string; file_path: string }>;
+    const kinds = refs.map((r) => r.ref_kind);
+    expect(kinds).toContain('__dirname');
+    expect(kinds).toContain('__filename');
+  });
+
+  it('captures import.meta.dirname / .filename / .url in the ESM-style file', () => {
+    const q = (cg as any).queries;
+    const refs = q.db.prepare(
+      `SELECT ref_kind FROM build_context_refs WHERE file_path LIKE '%esm-style%'`
+    ).all() as Array<{ ref_kind: string }>;
+    const kinds = refs.map((r) => r.ref_kind);
+    expect(kinds).toContain('import.meta.dirname');
+    expect(kinds).toContain('import.meta.filename');
+    expect(kinds).toContain('import.meta.url');
+  });
+});
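To make the per-site occurrences that this test queries more concrete, here is a rough sketch of an extractor. The name `extractBuildContextRefs` comes from the test's header comment, but the body, the `BuildContextRef` shape, and the `line` field are assumptions; the real extractor is not shown on this page and presumably works on the syntax tree, not on a regex.

```typescript
// Assumed row shape. Only ref_kind and file_path are confirmed by the
// test's SQL; the line field is an illustrative addition.
interface BuildContextRef {
  refKind: string;
  filePath: string;
  line: number; // 1-based line of the occurrence
}

// Regex stand-in for a syntax-tree walk. Unlike a real parser it also
// matches occurrences inside comments and string literals.
function extractBuildContextRefs(filePath: string, source: string): BuildContextRef[] {
  const pattern = /\b(__dirname|__filename|import\.meta\.(?:dirname|filename|url))\b/g;
  const refs: BuildContextRef[] = [];
  const lines = source.split('\n');
  for (let i = 0; i < lines.length; i++) {
    let m: RegExpExecArray | null;
    // exec() resets lastIndex to 0 after returning null, so the same
    // regex object can be reused for the next line.
    while ((m = pattern.exec(lines[i])) !== null) {
      refs.push({ refKind: m[1], filePath, line: i + 1 });
    }
  }
  return refs;
}
```

One row per occurrence, not one per file, matches the "per-site occurrences" wording in the header comment and lets a migration tool jump straight to each site.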

__tests__/context-homonym.test.ts

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+/**
+ * Tooling-gaps item #9 (doc smaller-obs): context homonym weighting.
+ *
+ * Observation from the ESM migration session: when the agent's task
+ * description used the word "migration" (meaning module-format
+ * migration), `codegraph_context` over-weighted schema-migration
+ * matches because the codebase has a migration system in src/db/.
+ *
+ * Expected fix is fuzzy; it could be:
+ * - tighter co-occurrence scoring (require a second signal
+ *   specific to the intended sense),
+ * - or a clarifying-question response when the top candidates are
+ *   ambiguous on a known homonym set,
+ * - or per-domain boosting based on other terms in the task.
+ *
+ * These tests are a PROBE, not a blocking spec. Marked .todo until
+ * we pick a mechanism. The `it` here documents the symptom.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { CodeGraph } from '../src/index.js';
+import { ToolHandler } from '../src/mcp/tools.js';
+
+describe('Tooling-gaps #9: context homonym weighting (PROBE)', () => {
+  let testDir: string;
+  let cg: CodeGraph;
+  let handler: ToolHandler;
+
+  beforeEach(async () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-homonym-'));
+    fs.mkdirSync(path.join(testDir, 'src'));
+    fs.mkdirSync(path.join(testDir, 'src', 'db'));
+    fs.mkdirSync(path.join(testDir, 'src', 'db', 'migrations'));
+    fs.writeFileSync(path.join(testDir, 'src', 'db', 'migrations.ts'),
+      `export function applyMigration(name: string){return name;}\n` +
+      `export function listMigrations(){return ['001-init'];}\n`);
+    fs.writeFileSync(path.join(testDir, 'src', 'db', 'migrations', '001-init.ts'),
+      `export const description = 'initial schema migration';\n`);
+    fs.writeFileSync(path.join(testDir, 'src', 'esm-loader.ts'),
+      `// Module-format migration helpers - convert require to import.\n` +
+      `export function rewriteImport(spec: string){return spec.replace(/^\\.\\//, './');}\n`);
+    fs.writeFileSync(path.join(testDir, 'package.json'),
+      JSON.stringify({ name: 'x', version: '0.0.0', type: 'module' }));
+    cg = await CodeGraph.init(testDir, { config: { llm: { endpoint: '' } } });
+    await cg.indexAll({ summarize: false });
+    handler = new ToolHandler(cg);
+  });
+
+  afterEach(() => {
+    handler?.closeAll();
+    if (cg) cg.close();
+    if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+  });
+
+  it.todo('disambiguates "ESM migration" away from src/db/migrations/', async () => {
+    // Mechanism TBD. This documents the desired behaviour.
+    const result = await handler.execute('codegraph_context',
+      { task: 'ESM module-format migration: rewrite require to import' });
+    const text = result.content[0]?.text ?? '';
+    // Top candidates should mention rewriteImport / esm-loader, NOT applyMigration.
+    expect(text).toMatch(/rewriteImport|esm-loader/);
+    // Soft assertion: schema-migration symbols should not dominate.
+    const esmHits = (text.match(/esm-loader|rewriteImport/g) || []).length;
+    const schemaHits = (text.match(/applyMigration|listMigrations|001-init/g) || []).length;
+    expect(esmHits).toBeGreaterThanOrEqual(schemaHits);
+  });
+});
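The header comment above leaves the mechanism open. Purely as an illustration of the first option it lists (co-occurrence scoring that requires a second signal), a sketch might look like the following. Everything here is hypothetical: `scoreCandidate`, the `Candidate` shape, the homonym set, and the 0.25 down-weight are assumptions, not part of codegraph.

```typescript
// Hypothetical candidate: an id plus the text the ranker matches against.
interface Candidate {
  id: string;
  text: string;
}

// A homonym term (e.g. "migration") counts at full weight only when a
// second, sense-specific task term also matches the candidate.
function scoreCandidate(taskTerms: string[], homonyms: Set<string>, candidate: Candidate): number {
  const words = new Set(candidate.text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const matched = taskTerms.filter((t) => words.has(t));
  const specific = matched.filter((t) => !homonyms.has(t));
  const ambiguous = matched.length - specific.length;
  // 0.25 is an arbitrary illustrative down-weight for unsupported homonym hits.
  return specific.length + (specific.length > 0 ? ambiguous : 0.25 * ambiguous);
}
```

With task terms `esm`, `migration`, `require`, `import` and `migration` marked as a homonym, a candidate that matches only `migration` scores 0.25, while one that also matches `require` and `import` scores 3, which is the ordering the probe test hopes for.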
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+/**
+ * Tooling-gaps item #5 (doc gap #2): Dynamic imports extraction.
+ *
+ * Today: extractor only recognizes `importTypes: ['import_statement']`,
+ * so dynamic `import('foo')` and `require('foo')` are invisible to the
+ * graph. During the ESM migration this meant the dynamic-import sed
+ * pass had to be a separate hand-rolled regex (which had bugs).
+ *
+ * Expected: tree-sitter call_expression with `import` callee or
+ * `require` identifier should produce import nodes/edges with a
+ * metadata flag distinguishing them from static imports.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { CodeGraph } from '../src/index.js';
+
+describe('Tooling-gaps #5: dynamic import extraction', () => {
+  let testDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(async () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-dyn-import-'));
+    fs.mkdirSync(path.join(testDir, 'src'));
+    fs.writeFileSync(path.join(testDir, 'src', 'plugin.ts'),
+      `export const name = 'plugin';\n`);
+    fs.writeFileSync(path.join(testDir, 'src', 'cjs-dep.ts'),
+      `export const value = 42;\n`);
+    fs.writeFileSync(path.join(testDir, 'src', 'main.ts'),
+      `export async function loadPlugin(){\n` +
+      ` const mod = await import('./plugin');\n` +
+      ` return mod;\n` +
+      `}\n` +
+      `export function loadCjs(){\n` +
+      ` const m = require('./cjs-dep');\n` +
+      ` return m;\n` +
+      `}\n`);
+    fs.writeFileSync(path.join(testDir, 'package.json'),
+      JSON.stringify({ name: 'x', version: '0.0.0', type: 'module' }));
+
+    cg = await CodeGraph.init(testDir, { config: { llm: { endpoint: '' } } });
+    await cg.indexAll({ summarize: false });
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+  });
+
+  it("indexes dynamic `import('./plugin')` as an import node", () => {
+    const q = (cg as any).queries;
+    const rows = q.db.prepare(
+      `SELECT name, qualified_name FROM nodes WHERE kind = 'import' AND name LIKE '%plugin%'`
+    ).all() as Array<{ name: string; qualified_name: string }>;
+    expect(rows.length).toBeGreaterThan(0);
+  });
+
+  it("indexes `require('./cjs-dep')` as an import node", () => {
+    const q = (cg as any).queries;
+    const rows = q.db.prepare(
+      `SELECT name, qualified_name FROM nodes WHERE kind = 'import' AND name LIKE '%cjs-dep%'`
+    ).all() as Array<{ name: string; qualified_name: string }>;
+    expect(rows.length).toBeGreaterThan(0);
+  });
+
+  it('marks dynamic imports distinctly from static imports (via signature)', () => {
+    // Static imports have signatures like `import { foo } from './bar';`.
+    // Dynamic imports / require calls have signatures starting with
+    // `import(` or `require(`. An agent's filter for dynamic-only is a
+    // signature LIKE pattern. Edge-level metadata could be a refinement
+    // later, but the node-signature shape is enough for v1 filtering.
+    const q = (cg as any).queries;
+    const dynamic = q.db.prepare(
+      `SELECT signature FROM nodes WHERE kind = 'import'
+       AND (signature LIKE 'import(%' OR signature LIKE '%require(%')`
+    ).all() as Array<{ signature: string }>;
+    expect(dynamic.length).toBeGreaterThan(0);
+  });
+});
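The signature-based filter in the last test can also be applied client-side. The following is a small sketch under the assumption that import node signatures look the way the test comment describes; `classifyImportSignature` and the `ImportStyle` labels are hypothetical names, not part of the codegraph API.

```typescript
// Hypothetical labels for the three shapes the test comment describes.
type ImportStyle = 'static' | 'dynamic-import' | 'require';

// Mirrors the test's SQL filter:
//   signature LIKE 'import(%'  OR  signature LIKE '%require(%'
// with slightly looser whitespace handling.
function classifyImportSignature(signature: string): ImportStyle {
  const s = signature.trim();
  if (/^import\s*\(/.test(s)) return 'dynamic-import';
  if (/\brequire\s*\(/.test(s)) return 'require';
  return 'static';
}
```

As in the SQL, a dynamic import is recognised only when the signature itself starts with `import(`, whereas `require(` may appear anywhere in the signature.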

__tests__/embeddings.test.ts

Lines changed: 64 additions & 0 deletions
@@ -336,6 +336,70 @@ export class TokenStore {
     fs.rmSync(tempDir, { recursive: true, force: true });
   });
 
+  it('standalone summarizeAll() also embeds (tooling-gap #11 regression)', async () => {
+    // Pre-fix bug: cg.summarizeAll() only summarised, never embedded.
+    // The CLI `codegraph summarize` command and the status message
+    // both promised "embeds as a side-effect" but the embed phase only
+    // ran inside indexAll()'s background pass, leaving sqlite-vec
+    // unused unless you ran a full `index --force`. This test pins
+    // down the corrected behaviour: standalone summarizeAll, called
+    // after a `summarize:false` indexAll, populates BOTH tables.
+    const cg = await CodeGraph.init(tempDir, {
+      config: {
+        llm: {
+          endpoint: fake.url,
+          chatModel: 'qwen2.5-coder:7b',
+          embeddingModel: 'nomic-embed-text',
+        },
+      },
+    });
+    try {
+      await cg.indexAll({ summarize: false });
+      // No background pass, so nothing embedded yet.
+      expect(fake.embedCalls).toBe(0);
+
+      const result = await cg.summarizeAll();
+
+      // Summaries fired AND embeddings fired in the same call.
+      expect(fake.chatCalls).toBeGreaterThan(0);
+      expect(fake.embedCalls).toBeGreaterThan(0);
+      // The new embed result is reported on the return value.
+      expect(result.embed).not.toBeNull();
+      expect(result.embed!.generated).toBeGreaterThan(0);
+    } finally {
+      cg.close();
+    }
+  });
+
+  it('cg.embedAll() runs the embed-only path', async () => {
+    const cg = await CodeGraph.init(tempDir, {
+      config: {
+        llm: {
+          endpoint: fake.url,
+          chatModel: 'qwen2.5-coder:7b',
+          embeddingModel: 'nomic-embed-text',
+        },
+      },
+    });
+    try {
+      await cg.indexAll();
+      await cg.awaitBackgroundSummarization();
+      const beforeChat = fake.chatCalls;
+      const beforeEmbed = fake.embedCalls;
+
+      const result = await cg.embedAll();
+
+      // Idempotent: already embedded, so no new vectors.
+      expect(result.generated).toBe(0);
+      // No chat fire: pure embed path.
+      expect(fake.chatCalls).toBe(beforeChat);
+      // Embed call count unchanged for the same reason (cache hit).
+      expect(fake.embedCalls).toBe(beforeEmbed);
+    } finally {
+      cg.close();
+    }
+  });
+
   it('searchHybrid falls back to FTS when no embedding model is configured', async () => {
     const cg = await CodeGraph.init(tempDir, {
       config: {
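The two behaviours these tests pin down (summarize chains into embed, and embed-only is idempotent) can be modelled with a toy in-memory index. This is only a sketch: `SketchIndex`, its record shape, and the fake chat/embed functions are hypothetical, the real `CodeGraph` methods are asynchronous, and the real `result.embed` may be null where this sketch always returns an object.

```typescript
// Hypothetical record: a symbol with an optional summary and vector.
interface SymbolRecord {
  id: string;
  summary?: string;
  vector?: number[];
}

class SketchIndex {
  chatCalls = 0;
  embedCalls = 0;
  records: SymbolRecord[];

  constructor(records: SymbolRecord[]) {
    this.records = records;
  }

  // Embed-only path: touches only summarised records that lack a vector,
  // so a second call generates nothing and makes no model calls.
  embedAll(): { generated: number } {
    let generated = 0;
    for (const r of this.records) {
      if (r.summary !== undefined && r.vector === undefined) {
        r.vector = this.fakeEmbed(r.summary);
        generated++;
      }
    }
    return { generated };
  }

  // Summarise whatever is missing a summary, then chain into embedAll()
  // so a standalone summarize call also populates vectors.
  summarizeAll(): { summarized: number; embed: { generated: number } } {
    let summarized = 0;
    for (const r of this.records) {
      if (r.summary === undefined) {
        r.summary = this.fakeChat(r.id);
        summarized++;
      }
    }
    return { summarized, embed: this.embedAll() };
  }

  private fakeChat(id: string): string {
    this.chatCalls++;
    return `summary of ${id}`;
  }

  private fakeEmbed(text: string): number[] {
    this.embedCalls++;
    return [text.length];
  }
}
```

Keeping the "needs a vector" check inside `embedAll()` is what lets both entry points share one code path without re-embedding.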

__tests__/enclosing.test.ts

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+/**
+ * Unit tests for the innermost-enclosing-scope picker. Documents the
+ * span-ASC-plus-containment-check invariant after a reviewer flagged
+ * the inline copies of this logic as potentially buggy. Spoiler: not
+ * a bug, but the test cases are useful regardless to lock the
+ * invariant down.
+ */
+import { describe, it, expect } from 'vitest';
+import { sortScopesBySpan, findEnclosingNode, type ScopeNode } from '../src/index-hooks/enclosing.js';
+
+function pick(nodes: ScopeNode[], line: number): string | null {
+  return findEnclosingNode(sortScopesBySpan(nodes), line);
+}
+
+describe('findEnclosingNode', () => {
+  it('returns null when no scope contains the line', () => {
+    expect(pick([
+      { id: 'f1', start: 1, end: 10 },
+      { id: 'f2', start: 20, end: 30 },
+    ], 15)).toBeNull();
+  });
+
+  it('returns the only matching scope when scopes are adjacent', () => {
+    const scopes: ScopeNode[] = [
+      { id: 'f1', start: 1, end: 10 },
+      { id: 'f2', start: 20, end: 30 },
+    ];
+    expect(pick(scopes, 5)).toBe('f1');
+    expect(pick(scopes, 25)).toBe('f2');
+  });
+
+  it('picks the innermost scope on simple nesting (class > method)', () => {
+    expect(pick([
+      { id: 'A', start: 10, end: 100 },
+      { id: 'm1', start: 20, end: 30 },
+    ], 25)).toBe('m1');
+  });
+
+  it('picks the outer scope when only it contains the line', () => {
+    expect(pick([
+      { id: 'A', start: 10, end: 100 },
+      { id: 'm1', start: 20, end: 30 },
+    ], 50)).toBe('A');
+  });
+
+  it('picks the innermost scope on deep nesting (3 levels)', () => {
+    expect(pick([
+      { id: 'outer', start: 1, end: 100 },
+      { id: 'middle', start: 10, end: 50 },
+      { id: 'inner', start: 20, end: 30 },
+    ], 25)).toBe('inner');
+    expect(pick([
+      { id: 'outer', start: 1, end: 100 },
+      { id: 'middle', start: 10, end: 50 },
+      { id: 'inner', start: 20, end: 30 },
+    ], 35)).toBe('middle');
+    expect(pick([
+      { id: 'outer', start: 1, end: 100 },
+      { id: 'middle', start: 10, end: 50 },
+      { id: 'inner', start: 20, end: 30 },
+    ], 75)).toBe('outer');
+  });
+
+  it('REGRESSION: an unrelated tiny function does NOT win over the real enclosing scope', () => {
+    // Reviewer's hypothesised bug case: a top-level function with a
+    // smaller span than the enclosing class. Sort puts it first, but
+    // the containment check filters it out; outer class wins
+    // correctly. This test pins the invariant down.
+    expect(pick([
+      { id: 'class_A', start: 1, end: 100 }, // span 99
+      { id: 'method_m', start: 20, end: 30 }, // span 10 (in A)
+      { id: 'unrelated_f', start: 200, end: 205 }, // span 5 (not in A)
+    ], 60)).toBe('class_A');
+  });
+
+  it('handles ties on span length deterministically (first by stable sort)', () => {
+    // When two scopes have identical spans and both contain the line,
+    // either is acceptable as "innermost." In practice nested scopes
+    // can't tie (parent strictly contains child by ≥1 line). But two
+    // sibling adjacent scopes can tie; only one will contain the line.
+    const result = pick([
+      { id: 'a', start: 1, end: 10 },
+      { id: 'b', start: 11, end: 20 },
+    ], 5);
+    expect(result).toBe('a');
+  });
+});
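The module under test, `src/index-hooks/enclosing.ts`, is not shown on this page. A minimal sketch that would satisfy the invariant these tests describe (sort by span ascending, then return the first scope that contains the line) could look like the following. The function names and the `ScopeNode` fields are taken from the test file; the bodies are assumptions.

```typescript
// Field names follow the object literals used in the tests above.
interface ScopeNode {
  id: string;
  start: number;
  end: number;
}

// Smallest span first. Copies the input so callers' arrays are not
// reordered; Array.prototype.sort is stable, so ties keep input order.
function sortScopesBySpan(nodes: ScopeNode[]): ScopeNode[] {
  return [...nodes].sort((a, b) => (a.end - a.start) - (b.end - b.start));
}

// Expects scopes sorted by span ascending. The first scope that contains
// the line is the innermost one, because any outer scope that also
// contains the line has a strictly larger span and sorts later.
function findEnclosingNode(sorted: ScopeNode[], line: number): string | null {
  for (const node of sorted) {
    if (node.start <= line && line <= node.end) return node.id;
  }
  return null;
}
```

The containment check is what makes the reviewer's hypothesised bug a non-issue: a tiny unrelated function sorts first but is skipped because it does not contain the line.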
