Commit e1fed2d

andreinknv and claude committed
feat(graph): config_refs — env var read sites as queryable graph data
Extracts environment-variable read sites (process.env.X, os.getenv, os.environ, os.Getenv, System.getenv, ENV[...], env!, std::env::var) into a new config_refs table and surfaces them via a new codegraph_config MCP tool. Answers "what reads OBSIDIAN_PORT?" and "what config does this codebase read?" without grep noise.

## Why a separate table, not graph nodes/edges

Env vars don't have a single source-of-truth file (they're a global namespace), so giving them a synthetic file_path would pollute the main graph. The table is queried via a dedicated MCP tool and via augmented codegraph_node output.

## Spike validation (mcp-obsidian-extended)

Reproduces the spike's empirical numbers exactly when indexed with this PR's build:

    TOOL_PRESET      8 reads
    OBSIDIAN_PORT    8
    OBSIDIAN_SCHEME  7
    INCLUDE_TOOLS    7
    TOOL_MODE        6
    OBSIDIAN_CONFIG  5
    OBSIDIAN_DEBUG   4

These are central config knobs an agent should know about when editing this codebase. The codegraph repo itself is sparse (4 reads) because it's a CLI with file-based config; this feature shines on service-style codebases.

## What's added

**Parser** (src/config-refs/index.ts)
- Per-language regex catalogue: TS/JS, Python, Go, Java/Kotlin, Ruby, Rust. Upper-case-only keys filter out dynamic accesses (`process.env.foo`) and lower-case identifiers.
- Pre-filter `line.includes('env'|'Env'|'ENV')` skips 99% of lines cheaply before running per-language regexes.
- Pure I/O + regex: caller owns DB writes via applyConfigRefs.

**Schema migration v4** with `CREATE TABLE IF NOT EXISTS config_refs (id, config_kind, config_key, source_node_id, file_path, line)`, with FK to nodes for cascade-delete on node removal. Defensive ensureConfigRefsTable() guard at every persistence path so a botched merge with a peer v4 doesn't silently lose the table.

**Wiring** (CodeGraph.runConfigRefsPass)
- indexAll: full rescan, clearConfigRefs + extract every indexed file.
- sync: incremental. Invalidates rows for every changed file via deleteConfigRefsForPaths, then re-extracts. Also runs when files were removed (sync's changedFilePaths excludes deletions); pruneOrphanedConfigRefs sweeps stale rows.
- Per-pass enclosing-function resolver cache: Map<filePath, sorted span list>, which avoids O(reads × symbols) DB scans on big files.
- enableConfigRefs config flag (default true) for opt-out.

**MCP tool** codegraph_config
- No `key` arg → top-N keys with read counts (the "what config does this codebase read?" question).
- With `key` → all read sites + enclosing function (the "what reads OBSIDIAN_PORT?" question).

## Tests

399 pass total, **19 new**:
- Parser per-language (7): TS, JS, Python, Go, Java/Kotlin, Ruby, Rust
- Parser correctness (5): upper-case-only filter, unsupported lang, line-number 1-indexed, resolveEnclosing closure threading, missing file
- End-to-end CodeGraph (7): basic + enclosing-function attribution, reverse-view (getConfigKeysForNode), enableConfigRefs:false, incremental sync replace, **regression for empty-rows data-corruption bug** (file edited to remove last env read), deletion sweep, **v4-collision defense** (drop table after init)

## Reviewer pass

Independent reviewer ran once. One REQUEST_CHANGES + two INFO addressed:
- **Data-corruption bug:** `applyConfigRefs([])` early-returned without deleting stale rows for the file. A file edited to remove its last env read kept the orphan row forever. Fixed by adding `deleteConfigRefsForPaths` and calling it in runConfigRefsPass BEFORE re-extraction. Regression test asserts the fix.
- Pre-filter heuristic: documented as quick-reject, not exact gate.

## Coexistence with #112 (centrality-and-churn) and #113 (issue-history)

All three PRs target migration v4. Whichever lands second/third renumbers their migration in the array, a standard merge-time rebase. As insurance, ensureConfigRefsTable() at every persistence path creates the table on demand so a botched merge doesn't silently swallow data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
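
The parser description in the commit message (a cheap substring pre-filter followed by upper-case-only regexes) can be pictured with a minimal TypeScript sketch. It covers only the two `process.env` forms exercised by the TS/JS tests, and the names `EnvRead`, `ENV_PATTERNS`, and `extractEnvReads` are hypothetical; the actual catalogue in src/config-refs/index.ts may be structured differently.

```typescript
// Illustrative sketch only: EnvRead, ENV_PATTERNS and extractEnvReads are
// hypothetical names, not the identifiers used in src/config-refs/index.ts.
interface EnvRead {
  configKey: string;
  line: number; // 1-indexed
}

// Upper-case-only keys: process.env.FOO and process.env["FOO"].
const ENV_PATTERNS: RegExp[] = [
  /process\.env\.([A-Z][A-Z0-9_]*)\b/g,
  /process\.env\[\s*["']([A-Z][A-Z0-9_]*)["']\s*\]/g,
];

function extractEnvReads(source: string): EnvRead[] {
  const reads: EnvRead[] = [];
  const lines = source.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i] ?? '';
    // Quick reject, not an exact gate: most lines never mention env at all.
    if (!text.includes('env') && !text.includes('Env') && !text.includes('ENV')) continue;
    for (const pattern of ENV_PATTERNS) {
      // Fresh RegExp per line so lastIndex state never leaks between lines.
      const re = new RegExp(pattern.source, 'g');
      let match: RegExpExecArray | null;
      while ((match = re.exec(text)) !== null) {
        reads.push({ configKey: match[1] ?? '', line: i + 1 });
      }
    }
  }
  return reads;
}
```

Under these assumptions, a lower-case access such as `process.env.somethingDynamic` produces no read at all, while `process.env.GOOD_KEY` on the second line of a file is reported with `line: 2`.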
1 parent 2dc4bc3 commit e1fed2d

10 files changed

Lines changed: 912 additions & 3 deletions

__tests__/config-refs.test.ts

Lines changed: 299 additions & 0 deletions
@@ -0,0 +1,299 @@
+/**
+ * Config-refs tests: parser unit tests + end-to-end through CodeGraph.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import { extractConfigRefs } from '../src/config-refs';
+import CodeGraph from '../src/index';
+
+let testDir: string;
+let cg: CodeGraph | null = null;
+
+function write(rel: string, content: string) {
+  const abs = path.join(testDir, rel);
+  fs.mkdirSync(path.dirname(abs), { recursive: true });
+  fs.writeFileSync(abs, content);
+}
+
+beforeEach(() => {
+  testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-config-'));
+});
+
+afterEach(() => {
+  if (cg) {
+    cg.destroy();
+    cg = null;
+  }
+  if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+});
+
+// ============================================================================
+// Pure parser tests (no CodeGraph)
+// ============================================================================
+
+describe('extractConfigRefs', () => {
+  it('extracts process.env.X from TS', () => {
+    write('a.ts', `const port = process.env.OBSIDIAN_PORT;\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'a.ts', language: 'typescript' }], () => null);
+    expect(refs.length).toBe(1);
+    expect(refs[0]!.configKey).toBe('OBSIDIAN_PORT');
+    expect(refs[0]!.line).toBe(1);
+  });
+
+  it('extracts process.env["X"] from JS', () => {
+    write('a.js', `module.exports = { port: process.env["MY_KEY"] };\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'a.js', language: 'javascript' }], () => null);
+    expect(refs.map((r) => r.configKey)).toEqual(['MY_KEY']);
+  });
+
+  it('extracts os.getenv / os.environ from Python', () => {
+    write(
+      'a.py',
+      [
+        `import os`,
+        `port = os.getenv("PYTHON_PORT")`,
+        `host = os.environ.get("PYTHON_HOST")`,
+        `path = os.environ["PYTHON_PATH"]`,
+        `name = getenv("PYTHON_NAME")`,
+      ].join('\n')
+    );
+    const refs = extractConfigRefs(testDir, [{ path: 'a.py', language: 'python' }], () => null);
+    expect(new Set(refs.map((r) => r.configKey))).toEqual(
+      new Set(['PYTHON_PORT', 'PYTHON_HOST', 'PYTHON_PATH', 'PYTHON_NAME'])
+    );
+  });
+
+  it('extracts os.Getenv / os.LookupEnv from Go', () => {
+    write(
+      'a.go',
+      [
+        `package main`,
+        `import "os"`,
+        `var Port = os.Getenv("GO_PORT")`,
+        `var Host, _ = os.LookupEnv("GO_HOST")`,
+      ].join('\n')
+    );
+    const refs = extractConfigRefs(testDir, [{ path: 'a.go', language: 'go' }], () => null);
+    expect(new Set(refs.map((r) => r.configKey))).toEqual(new Set(['GO_PORT', 'GO_HOST']));
+  });
+
+  it('extracts ENV[...] / ENV.fetch from Ruby', () => {
+    write('a.rb', `port = ENV["RUBY_PORT"]\nhost = ENV.fetch("RUBY_HOST")\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'a.rb', language: 'ruby' }], () => null);
+    expect(new Set(refs.map((r) => r.configKey))).toEqual(new Set(['RUBY_PORT', 'RUBY_HOST']));
+  });
+
+  it('extracts env!/std::env::var from Rust', () => {
+    write(
+      'a.rs',
+      [
+        `let port = env!("RUST_PORT");`,
+        `let host = std::env::var("RUST_HOST").unwrap();`,
+      ].join('\n')
+    );
+    const refs = extractConfigRefs(testDir, [{ path: 'a.rs', language: 'rust' }], () => null);
+    expect(new Set(refs.map((r) => r.configKey))).toEqual(new Set(['RUST_PORT', 'RUST_HOST']));
+  });
+
+  it('extracts System.getenv from Java/Kotlin', () => {
+    write('A.java', `String port = System.getenv("JAVA_PORT");\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'A.java', language: 'java' }], () => null);
+    expect(refs.map((r) => r.configKey)).toEqual(['JAVA_PORT']);
+  });
+
+  it('only matches UPPER_CASE keys (skips lower-case identifiers)', () => {
+    write('a.ts', `const x = process.env.somethingDynamic;\nconst y = process.env.GOOD_KEY;\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'a.ts', language: 'typescript' }], () => null);
+    expect(refs.map((r) => r.configKey)).toEqual(['GOOD_KEY']);
+  });
+
+  it('skips files in unsupported languages without crashing', () => {
+    write('a.swift', `let port = ProcessInfo.processInfo.environment["SWIFT_PORT"]\n`);
+    const refs = extractConfigRefs(testDir, [{ path: 'a.swift', language: 'swift' }], () => null);
+    // Swift not in PATTERNS for v1.
+    expect(refs).toEqual([]);
+  });
+
+  it('captures the correct 1-indexed line number', () => {
+    write(
+      'a.ts',
+      [
+        `// line 1`,
+        `// line 2`,
+        `const x = process.env.LINE_THREE_KEY;`,
+        `// line 4`,
+        `const y = process.env.LINE_FIVE_KEY;`,
+      ].join('\n')
+    );
+    const refs = extractConfigRefs(testDir, [{ path: 'a.ts', language: 'typescript' }], () => null);
+    expect(refs).toEqual([
+      expect.objectContaining({ configKey: 'LINE_THREE_KEY', line: 3 }),
+      expect.objectContaining({ configKey: 'LINE_FIVE_KEY', line: 5 }),
+    ]);
+  });
+
+  it('threads the resolveEnclosing closure correctly', () => {
+    write('a.ts', `const x = process.env.FOO;\n`);
+    const calls: Array<[string, number]> = [];
+    extractConfigRefs(
+      testDir,
+      [{ path: 'a.ts', language: 'typescript' }],
+      (filePath, line) => {
+        calls.push([filePath, line]);
+        return 'fake-node-id';
+      }
+    );
+    expect(calls).toEqual([['a.ts', 1]]);
+  });
+
+  it('survives a missing file (skips, no throw)', () => {
+    const refs = extractConfigRefs(
+      testDir,
+      [{ path: 'does-not-exist.ts', language: 'typescript' }],
+      () => null
+    );
+    expect(refs).toEqual([]);
+  });
+});
+
+// ============================================================================
+// End-to-end through CodeGraph
+// ============================================================================
+
+describe('CodeGraph config refs', () => {
+  it('persists env reads after indexAll and resolves enclosing function', async () => {
+    write(
+      'src/server.ts',
+      [
+        `export function start() {`,
+        ` const port = process.env.OBSIDIAN_PORT ?? 8080;`,
+        ` return port;`,
+        `}`,
+        ``,
+        `export function getApiKey() {`,
+        ` return process.env.OBSIDIAN_API_KEY;`,
+        `}`,
+        ``,
+        `// top-level read`,
+        `export const HOST = process.env.OBSIDIAN_HOST;`,
+      ].join('\n')
+    );
+    cg = CodeGraph.initSync(testDir, {
+      config: { include: ['**/*.ts'], exclude: [] },
+    });
+    await cg.indexAll();
+
+    // All three keys should be visible.
+    const keys = cg.getConfigKeys({ configKind: 'env' });
+    expect(keys.map((k) => k.configKey).sort()).toEqual([
+      'OBSIDIAN_API_KEY',
+      'OBSIDIAN_HOST',
+      'OBSIDIAN_PORT',
+    ]);
+
+    // The OBSIDIAN_PORT read should be attributed to `start`.
+    const portSites = cg.getConfigRefsByKey('OBSIDIAN_PORT');
+    expect(portSites.length).toBe(1);
+    expect(portSites[0]!.sourceName).toBe('start');
+
+    // The HOST read is at the top level, so sourceName should be null.
+    const hostSites = cg.getConfigRefsByKey('OBSIDIAN_HOST');
+    expect(hostSites[0]!.sourceName).toBeNull();
+  });
+
+  it('reverse view: getConfigKeysForNode returns keys read by a function', async () => {
+    write(
+      'src/a.ts',
+      [
+        `export function loadConfig() {`,
+        ` const a = process.env.KEY_A;`,
+        ` const b = process.env.KEY_B;`,
+        ` return { a, b };`,
+        `}`,
+      ].join('\n')
+    );
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+
+    const node = cg.getNodesInFile('src/a.ts').find((n) => n.name === 'loadConfig')!;
+    const keys = cg.getConfigKeysForNode(node.id).map((r) => r.configKey).sort();
+    expect(keys).toEqual(['KEY_A', 'KEY_B']);
+  });
+
+  it('respects enableConfigRefs=false', async () => {
+    write('src/a.ts', `export const PORT = process.env.PORT;\n`);
+    cg = CodeGraph.initSync(testDir, {
+      config: { include: ['**/*.ts'], exclude: [], enableConfigRefs: false },
+    });
+    await cg.indexAll();
+    expect(cg.getConfigKeys()).toEqual([]);
+  });
+
+  it('incremental sync replaces refs for changed files only', async () => {
+    write('src/a.ts', `export const A = process.env.OLD_KEY;\n`);
+    write('src/b.ts', `export const B = process.env.UNCHANGED_KEY;\n`);
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+    expect(cg.getConfigKeys().map((k) => k.configKey).sort()).toEqual([
+      'OLD_KEY',
+      'UNCHANGED_KEY',
+    ]);
+
+    // Edit only a.ts; UNCHANGED_KEY should still be there.
+    write('src/a.ts', `export const A = process.env.NEW_KEY;\n`);
+    await cg.sync();
+
+    const keys = cg.getConfigKeys().map((k) => k.configKey).sort();
+    expect(keys).toContain('NEW_KEY');
+    expect(keys).toContain('UNCHANGED_KEY');
+    expect(keys).not.toContain('OLD_KEY');
+  });
+
+  it('drops refs when a file is edited to remove its last env read', async () => {
+    // Regression for the empty-rows early-return data-corruption bug:
+    // applyConfigRefs([]) used to short-circuit without deleting the
+    // stale rows for the file. The sync path now explicitly invalidates
+    // rows for every changed file *before* extracting, regardless of
+    // whether the new content has any reads.
+    write('src/a.ts', `export const PORT = process.env.REMOVED_KEY;\n`);
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+    expect(cg.getConfigKeys().some((k) => k.configKey === 'REMOVED_KEY')).toBe(true);
+
+    // Edit a.ts to remove the env read entirely (no remaining reads).
+    write('src/a.ts', `export const PORT = 8080; // no env read here\n`);
+    await cg.sync();
+
+    expect(cg.getConfigKeys().some((k) => k.configKey === 'REMOVED_KEY')).toBe(false);
+  });
+
+  it('drops refs for files removed between syncs', async () => {
+    write('src/a.ts', `export const A = process.env.GOING_AWAY;\n`);
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+    expect(cg.getConfigKeys().some((k) => k.configKey === 'GOING_AWAY')).toBe(true);
+
+    fs.unlinkSync(path.join(testDir, 'src/a.ts'));
+    await cg.sync();
+
+    expect(cg.getConfigKeys().some((k) => k.configKey === 'GOING_AWAY')).toBe(false);
+  });
+
+  it('creates config_refs table on demand if migration v4 was claimed by a peer', async () => {
+    // Simulates a peer-PR migration v4 collision: another branch's
+    // v4 ran first, ours was a no-op, config_refs never got created.
+    // The defensive ensureConfigRefsTable() guard at every persistence
+    // path must still let us record + query.
+    write('src/a.ts', `export const PORT = process.env.PORT;\n`);
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    (cg as any).db.getDb().exec('DROP TABLE IF EXISTS config_refs');
+
+    await cg.indexAll();
+    const keys = cg.getConfigKeys().map((k) => k.configKey);
+    expect(keys).toContain('PORT');
+  });
+});
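
The regression test for the empty-rows bug relies on the sync path deleting a changed file's rows before re-extraction. A minimal in-memory sketch of that ordering follows. `InMemoryConfigRefs` and `syncConfigRefs` are hypothetical stand-ins for the SQLite-backed `deleteConfigRefsForPaths`, `applyConfigRefs`, and `runConfigRefsPass`, whose real signatures are not shown in this diff.

```typescript
// Illustrative sketch only: InMemoryConfigRefs and syncConfigRefs are
// hypothetical stand-ins for the SQLite-backed code in this commit.
interface ConfigRefRow {
  filePath: string;
  configKey: string;
  line: number;
}

class InMemoryConfigRefs {
  private rows: ConfigRefRow[] = [];

  // Plays the role of deleteConfigRefsForPaths: drop all rows for these files.
  deleteForPaths(paths: string[]): void {
    const doomed = new Set(paths);
    this.rows = this.rows.filter((row) => !doomed.has(row.filePath));
  }

  // Plays the role of applyConfigRefs: an empty batch is a no-op, which is
  // only safe because the caller has already invalidated the changed files.
  apply(batch: ConfigRefRow[]): void {
    if (batch.length === 0) return;
    this.rows.push(...batch);
  }

  keys(): string[] {
    return Array.from(new Set(this.rows.map((row) => row.configKey))).sort();
  }
}

function syncConfigRefs(
  store: InMemoryConfigRefs,
  changedPaths: string[],
  extract: (filePath: string) => ConfigRefRow[],
): void {
  // Invalidate first, even if re-extraction ends up adding nothing back.
  store.deleteForPaths(changedPaths);
  for (const filePath of changedPaths) {
    store.apply(extract(filePath));
  }
}
```

With this ordering, a file whose last env read was removed yields an empty batch, and its stale rows are still gone because the delete ran unconditionally.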

__tests__/foundation.test.ts

Lines changed: 1 addition & 1 deletion
@@ -305,7 +305,7 @@ describe('Database Connection', () => {

     const version = db.getSchemaVersion();
     expect(version).not.toBeNull();
-    expect(version?.version).toBe(3);
+    expect(version?.version).toBe(4);

     db.close();
   });

__tests__/pr19-improvements.test.ts

Lines changed: 1 addition & 1 deletion
@@ -299,7 +299,7 @@ describe('Best-Candidate Resolution', () => {
 describe('Schema v2 Migration', () => {
   it.skipIf(!HAS_SQLITE)('should have correct current schema version', async () => {
     const { CURRENT_SCHEMA_VERSION } = await import('../src/db/migrations');
-    expect(CURRENT_SCHEMA_VERSION).toBe(3);
+    expect(CURRENT_SCHEMA_VERSION).toBe(4);
   });

   it.skipIf(!HAS_SQLITE)('should have migration for version 2', async () => {
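
The schema-version bump from 3 to 4 is where the peer-PR collision described in the commit message can bite: if another branch's v4 has already been recorded, a version-gated runner skips this commit's v4 and the table is never created. The sketch below models that with an in-memory stand-in. `FakeDb`, `Migration`, `runMigrations`, and `ensureTable` are hypothetical names; the real code issues SQL against SQLite and its runner may differ.

```typescript
// Illustrative sketch only: FakeDb, Migration, runMigrations and ensureTable
// are hypothetical; the real code runs SQL against SQLite.
interface FakeDb {
  version: number;
  tables: Set<string>;
}

interface Migration {
  version: number;
  up: (db: FakeDb) => void;
}

// Runs only migrations newer than the recorded schema version.
function runMigrations(db: FakeDb, migrations: Migration[]): void {
  for (const migration of migrations) {
    if (migration.version <= db.version) continue; // version already claimed
    migration.up(db);
    db.version = migration.version;
  }
}

// Idempotent guard in the spirit of CREATE TABLE IF NOT EXISTS.
function ensureTable(db: FakeDb, name: string): void {
  db.tables.add(name);
}

// A database where a peer's v4 ran first and created a different table.
const collided: FakeDb = { version: 4, tables: new Set(['nodes', 'peer_table']) };
runMigrations(collided, [{ version: 4, up: (db) => ensureTable(db, 'config_refs') }]);
const missingAfterMigrate = !collided.tables.has('config_refs'); // our v4 was skipped

// The on-demand guard at a persistence path repairs the gap.
ensureTable(collided, 'config_refs');
const presentAfterGuard = collided.tables.has('config_refs');
```

The guard costs one idempotent call per persistence path, which is the trade the commit message describes as insurance against a botched merge.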
