Commit 42e3b33

carlos-alm and claude authored
refactor(parity): render orchestrator-drop summary as a per-extension table (#1240)
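The table layout named in the title (a header carrying the count and the extension total, then one indented row per extension with a right-aligned count column) could look roughly like the sketch below. The function name `formatDropSummary`, the header wording, and the sort order are assumptions made for illustration; the actual warning text is not shown on this page.

```typescript
// Hypothetical helper: renders dropped-file counts per extension as aligned rows.
function formatDropSummary(dropped: Map<string, number>): string {
  // Largest counts first; ties broken alphabetically so output is stable.
  const rows = [...dropped.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
  const total = rows.reduce((sum, [, count]) => sum + count, 0);
  // Column widths are derived from the widest cell in each column.
  const extWidth = Math.max(0, ...rows.map(([ext]) => ext.length));
  const countWidth = Math.max(0, ...rows.map(([, count]) => String(count).length));

  // Assumed header wording: file count plus extension total.
  const header = `Native orchestrator dropped ${total} file(s) across ${rows.length} extension(s):`;
  const lines = rows.map(
    ([ext, count]) => `  ${ext.padEnd(extWidth)}  ${String(count).padStart(countWidth)}`,
  );
  return [header, ...lines].join("\n");
}

console.log(
  formatDropSummary(new Map<string, number>([[".ex", 12], [".gleam", 3], [".jl", 104]])),
);
```

Padding the extension column on the right and the count column on the left keeps the numbers aligned on their last digit, which is what makes a long list scannable.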
* refactor(parity): render orchestrator-drop summary as a per-extension table

  The native-orchestrator drop warning lived in a single wall-of-text WARN line that grew unreadable when 30+ extensions were dropped at once (easy to trigger via journal-vs-fresh-build collisions). Make the per-extension breakdown scan like a table: header line keeps the count and now also reports the extension total; each extension occupies its own indented row with a right-aligned count column.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(extractors): extend shared helpers for identifier and symbol collection

  Adds shared utilities to src/extractors/helpers.ts in preparation for adoption across language extractors (phase 2):

  - nodeStartLine: companion to nodeEndLine for the ~108 hand-rolled startPosition.row + 1 literals scattered across extractors
  - findFirstChildOfTypes: find first child matching any of N types (useful for grammar variants like string vs string_literal)
  - iterChildren / PUNCTUATION_TOKENS: generator-based child iteration with punctuation skipping, used in elixir/gleam destructuring walks
  - pushCall / pushImport: centralise Call/Import construction so line derivation stays consistent across extractors
  - extractSimpleParameters / resolveParamName: uniform parameter extraction with optional type-map sink; collapses boilerplate in the ~16 per-language extractParams helpers

  Phase 1 of the TS extractor refactor plan (sync.json cluster 1). Additive only: no consumer adoption yet; existing helpers and extractor behaviour unchanged. Consumers updated in phase 2.

  docs check acknowledged: internal refactor, no doc updates needed.

* refactor(extractors): adopt shared helpers across language extractors

  Phase 2 of the TS extractor refactor plan (sync.json cluster 1).

  Adopts the helpers extended in 9c8be55 (nodeStartLine, findFirstChildOfTypes, pushCall, pushImport, extractSimpleParameters, stripQuotes) across six language extractors:

  - r.ts: drop local stripQuotes; use shared stripQuotes/pushCall/pushImport/findFirstChildOfTypes/nodeStartLine
  - gleam.ts: use pushCall/pushImport/findFirstChildOfTypes/nodeStartLine; extract pushConstructor helper for the dual-branch data-constructor walk
  - julia.ts: use pushCall/pushImport/nodeStartLine; collapse Julia param wrapper-type branches via JULIA_PARAM_WRAPPER_TYPES set
  - java.ts: use pushCall/pushImport/nodeStartLine; collapse extractJavaParameters via extractSimpleParameters with typeMap sink; extract resolveJavaTypeText for the generic_type unwrap pattern
  - gleam.ts and solidity.ts: extract qualifyWithParent helper in solidity to collapse 6 duplicated ``parent ? `${parent}.${name}` : name`` blocks
  - solidity.ts: use pushCall/pushImport/findFirstChildOfTypes/nodeStartLine; collapse extractSolParams via extractSimpleParameters
  - javascript.ts: bulk-replace 43 inline `XXX.startPosition.row + 1` literals with nodeStartLine() calls; replace one stray endPosition literal with nodeEndLine

  Net -65 lines. No behaviour changes, only call-site collapsing onto the shared helpers (semantics verified by careful inspection of each replacement; pushImport's empty-names fallback matches the previous ad-hoc defaults in each extractor).

  docs check acknowledged: internal refactor, no doc updates needed.

* refactor(extractors): break elixir param/map binding cycle

  Convert collectElixirParamIdentifiers from mutual-recursion with collectElixirMapBindings into a single iterative worklist traversal. Map/list/tuple/binary-operator dispatch is now done via three leaf helpers that push child nodes onto the worklist instead of calling back into the main function. This removes the function-level cycle flagged by codegraph (9 -> 8 cycles) without changing extractor semantics.

  docs check acknowledged: internal refactor only.

* refactor(extractors-rs): extend shared helpers for identifier and symbol collection

* refactor(extractors-rs): adopt shared helpers across language extractors

  Phase 5 of the Rust extractor refactor plan (sync.json cluster 2). Adopts the helpers extended in 0d687c4 (push_call, push_simple_call, push_import, push_type_map_entry, extract_simple_parameters, match_c_family_type_map) across eight language extractors:

  - cpp.rs: collapse match_cpp_type_map to a one-line delegate of match_c_family_type_map; use push_import/push_simple_call/push_call for include and call sites
  - cuda.rs: same delegation as cpp.rs; use push_import/push_simple_call/push_call across include and call_expression handlers
  - java.rs: use push_type_map_entry for local-variable / formal-parameter bindings; use push_call/push_simple_call for method invocation and object creation; collapse extract_java_parameters to a one-shot extract_simple_parameters call; use push_import for import declaration
  - javascript.rs: use push_simple_call for new_expression identifier branch; use push_type_map_entry for the confidence-0.9 type entries
  - julia.rs: use push_simple_call/push_call across identifier and field_expression / scoped_identifier call branches
  - objc.rs: use push_import for at_import; use push_call for c-call and message-expression handlers (drops redundant is_empty guards)
  - r_lang.rs: use push_simple_call/push_call across identifier and namespace_operator call branches; use push_import for library/source
  - solidity.rs: use push_call (drops redundant guard) for call sites; collapse extract_sol_params to a one-shot extract_simple_parameters

  Net: -207 lines across 8 files, no behavior change. cargo check clean, 324 rust unit tests pass.

  Pre-existing test failure: tests/engines/parity.test.ts has two failing elixir cases unrelated to this commit (filed as #1227, a regression from commit 5abe6ad in Phase 3).

* refactor(extractors-rs): break elixir param/map binding cycle

  Convert collect_elixir_param_identifiers from mutual-recursion with collect_elixir_map_bindings into a single iterative worklist traversal. Map/list/tuple/binary-operator dispatch is now done via three leaf helpers (push_elixir_sequence_items, push_elixir_map_values, push_elixir_binary_operator_operands) that push child nodes onto the worklist instead of calling back into the main function. This removes the function-level cycle flagged by codegraph (8 -> 7 cycles) and mirrors the TS refactor in 5abe6ad without changing extractor semantics.

  docs check acknowledged: internal refactor only.

* refactor(ast-analysis): break visitor-utils destructuring cycle

* refactor(ast-analysis): decompose engine and visitors

* refactor(builder): break pipeline cycle by extracting orchestrator-selection strategy

  Extract the native-orchestrator path out of pipeline.ts into two new stage modules:

  - stages/native-orchestrator.ts: tryNativeOrchestrator + post-native structure/analysis fallback + dropped-language detection/backfill.
  - stages/native-db-lifecycle.ts: shared rusqlite connection helpers (closeNativeDb, reopenNativeDb, suspendNativeDb, refreshJsDb).

  This breaks the function-level cycle 'buildGraph <-> tryNativeOrchestrator' caused by codegraph's name-based resolver conflating the local buildGraph function with the ctx.nativeDb.buildGraph() method call. Once the orchestrator lives in its own file, there is no longer a local buildGraph in scope to collide with the method invocation.

  Function-level cycles: 9 -> 5. No file-level cycle introduced (still 1, unchanged; pre-existing MCP cycle).

  pipeline.ts shrinks from 1404 to 465 lines and now reads as a thin top-level controller: detect changes, try native, fall back to JS stages. computeWasmOnlyStaleFiles is re-exported from pipeline.ts so existing unit tests (tests/builder/wasm-only-stale-files.test.ts) keep working without changes.

* refactor(builder): decompose builder stages and adopt shared helpers

* refactor(graph): extract helpers in cycles and journal

  docs check acknowledged: no doc-relevant changes (internal helper extraction).

* refactor(core-rs): collapse walker mutual recursion into single-entry traversal

* refactor(core-rs): decompose pipeline, read queries, and edge builders

  docs check acknowledged - Rust internal helper extraction, no user-facing changes

* refactor(parser): extract LANGUAGE_REGISTRY iteration and worker boundary helpers

* refactor(analysis): decompose module-map and reduce complexity in fn-impact and dependencies

  Split high-cognitive-complexity functions in the analysis domain into focused helpers. Worst functions per gauntlet (cog/cyc/maxNesting/halstead) are now below thresholds.

  module-map.ts (statsData cog=31 -> below threshold):
  - Extract buildStatsFromNative and buildStatsFromJs branches
  - Share false-positive query and quality-score helpers between paths
  - aggregateRolesFromNative pulls duplicated role-aggregation code out

  fn-impact.ts (bfsTransitiveCallers cog=37 -> below threshold, impactAnalysisData cog=27 -> below threshold):
  - Extract recordCaller, processFrontierNode, seedInterfaceImplementors
  - Extract bfsImportDependents and groupDependentsByLevel

  dependencies.ts (bfsShortestPath cog=29, bfsFilePath cog=30, buildTransitiveCallers cog=24 -> all below threshold):
  - Extract buildNextCallerFrontier from buildTransitiveCallers
  - Extract buildNeighborStmt + visitNeighbor; state collected in struct
  - Extract visitFileNeighbor + reconstructFilePath

  docs check acknowledged - internal helper extraction, no user-facing changes

* refactor(search): decompose generator and reduce complexity in semantic and hybrid search

* refactor(features): decompose complexity, structure, graph-enrichment, structure-query, and owners

  Internal refactor with no public API or behaviour change, so docs check acknowledged.

  - complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from computeHalsteadMetrics.
  - structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared buildActiveFilesSet + buildClassifierInput helpers.
  - graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, selectFileSeedNodes.
  - structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern (HOTSPOT_ORDER_BY) and mapNative/JsHotspotRow helpers.
  - owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles, buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.

* refactor(features): reduce complexity in cfg and cochange

* refactor(graph): decompose Leiden optimiser and roles classifier

  Internal refactor; public APIs unchanged. docs check acknowledged.

* refactor(presentation): extract shared rendering helpers in cfg and flow

* refactor(scripts): separate config from execution in benchmarking scripts

* refactor(features): reduce warning-level complexity in feature warnings batch

* refactor(extractors): adopt iterChildren + PUNCTUATION_TOKENS in elixir pushElixirSequenceItems

  Replaces the inline childCount loop with the shared iterChildren generator configured with PUNCTUATION_TOKENS, completing phase 1 of the TS extractor refactor plan (sync.json cluster 1). Behaviour preserved: the same nodes are pushed onto the worklist, just via the shared helper.

  docs check acknowledged: internal refactor, no doc updates needed.

* refactor(extractors-rs): adopt shared child-iteration helpers (grind)

  Wire forge phase 4 helpers into their consumers:

  - find_first_child_of_types: collapse find_child(x, A).or_else(|| find_child(x, B)) in fsharp.rs handle_application
  - iter_children + PUNCTUATION_TOKENS: replace inline punctuation-skip loop in javascript.rs first_arg_is_string_literal

  Closes 3 dead-ffi helpers extracted by forge phase 4. Semantically identical.

* fix(tests): move column-width comment to the .tsx entry that actually drives it (#1240)

* fix(elixir): restore LIFO-compensating reverse-push in sequence and map helpers

  pushElixirSequenceItems and pushElixirMapValues were pushing items in forward order onto the LIFO worklist stack, causing tuple/list/map parameters to be emitted in reverse source order (e.g. {x, _y} → ['_y', 'x'] instead of ['x', '_y']). The fix collects items then pushes them in reverse so the LIFO pop restores source order, matching the native engine.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
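
The ordering bug described in the fix(elixir) entry can be reproduced in a few lines. The `PatternNode` type and `collectIdentifiers` function below are hypothetical stand-ins for the extractor's tree-sitter nodes and helpers, not code from the repository; the sketch only shows why children have to be pushed in reverse onto a LIFO worklist for identifiers to come out in source order.

```typescript
// Hypothetical pattern node: a leaf identifier or a container (tuple/list/map).
type PatternNode =
  | { kind: "identifier"; name: string }
  | { kind: "sequence"; items: PatternNode[] };

// Iterative worklist traversal. `reversePush` toggles the compensation step.
function collectIdentifiers(root: PatternNode, reversePush: boolean): string[] {
  const names: string[] = [];
  const worklist: PatternNode[] = [root];
  while (worklist.length > 0) {
    const node = worklist.pop()!; // LIFO: the most recently pushed node comes out first
    if (node.kind === "identifier") {
      names.push(node.name);
      continue;
    }
    // Pushing in reverse means the first child ends up on top of the stack.
    const items = reversePush ? [...node.items].reverse() : node.items;
    for (const item of items) worklist.push(item);
  }
  return names;
}

// Models the `{x, _y}` tuple from the commit message.
const tuple: PatternNode = {
  kind: "sequence",
  items: [
    { kind: "identifier", name: "x" },
    { kind: "identifier", name: "_y" },
  ],
};

console.log(collectIdentifiers(tuple, false)); // forward push: [ '_y', 'x' ]
console.log(collectIdentifiers(tuple, true)); // reverse push: [ 'x', '_y' ]
```

A FIFO queue would also preserve order at a single level, but it would visit nested patterns breadth-first; reverse-pushing onto a stack keeps the depth-first, left-to-right order of the recursive version it replaces.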
1 parent 8c4bfc2 commit 42e3b33

14 files changed

Lines changed: 2120 additions & 1736 deletions

src/domain/graph/builder/helpers.ts

Lines changed: 85 additions & 76 deletions
@@ -76,108 +76,117 @@ export function passesIncludeExclude(
   return true;
 }

+/** Per-walk state computed once at the top-level invocation. */
+interface CollectContext {
+  readonly rootDir: string;
+  readonly includeRegexes: readonly RegExp[];
+  readonly excludeRegexes: readonly RegExp[];
+  readonly hasGlobFilters: boolean;
+  readonly extraIgnore: Set<string> | null;
+  readonly visited: Set<string>;
+}
+
+/** Detect a symlink loop for `dir`. Returns true if `dir` was already visited. */
+function isSymlinkLoop(dir: string, visited: Set<string>): boolean {
+  let realDir: string;
+  try {
+    realDir = fs.realpathSync(dir);
+  } catch {
+    return true;
+  }
+  if (visited.has(realDir)) {
+    warn(`Symlink loop detected, skipping: ${dir}`);
+    return true;
+  }
+  visited.add(realDir);
+  return false;
+}
+
+/** Read directory entries, returning null on error (already logged). */
+function readDirSafe(dir: string): fs.Dirent[] | null {
+  try {
+    return fs.readdirSync(dir, { withFileTypes: true });
+  } catch (err: unknown) {
+    warn(`Cannot read directory ${dir}: ${(err as Error).message}`);
+    return null;
+  }
+}
+
+/** True if `entry` is a source file we should collect under `ctx`. */
+function isCollectableSourceFile(full: string, entry: fs.Dirent, ctx: CollectContext): boolean {
+  if (!EXTENSIONS.has(path.extname(entry.name))) return false;
+  if (!ctx.hasGlobFilters) return true;
+  const rel = normalizePath(path.relative(ctx.rootDir, full));
+  return passesIncludeExclude(rel, ctx.includeRegexes, ctx.excludeRegexes);
+}
+
+function walkCollect(
+  dir: string,
+  files: string[],
+  directories: Set<string> | null,
+  ctx: CollectContext,
+): void {
+  if (isSymlinkLoop(dir, ctx.visited)) return;
+
+  const entries = readDirSafe(dir);
+  if (!entries) return;
+
+  let hasFiles = false;
+  for (const entry of entries) {
+    if (shouldSkipEntry(entry, ctx.extraIgnore)) continue;
+
+    const full = path.join(dir, entry.name);
+    if (entry.isDirectory()) {
+      walkCollect(full, files, directories, ctx);
+    } else if (isCollectableSourceFile(full, entry, ctx)) {
+      files.push(full);
+      hasFiles = true;
+    }
+  }
+  if (directories && hasFiles) {
+    directories.add(dir);
+  }
+}
+
 /**
  * Recursively collect all source files under `dir`.
  * When `directories` is a Set, also tracks which directories contain files.
  *
- * The first invocation establishes `dir` as the project root against which
- * `config.include` / `config.exclude` globs are matched.
+ * `dir` establishes the project root against which `config.include` /
+ * `config.exclude` globs are matched.
  */
 export function collectFiles(
   dir: string,
   files: string[],
   config: Partial<CodegraphConfig>,
   directories: Set<string>,
-  _visited?: Set<string>,
-  _rootDir?: string,
-  _includeRegexes?: readonly RegExp[],
-  _excludeRegexes?: readonly RegExp[],
 ): { files: string[]; directories: Set<string> };
 export function collectFiles(
   dir: string,
   files?: string[],
   config?: Partial<CodegraphConfig>,
   directories?: null,
-  _visited?: Set<string>,
-  _rootDir?: string,
-  _includeRegexes?: readonly RegExp[],
-  _excludeRegexes?: readonly RegExp[],
 ): string[];
 export function collectFiles(
   dir: string,
   files: string[] = [],
   config: Partial<CodegraphConfig> = {},
   directories: Set<string> | null = null,
-  _visited: Set<string> = new Set(),
-  _rootDir?: string,
-  _includeRegexes?: readonly RegExp[],
-  _excludeRegexes?: readonly RegExp[],
 ): string[] | { files: string[]; directories: Set<string> } {
   const trackDirs = directories instanceof Set;
-  let hasFiles = false;
-
-  // First call: compute root and compile include/exclude patterns once,
-  // then pass them down recursive calls so we don't recompile per directory.
-  const rootDir = _rootDir ?? dir;
-  const includeRegexes = _includeRegexes ?? compileGlobs(config.include);
-  const excludeRegexes = _excludeRegexes ?? compileGlobs(config.exclude);
-  const hasGlobFilters = includeRegexes.length > 0 || excludeRegexes.length > 0;
-
-  // Merge config ignoreDirs with defaults
-  const extraIgnore = config.ignoreDirs ? new Set(config.ignoreDirs) : null;
-
-  // Detect symlink loops (before I/O to avoid wasted readdirSync)
-  let realDir: string;
-  try {
-    realDir = fs.realpathSync(dir);
-  } catch {
-    return trackDirs ? { files, directories: directories as Set<string> } : files;
-  }
-  if (_visited.has(realDir)) {
-    warn(`Symlink loop detected, skipping: ${dir}`);
-    return trackDirs ? { files, directories: directories as Set<string> } : files;
-  }
-  _visited.add(realDir);
-
-  let entries: fs.Dirent[];
-  try {
-    entries = fs.readdirSync(dir, { withFileTypes: true });
-  } catch (err: unknown) {
-    warn(`Cannot read directory ${dir}: ${(err as Error).message}`);
-    return trackDirs ? { files, directories: directories as Set<string> } : files;
-  }
+  const includeRegexes = compileGlobs(config.include);
+  const excludeRegexes = compileGlobs(config.exclude);
+  const ctx: CollectContext = {
+    rootDir: dir,
+    includeRegexes,
+    excludeRegexes,
+    hasGlobFilters: includeRegexes.length > 0 || excludeRegexes.length > 0,
+    extraIgnore: config.ignoreDirs ? new Set(config.ignoreDirs) : null,
+    visited: new Set(),
+  };

-  for (const entry of entries) {
-    if (shouldSkipEntry(entry, extraIgnore)) continue;
+  walkCollect(dir, files, trackDirs ? (directories as Set<string>) : null, ctx);

-    const full = path.join(dir, entry.name);
-    if (entry.isDirectory()) {
-      if (trackDirs) {
-        collectFiles(
-          full,
-          files,
-          config,
-          directories as Set<string>,
-          _visited,
-          rootDir,
-          includeRegexes,
-          excludeRegexes,
-        );
-      } else {
-        collectFiles(full, files, config, null, _visited, rootDir, includeRegexes, excludeRegexes);
-      }
-    } else if (EXTENSIONS.has(path.extname(entry.name))) {
-      if (hasGlobFilters) {
-        const rel = normalizePath(path.relative(rootDir, full));
-        if (!passesIncludeExclude(rel, includeRegexes, excludeRegexes)) continue;
-      }
-      files.push(full);
-      hasFiles = true;
-    }
-  }
-  if (trackDirs && hasFiles) {
-    (directories as Set<string>).add(dir);
-  }
   return trackDirs ? { files, directories: directories as Set<string> } : files;
 }
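
The change to `collectFiles` replaces underscore-prefixed recursion parameters with a context object that the public entry point builds once. A minimal, self-contained sketch of that pattern follows. `WalkContext`, `walk`, and `collectSourceFiles` are hypothetical names chosen for illustration, and the glob filtering, ignore lists, warnings, and directory tracking of the real function are left out.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical, simplified per-walk state (the real CollectContext also carries glob filters).
interface WalkContext {
  readonly extensions: ReadonlySet<string>;
  readonly visited: Set<string>; // real paths already entered, guards against loops
}

// Private recursive walker: only `dir` changes between calls; the rest rides in `ctx`.
function walk(dir: string, files: string[], ctx: WalkContext): void {
  let realDir: string;
  try {
    realDir = fs.realpathSync(dir);
  } catch {
    return; // unresolvable path: skip it
  }
  if (ctx.visited.has(realDir)) return; // already walked under another name
  ctx.visited.add(realDir);

  let entries: fs.Dirent[];
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return; // unreadable directory: skip it
  }
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      walk(full, files, ctx);
    } else if (ctx.extensions.has(path.extname(entry.name))) {
      files.push(full);
    }
  }
}

// Public entry point: builds the context once, so no "internal" parameters
// leak into the exported signature.
function collectSourceFiles(rootDir: string, extensions: string[] = [".ts"]): string[] {
  const files: string[] = [];
  walk(rootDir, files, { extensions: new Set(extensions), visited: new Set() });
  return files;
}

// Usage: build a throwaway tree and collect its .ts files.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "collect-sketch-"));
fs.mkdirSync(path.join(root, "sub"));
fs.writeFileSync(path.join(root, "a.ts"), "");
fs.writeFileSync(path.join(root, "sub", "b.ts"), "");
fs.writeFileSync(path.join(root, "sub", "notes.txt"), "");
console.log(collectSourceFiles(root).map((f) => path.relative(root, f)).sort());
```

Because the context is created inside the entry point, callers cannot pass a stale `visited` set or mismatched regex lists by accident, which the old optional `_visited` / `_rootDir` parameters permitted.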
