Merged
Changes from 4 commits
41 changes: 41 additions & 0 deletions .claude/hooks/check-dead-exports.sh
@@ -62,19 +62,60 @@ if [ -z "$FILES_TO_CHECK" ]; then
fi

# Single Node.js invocation: check all files in one process
# Excludes exports that are re-exported from index.js (public API) or consumed
# via dynamic import() — codegraph's static graph doesn't track those edges.
DEAD_EXPORTS=$(node -e "
const fs = require('fs');
const path = require('path');
const root = process.argv[1];
const files = process.argv[2].split('\n').filter(Boolean);

const { exportsData } = require(path.join(root, 'src/queries.js'));

// Build set of names exported from index.js (public API surface)
const indexSrc = fs.readFileSync(path.join(root, 'src/index.js'), 'utf8');
const publicAPI = new Set();
// Match: export { foo, bar as baz } from '...'
for (const m of indexSrc.matchAll(/export\s*\{([^}]+)\}/g)) {
for (const part of m[1].split(',')) {
const name = part.trim().split(/\s+as\s+/).pop().trim();
if (name) publicAPI.add(name);
}
}
// Match: export default ...
if (/export\s+default\b/.test(indexSrc)) publicAPI.add('default');

// Scan all src/ files for dynamic import() consumers
const srcDir = path.join(root, 'src');
function scanDynamic(dir) {
for (const ent of fs.readdirSync(dir, { withFileTypes: true })) {
if (ent.isDirectory()) { scanDynamic(path.join(dir, ent.name)); continue; }
if (!ent.name.endsWith('.js')) continue;
try {
const src = fs.readFileSync(path.join(dir, ent.name), 'utf8');
for (const m of src.matchAll(/import\(['\"]([^'\"]+)['\"]\)/g)) {
// Extract imported names from destructuring: const { X } = await import(...)
const line = src.substring(Math.max(0, src.lastIndexOf('\n', m.index) + 1), src.indexOf('\n', m.index + m[0].length));
const destructure = line.match(/\{\s*([^}]+)\s*\}/);
if (destructure) {
for (const part of destructure[1].split(',')) {
const name = part.trim().split(/\s+as\s+/).pop().trim();
if (name && /^\w+$/.test(name)) publicAPI.add(name);
}
}

Review comment (Contributor):
Single-line-only destructuring extraction misses multi-line patterns

The line variable captures only the source line that contains import(...). When a dynamic import is consumed with a multi-line destructuring pattern, the opening { and the imported names are on different lines from the import() call:

// This pattern is NOT detected:
const {
  runAnalyses,
  buildExtensionSet,
} = await import('./ast-analysis/engine.js');

In that case the captured line is } = await import('./ast-analysis/engine.js'); which contains only the closing brace and no complete {…} pair, so the regex at line 99 produces no match and none of the names are added to publicAPI. As a result the hook may continue to flag runAnalyses and buildExtensionSet as dead exports even after this fix, causing spurious pre-commit failures.

A more robust approach is to search the surrounding window (e.g. a few lines before the import() match) for the opening brace, or to scan the full source once with a regex that matches the whole declaration, const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\( , in a single pass:

for (const m of src.matchAll(/const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\(['"]/gs)) {
  for (const part of m[1].split(',')) {
    const name = part.trim().split(/\s+as\s+/).pop().trim().split('\n').pop().trim();
    if (name && /^\w+$/.test(name)) publicAPI.add(name);
  }
}

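(The negated class [^}]+ already matches newlines, which is what covers multi-line destructuring blocks. The s flag only changes how . matches, and this pattern contains no ., so the flag is harmless but not required.)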

Reply (Contributor Author):
Fixed in 6d0f34e. Replaced the single-line regex with a multi-line-safe pattern using the s flag: /const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\(['"]/gs. This correctly handles multi-line destructuring like const {\n runAnalyses,\n} = await import(...). Also added a second pattern for single-binding default imports (const X = await import(...)).
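The multi-line extraction discussed in this thread can be tried in isolation. The following is a minimal sketch, not the code from 6d0f34e: collectDynamicImportNames is a hypothetical helper name, and it assumes renames are written in destructuring syntax ({ key: alias }), where the module's export is the key on the left of the colon.

```javascript
// Sketch only: collectDynamicImportNames is a hypothetical helper, not the hook's code.
function collectDynamicImportNames(src) {
  const names = new Set();
  // [^}]+ is a negated class, so it spans newlines without needing the s flag.
  const pattern = /const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\(\s*['"]/g;
  for (const m of src.matchAll(pattern)) {
    for (const part of m[1].split(',')) {
      // Assumption: in `{ foo: bar }` the exported name is `foo`, the part before the colon.
      const name = part.split(':')[0].trim();
      // Skips empty trailing entries and anything that is not a plain identifier.
      if (/^\w+$/.test(name)) names.add(name);
    }
  }
  return names;
}

const sample = [
  'const {',
  '  runAnalyses,',
  '  buildExtensionSet,',
  "} = await import('./ast-analysis/engine.js');",
].join('\n');
console.log([...collectDynamicImportNames(sample)]);
```

Scanning the whole source once keeps the match independent of where line breaks fall, which is the property the single-line version lacked.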

}
} catch {}
}
}
scanDynamic(srcDir);

const dead = [];
for (const file of files) {
try {
const data = exportsData(file, undefined, { noTests: true, unused: true });
if (data && data.results) {
for (const r of data.results) {
if (publicAPI.has(r.name)) continue; // public API or dynamic import consumer
dead.push(r.name + ' (' + data.file + ':' + r.line + ')');
}
}
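The index.js scan in this hunk can be exercised on its own. This is a minimal sketch that mirrors the two regexes shown above. The function name publicNamesFromIndex is hypothetical and does not appear in the hook.

```javascript
// Sketch only: publicNamesFromIndex is a hypothetical wrapper around the hook's regexes.
function publicNamesFromIndex(indexSrc) {
  const names = new Set();
  // Matches `export { foo, bar as baz }`, with or without a trailing `from '...'`.
  for (const m of indexSrc.matchAll(/export\s*\{([^}]+)\}/g)) {
    for (const part of m[1].split(',')) {
      // For `bar as baz`, keep the last token: the name consumers see.
      const name = part.trim().split(/\s+as\s+/).pop().trim();
      if (name) names.add(name);
    }
  }
  if (/export\s+default\b/.test(indexSrc)) names.add('default');
  return names;
}

console.log([...publicNamesFromIndex("export { foo, bar as baz } from './a.js';")]);
```

Note that an aliased re-export records the public alias (baz), not the original name (bar). Whether that matches the names exportsData reports is not visible in this diff, so treat it as something to verify.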
20 changes: 19 additions & 1 deletion .claude/hooks/check-readme.sh
@@ -1,6 +1,11 @@
#!/bin/bash
# Hook: block git commit if README.md, CLAUDE.md, or ROADMAP.md might need updating but aren't staged.
# Runs as a PreToolUse hook on Bash tool calls.
#
# Policy:
# - If NO docs are staged but source files changed → deny (docs weren't considered)
# - If SOME docs are staged → allow (developer reviewed and chose which to update)
# - If commit message contains "docs check acknowledged" → allow (explicit bypass)

INPUT=$(cat)
COMMAND=$(echo "$INPUT" | node -e "
@@ -17,11 +22,16 @@ if ! echo "$COMMAND" | grep -qE '^\s*git\s+commit'; then
exit 0
fi

# Allow explicit bypass via commit message
if echo "$COMMAND" | grep -q 'docs check acknowledged'; then
exit 0
fi

# Check which docs are staged
STAGED_FILES=$(git diff --cached --name-only 2>/dev/null)
README_STAGED=$(echo "$STAGED_FILES" | grep -c '^README.md$' || true)
CLAUDE_STAGED=$(echo "$STAGED_FILES" | grep -c '^CLAUDE.md$' || true)
-ROADMAP_STAGED=$(echo "$STAGED_FILES" | grep -c '^ROADMAP.md$' || true)
+ROADMAP_STAGED=$(echo "$STAGED_FILES" | grep -c 'ROADMAP.md$' || true)

# If all three are staged, all good
if [ "$README_STAGED" -gt 0 ] && [ "$CLAUDE_STAGED" -gt 0 ] && [ "$ROADMAP_STAGED" -gt 0 ]; then
@@ -32,6 +42,14 @@ fi
NEEDS_CHECK=$(echo "$STAGED_FILES" | grep -cE '(src/|cli\.js|constants\.js|parser\.js|package\.json|grammars/)' || true)

if [ "$NEEDS_CHECK" -gt 0 ]; then
DOCS_STAGED=$((README_STAGED + CLAUDE_STAGED + ROADMAP_STAGED))

# If at least one doc is staged, developer considered docs — allow with info
if [ "$DOCS_STAGED" -gt 0 ]; then
exit 0
fi

# No docs staged at all — block
MISSING=""
[ "$README_STAGED" -eq 0 ] && MISSING="README.md"
[ "$CLAUDE_STAGED" -eq 0 ] && MISSING="${MISSING:+$MISSING, }CLAUDE.md"
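The policy in the header comment, combined with the order of checks in the hunks above, can be sketched as one pure function. This is an illustration in JavaScript rather than the hook itself. docsCheck is a hypothetical name, and the sketch assumes that a commit touching none of the watched source paths is allowed, which the truncated hunk does not show.

```javascript
// Sketch only: docsCheck is a hypothetical model of the hook's decision order.
function docsCheck(command, stagedFiles) {
  // Only `git commit` invocations are inspected at all.
  if (!/^\s*git\s+commit/.test(command)) return 'allow';
  // Explicit bypass via the commit message.
  if (command.includes('docs check acknowledged')) return 'allow';
  // ROADMAP.md is matched by suffix, as in the updated grep pattern.
  const docs = stagedFiles.filter(
    (f) => f === 'README.md' || f === 'CLAUDE.md' || f.endsWith('ROADMAP.md')
  );
  const sourceTouched = stagedFiles.some((f) =>
    /(src\/|cli\.js|constants\.js|parser\.js|package\.json|grammars\/)/.test(f)
  );
  // Assumption: commits that touch no watched path are allowed.
  if (!sourceTouched) return 'allow';
  // At least one staged doc counts as the docs having been considered.
  return docs.length > 0 ? 'allow' : 'deny';
}

console.log(docsCheck('git commit -m "fix"', ['src/queries.js']));
```

Because the bypass check runs first, the acknowledged phrase wins regardless of what is staged.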
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -56,6 +56,7 @@ JS source is plain JavaScript (ES modules) in `src/`. No transpilation step. The
| `native.js` | Native napi-rs addon loader with WASM fallback |
| `registry.js` | Global repo registry (`~/.codegraph/registry.json`) for multi-repo MCP |
| `resolve.js` | Import resolution (supports native batch mode) |
| `ast-analysis/` | Unified AST analysis framework: shared DFS walker (`visitor.js`), engine orchestrator (`engine.js`), extracted metrics (`metrics.js`), and pluggable visitors for complexity, dataflow, and AST-store |
| `complexity.js` | Cognitive, cyclomatic, Halstead, MI computation from AST; `complexity` CLI command |
| `communities.js` | Louvain community detection, drift analysis |
| `manifesto.js` | Configurable rule engine with warn/fail thresholds; CI gate |
1 change: 1 addition & 0 deletions docs/roadmap/BACKLOG.md
@@ -144,6 +144,7 @@ These address fundamental limitations in the parsing and resolution pipeline tha
| 71 | Basic type inference for typed languages | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — |
| 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. | Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 |
| 73 | Improved dynamic call resolution | Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability. | Resolution | In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness | ✓ | ✓ | 5 | No | — |
| 81 | Track dynamic `import()` and re-exports as graph edges | Codegraph's static graph does not create edges for dynamic `import()` expressions (e.g. `const { buildAstNodes } = await import('../ast.js')`). Exports consumed exclusively via dynamic import appear as "zero consumers" in `exports --unused`, `check`, and dead-code detection — false positives that erode trust in the graph. Parse `import()` call expressions during the symbol extraction phase, resolve the target module, and create `kind='dynamic_import'` edges in the `edges` table. Similarly, re-exports from barrel files (`index.js`) that re-export symbols without calling them don't create consumer edges — the graph sees the barrel as the only consumer. Track re-export chains so the original module's export shows the true downstream consumers, not just `index.js`. | Resolution | Eliminates a class of false positives in `exports --unused`, `check --no-dead-code`, and audit dead-code reports. Currently any export consumed only via dynamic import or re-exported through a barrel file is incorrectly flagged as dead code. Also enables impact analysis to trace through lazy-loaded and barrel-file boundaries — critical for codebases that use dynamic imports for code splitting or conditional loading | ✓ | ✓ | 5 | No | — |

### Tier 1i — Search, navigation, and monitoring improvements

50 changes: 29 additions & 21 deletions docs/roadmap/ROADMAP.md
@@ -562,36 +562,44 @@ Plus updated enums on existing tools (edge_kinds, symbol kinds).

**Context:** Phases 2.5 and 2.7 added 38 modules and grew the codebase from 5K to 26,277 lines without introducing shared abstractions. The dual-function anti-pattern was replicated across 19 modules. Three independent AST analysis engines (complexity, CFG, dataflow) totaling 4,801 lines share the same fundamental pattern but no infrastructure. Raw SQL is scattered across 25+ modules touching 13 tables. The priority ordering has been revised based on actual growth patterns -- the new #1 priority is the unified AST analysis framework.

-### 3.1 -- Unified AST Analysis Framework ★ Critical (New)
+### 3.1 -- Unified AST Analysis Framework ★ Critical 🔄

-Unify the three independent AST analysis engines (complexity, CFG, dataflow) plus AST node storage into a shared visitor framework. These four modules total 5,193 lines and independently implement the same pattern: per-language rules map → AST walk → collect data → write to DB → query → format.
+Unify the independent AST analysis engines (complexity, CFG, dataflow) plus AST node storage into a shared visitor framework. These four modules independently implement the same pattern: per-language rules map → AST walk → collect data → write to DB → query → format.

| Module | Lines | Languages | Pattern |
|--------|-------|-----------|---------|
| `complexity.js` | 2,163 | 8 | Per-language rules → AST walk → collect metrics |
| `cfg.js` | 1,451 | 9 | Per-language rules → AST walk → build basic blocks |
| `dataflow.js` | 1,187 | 1 (JS/TS) | Scope stack → AST walk → collect flows |
| `ast.js` | 392 | 1 (JS/TS) | AST walk → extract stored nodes |

The extractors refactoring (Phase 2.7.6) proved the pattern: split per-language rules into files, share the engine. Apply it to all four AST analysis passes.
**Completed:** Phases 1-7 implemented a pluggable visitor framework with a shared DFS walker (`walkWithVisitors`), an analysis engine orchestrator (`runAnalyses`), and three visitors (complexity, dataflow, AST-store) that share a single tree traversal per file. `builder.js` collapsed from 4 sequential `buildXxx` blocks into one `runAnalyses` call.

```
src/
ast-analysis/
-visitor.js # Shared AST visitor with hook points
-engine.js # Single-pass or multi-pass orchestrator
-metrics.js # Halstead, MI, LOC/SLOC (language-agnostic)
-cfg-builder.js # Basic-block + edge construction
-rules/
-complexity/{lang}.js # Cognitive/cyclomatic rules per language
-cfg/{lang}.js # Basic-block rules per language
-dataflow/{lang}.js # Define-use chain rules per language
-ast-store/{lang}.js # Node extraction rules per language
+visitor.js # Shared DFS walker with pluggable visitor hooks
+engine.js # Orchestrates all analyses in one coordinated pass
+metrics.js # Halstead, MI, LOC/SLOC (extracted from complexity.js)
+visitor-utils.js # Shared helpers (functionName, extractParams, etc.)
+visitors/
+complexity-visitor.js # Cognitive/cyclomatic/nesting + Halstead
+ast-store-visitor.js # new/throw/await/string/regex extraction
+dataflow-visitor.js # Scope stack + define-use chains
+shared.js # findFunctionNode, rule factories, ext mapping
+rules/ # Per-language rule files (unchanged)
```

A single AST walk with pluggable visitors eliminates 3 redundant tree traversals per function, shares language-specific node type mappings, and allows new analyses to plug in without creating another 1K+ line module.
- ✅ Shared DFS walker with `enterNode`/`exitNode`/`enterFunction`/`exitFunction` hooks, `skipChildren` per-visitor, nesting/scope tracking
- ✅ Complexity visitor (cognitive, cyclomatic, max nesting, Halstead) — file-level and function-level modes
- ✅ AST-store visitor (new/throw/await/string/regex extraction)
- ✅ Dataflow visitor (define-use chains, arg flows, mutations, scope stack)
- ✅ Engine orchestrator: unified pre-walk stores results as pre-computed data on `symbols`, then delegates to existing `buildXxx` for DB writes
- ✅ `builder.js` → single `runAnalyses` call replaces 4 sequential blocks + WASM pre-parse
- ✅ Extracted pure computations to `metrics.js` (Halstead derived math, LOC, MI)
- ✅ Extracted shared helpers to `visitor-utils.js` (from dataflow.js)
- 🔲 **CFG visitor rewrite** (see below)
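
The walker behaviour described in the checklist above (one traversal, `enterNode`/`exitNode` hooks, per-visitor `skipChildren`) can be illustrated with a small sketch. It is not the repository's `walkWithVisitors`: the function name `sketchWalk`, the `{ type, children }` node shape, and the `{ skipChildren: true }` return convention are all assumptions made for this example.

```javascript
// Sketch only: sketchWalk is hypothetical and does not reproduce walkWithVisitors.
function sketchWalk(root, visitors) {
  function visit(node, active) {
    const next = [];
    for (const v of active) {
      const result = v.enterNode ? v.enterNode(node) : undefined;
      // A visitor that asks to skip children drops out for this subtree only.
      if (!(result && result.skipChildren)) next.push(v);
    }
    if (next.length > 0) {
      for (const child of node.children || []) visit(child, next);
    }
    // exitNode fires for every visitor that entered this node, in reverse order.
    for (let i = active.length - 1; i >= 0; i--) {
      if (active[i].exitNode) active[i].exitNode(node);
    }
  }
  visit(root, visitors);
}

// Two visitors share one traversal: a type collector and a nesting tracker.
const types = [];
let depth = 0;
let maxDepth = 0;
sketchWalk(
  { type: 'program', children: [{ type: 'if', children: [{ type: 'call' }] }] },
  [
    { enterNode(n) { types.push(n.type); } },
    { enterNode() { depth++; maxDepth = Math.max(maxDepth, depth); }, exitNode() { depth--; } },
  ]
);
console.log(types, maxDepth);
```

Each visitor keeps its own state, so a new analysis can be added by passing one more object rather than writing another traversal.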

**Remaining: CFG visitor rewrite.** `buildFunctionCFG` (813 lines) uses a statement-level traversal (`getStatements` + `processStatement` with `loopStack`, `labelMap`, `blockIndex`) that is fundamentally incompatible with the node-level DFS used by `walkWithVisitors`. This is why the engine runs CFG as a separate Mode B pass — the only analysis that can't participate in the shared single-DFS walk.

Rewrite the CFG algorithm as a node-level visitor that builds basic blocks and edges incrementally via `enterNode`/`exitNode` hooks, tracking block boundaries at branch/loop/return nodes the same way the complexity visitor tracks nesting. This eliminates the last redundant tree traversal during build and lets CFG share the exact same DFS pass as complexity, dataflow, and AST extraction. The statement-level `getStatements` helper and per-language `CFG_RULES.statementTypes` can be replaced by detecting block-terminating node types in `enterNode`. Also simplifies `engine.js` by removing the Mode A/B split and WASM pre-parse special-casing for CFG.

**Remaining: Derive cyclomatic complexity from CFG.** Once CFG participates in the unified walk, cyclomatic complexity can be derived directly from CFG edge/block counts (`edges - nodes + 2`) rather than independently computed by the complexity visitor. This creates a single source of truth for control flow metrics and eliminates redundant computation. Can also be done as a simpler SQL-only approach against stored `cfg_blocks`/`cfg_edges` tables (see backlog ID 45).
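
As a worked instance of the `edges - nodes + 2` formula, the sketch below computes cyclomatic complexity from block and edge lists. The array shapes are hypothetical stand-ins for rows from `cfg_blocks` and `cfg_edges`, whose actual columns are not shown here, and the formula assumes one connected graph per function with a single entry and a single exit block.

```javascript
// Sketch only: the block and edge shapes are assumptions, not the stored schema.
function cyclomaticFromCfg(blocks, edges) {
  // M = E - N + 2 for a single connected control flow graph.
  return edges.length - blocks.length + 2;
}

// if/else: entry branches to two blocks that rejoin at exit.
const blocks = ['entry', 'then', 'else', 'exit'];
const edges = [
  ['entry', 'then'],
  ['entry', 'else'],
  ['then', 'exit'],
  ['else', 'exit'],
];
console.log(cyclomaticFromCfg(blocks, edges)); // 4 - 4 + 2 = 2
```

Straight-line code (one block, no edges) gives 1, and each additional decision point adds one edge more than it adds blocks, so the count rises by one per decision.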

-**Affected files:** `src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` -> split into `src/ast-analysis/`
+**Affected files:** `src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` split into `src/ast-analysis/`

### 3.2 -- Command/Query Separation ★ Critical 🔄
