Commit ca5cf24

Merge pull request #71 from optave/docs/backlog-and-roadmap-reorg
docs: add feature backlog and track file moves in hooks
2 parents 8a87c05 + cfe633b commit ca5cf24

5 files changed

Lines changed: 172 additions & 0 deletions

.claude/hooks/track-moves.sh

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+#!/usr/bin/env bash
+# track-moves.sh — PostToolUse hook for Bash tool calls
+# Detects mv/git mv/cp commands and logs all affected paths
+# (both source and destination) to .claude/session-edits.log so that
+# guard-git.sh can validate commits that include moved/copied files.
+# Always exits 0 (informational only, never blocks).
+
+set -euo pipefail
+
+INPUT=$(cat)
+
+# Extract the command from tool_input JSON
+COMMAND=$(echo "$INPUT" | node -e "
+  let d='';
+  process.stdin.on('data',c=>d+=c);
+  process.stdin.on('end',()=>{
+    const p=JSON.parse(d).tool_input?.command||'';
+    if(p)process.stdout.write(p);
+  });
+" 2>/dev/null) || true
+
+if [ -z "$COMMAND" ]; then
+  exit 0
+fi
+
+# Only care about mv / git mv / cp commands
+if ! echo "$COMMAND" | grep -qE '(^|\s|&&\s*)(mv|git\s+mv|cp)\s+'; then
+  exit 0
+fi
+
+PROJECT_DIR="${CLAUDE_PROJECT_DIR:-.}"
+LOG_FILE="$PROJECT_DIR/.claude/session-edits.log"
+
+# Use node to parse the command and extract all file paths involved
+PATHS=$(echo "$COMMAND" | node -e "
+  const path = require('path');
+  let d = '';
+  process.stdin.on('data', c => d += c);
+  process.stdin.on('end', () => {
+    const base = path.resolve(process.argv[1]);
+    const results = new Set();
+
+    // Split on && or ; to handle chained commands
+    const parts = d.split(/\s*(?:&&|;)\s*/);
+
+    for (const part of parts) {
+      // Match: mv / cp / git mv followed by arguments
+      const m = part.match(/(?:git\s+mv|mv|cp)\s+(.+)/);
+      if (!m) continue;
+
+      // Simple arg splitting that respects quotes
+      const raw = m[1];
+      const args = [];
+      let cur = '';
+      let q = null;
+      for (let i = 0; i < raw.length; i++) {
+        const c = raw[i];
+        if (q) { if (c === q) q = null; else cur += c; }
+        else if (c === '\"' || c === \"'\") { q = c; }
+        else if (c === ' ' || c === '\\t') { if (cur) { args.push(cur); cur = ''; } }
+        else { cur += c; }
+      }
+      if (cur) args.push(cur);
+
+      // Filter out flags (-f, -v, --force, etc.)
+      const paths = args.filter(a => !a.startsWith('-'));
+
+      // Resolve each path relative to project root
+      for (const p of paths) {
+        const abs = path.resolve(p);
+        const rel = path.relative(base, abs).split(path.sep).join('/');
+        if (!rel.startsWith('..')) results.add(rel);
+      }
+    }
+
+    process.stdout.write([...results].join('\\n'));
+  });
+" "$PROJECT_DIR" 2>/dev/null) || true
+
+if [ -z "$PATHS" ]; then
+  exit 0
+fi
+
+# Append timestamped entries for every affected path
+mkdir -p "$(dirname "$LOG_FILE")"
+TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+while IFS= read -r rel_path; do
+  if [ -n "$rel_path" ]; then
+    echo "$TS $rel_path" >> "$LOG_FILE"
+  fi
+done <<< "$PATHS"
+
+exit 0
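
Review note: the argument handling in the inline node script is easier to check as a standalone function. The sketch below lifts the same splitting logic out of the hook so it can be run directly. `splitArgs` and `extractMovePaths` are hypothetical names that do not appear in the commit, and the path resolution against `CLAUDE_PROJECT_DIR` is left out on purpose.

```javascript
// Hypothetical standalone version of the hook's inline parser (not in the commit).

// Split an argument string on spaces and tabs, keeping quoted spans together.
function splitArgs(raw) {
  const args = [];
  let cur = '';
  let quote = null;
  for (const c of raw) {
    if (quote) {
      if (c === quote) quote = null; // closing quote ends the quoted span
      else cur += c;
    } else if (c === '"' || c === "'") {
      quote = c;
    } else if (c === ' ' || c === '\t') {
      if (cur) { args.push(cur); cur = ''; }
    } else {
      cur += c;
    }
  }
  if (cur) args.push(cur);
  return args;
}

// Collect non-flag arguments of every mv / git mv / cp in a chained command.
function extractMovePaths(command) {
  const paths = [];
  for (const part of command.split(/\s*(?:&&|;)\s*/)) {
    const m = part.match(/(?:git\s+mv|mv|cp)\s+(.+)/);
    if (!m) continue;
    for (const arg of splitArgs(m[1])) {
      if (!arg.startsWith('-')) paths.push(arg);
    }
  }
  return paths;
}

console.log(extractMovePaths('git mv -f "old name.md" docs/new.md && cp a.txt b.txt'));
// prints [ 'old name.md', 'docs/new.md', 'a.txt', 'b.txt' ]
```

Because chained commands are split only on `&&` and `;`, a command such as `mv a b | tee log` yields `|`, `tee`, and `log` as paths under this logic, so the session log may pick up a few non-path entries. Backslash escapes are also not interpreted.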

.claude/settings.json

Lines changed: 10 additions & 0 deletions
@@ -62,6 +62,16 @@
             "timeout": 5
           }
         ]
+      },
+      {
+        "matcher": "Bash",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash \"$CLAUDE_PROJECT_DIR/.claude/hooks/track-moves.sh\"",
+            "timeout": 5
+          }
+        ]
       }
     ]
   }

roadmap/BACKLOG.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+# Codegraph Feature Backlog
+
+**Last updated:** 2026-02-23
+**Source:** Features derived from [COMPETITIVE_ANALYSIS.md](../generated/COMPETITIVE_ANALYSIS.md) and internal roadmap discussions.
+
+---
+
+## How to Read This Backlog
+
+Each item has a short title, description, category, expected benefit, and four assessment columns left blank for prioritization review:
+
+| Column | Meaning |
+|--------|---------|
+| **Zero-dep** | Can this feature be implemented without adding new runtime dependencies to the project? A checkmark means it builds entirely on what we already ship (tree-sitter, SQLite, existing AST). Blank means it needs evaluation. Features that require new deps raise the install footprint and maintenance burden — they need stronger justification. |
+| **Foundation-aligned** | Does this feature align with the [FOUNDATION.md](../FOUNDATION.md) core principles? Specifically: does it keep the graph always-current (P1), maintain zero-cost core with optional LLM enhancement (P4), respect embeddable-first design (P5), and stay honest about what we are — a code intelligence engine, not an application (P8)? A checkmark means full alignment. An X means it conflicts with at least one principle and needs a deliberate exception. |
+| **Problem-fit (1-5)** | How directly does this feature address the core problem from our README: *AI coding assistants waste tokens re-orienting themselves in large codebases, hallucinate dependencies, and miss blast radius.* A 5 means it directly reduces token waste, prevents hallucinated deps, or catches breakage. A 1 means it's tangential — nice to have but doesn't solve the stated problem. |
+| **Breaking** | Is this a breaking change? `Yes` means existing CLI output, API signatures, DB schema, or MCP tool contracts change in incompatible ways. `No` means it's purely additive. Breaking changes require a major version bump. |
+
+---
+
+## Backlog
+
+| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking |
+|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|
+| 1 | Dead code detection | Find symbols with zero incoming edges (excluding entry points and exports). Agents constantly ask "is this used?" — the graph already has the data, we just need to surface it. Inspired by narsil-mcp, axon, codexray, CKB. | Analysis | Agents stop wasting tokens investigating dead code; developers get actionable cleanup lists without external tools | | | | |
+| 2 | Shortest path A→B | BFS/Dijkstra on the existing edges table to find how symbol A reaches symbol B. We have `fn` for single-node chains but no A→B pathfinding. Inspired by codexray, arbor. | Navigation | Agents can answer "how does this function reach that one?" in one call instead of manually tracing chains | | | | |
+| 3 | Token counting on responses | Add tiktoken-based token counts to CLI and MCP responses so agents know how much context budget each query consumed. Inspired by glimpse, arbor. | Developer Experience | Agents and users can budget context windows; enables smarter multi-query strategies without blowing context limits | | | | |
+| 4 | Node classification | Auto-tag symbols as Entry Point / Core / Utility / Adapter based on in-degree/out-degree patterns. High fan-in + low fan-out = Core. Zero fan-in + non-export = Dead. Inspired by arbor. | Intelligence | Agents immediately understand architectural role of any symbol without reading surrounding code — fewer orientation tokens | | | | |
+| 5 | TF-IDF lightweight search | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray. | Search | Users get useful search without the 500MB embedding model download; faster startup for small projects | | | | |
+| 6 | Formal code health metrics | Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — we already parse the AST, the data is there. Inspired by code-health-meter (published in ACM TOSEM 2025). | Analysis | Agents can prioritize refactoring targets; `hotspots` becomes richer with quantitative health scores per function | | | | |
+| 7 | OWASP/CWE pattern detection | Security pattern scanning on the existing AST — hardcoded secrets, SQL injection patterns, eval usage, XSS sinks. Lightweight static rules, not full taint analysis. Inspired by narsil-mcp, CKB. | Security | Catches low-hanging security issues during `diff-impact`; agents can flag risky patterns before they're committed | | | | |
+| 8 | Optional LLM provider integration | Bring-your-own provider (OpenAI, Anthropic, Ollama, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. Inspired by code-graph-rag, autodev-codebase. | Search | Semantic search quality jumps significantly with provider embeddings; users who already pay for an LLM get better results at no extra cost | | | | |
+| 9 | Git change coupling | Analyze git history for files/functions that always change together. Surfaces hidden dependencies that the static graph can't see. Enhances `diff-impact` with historical co-change data. Inspired by axon. | Analysis | `diff-impact` catches more breakage by including historically coupled files; agents get a more complete blast radius picture | | | | |
+| 10 | Interactive HTML visualization | `codegraph viz` opens an interactive force-directed graph in the browser (vis.js or Cytoscape.js). Zoom, pan, filter by module, click to inspect. Inspired by autodev-codebase, CodeVisualizer. | Visualization | Developers and teams can visually explore architecture; useful for onboarding, code reviews, and spotting structural problems | | | | |
+| 11 | Community detection | Leiden/Louvain algorithm to discover natural module boundaries vs actual file organization. Reveals which symbols are tightly coupled and whether the directory structure matches. Inspired by axon, GitNexus, CodeGraphMCPServer. | Intelligence | Surfaces architectural drift — when directory structure no longer matches actual dependency clusters; guides refactoring | | | | |
+| 12 | Execution flow tracing | Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp. | Navigation | Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query | | | | |
+| 13 | Architecture boundary rules | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). Violations flagged in `diff-impact` and CI. Inspired by codegraph-rust, stratify. | Architecture | Prevents architectural decay in CI; agents are warned before introducing forbidden cross-module dependencies | | | | |
+| 14 | Dataflow analysis | Define/use chains and flows_to/returns/mutates edge types. Tracks how data moves through functions, not just call relationships. Major analysis depth increase. Inspired by codegraph-rust. | Analysis | Enables taint-like analysis, more precise impact analysis, and answers "where does this value end up?" | | | | |
+| 15 | Hybrid BM25 + semantic search | Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local. | Search | Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both | | | | |
+| 16 | Branch structural diff | Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon. | Analysis | Teams can review structural impact of feature branches before merge; agents get branch-aware context | | | | |
+| 17 | Multi-file coordinated rename | Rename a symbol across all call sites, validated against the graph structure. Inspired by GitNexus. | Refactoring | Safe renames without relying on LSP or IDE — works in CI, agent loops, and headless environments | | | | |
+| 18 | CODEOWNERS integration | Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB. | Developer Experience | `diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews | | | | |
+| 19 | Auto-generated context files | Generate structural summaries (AGENTS.md, CLAUDE.md sections) from the graph — module descriptions, key entry points, architecture overview. Inspired by GitNexus. | Intelligence | New contributors and AI agents get an always-current project overview without manual documentation effort | | | | |
+| 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | | | | |
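
Review note on items 1 and 2: both reduce to small graph queries, which supports the claim that the data is already there. The sketch below runs them over an in-memory edge list. The `{ from, to }` edge shape, the symbol names, and the function names `findDeadSymbols` and `shortestPath` are assumptions for illustration, since the schema of the real edges table is not shown in this commit.

```javascript
// Toy call graph. The { from, to } shape is an assumption for illustration,
// not the schema of the real edges table.
const edges = [
  { from: 'main', to: 'parseArgs' },
  { from: 'main', to: 'buildGraph' },
  { from: 'buildGraph', to: 'parseFile' },
  { from: 'parseFile', to: 'readSource' },
  { from: 'legacyExport', to: 'readSource' },
];

// Item 1: symbols with zero incoming edges, minus known entry points.
function findDeadSymbols(edgeList, entryPoints = []) {
  const all = new Set();
  const hasIncoming = new Set();
  for (const { from, to } of edgeList) {
    all.add(from);
    all.add(to);
    hasIncoming.add(to);
  }
  const keep = new Set(entryPoints);
  return [...all].filter((s) => !hasIncoming.has(s) && !keep.has(s));
}

// Item 2: breadth-first search finds a shortest path by edge count.
function shortestPath(edgeList, start, goal) {
  const adj = new Map();
  for (const { from, to } of edgeList) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from).push(to);
  }
  const prev = new Map([[start, null]]); // also serves as the visited set
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) {
      // Walk predecessor links back to the start to rebuild the path.
      const path = [];
      for (let n = goal; n !== null; n = prev.get(n)) path.unshift(n);
      return path;
    }
    for (const next of adj.get(node) ?? []) {
      if (!prev.has(next)) {
        prev.set(next, node);
        queue.push(next);
      }
    }
  }
  return null; // goal is unreachable from start
}

console.log(findDeadSymbols(edges, ['main']));
// prints [ 'legacyExport' ]
console.log(shortestPath(edges, 'main', 'readSource'));
// prints [ 'main', 'buildGraph', 'parseFile', 'readSource' ]
```

Plain BFS is enough while every edge counts the same. Dijkstra, which item 2 also names, would only matter if edges carried weights.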
+
+---
+
+## Scoring Guide
+
+When filling in the assessment columns during a prioritization session:
+
+**Zero-dep checklist:**
+- Does it use only tree-sitter AST data we already extract? → likely zero-dep
+- Does it need a new npm package at runtime? → not zero-dep
+- Does it need git CLI access? → acceptable (git is already assumed)
+- Does it need a new WASM module or native addon? → not zero-dep
+
+**Foundation alignment red flags:**
+- Adds a cloud API call to the core pipeline → violates P1 and P4
+- Requires Docker, external DB, or non-npm toolchain → violates zero-infrastructure goal
+- Generates code, edits files, or makes decisions → violates P8 (we're not an agent)
+- Breaks programmatic API contract → check against P5 (embeddable-first)
+
+**Problem-fit rubric:**
+- **5** — Directly reduces token waste, prevents hallucinated dependencies, or catches blast-radius breakage
+- **4** — Improves agent accuracy or reduces round-trips for common tasks
+- **3** — Useful for developers and agents but doesn't address the core "lost AI" problem
+- **2** — Nice-to-have; improves the tool but tangential to the stated problem
+- **1** — Cool feature, but doesn't help AI agents navigate codebases better
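
Review note on backlog item 15: Reciprocal Rank Fusion needs only the rank positions from each result list, so it can be assessed against the zero-dep checklist without picking a search backend first. The sketch below is a minimal version, assuming each ranker returns symbol IDs ordered best first. The function name `reciprocalRankFusion`, the sample hit lists, and the default `k = 60` (a value commonly used for RRF) are illustrative and not taken from this commit.

```javascript
// Minimal Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank).
// Input lists are assumed to be ordered best first; names here are illustrative.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const keywordHits = ['parseFile', 'parseArgs', 'readSource'];
const semanticHits = ['readSource', 'parseFile', 'buildGraph'];
console.log(reciprocalRankFusion([keywordHits, semanticHits]));
// prints [ 'parseFile', 'readSource', 'parseArgs', 'buildGraph' ]
```

Symbols that appear in both lists rise to the top even when neither ranker placed them first, which is the recall benefit the item describes.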
File renamed without changes.
File renamed without changes.
