Commit b25b612

jgstern-agent committed
feat: add dead-code-maybe subcommand (WI-fisam)
New CLI command that finds production callables unreachable from entrypoints via BFS over call edges.

Usage:
    hypergumbo dead-code-maybe .                 # Text output
    hypergumbo dead-code-maybe . --format json   # JSON output
    hypergumbo dead-code-maybe . --seeds all     # Include tests as seeds

Seed set is configurable: entrypoints (default), tests, or all. Dead candidates are ranked by LOC (largest first).

Foundation for downstream dead-code prospector items (WI-zafab, WI-pimig, WI-hadap, WI-zimum).

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
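The reachability idea in the commit message can be sketched in isolation. This is a minimal illustration, not the committed implementation: `find_dead` and the toy symbol names are hypothetical, and the call graph is a plain list of `(caller, callee)` pairs rather than a behavior map.

```python
from collections import deque


def find_dead(production, seeds, call_edges):
    """Return production symbols not reachable from seeds via call edges.

    Hypothetical helper for illustration; the name is not from hypergumbo.
    """
    # Adjacency list: caller -> list of callees.
    graph = {}
    for src, dst in call_edges:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search: a FIFO queue visits nodes level by level.
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in visited:
                visited.add(callee)
                queue.append(callee)

    # dead = production_callables - reachable_from(seed_set)
    return set(production) - visited


# Toy graph: main -> parse -> tokenize; legacy_export is never called.
production = {"main", "parse", "tokenize", "legacy_export", "legacy_helper"}
edges = [
    ("main", "parse"),
    ("parse", "tokenize"),
    ("legacy_export", "legacy_helper"),
]
dead = find_dead(production, {"main"}, edges)
print(sorted(dead))
```

Note that `legacy_helper` is reported too, because its only caller is itself unreachable. Swapping the deque for a list with `pop()` changes the visiting order to depth-first but leaves the reachable set, and therefore the dead set, unchanged.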
1 parent 40faef5 commit b25b612

4 files changed

Lines changed: 502 additions & 7 deletions

.ci/affected-tests.txt

Lines changed: 7 additions & 4 deletions
@@ -1,12 +1,13 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-10T08:58:31-04:00
+# Generated by smart-test at 2026-04-10T10:23:49-04:00
 # Mode: targeted
 # Baseline: e2fb9e02102c793608778dce538cc121418600fc
-# Changed files: 3
-# Changed source files: 1
-# Selected tests: 48
+# Changed files: 6
+# Changed source files: 2
+# Selected tests: 50
 #
 # === CHANGED_SOURCE_FILES ===
+packages/hypergumbo-core/src/hypergumbo_core/cli.py
 packages/hypergumbo-core/src/hypergumbo_core/framework_patterns.py
 # === SELECTED_TESTS ===
 packages/hypergumbo-core/tests/BRANCHES_test_framework_patterns.py
@@ -15,6 +16,7 @@ packages/hypergumbo-core/tests/test_cli_basic.py
 packages/hypergumbo-core/tests/test_cli_cache.py
 packages/hypergumbo-core/tests/test_cli_commands.py
 packages/hypergumbo-core/tests/test_cli_config.py
+packages/hypergumbo-core/tests/test_cli_dead_code.py
 packages/hypergumbo-core/tests/test_cli_explain.py
 packages/hypergumbo-core/tests/test_cli_io_boundaries.py
 packages/hypergumbo-core/tests/test_cli_routes.py
@@ -38,6 +40,7 @@ packages/hypergumbo-core/tests/test_sketch.py
 packages/hypergumbo-core/tests/test_sketch_sanity.py
 packages/hypergumbo-core/tests/test_slice_tier_filter.py
 packages/hypergumbo-core/tests/test_stable_shape_ids.py
+packages/hypergumbo-core/tests/test_supply_chain.py
 packages/hypergumbo-lang-common/tests/BRANCHES_test_dart.py
 packages/hypergumbo-lang-common/tests/BRANCHES_test_elixir.py
 packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_cpp.py

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers

 #### Behavior map

+- **`hypergumbo dead-code-maybe` subcommand** (WI-fisam): finds production callables unreachable from entrypoints via BFS over call edges. Supports `--seeds {entrypoints,tests,all}` for configurable seed sets, `--format {text,json}` output, and `--min-confidence` for entrypoint filtering. Dead candidates are ranked by LOC (largest unreachable functions first). Foundation for downstream dead-code prospector tooling.
 - **Co-located test files classified as tier 1** (WI-gifuz): files matching test naming conventions (`_test.go`, `.test.js`, `.spec.ts`, `_spec.rb`, `tests.rs`) that are co-located with source code are now classified as tier 1 (FIRST_PARTY) with `is_test=True`, instead of tier 2 (INTERNAL_DEP). Files in dedicated test directories (`tests/`, `spec/`, `__tests__/`) remain tier 2. This fixes a bakeoff signal where all tier-2-only nodes in Go repos were `_test.go` files, making tier filtering useless for distinguishing first-party tests from actual third-party dependencies.
 - **Event-sourcing linker expansion** (WI-zadat): extends event detection beyond Spring/JS/Django to cover Guava EventBus (`bus.post()`, `@Subscribe`), generic Java event bus patterns (`fire()`/`dispatch()`/`register()`/`addListener()`), Go channel-based events (`ch <- value`/`<-ch`), and Go event bus method calls (`Publish()`/`Subscribe()`/`Emit()`/`On()`). Go `.go` files are now scanned for event patterns alongside Python, JS/TS, and Java.
 - **Go closure wrapper edges** (WI-nikul): when a route registration passes a handler through a closure wrapper (e.g., `r.Get("/query", wrapAgent(api.query))`), the wrapper is now visible in the call graph. The analyzer detects `func`-typed closure variables declared via `:=`, creates a function Symbol for the wrapper with `middleware` concept metadata, records `wrapper_name` in route metadata, and emits `wraps` edges from the wrapper symbol to the inner handler. Covers both Gin/Echo/Fiber and Gorilla mux/stdlib route patterns.
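The `--format json` output named in the `dead-code-maybe` changelog entry can be post-processed by other tools. A minimal sketch, assuming the payload carries the `summary` and `dead_candidates` keys that this commit's `cli.py` emits; the sample values and the `large_candidates` helper are made up for illustration.

```python
import json

# Made-up sample shaped like the command's JSON output (values are illustrative).
sample = json.dumps({
    "summary": {
        "total_production_functions": 4,
        "reachable_functions": 2,
        "dead_candidates": 2,
        "seed_count": 1,
        "seeds_mode": "entrypoints",
        "dead_percent": 50.0,
    },
    "dead_candidates": [
        {"name": "legacy_export", "path": "src/export.py", "lines_of_code": 120},
        {"name": "tiny_helper", "path": "src/util.py", "lines_of_code": None},
    ],
})


def large_candidates(raw, min_loc):
    """Hypothetical filter: keep candidates with at least min_loc lines."""
    report = json.loads(raw)
    return [
        c["name"]
        for c in report["dead_candidates"]
        # lines_of_code may be null in the output, so treat it as 0 here.
        if (c.get("lines_of_code") or 0) >= min_loc
    ]


print(large_candidates(sample, 50))
```

Filtering by size keeps the review list short, which fits the command's own choice to rank the largest unreachable functions first.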

packages/hypergumbo-core/src/hypergumbo_core/cli.py

Lines changed: 211 additions & 3 deletions
@@ -3523,6 +3523,186 @@ def cmd_test_coverage(args: argparse.Namespace) -> int:
     return 0


+def cmd_dead_code_maybe(args: argparse.Namespace) -> int:
+    """Find potentially dead code: production callables unreachable from entrypoints.
+
+    Computes: dead = production_callables - reachable_from(seed_set)
+
+    The seed set is configurable via ``--seeds``:
+    - ``entrypoints``: CLI mains, HTTP routes, framework hooks (default)
+    - ``tests``: test functions only
+    - ``all``: both entrypoints AND tests
+
+    Uses BFS over call edges from seed symbols. Functions not visited
+    are flagged as potentially dead. Results are ranked by lines of code
+    (larger unreachable functions first).
+    """
+    repo_root = Path(args.path).resolve()
+
+    input_path, was_cached, generated_files = _get_or_run_analysis(
+        repo_root,
+        explicit_input=args.input,
+        show_progress=True,
+    )
+    if input_path is None:
+        print(f"Error: Input file not found: {args.input}", file=sys.stderr)
+        return 1
+
+    behavior_map = json.loads(input_path.read_text())
+    nodes = behavior_map.get("nodes", [])
+    edges = behavior_map.get("edges", [])
+    # Identify production callable symbols (exclude test files)
+    production_symbols: dict[str, dict] = {}
+    test_symbols: set[str] = set()
+    for node in nodes:
+        path = node.get("path", "")
+        kind = node.get("kind", "")
+        if kind not in ("function", "method"):
+            continue
+        if _is_test_path(path):
+            test_symbols.add(node["id"])
+        else:
+            production_symbols[node["id"]] = node
+
+    if not production_symbols:
+        print("No production functions found to analyze.", file=sys.stderr)
+        return 0
+
+    # Build seed set based on --seeds flag
+    seed_ids: set[str] = set()
+    seeds_mode = getattr(args, "seeds", "entrypoints")
+
+    if seeds_mode in ("entrypoints", "all"):
+        from .entrypoints import detect_entrypoints
+        from .ir import Symbol, Edge, Span
+
+        # Convert dict nodes/edges to IR objects for detect_entrypoints
+        ir_nodes = []
+        for n in nodes:
+            span_data = n.get("span", {})
+            sym = Symbol(
+                id=n["id"],
+                name=n.get("name", ""),
+                kind=n.get("kind", ""),
+                language=n.get("language", ""),
+                path=n.get("path", ""),
+                span=Span(
+                    start_line=span_data.get("start_line", 0),
+                    end_line=span_data.get("end_line", 0),
+                    start_col=span_data.get("start_col", 0),
+                    end_col=span_data.get("end_col", 0),
+                ),
+                meta=n.get("meta"),
+            )
+            ir_nodes.append(sym)
+
+        ir_edges = []
+        for e in edges:
+            ir_edges.append(Edge(
+                id=e.get("id", ""),
+                src=e.get("src", ""),
+                dst=e.get("dst", ""),
+                edge_type=e.get("type", "calls"),
+                line=e.get("line", 0),
+                confidence=e.get("confidence", 0.85),
+            ))
+
+        min_conf = getattr(args, "min_confidence", 0.0)
+        entrypoints = detect_entrypoints(ir_nodes, ir_edges)
+        for ep in entrypoints:
+            if ep.confidence >= min_conf:
+                seed_ids.add(ep.symbol_id)
+
+    if seeds_mode in ("tests", "all"):
+        seed_ids.update(test_symbols)
+
+    # BFS from seeds through call edges
+    call_graph: dict[str, list[str]] = {}
+    for edge in edges:
+        if edge.get("type") == "calls":
+            src = edge.get("src", "")
+            dst = edge.get("dst", "")
+            if src and dst:
+                call_graph.setdefault(src, []).append(dst)
+
+    reachable: set[str] = set()
+    queue = list(seed_ids)
+    visited: set[str] = set(seed_ids)
+    while queue:
+        current = queue.pop()
+        reachable.add(current)
+        for neighbor in call_graph.get(current, []):
+            if neighbor not in visited:
+                visited.add(neighbor)
+                queue.append(neighbor)
+
+    # Dead candidates = production symbols NOT reachable
+    dead_candidates = []
+    for sym_id, node in production_symbols.items():
+        if sym_id not in reachable:
+            dead_candidates.append(node)
+
+    # Sort by LOC descending (larger unreachable functions first)
+    dead_candidates.sort(key=lambda n: -(n.get("lines_of_code") or 1))
+
+    # Summary stats
+    total_production = len(production_symbols)
+    total_reachable = len(reachable & set(production_symbols.keys()))
+    total_dead = len(dead_candidates)
+    total_entrypoints = len(seed_ids)
+
+    if args.format == "json":
+        output = {
+            "summary": {
+                "total_production_functions": total_production,
+                "reachable_functions": total_reachable,
+                "dead_candidates": total_dead,
+                "seed_count": total_entrypoints,
+                "seeds_mode": seeds_mode,
+                "dead_percent": round(total_dead / max(total_production, 1) * 100, 1),
+            },
+            "dead_candidates": [
+                {
+                    "name": n.get("name", ""),
+                    "path": n.get("path", ""),
+                    "language": n.get("language", ""),
+                    "kind": n.get("kind", ""),
+                    "lines_of_code": n.get("lines_of_code"),
+                    "span": n.get("span"),
+                    "id": n["id"],
+                }
+                for n in dead_candidates
+            ],
+        }
+        print(json.dumps(output, indent=2))
+    else:
+        # Text format
+        print(f"Dead Code Analysis (seeds: {seeds_mode})")
+        print(f"{'=' * 50}")
+        print(f"Production functions: {total_production}")
+        print(f"Entrypoints/seeds: {total_entrypoints}")
+        print(f"Reachable: {total_reachable}")
+        print(f"Potentially dead: {total_dead} "
+              f"({total_dead / max(total_production, 1) * 100:.1f}%)")
+        print()
+
+        if dead_candidates:
+            print("Potentially dead functions (by LOC, largest first):")
+            print(f"{'─' * 70}")
+            for n in dead_candidates[:50]:
+                name = n.get("name", "?")
+                path = n.get("path", "?")
+                loc = n.get("lines_of_code") or "?"
+                print(f" {name:<30} {path:<30} {loc:>5} LOC")
+
+            if len(dead_candidates) > 50:  # pragma: no cover
+                print(f" ... and {len(dead_candidates) - 50} more")
+        else:
+            print("No potentially dead functions found.")
+
+    return 0
+
+
 def build_parser() -> argparse.ArgumentParser:
     # Main parser with comprehensive help
     main_description = """\
@@ -4422,6 +4602,34 @@ def build_parser() -> argparse.ArgumentParser:
     )
     p_test_cov.set_defaults(func=cmd_test_coverage)

+    # hypergumbo dead-code-maybe
+    p_dead_code = sub.add_parser(
+        "dead-code-maybe",
+        help="Find potentially dead code unreachable from entrypoints",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    p_dead_code.add_argument(
+        "path", nargs="?", default=".",
+        help="Path to repo root (default: current directory)",
+    )
+    p_dead_code.add_argument(
+        "--input", default=None,
+        help="Input behavior map file (default: auto-detect cached results)",
+    )
+    p_dead_code.add_argument(
+        "--format", choices=["text", "json"], default="text",
+        help="Output format (default: text)",
+    )
+    p_dead_code.add_argument(
+        "--seeds", choices=["entrypoints", "tests", "all"], default="entrypoints",
+        help="Seed set for reachability analysis (default: entrypoints)",
+    )
+    p_dead_code.add_argument(
+        "--min-confidence", type=float, default=0.0,
+        help="Minimum entrypoint confidence threshold (default: 0.0)",
+    )
+    p_dead_code.set_defaults(func=cmd_dead_code_maybe)
+
     # hypergumbo symbols
     symbols_epilog = """\
 Examples:
@@ -4666,8 +4874,8 @@ def build_parser() -> argparse.ArgumentParser:
     # Assign subcommands to groups for help formatting
     # Core analysis commands (group_order=0) - ordered by suborder
     core_cmds = ["sketch", "run", "slice", "search", "routes", "explain",
-                 "catalog", "config", "test-coverage", "symbols", "compact",
-                 "io-boundaries", "verify-claims"]
+                 "catalog", "config", "test-coverage", "dead-code-maybe",
+                 "symbols", "compact", "io-boundaries", "verify-claims"]
     for i, cmd in enumerate(core_cmds):
         _set_subparser_group(sub, cmd, "core", 0, suborder=i)

@@ -5363,7 +5571,7 @@ def main(argv=None) -> int:
         print_all_help(parser)
         return 0

-    subcommands = {"run", "slice", "search", "routes", "explain", "catalog", "config", "sketch", "build-grammars", "install-gitleaks", "uninstall-gitleaks", "cache-status", "cache-clear", "install-embeddings", "uninstall-embeddings", "add-extras", "remove-extras", "test-coverage", "symbols", "compact", "io-boundaries", "verify-claims"}
+    subcommands = {"run", "slice", "search", "routes", "explain", "catalog", "config", "sketch", "build-grammars", "install-gitleaks", "uninstall-gitleaks", "cache-status", "cache-clear", "install-embeddings", "uninstall-embeddings", "add-extras", "remove-extras", "test-coverage", "dead-code-maybe", "symbols", "compact", "io-boundaries", "verify-claims"}

     # If no args, or first arg is not a subcommand (and not a flag), use sketch mode
     if not argv or (argv[0] not in subcommands and not argv[0].startswith("-")):
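
The ranking step in `cmd_dead_code_maybe` sorts with the key `-(n.get("lines_of_code") or 1)`, so a node whose LOC is missing, `None`, or `0` is ranked as if it had one line. A small standalone sketch of that behavior, using made-up nodes:

```python
# Made-up nodes; only the sort key mirrors the one in the diff.
nodes = [
    {"name": "small", "lines_of_code": 3},
    {"name": "unknown"},                    # no lines_of_code key
    {"name": "big", "lines_of_code": 40},
    {"name": "zero", "lines_of_code": 0},   # falsy, so treated as 1
]

# Negating the key gives descending order; list.sort is stable, so
# nodes with equal keys keep their original relative order.
nodes.sort(key=lambda n: -(n.get("lines_of_code") or 1))

print([n["name"] for n in nodes])
```

Because the fallback is `1`, nodes without a LOC count sort together with genuine one-line functions at the bottom of the list.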
