
Commit 56c4bb6

Authored and committed by jgstern-agent
feat(kotlin): detect extension functions and mark is_exported (WI-fuhav)
Kotlin extension functions (``fun Receiver.name() { }``) were invisible to the dead-code-maybe BFS: the static call graph never reaches them, because a call site such as ``app.configure { }`` reads as a member call on the receiver, and the compiler binds it to the extension function using the receiver's static type, which the call graph does not model.

Spring Boot's SpringApplicationExtensions.kt was the top WI-tubot prospector example: it had 8339 cross-language hits and dominated the top-uncategorized list, because ``with`` is also the name of a Kotlin stdlib scope function.

Detect extension functions by walking the function_declaration children in order: a ``user_type`` child that appears BEFORE the ``identifier`` (name) field is the distinguishing marker. Plain functions have no preceding receiver; class method declarations use a different AST path via _get_enclosing_class and are unaffected.

When detected:
- Set Symbol.is_exported=True (extension functions are inherently part of the public API of their receiver type: something external invokes them).
- Record the receiver type text under meta.extension_receiver for future linker use (full receiver-typed dispatch resolution is a separate follow-up item).

Combined with the WI-zimum --seeds exports mode, these symbols now count as reachable via the public-API seed set in dead-code-maybe.

New helper in kotlin.py:
- _extract_kotlin_receiver_type(func_node, source): walks children in AST order, returning the user_type text when it precedes the identifier, None otherwise.

Tests (3 in TestKotlinExtensionFunctions):
- detects_extension_function: fun SpringApplication.configure(...)
- plain_function_not_flagged: fun greet() is NOT flagged
- extension_function_on_generic_receiver: fun List<Int>.sumSafe()

Refs: WI-fuhav-rigah-mojit-fuvom-vohim-jokiv-tosuj-ponah
Related: WI-tubot prospector 2026-04-11, WI-zimum Phase 1+2
Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
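The reachability effect described above can be pictured as a breadth-first search whose seed set optionally includes every `is_exported` symbol. This is a minimal illustration under stated assumptions, not hypergumbo's dead-code-maybe code: `Sym`, `dead_candidates`, and the `seeds` strings are hypothetical names, and the call graph is a hand-written list of callee names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sym:
    # Hypothetical, trimmed-down stand-in for a hypergumbo Symbol.
    name: str
    is_exported: bool = False
    is_entrypoint: bool = False
    calls: list[str] = field(default_factory=list)  # names of callees


def dead_candidates(symbols: list[Sym], seeds: str = "entrypoints") -> set[str]:
    """Names of symbols unreachable from the seed set.

    seeds="entrypoints" starts the BFS from entrypoints only;
    seeds="exports" also starts it from every is_exported symbol.
    """
    by_name = {s.name: s for s in symbols}
    frontier = deque(
        s.name for s in symbols
        if s.is_entrypoint or (seeds == "exports" and s.is_exported)
    )
    reached: set[str] = set(frontier)
    while frontier:
        for callee in by_name[frontier.popleft()].calls:
            if callee in by_name and callee not in reached:
                reached.add(callee)
                frontier.append(callee)
    return set(by_name) - reached


# Nothing in the graph calls the extension function `configure`, mirroring
# how receiver-style call sites go unresolved.
syms = [
    Sym("main", is_entrypoint=True, calls=["greet"]),
    Sym("greet"),
    Sym("configure", is_exported=True, calls=["applyDefaults"]),
    Sym("applyDefaults"),
]
print(sorted(dead_candidates(syms)))             # ['applyDefaults', 'configure']
print(sorted(dead_candidates(syms, "exports")))  # []
```

Seeding from exports also rescues everything the exported symbol calls (`applyDefaults` here), which is why flagging extension functions shrinks the candidate list by more than the extensions themselves.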
1 parent a37d112 commit 56c4bb6

4 files changed

Lines changed: 138 additions & 3 deletions


.ci/affected-tests.txt

Lines changed: 4 additions & 3 deletions
```diff
@@ -1,9 +1,9 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-11T11:18:54-04:00
+# Generated by smart-test at 2026-04-11T11:42:46-04:00
 # Mode: targeted
 # Baseline: 94a6f0ca6c69d3de637c39e8e0667dff2382edbf
-# Changed files: 26
-# Changed source files: 9
+# Changed files: 29
+# Changed source files: 10
 # Selected tests: 200
 #
 # === CHANGED_SOURCE_FILES ===
@@ -15,6 +15,7 @@ packages/hypergumbo-core/src/hypergumbo_core/linkers/go_memberlist.py
 packages/hypergumbo-core/src/hypergumbo_core/sketch.py
 packages/hypergumbo-core/src/hypergumbo_core/supply_chain.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/js_ts.py
+packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/kotlin.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/py.py
 # === SELECTED_TESTS ===
 packages/hypergumbo-core/tests/BRANCHES_test_compact.py
```

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -40,6 +40,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 - **Content-based `@generated` header detection** (WI-pofin): `classify_file` now scans the first 4 KiB of text-like source files for canonical generated-code headers in addition to the existing path-pattern check. Covered markers: `// @generated`, `/* @generated`, `# @generated`, `<!-- @generated -->` (JS/TS/Go/Python/Ruby/HTML/etc.); Go-stdlib `// Code generated by <tool>. DO NOT EDIT.`; and the alternate `Autogenerated`/`AUTO-GENERATED`/`automatically generated` spellings — all required to sit at a comment-line start so a mid-file prose mention of "generated" is not flagged. Content scan is gated on extension (`_CONTENT_SCAN_EXTS`, 36 text-like source suffixes) so images, archives, and other binary files are not opened. Bounded-cost: one 4 KiB read per scanned file. Follow-up for files flagged by header but not by path — e.g., repo code-gen that emits to hand-written directories.
 - **`is_generated_file` detection for TypeScript `openapi-gen/` output** (WI-vubad): extends `GENERATED_CODE_PATTERNS` in `supply_chain.py` with `(?:^|/)openapi-gen/.*\.(?:ts|tsx|js|jsx|mjs|cjs)$`. Airflow, FastAPI, and similar Python web projects ship a generated TypeScript SDK under an `openapi-gen/` directory (typical files: `request.ts`, `CancelablePromise.ts`, `ApiClient.ts`). The WI-tubot prospector run 2026-04-11 surfaced ~200 dead-code-maybe false positives per airflow run from a single `openapi-gen/requests/core/` directory; flagging those paths as generated demotes them out of the candidate ranking. Hand-written `request.ts` outside an `openapi-gen/` directory is not affected.
+- **Kotlin extension functions → `Symbol.is_exported`** (WI-fuhav): the Kotlin analyzer detects extension functions (`fun Receiver.name() { }`) by walking the `function_declaration` children in order — a `user_type` child that appears before the `identifier`/name field is the distinctive marker of an extension receiver. Extension-function symbols get `is_exported=True` (they're inherently part of the public API of their receiver type) and carry the receiver text under `meta.extension_receiver`. Addresses the WI-tubot prospector finding where spring-boot's `SpringApplicationExtensions.kt#with` and similar symbols dominated the dead-code candidate list because nothing in the static call graph reaches them. Follow-up: full receiver-typed dispatch resolution (emit edges from `receiver.extMethod()` call sites to matching extension functions in scope) is a separate larger item.
 - **TypeScript/JavaScript `export` → `Symbol.is_exported`** (WI-zimum Phase 2b, WI-nimug): the js_ts analyzer now marks `Symbol.is_exported=True` for any top-level declaration appearing under an `export_statement`. Covered syntaxes: `export function`, `export class`, `export const/let/var` (all variable declarators), `export { foo, bar }` clauses (including `export { foo as bar }` where the alias is used), `export default <named_decl>`, and `export default <identifier>`. Anonymous `export default` expressions without a name (`export default () => {}`, `export default 42`) are not flagged because there is no symbol name to join on. The module pseudo-node is never flagged. Pairs with WI-gipag (Python `__all__`) to close out WI-zimum Phase 2 for the two languages with the highest dead-code-maybe false-positive rates in the WI-tubot prospector cohort.
 - **Python `__all__` → `Symbol.is_exported`** (WI-zimum Phase 2, WI-gipag): the Python analyzer now decides `is_exported` for top-level classes and functions based on the module's `__all__` list (when present) or the leading-underscore convention (when absent). When a module defines `__all__ = ["foo", "bar"]`, only symbols whose name appears in the list are flagged exported — even if their name is public. When `__all__` is absent, any top-level (col_offset == 0) class/function whose name doesn't start with `_` is flagged exported; nested functions (col_offset > 0) are never flagged regardless of name. Supports `__all__` as a list literal, tuple literal, or annotated assignment (`__all__: list[str] = [...]`). Non-literal forms (`__all__ = other.__all__`, list comprehensions) are treated as "no __all__" and fall back to the underscore rule. Combined with the WI-zimum `--seeds exports` dead-code-maybe mode, this drops Python framework libraries (Airflow, Django, Superset — observed at 70–83 % dead rate in the WI-tubot prospector run) out of the false-positive bucket. Follow-up: TypeScript/JavaScript top-level `export` declarations (tracked separately).
 - **FFI-signature auto-flag for `dead-code-maybe` ranking** (WI-hadap Heuristic 2): dead-code candidates whose decorators or modifiers match FFI markers get a new `ffi_signature: true` field in the JSON output and a +10 rank boost so they sort above regular orphans. An FFI-shaped candidate that isn't reached from any entrypoint is almost definitionally a "free hit" — a missing cross-language linker edge. Covered markers: Rust `#[no_mangle]`, `#[pyo3*]`, `#[napi]`, `#[wasm_bindgen]`; Python `@ctypes.CFUNCTYPE`, `@cffi.*`, `@cython.*`; C/C++ `JNIEXPORT`, `extern`/`extern "C"`; C# `[DllImport]`; Java `native` modifier. Exact-match substring check on the decorator's name (so `pyo3::pyfunction` matches via the `pyo3` fragment) and exact-match string check on modifiers.
```
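The follow-up named in the Kotlin entry (edges from `receiver.extMethod()` call sites to matching extension functions) could key on the receiver text recorded under `meta.extension_receiver`. A minimal sketch, assuming an earlier pass has already inferred each call site's static receiver type (the hard part, which this does not attempt); `ExtFn`, `receiver_base`, and `link_extension_calls` are hypothetical names, not part of hypergumbo.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtFn:
    # Hypothetical record: an extension function's name plus the text that
    # the analyzer stores under meta["extension_receiver"].
    name: str
    receiver: str


def receiver_base(type_text: str) -> str:
    """Drop type arguments and a trailing '?': 'List<Int>' -> 'List'."""
    return re.sub(r"<.*>", "", type_text).strip().rstrip("?")


def link_extension_calls(
    ext_fns: list[ExtFn],
    call_sites: list[tuple[str, str]],  # (static receiver type, method name)
) -> list[tuple[tuple[str, str], ExtFn]]:
    """Emit (call_site, extension_fn) edges keyed on (base type, name)."""
    index: dict[tuple[str, str], list[ExtFn]] = {}
    for fn in ext_fns:
        index.setdefault((receiver_base(fn.receiver), fn.name), []).append(fn)
    edges: list[tuple[tuple[str, str], ExtFn]] = []
    for recv_type, method in call_sites:
        for fn in index.get((receiver_base(recv_type), method), []):
            edges.append(((recv_type, method), fn))
    return edges


fns = [ExtFn("configure", "SpringApplication"), ExtFn("sumSafe", "List<Int>")]
calls = [
    ("SpringApplication", "configure"),
    ("List<Int>", "sumSafe"),
    ("Foo", "configure"),  # no extension declared on Foo: no edge
]
edges = link_extension_calls(fns, calls)
print(len(edges))  # 2
```

Matching on the base type name with type arguments stripped over-approximates: a `List<String>` call site would also link to a `List<Int>` extension. For dead-code ranking that only errs toward "reachable", but a real linker would also need imports and scope.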

packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/kotlin.py

Lines changed: 65 additions & 0 deletions
```diff
@@ -78,6 +78,53 @@ def _find_child_by_field(node: "tree_sitter.Node", field_name: str) -> Optional[
     return node.child_by_field_name(field_name)
 
 
+def _extract_kotlin_receiver_type(
+    func_node: "tree_sitter.Node",
+    source: bytes,
+) -> Optional[str]:
+    """Return the receiver type name if *func_node* is an extension function.
+
+    WI-fuhav: tree-sitter-kotlin parses ``fun Receiver.name(...) { }``
+    as a ``function_declaration`` whose children run as
+    ``[modifiers?, fun, type_parameters?, user_type, '.', identifier, ...]``.
+    The ``user_type`` child sits BEFORE the ``identifier`` that carries
+    the function name — this is the distinguishing marker of an
+    extension function. Regular functions (``fun name() {}``) have
+    no receiver sibling.
+
+    Returns the text of the receiver user_type when detected, or
+    None for plain (non-extension) functions.
+    """
+    saw_user_type = False
+    receiver_text: Optional[str] = None
+    for child in func_node.children:
+        if child.type in ("simple_identifier", "identifier") and saw_user_type:
+            # We reached the function name after already seeing the
+            # receiver user_type — confirmed extension function.
+            return receiver_text
+        if child.type == "user_type":
+            # Record (but don't commit yet — the function_declaration
+            # body may contain user_type nodes further down that
+            # aren't receivers). We only commit if the NEXT identifier
+            # arrives before any structural block starts.
+            saw_user_type = True
+            receiver_text = (
+                child.text.decode("utf-8", errors="replace")
+                if child.text else None
+            )
+        elif child.type in (
+            "function_value_parameters",
+            "function_body",
+            "{",
+        ):
+            # These sit after the name; if we hit one before confirming
+            # the name, there was no preceding receiver. ``modifiers``
+            # and ``type_parameters`` come BEFORE the receiver (as in
+            # ``private fun <T> T.name()``), so they must not stop the walk.
+            return None
+    return None  # pragma: no cover - defensive fallback for malformed AST
+
+
 # Kotlin modifier keyword types grouped by category.
 # tree-sitter-kotlin wraps each in a typed node (visibility_modifier,
 # inheritance_modifier, etc.) whose single child is the keyword.
```
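The child-order rule can be exercised without tree-sitter by feeding it hand-built nodes. In this sketch `Node` is a hypothetical stand-in exposing only the `type`, `text`, and `children` attributes the walk reads, and `receiver_type`/`decl` are illustrative names. Because Kotlin's grammar places modifiers and type parameters before the receiver (`private fun <T> List<T>.firstOrDefault(...)`), the walk below stops only at nodes that follow the function name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for tree_sitter.Node: only the attributes read below.
    type: str
    text: Optional[bytes] = None
    children: list["Node"] = field(default_factory=list)


# Node types that only appear after the function name.
STOP_TYPES = ("function_value_parameters", "function_body")


def receiver_type(func_node: Node) -> Optional[str]:
    """Return the receiver text if a user_type precedes the name identifier."""
    receiver: Optional[str] = None
    for child in func_node.children:
        if child.type in ("simple_identifier", "identifier"):
            return receiver  # still None unless a receiver came first
        if child.type == "user_type":
            receiver = child.text.decode("utf-8", errors="replace") if child.text else None
        elif child.type in STOP_TYPES:
            return None
    return None


def decl(*parts: tuple[str, str]) -> Node:
    """Build a flat function_declaration from (node type, source text) pairs."""
    return Node("function_declaration", children=[Node(t, s.encode()) for t, s in parts])


plain = decl(("modifiers", "public"), ("fun", "fun"),
             ("simple_identifier", "greet"), ("function_value_parameters", "()"))
ext = decl(("modifiers", "private"), ("fun", "fun"), ("type_parameters", "<T>"),
           ("user_type", "List<T>"), (".", "."), ("simple_identifier", "firstOrDefault"),
           ("function_value_parameters", "(d: T)"))
print(receiver_type(plain))  # None
print(receiver_type(ext))    # List<T>
```

A `user_type` that appears after the name, such as a return type in `fun f(): Bar`, is never reached: the walk has already returned at the identifier.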
```diff
@@ -587,6 +634,23 @@ def _extract_symbols_from_file(
             if annotations:
                 func_meta = {"decorators": annotations}
 
+            # WI-fuhav: detect Kotlin extension functions.
+            # fun Receiver.name() {} is an extension function; the
+            # receiver type is the user_type sibling before the
+            # name identifier. Extension functions are inherently
+            # part of the public API of their receiver — something
+            # external calls them — so mark is_exported=True so
+            # dead-code-maybe's --seeds exports mode treats them
+            # as reachable. Also record the receiver type in meta
+            # for future linker use.
+            receiver_type = _extract_kotlin_receiver_type(node, source)
+            func_is_exported = False
+            if receiver_type is not None:
+                if func_meta is None:
+                    func_meta = {}
+                func_meta["extension_receiver"] = receiver_type
+                func_is_exported = True
+
             symbol = Symbol(
                 id=make_symbol_id("kotlin", str(file_path), start_line, end_line, full_name, kind),
                 name=full_name,
```
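The branch added in this hunk reduces to four outcomes, depending on whether the function carries annotations and whether a receiver was found. A hypothetical pure-function restatement (`ext_meta` is not a name from the codebase, and it assumes `func_meta` is `None` when there are no annotations) makes them easy to check.

```python
from typing import Any, Optional


def ext_meta(
    annotations: list[str],
    receiver_type: Optional[str],
) -> tuple[Optional[dict[str, Any]], bool]:
    """Return (meta, is_exported) the way the hunk builds them (hypothetical)."""
    # Assumption: meta starts as None unless the function has annotations.
    meta: Optional[dict[str, Any]] = (
        {"decorators": annotations} if annotations else None
    )
    is_exported = False
    if receiver_type is not None:
        # Extension function: create meta on demand and flag it exported.
        if meta is None:
            meta = {}
        meta["extension_receiver"] = receiver_type
        is_exported = True
    return meta, is_exported


print(ext_meta([], None))
print(ext_meta(["Deprecated"], "SpringApplication"))
```

Creating `meta` lazily keeps plain, unannotated functions at `meta=None`, so only extension functions or annotated functions pay for a dict.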
```diff
@@ -606,6 +670,7 @@ def _extract_symbols_from_file(
                 modifiers=modifiers,
                 meta=func_meta,
                 shape_id=_analyzer.compute_shape_id(node),
+                is_exported=func_is_exported,
             )
             analysis.symbols.append(symbol)
             analysis.symbol_by_name[func_name] = symbol
```

packages/hypergumbo-lang-mainstream/tests/test_kotlin.py

Lines changed: 68 additions & 0 deletions
```diff
@@ -75,6 +75,74 @@ def test_extracts_function(self, tmp_path: Path) -> None:
         assert "main" in func_names
         assert "helper" in func_names
 
+class TestKotlinExtensionFunctions:
+    """WI-fuhav: Kotlin extension function detection (``fun Receiver.name()``)."""
+
+    def test_detects_extension_function(self, tmp_path: Path) -> None:
+        """A ``fun Receiver.name()`` declaration is flagged as an extension."""
+        from hypergumbo_lang_mainstream.kotlin import analyze_kotlin
+
+        kt_file = tmp_path / "SpringApplicationExtensions.kt"
+        kt_file.write_text(
+            "package org.springframework.boot\n\n"
+            "class SpringApplication\n\n"
+            "fun SpringApplication.configure(block: () -> Unit) {\n"
+            "    block()\n"
+            "}\n",
+        )
+
+        result = analyze_kotlin(tmp_path)
+        funcs = [s for s in result.symbols if s.kind == "function"]
+        configure = next(
+            (s for s in funcs if s.name == "configure"), None,
+        )
+        assert configure is not None
+        assert configure.is_exported is True
+        assert configure.meta is not None
+        assert configure.meta.get("extension_receiver") == "SpringApplication"
+
+    def test_plain_function_not_flagged(self, tmp_path: Path) -> None:
+        """A regular (non-extension) top-level function is not flagged."""
+        from hypergumbo_lang_mainstream.kotlin import analyze_kotlin
+
+        kt_file = tmp_path / "Main.kt"
+        kt_file.write_text(
+            "fun greet(name: String) {\n"
+            "    println(\"hello, $name\")\n"
+            "}\n",
+        )
+        result = analyze_kotlin(tmp_path)
+        greet = next(
+            (s for s in result.symbols
+             if s.kind == "function" and s.name == "greet"),
+            None,
+        )
+        assert greet is not None
+        assert greet.is_exported is False
+        assert (greet.meta or {}).get("extension_receiver") is None
+
+    def test_extension_function_on_generic_receiver(
+        self, tmp_path: Path,
+    ) -> None:
+        """Extension on a generic receiver (``List<T>``) is detected."""
+        from hypergumbo_lang_mainstream.kotlin import analyze_kotlin
+
+        kt_file = tmp_path / "Ext.kt"
+        kt_file.write_text(
+            "fun List<Int>.sumSafe(): Int = this.fold(0) { acc, x -> acc + x }\n",
+        )
+        result = analyze_kotlin(tmp_path)
+        sum_safe = next(
+            (s for s in result.symbols
+             if s.kind == "function" and s.name == "sumSafe"),
+            None,
+        )
+        assert sum_safe is not None
+        assert sum_safe.is_exported is True
+        # Generic receiver text is preserved in the meta.
+        assert "List" in (sum_safe.meta or {}).get("extension_receiver", "")
+
+
 class TestKotlinClassExtraction:
     """Tests for extracting Kotlin classes."""
 
```