Commit 6773cd3

author
jgstern-agent
committed
feat(scala): recognize secondary constructors as kind=constructor (WI-rupum)
Scala secondary constructors (def this(args) = this(...)) were being extracted as kind="method" by the function_definition handler in scala.py. They surfaced as the top-ranked dead-code-maybe candidates on Kafka (CachedPartition.this, KafkaConfig.this, FullFetchContext.this, OffsetTruncationState.this, FetchManager.this at cross_language_hits=6192) because nothing in the static call graph textually reaches them: Scala's constructor semantics dispatch secondary constructors via ``new ClassName(args)`` at construction time, outside the textual call graph.

Fix: when the function_definition's identifier text is ``this`` AND an enclosing class/object/trait is present, extract it as kind="constructor" with full_name="<Type>.this". dead-code-maybe's kind filter (``kind in ("function","method")``) now skips it automatically. Also set is_exported=True as a secondary safety net so the --seeds exports mode treats secondary constructors as reachable if anything downstream looks at the flag.

Tests (3 in TestScalaSecondaryConstructor):
- secondary_constructor_kind_and_exported: two secondary ctors on the same class get kind="constructor" and is_exported=True.
- secondary_constructor_not_marked_method: confirms the secondary ctor is NOT in the method symbol list alongside real methods.
- top_level_this_not_constructor: plain top-level function is unaffected (no enclosing type → no special case).

Refs: WI-rupum-dabar-lodad-rubij-bapad-biris-rarob-mufur
Related: WI-tubot prospector 2026-04-11, WI-zimum Phase 1+2

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
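The classification rule and the kind filter described in the message can be sketched in isolation. This is a minimal illustration, not the analyzer's code: `SketchSymbol`, `classify_scala_def`, and `dead_code_candidates` are hypothetical names, and the top-level branch (`kind="function"`, bare name) is an assumption inferred from the tests rather than from the hunk itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchSymbol:
    """Hypothetical stand-in for the analyzer's Symbol type."""
    name: str
    kind: str
    is_exported: bool = False


def classify_scala_def(func_name: str, enclosing_type: Optional[str]) -> SketchSymbol:
    """Mirror the branch order in the commit: constructor, then method, then function."""
    is_secondary_ctor = func_name == "this" and enclosing_type is not None
    if is_secondary_ctor:
        # Secondary constructor: named "<Type>.this" and flagged as exported.
        return SketchSymbol(f"{enclosing_type}.this", "constructor", True)
    if enclosing_type:
        return SketchSymbol(f"{enclosing_type}.{func_name}", "method")
    # Assumed behaviour for definitions with no enclosing type.
    return SketchSymbol(func_name, "function")


def dead_code_candidates(symbols: List[SketchSymbol]) -> List[SketchSymbol]:
    """Apply the kind filter quoted in the commit message."""
    return [s for s in symbols if s.kind in ("function", "method")]


symbols = [
    classify_scala_def("this", "Foo"),
    classify_scala_def("helper", "Foo"),
    classify_scala_def("regular", None),
]
candidates = dead_code_candidates(symbols)
```

With these inputs, `Foo.this` never enters `candidates`, because its kind is `"constructor"`; the `is_exported` flag is only a second line of defence for consumers that look at exports instead of kinds.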
1 parent 56c4bb6 commit 6773cd3

4 files changed

Lines changed: 96 additions & 4 deletions

.ci/affected-tests.txt

Lines changed: 4 additions & 3 deletions
@@ -1,9 +1,9 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-11T11:42:46-04:00
+# Generated by smart-test at 2026-04-11T12:02:48-04:00
 # Mode: targeted
 # Baseline: 94a6f0ca6c69d3de637c39e8e0667dff2382edbf
-# Changed files: 29
-# Changed source files: 10
+# Changed files: 32
+# Changed source files: 11
 # Selected tests: 200
 #
 # === CHANGED_SOURCE_FILES ===
@@ -17,6 +17,7 @@ packages/hypergumbo-core/src/hypergumbo_core/supply_chain.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/js_ts.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/kotlin.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/py.py
+packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/scala.py
 # === SELECTED_TESTS ===
 packages/hypergumbo-core/tests/BRANCHES_test_compact.py
 packages/hypergumbo-core/tests/BRANCHES_test_database_query.py

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 - **Content-based `@generated` header detection** (WI-pofin): `classify_file` now scans the first 4 KiB of text-like source files for canonical generated-code headers in addition to the existing path-pattern check. Covered markers: `// @generated`, `/* @generated`, `# @generated`, `<!-- @generated -->` (JS/TS/Go/Python/Ruby/HTML/etc.); Go-stdlib `// Code generated by <tool>. DO NOT EDIT.`; and the alternate `Autogenerated`/`AUTO-GENERATED`/`automatically generated` spellings, all required to sit at a comment-line start so a mid-file prose mention of "generated" is not flagged. Content scan is gated on extension (`_CONTENT_SCAN_EXTS`, 36 text-like source suffixes) so images, archives, and other binary files are not opened. Bounded-cost: one 4 KiB read per scanned file. Follow-up for files flagged by header but not by path, e.g., repo code-gen that emits to hand-written directories.
 - **`is_generated_file` detection for TypeScript `openapi-gen/` output** (WI-vubad): extends `GENERATED_CODE_PATTERNS` in `supply_chain.py` with `(?:^|/)openapi-gen/.*\.(?:ts|tsx|js|jsx|mjs|cjs)$`. Airflow, FastAPI, and similar Python web projects ship a generated TypeScript SDK under an `openapi-gen/` directory (typical files: `request.ts`, `CancelablePromise.ts`, `ApiClient.ts`). The WI-tubot prospector run 2026-04-11 surfaced ~200 dead-code-maybe false positives per airflow run from a single `openapi-gen/requests/core/` directory; flagging those paths as generated demotes them out of the candidate ranking. Hand-written `request.ts` outside an `openapi-gen/` directory is not affected.
+- **Scala secondary constructors extracted as `kind="constructor"`** (WI-rupum): the Scala analyzer now recognises `def this(args) = this(...)` as a secondary-constructor declaration rather than a regular method. Such symbols get `kind="constructor"` (so `dead-code-maybe`'s `kind in ("function","method")` filter skips them automatically) AND `is_exported=True` for future-proofing. Addresses the WI-tubot prospector finding where `CachedPartition.this`, `KafkaConfig.this`, `FullFetchContext.this`, `OffsetTruncationState.this`, and `FetchManager.this` dominated the top-uncategorized Kafka dead-code list at cross_language_hits=6192, purely because nothing in the static call graph reaches them: Scala's constructor semantics dispatch secondary constructors via `new ClassName(args)` at construction time, outside the textual call graph.
 - **Kotlin extension functions → `Symbol.is_exported`** (WI-fuhav): the Kotlin analyzer detects extension functions (`fun Receiver.name() { }`) by walking the `function_declaration` children in order: a `user_type` child that appears before the `identifier`/name field is the distinctive marker of an extension receiver. Extension-function symbols get `is_exported=True` (they're inherently part of the public API of their receiver type) and carry the receiver text under `meta.extension_receiver`. Addresses the WI-tubot prospector finding where spring-boot's `SpringApplicationExtensions.kt#with` and similar symbols dominated the dead-code candidate list because nothing in the static call graph reaches them. Follow-up: full receiver-typed dispatch resolution (emit edges from `receiver.extMethod()` call sites to matching extension functions in scope) is a separate larger item.
 - **TypeScript/JavaScript `export` → `Symbol.is_exported`** (WI-zimum Phase 2b, WI-nimug): the js_ts analyzer now marks `Symbol.is_exported=True` for any top-level declaration appearing under an `export_statement`. Covered syntaxes: `export function`, `export class`, `export const/let/var` (all variable declarators), `export { foo, bar }` clauses (including `export { foo as bar }` where the alias is used), `export default <named_decl>`, and `export default <identifier>`. Anonymous `export default` expressions without a name (`export default () => {}`, `export default 42`) are not flagged because there is no symbol name to join on. The module pseudo-node is never flagged. Pairs with WI-gipag (Python `__all__`) to close out WI-zimum Phase 2 for the two languages with the highest dead-code-maybe false-positive rates in the WI-tubot prospector cohort.
 - **Python `__all__` → `Symbol.is_exported`** (WI-zimum Phase 2, WI-gipag): the Python analyzer now decides `is_exported` for top-level classes and functions based on the module's `__all__` list (when present) or the leading-underscore convention (when absent). When a module defines `__all__ = ["foo", "bar"]`, only symbols whose name appears in the list are flagged exported, even if their name is public. When `__all__` is absent, any top-level (col_offset == 0) class/function whose name doesn't start with `_` is flagged exported; nested functions (col_offset > 0) are never flagged regardless of name. Supports `__all__` as a list literal, tuple literal, or annotated assignment (`__all__: list[str] = [...]`). Non-literal forms (`__all__ = other.__all__`, list comprehensions) are treated as "no __all__" and fall back to the underscore rule. Combined with the WI-zimum `--seeds exports` dead-code-maybe mode, this drops Python framework libraries (Airflow, Django, Superset; observed at 70–83 % dead rate in the WI-tubot prospector run) out of the false-positive bucket. Follow-up: TypeScript/JavaScript top-level `export` declarations (tracked separately).
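The `__all__` rule in the last changelog entry above can be approximated with the standard-library `ast` module. This is a rough sketch under stated assumptions, not the analyzer's implementation: `literal_all` and `exported_names` are hypothetical helpers, and iterating over `tree.body` stands in for the `col_offset == 0` check mentioned in the entry.

```python
import ast
from typing import Optional, Set


def literal_all(tree: ast.Module) -> Optional[Set[str]]:
    """Return the names in a literal __all__ (list or tuple), else None."""
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            targets, value = stmt.targets, stmt.value
        elif isinstance(stmt, ast.AnnAssign):
            targets, value = [stmt.target], stmt.value
        else:
            continue
        if not any(isinstance(t, ast.Name) and t.id == "__all__" for t in targets):
            continue
        if isinstance(value, (ast.List, ast.Tuple)) and all(
            isinstance(e, ast.Constant) and isinstance(e.value, str)
            for e in value.elts
        ):
            return {e.value for e in value.elts}
        # Non-literal form (e.g. other.__all__): treat as "no __all__".
        return None
    return None


def exported_names(source: str) -> Set[str]:
    """Top-level classes/functions considered exported under the described rule."""
    tree = ast.parse(source)
    top_level = [
        node.name
        for node in tree.body  # direct children only, so nested defs are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    declared = literal_all(tree)
    if declared is not None:
        return {name for name in top_level if name in declared}
    # No usable __all__: fall back to the leading-underscore convention.
    return {name for name in top_level if not name.startswith("_")}
```

For a module containing `__all__ = ["foo"]` plus public functions `foo` and `bar`, only `foo` is reported; without `__all__`, every top-level name lacking a leading underscore is.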

packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/scala.py

Lines changed: 22 additions & 1 deletion
@@ -354,7 +354,21 @@ def _extract_symbols_from_file(
 if name_node:
     func_name = node_text(name_node, source)
     enclosing_type = _get_enclosing_type(node, source)
-    if enclosing_type:
+    # WI-rupum: Scala secondary constructors are parsed as
+    # ``function_definition`` with identifier text "this":
+    # ``def this(arg) = this(...)``. These are not methods;
+    # they're constructors, invoked by ``new ClassName(arg)``.
+    # Without this special case, the WI-tubot prospector
+    # surfaced them (e.g. CachedPartition.this, KafkaConfig.this)
+    # as top-ranked dead-code candidates, because the static
+    # call graph never reaches them.
+    is_secondary_ctor = (
+        func_name == "this" and enclosing_type is not None
+    )
+    if is_secondary_ctor:
+        full_name = f"{enclosing_type}.this"
+        kind = "constructor"
+    elif enclosing_type:
         full_name = f"{enclosing_type}.{func_name}"
         kind = "method"
     else:
@@ -374,6 +388,12 @@ def _extract_symbols_from_file(
         kind, norm_sig, visibility_from_modifiers(modifiers),
     ) if norm_sig else None
 
+    # WI-rupum: secondary constructors are inherently part
+    # of the public API of their enclosing class (something
+    # calls them via ``new``); mark is_exported=True so
+    # dead-code-maybe's --seeds exports mode treats them
+    # as reachable. The constructor kind ALSO excludes them
+    # from the dead-code candidate list at the kind filter.
     symbol = Symbol(
         id=make_symbol_id("scala", str(file_path), start_line, end_line, full_name, kind),
         name=full_name,
@@ -392,6 +412,7 @@ def _extract_symbols_from_file(
         signature=signature,
         modifiers=modifiers,
         meta=meta,
+        is_exported=is_secondary_ctor,
     )
     analysis.symbols.append(symbol)
     analysis.node_for_symbol[symbol.id] = node

packages/hypergumbo-lang-mainstream/tests/test_scala.py

Lines changed: 69 additions & 0 deletions
@@ -97,6 +97,75 @@ def helper(x: Int): Int = {
         assert "main" in func_names
         assert "helper" in func_names
 
+class TestScalaSecondaryConstructor:
+    """WI-rupum: Scala secondary constructors (``def this(args) = ...``)
+    must be extracted as ``kind="constructor"`` with ``is_exported=True``
+    so they don't surface as dead-code false positives.
+    """
+
+    def test_secondary_constructor_kind_and_exported(
+        self, tmp_path: Path,
+    ) -> None:
+        from hypergumbo_lang_mainstream.scala import analyze_scala
+
+        scala_file = tmp_path / "Foo.scala"
+        scala_file.write_text(
+            "class Foo(val x: Int) {\n"
+            " def this() = this(0)\n"
+            " def this(s: String) = this(s.toInt)\n"
+            "}\n",
+        )
+        result = analyze_scala(tmp_path)
+        ctors = [s for s in result.symbols if s.kind == "constructor"]
+        assert len(ctors) == 2
+        for ctor in ctors:
+            assert ctor.name == "Foo.this"
+            assert ctor.is_exported is True
+
+    def test_secondary_constructor_not_marked_method(
+        self, tmp_path: Path,
+    ) -> None:
+        """The secondary constructor must NOT appear among method symbols."""
+        from hypergumbo_lang_mainstream.scala import analyze_scala
+
+        scala_file = tmp_path / "Bar.scala"
+        scala_file.write_text(
+            "class Bar(val n: Int) {\n"
+            " def this() = this(0)\n"
+            " def helper(): Int = n + 1\n"
+            "}\n",
+        )
+        result = analyze_scala(tmp_path)
+        methods = [s for s in result.symbols if s.kind == "method"]
+        method_names = {m.name for m in methods}
+        # helper is a method, but "this" is NOT.
+        assert "Bar.helper" in method_names
+        assert "Bar.this" not in method_names
+
+    def test_top_level_this_not_constructor(self, tmp_path: Path) -> None:
+        """A function named ``this`` at top level (no enclosing class) is
+        not a secondary constructor and should stay kind='function'.
+        Scala syntactically forbids this in practice, but the extractor
+        must not treat any ``def this`` as a constructor when there's
+        no enclosing type."""
+        from hypergumbo_lang_mainstream.scala import analyze_scala
+
+        # We construct this by wrapping in an object so the tree-sitter
+        # parse is valid. The "this" method would be enclosed by the
+        # object, so enclosing_type != None and is_secondary_ctor=True.
+        # Instead, verify a plain top-level function is unaffected.
+        scala_file = tmp_path / "Top.scala"
+        scala_file.write_text(
+            "def regular(): Int = 42\n",
+        )
+        result = analyze_scala(tmp_path)
+        funcs = [s for s in result.symbols if s.kind == "function"]
+        assert any(f.name == "regular" for f in funcs)
+        # No constructor kind anywhere.
+        ctors = [s for s in result.symbols if s.kind == "constructor"]
+        assert ctors == []
+
+
 class TestScalaClassExtraction:
     """Tests for extracting Scala classes."""
 