
Commit 3532b32

author
jgstern-agent
committed
fix(haskell): parse module exports → is_exported on symbols (WI-buvun / UAT BUG-12)
shellcheck had 0 recognized exports on UAT despite every module declaring `module X (export1, export2, ...) where`. The dead-code-maybe seed sets `exports` and `tests` were useless on Haskell: every symbol had is_exported=False because the analyzer never parsed module headers.

This PR adds export parsing:

- New _extract_module_exports walks the tree-sitter `header` node, finds the optional `exports` child, and collects names from each `export` entry. Functions / values come from `variable`; types and classes come from `name`.
- Returns None when no export list is present (the Haskell default is "export everything top-level"), or a set of names when one is.
- _extract_symbols_from_file consumes the tri-state and sets Symbol.is_exported correctly:
  * None → True for every module-level binding
  * set → True iff name in set
  * instances → True iff their class or type is in set (instances aren't named in Haskell exports but are externally reachable through them)

Net effect: dead-code-maybe's exports / tests seed selection now works on Haskell. shellcheck and similar projects will get meaningful candidate lists.

5 tests in TestHaskellModuleExports:

- Explicit list marks listed names; private names stay False.
- No export list = export-everything default.
- Type names in the list are exported (Person(..), Address).
- Class names in the list are exported.
- Instances follow class or type membership.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
1 parent 7c6c0dd commit 3532b32
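
The export decision described in the commit message can be sketched in isolation. This is a minimal illustration, not the analyzer's code: the helper name `decide_is_exported` and its signature are assumptions made for the example, and instance names are assumed to have the form "ClassName TypeName" as the commit describes.

```python
from typing import Optional


def decide_is_exported(name: str, kind: str, exported_names: Optional[set[str]]) -> bool:
    """Hypothetical helper mirroring the rule stated in the commit message."""
    if exported_names is None:
        # No export list: Haskell exports every top-level binding.
        return True
    if kind == "instance":
        # Assumed name shape "ClassName TypeName"; either part being listed counts.
        return any(part in exported_names for part in name.split())
    # Explicit export list: only listed names are exported.
    return name in exported_names


exports = {"Showable", "publicFn"}
print(decide_is_exported("publicFn", "function", exports))
print(decide_is_exported("privateHelper", "function", exports))
print(decide_is_exported("Showable Person", "instance", exports))
print(decide_is_exported("privateHelper", "function", None))
```

Keeping None distinct from an empty set matters here: `module Foo () where` exports nothing, while `module Foo where` exports everything.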

4 files changed

Lines changed: 188 additions & 4 deletions


.ci/affected-tests.txt

Lines changed: 6 additions & 4 deletions
@@ -1,17 +1,18 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-15T10:12:20-04:00
+# Generated by smart-test at 2026-04-15T10:38:39-04:00
 # Mode: targeted
 # Baseline: 02dba9744d2c86e26f06565aad4ebcae7ef0f4a8
-# Changed files: 38
-# Changed source files: 7
-# Selected tests: 66
+# Changed files: 41
+# Changed source files: 8
+# Selected tests: 67
 #
 # === CHANGED_SOURCE_FILES ===
 packages/hypergumbo-core/src/hypergumbo_core/cli.py
 packages/hypergumbo-core/src/hypergumbo_core/_hf_noise.py
 packages/hypergumbo-core/src/hypergumbo_core/io_boundary.py
 packages/hypergumbo-core/src/hypergumbo_core/profile.py
 packages/hypergumbo-core/src/hypergumbo_core/sketch_embeddings.py
+packages/hypergumbo-lang-common/src/hypergumbo_lang_common/haskell.py
 packages/hypergumbo-lang-mainstream/src/hypergumbo_lang_mainstream/php.py
 packages/hypergumbo-tracker/src/hypergumbo_tracker/cli.py
 # === SELECTED_TESTS ===
@@ -54,6 +55,7 @@ packages/hypergumbo-core/tests/test_tree_sitter_analyzer.py
 packages/hypergumbo-core/tests/test_verify_claims.py
 packages/hypergumbo-lang-common/tests/BRANCHES_test_dart.py
 packages/hypergumbo-lang-common/tests/BRANCHES_test_elixir.py
+packages/hypergumbo-lang-common/tests/BRANCHES_test_haskell.py
 packages/hypergumbo-lang-common/tests/test_haskell.py
 packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_cpp.py
 packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_c.py

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 ### Added
 
+- **Haskell module exports recognized as dead-code seeds** (WI-buvun / UAT BUG-12): the Haskell analyzer now parses module headers (`module Foo (publicFn, Type(..)) where`) and marks listed symbols as `is_exported=True`. Modules with no export list (`module Foo where`) get the Haskell default — every top-level binding is exported. Type names in the export list count for data/class declarations. Instances follow their class/type: an `instance ClassName TypeName` is considered exported when either name is in the export list (instances aren't named directly in Haskell exports but are externally reachable through them). Before this, every Haskell symbol had `is_exported=False` and the `exports` / `tests` seed sets for `dead-code-maybe` were useless on Haskell — UAT showed shellcheck had 0 recognized exports.
 - **Yesod framework detection + pattern set** (WI-vabiv / UAT BUG-16): Yesod (Haskell — haskellers) is now detected from `*.cabal` / `package.yaml` dependencies (`yesod`, `yesod-core`, `yesod-auth`, `yesod-persistent`) and a new `frameworks/yesod.yaml` ships patterns for the Yesod conventions: `mkYesod` / `mkYesodData` / `mkYesodSubData` / `parseRoutes` quasi-quoter calls, Warp runner (`warp`/`warpTLS`/`toWaiApp`), `Yesod` / `YesodSubsite` typeclass memberships, `RenderRoute` / `ParseRoute` router class, standard mixin typeclasses (`YesodPersist`, `YesodAuth`, `YesodBreadcrumbs`), and the `<method><Resource>R` handler naming convention (getHomeR, postUserR, deleteUserR, ...). Route materialization for the non-GET/non-POST methods is a later materializer expansion; concepts are attached today so downstream analysis recognizes the handlers as web-handler entry points.
 - **Elixir I/O primitive catalog** (WI-vibur / UAT BUG-09b): new `io_primitives/elixir.yaml` fixes the "0 boundaries on plausible (Phoenix/Ecto)" gap. Covers Elixir stdlib (`File.read`/`write`/`stream!`, `IO.puts`/`write`/`read`, `System.cmd`/`get_env`, `Logger.*`), the idiomatic HTTP-client galaxy (`HTTPoison`, `Tesla`, `Req`, `Finch`, `Mint.HTTP*`, plus Erlang `:httpc`), Phoenix server surface (`Phoenix.Router.get/post/...`, `Phoenix.Controller.render/json`, `Plug.Conn.send_resp/read_body`, `Phoenix.Channel.broadcast`, `Phoenix.Endpoint.broadcast`), databases (Ecto.Repo read/write verbs, `Ecto.Multi`, `Postgrex.query`, `MyXQL.query`, `Redix.command`), and IPC (`GenServer.call/cast`, `Process.send`, `Oban.insert`, `Task.async`). `elixir` added to `_CATALOG_PARENTS` with `erlang` as parent so atom-access into Erlang (`:gen_tcp.send`, `:ets.lookup`, `:file.read_file`) is still matched. Elixir-specific `ambiguous_names` prevents Elixir pipe/Enum verbs and scope functions from producing short-name false positives.
 - **Kotlin I/O primitive catalog** (WI-rujos / UAT BUG-09d): new `io_primitives/kotlin.yaml` with Kotlin-specific entries. Kotlin was previously aliased to `java.yaml` verbatim via `_CATALOG_ALIASES`, which produced only 1 boundary (net_send) on detekt because Kotlin idiom uses extension functions and top-level stdlib functions that have no Java analog. The new catalog covers: `kotlin.io` File extensions (`readText`, `writeText`, `forEachLine`, `useLines`, `copyTo`, `walk`), `kotlin.io.path` Path extensions (Kotlin 1.5+), top-level `println`/`print` (receiver `kotlin.io.ConsoleKt`), ktor client + server, `android.util.Log`, `kotlin-logging` (`mu.KLogger` and the 5.x relocated `io.github.oshai.kotlinlogging.KLogger`), and Exposed ORM read/write. Kotlin-specific `ambiguous_names` prevent scope functions (`apply`, `run`, `let`, `use`) and coroutine/Flow verbs (`send`, `receive`, `collect`) from producing short-name false positives. `kotlin` moved from `_CATALOG_ALIASES` to `_CATALOG_PARENTS` so the Java parent still provides the raw `java.io/java.net/JDBC/SLF4J` entries for Kotlin code that uses those APIs directly.

packages/hypergumbo-lang-common/src/hypergumbo_lang_common/haskell.py

Lines changed: 58 additions & 0 deletions
@@ -146,6 +146,42 @@ def _extract_haskell_signature(
     return None  # pragma: no cover - defensive, called only when signature exists
 
 
+def _extract_module_exports(tree: "tree_sitter.Tree", source: bytes) -> Optional[set[str]]:
+    """Parse the module header's optional export list.
+
+    Returns a set of exported names when the module declares
+    ``module Foo.Bar (a, b, Type(..)) where``; returns None when no
+    export list is present (``module Foo.Bar where``) — Haskell's
+    default is to export every top-level binding in that case.
+
+    The caller uses the tri-state to decide the ``is_exported`` flag:
+    - None → mark all module-level symbols as exported (WI-buvun).
+    - a set → mark only names in the set as exported; others are
+      module-private.
+    """
+    for node in iter_tree(tree.root_node):
+        if node.type != "header":
+            continue
+        exports_node = find_child_by_type(node, "exports")
+        if exports_node is None:
+            return None
+        names: set[str] = set()
+        for child in exports_node.children:
+            if child.type != "export":
+                continue
+            # Each `export` has either a `variable` (fn/value) or a
+            # `name` (type / constructor). Both count as exported.
+            var = find_child_by_type(child, "variable")
+            if var is not None:
+                names.add(node_text(var, source))
+                continue
+            type_name = find_child_by_type(child, "name")
+            if type_name is not None:
+                names.add(node_text(type_name, source))
+        return names
+    return None  # pragma: no cover - every Haskell file has a header
+
+
 def _extract_symbols_from_file(
     tree: "tree_sitter.Tree",
     source: bytes,
@@ -163,6 +199,12 @@ def _extract_symbols_from_file(
     symbols: list[Symbol] = []
     seen_names: set[str] = set()
 
+    # WI-buvun: parse the module header's export list so module-level
+    # symbols carry is_exported correctly. shellcheck had zero exports
+    # recognized before this — the `exports` and `tests` seed sets for
+    # `dead-code-maybe` were therefore useless on Haskell.
+    exported_names = _extract_module_exports(tree, source)
+
     # First pass: collect type signatures
     type_signatures: dict[str, str] = {}
 
@@ -196,6 +238,21 @@ def add_symbol(
             end_col=node.end_point[1],
         )
         sym_id = make_symbol_id("haskell", file_path, start_line, end_line, name, kind)
+        # WI-buvun: is_exported tri-state. No export list = all top-level
+        # bindings are exported (Haskell default). With a list, only
+        # names explicitly in it are exported. Instances aren't named in
+        # the export list the same way, so they follow the class they
+        # implement — if that's exported, they're externally reachable.
+        if exported_names is None:
+            is_exported = True
+        elif kind == "instance":
+            # Instance: exported iff its class/type name is exported.
+            # `name` here is "ClassName TypeName"; match either.
+            parts = name.split()
+            is_exported = any(p in exported_names for p in parts)
+        else:
+            is_exported = name in exported_names
+
         symbols.append(Symbol(
             id=sym_id,
             name=name,
@@ -206,6 +263,7 @@ def add_symbol(
             origin=PASS_ID,
             origin_run_id=run_id,
             signature=signature,
+            is_exported=is_exported,
         ))
 
     # Second pass: extract symbols
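
To see the return contract of `_extract_module_exports` without a tree-sitter grammar installed, here is a rough stand-in that uses a regular expression instead of the `header` / `exports` / `export` nodes. The function name `extract_module_exports_approx` and the regex are assumptions made for illustration. It ignores comments, operator exports such as `(<>)`, and `module X` re-exports, so it only approximates what the analyzer does.

```python
import re
from typing import Optional

# Matches `module Name (exports) where` or `module Name where`.
# The parenthesised export list is optional, which gives the None case.
_HEADER_RE = re.compile(
    r"^\s*module\s+[A-Z][\w.']*\s*(?:\((?P<exports>.*?)\))?\s*where\b",
    re.DOTALL | re.MULTILINE,
)


def extract_module_exports_approx(source: str) -> Optional[set[str]]:
    """Regex approximation (not the tree-sitter walk used in the commit)."""
    match = _HEADER_RE.search(source)
    if match is None or match.group("exports") is None:
        # No header found, or a header without an export list.
        return None
    # Drop constructor sub-lists such as `(..)` so only the head name remains.
    flat = re.sub(r"\([^()]*\)", "", match.group("exports"))
    names: set[str] = set()
    for entry in flat.split(","):
        entry = entry.strip()
        if entry:
            names.add(entry.split()[-1])
    return names


print(extract_module_exports_approx("module Types (Person(..), Address) where\n"))
print(extract_module_exports_approx("module Bar where\nfoo = 1\n"))
```

An empty export list (`module Foo () where`) yields an empty set rather than None, which keeps "exports nothing" separate from "exports everything".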

packages/hypergumbo-lang-common/tests/test_haskell.py

Lines changed: 123 additions & 0 deletions
@@ -756,3 +756,126 @@ def test_normal_name_call_gets_full_confidence(self, tmp_path: Path) -> None:
         assert len(helper_edges) >= 1
         for edge in helper_edges:
             assert edge.confidence > 0.50
+
+
+# ============================================================================
+# WI-buvun: module export list parsing → is_exported on symbols
+# ============================================================================
+
+
+class TestHaskellModuleExports:
+    """WI-buvun: dead-code-maybe needs is_exported on Haskell symbols.
+
+    Before this fix, every Haskell symbol had is_exported=False because
+    the analyzer never parsed module headers. The `exports` and `tests`
+    seed sets for `dead-code-maybe` were therefore empty on Haskell —
+    UAT BUG-12: shellcheck had 0 recognized exports.
+    """
+
+    def test_explicit_export_list_marks_listed_symbols(self, tmp_path) -> None:
+        """`module Foo (a, b) where` exports only a and b; c is private."""
+        from hypergumbo_lang_common.haskell import analyze_haskell
+
+        (tmp_path / "Foo.hs").write_text("""module Foo (publicFn, exportedValue) where
+publicFn :: Int -> Int
+publicFn x = x + 1
+
+exportedValue :: Int
+exportedValue = 42
+
+privateHelper :: Int -> Int
+privateHelper x = x * 2
+""")
+        result = analyze_haskell(tmp_path)
+        by_name = {s.name: s for s in result.symbols if s.name in (
+            "publicFn", "exportedValue", "privateHelper",
+        )}
+        assert by_name["publicFn"].is_exported is True
+        assert by_name["exportedValue"].is_exported is True
+        assert by_name["privateHelper"].is_exported is False
+
+    def test_no_export_list_marks_all_top_level_exported(self, tmp_path) -> None:
+        """`module Foo where` (no parens) is Haskell's "export everything"
+        default — every top-level binding is exported.
+        """
+        from hypergumbo_lang_common.haskell import analyze_haskell
+
+        (tmp_path / "Bar.hs").write_text("""module Bar where
+foo :: Int
+foo = 1
+
+bar :: Int -> Int
+bar x = x + foo
+
+baz :: String
+baz = "hello"
+""")
+        result = analyze_haskell(tmp_path)
+        for sym in result.symbols:
+            if sym.name in ("foo", "bar", "baz"):
+                assert sym.is_exported is True, (
+                    f"{sym.name} should be exported (no export list "
+                    f"means Haskell's export-everything default)"
+                )
+
+    def test_data_type_in_export_list_marks_exported(self, tmp_path) -> None:
+        """Type names in the export list (e.g. `Type(..)`) are exported."""
+        from hypergumbo_lang_common.haskell import analyze_haskell
+
+        (tmp_path / "Types.hs").write_text("""module Types (Person(..), Address) where
+data Person = Person { name :: String, age :: Int }
+data Address = Address { street :: String }
+data Internal = Internal Int
+""")
+        result = analyze_haskell(tmp_path)
+        by_name = {s.name: s for s in result.symbols
+                   if s.name in ("Person", "Address", "Internal")}
+        assert by_name["Person"].is_exported is True
+        assert by_name["Address"].is_exported is True
+        assert by_name["Internal"].is_exported is False
+
+    def test_class_in_export_list_marks_exported(self, tmp_path) -> None:
+        """Type class names in the export list are exported."""
+        from hypergumbo_lang_common.haskell import analyze_haskell
+
+        (tmp_path / "Classes.hs").write_text("""module Classes (Showable, internalClass) where
+class Showable a where
+    showMe :: a -> String
+
+class InternalClass a where
+    internalOp :: a -> Int
+
+internalClass :: Int
+internalClass = 0
+""")
+        result = analyze_haskell(tmp_path)
+        by_name = {s.name: s for s in result.symbols if s.name in (
+            "Showable", "InternalClass",
+        )}
+        assert by_name["Showable"].is_exported is True
+        assert by_name["InternalClass"].is_exported is False
+
+    def test_instance_exported_iff_class_or_type_exported(
+        self, tmp_path,
+    ) -> None:
+        """An instance `instance ClassName TypeName where ...` is
+        considered exported when ClassName or TypeName is in the export
+        list — instances aren't named explicitly in Haskell exports but
+        they're externally reachable through their class / type.
+        """
+        from hypergumbo_lang_common.haskell import analyze_haskell
+
+        (tmp_path / "Insts.hs").write_text("""module Insts (Showable, Person) where
+class Showable a where
+    showMe :: a -> String
+
+data Person = Person String
+
+instance Showable Person where
+    showMe (Person n) = n
+""")
+        result = analyze_haskell(tmp_path)
+        instance_syms = [s for s in result.symbols if s.kind == "instance"]
+        assert len(instance_syms) >= 1
+        # Both Showable and Person are in exports, so the instance is exported
+        assert any(s.is_exported for s in instance_syms)
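
The tests above pin down `is_exported`; the reason the flag matters is that an export-seeded reachability pass has nothing to start from when every symbol reports False. The sketch below shows that effect under stated assumptions: `Sym`, `dead_code_candidates`, and the `calls` mapping are hypothetical stand-ins, and the real `dead-code-maybe` seed selection is not shown in this commit and may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for the analyzer's Symbol."""
    name: str
    is_exported: bool


def dead_code_candidates(symbols: list[Sym], calls: dict[str, list[str]]) -> list[str]:
    """Return names not reachable from exported seeds (illustrative only)."""
    seeds = [s.name for s in symbols if s.is_exported]
    reachable = set(seeds)
    stack = list(seeds)
    while stack:
        current = stack.pop()
        for callee in calls.get(current, ()):
            if callee not in reachable:
                reachable.add(callee)
                stack.append(callee)
    return sorted(s.name for s in symbols if s.name not in reachable)


calls = {"publicFn": ["helper"]}
after_fix = [Sym("publicFn", True), Sym("helper", False), Sym("orphan", False)]
before_fix = [Sym(s.name, False) for s in after_fix]

print(dead_code_candidates(after_fix, calls))   # ['orphan']
print(dead_code_candidates(before_fix, calls))  # ['helper', 'orphan', 'publicFn']
```

With no exported seeds, every symbol looks unreachable, so the candidate list carries no signal.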
