Feat/elixir support#228

Open
oberernst wants to merge 8 commits into colbymchenry:main from oberernst:feat/elixir-support
@oberernst
What it says on the tin: a pass at Elixir support in codegraph.

oberernst and others added 8 commits May 20, 2026 17:50
Adds 'elixir' to the Language union, .ex/.exs to EXTENSION_MAP, and
wires tree-sitter-elixir.wasm (from the tree-sitter-wasms package) into
WASM_GRAMMAR_FILES and getLanguageDisplayName. Also extends
DEFAULT_CONFIG.include with **/*.ex and **/*.exs.

After this commit .ex/.exs files are recognised and the grammar loads,
but the default visitor emits no nodes for Elixir AST shapes — phase 2
adds the dispatcher.
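
The registration step above might look roughly like the sketch below. The `Language` union is trimmed to three members for brevity, and `detectLanguage` is a hypothetical helper introduced only for illustration; codegraph's real `EXTENSION_MAP` and lookup code may be shaped differently.

```typescript
// Illustrative only: a trimmed-down Language union and extension map.
type Language = "typescript" | "scala" | "elixir";

const EXTENSION_MAP: Record<string, Language> = {
  ".ts": "typescript",
  ".scala": "scala",
  ".ex": "elixir",
  ".exs": "elixir",
};

// Hypothetical helper: returns the language registered for a path's
// extension, or undefined when the extension is unknown.
function detectLanguage(filePath: string): Language | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  return EXTENSION_MAP[filePath.slice(dot).toLowerCase()];
}
```

Both compiled sources (`.ex`) and scripts (`.exs`) map to the same language tag, which matches the commit's description of treating them alike.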

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds src/extraction/languages/elixir.ts with a visitNode hook that
dispatches on the identifier text of `call` nodes (since
tree-sitter-elixir represents every keyword form — defmodule, def,
alias, if, case — as a call whose first child is an identifier).
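
A minimal sketch of that dispatch idea, assuming a simplified node shape (`NodeLike`) rather than the real tree-sitter types. `classifyCall` and its return type are hypothetical names, and treating `defmacro`/`defmacrop` as public/private mirrors Elixir semantics rather than anything this commit states about its own output.

```typescript
// Simplified stand-in for a tree-sitter syntax node (assumption).
interface NodeLike {
  type: string;
  text: string;
  namedChildren: NodeLike[];
}

type Classified =
  | { kind: "module" }
  | { kind: "function"; visibility: "public" | "private" }
  | null;

// Dispatch on the text of the identifier that heads a `call` node.
// Returns null for anything that is not a recognised definition form.
function classifyCall(node: NodeLike): Classified {
  if (node.type !== "call") return null;
  const head = node.namedChildren[0];
  if (!head || head.type !== "identifier") return null;
  switch (head.text) {
    case "defmodule":
      return { kind: "module" };
    case "def":
    case "defmacro":
      return { kind: "function", visibility: "public" };
    case "defp":
    case "defmacrop":
      return { kind: "function", visibility: "private" };
    default:
      return null;
  }
}
```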

Implements:

  - defmodule Foo.Bar do … end → module node, name "Foo.Bar"
  - def name(args)             → function "name/arity", public
  - defp name(args)            → function "name/arity", private
  - defmacro / defmacrop       → function with arity
  - Multi-clause defs merged   → one node per name/arity per module
  - @moduledoc "…"             → attached to enclosing module's docstring

Out of scope (later phases): @doc, @behaviour, @callback, @spec,
alias/import/require/use, defstruct, defprotocol/defimpl, and call-site
extraction. Those are stubbed by swallowing the relevant call/attribute
nodes so the default walker doesn't mis-interpret their operands.
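
The multi-clause merging listed above can be pictured as an upsert keyed on module, name, and arity. This is only a sketch: `FunctionRegistry`, `FunctionNode`, and the key format are assumptions made for illustration, not the structures used in `elixir.ts`.

```typescript
// Hypothetical node shape; `clauses` counts how many def clauses were merged.
interface FunctionNode {
  id: string;
  name: string;
  clauses: number;
}

class FunctionRegistry {
  private byKey = new Map<string, FunctionNode>();

  // Returns the existing node for module + name/arity, creating it on the
  // first clause and reusing it for every later clause.
  upsert(moduleName: string, name: string, arity: number): FunctionNode {
    const key = `${moduleName}::${name}/${arity}`;
    let node = this.byKey.get(key);
    if (!node) {
      node = { id: key, name: `${name}/${arity}`, clauses: 0 };
      this.byKey.set(key, node);
    }
    node.clauses += 1;
    return node;
  }

  count(): number {
    return this.byKey.size;
  }
}
```

Two clauses of `fact/1` in the same module share one node under this scheme, while `fact/2` gets its own, because arity is part of the key.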

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each directive produces one or more `import` nodes plus an
`imports`-kind unresolved reference from the enclosing module. The
ReferenceResolver will turn those into `imports` edges once it can
match the dotted module name to a known module node.

The full directive text lives on the import node's signature
(e.g. "use Phoenix.LiveView, layout: {…}"), so the mechanism is
queryable without adding a separate metadata field.

`alias Foo.{Bar, Baz}` expansion produces two import nodes named
`Foo.Bar` and `Foo.Baz`, each positioned on the matching child.
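
As an illustration of the brace expansion, the sketch below works on the directive's target text. That is a simplification: the commit walks AST children (which is how each import node gets its own position), and `expandAlias` is a hypothetical name.

```typescript
// Expands "Foo.{Bar, Baz}" into ["Foo.Bar", "Foo.Baz"]. A target without
// braces is returned unchanged as a single-element list.
function expandAlias(target: string): string[] {
  const open = target.indexOf(".{");
  const close = target.lastIndexOf("}");
  if (open < 0 || close < open) return [target.trim()];
  const prefix = target.slice(0, open).trim();
  return target
    .slice(open + 2, close)
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => `${prefix}.${part}`);
}
```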

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inside `def`/`defp` bodies, emit a `calls`-kind unresolved reference for
every user call site. Resolves the callee name from the `call` node's
target child:

  helper(a, b)       → "helper/2"
  String.upcase(x)   → "String.upcase/1"
  x |> String.upcase → "String.upcase/1" (pipe LHS counts as arg 1)

Pipe detection reads the source slice between the binary_operator's
left and right children (tree-sitter-elixir doesn't expose the
operator as a field).

Bodies are now visited via `ctx.visitNode(body)` rather than iterating
`body.namedChildren` — for inline `do: expr` clauses the body itself
is the call we need to dispatch on.

Multi-clause merging now pushes the existing function node onto scope
before walking later clause bodies so nested calls anchor to the right
caller.
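
The pipe detection and arity adjustment described above could be sketched as follows. `Span` mimics the `startIndex`/`endIndex` offsets that tree-sitter nodes carry; `isPipe` and `calleeRef` are hypothetical helper names, not functions from this PR.

```typescript
// Byte offsets of a node in the source text (assumed shape).
interface Span {
  startIndex: number;
  endIndex: number;
}

// The operator is not exposed as a field, so read the text that sits
// between the left and right operands and compare it to "|>".
function isPipe(source: string, left: Span, right: Span): boolean {
  return source.slice(left.endIndex, right.startIndex).trim() === "|>";
}

// Builds the "name/arity" reference; a piped call receives the left-hand
// side as its first argument, so its arity is one higher than written.
function calleeRef(name: string, explicitArgs: number, piped: boolean): string {
  return `${name}/${explicitArgs + (piped ? 1 : 0)}`;
}
```

For `x |> String.upcase`, the right operand has no explicit arguments, and the pipe adds one, giving `String.upcase/1`.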

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  - `defstruct [:a, :b]` (and `defexception`) → struct node named after
    the enclosing module, with one `field` child per atom or pair key.

  - `defprotocol Sizable do def size(thing) end` → protocol node plus
    abstract `function` nodes for each bodyless `def` clause.

  - `defimpl Sizable, for: List do … end` → module node named
    "Sizable.List" plus an `implements`-kind unresolved reference to
    `Sizable`. Function bodies inside are extracted normally.

  - `@behaviour Foo` → `implements`-kind unresolved reference from the
    enclosing module to `Foo`.

`@doc`, `@callback`, and `@spec` are still swallowed by the attribute
handler so the default walker doesn't treat their operand call as a
user call site; they get proper handling in a later phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Elixir's control-flow keywords (case, cond, if, with, for, fn, …) and
metaprogramming forms (quote, unquote, raise, throw) syntactically
parse as `call` nodes — `case foo do … end` is literally a call to
`case`. The earlier call-site walker emitted spurious `case/N`,
`if/N`, etc. references. NON_CALL_FORMS now skips them at emit time.
Measured: 50 nodes (1 module, 1 struct + 5 fields, 14 imports, 28
functions), 220 calls captured, 26 ms extraction.
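
A rough sketch of the emit-time filter, assuming references are plain `name/arity` strings. The set below contains only the keywords named in this message (the real `NON_CALL_FORMS` is presumably longer), and `filterCallRefs` is a hypothetical helper.

```typescript
// Subset of keyword forms that parse as `call` nodes but are not user calls.
const NON_CALL_FORMS: ReadonlySet<string> = new Set([
  "case", "cond", "if", "with", "for", "fn",
  "quote", "unquote", "raise", "throw",
]);

// Strips the trailing "/arity" from a reference such as "case/2".
function calleeName(ref: string): string {
  const slash = ref.lastIndexOf("/");
  return slash < 0 ? ref : ref.slice(0, slash);
}

// Keeps only references whose callee is not a control-flow or
// metaprogramming keyword.
function filterCallRefs(refs: string[]): string[] {
  return refs.filter((ref) => !NON_CALL_FORMS.has(calleeName(ref)));
}
```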

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stdlib filtering, docs

Round out Elixir support so cross-module queries, callers, and impact
analysis work end-to-end on real Elixir/Phoenix projects.

Extraction:

  - @doc "…" now sets the next def's docstring; @spec sets its
    signature; @callback emits an abstract function inside the
    enclosing module.
  - alias Foo.Bar.Baz (with optional `as: Short`) records the binding
    in a per-module alias map. handleUserCall consults that map and
    attaches the fully-qualified expansion to the unresolved ref's
    `candidates` array — so `Baz.foo/2` is recoverable as
    `Foo.Bar.Baz.foo/2` even when Baz isn't a globally visible name.
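
The alias-map lookup might work along these lines. `AliasMap`, `recordAlias`, and `expandCandidates` are hypothetical names for this sketch; the PR describes the behaviour of `handleUserCall` but not its code.

```typescript
// Per-module map from the short alias to the fully-qualified module name.
type AliasMap = Map<string, string>;

// Records `alias Foo.Bar.Baz` (short name "Baz") or
// `alias Foo.Bar.Baz, as: Short` (short name "Short").
function recordAlias(aliases: AliasMap, fullName: string, as?: string): void {
  const short = as ?? fullName.slice(fullName.lastIndexOf(".") + 1);
  aliases.set(short, fullName);
}

// Given a call ref such as "Baz.foo/2", returns the fully-qualified
// expansion when the leading segment is a known alias, else an empty list.
function expandCandidates(aliases: AliasMap, ref: string): string[] {
  const dot = ref.indexOf(".");
  if (dot < 0) return [];
  const full = aliases.get(ref.slice(0, dot));
  return full ? [full + ref.slice(dot)] : [];
}
```

Keeping the expansion as a candidate, rather than rewriting the reference, leaves the original short name available if the alias target turns out not to be indexed.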

Resolution:

  - matchByQualifiedName bridges Elixir's `.`-joined call refs
    (`Foo.Bar.changeset/2`) to codegraph's `::`-joined qualifiedNames
    (`Foo.Bar::changeset/2`). Without this bridge, every cross-file
    Elixir call stayed unresolved because endsWith() couldn't see past
    the separator mismatch.
  - The matcher now also tries each entry in ref.candidates — the
    alias-expansion target lights up here.
  - hasAnyPossibleMatch pre-filter recognises Elixir-shape names
    (`A.B.C.func/N`) by checking the trailing `func/N` segment against
    knownNames; otherwise the pre-filter rejected legitimate refs
    before the bridge could see them.
  - Built-in / stdlib filter: ELIXIR_STDLIB_PREFIXES skips Enum,
    String, Map, List, Phoenix, Ecto, Plug, Logger, etc. and Erlang
    modules `:lists.`, `:maps.`, `:ets.`. ELIXIR_KERNEL_FUNCS skips
    `is_nil/1`, `to_string/1`, `inspect/1`, and friends.
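
To make the separator bridge and the pre-filter concrete, here is a small sketch. The function names `toQualifiedName`, `refMatches`, and `hasPossibleMatch` are illustrative and are not the signatures of `matchByQualifiedName` or `hasAnyPossibleMatch`; the `endsWith` comparison is a simplification of whatever matching the resolver really does.

```typescript
// Index of the dot separating module path from function name, ignoring
// anything after the "/arity" suffix. Returns -1 for unqualified refs.
function lastModuleDot(ref: string): number {
  const slash = ref.lastIndexOf("/");
  return ref.lastIndexOf(".", slash < 0 ? ref.length : slash);
}

// "Foo.Bar.changeset/2" -> "Foo.Bar::changeset/2"; null when unqualified.
function toQualifiedName(ref: string): string | null {
  const dot = lastModuleDot(ref);
  if (dot <= 0) return null;
  return `${ref.slice(0, dot)}::${ref.slice(dot + 1)}`;
}

// Compares a dotted call ref against a "::"-joined qualified name.
function refMatches(ref: string, qualifiedName: string): boolean {
  const bridged = toQualifiedName(ref);
  return bridged !== null && qualifiedName.endsWith(bridged);
}

// Pre-filter: accept a ref when it, or its trailing "func/N" segment,
// appears in the set of known names.
function hasPossibleMatch(ref: string, knownNames: ReadonlySet<string>): boolean {
  if (knownNames.has(ref)) return true;
  const dot = lastModuleDot(ref);
  return dot > 0 && knownNames.has(ref.slice(dot + 1));
}
```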

Config / docs:

  - 'elixir' added to the validLanguages allowlist in src/config.ts
    (config-load would otherwise silently reject `elixir` as an
    invalid language value).
  - .ex/.exs added to import-resolver.ts EXTENSION_RESOLUTION so any
    relative-path resolution code path that lands on Elixir behaves.
  - README's "supported languages" tagline and table now list Elixir.
  - The "Language Support" test that asserts which languages
    getSupportedLanguages() returns now includes pascal, scala, and
    elixir (alongside the existing entries).

Tests:

  - 30 Elixir extraction tests (modules, defs, aliases, calls,
    structs, protocols, behaviour, @doc/@spec/@callback,
    alias-aware candidates).
  - 2 Elixir resolution tests proving cross-file calls and `imports`
    edges resolve correctly through the Mix module-name convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compared elixir.ts side-by-side with typescript.ts (canonical) and
scala.ts (closest peer with a custom visitNode). The project's
convention is sparse, terse comments only at non-obvious AST shapes
— no top-of-file docblocks, no section dividers, no JSDoc on
internal helpers.

  - Drops the 15-line module docblock; the same context survives as a
    5-line preamble.
  - Removes section dividers (`// --- helpers ---`, etc.).
  - Replaces JSDoc blocks on every helper with one-line WHY comments
    where genuinely needed, or no comment at all where the function
    name carries its own meaning.
  - Trims tutorial-style explanations down to the load-bearing
    sentence (state lifecycle for the buffers/aliasStack, the pipe
    detection trick, the readonly-node mutation note).
  - Deletes a dead for-loop in resolveDefBody that was a relic of an
    earlier iteration of the body-resolution heuristic.
  - Folds the small `nearestModuleId` walk that was duplicated in
    handleAttribute branches and handleDefstruct into a helper.

Net: 305 deletions, 109 insertions. No behaviour change — all 30
extraction tests and 2 resolution tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>