feat: add F#, Gleam, Clojure, Julia, R, Erlang language support #722
Add Batch 3 (Functional & BEAM) languages to codegraph:
- F# (.fs, .fsx, .fsi): modules, functions, union/record types, open imports
- Gleam (.gleam): functions, types, type aliases, constants, imports
- Clojure (.clj, .cljs, .cljc): namespaces, defn/defprotocol/defrecord, ns requires
- Julia (.jl): functions (long + short form), structs, modules, macros, imports
- R (.r, .R): function definitions (<-/=), library/require imports, S4 classes
- Erlang (.erl, .hrl): modules, functions, records, types, defines, includes

Grammar sources:
- F#: tree-sitter-fsharp (npm, sub: fsharp/)
- Gleam: gleam-lang/tree-sitter-gleam (GitHub)
- Clojure: sogaiu/tree-sitter-clojure (GitHub)
- Julia: tree-sitter-julia (npm)
- R: @eagleoutice/tree-sitter-r (npm)
- Erlang: WhatsApp/tree-sitter-erlang (GitHub)
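The extension list above implies a lookup from file extension to language. The following is a minimal sketch assuming a plain map from extension to a language id; EXTENSION_TO_LANGUAGE, languageForPath, and the ids themselves are hypothetical names, not codegraph's actual LANGUAGE_REGISTRY.

```typescript
// Hypothetical extension table for illustration only; the language ids are
// assumptions, not codegraph's real LANGUAGE_REGISTRY entries.
const EXTENSION_TO_LANGUAGE: ReadonlyMap<string, string> = new Map([
  ['.fs', 'fsharp'],
  ['.fsx', 'fsharp'],
  ['.fsi', 'fsharp'],
  ['.gleam', 'gleam'],
  ['.clj', 'clojure'],
  ['.cljs', 'clojure'],
  ['.cljc', 'clojure'],
  ['.jl', 'julia'],
  ['.r', 'r'],
  ['.R', 'r'],
  ['.erl', 'erlang'],
  ['.hrl', 'erlang'],
]);

// Returns the language id for a path, or undefined when the extension is unsupported.
// In this sketch the lookup is case-sensitive, so R needs both '.r' and '.R'.
function languageForPath(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf('.');
  const sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
  // No dot at all, or the last dot belongs to a directory name, not the file name.
  if (dot === -1 || dot < sep) return undefined;
  return EXTENSION_TO_LANGUAGE.get(filePath.slice(dot));
}
```

For example, languageForPath('scripts/analysis.R') yields 'r', while 'dir.v2/Makefile' yields undefined because its only dot sits in the directory name.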
Claude finished @carlos-alm's task: Comprehensive PR Review
Phase 1: Root Cause Analysis ✅
Problem Being Solved: Adding support for 6 new languages (F#, Gleam, Clojure, Julia, R, Erlang) to expand codegraph's parsing capabilities to cover functional programming languages and the BEAM ecosystem.
Why This Approach: Consistent with existing architecture: each language gets a registry entry, an extractor function, and a tree-sitter grammar. The PR follows the established pattern of optional language support (required: false).
Risk Assessment: Low architectural risk; the changes are purely additive and follow existing patterns. The main risks are in extractor implementation quality and dependency management.
Phase 2: Configuration Change Detection ✅
High-Risk Configuration Changes: None detected.
✅ Linter/Formatter Config: No changes to
New Dependencies: 6 new tree-sitter grammar dependencies added to
Phase 3: Test Integrity Validation ✅
Test Coverage: ✅ Excellent: all 6 new languages have comprehensive test files (6/6).
Test Quality: All tests follow consistent patterns, validate proper structures, and use strict assertions without weakening.
Phase 4: Code Quality Review ✅
Architecture Consistency: ✅ Excellent
Extractor Implementation Quality:
✅ F# Extractor (
✅ Gleam Extractor (
✅ Clojure Extractor (
✅ Julia Extractor (
✅ R Extractor (
✅ Erlang Extractor: Comprehensive BEAM ecosystem support for modules, functions, records, and includes.
Code Quality Issues: None detected; no debug code, TODOs, or problematic patterns found in the new extractors.
Phase 5: Backlog Compliance Assessment ✅
Root Cause Analysis
Problem Being Solved: Expanding language coverage to include functional programming languages and the BEAM ecosystem, a legitimate gap in codegraph's current coverage.
Why This Approach: Follows established architecture patterns consistently. No workarounds or shortcuts.
Risk Assessment: Very low risk; a purely additive feature following proven patterns.
Backlog Compliance
Critical Concerns
None identified. This is a well-executed language expansion that follows all established patterns.
Final Recommendation
Minor Notes:
Greptile Summary
This PR adds parser support for 6 new languages (F#, Gleam, Clojure, Julia, R, Erlang) by wiring up tree-sitter grammars and implementing dedicated symbol extractors that follow the established extractor pattern in the codebase.
Key changes:
Issues noted:
Confidence Score: 5/5
Safe to merge: all remaining findings are P2 quality/completeness suggestions with no correctness or data-integrity impact.
All P0/P1 issues from the previous review round are resolved. The three remaining findings are P2: potential silent omission of record fields (incomplete data, not wrong data), special forms being logged as calls (noise, not crashes), and an unexpected transitive prettier package. None of them blocks functionality or introduces incorrect graph edges. The 2239-test suite passes cleanly.
- src/extractors/erlang.ts: verify record field extraction with a test that asserts on children.
- src/extractors/clojure.ts: consider filtering Clojure special forms from the call list.
Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Source File] --> B{Extension lookup\nLANGUAGE_REGISTRY}
B --> C1[.fs / .fsx / .fsi → F#]
B --> C2[.gleam → Gleam]
B --> C3[.clj / .cljs / .cljc → Clojure]
B --> C4[.jl → Julia]
B --> C5[.r / .R → R]
B --> C6[.erl / .hrl → Erlang]
C1 --> D1[extractFSharpSymbols]
C2 --> D2[extractGleamSymbols]
C3 --> D3[extractClojureSymbols]
C4 --> D4[extractJuliaSymbols]
C5 --> D5[extractRSymbols]
C6 --> D6[extractErlangSymbols]
D1 & D2 & D3 & D4 & D5 & D6 --> E[ExtractorOutput\ndefinitions · calls · imports · classes · exports · typeMap]
```
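The flowchart's second half, dispatching a language to its extractor and getting back an ExtractorOutput, can be sketched roughly as follows. The types are simplified, hypothetical stand-ins for the six fields the chart names, and toyGleamExtractor is a regex toy for illustration only; the real extractors walk a tree-sitter syntax tree.

```typescript
// Simplified, hypothetical stand-ins for the output fields named in the flowchart.
interface Definition {
  name: string;
  kind: string;
  line: number;
}

interface ExtractorOutput {
  definitions: Definition[];
  calls: { name: string; line: number }[];
  imports: { source: string; line: number }[];
  classes: { name: string; line: number }[];
  exports: { name: string; line: number }[];
  typeMap: Map<string, string>;
}

type Extractor = (source: string) => ExtractorOutput;

function emptyOutput(): ExtractorOutput {
  return { definitions: [], calls: [], imports: [], classes: [], exports: [], typeMap: new Map() };
}

// Toy Gleam extractor: matches `fn name` or `pub fn name` at the start of a line.
const toyGleamExtractor: Extractor = (source) => {
  const out = emptyOutput();
  source.split('\n').forEach((text, index) => {
    const match = /^\s*(?:pub\s+)?fn\s+([a-z_][a-z0-9_]*)/.exec(text);
    if (match) out.definitions.push({ name: match[1], kind: 'function', line: index + 1 });
  });
  return out;
};

// One entry per language id; the other languages would register the same way.
const EXTRACTORS: ReadonlyMap<string, Extractor> = new Map([['gleam', toyGleamExtractor]]);

// Unknown languages yield undefined rather than an empty result.
function extractSymbols(language: string, source: string): ExtractorOutput | undefined {
  const extractor = EXTRACTORS.get(language);
  return extractor ? extractor(source) : undefined;
}
```

Keeping every extractor behind the same function type is what lets the six new languages plug in without touching the dispatch code.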
Reviews (2): Last reviewed commit: "fix: address review feedback for new lan..."
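The Greptile summary suggests filtering Clojure special forms out of the call list. One way to do that, sketched under the assumption that calls are recorded as { name, line } objects (CallSite and dropSpecialForms are hypothetical names), is a post-filter against a fixed set. The set below is roughly the special forms documented on clojure.org plus catch/finally, which appear inside try; whether codegraph would use exactly this set is an assumption.

```typescript
// Roughly the documented Clojure special forms, plus catch/finally from try.
const CLOJURE_SPECIAL_FORMS: ReadonlySet<string> = new Set([
  'def', 'if', 'do', 'let', 'quote', 'var', 'fn', 'loop', 'recur',
  'throw', 'try', 'catch', 'finally', 'monitor-enter', 'monitor-exit',
  '.', 'new', 'set!',
]);

// Hypothetical call-site shape; real call records may carry more fields.
interface CallSite {
  name: string;
  line: number;
}

// Keeps only the call sites whose head symbol is not a special form.
function dropSpecialForms(calls: readonly CallSite[]): CallSite[] {
  return calls.filter((call) => !CLOJURE_SPECIAL_FORMS.has(call.name));
}
```

Macros such as when or cond are not special forms, so this filter would still report them as calls.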
```typescript
    const params = extractRParams(rhs);
    ctx.definitions.push({
      name: lhs.text,
      kind: 'function',
      line: node.startPosition.row + 1,
      endLine: nodeEndLine(node),
      children: params.length > 0 ? params : undefined,
    });
  } else {
    // Variable assignment — only record top-level
    if (node.parent?.type === 'program') {
      ctx.definitions.push({
        name: lhs.text,
        kind: 'variable',
        line: node.startPosition.row + 1,
        endLine: nodeEndLine(node),
      });
    }
```
-> right-assignment operator not handled correctly
The -> operator is included in the allowed-operators check, but the handler always reads the function name from child(0), the left side. For R's right-assignment operator ->, the assignment target (the identifier) is on the right (child(2)), not the left. As a result, a definition like function(x) x + 1 -> add silently falls through: lhs is the function_definition node rather than an identifier, so the function is never extracted.
In practice, using -> for function definitions is essentially unheard of in R codebases, but listing the operator in the condition signals an intent to support it. Either remove '->' from the check or add a separate branch for the reversed operand order:
```typescript
if (op.text !== '<-' && op.text !== '=' && op.text !== '<<-') return;
if (lhs.type !== 'identifier') return;
```
Or, to properly support ->:
```typescript
// Right-assignment: RHS is the identifier, LHS is the value
if (op.text === '->') {
  if (rhs.type !== 'identifier') return;
  if (lhs.type === 'function_definition') {
    const params = extractRParams(lhs);
    ctx.definitions.push({ name: rhs.text, kind: 'function', ... });
  }
  return;
}
```
Fixed -- removed the -> right-assignment operator from the condition in handleBinaryOp. Since right-assignment is essentially unused in real R codebases, the simplest correct fix is to not claim support for it. The operator check now only matches <-, =, and <<-.
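With -> removed, the R assignment rules reduce to those below. This sketch uses a hypothetical RNode interface instead of the real tree-sitter node type and a hypothetical definitionFromAssignment function. Following the excerpt quoted earlier, it assumes that function definitions are recorded at any depth while plain variables are recorded only at the top level.

```typescript
// Minimal stand-in for the two tree-sitter node fields the check reads.
interface RNode {
  type: string;
  text: string;
}

interface RDefinition {
  name: string;
  kind: 'function' | 'variable';
}

// After the fix, only left-assignment operators are considered.
const LEFT_ASSIGNMENT_OPS: ReadonlySet<string> = new Set(['<-', '=', '<<-']);

function definitionFromAssignment(
  lhs: RNode,
  op: string,
  rhs: RNode,
  atTopLevel: boolean,
): RDefinition | undefined {
  // '->' and every non-assignment operator are ignored.
  if (!LEFT_ASSIGNMENT_OPS.has(op)) return undefined;
  // The target must be a plain identifier (e.g. x$y <- ... is skipped).
  if (lhs.type !== 'identifier') return undefined;
  if (rhs.type === 'function_definition') return { name: lhs.text, kind: 'function' };
  // Plain variables are only recorded at the top level of the file.
  return atTopLevel ? { name: lhs.text, kind: 'variable' } : undefined;
}
```

Dropping the operator entirely, rather than adding a reversed-operand branch, keeps a single code path for reading the target name.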
```typescript
import type { ExtractorOutput, SubDeclaration, TreeSitterNode, TreeSitterTree } from '../types.js';
import { nodeEndLine } from './helpers.js';

/**
 * Extract symbols from Clojure files.
 *
 * Clojure tree-sitter grammar (oakmac/tree-sitter-clojure) notes:
 * - The grammar is minimal: everything is a list/vector/map/symbol
 * - We detect definitions by the first symbol in a list: defn, def, defprotocol, etc.
 * - Namespace: (ns name ...)
 * - Imports: (:require ...) inside ns, or (require ...)
```
Comment references the wrong grammar source
The doc comment attributes the grammar to oakmac/tree-sitter-clojure, but package.json uses github:sogaiu/tree-sitter-clojure. These are different repositories with different node type naming conventions, so the comment could mislead future contributors looking up node types.
```typescript
/**
 * Extract symbols from Clojure files.
 *
 * Clojure tree-sitter grammar (sogaiu/tree-sitter-clojure) notes:
 * - The grammar is minimal: everything is a list/vector/map/symbol
 * - We detect definitions by the first symbol in a list: defn, def, defprotocol, etc.
 * - Namespace: (ns name ...)
 * - Imports: (:require ...) inside ns, or (require ...)
 */
```
Fixed -- updated the doc comment to reference sogaiu/tree-sitter-clojure, which matches the actual dependency in package.json.
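The doc comment describes detecting Clojure definitions by the first symbol of a list. A minimal sketch of that idea, assuming each list form has already been flattened to the source text of its items (DEFINING_FORMS, classifyForm, and the kind names are hypothetical, not codegraph's):

```typescript
// Hypothetical head-symbol table; the kind names are illustrative only.
const DEFINING_FORMS: ReadonlyMap<string, string> = new Map([
  ['ns', 'namespace'],
  ['def', 'variable'],
  ['defn', 'function'],
  ['defn-', 'function'],
  ['defmacro', 'function'],
  ['defprotocol', 'interface'],
  ['defrecord', 'class'],
]);

// `elements` is a list form flattened to the text of its items,
// e.g. ['defn', 'greet', '[name]', '(str "Hi " name)'].
function classifyForm(elements: readonly string[]): { kind: string; name: string } | undefined {
  const [head, name] = elements;
  if (head === undefined || name === undefined) return undefined;
  // A Map avoids accidental hits on Object.prototype keys such as 'toString'.
  const kind = DEFINING_FORMS.get(head);
  return kind === undefined ? undefined : { kind, name };
}
```

Metadata before the name (such as ^:private) is not handled here and would need extra logic.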
- Remove unsupported -> right-assignment operator from R extractor
- Fix Clojure doc comment to reference correct grammar (sogaiu, not oakmac)
- Strengthen trivially-true test assertions (>= 0 -> >= 1) in Erlang, F#, and Gleam parser tests
Addressed all Greptile review feedback:
Summary
Grammar sources
- F#: tree-sitter-fsharp
- Gleam: tree-sitter-gleam
- Clojure: tree-sitter-clojure
- Julia: tree-sitter-julia
- R: @eagleoutice/tree-sitter-r
- Erlang: tree-sitter-erlang

Note: Gleam, Clojure, and Erlang use GitHub dependencies (and R uses the scoped @eagleoutice/tree-sitter-r package) because the unscoped npm packages are either name-squatted (R, Erlang) or too old to be compatible with web-tree-sitter 0.26 (Gleam, Clojure).
Test plan