Commit cb27a7e

feat: 14-language AST support with heritage, call resolution, and dead code improvements (#78)
* feat: improve PreToolUse hook relevance with multi-signal search

Replace FTS-only file retrieval with a 3-signal ranking system:
- Symbol name match (weight 2.0): most precise
- File path match (weight 1.5): catches path-based searches
- FTS on wiki content (weight 1.0): broadest, lowest priority

Files ranked by signal score then PageRank, top 3 returned. Remove git signals (HOTSPOT, bus-factor, owner) from enrichment; that info belongs in get_risk, not every search. Remove Bash command interception (fragile regex on grep/rg commands). Keep: symbols (3), importers (3), dependencies (2) per file.

* feat: centralize language config into LanguageRegistry (Phase 1)

Create a single LanguageRegistry with 42 LanguageSpec entries as the source of truth for all language identity data. Migrate 14 consumer files to derive their constants from the registry, eliminating widespread duplication. Delete stale packages/core/queries/ directory.

* feat: modularize extractors and resolvers out of parser.py and graph.py (Phase 2)

Extract per-language logic into dedicated packages:
- extractors/: visibility, signatures, docstrings, bindings, heritage
- resolvers/: Python, TS/JS, Go, Rust, C/C++, generic stem fallback
- framework_edges.py: Django, FastAPI, Flask, pytest conftest detection

parser.py drops from 1,806 to 796 lines (pure orchestration). graph.py drops from 1,286 to 646 lines. Delete dead parsers/ stubs.

* docs: update language support doc and add Phase 3 handoff

Update Adding a New Language guide to reflect modular architecture (extractors/, resolvers/ instead of inline in parser.py/graph.py). Add architecture section and updated roadmap. Create Phase 3 handoff doc covering remaining language work: hardening C++/C, wiring Kotlin/Ruby/C#, adding Swift/Scala/PHP.

* feat: add AST support for 6 new languages and harden C/C++ (Phase 3)

Complete language pipeline for Kotlin, Ruby, C#, Swift, Scala, and PHP with tree-sitter grammars, .scm queries, LanguageConfig entries, per-language extractors (bindings, docstrings, visibility, heritage), and dedicated import resolvers. Harden C++ with binding extraction and Doxygen docstrings, add call captures to C. Brings total AST-supported languages to 14 (7 Full + 7 Good tier).

- Add 6 grammar dependencies (tree-sitter-kotlin/ruby/c-sharp/swift/scala/php)
- Create .scm query files for C#, Swift, Scala, PHP; extend Kotlin, Ruby, C
- Add LanguageConfig entries for all 8 languages in parser.py
- Add per-language visibility functions (kotlin, csharp, swift, scala, php)
- Add binding extractors for all 8 languages
- Add docstring extractors (KDoc, RDoc, XML doc, Swift doc, ScalaDoc, PHPDoc, Doxygen)
- Add heritage extractors for Swift, Scala, PHP
- Create dedicated resolvers for Kotlin, Ruby, C#, Swift, Scala, PHP
- Add 37 new parser tests with fixtures for all 6 languages
- Update registry specs with grammar_package and heritage_node_types
- Update README.md and LANGUAGE_SUPPORT.md documentation

* fix: P0 bugs (dead code symbols lookup, PHP method visibility, Kotlin interfaces)

- Fix _detect_unused_exports to read symbol nodes via DEFINES edges instead of non-existent 'symbols' attribute on file nodes
- Add fallback PHP method_declaration pattern without visibility_modifier so methods defaulting to public are captured
- Add refine_kotlin_class_kind() to distinguish interface/enum from regular class in Kotlin class_declaration nodes
- Update test helper _build_graph to create proper symbol nodes

* feat: heritage support for Ruby mixins, Rust derive, Swift extensions, PHP traits

- Ruby: extract include/extend/prepend from class body as mixin relations
- Rust: extract #[derive(Trait)] from struct/enum attribute items
- Swift: add extension conformance capture (user_type pattern in .scm)
- PHP: extract use TraitName; from class declaration_list
- Add struct_item/enum_item to Rust heritage_node_types
- Add 'derive' to valid heritage kinds in integration tests

* feat: P1 fixes (PHP imports, multi-lang dynamic detection, unused internals, module-level calls)

- Add PHP require/require_once/include/include_once as import captures
- Extend dynamic import detection to JS/TS/Java/Kotlin/Ruby/PHP/Go
- Implement _detect_unused_internals for private symbols with no callers
- Add synthetic __module__ symbol per file for module-level call resolution
- Update call_resolver to assign orphan calls to __module__ symbol

* docs: update language support docs for 14-language coverage, remove obsolete planning docs

Update README, LANGUAGE_SUPPORT.md, ARCHITECTURE.md, and website docs to reflect 14 AST-supported languages (7 Full + 7 Good tier) with heritage extraction improvements. Remove obsolete planning and handoff docs.
1 parent 4677a19 commit cb27a7e

92 files changed

Lines changed: 6412 additions & 3699 deletions

README.md

Lines changed: 4 additions & 7 deletions
@@ -65,7 +65,7 @@ There's a small genre of "token efficiency" benchmarks going around. It would be
6565
repowise runs once, builds everything, then keeps it in sync on every commit.
6666

6767
### ◈ Graph Intelligence
68-
tree-sitter parses every file into symbols. NetworkX builds a two-tier dependency graph — file nodes for module-level relationships and symbol nodes (functions, classes, methods) for fine-grained call resolution. A 3-tier resolver links call sites to their targets with confidence scoring. Import aliases, barrel re-exports, and namespace imports all resolve correctly. Inheritance hierarchies are extracted across 11 languages (extends, implements, trait impls, mixins) and resolved to concrete symbol edges. Leiden community detection identifies logical modules at both file and symbol level — with cohesion scoring and heuristic labeling — even when your directory structure doesn't reflect them. Execution flow tracing discovers call paths from entry points through your codebase, classifying them as intra- or cross-community. PageRank, betweenness centrality, and SCC analysis identify your most central and most coupled code.
68+
tree-sitter parses every file across 14 languages into a two-tier dependency graph — file nodes and symbol nodes (functions, classes, methods). A 3-tier call resolver with confidence scoring handles import aliases, barrel re-exports, and namespace imports. Heritage extraction covers extends, implements, trait impls, derive macros, mixins, and extension conformance. Leiden community detection finds logical modules even when your directory structure doesn't reflect them. PageRank, betweenness centrality, SCC analysis, and execution flow tracing from entry points identify your most central, most coupled, and most traversed code.
6969

7070
### ◈ Git Intelligence
7171
500 commits of history turned into signals: hotspot files (high churn × high complexity), ownership percentages per engineer, co-change pairs (files that change together without an import link — hidden coupling), and significant commit messages that explain *why* code evolved.
@@ -543,14 +543,11 @@ repowise reindex # rebuild vector store (no LLM calls)
543543

544544
| Tier | Languages | What works |
545545
|------|-----------|------------|
546-
| **Full** | Python · TypeScript · JavaScript · Java · Go · Rust | AST parsing, import resolution, named bindings, dependency graph edges, call resolution, heritage extraction |
547-
| **Partial** | C++ | AST parsing, symbol extraction, heritage extraction, `compile_commands.json` import resolution |
548-
| **Partial** | C | AST parsing (structs, functions, macros), `#include` resolution with `compile_commands.json` |
549-
| **Scaffolded** | Kotlin · Ruby | Tree-sitter queries and heritage extractors exist but grammars not yet wired — install `tree-sitter-kotlin` / `tree-sitter-ruby` separately |
550-
| **Traversal** | C# · Swift · Scala · PHP | Files indexed and searchable, but no AST symbol extraction yet |
546+
| **Full** | Python · TypeScript · JavaScript · Java · Go · Rust · C++ | AST parsing, import resolution, named bindings, call resolution, heritage extraction, docstrings |
547+
| **Good** | C · Kotlin · Ruby · C# · Swift · Scala · PHP | AST parsing, import resolution, named bindings, call resolution, heritage (mixins, derive, extensions, traits), docstrings, dedicated resolvers |
551548
| **Config / data** | OpenAPI · Protobuf · GraphQL · Dockerfile · Makefile · YAML · JSON · TOML · SQL · Terraform | Included in the file tree; special handlers extract endpoints/targets where applicable |
552549

553-
Kotlin, Ruby, C#, Swift, Scala, PHP, Dart, and Elixir are on the [language support roadmap](docs/LANGUAGE_SUPPORT_PLAN.md). Adding a new language requires one `.scm` tree-sitter query file and one config entry. No changes to the parser core. PRs welcome. See [Adding a new language](docs/CONTRIBUTING.md#adding-a-new-language).
550+
14 languages with full AST support. Adding a new language requires one `.scm` tree-sitter query file and one config entry. No changes to the parser core. See [Language Support](docs/LANGUAGE_SUPPORT.md) for details.
554551

555552
---
556553

docs/LANGUAGE_SUPPORT.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
1+
# Language Support
2+
3+
repowise analyses codebases written in many languages. Each language goes
4+
through a multi-stage pipeline: file traversal, AST parsing, import
5+
resolution, call graph construction, and heritage (inheritance) extraction.
6+
Not every language has reached full coverage yet --- this page documents
7+
exactly what works today, what is coming next, and how to add a new language.
8+
9+
---
10+
11+
## Support Tiers
12+
13+
Every language falls into one of five tiers. The tier determines which
14+
pipeline stages produce meaningful output.
15+
16+
| Stage | Full | Partial | Scaffolded | Traversal | Config / Data |
17+
|-------|:----:|:-------:|:----------:|:---------:|:-------------:|
18+
| File discovery & git history | Y | Y | Y | Y | Y |
19+
| AST symbol extraction | Y | Y | -- | -- | -- |
20+
| Import resolution | Y | Y | -- | -- | -- |
21+
| Call graph edges | Y | partial | -- | -- | -- |
22+
| Heritage (extends/implements) | Y | partial | -- | -- | -- |
23+
| Named bindings | Y | -- | -- | -- | -- |
24+
| Dead code detection | Y | Y | Y | Y | -- |
25+
| Semantic search & wiki pages | Y | Y | Y | Y | Y |
26+
27+
---
28+
29+
## Language Reference
30+
31+
### Full
32+
33+
Languages with complete pipeline coverage: AST parsing, import resolution,
34+
call resolution, named bindings, heritage extraction, and docstrings.
35+
36+
| Language | Extensions | Entry Points | Import Style |
37+
|----------|-----------|-------------|-------------|
38+
| **Python** | `.py` `.pyi` | `main.py` `app.py` `__main__.py` `manage.py` `wsgi.py` `asgi.py` | `import x` / `from x import y` |
39+
| **TypeScript** | `.ts` `.tsx` | `index.ts` `main.ts` `app.ts` `server.ts` | `import { x } from 'y'` / `require()` |
40+
| **JavaScript** | `.js` `.jsx` `.mjs` `.cjs` | `index.js` `main.js` `app.js` `server.js` | `import` / `require()` |
41+
| **Java** | `.java` | `Main.java` `Application.java` | `import pkg.Class` |
42+
| **Go** | `.go` | `main.go` `cmd/main.go` | `import "path"` with `go.mod` resolution |
43+
| **Rust** | `.rs` | `main.rs` `lib.rs` | `use crate::` / `use super::` / `use self::` with `Cargo.toml` |
44+
| **C++** | `.cpp` `.cc` `.cxx` `.h` `.hpp` `.hxx` | `main.cpp` `main.cc` | `#include` with `compile_commands.json` resolution |
45+
46+
All seven languages support:
47+
- Tree-sitter AST parsing with dedicated `.scm` query files
48+
- Three-tier call resolution (same-file, cross-file, global stem match)
49+
- Named binding extraction (mapping imported names to source symbols)
50+
- Heritage extraction (class/interface/trait inheritance chains)
51+
- Docstring extraction (Python, JSDoc, GoDoc, Rustdoc, Javadoc, Doxygen)
52+
- Framework-aware edges (Django, FastAPI, Flask for Python; tsconfig path aliases for TS/JS; pytest fixture detection)
53+
54+
### Good
55+
56+
AST parsing, symbol extraction, import resolution, call resolution, named
57+
bindings, heritage extraction (including Ruby mixins, Rust derive, Swift
58+
extension conformance, PHP trait use), and docstrings. Dedicated import
59+
resolvers for each language.
60+
61+
| Language | Extensions | Entry Points | Import Style |
62+
|----------|-----------|-------------|-------------|
63+
| **C** | `.c` | `main.c` | `#include` with `compile_commands.json` (shares C++ grammar) |
64+
| **Kotlin** | `.kt` `.kts` | `Main.kt` `Application.kt` | `import com.example.Foo` |
65+
| **Ruby** | `.rb` | `main.rb` `app.rb` `config.ru` | `require 'mod'` / `require_relative './mod'` |
66+
| **C#** | `.cs` | `Program.cs` `Startup.cs` | `using System.Collections.Generic` |
67+
| **Swift** | `.swift` | `main.swift` `App.swift` | `import Foundation` |
68+
| **Scala** | `.scala` | `Main.scala` `App.scala` | `import pkg.{A, B => C}` |
69+
| **PHP** | `.php` | `index.php` `public/index.php` | `use Foo\Bar\Baz` |
70+
71+
### Config / Data
72+
73+
Non-code files included in the file tree and wiki. Special handlers extract
74+
endpoints or targets where applicable.
75+
76+
| Language | Extensions / Filenames | Special Handler |
77+
|----------|----------------------|----------------|
78+
| **OpenAPI** | YAML/JSON with `openapi` or `swagger` key | Extracts API paths and schemas |
79+
| **Dockerfile** | `Dockerfile` | Extracts stages and exposed ports |
80+
| **Makefile** | `Makefile` `GNUmakefile` | Extracts targets |
81+
| **Protobuf** | `.proto` | -- |
82+
| **GraphQL** | `.graphql` `.gql` | -- |
83+
| **Terraform** | `.tf` `.hcl` | -- |
84+
| **YAML** | `.yaml` `.yml` | -- |
85+
| **JSON** | `.json` | -- |
86+
| **TOML** | `.toml` | -- |
87+
| **Markdown** | `.md` `.mdx` | -- |
88+
| **SQL** | `.sql` | -- |
89+
| **Shell** | `.sh` `.bash` `.zsh` | -- |
90+
91+
### Git-Blame-Only
92+
93+
These languages are tracked in git history (blame, hotspot analysis,
94+
co-change detection) but have no AST parsing or dedicated support. Files
95+
appear in the wiki as traversal-level entries.
96+
97+
Objective-C, Elixir, Erlang, Lua, R, Dart, Zig, Julia, Clojure, Elm,
98+
Haskell, OCaml, F#, Crystal, Nim, D
99+
100+
---
101+
102+
## How the Pipeline Processes a File
103+
104+
```
105+
File discovered by FileTraverser
106+
|
107+
v
108+
Extension/filename -> LanguageTag (via LanguageRegistry)
109+
|
110+
+-- Config/data language? -> empty ParsedFile (passthrough)
111+
+-- Special format? -> special_handlers.py (OpenAPI/Dockerfile/Makefile)
112+
+-- Has grammar? -> tree-sitter AST parsing
113+
|
114+
v
115+
.scm query extracts:
116+
@symbol.def / @symbol.name -> Symbol nodes
117+
@import.statement / @import.module -> Import edges
118+
@call.target / @call.receiver -> Call edges
119+
|
120+
v
121+
Per-language extractors:
122+
- Named bindings (import name -> source symbol)
123+
- Heritage (extends/implements/traits)
124+
- Docstrings (Python, JSDoc, GoDoc, Rustdoc, Javadoc)
125+
- Visibility (public/private/protected)
126+
|
127+
v
128+
GraphBuilder resolves imports:
129+
Python: dotted module paths, __init__.py, src/ layout
130+
TS/JS: relative paths, tsconfig aliases, node_modules
131+
Go: go.mod module path stripping
132+
Rust: crate::/self::/super::, mod.rs probing
133+
C/C++: compile_commands.json include directories
134+
Other: stem-map fallback (filename matching)
135+
|
136+
v
137+
Graph analysis:
138+
PageRank, community detection, dead code, execution flows
139+
```
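
The first branching step of the diagram above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `route_file`, the lookup tables, and the branch names are hypothetical and are not taken from the repowise source, and content-based detection (such as recognising OpenAPI by its `openapi` key) is left out.

```python
from pathlib import PurePosixPath

# Hypothetical lookup tables. In repowise these would presumably be
# derived from the LanguageRegistry rather than written by hand.
EXTENSION_TO_TAG = {".py": "python", ".rb": "ruby", ".yaml": "yaml", ".json": "json"}
FILENAME_TO_TAG = {"Dockerfile": "dockerfile", "Makefile": "makefile"}
SPECIAL_FORMATS = {"dockerfile", "makefile"}  # handled by special handlers
GRAMMAR_TAGS = {"python", "ruby"}             # tags with a tree-sitter grammar


def route_file(path: str) -> str:
    """Return the (hypothetical) name of the pipeline branch for `path`."""
    p = PurePosixPath(path)
    # Exact filenames win over extensions, so "Dockerfile" is recognised.
    tag = FILENAME_TO_TAG.get(p.name) or EXTENSION_TO_TAG.get(p.suffix.lower())
    if tag is None:
        return "traversal-only"   # unknown language: indexed but not parsed
    if tag in SPECIAL_FORMATS:
        return "special-handler"
    if tag in GRAMMAR_TAGS:
        return "ast-parse"
    return "passthrough"          # config/data language: empty ParsedFile
```

For example, `route_file("src/app.py")` takes the AST branch, while `route_file("config/settings.yaml")` falls through to the passthrough branch.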
140+
141+
---
142+
143+
## Adding a New Language
144+
145+
The pipeline is fully modular. Language identity data lives in the
146+
centralised `LanguageRegistry`, per-language extraction logic lives in
147+
`extractors/`, and per-language import resolution lives in `resolvers/`.
148+
Adding a new language touches these places:
149+
150+
### Step 1: Add a `LanguageSpec` to the registry
151+
152+
Edit `packages/core/src/repowise/core/ingestion/languages/registry.py` and
153+
add a new `LanguageSpec(...)` entry to the `_SPECS` tuple:
154+
155+
```python
156+
LanguageSpec(
157+
tag="mylang",
158+
display_name="MyLang",
159+
extensions=frozenset({".ml"}),
160+
grammar_package="tree_sitter_mylang", # PyPI package name
161+
scm_file="mylang.scm", # query file name
162+
heritage_node_types=frozenset({"class_declaration"}),
163+
entry_point_patterns=("main.ml",),
164+
manifest_files=("mylang.toml",),
165+
shebang_tokens=("mylang",),
166+
builtin_calls=frozenset({"print", "len"}), # filter from call graph
167+
builtin_parents=frozenset({"Object"}), # filter from heritage
168+
color_hex="#AB47BC",
169+
)
170+
```
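
To show why a single spec entry is enough for consumer files, here is a minimal sketch of a registry-style dataclass and two constants derived from it. The field names mirror the example above, but the types, defaults, and the derived names `EXTENSION_TO_TAG` and `AST_LANGUAGES` are assumptions, not the actual repowise definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageSpec:
    # Field names follow the example above; types and defaults are guesses.
    tag: str
    display_name: str
    extensions: frozenset = frozenset()
    grammar_package: Optional[str] = None
    entry_point_patterns: tuple = ()


# Illustrative entries only; the real registry is described as holding 42 specs.
_SPECS = (
    LanguageSpec("python", "Python", frozenset({".py", ".pyi"}), "tree_sitter_python", ("main.py",)),
    LanguageSpec("mylang", "MyLang", frozenset({".ml"}), "tree_sitter_mylang", ("main.ml",)),
    LanguageSpec("yaml", "YAML", frozenset({".yaml", ".yml"})),
)

# Consumers derive their constants from the specs instead of hard-coding them.
EXTENSION_TO_TAG = {ext: spec.tag for spec in _SPECS for ext in spec.extensions}
AST_LANGUAGES = frozenset(spec.tag for spec in _SPECS if spec.grammar_package)
```

With this shape, adding the `mylang` entry is what makes `.ml` files resolve to the `mylang` tag everywhere the derived maps are used.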
171+
172+
### Step 2: Add the `LanguageTag`
173+
174+
Add `"mylang"` to the `LanguageTag` Literal type in
175+
`packages/core/src/repowise/core/ingestion/models.py`.
176+
177+
### Step 3: Write a tree-sitter query file
178+
179+
Create `packages/core/src/repowise/core/ingestion/queries/mylang.scm` using
180+
tree-sitter S-expression syntax. Follow the capture-name conventions:
181+
182+
| Capture | Purpose | Required? |
183+
|---------|---------|-----------|
184+
| `@symbol.def` | Full definition node (line numbers, kind lookup) | Yes |
185+
| `@symbol.name` | Name identifier | Yes |
186+
| `@symbol.params` | Parameter list | No |
187+
| `@symbol.modifiers` | Decorators / visibility modifiers | No |
188+
| `@symbol.receiver` | Go-style method receiver | No |
189+
| `@import.statement` | Full import node | Yes |
190+
| `@import.module` | Module path being imported | Yes |
191+
| `@call.target` | Function/method being called | No (enables call graph) |
192+
| `@call.receiver` | Object the call is made on | No |
193+
| `@call.arguments` | Call arguments | No |
194+
195+
Look at existing `.scm` files for examples --- `python.scm` and
196+
`typescript.scm` are good starting points.
197+
198+
### Step 4: Add a `LanguageConfig` entry
199+
200+
Add a parser configuration to `LANGUAGE_CONFIGS` in
201+
`packages/core/src/repowise/core/ingestion/parser.py`:
202+
203+
```python
204+
"mylang": LanguageConfig(
205+
symbol_node_types={
206+
"function_definition": "function",
207+
"class_definition": "class",
208+
},
209+
import_node_types=["import_statement"],
210+
export_node_types=[],
211+
visibility_fn=public_by_default, # from extractors.visibility
212+
parent_extraction="nesting",
213+
parent_class_types=frozenset({"class_definition"}),
214+
entry_point_patterns=["main.ml"],
215+
),
216+
```
217+
218+
### Step 5: Add the tree-sitter grammar dependency
219+
220+
Add the grammar package to `pyproject.toml`:
221+
222+
```toml
223+
[project]
224+
dependencies = [
225+
# ...
226+
"tree-sitter-mylang>=0.23,<1",
227+
]
228+
```
229+
230+
### Step 6 (optional): Binding extractor
231+
232+
For full-tier support, add an `extract_mylang_bindings()` function in
233+
`packages/core/src/repowise/core/ingestion/extractors/bindings.py` and
234+
register it in the `extract_import_bindings()` dispatcher. Without this,
235+
imports are still resolved but named-binding-level call resolution won't
236+
work.
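
To make "named binding" concrete, the sketch below parses the Scala-style selector import listed in the Good tier table (`import pkg.{A, B => C}`) into per-name bindings. It works on plain text for brevity. The real extractors presumably walk tree-sitter nodes, and the function name and return shape here are hypothetical.

```python
import re

# Matches only the selector form: import some.pkg.{A, B => C}
_IMPORT_RE = re.compile(r"^import\s+([\w.]+)\.\{(.+)\}$")


def parse_selector_import(statement: str) -> list[tuple[str, str, str]]:
    """Return (module, exported_name, local_name) for each selector."""
    match = _IMPORT_RE.match(statement.strip())
    if match is None:
        return []  # not a selector import; other forms are out of scope here
    module, selectors = match.groups()
    bindings = []
    for selector in selectors.split(","):
        # "B => C" renames B to C locally; a bare "A" keeps its own name.
        exported, _, alias = (part.strip() for part in selector.partition("=>"))
        bindings.append((module, exported, alias or exported))
    return bindings
```

A call through the local name `C` can then be traced back to the exported symbol `B` in `pkg`, which is the information named-binding-level call resolution depends on.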
237+
238+
### Step 7 (optional): Heritage extractor
239+
240+
Add a `_extract_mylang_heritage()` function in
241+
`packages/core/src/repowise/core/ingestion/extractors/heritage.py` and
242+
register it in the `HERITAGE_EXTRACTORS` dict. Without this, inheritance
243+
chains won't appear in the graph.
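
As a rough illustration of what a heritage extractor produces, the following text-based sketch collects Ruby mixin relations (`include` / `extend` / `prepend`) from a class body. It is an assumption-laden stand-in: the actual extractor operates on AST nodes, and the function name and the `(kind, module)` tuple shape are invented for this example.

```python
import re

# One mixin statement per line: include Foo, extend Foo::Bar, prepend Baz
_MIXIN_RE = re.compile(r"^\s*(include|extend|prepend)\s+([A-Z][\w:]*)", re.MULTILINE)


def extract_ruby_mixins(class_body: str) -> list[tuple[str, str]]:
    """Return (kind, module_name) pairs in source order."""
    return [(m.group(1), m.group(2)) for m in _MIXIN_RE.finditer(class_body)]
```

Each pair would correspond to one heritage relation from the enclosing class to the named module.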
244+
245+
### Step 8 (optional): Import resolver
246+
247+
If the language has a non-trivial import system, create a resolver in
248+
`packages/core/src/repowise/core/ingestion/resolvers/mylang.py` and
249+
register it in the `_RESOLVERS` dict in `resolvers/__init__.py`. For simple
250+
languages, the generic stem-map fallback (matching by filename) works out
251+
of the box.
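
The stem-map fallback can be pictured as follows. This is a sketch of the general technique, not the code in `generic.py`: the helper names are hypothetical, and the rule of resolving only when exactly one file matches is an assumption made here to avoid guessing between candidates.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def build_stem_map(paths: list[str]) -> dict[str, list[str]]:
    """Index repository files by lowercase filename stem."""
    stem_map: dict[str, list[str]] = {}
    for path in paths:
        stem_map.setdefault(PurePosixPath(path).stem.lower(), []).append(path)
    return stem_map


def resolve_by_stem(module: str, stem_map: dict[str, list[str]]) -> Optional[str]:
    """Resolve an import when its last segment matches exactly one file stem."""
    # Split on common separators: "a.b.C", "a/b/c", "A\\B\\C", "a::b::c".
    last_segment = re.split(r"[./\\:]+", module.strip("'\""))[-1].lower()
    candidates = stem_map.get(last_segment, [])
    return candidates[0] if len(candidates) == 1 else None
```

With files `src/com/example/Foo.kt`, `src/util/Bar.kt`, and `test/Bar.kt`, the import `com.example.Foo` resolves to the first file, while `util.Bar` stays unresolved because two files share the stem.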
252+
253+
### Verify
254+
255+
```bash
256+
# Run the parser tests
257+
pytest tests/ -k "mylang or sample_repo" -x
258+
259+
# Index a real project
260+
repowise init /path/to/mylang-project
261+
```
262+
263+
No changes are needed to `traverser.py`, `dead_code.py`,
264+
`page_generator.py`, `cost_estimator.py`, or any other consumer file ---
265+
they all derive their language sets from the registry automatically.
266+
267+
---
268+
269+
## Architecture
270+
271+
The language pipeline is fully modular:
272+
273+
```
274+
ingestion/
275+
languages/ # LanguageRegistry + LanguageSpec (identity data)
276+
extractors/ # Per-language AST extraction
277+
visibility.py # symbol visibility (public/private/protected)
278+
signatures.py # human-readable signature building
279+
docstrings.py # module + symbol docstring extraction
280+
bindings.py # import name + alias binding extraction
281+
heritage.py # inheritance/interface/trait extraction
282+
resolvers/ # Per-language import resolution
283+
python.py # dotted imports, __init__.py, src/ layout
284+
typescript.py # multi-ext probe, tsconfig aliases
285+
go.py # go.mod module path stripping
286+
rust.py # crate::/self::/super::, mod.rs probing
287+
cpp.py # compile_commands.json include paths
288+
kotlin.py # package-to-directory mapping
289+
ruby.py # require/require_relative resolution
290+
csharp.py # namespace-based resolution
291+
swift.py # module import resolution
292+
scala.py # package-to-directory mapping
293+
php.py # namespace/PSR-4 resolution
294+
generic.py # stem-matching fallback
295+
framework_edges.py # Django, FastAPI, Flask, pytest detection
296+
parser.py # ASTParser (language-agnostic orchestration)
297+
graph.py # GraphBuilder (import/call/heritage resolution)
298+
```
299+
300+
Beyond the `LanguageConfig` entry from Step 4, adding a new language requires zero changes to `parser.py`, `graph.py`,
301+
`traverser.py`, `dead_code.py`, or any other core file.
302+
303+
---
304+
305+
## Roadmap
306+
307+
| Language | Target Tier | Status |
308+
|----------|------------|--------|
309+
| Dart | Good | Planned — `tree-sitter-dart` available |
310+
| Elixir | Good | Planned — `tree-sitter-elixir` available |
311+
| F# | Good | Planned — `tree-sitter-f-sharp` available |

docs/architecture/ARCHITECTURE.md

Lines changed: 22 additions & 10 deletions
@@ -163,7 +163,11 @@ repowise/
163163
│ │ ├── cpp.scm
164164
│ │ ├── c.scm
165165
│ │ ├── ruby.scm
166-
│ │ └── kotlin.scm
166+
│ │ ├── kotlin.scm
167+
│ │ ├── csharp.scm
168+
│ │ ├── swift.scm
169+
│ │ ├── scala.scm
170+
│ │ └── php.scm
167171
│ │
168172
│ ├── server/ # Python: FastAPI REST API + MCP server
169173
│ │ └── src/repowise/server/
@@ -506,8 +510,8 @@ which runs after the static import graph is built. It operates in three tiers:
506510
from the file's import list
507511
3. **Global unique match** (confidence 0.50) — target is unique across the whole repo
508512

509-
Call sites are extracted by tree-sitter for all 7 supported languages (Python, TypeScript,
510-
JavaScript, Go, Rust, Java, C++) using per-language `.scm` query files. Results are stored
513+
Call sites are extracted by tree-sitter for all 14 supported languages (Python, TypeScript,
514+
JavaScript, Go, Rust, Java, C++, C, Kotlin, Ruby, C#, Swift, Scala, PHP) using per-language `.scm` query files. Results are stored
511515
as `CallSite` dataclasses and become `CALLS` edges in the graph.
512516

513517
**Named binding resolution** (`NamedBinding` dataclass in `ingestion/models.py`) ensures
@@ -1547,19 +1551,27 @@ Key files:
15471551
),
15481552
```
15491553

1550-
3. **Verify `tree-sitter-languages` includes your language**
1554+
3. **Add a `LanguageSpec` to `LanguageRegistry`** in `ingestion/languages/registry.py`
15511555

1552-
Run `python -c "from tree_sitter_languages import get_parser; get_parser('mylang')"`.
1553-
If it raises, the grammar is not bundled. You can add it manually as a tree-sitter
1554-
grammar and register it with the Language API.
1556+
This registers the language's identity data (extensions, entry points, manifest files,
1557+
builtin calls, heritage node types, etc.) centrally.
15551558

1556-
4. **Add test files to `tests/fixtures/sample_repo/`**
1559+
4. **Add the grammar dependency to `pyproject.toml`**
1560+
1561+
```toml
1562+
"tree-sitter-mylang>=0.23,<1",
1563+
```
1564+
1565+
5. **Add test files to `tests/fixtures/sample_repo/`**
15571566

15581567
At minimum: one file with a function, one with a class, one with imports.
15591568

1560-
5. **Run `pytest tests/unit/test_parser.py -k mylang`** to verify extraction.
1569+
6. **(Optional) Add per-language extractors** for bindings, heritage, visibility, docstrings,
1570+
and a dedicated import resolver in `resolvers/mylang.py`.
1571+
1572+
7. **Run `pytest tests/unit/test_parser.py -k mylang`** to verify extraction.
15611573

1562-
6. **Open a PR.** That's it — no other changes needed.
1574+
8. **Open a PR.** That's it — no other changes needed.
15631575

15641576
---
15651577
