Commit 2ba882a

feat(native): port Verilog extractor to Rust (#1107)
* feat(native): port Verilog extractor to Rust

  Adds the tree-sitter-verilog dependency and a native Verilog/SystemVerilog extractor in crates/codegraph-core/src/extractors/verilog.rs, registers .v / .sv with LanguageKind::Verilog and the Rust file_collector, and adds Verilog to NATIVE_SUPPORTED_EXTENSIONS on the JS side.

  Mirrors extractVerilogSymbols: module/interface/package/class declarations, function and task declarations (parent-prefixed when nested), package_import_declaration and include_compiler_directive imports, and module_instantiation as call extraction.

  VERILOG_AST_CONFIG in helpers.rs deliberately has all node-type lists empty to mirror the WASM side, whose AST_TYPE_MAPS has no verilog entry, so both engines emit zero ast_nodes rows for Verilog and stay in parity.

  Closes #1071

* fix: address Greptile review feedback for Verilog extractor (#1107)

  - handle_class_decl: strengthen comment so the no-op behavior on the current tree-sitter-verilog grammar is loud and discoverable for future grammar upgrades.
  - handle_module_instantiation: switch child(0) to named_child(0) so any anonymous grammar tokens (e.g. parameter-override '#') leading the module type cannot leak into call names.
  - file_collector::SUPPORTED_EXTENSIONS: document .v conflict with Coq theorem-prover source files so Coq-heavy repos know to exclude *.v via config.
  - native-drop-classification: drop expected count to 9 to reflect the merge with main (.clj already removed, .v removed by this PR).

* chore: sync Cargo.lock version after merge (#1107)

* test(benchmark): exempt 3.10.0:Full build for verilog grammar addition (#1107)

  Adding native Verilog (#1107) brings 4 .v resolution-benchmark fixtures into the incremental benchmark sweep (which runs against the repo root). tree-sitter-verilog is a large grammar, so each .v file costs noticeably more to parse than other fixture languages, pushing the native fullBuildMs from the 3.10.0 baseline of 1959ms to ~2809ms (+43%).

  This is a structural one-time cost of supporting the language, not a regression in shared code paths. Following the existing pattern in KNOWN_REGRESSIONS (3.9.6:* / 3.10.0:* entries) with a documented rationale so a future PR isn't blocked by the bump.

* test(benchmark): exempt 3.10.0:fnDeps depth 3 and fix native-drop count (#1107)

* fix: extract Verilog class declarations and extends relations (#1107)

  The tree-sitter-verilog grammar exposes no field names on class_declaration, so childForFieldName('name') and childForFieldName('superclass') always returned null in both engines. The previous workaround left class extraction as documented dead code in both extractors.

  Per the CLAUDE.md principle 'Never document bugs as expected behavior', fix the root cause by descending through the grammar's actual structure:

  - Class name lives under class_identifier > simple_identifier
  - Superclass appears as a class_type child with the same wrapping

  Both engines now emit identical class Definitions and ClassRelation extends edges. Added matching Rust and TypeScript regression tests covering classes with and without an extends clause.

* fix: qualify Verilog tasks nested in classes with class name (#1107)

  find_verilog_parent only consulted find_decl_name and find_module_name, neither of which descends into the class_identifier wrapper that tree-sitter-verilog uses for class names. As a result, any task or function declared inside a SystemVerilog class lost its qualifier and surfaced as a bare name instead of ClassName.task.

  Extend the parent-name resolution chain to also try find_class_name, mirroring the same logic in the WASM extractor for engine parity. Added regression tests in both engines covering the class > task case.

* fix(test): drop .gleam/.v from WASM-only fixture after native port (#1107)

  The merge-conflict resolution commit fixed parser_registry.rs but the test edit was lost from the same merge commit. Both Gleam (#1105) and Verilog (#1107) are now natively supported, so the WASM-only test fixture should only count .fs / .fsx / .m as unsupported (3, not 4).

  docs check acknowledged: README/CLAUDE/ROADMAP already cover both languages.

* fix(extractors): remove unreachable splitn/split fallback in verilog package-import (#1107)

* fix(extractors): restore Verilog WASM engine parity for ports and includes (#1107)

  TS `extractPorts` was missing `module_ansi_header` from its container recursion and was not descending into `port_identifier`, so ANSI-style modules (`module top(input clk, …)`) returned no port children in the WASM engine while the native engine extracted them correctly. `handleIncludeDirective` was also missing `double_quoted_string`, which would silently drop `` `include `` imports when the grammar emits that node kind.

  Added regression tests for ANSI port extraction and include directive imports.
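The class-name and task-qualification fixes described in the message can be sketched as follows. This is a minimal illustration rather than the extractor's actual code: `Node` is a hypothetical stand-in for a tree-sitter node, the helper names are invented for the example, and the `class_type > class_identifier > simple_identifier` shape for the superclass is an assumption read from "the same wrapping".

```rust
/// Hypothetical stand-in for a syntax-tree node; the real extractor walks
/// tree-sitter nodes, which this sketch does not depend on.
#[derive(Debug, Clone)]
struct Node {
    kind: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

impl Node {
    fn leaf(kind: &'static str, text: &'static str) -> Node {
        Node { kind, text, children: Vec::new() }
    }

    fn branch(kind: &'static str, children: Vec<Node>) -> Node {
        Node { kind, text: "", children }
    }

    /// First direct child with the given kind, if any.
    fn child_of_kind(&self, kind: &str) -> Option<&Node> {
        self.children.iter().find(|c| c.kind == kind)
    }
}

/// Reads an identifier wrapped as `<wrapper> > simple_identifier`.
fn wrapped_identifier(node: &Node, wrapper: &str) -> Option<&'static str> {
    node.child_of_kind(wrapper)?
        .child_of_kind("simple_identifier")
        .map(|id| id.text)
}

/// Class name: class_identifier > simple_identifier (no field names involved).
fn class_name(class_decl: &Node) -> Option<&'static str> {
    wrapped_identifier(class_decl, "class_identifier")
}

/// Superclass: assumed here to be class_type > class_identifier > simple_identifier.
fn superclass_name(class_decl: &Node) -> Option<&'static str> {
    let class_type = class_decl.child_of_kind("class_type")?;
    wrapped_identifier(class_type, "class_identifier")
}

/// Qualifies a nested task as `ClassName.task`, or leaves it bare when the
/// enclosing declaration has no resolvable class name.
fn qualified_task_name(class_decl: &Node, task_name: &str) -> String {
    match class_name(class_decl) {
        Some(class) => format!("{}.{}", class, task_name),
        None => task_name.to_string(),
    }
}

/// Models `class Derived extends Base; ... endclass`.
fn sample_class() -> Node {
    Node::branch(
        "class_declaration",
        vec![
            Node::branch(
                "class_identifier",
                vec![Node::leaf("simple_identifier", "Derived")],
            ),
            Node::branch(
                "class_type",
                vec![Node::branch(
                    "class_identifier",
                    vec![Node::leaf("simple_identifier", "Base")],
                )],
            ),
        ],
    )
}
```

With `sample_class()`, `class_name` yields `Derived`, `superclass_name` yields `Base`, and a task named `run` is qualified as `Derived.run`. Looking children up by kind instead of by field name is the point of the sketch, since the message states that the grammar exposes no field names on class_declaration.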
1 parent 0049d67 commit 2ba882a

13 files changed

Lines changed: 871 additions & 40 deletions

Cargo.lock

Lines changed: 11 additions & 0 deletions

crates/codegraph-core/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ tree-sitter-erlang = "0.16"
 tree-sitter-groovy = "0.1"
 tree-sitter-r = "1.2"
 tree-sitter-solidity = "1.2"
+tree-sitter-verilog = "1.0.3"
 rayon = "1"
 ignore = "0.4"
 globset = "0.4"

crates/codegraph-core/src/change_detection.rs

Lines changed: 10 additions & 7 deletions
@@ -774,15 +774,18 @@ mod tests {

     #[test]
     fn detect_removed_skips_unsupported_extensions() {
-        // Files in WASM-only languages (Verilog) live in
-        // `file_hashes` because the JS-side WASM backfill writes them, but
-        // Rust's narrower file_collector never collects them. Without this
-        // skip, every incremental rebuild would flag them as removed and
-        // purge their rows — the #1066 ~2s floor.
+        // Files that the JS-side WASM backfill wrote into `file_hashes` for
+        // an extension that the Rust `file_collector` doesn't recognise must
+        // not be flagged as removed merely because the orchestrator's
+        // narrower collector never sees them — that would purge their rows
+        // on every incremental rebuild (the #1066 ~2s floor). All currently
+        // registered languages have native extractors, so this test uses
+        // synthetic extensions that are deliberately outside the
+        // `SUPPORTED_EXTENSIONS` set to exercise the skip path.
         let mut existing = HashMap::new();
         for path in [
-            "tests/fixtures/verilog/main.v",
-            "tests/fixtures/verilog/util.sv",
+            "tests/fixtures/unknown/main.unknownlang",
+            "tests/fixtures/unknown/util.fakelang",
         ] {
             existing.insert(
                 path.to_string(),
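
The skip path that this test exercises can be sketched as below. This is a simplified model under stated assumptions, not the code in change_detection.rs: `detect_removed`, `has_supported_extension`, and the three-entry `SUPPORTED_EXTENSIONS` list are hypothetical, and the real function's signature is not shown in the hunk.

```rust
use std::collections::{HashMap, HashSet};
use std::path::Path;

/// Hypothetical stand-in for the Rust collector's extension list.
const SUPPORTED_EXTENSIONS: &[&str] = &["rs", "v", "sv"];

/// True when the path ends in an extension the native collector would pick up.
fn has_supported_extension(path: &str) -> bool {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| SUPPORTED_EXTENSIONS.contains(&ext))
}

/// Returns previously hashed paths that are now missing from `current`.
/// Paths with unsupported extensions are skipped: the collector never sees
/// them, so their absence from `current` says nothing about deletion.
fn detect_removed(
    existing: &HashMap<String, String>,
    current: &HashSet<String>,
) -> Vec<String> {
    let mut removed: Vec<String> = existing
        .keys()
        .filter(|path| has_supported_extension(path))
        .filter(|path| !current.contains(*path))
        .cloned()
        .collect();
    removed.sort(); // deterministic order for callers and tests
    removed
}
```

In this model, a hashed `main.unknownlang` that is absent from `current` is not reported as removed, while a hashed `.rs` file that has disappeared is. Without the extension filter, every incremental rebuild would purge rows for files that only the WASM backfill knows about.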

crates/codegraph-core/src/extractors/helpers.rs

Lines changed: 17 additions & 0 deletions
@@ -473,6 +473,23 @@ pub const SOLIDITY_AST_CONFIG: LangAstConfig = LangAstConfig {
     string_prefixes: &[],
 };

+/// Verilog/SystemVerilog AST config.
+///
+/// The WASM-side `AST_TYPE_MAPS` (in `src/ast-analysis/rules/index.ts`) has no
+/// `verilog` entry, so the JS engine emits no `ast_nodes` rows for Verilog
+/// files. Keeping every list empty produces the same outcome here: the generic
+/// walker visits every node but classifies none, so nothing is pushed. If the
+/// JS map ever grows a Verilog entry, mirror it here.
+pub const VERILOG_AST_CONFIG: LangAstConfig = LangAstConfig {
+    new_types: &[],
+    throw_types: &[],
+    await_types: &[],
+    string_types: &[],
+    regex_types: &[],
+    quote_chars: &['"'],
+    string_prefixes: &[],
+};
+
 // ── Generic AST node walker ──────────────────────────────────────────────────

 /// Node types that represent identifiers across languages.
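
Why an all-empty config yields zero ast_nodes rows can be illustrated with a reduced model. The struct below copies only the five node-type lists from the hunk; `classify`, `count_ast_rows`, and the contrasting `SAMPLE_CONFIG` with a `new_expression` kind are hypothetical and do not reflect the real walker's API.

```rust
/// Reduced, hypothetical mirror of the config shape in the hunk above
/// (quote_chars and string_prefixes are omitted for brevity).
struct LangAstConfig {
    new_types: &'static [&'static str],
    throw_types: &'static [&'static str],
    await_types: &'static [&'static str],
    string_types: &'static [&'static str],
    regex_types: &'static [&'static str],
}

/// Every list empty, as in VERILOG_AST_CONFIG.
const EMPTY_CONFIG: LangAstConfig = LangAstConfig {
    new_types: &[],
    throw_types: &[],
    await_types: &[],
    string_types: &[],
    regex_types: &[],
};

/// Contrasting config with one classified kind (invented for this example).
const SAMPLE_CONFIG: LangAstConfig = LangAstConfig {
    new_types: &["new_expression"],
    throw_types: &[],
    await_types: &[],
    string_types: &[],
    regex_types: &[],
};

/// Maps a node kind to a category label, or None when no list contains it.
fn classify(config: &LangAstConfig, kind: &str) -> Option<&'static str> {
    let tables = [
        ("new", config.new_types),
        ("throw", config.throw_types),
        ("await", config.await_types),
        ("string", config.string_types),
        ("regex", config.regex_types),
    ];
    for (label, kinds) in tables {
        if kinds.iter().any(|k| *k == kind) {
            return Some(label);
        }
    }
    None
}

/// Visits every kind but only counts the ones that classify.
fn count_ast_rows(config: &LangAstConfig, visited_kinds: &[&str]) -> usize {
    visited_kinds
        .iter()
        .filter(|kind| classify(config, kind).is_some())
        .count()
}
```

With `EMPTY_CONFIG`, `count_ast_rows` is 0 for any list of visited kinds, which matches the doc comment's claim that the walker visits every node but classifies none.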

crates/codegraph-core/src/extractors/mod.rs

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,7 @@ pub mod rust_lang;
 pub mod scala;
 pub mod solidity;
 pub mod swift;
+pub mod verilog;
 pub mod zig;

 use crate::parser_registry::LanguageKind;
@@ -166,5 +167,8 @@ pub fn extract_symbols_with_opts(
         LanguageKind::Solidity => {
             solidity::SolidityExtractor.extract_with_opts(tree, source, file_path, include_ast_nodes)
         }
+        LanguageKind::Verilog => {
+            verilog::VerilogExtractor.extract_with_opts(tree, source, file_path, include_ast_nodes)
+        }
     }
 }
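
The registration and dispatch described in this commit can be pictured with a small model that maps a file extension to a language and then to an extractor. Everything here is a hypothetical reduction: the two-variant `LanguageKind`, `language_for_path`, `extractor_name`, and the `.sol` mapping are assumptions for illustration, and the real parser_registry is not shown on this page.

```rust
use std::path::Path;

/// Two-variant stand-in for the real LanguageKind enum.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LanguageKind {
    Solidity,
    Verilog,
}

/// Maps a path to a language by extension. `.v` and `.sv` both resolve to
/// Verilog, so a Coq `.v` file is claimed by Verilog too unless the
/// repository excludes it via config.
fn language_for_path(path: &str) -> Option<LanguageKind> {
    match Path::new(path).extension()?.to_str()? {
        "sol" => Some(LanguageKind::Solidity), // assumed extension
        "v" | "sv" => Some(LanguageKind::Verilog),
        _ => None,
    }
}

/// Exhaustive match, mirroring the dispatch arms in the hunk above: adding a
/// variant without an arm is a compile error.
fn extractor_name(kind: LanguageKind) -> &'static str {
    match kind {
        LanguageKind::Solidity => "SolidityExtractor",
        LanguageKind::Verilog => "VerilogExtractor",
    }
}
```

An exhaustive `match` over the enum is what forces the new `LanguageKind::Verilog` arm in extract_symbols_with_opts: once the variant exists, the function does not compile until an extractor is wired in.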
