Follow-up to my earlier post in this thread. I built the small benchmark I mentioned and ran it on a corpus of synthetic designs. Some of the numbers changed how I'd frame the original, so I want to put the corrections on record before they propagate. Repo: https://github.com/seimei-d/systemrdl-compiler-bench Numbers from one local run (WSL2 / Python 3.12 / systemrdl-compiler 1.32.2, median of 2-5 iterations, all in ms):
(*massive is flat: every register is declared directly under the top addrmap, 32 fields each, no arrays. I'd originally aimed at 250k flat regs, but the parser OOMed at 25k under a 10 GB virtual-memory cap, and 250k crashed the host outright. 5k is the largest size that benchmarks reliably on a typical dev box.)

Three corrections to the original post:

1. The bottleneck is the front end, not elaboration. preprocess + parse_and_translate + root_visitor is ~95% of total time on every profile; the elaborate walks plus validate are well under 5% even on huge. When I went on about subtree elaboration caching and address-allocation sequencing, I was guessing in the wrong direction. The higher-leverage target is per-file parse-and-visit caching, not elaborate.
2. Flat designs scale much worse than arrayed ones. 33k regs via deep arrays take 2.2 s; 5k flat regs (32 fields each) take 30 s and 4.7 GB RSS, roughly 14x more wall time for ~7x fewer registers. This is what the existing design already gets right: arrays are virtual, so the parse tree, the Python ParseTree, and the component-definition tree all see one node per unique declaration, not per instance. Flat declarations give that up by construction. Worth flagging for v2, because peripheral- and FIFO-heavy SoCs do have flat shapes, and streaming parse-tree construction (vs. a full Python mirror) would help that case directly.
3. #325 is real but smaller than I implied. Source-ref resolution is on the parse-phase hot path, but it's a slice of that phase rather than the dominant cost.
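To make the flat-vs-arrayed distinction in correction 2 concrete, here is a sketch of the two corpus shapes as SystemRDL text generators. These are hypothetical helpers (not from the benchmark repo), but they show why the source and parse tree for the flat shape grow with instance count while the arrayed shape stays near-constant:

```python
def flat_design(n_regs: int, n_fields: int = 32) -> str:
    # One anonymous reg definition per instance: source text, parse tree,
    # and component-def tree all grow linearly with n_regs.
    field_tpl = "field { sw = rw; hw = r; } f%d;"
    regs = []
    for r in range(n_regs):
        body = "\n".join("        " + field_tpl % i for i in range(n_fields))
        regs.append(f"    reg {{\n{body}\n    }} r{r};")
    return "addrmap flat_top {\n" + "\n".join(regs) + "\n};\n"


def arrayed_design(n_regs: int, n_fields: int = 32) -> str:
    # One named reg definition instantiated as an array: one node per unique
    # declaration, however many instances it elaborates to.
    body = "\n".join(f"    field {{ sw = rw; hw = r; }} f{i};" for i in range(n_fields))
    return (
        f"reg shared_reg {{\n{body}\n}};\n"
        f"addrmap arrayed_top {{\n    shared_reg r[{n_regs}];\n}};\n"
    )
```

Feeding both through the front end at matching instance counts is enough to reproduce the wall-time gap described above.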
I've been working on systemrdl-pro for the past few months - a VS Code extension plus a standalone LSP server, sitting on top of systemrdl-compiler. After reading #268 I figured the most useful thing I could do was write down what I learned from the editor side, instead of filing a feature request that would pull in a different direction from where v2 is going.

A bit of setup: an LSP server is a long-lived process that recompiles after each meaningful keystroke (debounced). It needs diagnostics, hover, goto-definition, references, and elaborated values for whatever code is on screen, ideally under ~100 ms before users start to notice. On smaller designs v1 is fine. On a few thousand registers with deep regfile arrays, it's not, and after reading through compiler.py, core/elaborate.py, walker.py, ComponentVisitor.py, and the speedy-antlr extension end to end, your diagnosis in #268 lines up with what I see. So this isn't a v1 ask.
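For concreteness, the recompile loop an LSP runs looks roughly like this debounce sketch (the class and `compile_fn` are illustrative, not part of any real server):

```python
import asyncio


class DebouncedCompiler:
    """Coalesce keystrokes: only recompile after `delay` seconds of quiet."""

    def __init__(self, compile_fn, delay: float = 0.2):
        self._compile_fn = compile_fn
        self._delay = delay
        self._task = None

    def notify_change(self) -> None:
        # Each edit cancels the pending compile and re-arms the timer,
        # so a burst of typing triggers exactly one rebuild at the end.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        try:
            await asyncio.sleep(self._delay)
        except asyncio.CancelledError:
            return
        self._compile_fn()
```

The point is that "recompile" here means the full front end on every quiet period, which is why front-end latency dominates the editing experience.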
There are three things v1 taught me that I think are worth flagging before v2 design solidifies, in case any of it is useful.
Per-file ingestion is the missing primitive
`compile_file` is stateful and additive: the visitor mutates `self.root.comp_defs` and the namespace as it goes (compiler.py:253). Calling it twice on the same path raises a duplicate-definition error in `register_type` (core/namespace.py:50). The "discard on any exception" advice in the docstring is conservative for batch use but expensive for editing, because the user is mid-typing most of the time and the compiler hits errors constantly.

The natural change unit for an LSP is "this file's content changed." Today I rebuild the compiler from scratch on every reparse, which works but defeats any caching above it. If v2 exposed a file-replacement boundary - some way to say "remove this file's contributions and re-ingest" - it would unlock most of the LSP wins on its own. I don't have a strong opinion on the API; I'm just flagging that the file-to-component-defs mapping is the right granularity, and v1 hides it.
A related thing that would help is clarifying which exceptions invalidate state vs which can be tolerated. The current blanket guidance pushes consumers toward throwing the whole compiler away.
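To illustrate the granularity being argued for, here is a toy sketch of a namespace that tracks which file contributed each definition. This is emphatically not the systemrdl-compiler API - the class and method names are invented - it just shows the file-replacement boundary shape:

```python
class FileScopedNamespace:
    """Hypothetical sketch: a type namespace that remembers which file
    contributed each definition, so one file can be replaced atomically."""

    def __init__(self):
        self._defs = {}      # type name -> (path, definition)
        self._by_file = {}   # path -> set of type names it contributed

    def replace_file(self, path, definitions):
        # Drop this file's previous contributions...
        for name in self._by_file.pop(path, set()):
            self._defs.pop(name, None)
        # ...then re-ingest; only cross-file collisions are real duplicates.
        for name, d in definitions.items():
            if name in self._defs:
                raise ValueError(f"duplicate definition: {name}")
            self._defs[name] = (path, d)
        self._by_file[path] = set(definitions)

    def lookup(self, name):
        entry = self._defs.get(name)
        return entry[1] if entry else None
```

With this shape, re-ingesting an edited file is not a duplicate-definition error, while genuine collisions between files still are.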
Elaboration is non-destructive, which is great. Address allocation is the catch.
`_elab_create_root_inst` deep-copies the source tree and walks the copy (compiler.py:347, component.py:121-160). That's a strong base for caching. I went down the path of imagining a subtree cache keyed by `(definition, resolved parameters)`, and then realised it doesn't work: `StructuralPlacementListener` allocates addresses strictly sequentially among siblings (elaborate.py:520-526), and field packing has the same property within a register (elaborate.py:383-409). The same register definition with the same parameters lands at different addresses depending on what surrounds it.

So if subtree reuse comes up in v2, the fingerprint needs sibling context, or placement needs to be expressible as a separable pass that's cheap to redo. I figured I'd post this in case it saves someone rediscovering it.
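The sibling-dependence is easy to see in a stripped-down model of sequential placement. This is a simplification in the spirit of `StructuralPlacementListener`, not the real algorithm:

```python
def place_siblings(children, alignment=4):
    """Assign addresses strictly sequentially among siblings.
    `children` is a list of (name, size_in_bytes) pairs."""
    addrs, next_addr = {}, 0
    for name, size in children:
        # Align up, then claim [next_addr, next_addr + size).
        next_addr = (next_addr + alignment - 1) // alignment * alignment
        addrs[name] = next_addr
        next_addr += size
    return addrs


# The same definition ("ctrl", 4 bytes, identical parameters) lands at a
# different address depending purely on what precedes it among its siblings,
# which is why a (definition, parameters) cache key is insufficient:
alone = place_siblings([("ctrl", 4)])
after_fifo = place_siblings([("fifo", 64), ("ctrl", 4)])
```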
Source-ref resolution
I'll keep this short because #325 already covers it from the peakrdl_html side. The same hot path hits LSPs harder - every diagnostic, hover, and goto resolves source positions, sometimes dozens per viewport refresh. I'll drop a comment over there with traces from my side rather than open a duplicate.
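For anyone hitting the same path from the editor side, the standard mitigation is to precompute a per-file line-start table so each position lookup is a binary search rather than a rescan. This is a generic sketch of that technique, not peakrdl or systemrdl-compiler internals:

```python
import bisect


class LineIndex:
    """Precompute line-start offsets once per file; each offset -> (line, col)
    lookup is then O(log n). The kind of cache that helps when every
    diagnostic, hover, and goto resolves a source position."""

    def __init__(self, text: str):
        self._starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self._starts.append(i + 1)

    def position(self, offset: int) -> tuple:
        # 0-based (line, column) for a character offset into the file.
        line = bisect.bisect_right(self._starts, offset) - 1
        return line, offset - self._starts[line]
```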
What I can pitch in
I'm planning to publish a small benchmark repo for my own work - it profiles preprocess, parse, C++-to-Python translation, each elaborate listener, and validate across a corpus of designs of different sizes. It's useful for me regardless, and if you'd find numbers helpful for either #325 or v2 priority-setting, I'll make sure it's runnable. If an RFC opens for v2 at some point, I'd want to be a test consumer.
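The measurement shape I have in mind is nothing fancier than a per-phase wall-clock timer like the sketch below (phase names here are illustrative stand-ins for the real compiler phases):

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Minimal per-phase wall-clock profiler: wrap each compiler phase in a
    context manager and accumulate milliseconds per phase name."""

    def __init__(self):
        self.ms = {}

    @contextmanager
    def phase(self, name: str):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            elapsed = (time.perf_counter() - t0) * 1e3
            self.ms[name] = self.ms.get(name, 0.0) + elapsed


timer = PhaseTimer()
with timer.phase("parse"):
    sum(range(100_000))  # stand-in for the real parse work
with timer.phase("elaborate"):
    pass
```

Reporting the median over a few iterations per phase, as in the numbers above, smooths out warm-up noise.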
That's it. Sorry if any of this is already obvious from your side - figured the editor angle was worth getting on the record while I had it fresh.