Follow-up to my earlier post in this thread. I built the small benchmark I mentioned and ran it on a corpus of synthetic designs. Some of the numbers changed how I'd frame the original, so I want to put the corrections on record before they propagate. Repo: https://github.com/seimei-d/systemrdl-compiler-bench Numbers from one local run (WSL2 / Python 3.12 / systemrdl-compiler 1.32.2, median of 2-5 iterations, all in ms):
(*massive is flat: every register is declared directly under the top addrmap, 32 fields each, no arrays. I'd originally aimed at 250k flat regs, but the parser OOMed at 25k under a 10 GB virtual-memory cap, and 250k crashed the host outright. 5k is the largest size that benchmarks reliably on a typical dev box.)

Three corrections to the original post:

1. The bottleneck is the front end, not elaboration. preprocess + parse_and_translate + root_visitor is ~95% of total time on every profile; the elaborate walks plus validate are well under 5% even on huge. When I went on about subtree elaboration caching and address-allocation sequencing, I was guessing in the wrong direction. The higher-leverage target is per-file parse-and-visit caching, not elaborate.
2. Flat designs scale much worse than arrayed ones. 33k regs via deep arrays take 2.2 s; 5k flat regs (32 fields each) take 30 s and 4.7 GB RSS, roughly 14x more wall time for ~7x fewer registers. This is what the existing design already gets right: arrays are virtual, so the parse tree, the Python ParseTree, and the component-definition tree all see one node per unique declaration, not per instance. Flat declarations give that up by construction. Worth flagging for v2, because peripheral- and FIFO-heavy SoCs do have flat shapes, and streaming parse-tree construction (vs. a full Python mirror) would help that case directly.
3. #325 is real but smaller than I implied. Source-ref resolution is on the parse-phase hot path, but it's a slice of that phase rather than the dominant cost.
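To make the flat-vs-arrayed distinction in correction 2 concrete, here is a sketch of the two corpus shapes as SystemRDL text generators. These are hypothetical helpers (not from the benchmark repo), but they show why the source and parse tree for the flat shape grow with instance count while the arrayed shape stays near-constant:

```python
def flat_design(n_regs: int, n_fields: int = 32) -> str:
    # One anonymous reg definition per instance: source text, parse tree,
    # and component-def tree all grow linearly with n_regs.
    field_tpl = "field { sw = rw; hw = r; } f%d;"
    regs = []
    for r in range(n_regs):
        body = "\n".join("        " + field_tpl % i for i in range(n_fields))
        regs.append(f"    reg {{\n{body}\n    }} r{r};")
    return "addrmap flat_top {\n" + "\n".join(regs) + "\n};\n"


def arrayed_design(n_regs: int, n_fields: int = 32) -> str:
    # One named reg definition instantiated as an array: one node per unique
    # declaration, however many instances it elaborates to.
    body = "\n".join(f"    field {{ sw = rw; hw = r; }} f{i};" for i in range(n_fields))
    return (
        f"reg shared_reg {{\n{body}\n}};\n"
        f"addrmap arrayed_top {{\n    shared_reg r[{n_regs}];\n}};\n"
    )
```

Feeding both through the front end at matching instance counts is enough to reproduce the wall-time gap described above.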
I've been working on systemrdl-pro for the past few months - a VS Code extension plus a standalone LSP server, sitting on top of systemrdl-compiler. After reading #268 I figured the most useful thing I could do was write down what I learned from the editor side, instead of filing a feature request that would pull in a different direction from where v2 is going.

A bit of setup: an LSP server is a long-lived process that recompiles after each meaningful keystroke (debounced). It needs diagnostics, hover, goto-definition, references, and elaborated values for whatever code is on screen, ideally under ~100 ms before users start to notice. On smaller designs v1 is fine. On a few thousand registers with deep regfile arrays, it's not, and after reading through compiler.py, core/elaborate.py, walker.py, ComponentVisitor.py, and the speedy-antlr extension end to end, your diagnosis in #268 lines up with what I see. So this isn't a v1 ask.
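For concreteness, the recompile loop an LSP runs looks roughly like this debounce sketch (the class and `compile_fn` are illustrative, not part of any real server):

```python
import asyncio


class DebouncedCompiler:
    """Coalesce keystrokes: only recompile after `delay` seconds of quiet."""

    def __init__(self, compile_fn, delay: float = 0.2):
        self._compile_fn = compile_fn
        self._delay = delay
        self._task = None

    def notify_change(self) -> None:
        # Each edit cancels the pending compile and re-arms the timer,
        # so a burst of typing triggers exactly one rebuild at the end.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        try:
            await asyncio.sleep(self._delay)
        except asyncio.CancelledError:
            return
        self._compile_fn()
```

The point is that "recompile" here means the full front end on every quiet period, which is why front-end latency dominates the editing experience.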
There are three things v1 taught me that I think are worth flagging before v2 design solidifies, in case any of it is useful.
Per-file ingestion is the missing primitive
`compile_file` is stateful and additive: the visitor mutates `self.root.comp_defs` and the namespace as it goes (compiler.py:253). Calling it twice on the same path raises a duplicate-definition error in `register_type` (core/namespace.py:50). The "discard on any exception" advice in the docstring is conservative for batch use but expensive for editing, because the user is mid-typing most of the time and the compiler hits errors constantly.

The natural change unit for an LSP is "this file's content changed." Today I rebuild the compiler from scratch on every reparse, which works but defeats any caching above it. If v2 exposed a file-replacement boundary - some way to say "remove this file's contributions and re-ingest" - it would unlock most of the LSP wins on its own. I don't have a strong opinion on the API; I'm just flagging that the file-to-component-defs mapping is the right granularity, and v1 hides it.
A related thing that would help is clarifying which exceptions invalidate state vs which can be tolerated. The current blanket guidance pushes consumers toward throwing the whole compiler away.
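To illustrate the granularity being argued for, here is a toy sketch of a namespace that tracks which file contributed each definition. This is emphatically not the systemrdl-compiler API - the class and method names are invented - it just shows the file-replacement boundary shape:

```python
class FileScopedNamespace:
    """Hypothetical sketch: a type namespace that remembers which file
    contributed each definition, so one file can be replaced atomically."""

    def __init__(self):
        self._defs = {}      # type name -> (path, definition)
        self._by_file = {}   # path -> set of type names it contributed

    def replace_file(self, path, definitions):
        # Drop this file's previous contributions...
        for name in self._by_file.pop(path, set()):
            self._defs.pop(name, None)
        # ...then re-ingest; only cross-file collisions are real duplicates.
        for name, d in definitions.items():
            if name in self._defs:
                raise ValueError(f"duplicate definition: {name}")
            self._defs[name] = (path, d)
        self._by_file[path] = set(definitions)

    def lookup(self, name):
        entry = self._defs.get(name)
        return entry[1] if entry else None
```

With this shape, re-ingesting an edited file is not a duplicate-definition error, while genuine collisions between files still are.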
Elaboration is non-destructive, which is great. Address allocation is the catch.
`_elab_create_root_inst` deep-copies the source tree and walks the copy (compiler.py:347, component.py:121-160). That's a strong base for caching. I went down the path of imagining a subtree cache keyed by `(definition, resolved parameters)`, and then realised it doesn't work: `StructuralPlacementListener` allocates addresses strictly sequentially among siblings (elaborate.py:520-526), and field packing has the same property within a register (elaborate.py:383-409). The same register definition with the same parameters lands at different addresses depending on what surrounds it.

So if subtree reuse comes up in v2, the fingerprint needs sibling context, or placement needs to be expressible as a separable pass that's cheap to redo. I figured I'd post this in case it saves someone rediscovering it.
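The sibling-dependence is easy to see in a stripped-down model of sequential placement. This is a simplification in the spirit of `StructuralPlacementListener`, not the real algorithm:

```python
def place_siblings(children, alignment=4):
    """Assign addresses strictly sequentially among siblings.
    `children` is a list of (name, size_in_bytes) pairs."""
    addrs, next_addr = {}, 0
    for name, size in children:
        # Align up, then claim [next_addr, next_addr + size).
        next_addr = (next_addr + alignment - 1) // alignment * alignment
        addrs[name] = next_addr
        next_addr += size
    return addrs


# The same definition ("ctrl", 4 bytes, identical parameters) lands at a
# different address depending purely on what precedes it among its siblings,
# which is why a (definition, parameters) cache key is insufficient:
alone = place_siblings([("ctrl", 4)])
after_fifo = place_siblings([("fifo", 64), ("ctrl", 4)])
```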
Source-ref resolution
I'll keep this short because #325 already covers it from the peakrdl_html side. The same hot path hits LSPs harder - every diagnostic, hover, and goto resolves source positions, sometimes dozens per viewport refresh. I'll drop a comment over there with traces from my side rather than open a duplicate.
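For anyone hitting the same path from the editor side, the standard mitigation is to precompute a per-file line-start table so each position lookup is a binary search rather than a rescan. This is a generic sketch of that technique, not peakrdl or systemrdl-compiler internals:

```python
import bisect


class LineIndex:
    """Precompute line-start offsets once per file; each offset -> (line, col)
    lookup is then O(log n). The kind of cache that helps when every
    diagnostic, hover, and goto resolves a source position."""

    def __init__(self, text: str):
        self._starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self._starts.append(i + 1)

    def position(self, offset: int) -> tuple:
        # 0-based (line, column) for a character offset into the file.
        line = bisect.bisect_right(self._starts, offset) - 1
        return line, offset - self._starts[line]
```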
What I can pitch in
I'm planning to publish a small benchmark repo for my own work - it profiles preprocess, parse, C++-to-Python translation, each elaborate listener, and validate across a corpus of designs of different sizes. It's useful for me regardless, and if you'd find numbers helpful for either #325 or v2 priority-setting, I'll make sure it's runnable. If an RFC opens for v2 at some point, I'd want to be a test consumer.
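The measurement shape I have in mind is nothing fancier than a per-phase wall-clock timer like the sketch below (phase names here are illustrative stand-ins for the real compiler phases):

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Minimal per-phase wall-clock profiler: wrap each compiler phase in a
    context manager and accumulate milliseconds per phase name."""

    def __init__(self):
        self.ms = {}

    @contextmanager
    def phase(self, name: str):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            elapsed = (time.perf_counter() - t0) * 1e3
            self.ms[name] = self.ms.get(name, 0.0) + elapsed


timer = PhaseTimer()
with timer.phase("parse"):
    sum(range(100_000))  # stand-in for the real parse work
with timer.phase("elaborate"):
    pass
```

Reporting the median over a few iterations per phase, as in the numbers above, smooths out warm-up noise.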
That's it. Sorry if any of this is already obvious from your side - figured the editor angle was worth getting on the record while I had it fresh.