Optimize waveform render loop and per-line string utility hot paths #12069
Merged
Verified with BenchmarkDotNet (Apple M4, .NET 10, old vs new, identical outputs asserted before timing):

- AudioVisualizer.GetShotChangeIndex: linear scan over all shot changes per frame -> binary search: 1013 ns -> 7.6 ns (134x)
- DrawShotChanges: skip entirely with no shot changes (the common case, previously still rebuilt paragraph position sets), binary search the first visible entry and stop past the right edge
- DrawParagraph selection check: List.Contains per visible paragraph (O(selection) each) -> HashSet rebuilt once per render
- BuildWaveFormFancy: hoist WaveformColor/WaveformSelectedColor/WaveformFancyHighColor StyledProperty reads out of the per-pixel loop
- HitTestParagraph: resolve WavePeaks/ZoomFactor once per pointer move instead of per paragraph edge
- Timeline FormattedText cache: same 8000-entry cap as the paragraph caches (grew unbounded when scrubbing in frame mode)
- MainViewModel position timer: stop the per-tick current-line scan at the first line starting after the playhead (list is sorted)
- HtmlUtil.RemoveColorTags/RemoveFontName: per-call uncompiled Regex -> static compiled: 1882 -> 282 ns (6.7x) / 2170 -> 358 ns (6.1x)
- HtmlUtil.FixUpperTags: rescan-from-zero with 8 IndexOf passes and two full copies per tag -> single forward pass: 548 -> 64 ns (8.6x)
- RegexUtils.ReplaceNewLineSafe/CountNewLineSafe: skip the SplitToLines + Join round-trips when the text is already \n-only: 182 -> 73 ns
- StringExtensions.CountWords: char walk instead of Split allocations: 184 -> 134 ns, 82% less allocated
- StringExtensions.RemoveAndSaveTags: O(n^2) Substring(index) per tag char -> AsSpan: ASSA lines 359 -> 198 ns, ~30% less allocated
- CalcFactory.MakeCalculator: drop LINQ closure + fallback allocation
- Utilities.CanBreak: drop the per-call ToArray copy of the cached no-break list

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Performance pass over the AudioVisualizer render path, the MainViewModel position timer, and the libse string utilities called per subtitle line. As requested, no caching was added; these are algorithmic fixes (binary search, early exit, single-pass rewrites, hoisted invariants, static compiled regexes).
All changes verified with BenchmarkDotNet (Apple M4, .NET 10). Each old/new pair was asserted to produce identical output before timing.
Waveform / render loop (runs at ~60 fps during playback)
- GetShotChangeIndex: linear scan over all shot changes per frame → binary search on the sorted list: 1013 ns → 7.6 ns (134×) with 2000 shot changes.
- DrawShotChanges: returns immediately when there are no shot changes (the common case; it previously still rebuilt the paragraph position sets every frame); binary-searches the first visible entry and breaks past the right edge instead of computing X positions for every shot change in the movie.
- DrawParagraph: the selection check was List.Contains per visible paragraph (O(selection) each; with select-all on a large subtitle that's millions of compares per frame); it is now a HashSet rebuilt once per render.
- BuildWaveFormFancy: WaveformColor/WaveformSelectedColor/WaveformFancyHighColor StyledProperty reads hoisted out of the per-pixel loop (they went through Avalonia's value store once per pixel column while scrolling/zooming).
- HitTestParagraph: resolves WavePeaks.SampleRate * ZoomFactor once per pointer move instead of twice per paragraph.
- FormattedText cache gets the same 8000-entry cap as the paragraph text caches (it grew unbounded when scrubbing in frame mode).

String utilities (called per line from casing fixes, batch convert, multiple replace, auto-break, grid repaints)

| Method | Before | After | Change |
| --- | --- | --- | --- |
| HtmlUtil.RemoveColorTags | 1882 ns | 282 ns | 6.7× |
| HtmlUtil.RemoveFontName | 2170 ns | 358 ns | 6.1× |
| HtmlUtil.FixUpperTags | 548 ns | 64 ns | 8.6× |
| RegexUtils.ReplaceNewLineSafe (\n-only text) | 182 ns | 73 ns | |
| StringExtensions.CountWords | 184 ns | 134 ns | 82% less allocated |
| StringExtensions.RemoveAndSaveTags (ASSA) | 359 ns | 198 ns | 1.8×, ~30% less allocated |

- RemoveColorTags/RemoveFontName: the regex was re-parsed on every call (uncompiled new Regex(...)); it is now static compiled. Also dropped the rest-of-string Substring per match.
- FixUpperTags: rescan-from-zero loop (8 IndexOf passes + Remove + Insert full copies per tag) → single forward pass over a char buffer.
- ReplaceNewLineSafe/CountNewLineSafe: skip the two SplitToLines + Join round-trips when the text contains no \r/U+2028 (always true after the first rule in a multiple-replace run).
- RemoveAndSaveTags: input.Substring(index) (rest of string) was allocated per < / { / \ character just for a StartsWith check; now AsSpan. The plain-HTML micro-case is ~13% slower on M4 but allocates 30% less and no longer degrades quadratically with line length; the ASSA case is 1.8× faster.
- CalcFactory.MakeCalculator (hit per grid cell repaint via CPS): dropped the LINQ closure and the per-call fallback new CalcAll(); returns the existing shared instances.
- Utilities.CanBreak: dropped the per-call ToArray() copy of the cached no-break list.

Not addressed (fix would require caching, excluded by request): the spectrogram path rebuilding + converting a full bitmap every frame, and the position timer's per-tick copy+sort of all subtitles.
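For reviewers who want the shape of the shot-change fix without opening the diff, here is a minimal sketch of the idea: a lower-bound binary search to find the first visible entry, then a forward walk that stops past the right edge. It is not the code from this PR. `ShotChangeSketch`, `LowerBound`, and `VisibleShotChanges` are hypothetical names, and it assumes shot changes are stored as an ascending list of seconds.

```csharp
using System;
using System.Collections.Generic;

public static class ShotChangeSketch
{
    // Returns the index of the first element >= value in an ascending list,
    // or Count when every element is smaller (a "lower bound" search).
    public static int LowerBound(IReadOnlyList<double> sortedSeconds, double value)
    {
        int lo = 0;
        int hi = sortedSeconds.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sortedSeconds[mid] < value)
            {
                lo = mid + 1;
            }
            else
            {
                hi = mid;
            }
        }

        return lo;
    }

    // Hypothetical stand-in for a per-frame draw pass: collects only the
    // shot changes inside the visible window [startSeconds, endSeconds].
    public static List<double> VisibleShotChanges(
        IReadOnlyList<double> sortedSeconds, double startSeconds, double endSeconds)
    {
        var visible = new List<double>();
        if (sortedSeconds.Count == 0)
        {
            return visible; // common case: nothing to draw, skip all work
        }

        for (int i = LowerBound(sortedSeconds, startSeconds); i < sortedSeconds.Count; i++)
        {
            if (sortedSeconds[i] > endSeconds)
            {
                break; // input is sorted, so the rest is off-screen too
            }

            visible.Add(sortedSeconds[i]);
        }

        return visible;
    }
}
```

With this shape, per-frame work depends on the number of shot changes on screen plus a logarithmic search, not on the total number in the movie.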
All 1003 tests pass (libse 485, UI 512, libuilogic 6).
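The two string-utility patterns described above (a regex built once, not per call, and a prefix check on a span instead of a Substring copy) can be sketched as follows. This is an illustration under stated assumptions, not the libse code: `TagSketch`, `RemoveColorAttribute`, and `StartsWithAt` are hypothetical names, and the pattern is a simplified stand-in for whatever HtmlUtil actually matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagSketch
{
    // Built once per process. RegexOptions.Compiled trades a one-time
    // start-up cost for faster matching on every later call.
    // The pattern is illustrative only, not the one used in HtmlUtil.
    private static readonly Regex ColorAttribute = new Regex(
        " color=\"[^\"]*\"",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Removes color attributes using the shared regex instance.
    public static string RemoveColorAttribute(string input)
    {
        return ColorAttribute.Replace(input, string.Empty);
    }

    // Checks for a prefix starting at `index` without allocating a copy
    // of the rest of the string, unlike input.Substring(index).StartsWith(...).
    public static bool StartsWithAt(string input, int index, string prefix)
    {
        return input.AsSpan(index).StartsWith(prefix.AsSpan(), StringComparison.Ordinal);
    }
}
```

The span version does the same comparison as the Substring version, but each check allocates nothing, which is why the cost no longer grows with the length of the remaining line.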
🤖 Generated with Claude Code