[CLEANUP] Flatten Glimmer reference hot paths (each item cells, inlined track frame, property refs) #21435
Draft
NullVoxPopuli-ai-agent wants to merge 5 commits into
Each `{{#each}}` item binds two block params — the item value and its
index — and both were created as full compute references via
`createIteratorItemRef`. That meant, per item:
- a `ReferenceImpl` + a dirtyable tag, plus *two* closures (the
`compute` getter and the `update` setter), and
- on every read, `valueForRef` took the generic compute path and opened
a `track()` frame (a `Tracker` + `Set` allocation) purely to
re-discover a tag that never changes.
For a 10k-row table that is 20k references and 20k tracking frames per
render pass (create/clear/append/update/swap all hit this), all to model
a value that is just "a stored value behind one tag".
This introduces a dedicated `Cell` reference type. A cell stores its
value directly on the reference behind a fixed tag, so:
- `valueForRef` reads the stored value and re-snapshots the tag without
opening a tracking frame (there are no dependencies to discover), and
- `updateRef` mutates the value inline with the same equality gate as
before — no `compute`/`update` closures are allocated at all.
Behavior is identical: same tag consumed on read, same equality-gated
dirty on update. `isUpdatableRef` reports cells as updatable, and
`createDebugAliasRef` no longer inherits the `Cell` type (a debug alias
is a genuine compute reference).
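A minimal TypeScript sketch of the idea, under a simplified, hypothetical model (the `Tag` shape, the `frames` stack, the `framesOpened` counter, and `createCell` are illustrative, not Glimmer's actual internals; only the `valueForRef`/`updateRef`/`Cell` vocabulary comes from the description above). A cell read consumes its fixed tag inline without opening a tracking frame, and an update dirties the tag only when the value actually changes.

```typescript
// Simplified, hypothetical model; not Glimmer's real reference internals.
interface Tag { revision: number }

let currentRevision = 1;
function createTag(): Tag { return { revision: currentRevision }; }
function dirtyTag(tag: Tag): void { tag.revision = ++currentRevision; }

// Stack of open tracking frames; each frame collects the tags consumed in it.
const frames: Tag[][] = [];
let framesOpened = 0; // demo instrumentation only

function consumeTag(tag: Tag): void {
  if (frames.length > 0) frames[frames.length - 1].push(tag);
}

interface ComputeRef<T> { kind: "compute"; compute: () => T; update?: (value: T) => void }
interface CellRef<T> { kind: "cell"; tag: Tag; value: T }
type Ref<T> = ComputeRef<T> | CellRef<T>;

// A cell is plain data: a value behind one fixed tag, with no closures.
function createCell<T>(value: T): CellRef<T> {
  return { kind: "cell", tag: createTag(), value };
}

function valueForRef<T>(ref: Ref<T>): T {
  if (ref.kind === "cell") {
    // No dependencies to discover: consume the fixed tag and return inline.
    consumeTag(ref.tag);
    return ref.value;
  }
  // Generic path: open a frame to discover the tags compute() reads.
  framesOpened++;
  frames.push([]);
  try {
    return ref.compute();
  } finally {
    // Simplification: forward consumed tags to the enclosing frame instead
    // of combining them into a cached tag on the ref.
    const consumed = frames.pop()!;
    for (const tag of consumed) consumeTag(tag);
  }
}

function updateRef<T>(ref: Ref<T>, value: T): void {
  if (ref.kind === "cell") {
    // Equality-gated: only a real change dirties the tag (`!==` assumed here).
    if (ref.value !== value) {
      ref.value = value;
      dirtyTag(ref.tag);
    }
    return;
  }
  if (ref.update === undefined) throw new Error("reference is not updatable");
  ref.update(value);
}
```

In this model, creating an item ref costs one plain object plus its tag, instead of a reference, a tag, and two closures.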
Microbench (real `valueForRef`/`updateRef`, 1000 items, prod build):
initial render (create+read) 198µs/698kb -> 86µs/261kb (2.3x, -63% mem)
re-render (update+read) 185µs/417kb -> 79µs/137kb (2.3x, -67% mem)
allocation only 31µs/320kb -> 22µs/~4kb (1.4x, ~0 garbage)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Our `each` def has a problem, but I'm not convinced this is the solution. Running the bench locally doesn't show much improvement. I have a hunch we'll need to ship fragment support first so that `each` can be sort of "off-canvas"'d.
Removes two more extraneous layers from the reference/iteration hot paths:
1. `valueForRef` recompute went through `track(thunk)`, which allocates a
closure on *every* (re)compute. This is the single hottest function in
the VM — every reference read that needs evaluation passes through it
(all refs on initial render, and again on each invalidation). Inlining
`beginTrackFrame()`/`endTrackFrame()` drops that per-read allocation.
Microbench (1000 recompute frames): 63.2µs -> 57.0µs (~10%) and
282kb -> 188kb (~33% less garbage).
2. `{{#each}}` key derivation:
- `makeKeyFor` was re-resolved on every diff and wrapped *every*
strategy — including `@index`/`@key`, whose keys are unique by
construction — in the duplicate-key dedup machinery. The strategy is
now resolved once when the iterator ref is created, and index keys
skip dedup entirely.
- The per-pass `seen` set used `WeakMapWithPrimitives` (lazy-getter +
object/primitive dispatch on every get/set). Since it lives only for
one synchronous pass, a plain `Map` is both simpler and faster; the
weak-keyed map is kept only for the long-lived global `IDENTITIES`.
Microbench (1000-item iteration): `@index` 23.0µs vs `@identity`
48.9µs — index keys no longer pay the dedup cost they used to.
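The key-derivation changes can be sketched as follows. This is a simplified, hypothetical model: `KeyStrategy`, `resolveKeyStrategy`, `keysForPass`, and the `[key, count]` encoding for repeated keys are illustrative, not Glimmer's API, and `@key` is omitted for brevity. The strategy is resolved once, `@index` bypasses dedup, and the per-pass `seen` bookkeeping is a plain `Map`.

```typescript
// Hypothetical sketch; these names are illustrative, not Glimmer's API.
type KeyFor = (item: unknown, index: number) => unknown;

interface KeyStrategy {
  keyFor: KeyFor;
  unique: boolean; // true when keys are unique by construction (e.g. `@index`)
}

// Resolved once, when the iterator ref is created, not on every diff.
function resolveKeyStrategy(key: string): KeyStrategy {
  if (key === "@index") {
    return { keyFor: (_item, index) => index, unique: true };
  }
  if (key === "@identity") {
    return { keyFor: (item) => item, unique: false };
  }
  // Any other string is treated as a property name on each item.
  return { keyFor: (item) => (item as Record<string, unknown>)[key], unique: false };
}

// Computes the keys for one synchronous diff pass.
function keysForPass(items: readonly unknown[], strategy: KeyStrategy): unknown[] {
  if (strategy.unique) {
    // Unique by construction: skip duplicate-key bookkeeping entirely.
    return items.map((item, index) => strategy.keyFor(item, index));
  }
  // `seen` lives for this pass only, so a plain Map is enough (no weak keys).
  const seen = new Map<unknown, number>();
  return items.map((item, index) => {
    const key = strategy.keyFor(item, index);
    const count = seen.get(key) ?? 0;
    seen.set(key, count + 1);
    // Illustrative disambiguation for repeats; Glimmer's real scheme is not reproduced here.
    return count === 0 ? key : [key, count];
  });
}
```

The strategy object would be created once per iterator ref and reused across passes; only the `seen` map is per pass.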
Behavior is unchanged: same keys produced, same duplicate-key semantics,
same tag consumption. Verified headless in Chrome — each (571), iterable
(24), tracked (242), Updating (175), Helpers (1173), Components (328), fn
(36) all pass.
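Item 1 above (inlining `track()` into `valueForRef`) is easiest to see side by side. This is a simplified, hypothetical sketch: the `Set`-based frames, `ComputeRef`, and both `recompute*` helpers are illustrative, not Glimmer's exact signatures. Both variants collect the same tags, but only the `track()` variant creates a new arrow-function closure on every recompute.

```typescript
// Hypothetical, simplified tracking API; not Glimmer's exact signatures.
interface Tag { revision: number }

const stack: Set<Tag>[] = [];

function beginTrackFrame(): void {
  stack.push(new Set<Tag>());
}

function endTrackFrame(): Set<Tag> {
  const frame = stack.pop();
  if (frame === undefined) throw new Error("endTrackFrame without beginTrackFrame");
  return frame;
}

function consumeTag(tag: Tag): void {
  stack[stack.length - 1]?.add(tag);
}

function track(block: () => void): Set<Tag> {
  beginTrackFrame();
  try {
    block();
  } catch (error) {
    endTrackFrame(); // keep the stack balanced on throw
    throw error;
  }
  return endTrackFrame();
}

interface ComputeRef<T> { compute: () => T; tags?: Set<Tag> }

// Before: each recompute allocates a fresh thunk closure for track().
function recomputeViaTrack<T>(ref: ComputeRef<T>): T {
  let value: T | undefined;
  ref.tags = track(() => {
    value = ref.compute();
  });
  return value as T;
}

// After: the frame is opened and closed inline; no closure is allocated.
function recomputeInline<T>(ref: ComputeRef<T>): T {
  beginTrackFrame();
  let value: T;
  try {
    value = ref.compute();
  } catch (error) {
    endTrackFrame(); // keep the stack balanced on throw
    throw error;
  }
  ref.tags = endTrackFrame();
  return value;
}
```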
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every `{{a.b}}` path access compiled to a compute reference holding two
closures — a getter (`getProp(valueForRef(parent), path)`) and a setter
(`setProp(...)`) — that captured nothing but `(parent, path)`. That is two
closure allocations per property reference, on a path hit by essentially
every template (`{{this.foo}}`, `{{row.id}}`, `{{row.label.current}}`, …).
Add a `Property` reference type that stores `parent` + `path` as plain
fields and is read/written inline by `valueForRef`/`updateRef` (the same
approach as the `Cell` type used for `{{#each}}` block params). No closures
are allocated; reads still open a tracking frame, since `getProp` consumes
dynamic tags. `isUpdatableRef` reports Property refs as updatable, and
`createDebugAliasRef` no longer inherits the Property type.
Microbench (1000 childRefFor calls): 72.2µs/633kb -> 62.4µs/477kb
(~14% faster, ~25% less allocation).
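A simplified sketch of the before/after shapes, assuming a hypothetical three-kind `Ref` union (the `ConstRef`/`ComputeRef`/`PropertyRef` types and these `getProp`/`setProp` stand-ins are illustrative, and tag consumption is omitted). The `Property` kind carries `parent` and `path` as data, and `valueForRef`/`updateRef` interpret it directly instead of calling captured closures.

```typescript
// Hypothetical, simplified model; tag consumption is omitted for brevity.
interface ConstRef { kind: "const"; value: unknown }
interface ComputeRef {
  kind: "compute";
  compute: () => unknown;
  update?: (value: unknown) => void;
}
interface PropertyRef { kind: "property"; parent: Ref; path: string }
type Ref = ConstRef | ComputeRef | PropertyRef;

function getProp(obj: unknown, path: string): unknown {
  return obj == null ? undefined : (obj as Record<string, unknown>)[path];
}

function setProp(obj: unknown, path: string, value: unknown): void {
  (obj as Record<string, unknown>)[path] = value;
}

// Before: two closures per property ref, capturing only (parent, path).
function childRefForWithClosures(parent: Ref, path: string): ComputeRef {
  return {
    kind: "compute",
    compute: () => getProp(valueForRef(parent), path),
    update: (value) => setProp(valueForRef(parent), path, value),
  };
}

// After: parent and path are plain fields; no closures are allocated.
function childRefFor(parent: Ref, path: string): PropertyRef {
  return { kind: "property", parent, path };
}

function valueForRef(ref: Ref): unknown {
  switch (ref.kind) {
    case "const":
      return ref.value;
    case "compute":
      return ref.compute();
    case "property":
      // A real implementation still opens a tracking frame here,
      // because getProp consumes dynamic tags.
      return getProp(valueForRef(ref.parent), ref.path);
  }
}

function isUpdatableRef(ref: Ref): boolean {
  return ref.kind === "property" || (ref.kind === "compute" && ref.update !== undefined);
}

function updateRef(ref: Ref, value: unknown): void {
  if (ref.kind === "property") {
    setProp(valueForRef(ref.parent), ref.path, value);
  } else if (ref.kind === "compute" && ref.update !== undefined) {
    ref.update(value);
  } else {
    throw new Error("reference is not updatable");
  }
}
```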
Also fixes a throw-semantics bug introduced when `track()` was inlined into
`valueForRef`: committing `ref.tag` inside the `finally` updated the tag even
when the compute threw, leaving `tag` and `lastRevision` inconsistent. The
new tag/revision are now committed only on success (the frame is still ended
in `finally` to keep the tracking stack balanced), matching the original
`track()` behavior. This restores correct handling of throwing getters —
caught by the `debug render tree: emberish curly components` test.
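The corrected ordering can be sketched like this, under a simplified, hypothetical model (`CachedRef`, `frameDepth`, and the one-new-tag-per-frame stand-in in `endTrackFrame` are illustrative). The frame is always ended, but the new tag, `lastRevision`, and value are written together only after `compute()` returns, so a throwing getter leaves the previous, consistent pair in place.

```typescript
// Hypothetical, simplified model of a compute ref's cache fields.
interface Tag { revision: number }

let nextRevision = 1;
let frameDepth = 0;

function beginTrackFrame(): void {
  frameDepth++;
}

// Stand-in for combining the consumed tags into one new tag.
function endTrackFrame(): Tag {
  frameDepth--;
  return { revision: nextRevision++ };
}

interface CachedRef<T> {
  compute: () => T;
  tag: Tag | null;
  lastRevision: number;
  lastValue: T | undefined;
}

function recompute<T>(ref: CachedRef<T>): T {
  beginTrackFrame();
  let frameEnded = false;
  try {
    const value = ref.compute();
    const tag = endTrackFrame();
    frameEnded = true;
    // Commit tag, revision and value together, and only on success.
    ref.tag = tag;
    ref.lastRevision = tag.revision;
    ref.lastValue = value;
    return value;
  } finally {
    // On throw, still end the frame so the tracking stack stays balanced,
    // but leave the previously committed tag/revision untouched.
    if (!frameEnded) endTrackFrame();
  }
}
```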
Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`beginTrackFrame` allocated a `new Tracker()`, and the Tracker allocated a `new Set<Tag>()`: two objects per frame, on *every* reference recompute and on every cache-group revalidation. The overwhelming majority of frames consume zero or one tag.

- The Tracker now holds the first consumed tag in a field and allocates the `Set` only when a second, distinct tag arrives. 0/1-tag frames never touch a Set (and still dedupe / combine correctly).
- Trackers are pooled on a LIFO freelist. Frames are strictly nested and a tracker is dead the instant `combine()` runs in `endTrackFrame`, so it can be reset and reused by the next `beginTrackFrame`.

Net: the common tracking frame now allocates ~nothing.

Microbench: a frame that opens, consumes one tag, and closes drops from two object allocations to ~0 b/iter (measured 0.10 b for the 0-tag case).

Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
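Both ideas fit in a few lines. This is a hypothetical sketch: this `Tracker`, the `freelist`/`stack` arrays, and a `combine()` that merely returns the consumed tags are illustrative, not Glimmer's code. The first tag lives in a field, a `Set` appears only for a second distinct tag, and finished trackers go back on a LIFO freelist.

```typescript
// Hypothetical sketch; Glimmer's real Tracker and combine() differ.
interface Tag { id: number }

class Tracker {
  private first: Tag | null = null;
  private rest: Set<Tag> | null = null; // allocated only for a 2nd distinct tag

  add(tag: Tag): void {
    if (this.first === null) {
      this.first = tag;
    } else if (tag !== this.first) {
      if (this.rest === null) this.rest = new Set<Tag>();
      this.rest.add(tag);
    }
  }

  // null for 0 tags, the tag itself for 1, an array for 2+.
  combine(): Tag | Tag[] | null {
    if (this.rest === null) return this.first;
    return [this.first as Tag, ...Array.from(this.rest)];
  }

  reset(): void {
    this.first = null;
    this.rest = null;
  }
}

const freelist: Tracker[] = []; // LIFO pool of dead trackers
const stack: Tracker[] = [];    // currently open frames
let trackersAllocated = 0;      // demo instrumentation only

function beginTrackFrame(): void {
  let tracker = freelist.pop();
  if (tracker === undefined) {
    tracker = new Tracker();
    trackersAllocated++;
  }
  stack.push(tracker);
}

function consumeTag(tag: Tag): void {
  stack[stack.length - 1]?.add(tag);
}

function endTrackFrame(): Tag | Tag[] | null {
  const tracker = stack.pop();
  if (tracker === undefined) throw new Error("endTrackFrame without beginTrackFrame");
  const result = tracker.combine();
  // Frames are strictly nested, so the tracker is dead once combined:
  // reset it and return it to the freelist for the next beginTrackFrame.
  tracker.reset();
  freelist.push(tracker);
  return result;
}
```

Because frames are strictly nested, a tracker is only pushed onto the freelist after it has left the open-frame stack, so reuse can never alias a live frame.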
`MonomorphicTagImpl[COMPUTE]` is called by `validateTag`/`valueForTag` on every reference read. For a tag with no subtag (property tags, cell tags, plain dirtyable/updatable tags, i.e. the overwhelming majority), the result is always just `revision`, kept current by `dirtyTag`. The `lastChecked`/`isUpdating`/cycle-guard/`try`-`finally` machinery exists only to memoize subtag recursion, so it is pure overhead for these tags.

Return `this.revision` directly when `subtag === null`. The combinator path is unchanged (it now reuses the already-read `subtag`).

Microbench (1000 subtag-less `[COMPUTE]`s during a revalidation pass): ~4.71µs -> ~3.90µs (~17%), with no `try`/`finally` or field writes on the read.

Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
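A minimal sketch of the fast path, assuming a simplified tag class. `MonomorphicTag`, the `$REVISION` counter, and this `validateTag` are illustrative stand-ins that mirror, but are not, Glimmer's `MonomorphicTagImpl`; the cycle guard and `isUpdating` handling are omitted. A subtag-less tag returns `revision` with no memoization writes, while a tag with a subtag still goes through the (simplified) combinator path.

```typescript
// Hypothetical, simplified tag model; not Glimmer's MonomorphicTagImpl.
const COMPUTE = Symbol("COMPUTE");
let $REVISION = 1;

class MonomorphicTag {
  revision = $REVISION;
  subtag: MonomorphicTag | null = null;
  // Memoization state, used only on the subtag path.
  lastChecked = 0;
  lastValue = 0;

  [COMPUTE](): number {
    const subtag = this.subtag;
    // Fast path: with no subtag, `revision` is already current (dirtyTag
    // keeps it up to date), so no memoization or field writes are needed.
    if (subtag === null) return this.revision;

    // Combinator path (simplified): recompute at most once per global revision.
    if (this.lastChecked !== $REVISION) {
      this.lastChecked = $REVISION;
      this.lastValue = Math.max(this.revision, subtag[COMPUTE]());
    }
    return this.lastValue;
  }
}

function dirtyTag(tag: MonomorphicTag): void {
  tag.revision = ++$REVISION;
}

// A snapshot stays valid while no covered tag has a newer revision.
function validateTag(tag: MonomorphicTag, snapshot: number): boolean {
  return snapshot >= tag[COMPUTE]();
}
```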
1b23588 to 3575b36
What
Six behavior-preserving flattenings of the Glimmer reference / tracking / tag hot paths: the machinery hit on nearly every reference read and every revalidation tick. Found while profiling the smoke-tests/benchmark-apptable benchmark, but the wins apply to all rendering, not just `{{#each}}`.

The recurring theme: references and tracking frames were modeled as general compute closures when the data is really just "a value (or parent + path) behind a tag." Storing that as data lets `valueForRef`/`updateRef` handle it inline, with no closures and often no tracking frame.

Changes (with isolated per-change impact)

Unless noted otherwise, each number below is an A/B microbenchmark of that change alone against the prior implementation, run through the real `valueForRef`/`updateRef`/`track` (`DEBUG=false`):

1. `Cell` reference for `{{#each}}` block params: item value + index were compute refs (2 closures + a `track()` frame per read). A cell stores its value behind a fixed tag, with no closures and no frame. → 2.3× faster, ~65% less memory (per 1000 items, create+read and update+read).
2. Inline `track()` in `valueForRef`: stop allocating a thunk closure on every recompute (the hottest function in the VM). → ~10% faster, ~33% less garbage per recompute (63.2→57.0 µs/1000, 282→188 kb).
3. `{{#each}}` key resolution: resolve the strategy once (not per diff); `@index`/`@key` skip duplicate-key dedup entirely; the per-pass `seen` set is a plain `Map`. → `@index` iteration runs at 23.0 µs/1000 vs 48.9 µs/1000 for `@identity` (~2×), since it skips the dedup pass (measured against `@identity`, not the prior implementation).
4. `Property` reference for `childRefFor`: every `{{a.b}}` was a compute ref with 2 closures holding `(parent, path)`; it is now stored as data and read/written inline. → ~14% faster, ~25% less allocation (72.2→62.4 µs, 633→477 kb per 1000).
5. Pooled `Tracker`, lazy `Set`: `beginTrackFrame` no longer allocates a `Tracker` + `Set` per frame (0/1-tag frames are the norm). → the common frame goes from 2 allocations to ~0 (0.10 b/iter for the 0-tag case).
6. `[COMPUTE]` fast path: subtag-less tags (the majority) return `revision` directly, skipping the combinator-memoization machinery. → ~17% faster per validate (4.71→3.90 µs/1000).

End-to-end (combined)
Measured with tracerbench (control = this branch's base). The changes compound, so this is the combined effect on the revalidation-heavy phases, where the tracking/tag work lands:

- `selectFirstRow1`
- `selectSecondRow1`
- `swapRows2`/`swapRows1`

Create/clear phases are DOM-dominated, so they stay within noise. No significant regressions.
Testing
Full browser suite green at every step: 9340 tests, 0 fail. CI green.