[CLEANUP] Flatten Glimmer reference hot paths (each item cells, inlined track frame, property refs) by NullVoxPopuli-ai-agent · Pull Request #21435 · emberjs/ember.js

NullVoxPopuli-ai-agent · 2026-05-29T15:08:18Z

What

Six behavior-preserving flattenings of the Glimmer reference / tracking / tag hot paths — the machinery hit on nearly every reference read and every revalidation tick. Found profiling the smoke-tests/benchmark-app table benchmark, but the wins apply to all rendering, not just {{#each}}.

The recurring theme: references and tracking frames were modeled as general compute closures when the data is really just "a value (or parent + path) behind a tag." Storing that as data lets valueForRef/updateRef handle it inline — no closures, and often no tracking frame.

Changes (with isolated per-change impact)

Each number below is an A/B microbenchmark of that change alone, against the prior implementation, through the real valueForRef/updateRef/track (DEBUG=false):

1. Cell reference for {{#each}} block params: item value + index were compute refs (2 closures + a track() frame per read). A cell stores its value behind a fixed tag; no closures, no frame. → 2.3× faster, ~65% less memory (per 1000 items, create+read and update+read).
2. Inlined track() in valueForRef: stop allocating a thunk closure on every recompute (the hottest function in the VM). → ~10% faster, ~33% less garbage per recompute (63.2→57.0 µs/1000, 282→188 kb).
3. Flattened {{#each}} key resolution: resolve the strategy once (not per diff); @index/@key skip duplicate-key dedup entirely; per-pass seen is a plain Map. → @index iteration ~2× faster than @identity, which still pays for dedup (23.0 vs 48.9 µs/1000).
4. Property reference for childRefFor: every {{a.b}} was a compute ref with 2 closures holding (parent, path); now stored as data, read/written inline. → ~14% faster, ~25% less alloc (72.2→62.4 µs / 633→477 kb per 1000).
5. Pooled trackers + lazy consumed-tag Set: beginTrackFrame no longer allocates a Tracker + Set per frame (0/1-tag frames are the norm). → common frame 2 allocations → ~0 (0.10 b/iter for the 0-tag case).
6. Fast-path tag [COMPUTE]: subtag-less tags (the majority) return revision directly, skipping the combinator-memoization machinery. → ~17% faster per validate (4.71→3.90 µs/1000).

End-to-end (combined)

tracerbench, control = this branch's base. The changes compound, so this is the combined effect on the revalidation-heavy phases (where the tracking/tag work lands):

| Phase | Δ |
| --- | --- |
| `selectFirstRow1` | −38.9% [−44.3 … −33.2] |
| `selectSecondRow1` | −14.1% [−19.3 … −9.0] |
| `swapRows2` / `swapRows1` | −8.0% / −5.9% |

Create/clear phases are DOM-dominated, so they stay within noise. No significant regressions.

Testing

Full browser suite green at every step: 9340 tests, 0 fail. CI green.

Each `{{#each}}` item binds two block params, the item value and its index, and both were created as full compute references via `createIteratorItemRef`. That meant, per item:

- a `ReferenceImpl` + a dirtyable tag, plus *two* closures (the `compute` getter and the `update` setter), and
- on every read, `valueForRef` took the generic compute path and opened a `track()` frame (a `Tracker` + `Set` allocation) purely to re-discover a tag that never changes.

For a 10k-row table that is 20k references and 20k tracking frames per render pass (create/clear/append/update/swap all hit this), all to model a value that is just "a stored value behind one tag".

This introduces a dedicated `Cell` reference type. A cell stores its value directly on the reference behind a fixed tag, so:

- `valueForRef` reads the stored value and re-snapshots the tag without opening a tracking frame (there are no dependencies to discover), and
- `updateRef` mutates the value inline with the same equality gate as before; no `compute`/`update` closures are allocated at all.

Behavior is identical: same tag consumed on read, same equality-gated dirty on update. `isUpdatableRef` reports cells as updatable, and `createDebugAliasRef` no longer inherits the `Cell` type (a debug alias is a genuine compute reference).

Microbench (real `valueForRef`/`updateRef`, 1000 items, prod build):

- initial render (create+read): 198µs/698kb -> 86µs/261kb (2.3x, -63% mem)
- re-render (update+read): 185µs/417kb -> 79µs/137kb (2.3x, -67% mem)
- allocation only: 31µs/320kb -> 22µs/~4kb (1.4x, ~0 garbage)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
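To make the "stored value behind one tag" idea concrete, here is a minimal sketch of a cell-style reference. It is an illustration only: `CellRef`, `createCellRef`, `readCell`, `updateCell`, and the toy `Tag`/`consumeTag`/`dirtyTag` helpers are hypothetical names that do not match the real `@glimmer/reference` internals, and strict `!==` is assumed as the equality gate.

```typescript
// Hypothetical, simplified model; names do not match @glimmer/reference.
interface Tag {
  revision: number;
}

let revisionClock = 1;
// Stand-in for "the tag was consumed by whoever is currently tracking".
const consumedTags: Tag[] = [];

function consumeTag(tag: Tag): void {
  consumedTags.push(tag);
}

function dirtyTag(tag: Tag): void {
  tag.revision = ++revisionClock;
}

// The value lives directly on the reference, behind one fixed tag.
// There is no compute closure, no update closure, and no tracking frame.
interface CellRef<T> {
  tag: Tag;
  value: T;
  lastRevision: number;
}

function createCellRef<T>(value: T): CellRef<T> {
  const tag: Tag = { revision: revisionClock };
  return { tag, value, lastRevision: tag.revision };
}

function readCell<T>(ref: CellRef<T>): T {
  // Re-snapshot the fixed tag and consume it; nothing needs to be discovered.
  ref.lastRevision = ref.tag.revision;
  consumeTag(ref.tag);
  return ref.value;
}

function updateCell<T>(ref: CellRef<T>, next: T): void {
  // Equality gate (assumed here to be strict inequality): only dirty on change.
  if (ref.value !== next) {
    ref.value = next;
    dirtyTag(ref.tag);
  }
}
```

In this sketch, writing the same value twice leaves the tag untouched, so downstream consumers would not be invalidated; only a genuinely new value bumps the revision.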

NullVoxPopuli · 2026-05-29T15:32:49Z

our each def has a problem, but I'm not convinced this is the solution.

running the bench locally shows not much improvement:

duration phase estimated improvement -21ms [-44ms to -3ms] OR -1.22% [-2.55% to -0.17%]
renderEnd phase no difference [0ms to 0ms]
render1000Items1End phase no difference [-2ms to 1ms]
clearItems1End phase no difference [-2ms to 1ms]
render1000Items2End phase no difference [-3ms to 3ms]
clearItems2End phase no difference [-1ms to 0ms]
render10000Items1End phase no difference [-10ms to 2ms]
clearManyItems1End phase estimated regression +2ms [1ms to 3ms] OR +1.29% [0.65% to 1.95%]
render10000Items2End phase no difference [-20ms to 14ms]
clearManyItems2End phase estimated improvement -3ms [-5ms to -1ms] OR -6.27% [-12.51% to -2.44%]
render1000Items3End phase no difference [0ms to 2ms]
append1000Items1End phase no difference [-2ms to 3ms]
append1000Items2End phase no difference [-2ms to 1ms]
updateEvery10thItem1End phase no difference [-2ms to 2ms]
updateEvery10thItem2End phase no difference [-1ms to 2ms]
selectFirstRow1End phase no difference [-1ms to 1ms]
selectSecondRow1End phase no difference [-1ms to 1ms]
removeFirstRow1End phase no difference [-1ms to 1ms]
removeSecondRow1End phase no difference [-1ms to 1ms]
swapRows1End phase no difference [-1ms to 0ms]
swapRows2End phase no difference [-2ms to 0ms]
clearItems4End phase no difference [-1ms to 0ms]
paint phase no difference [-2ms to 0ms]

I have a hunch we'll need to ship fragment support first so that each can be sort of "off-canvas"'d

Two more extraneous layers in the reference/iteration hot paths, removed:

1. `valueForRef` recompute went through `track(thunk)`, which allocates a closure on *every* (re)compute. This is the single hottest function in the VM: every reference read that needs evaluation passes through it (all refs on initial render, and again on each invalidation). Inlining `beginTrackFrame()`/`endTrackFrame()` drops that per-read allocation.

   Microbench (1000 recompute frames): 63.2µs -> 57.0µs (~10%) and 282kb -> 188kb (~33% less garbage).

2. `{{#each}}` key derivation:
   - `makeKeyFor` was re-resolved on every diff and wrapped *every* strategy, including `@index`/`@key`, whose keys are unique by construction, in the duplicate-key dedup machinery. The strategy is now resolved once when the iterator ref is created, and index keys skip dedup entirely.
   - The per-pass `seen` set used `WeakMapWithPrimitives` (lazy-getter + object/primitive dispatch on every get/set). Since it lives only for one synchronous pass, a plain `Map` is both simpler and faster; the weak-keyed map is kept only for the long-lived global `IDENTITIES`.

   Microbench (1000-item iteration): `@index` 23.0µs vs `@identity` 48.9µs; index keys no longer pay the dedup cost they used to.

Behavior is unchanged: same keys produced, same duplicate-key semantics, same tag consumption.

Verified headless in Chrome: each (571), iterable (24), tracked (242), Updating (175), Helpers (1173), Components (328), fn (36) all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
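The key-derivation change can be sketched roughly as follows. This is a simplified illustration, not the actual `makeKeyFor` code: `resolveKeyStrategy`, `keysFor`, and the `{ key, occurrence }` wrapper used for duplicates are hypothetical, and only `@index`, `@identity`, and plain property paths are modeled.

```typescript
type KeyFor = (item: unknown, index: number) => unknown;

interface KeyStrategy {
  keyFor: KeyFor;
  // Index keys are unique by construction, so they never need dedup.
  needsDedup: boolean;
}

// Resolved once, when the iterator reference is created (not on every diff).
function resolveKeyStrategy(key: string): KeyStrategy {
  switch (key) {
    case "@index":
      return { keyFor: (_item, index) => index, needsDedup: false };
    case "@identity":
      return { keyFor: (item) => item, needsDedup: true };
    default:
      return {
        keyFor: (item) => (item as Record<string, unknown>)[key],
        needsDedup: true,
      };
  }
}

function keysFor(items: readonly unknown[], strategy: KeyStrategy): unknown[] {
  const { keyFor, needsDedup } = strategy;

  if (!needsDedup) {
    // Fast path: no per-pass bookkeeping at all.
    return items.map((item, index) => keyFor(item, index));
  }

  // The `seen` map lives for one synchronous pass only, so a plain Map suffices.
  const seen = new Map<unknown, number>();
  return items.map((item, index) => {
    const key = keyFor(item, index);
    const occurrence = seen.get(key) ?? 0;
    seen.set(key, occurrence + 1);
    // Hypothetical dedup scheme: wrap repeats in a fresh, unique object.
    return occurrence === 0 ? key : { key, occurrence };
  });
}
```

For example, `keysFor(["a", "a"], resolveKeyStrategy("@index"))` takes the fast path and never allocates a `Map`, while the `@identity` strategy still tracks what it has seen so the second `"a"` receives a distinct key.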

Every `{{a.b}}` path access compiled to a compute reference holding two closures, a getter (`getProp(valueForRef(parent), path)`) and a setter (`setProp(...)`), that captured nothing but `(parent, path)`. That is two closure allocations per property reference, on a path hit by essentially every template (`{{this.foo}}`, `{{row.id}}`, `{{row.label.current}}`, …).

Add a `Property` reference type that stores `parent` + `path` as plain fields and is read/written inline by `valueForRef`/`updateRef` (the same approach as the `Cell` type used for `{{#each}}` block params). No closures are allocated; reads still open a tracking frame, since `getProp` consumes dynamic tags. `isUpdatableRef` reports Property refs as updatable, and `createDebugAliasRef` no longer inherits the Property type.

Microbench (1000 childRefFor calls): 72.2µs/633kb -> 62.4µs/477kb (~14% faster, ~25% less allocation).

Also fixes a throw-semantics bug introduced when `track()` was inlined into `valueForRef`: committing `ref.tag` inside the `finally` updated the tag even when the compute threw, leaving `tag` and `lastRevision` inconsistent. The new tag/revision are now committed only on success (the frame is still ended in `finally` to keep the tracking stack balanced), matching the original `track()` behavior. This restores correct handling of throwing getters, and was caught by the `debug render tree: emberish curly components` test.

Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
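A rough sketch of both points from this commit: a property reference stored as plain `(parent, path)` data, and a read that ends its tracking frame in `finally` but commits the new tags and revision only when the read succeeds. Everything here is a hypothetical simplification. The real parent is itself a reference rather than a plain object, and `trackStack`, `snapshot`, and the `PropertyRef` shape are illustrative names, not the Glimmer implementation.

```typescript
interface Tag {
  revision: number;
}

let revisionClock = 1;
const trackStack: Tag[][] = [];
const propertyTags = new WeakMap<object, Map<string, Tag>>();

function beginTrackFrame(): Tag[] {
  const frame: Tag[] = [];
  trackStack.push(frame);
  return frame;
}

function endTrackFrame(): void {
  trackStack.pop();
}

function tagFor(obj: object, key: string): Tag {
  let tags = propertyTags.get(obj);
  if (tags === undefined) propertyTags.set(obj, (tags = new Map()));
  let tag = tags.get(key);
  if (tag === undefined) tags.set(key, (tag = { revision: revisionClock }));
  return tag;
}

function getProp(obj: Record<string, unknown>, key: string): unknown {
  const frame = trackStack[trackStack.length - 1];
  if (frame !== undefined) frame.push(tagFor(obj, key)); // consume a dynamic tag
  return obj[key]; // may throw if `key` is a throwing getter
}

function snapshot(tags: readonly Tag[]): number {
  return tags.reduce((max, tag) => Math.max(max, tag.revision), 0);
}

// Plain data instead of two closures capturing (parent, path).
interface PropertyRef {
  parentObject: Record<string, unknown>;
  path: string;
  tags: Tag[] | null;
  lastRevision: number;
  lastValue: unknown;
}

function createPropertyRef(parentObject: Record<string, unknown>, path: string): PropertyRef {
  return { parentObject, path, tags: null, lastRevision: -1, lastValue: undefined };
}

function valueForRef(ref: PropertyRef): unknown {
  if (ref.tags !== null && snapshot(ref.tags) === ref.lastRevision) {
    return ref.lastValue; // still valid, no frame needed
  }
  const frame = beginTrackFrame();
  let value: unknown;
  try {
    value = getProp(ref.parentObject, ref.path);
  } finally {
    endTrackFrame(); // always rebalance the stack, even on throw
  }
  // Commit only on success, so tags and lastRevision never disagree.
  ref.tags = frame;
  ref.lastRevision = snapshot(frame);
  ref.lastValue = value;
  return value;
}
```

If the getter throws, the exception propagates, the tracking stack is back to its previous depth, and the reference keeps whatever tags and revision it had before, so the next read recomputes cleanly.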

`beginTrackFrame` allocated a `new Tracker()` and the Tracker allocated a `new Set<Tag>()` (two objects per frame) on *every* reference recompute and every cache group, every revalidation. The overwhelming majority of frames consume zero or one tag.

- The Tracker now holds the first consumed tag in a field and allocates the `Set` only when a second, distinct tag arrives. 0/1-tag frames never touch a Set (and still dedupe / combine correctly).
- Trackers are pooled on a LIFO freelist. Frames are strictly nested and a tracker is dead the instant `combine()` runs in `endTrackFrame`, so it can be reset and reused by the next `beginTrackFrame`.

Net: the common tracking frame now allocates ~nothing.

Microbench: a frame that opens, consumes one tag, and closes drops from two object allocations to ~0 b/iter (measured 0.10 b for the 0-tag case).

Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
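The two ideas above (a lazily allocated `Set` and a LIFO pool of trackers) might look roughly like this. It is a sketch under assumptions: `SketchTracker`, `trackerPool`, `activeTrackers`, and `trackersAllocated` are hypothetical names, and the multi-tag `combine()` result is a one-off snapshot rather than a live combinator tag.

```typescript
interface Tag {
  revision: number;
}

class SketchTracker {
  first: Tag | null = null;
  rest: Set<Tag> | null = null;

  add(tag: Tag): void {
    if (this.first === null) {
      this.first = tag; // 1-tag frames never allocate a Set
    } else if (tag !== this.first) {
      if (this.rest === null) this.rest = new Set();
      this.rest.add(tag); // only a second, distinct tag pays for the Set
    }
  }

  combine(): Tag | null {
    if (this.first === null) return null; // 0-tag frame
    if (this.rest === null) return this.first; // 1-tag frame, no allocation
    let revision = this.first.revision;
    for (const tag of this.rest) revision = Math.max(revision, tag.revision);
    return { revision }; // simplification: snapshot instead of a live combinator
  }

  reset(): void {
    this.first = null;
    this.rest = null;
  }
}

const trackerPool: SketchTracker[] = [];
const activeTrackers: SketchTracker[] = [];
let trackersAllocated = 0; // instrumentation for this sketch only

function beginTrackFrame(): void {
  let tracker = trackerPool.pop();
  if (tracker === undefined) {
    tracker = new SketchTracker();
    trackersAllocated++;
  }
  activeTrackers.push(tracker);
}

function consumeTag(tag: Tag): void {
  const current = activeTrackers[activeTrackers.length - 1];
  if (current !== undefined) current.add(tag);
}

function endTrackFrame(): Tag | null {
  const tracker = activeTrackers.pop();
  if (tracker === undefined) throw new Error("endTrackFrame without beginTrackFrame");
  const combined = tracker.combine();
  // Frames are strictly nested, so the tracker is dead once combined.
  tracker.reset();
  trackerPool.push(tracker);
  return combined;
}
```

Because frames nest strictly, the pool never needs more trackers than the deepest nesting level reached, so a long run of sequential frames keeps reusing the same instance.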

`MonomorphicTagImpl[COMPUTE]` is called by `validateTag`/`valueForTag` on every reference read. For a tag with no subtag (property tags, cell tags, plain dirtyable/updatable tags, i.e. the overwhelming majority), the result is always just `revision` (kept current by `dirtyTag`). The `lastChecked`/`isUpdating`/cycle-guard/`try-finally` machinery exists only to memoize subtag recursion, so it is pure overhead for these tags.

Return `this.revision` directly when `subtag === null`. The combinator path is unchanged (it now reuses the already-read `subtag`).

Microbench (1000 subtag-less [COMPUTE]s during a revalidation pass): ~4.71µs -> ~3.90µs (~17%), and no try/finally or field writes on the read.

Full browser suite green: 9340 tests, 9323 pass, 17 skip, 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
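As an illustration of the fast path, the sketch below models a tag whose `compute()` returns `revision` immediately when there is no subtag and only falls back to the memoized, cycle-guarded path for combinators. It is loosely modeled on the description above and is not the real `MonomorphicTagImpl`: `SketchTag`, `dirtySketchTag`, the single-subtag restriction, and throwing on a cycle are all assumptions made for brevity.

```typescript
let revisionClock = 1;

class SketchTag {
  revision = 1;
  subtag: SketchTag | null = null;

  private lastChecked = 0;
  private lastValue = 0;
  private isUpdating = false;

  compute(): number {
    const subtag = this.subtag;

    // Fast path: nothing to recurse into, so the answer is just `revision`.
    // No field writes, no try/finally.
    if (subtag === null) return this.revision;

    // Combinator path: memoize per global revision and guard against cycles.
    if (this.isUpdating) throw new Error("cycle detected in tag graph");
    if (this.lastChecked !== revisionClock) {
      this.isUpdating = true;
      this.lastChecked = revisionClock;
      try {
        this.lastValue = Math.max(this.revision, subtag.compute());
      } finally {
        this.isUpdating = false;
      }
    }
    return this.lastValue;
  }
}

function dirtySketchTag(tag: SketchTag): void {
  tag.revision = ++revisionClock;
}
```

The fast path is only safe because dirtying writes `revision` directly; a subtag-less tag has no other input that could make a cached value stale.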

NullVoxPopuli marked this pull request as draft May 29, 2026 15:31

NullVoxPopuli-ai-agent changed the title ~~perf(reference): make {{#each}} item params cheap "cell" references~~ [CLEANUP] Flatten Glimmer reference hot paths (each item cells, inlined track frame, property refs) May 29, 2026

NullVoxPopuli-ai-agent and others added 4 commits May 31, 2026 21:51

NullVoxPopuli-ai-agent force-pushed the perf/each-item-cell-ref branch from 1b23588 to 3575b36 Compare June 1, 2026 01:55

NullVoxPopuli-ai-agent wants to merge 5 commits into emberjs:main from NullVoxPopuli-ai-agent:perf/each-item-cell-ref
