| 1 | +# `recs` benchmarks |
| 2 | + |
| 3 | +Phase 0 baseline benchmarks for the optimization work tracked in |
| 4 | +`~/.claude/plans/build-a-detailed-plan-prancy-wave.md`. |
| 5 | + |
| 6 | +## Run |
| 7 | + |
| 8 | +```bash |
| 9 | +pnpm --filter @latticexyz/recs test:bench |
| 10 | +``` |
| 11 | + |
| 12 | +Each case prints a `[BENCH] {...json...}` line with the fields `id`, `name`, |
| 13 | +`iterations`, `totalMs`, `avgMs`, `opsPerSec`, and `heapDeltaBytes`. The full |
| 14 | +set of results is persisted to `baseline.json` in this directory. |
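If you want to post-process a run yourself, the `[BENCH]` lines can be pulled out of captured test output. The following is a minimal sketch, not part of the suite: `parseBenchLines` and the `BenchResult` interface are hypothetical names, and the field types are assumed from the field names listed above.

```typescript
// Field names come from the README; the types are an assumption.
interface BenchResult {
  id: string;
  name: string;
  iterations: number;
  totalMs: number;
  avgMs: number;
  opsPerSec: number;
  heapDeltaBytes: number;
}

// Hypothetical helper: collect every `[BENCH] {...}` record from captured output.
function parseBenchLines(output: string): BenchResult[] {
  const marker = "[BENCH] ";
  const results: BenchResult[] = [];
  for (const line of output.split("\n")) {
    const at = line.indexOf(marker);
    if (at === -1) continue; // also skips `[BENCH-NOTE]` lines
    const json = line.slice(at + marker.length).trim();
    try {
      results.push(JSON.parse(json) as BenchResult);
    } catch {
      // Ignore lines that carry the marker but not valid JSON.
    }
  }
  return results;
}
```

Matching on `[BENCH] ` with the trailing space keeps `[BENCH-NOTE]` lines out of the result.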
| 15 | + |
| 16 | +## Files |
| 17 | + |
| 18 | +- `baseline.json` — last captured run. **Update with every Phase PR** that |
| 19 | + improves any benchmark, so reviewers can `git diff` the perf change. |
| 20 | +- `../src/Benchmark.spec.ts` — the suite (one `describe` per group). |
| 21 | +- `../src/test-utils/bench.ts` — `bench(id, name, fn, opts)` helper. |
| 22 | + |
| 23 | +## Benchmarks |
| 24 | + |
| 25 | +| ID | Hot path | Plan issue | |
| 26 | +| -------------------- | ------------------------------------------------------------------------- | -------------------------------- | |
| 27 | +| B1 | `hasComponent` × 100k | #8 `Object.values()[0]` per call | |
| 28 | +| B2 | `Component.entities()` iteration over 100k | #8 same | |
| 29 | +| B3 | Indexer add+remove of 10k unique values | #11 empty-bucket leak | |
| 30 | +| B4 | `Indexer.getEntitiesWithValue` × 10k, 100 matches each | #10 fresh-Set per call | |
| 31 | +| B5 | Indexer no-op `setComponent` × 10k | #12 wasted re-index | |
| 32 | +| B6 | Indexer key-collision regression (`{x:"1/2",y:"3"}` vs `{x:"1",y:"2/3"}`) | #12 correctness | |
| 33 | +| B7-100, B7-1k, B7-5k | `removeOverride` × K, single entity | #1 O(N log N) sort | |
| 34 | +| B8 | Overridable `entities()` × 1k calls | #2 fresh-Set alloc | |
| 35 | +| B9 | Overridable `values[x].keys()` × 1k calls | #3 fresh-Set + correctness | |
| 36 | +| B10-10k, B10-100k | `runQuery` 4 `Has` fragments | #9 defensive copies | |
| 37 | +| B11 | `runQuery` `Has` + `HasValue` (non-indexed) | #6 O(N·K) scan | |
| 38 | +| B12 | `getChildEntities` depth=4 branch=10 (non-indexed) | #5 no memoization | |
| 39 | +| B13 | `getChildEntities` depth=4 branch=10 (indexed) | reference: indexer path | |
| 40 | +| B14 | `defineQuery` proxy, 100 updates on 10k matched set | #4 full re-eval | |
| 41 | +| B15 | `defineQuery` same component in 2 fragments, 1k updates | #13 double-subscribe | |
| 42 | +| B16 | `setComponent` × 100k with `skipUpdateStream: true` | #15 wasted prevValue read | |
| 43 | +| B17 | `componentValueEquals` × 1M | #18 cleanup | |
| 44 | +| B18 | `createLocalCache` 200 updates on 1k-entity component | #7 O(N) serialize per write | |
| 45 | + |
| 46 | +## Conventions |
| 47 | + |
| 48 | +- Each case performs `iterations / 10` warmup runs (or 1) before its measured loop. |
| 49 | +- Heap delta is measured with `process.memoryUsage().heapUsed` before/after; if |
| 50 | + `--expose-gc` is available, `global.gc()` runs first. Treat it as a coarse signal. |
| 51 | +- Use `--runInBand` (set in `test:bench` script) so concurrent test workers |
| 52 | + don't poison timings. |
| 53 | +- B6 also logs `[BENCH-NOTE] B6 indexer key-collisions ...` — expect 0 after |
| 54 | + Phase 1. |
| 55 | +- B14/B15 log emitted-event counts as `[BENCH-NOTE]` so we can verify the |
| 56 | + algorithmic improvements (Phase 3 should drop B14 emitted events |
| 57 | + drastically; Phase 1 dedupe should halve B15). |
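The warmup and heap-delta conventions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `bench.ts`: only the `bench(id, name, fn, opts)` signature comes from this README. The `iterations` option name, its default, the synchronous `fn`, and reading "(or 1)" as "at least one warmup run" are all assumptions.

```typescript
interface BenchOptions {
  iterations?: number; // assumed option name
}

interface BenchResult {
  id: string;
  name: string;
  iterations: number;
  totalMs: number;
  avgMs: number;
  opsPerSec: number;
  heapDeltaBytes: number;
}

function bench(id: string, name: string, fn: () => void, opts: BenchOptions = {}): BenchResult {
  const g = globalThis as any; // avoids depending on Node typings in this sketch
  const now = (): number => g.performance?.now?.() ?? Date.now();
  const heapUsed = (): number => g.process?.memoryUsage?.().heapUsed ?? 0;

  const iterations = opts.iterations ?? 100; // assumed default
  // Warmup: iterations / 10 runs, read here as "at least one".
  const warmup = Math.max(1, Math.floor(iterations / 10));
  for (let i = 0; i < warmup; i++) fn();

  // `gc` only exists when Node was started with --expose-gc.
  if (typeof g.gc === "function") g.gc();
  const heapBefore = heapUsed();

  const start = now();
  for (let i = 0; i < iterations; i++) fn();
  const totalMs = now() - start;

  const result: BenchResult = {
    id,
    name,
    iterations,
    totalMs,
    avgMs: totalMs / iterations,
    opsPerSec: totalMs > 0 ? (iterations * 1000) / totalMs : 0,
    heapDeltaBytes: heapUsed() - heapBefore, // coarse: the GC may run mid-loop
  };
  console.log("[BENCH] " + JSON.stringify(result));
  return result;
}
```

Running the warmup before sampling the heap keeps allocations made during JIT warmup out of `heapDeltaBytes`, although the figure remains a coarse signal.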
| 58 | + |
| 59 | +## Updating the baseline |
| 60 | + |
| 61 | +When a phase PR improves a benchmark: |
| 62 | + |
| 63 | +1. Re-run `pnpm --filter @latticexyz/recs test:bench`. |
| 64 | +2. Commit the updated `baseline.json`. |
| 65 | +3. Paste a before/after table in the PR description quoting the affected `id`s. |
| 66 | +4. Tighten the (loose) regression assertions in `Benchmark.spec.ts` for the |
| 67 | + metrics you improved. |
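For step 3, a before/after table can be generated rather than typed by hand. The sketch below assumes you already have two arrays of records with `id` and `avgMs`; `deltaTable` and `percentDelta` are hypothetical helpers, and loading the records from `baseline.json` is left out because the file's layout is not described here.

```typescript
interface BenchRecord {
  id: string;
  avgMs: number;
}

// Relative change in percent; negative means the "after" run is faster.
function percentDelta(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Hypothetical helper: render a Markdown table for ids present in both runs.
function deltaTable(before: BenchRecord[], after: BenchRecord[]): string {
  const afterById: Record<string, BenchRecord> = {};
  for (const r of after) afterById[r.id] = r;

  const rows = ["| ID | before avgMs | after avgMs | Δ |", "| --- | ---: | ---: | ---: |"];
  for (const b of before) {
    const a = afterById[b.id];
    if (!a) continue; // id missing from the new run
    const d = Math.round(percentDelta(b.avgMs, a.avgMs));
    const sign = d > 0 ? "+" : d < 0 ? "−" : "";
    rows.push(`| ${b.id} | ${b.avgMs.toFixed(2)} | ${a.avgMs.toFixed(2)} | ${sign}${Math.abs(d)}% |`);
  }
  return rows.join("\n");
}

// Usage with two rows quoted from the deltas table in this README.
console.log(
  deltaTable(
    [{ id: "B3", avgMs: 51.95 }, { id: "B13", avgMs: 7.86 }],
    [{ id: "B3", avgMs: 70.7 }, { id: "B13", avgMs: 3.67 }]
  )
);
```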
| 68 | + |
| 69 | +## Phase 1+2 deltas vs Phase 0 baseline |
| 70 | + |
| 71 | +Measured on `darwin / node v20.9.0`. A negative Δ means the P1+P2 run is faster. |
| 72 | + |
| 73 | +| ID | Hot path | P0 avgMs | P1+P2 avgMs | Δ | |
| 74 | +| -------- | ---------------------------------------- | -------: | ----------: | -----------: | |
| 75 | +| B1 | `hasComponent` x 100k | 0.37 | 0.13 | **−64%** | |
| 76 | +| B2 | `Component.entities()` iter 100k | 17.18 | 11.29 | **−34%** | |
| 77 | +| B3 | Indexer add+remove 10k unique | 51.95 | 70.70 | **+36%** ¹ | |
| 78 | +| B4 | Indexer `getEntitiesWithValue` x 10k | 160.27 | 142.34 | −11% | |
| 79 | +| B5 | Indexer no-op setComponent x 10k | 24.28 | 23.39 | −4% | |
| 80 | +| B7-1k | `removeOverride` x 1000 | 20.37 | 5.72 | **−72%** | |
| 81 | +| B7-5k | `removeOverride` x 5000 | 259.91 | 59.47 | **−77%** | |
| 82 | +| B8 | Overridable `entities()` x 1k | 2017.95 | 1718.54 | −15% | |
| 83 | +| B9 | Overridable `keys()` x 1k | 537.19 | 474.05 | −12% | |
| 84 | +| B10-10k | `runQuery` 4 Has on 10k | 13.06 | 6.94 | **−47%** | |
| 85 | +| B10-100k | `runQuery` 4 Has on 100k | 156.44 | 88.11 | **−44%** | |
| 86 | +| B11 | `runQuery` Has + HasValue (non-indexed) | 8.12 | 7.30 | −10% | |
| 87 | +| B12 | `getChildEntities` d=4 b=10 (non-idx) | 8663.57 | 7422.31 | −14% | |
| 88 | +| B13 | `getChildEntities` d=4 b=10 (indexed) | 7.86 | 3.67 | **−53%** | |
| 89 | +| B14 | `defineQuery` proxy, 100 updates / 10k | 1416.28 | 917.53 | **−35%** | |
| 90 | +| B15 | `defineQuery` same-component 2 fragments | 7.31 | 4.45 | **−39%** | |
| 91 | +| B16 | `setComponent` skip-stream x 100k | 133.93 | 63.26 | **−53%** | |
| 92 | +| B17 | `componentValueEquals` x 1M | 180.91 | 134.18 | −26% | |
| 93 | +| B18 | `createLocalCache` 200 updates / 1k | 116.30 | 0.32 | **−99.7%** ² | |
| 94 | + |
| 95 | +¹ B3 regressed because Phase 1 now (a) GCs empty buckets via an extra |
| 96 | +`Map.delete` per remove and (b) uses a JSON-stringified key (more allocation |
| 97 | +per write than the old `Object.values().join('/')`). The trade-off is |
| 98 | +correctness — the old key collided across values like `{x:"1/2",y:"3"}` vs |
| 99 | +`{x:"1",y:"2/3"}` and leaked an empty `Set` per distinct value forever. |
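The collision described in footnote ¹ is easy to reproduce in isolation. In this sketch `joinKey` mirrors the old `Object.values().join('/')` scheme named above, while `jsonKey` is only one possible JSON-based key; the exact form used in Phase 1 is an assumption.

```typescript
type Value = Record<string, string>;

// Old scheme: join the raw values with "/".
const joinKey = (v: Value): string => Object.values(v).join("/");

// One possible JSON-stringified key (assumed form, for illustration only).
const jsonKey = (v: Value): string => JSON.stringify(Object.values(v));

const a: Value = { x: "1/2", y: "3" };
const b: Value = { x: "1", y: "2/3" };

console.log(joinKey(a) === joinKey(b)); // true: both keys are "1/2/3"
console.log(jsonKey(a) === jsonKey(b)); // false: ["1/2","3"] vs ["1","2/3"]
```

The JSON form quotes each value, so a `/` inside a value can no longer be confused with the separator.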
| 100 | + |
| 101 | +² B18 is a `throttleTime(leading+trailing)` win: the leading write fires |
| 102 | +synchronously (so the bench's `localStorage` check still passes), and the |
| 103 | +remaining 199 writes in the burst collapse to a single trailing emission |
| 104 | +that fires after the bench window. Real-world: 200 rapid updates → 2 |
| 105 | +storage writes instead of 200. |
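The arithmetic in footnote ² can be checked with a small model of a leading + trailing throttle. This is a simplified simulation over timestamps, not the RxJS `throttleTime` implementation, and the 1000 ms window in the usage example is an assumed value.

```typescript
// Simplified leading + trailing throttle model.
// `times` are event timestamps in ms, sorted ascending; returns emission timestamps.
function throttleLeadingTrailing(times: number[], windowMs: number): number[] {
  const emitted: number[] = [];
  let windowEnd = -Infinity; // end of the currently open throttle window
  let pending = false; // an event was swallowed inside the open window

  for (const t of times) {
    if (t >= windowEnd && pending) {
      emitted.push(windowEnd); // trailing emission when the window closed
      pending = false;
      windowEnd += windowMs; // the trailing emission opens a new window
    }
    if (t >= windowEnd) {
      emitted.push(t); // leading emission, fires immediately
      windowEnd = t + windowMs;
    } else {
      pending = true; // only the latest swallowed event survives
    }
  }
  if (pending) emitted.push(windowEnd); // final trailing flush
  return emitted;
}

// 200 updates, 1 ms apart, inside one assumed 1000 ms window.
const burst: number[] = [];
for (let i = 0; i < 200; i++) burst.push(i);
console.log(throttleLeadingTrailing(burst, 1000).length); // 2
```

The first emission corresponds to the synchronous leading write, and the second is the single trailing write that lands after the burst.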
| 106 | + |
| 107 | +Phase 3 (localized `defineQuery` proxy re-evaluation) is deferred — see plan. |
| 108 | + |
| 109 | +## Notes from Phase 0 baseline run |
| 110 | + |
| 111 | +A Map-proxy bug surfaced while writing the suite: calling |
| 112 | +`setComponent(Overridable, ...)` directly threw `TypeError: Method |
| 113 | +Map.prototype.set called on incompatible receiver #<Map>`. Existing call sites |
| 114 | +worked around this by setting on the underlying component. It was fixed in |
| 115 | +Phase 1 by binding `Map.prototype` methods to the underlying target. |
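The error is a general property of wrapping a `Map` in a `Proxy`: built-in `Map` methods check their receiver, and the proxy is not a real `Map`. The sketch below reproduces the failure and shows one way to bind methods to the target. It illustrates the technique only; the actual Phase 1 change may be structured differently.

```typescript
const target = new Map<string, number>();

// A pass-through proxy: `naive.set` runs Map.prototype.set with `this` = proxy.
const naive = new Proxy(target, {});
let threw = false;
try {
  naive.set("a", 1);
} catch (e) {
  threw = e instanceof TypeError; // "... called on incompatible receiver"
}

// Binding function-valued properties to the real Map avoids the receiver check.
const bound = new Proxy(target, {
  get(t, prop) {
    // Pass `t` as receiver so accessors such as `size` also see the real Map.
    const value = Reflect.get(t, prop, t);
    return typeof value === "function" ? value.bind(t) : value;
  },
});

bound.set("a", 1);
console.log(threw, bound.get("a"), bound.size); // true 1 1
```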