[feat] planner: pick dynamicemb HYBRID vs CACHING from topology budgets by tiankongdeguiji · Pull Request #508 · alibaba/TorchEasyRec

tiankongdeguiji · 2026-05-12T14:14:02Z

Summary

Make tzrec's embedding sharding planner take HBM and host-memory limits from the topology and decide per dynamicemb table whether to use HYBRID (HBM + DRAM hash-partitioned) or CACHING (HBM cache + full DRAM backing) mode. Per the consensus on NVIDIA/recsys-examples#306 (closed), both placement modes are kept and the planner — not the user — picks the best one under the joint budget.

Enumerator emits 20 ShardingOptions per dynamicemb table: { HYBRID, CACHING } × { cache_load_factor 0.1, ..., 1.0 }. The base ShardingOption is deep-copied once per variant; dynamicemb_options.caching is then flipped in-place on the already-fresh per-variant copy.
Storage estimator is mode-aware: HYBRID DDR = (1 − cache_load_factor) · T, CACHING DDR = T (full backing store). HBM accounting and hash-table metadata stay HBM-side only, matching the existing tzrec convention.
Perf model is fitted from an on-device dynamicemb sweep (experiments/sweep_20260513-161030/full_a10gpu1.json: 4M-row table, dim=128, adam, pow-law alpha=1.05, A10 GPU). Median fwd+bwd latency clustered into three regimes — HYBRID@1.0 = 0.80 ms (HBM-only fast path), CACHING@<1.0 = 2.63 ms (~3.3x slower), HYBRID@<1.0 = 5.44 ms (~6.8x slower). Inverting the linear bw model with torchrec defaults gives x_eff bases 0.28 (CACHING) and 0.11 (HYBRID), with x_eff = 1.0 at cache_load_factor = 1.0 reflecting the runtime kernel switch. A small +0.01·x tiebreaker preserves a strict DP ranking within each block.
DynamicProgrammingProposer is now 2D: discretizes per-rank HBM (100 bins/dev) and per-machine DDR (50 bins/dev) and picks per-table options that minimize total perf under both budgets. HBM prune is per-device, DDR prune is per-machine (DDR is host-shared across co-located ranks). Per-axis bin count is capped at 1024 to avoid blow-up on multi-host worlds. Backtracking emits one proposal per HBM bin — the perf-best plan across all DDR bins at that level — so the proposal count is at most _DP_AXIS_BIN_CAP regardless of ddr_bins.
Per-table planning log. _to_sharding_plan emits one INFO line per dynamicemb table on rank 0 so the picked mode is visible without grepping the runtime print. Numbers come from the storage estimator's shards[0].storage.{hbm,ddr} (true HBM footprint including hash-table metadata + counter + pipeline I/O; mode-aware DDR), with an HBM_ONLY override at cache_load_factor == 1.0 to reflect what the runtime actually does:
```
[dynamicemb plan] …cat_0_emb: mode=CACHING cache_load_factor=0.20 local_hbm=1.620GiB local_dram=2.384GiB
[dynamicemb plan] …cat_5_emb: mode=HBM_ONLY cache_load_factor=1.00 local_hbm=4.000GiB local_dram=0.000GiB
```
Upstream _print_memory_consume is mode-blind (always dram = total − hbm) and strips metadata from HBM; this log line has the honest numbers.
No new proto fields — budgets come straight from Topology.hbm_cap / Topology.ddr_cap minus the existing reservation.
Version bump to 1.2.13.

Test plan

tzrec/utils/dynamicemb_util_test.py — storage formula + empirical perf-model + direct wrapper tests + PlanLogLineTest covering HBM_ONLY-override / RANK gating / GiB unit / cache_load_factor>1.0 regression
tzrec/utils/plan_util_test.py — variant emission + 2D DP unit tests including multi-table mixed-modes, multi-host (4 GPUs × 2 hosts) per-machine DDR prune, empty search space, and the headline DP tests
CUDA-gated end-to-end enumerate + DP test (folded into plan_util_test.py)
pre-commit run --files <changed> clean
Verified the planning log against a 27-table 5-CACHING-table real model run; mode column matches the chosen caching flag, local_hbm / local_dram are the true footprints (the upstream runtime print under-reports both)
Smoke test on a real dynamicemb config end-to-end via torchrun --nproc-per-node=2 -m tzrec.train_eval ...

🤖 Generated with Claude Code

…ING) Extend `_calculate_dynamicemb_table_storage_specific_size` and its callers with a `caching: bool` parameter. HYBRID keeps the original split formula (DDR = (1 - cache_ratio) * T); CACHING accounts the full backing store on host (DDR = T) regardless of cache_ratio. HBM accounting and metadata (key + score + digest + bucket header) stay identical between modes. The flag is sourced from `dynamicemb_options.caching` on the ShardingOption in `dynamicemb_calculate_shard_storages`. Subsequent commits wire this into the enumerator (per-table sweep over both modes) and the proposer (2D DP). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `_dynamicemb_effective_cache_ratio(cache_load_factor, caching, stats)`: HYBRID passes the ratio through (hash-partitioned HBM hit probability); CACHING boosts to `min(1.0, x * multiplier)`, default multiplier 10.0 via the `TZREC_CACHING_HIT_RATE_MULTIPLIER` env var. `cache_params.stats`, when provided, takes precedence — `1 - stats.expected_miss_rate(x)` — and is clamped to never fall below the HYBRID ratio. Inject the boost at planning time by monkey-patching `ShardPerfContext.build_shard_perf_contexts` to temporarily replace the sharding option's `cache_params.load_factor` with the effective ratio. The original cache_params is restored on return so the storage estimator (which reads cache_load_factor for HBM/DDR sizing) still sees the unboosted value. Net effect: at the same `cache_load_factor`, CACHING is strictly cheaper in perf and strictly more expensive in DDR. The DP proposer (next commit) trades one for the other against the topology budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extract the per-dynamicemb-table variant emission into `_emit_dynamicemb_variants(base, dynamicemb_options) -> List[ShardingOption]` and sweep both modes per cache_load_factor. When `cache_params` is unset, the helper emits 20 variants (10 factors × 2 modes); when fixed by the caller, it emits 2 (both modes at the fixed factor). Each variant owns a deep-copied `dynamicemb_options` so per-variant `caching` mutations stay isolated. `cache_params.stats` is preserved across all clones for the perf-side miss-rate override. The downstream 2D DP proposer (next commit) selects per table the (mode, ratio) pair that fits both HBM and host budgets while minimizing perf. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rewrite `DynamicProgrammingProposer.feedback` to discretize both per-rank HBM and per-rank DDR into bins (default 100 × 50 per device) and pick per table the (mode, cache_load_factor) pair that minimizes total perf while fitting both topology budgets. State: dp[table][hbm_bin][ddr_bin] = (perf, hbm_actual, ddr_actual) backtrack[table][hbm_bin][ddr_bin] = (opt_id, prev_hbm_bin, prev_ddr_bin) Options whose largest shard exceeds either per-device cap are pruned upfront. Backtracking enumerates feasible (hbm_total, ddr_total) cells in decreasing total memory, yielding Pareto-optimal proposals. The legacy `mem_bins_per_device` kwarg is preserved as an alias for `hbm_bins_per_device` so existing callers still work, and HBM-degenerate (CPU-only) or DDR-degenerate topologies collapse the unused axis to a single bin. Adds `DynamicProgrammingProposer2DTest`: - generous DDR → CACHING wins (cheaper perf, host can fit T) - tight DDR → CACHING rejected, falls back to HYBRID - high cache_load_factor → modes converge Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Exercise the full planning pipeline on a tiny TestSparseNN backed by a DynamicEmbParameterConstraints constraint. Asserts: - EmbeddingEnumerator yields 20 sharding options (2 modes × 10 factors) - All cache_load_factors and both caching modes are represented - Storage and perf estimators populate non-zero values for each option - DynamicProgrammingProposer.propose() returns feasible plans whose dynamicemb options carry a valid caching flag Gated on has_dynamicemb + torch.cuda.is_available() since the dynamicemb sharder requires CUDA at planning time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…apper (1) Restore the byte-budget framing in `_calculate_dynamicemb_table_storage_specific_size` docstring (`total_value_memory`, `num_buckets`, `hbm_budget`, `ddr_budget`) that was lost in the PR alibaba#508 rewrite, extending `ddr_budget` to cover both HYBRID and CACHING modes in one place. (2) Simplify `_dynamicemb_aware_build_shard_perf_contexts`: single return, no try/finally, no early-return branch. The unconditional `sharding_option.cache_params = original_cache_params` line at the end is a no-op when caching is False (we never mutated the field) and restores the original cache_params when caching is True. If the wrapped call raises, planning aborts and the option is discarded -- no need for try/finally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DDR is host-shared across ranks co-located on one machine, so the per-option fit check in `DynamicProgrammingProposer.feedback` should compare `max_shard_ddr` against the largest machine's total DDR pool, not against any single rank's slice. HBM stays per-device (each GPU has its own HBM; no cross-rank sharing). Compute `max_machine_ddr` by summing per-device DDR within each contiguous `local_world_size`-sized window. Update the test fake-topology fixture to carry `local_world_size` (defaults to num_devices). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… copyright (5) Move `PlanUtilDynamicEmbE2ETest` from the standalone `tzrec/utils/plan_util_dynamicemb_e2e_test.py` into `plan_util_test.py` where the rest of the plan_util tests already live. Delete the standalone file. (6) Rename `DynamicProgrammingProposer2DTest` -> `DynamicProgrammingProposerTest`. The proposer class itself stays `DynamicProgrammingProposer`. (7) Bump copyright `2024 -> 2026` in the new file `tzrec/utils/dynamicemb_util_test.py`. The header in plan_util_test.py stays at 2024 since it was a pre-existing file we only extended. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…urve Replace the heuristic ``_dynamicemb_effective_cache_ratio`` curve with constants fitted from an on-device perf sweep (``experiments/sweep_20260513-161030/full_a10gpu1.json``: 4M-row table, dim=128, adam, pow-law alpha=1.05, A10 GPU). Median fwd+bwd latency clustered into three regimes: HYBRID @ x=1.0: 0.80 ms (HBM-only fast path) CACHING @ x<1.0: 2.63 ms (~3.3x slower) HYBRID @ x<1.0: 5.44 ms (~6.8x slower) Inverting the linear bw model bw = x_eff*HBM + (1-x_eff)*HBM_TO_DDR (torchrec defaults 897 / 32 GB/s) gives x_eff bases 0.28 (CACHING) and 0.11 (HYBRID). At x = 1.0 the runtime drops the host tier in both modes, so x_eff = 1.0 there. A +0.01*x tiebreaker term keeps the DP's ranking strict within each block. Also extend the ``_dynamicemb_aware_build_shard_perf_contexts`` wrapper to apply the empirical x_eff for HYBRID (previously HYBRID used raw ``cache_load_factor`` as x_eff -- inconsistent with the empirical data that shows HYBRID @ 0.9 is much slower than HYBRID @ 1.0 due to the storage backend switch). Drop the now-unused ``TZREC_CACHING_HIT_RATE_MULTIPLIER`` env knob and its related tests; rewrite ``EffectiveCacheRatioTest`` to assert the empirical strict-block ranking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The same provenance is already covered in ``_dynamicemb_effective_cache_ratio``'s docstring below; the block comment above the constants was redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-13T08:43:36Z

+        result = _orig_build_shard_perf_contexts(
+            cls,
+            config,
+            shard_sizes,
+            sharding_option,
+            topology,
+            constraints,
+            sharder,
+            *args,
+            **kwargs,
+        )
+        sharding_option.cache_params = original_cache_params
+        return result


Restore should be in finally. If _orig_build_shard_perf_contexts raises, sharding_option.cache_params is permanently left as the boosted x_eff clone. Downstream consumers — the storage estimator and _to_sharding_plan at line 435 (cache_params=sharding_option.cache_params) — then see the wrong load factor on every other option that touches the same ShardingOption. Suggested:

Suggested change

result = _orig_build_shard_perf_contexts(

cls,

config,

shard_sizes,

sharding_option,

topology,

constraints,

sharder,

*args,

**kwargs,

)

sharding_option.cache_params = original_cache_params

return result

try:

result = _orig_build_shard_perf_contexts(

cls,

config,

shard_sizes,

sharding_option,

topology,

constraints,

sharder,

*args,

**kwargs,

)

finally:

sharding_option.cache_params = original_cache_params

return result

github-actions · 2026-05-13T08:43:39Z

+    for caching_mode in (False, True):
+        for load_factor in load_factors:
+            opt = copy.deepcopy(base_option)
+            opt.cache_params = CacheParams(load_factor=load_factor, stats=stats)
+            # pyre-ignore [16]
+            opt.dynamicemb_options = copy.deepcopy(dynamicemb_options)
+            opt.dynamicemb_options.caching = caching_mode
+            variants.append(opt)


Redundant double deep-copy of dynamicemb_options. The caller (line 1035) attaches dynamicemb_options to base_option before calling this function, so copy.deepcopy(base_option) on line 848 already deep-copies dynamicemb_options into opt. Line 851 then deep-copies the original again and replaces it. With 20 variants per table this doubles a non-trivial copy. Either drop line 851 and just set opt.dynamicemb_options.caching = caching_mode, or stop attaching dynamicemb_options on the caller side before this call.

github-actions · 2026-05-13T08:43:41Z

+    Given :math:`M` tables each with up to :math:`N` ShardingOptions, pick one
+    option per table to minimize total perf while respecting both per-rank HBM
+    and per-rank DDR budgets from the topology.



Docstring says "per-rank DDR" but the implementation enforces per-machine DDR (lines 349–361 sum DDR across local_world_size co-located ranks and the per-option prune at line 391 compares against max_machine_ddr). The inline comment at 350–352 has the right framing — the class docstring should match: "per-rank HBM and per-machine DDR budgets."

github-actions · 2026-05-13T08:43:48Z

+    if x >= 1.0:
+        return 1.0
+    base = _DYNAMICEMB_CACHING_X_EFF_BASE if caching else _DYNAMICEMB_HYBRID_X_EFF_BASE
+    return base + _DYNAMICEMB_X_EFF_TIEBREAK * x


Sharp discontinuity at x=1.0. HYBRID@x=0.99 → 0.1199; HYBRID@x=1.00 → 1.0 — an ~8× jump for a 1% ratio change. The DP will reliably prefer x=1.0 over x=0.99 by a huge perf margin, then prefer CACHING@x=0.5 (0.285) over HYBRID@x=0.9 (0.119) even on workloads where HYBRID@0.9 is plainly faster in reality. If the empirical sweep really shows a step at x=1.0 because the runtime drops the host tier, please call that out as "x=1.0 = HBM-only kernel, not the same algorithm as x=0.99" — and consider whether the enumerator should emit a discrete mode={HBM_ONLY, HYBRID, CACHING} axis rather than packing the discontinuity into the same x knob.

github-actions · 2026-05-13T08:43:53Z

+        for table_i in range(1, table_count):
+            prev_dp = dp[table_i - 1]
+            cur_dp = dp[table_i]
+            cur_back = backtrack[table_i]
+            hbm_t = hbm_by_fqn[table_i]
+            ddr_t = ddr_by_fqn[table_i]
+            perf_t = perf_by_fqn[table_i]
+            for h in range(hbm_bins):
+                prev_dp_h = prev_dp[h]
+                for d in range(ddr_bins):
+                    prev_state = prev_dp_h[d]
+                    prev_perf = prev_state[0]
+                    if prev_perf == _INF:
+                        continue
+                    prev_h_val = prev_state[1]
+                    prev_d_val = prev_state[2]
+                    for opt_j in range(option_count):
+                        delta_perf = perf_t[opt_j]
+                        if delta_perf == _INF:
+                            continue
+                        new_h = prev_h_val + hbm_t[opt_j]
+                        new_d = prev_d_val + ddr_t[opt_j]
+                        if new_h >= hbm_bins or new_d >= ddr_bins:
+                            continue
+                        new_h_i = int(new_h)
+                        new_d_i = int(new_d)
+                        new_perf = prev_perf + delta_perf
+                        if cur_dp[new_h_i][new_d_i][0] > new_perf:
+                            cur_dp[new_h_i][new_d_i] = (new_perf, new_h, new_d)
+                            cur_back[new_h_i][new_d_i] = (opt_j, h, d)


2D DP cost scales with world size — consider clamping total bins. hbm_bins = 100 · num_devices and ddr_bins = 50 · num_devices. On a 16-GPU world the inner triple-loop is 1600 × 800 × option_count pure-Python iterations per table; with 100 dynamicemb tables (20 options each) that's a few-billion-op DP, and dp + backtrack allocate ~256M tuples. Even on smaller worlds the bin counts multiply faster than necessary — a per-device discretization isn't really needed since the budget is a single scalar per axis. Consider clamping totals (e.g. hbm_bins = min(hbm_bins_per_device · num_devices, 2_000)) or vectorizing the transition with NumPy. The defaults are fine for 2–4 GPUs but worth a guardrail before the next user hits this.

github-actions · 2026-05-13T08:43:57Z

+class DynamicProgrammingProposerTest(unittest.TestCase):
+    """2D DP across HBM × DDR picks per-table mode under joint budgets."""
+
+    def _run(self, search_space, topology):
+        proposer = DynamicProgrammingProposer(
+            hbm_bins_per_device=20, ddr_bins_per_device=20
+        )
+        proposer.load(search_space)
+        # First propose returns the lowest-mem-per-table seed.
+        proposer.propose()
+        proposer.feedback(partitionable=True, storage_constraint=topology)
+        proposals = []
+        proposal = proposer.propose()
+        while proposal:
+            proposals.append(proposal)
+            proposer.feedback(partitionable=True, storage_constraint=topology)
+            proposal = proposer.propose()
+        return proposals
+
+    def test_caching_preferred_when_ddr_is_generous(self):
+        # Three options for one table:
+        #   HYBRID @ x=1.0: hbm = T,   ddr = 0,   perf = high  (HBM-only)
+        #   HYBRID @ x=0.1: hbm = .1T, ddr = .9T, perf = high  (slow)
+        #   CACHING @ x=0.1: hbm = .1T, ddr = T,  perf = low   (fast — modeled hits)
+        opts = [
+            _FakeShardingOption("table_a", hbm=1000, ddr=0, perf=50.0),
+            _FakeShardingOption("table_a", hbm=100, ddr=900, perf=80.0),
+            _FakeShardingOption("table_a", hbm=100, ddr=1000, perf=10.0),
+        ]
+        topology = _make_topology(
+            num_devices=2, hbm_per_device=2000, ddr_per_device=2000
+        )
+        proposals = self._run(opts, topology)
+        # Best plan must be the CACHING option (perf=10).
+        best = min(proposals, key=lambda p: sum(o.total_perf for o in p))
+        self.assertEqual(best[0].total_perf, 10.0)
+
+    def test_caching_rejected_when_ddr_is_tight(self):
+        # Host budget is only 950 — CACHING (ddr=1000) cannot fit; HYBRID can.
+        opts = [
+            _FakeShardingOption("table_a", hbm=100, ddr=900, perf=80.0),
+            _FakeShardingOption("table_a", hbm=100, ddr=1000, perf=10.0),
+        ]
+        topology = _make_topology(
+            num_devices=1, hbm_per_device=2000, ddr_per_device=950
+        )
+        proposals = self._run(opts, topology)
+        # Every proposed plan must pick the HYBRID option (perf=80).
+        for p in proposals:
+            self.assertEqual(p[0].total_perf, 80.0)
+
+    def test_high_factor_collapses_modes(self):
+        # At x=1.0 HYBRID == CACHING in HBM and CACHING.ddr = T = HYBRID.hbm.
+        # If we offer just the high-factor options, DP picks one of them.
+        opts = [
+            _FakeShardingOption("table_a", hbm=1000, ddr=0, perf=50.0),  # HYBRID x=1.0
+            _FakeShardingOption(
+                "table_a", hbm=1000, ddr=1000, perf=50.0
+            ),  # CACHING x=1.0
+        ]
+        topology = _make_topology(
+            num_devices=1, hbm_per_device=1100, ddr_per_device=2000
+        )
+        proposals = self._run(opts, topology)
+        # Either option is fine — they're tied. Just verify the proposer
+        # returned something feasible.
+        self.assertGreater(len(proposals), 0)
+        for p in proposals:
+            self.assertEqual(p[0].total_perf, 50.0)


Missing the headline multi-table 2D-DP scenario. All three DynamicProgrammingProposerTest cases use a single table, so the cross-table DP transition at plan_util.py:434–463 is exercised only at table_i == 0. The PR's value prop is "pick CACHING for one table and HYBRID for another under joint HBM+DDR budgets" — a 2-table fixture where global DDR can only hold one CACHING but plenty of HBM is available would directly cover that. Same gap for the mem_bins_per_device legacy alias (untested) and the all-options-pruned / empty-search-space degenerate paths.

github-actions · 2026-05-13T08:44:00Z

+            only_values=only_values,
+            bucket_capacity=self.BUCKET_CAPACITY,
+            caching=caching,
+        )


Direct unit test for dynamicemb_calculate_shard_storages and _dynamicemb_aware_build_shard_perf_contexts is missing. Both have non-trivial branches — admission_counter HBM accounting (dynamicemb_util.py:663–672), the compute_device == "cpu" and not is_inference DDR branch (line 697), and the cache_params swap+restore (line 502–514) — that are reachable only via the E2E test, which is gated on has_dynamicemb and cuda. On a CPU dev box none of this is covered; a fake-ed _orig_build_shard_perf_contexts would let you verify the boost is applied and (more importantly) that the original cache_params is restored even when the wrapped call raises.

github-actions · 2026-05-13T08:44:22Z

Review summary

Solid PR overall — the 2D DP rewrite is clean, the mode-aware storage formula is well-tested, and the empirical perf-model docstring is refreshingly honest about its provenance. A few things worth addressing before merge (inline above):

Should fix

dynamicemb_util.py:515-527 — missing try/finally around the cache_params swap; an exception in the wrapped estimator permanently corrupts the option's cache load factor.
dynamicemb_util.py:56-89 — the x_eff step from 0.12 → 1.0 at x=1.00 means HYBRID@0.99 and CACHING@0.5 both look much worse than HYBRID@1.0 by construction. If x=1.0 is genuinely a different code path (HBM-only kernel), modeling it as a separate mode axis is cleaner than a discontinuity in the same knob.
plan_util.py:846-853 — redundant double-deepcopy of dynamicemb_options (20×/table).

Worth a follow-up

plan_util.py:434-463 — 2D DP state blows up on multi-node worlds (bins scale with world_size); consider an absolute cap on total bins, or vectorize with NumPy.
Test gap on the multi-table DP case where one table wants CACHING and another HYBRID under a shared DDR budget — that's the headline scenario and currently isn't covered by a unit test. Same for dynamicemb_calculate_shard_storages and _dynamicemb_aware_build_shard_perf_contexts (only reachable via the CUDA-gated E2E test) and the mem_bins_per_device legacy alias.

Docs

DynamicProgrammingProposer docstring says "per-rank DDR" but the implementation budgets DDR per-machine — worth aligning the wording with the inline comment on lines 350-352.
dynamicemb_calculate_shard_storages Args block (not in diff, around line 620): the caching_ratio description still reads "ratio of HBM to DDR memory for UVM caching" — under the new CACHING semantics that's stale; host always holds the full backing store and caching_ratio only sizes the HBM hot-row cache.
PR description mentions TZREC_CACHING_HIT_RATE_MULTIPLIER and x_eff = min(1.0, x · multiplier); the code uses fitted constants instead (no env var). Worth updating the description so it doesn't mislead future readers.

No security/multi-process concerns: the monkey-patches run under the import lock and each rank is its own process.

`copy.deepcopy(base_option)` on the first line of each variant body already copies `dynamicemb_options` because the caller in `enumerate()` attaches it onto `base_option` before invoking `_emit_dynamicemb_variants`. The second `copy.deepcopy(dynamicemb_options)` was wasted work — at 20 variants per dynamicemb table it doubled a non-trivial copy. Drop it and mutate the already-fresh per-variant `dynamicemb_options.caching` directly. PR alibaba#508 review R3 (github-actions, 2026-05-13). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… blowup The 2D DP discretizes each memory axis into `bins_per_device * num_devices` bins. On a 16-GPU world that becomes 1600 hbm bins × 800 ddr bins = 1.28M cells per table; with 100 dynamicemb tables and 20 options each, the inner Python loop is a few-billion-op DP and dp + backtrack tuples consume hundreds of MB. The HBM and DDR budgets are scalar, so multiplying bins by num_devices stops adding meaningful precision past a point. Clamp the per-axis bin count at `_DP_AXIS_BIN_CAP` (1024) to keep the multi-host case tractable. NumPy vectorization of the inner DP loop is a bigger refactor; leaving a TODO referencing PR alibaba#508 review R5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ation * DynamicProgrammingProposer class docstring: "per-rank DDR" -> "per-rank HBM and per-machine DDR" so it matches the machine-DDR prune introduced in commit 38bd7ad (PR alibaba#508 review R4). * _dynamicemb_effective_cache_ratio: call out that the discontinuity at x=1.0 is a deliberate kernel switch (HBM-only DynamicEmbStorage vs dual-tier HybridStorage/DynamicEmbCache), not a smoothing artifact. Note a future refactor could lift this to a discrete `mode` axis on the enumerator (PR alibaba#508 review R2). * dynamicemb_calculate_shard_storages caching_ratio Args: rewrite the stale "ratio of HBM to DDR memory for UVM caching" framing -- under the dynamicemb modes, host always holds the full backing in CACHING, and caching_ratio sizes the HBM hot-row cache (top-level docs note). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ch space Add three DynamicProgrammingProposerTest methods that the original PR alibaba#508 review (R6) flagged as missing: test_two_tables_pick_mixed_modes_under_joint_budget Two tables under a topology where both-HYBRID is HBM-infeasible and both-CACHING is DDR-infeasible. The DP must pick one of each. This exercises the cross-table transition at plan_util.py table_i == 1 which the existing single-table cases never reach. test_legacy_mem_bins_per_device_alias Verifies the back-compat kwarg `mem_bins_per_device` still wins over the new `hbm_bins_per_device` when both are passed. test_empty_search_space_returns_empty_proposal Edge case where `load([])` short-circuits via `table_count == 0`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…apper tests Combine PR alibaba#508 review R1 + R7. The new ``BuildShardPerfContextsWrapperTest.test_cache_params_restored_on_exception`` exists to pin the bug R1 describes: when ``_orig_build_shard_perf_contexts`` raises, the boosted ``x_eff`` clone of ``sharding_option.cache_params`` must not leak to the downstream storage estimator's view of the same option. The corresponding ``try/finally`` around the wrapped call closes the leak. Reverts the previous "single return, no try/finally" simplification. Also adds: * ``BuildShardPerfContextsWrapperTest`` — direct unit tests for the ShardPerfContext.build_shard_perf_contexts monkey-patch covering: boost applied for caching, boost applied for hybrid, no boost when the option has no dynamicemb_options, cache_params restored on success, and the restore-on-exception case driving R1. * ``DynamicEmbCalcShardStoragesTest.test_admission_counter_increases_hbm`` — direct cover for the admission_counter HBM accounting branch in ``dynamicemb_calculate_shard_storages`` that the CUDA-gated E2E test doesn't exercise on a CPU dev box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…aching-planner

After the DP proposer commits a (mode, cache_load_factor) per dynamicemb table, log one INFO line on rank 0 that names the runtime mode the dispatcher in BatchedDynamicEmbeddingTablesV2._create_cache_storage will actually pick: total_value_memory <= local_hbm_for_values -> HBM_ONLY else if caching -> CACHING else -> HYBRID Example: [dynamicemb plan] sparse.ebc.user_emb: mode=CACHING cache_load_factor=0.30 local_hbm_for_values=2.400GiB [dynamicemb plan] sparse.ebc.item_emb: mode=HBM_ONLY cache_load_factor=1.00 local_hbm_for_values=4.000GiB Replaces the previous "have to grep _print_memory_consume from the runtime to find out what mode the planner picked" workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Upstream ``_print_memory_consume`` is mode-blind and always computes ``dram = total - hbm``, so for CACHING tables it under-reports the host backing (the actual storage_options keep ``max_capacity`` unchanged, i.e. host holds the full table). Surface the correct expectation in our own log so users can verify CACHING vs HYBRID storage from the planner side: HBM_ONLY: dram = 0 CACHING: dram = total_value_memory # full backing HYBRID: dram = total - local_hbm_for_values # partition Example for cat_0_emb @ 0.20 with Adam: mode=HYBRID local_hbm=0.954GiB local_dram=3.815GiB (= 0.8 * total) mode=CACHING local_hbm=0.954GiB local_dram=4.769GiB (= total) If the runtime memory print shows DRAM = 3.815GiB for a table logged as CACHING here, that is the print's mode-blind formula -- the real host allocation is the log's local_dram_for_values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…omputing The storage estimator (``dynamicemb_calculate_shard_storages`` -> ``_calculate_dynamicemb_storage_specific_sizes`` -> ``_calculate_dynamicemb_table_storage_specific_size``) already populates ``shards[i].storage.ddr`` per option for the chosen mode: HYBRID: ddr_specific = (1 - cache_ratio) * total_value_memory CACHING: ddr_specific = total_value_memory For cuda compute_device, ``Storage.ddr = ddr_specific`` (no I/O overhead added). So the value the planner already computed equals what the log line needs -- no point re-running ``_calculate_dynamicemb_table_storage_specific_size``. Read ``shards[0].storage.ddr`` directly. Identical numbers, less code, no second call to the storage formula. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three small follow-ups on the dynamicemb plan log: * Rename keys: ``local_hbm_for_values`` -> ``local_hbm`` and ``local_dram_for_values`` -> ``local_dram``. Shorter, still unambiguous in context. * Switch HBM source to ``shards[0].storage.hbm`` for symmetry with the DDR side and to surface the true HBM footprint (values + metadata + counter + pipeline I/O), not just the runtime config knob. Upstream ``_print_memory_consume`` strips metadata from its HBM column too; this log reports the honest total. * Bump tzrec to 1.2.13. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-22T03:13:21Z

+                    if cache_load_factor >= 1.0:
+                        mode = "HBM_ONLY"
+                        dram_bytes = 0  # runtime drops the host tier
+                    else:
+                        mode = "CACHING" if dynamicemb_options.caching else "HYBRID"
+                        dram_bytes = shards[0].storage.ddr


Storage estimator ↔ runtime ↔ log are not coherent at cache_load_factor >= 1.0.

The log overrides dram_bytes = 0 here because the runtime drops the host tier when the table fits HBM. But the storage estimator at _calculate_dynamicemb_table_storage_specific_size (line ~340) still returns total_value_memory DDR for CACHING mode regardless of cache_ratio — i.e. CACHING@1.0 reports a full backing-store DDR cost that the 2D DP then uses for pruning. Result: the DP can mark CACHING@1.0 plans infeasible on DDR-tight topologies even though they would actually fit at runtime, and the log will then report local_dram=0.000GiB for any plan that does make it through.

Suggest aligning the storage formula with the runtime: when cache_ratio >= 1.0, force CACHING's DDR to 0 too. Then the storage estimator, the DP, and this log all agree.

github-actions · 2026-05-22T03:13:26Z

+        # back-compat alias: ``mem_bins_per_device`` mapped to the HBM axis.
+        bins_h = (
+            mem_bins_per_device
+            if mem_bins_per_device is not None
+            else hbm_bins_per_device
+        )
+        self._hbm_bins_per_device: int = max(bins_h, 1)


When the caller passes both mem_bins_per_device=X and hbm_bins_per_device=Y, the legacy alias silently wins — test_legacy_mem_bins_per_device_alias even codifies this. That's the surprising direction; a user supplying the new name explicitly while a config layer injects the legacy one will get the legacy value with no warning. The class docstring also reads as a one-way alias ("Accepts legacy mem_bins_per_device as an alias").

Suggest either raising TypeError when both are non-None, or letting hbm_bins_per_device win and emitting a DeprecationWarning for mem_bins_per_device. Whatever the choice, document it in the Args: block.

github-actions · 2026-05-22T03:13:33Z

+        # Enumerate proposals in decreasing total memory order. Total memory
+        # is a tie-break heuristic — Pareto-optimal (perf, hbm, ddr) frontier
+        # tends to live at the high end, so larger-mem cells are explored
+        # first and small-mem fallbacks come later.
+        self._proposal_list = []
+        last_back = backtrack[table_count - 1]
+        coords = sorted(
+            (
+                (h, d)
+                for h in range(hbm_bins)
+                for d in range(ddr_bins)
+                if last_back[h][d][0] >= 0
+            ),
+            key=lambda hd: hd[0] + hd[1],
+            reverse=True,
+        )


The class docstring (line 255) says backtracking "yields Pareto-optimal proposals," but this loop actually enumerates every reachable (h, d) cell at the last table — up to hbm_bins × ddr_bins ≈ 800k+ proposals at the cap. Downstream consumers iterate propose()/feedback() once per proposal, so a planner run on a large world can churn through hundreds of thousands of plans rather than a small Pareto frontier.

Two minor things compound this:

The sort key hd[0] + hd[1] weighs the two axes equally even though hbm_bin_size and ddr_bin_size differ — so "decreasing total memory" isn't really sorted by memory.

last_back[h][d][0] >= 0 is the only filter; nothing prunes dominated cells.

Either tighten to the Pareto frontier (drop (h, d) if some (h', d') with h' ≤ h, d' ≤ d, perf' ≤ perf exists), cap len(self._proposal_list), or update the docstring to say "all reachable plans, larger memory first."

github-actions · 2026-05-22T03:13:39Z

+    def _dynamicemb_aware_build_shard_perf_contexts(
+        cls,  # pyre-ignore [2]
+        config,  # pyre-ignore [2]
+        shard_sizes,  # pyre-ignore [2]
+        sharding_option,  # pyre-ignore [2]
+        topology,  # pyre-ignore [2]
+        constraints,  # pyre-ignore [2]
+        sharder,  # pyre-ignore [2]


The wrapper hard-codes the first six positional parameters (config, shard_sizes, sharding_option, topology, constraints, sharder) and forwards everything else via *args, **kwargs. If torchrec renames sharding_option, reorders these six, or splits one into multiple, the wrapper either silently mis-dispatches (the boost goes into the wrong object) or raises an opaque TypeError deep inside the planner with no link back to this file.

Worth capturing inspect.signature(_orig_build_shard_perf_contexts) at patch time, asserting the first six parameter names match, and otherwise logging a warning + installing a pass-through. A pinned/asserted torchrec version range somewhere visible would also help.

github-actions · 2026-05-22T03:13:42Z

+# bins/dev = 1600 bins, * 800 ddr bins = 1.28M cells per table). The
+# budgets are scalar (total HBM, total DDR), so multiplying bins by
+# num_devices stops adding precision past this cap. NumPy vectorization
+# of the inner loop is a follow-up (PR #508 review R5).


(PR #508 review R5) looks like an in-flight review artifact — PR numbers and ad-hoc reviewer ticket IDs aren't useful once merged (git blame is the canonical pointer). Suggest dropping the parenthetical or replacing with an actual follow-up issue number.

github-actions · 2026-05-22T03:13:45Z

+def _emit_dynamicemb_variants(
+    base_option: ShardingOption,
+    dynamicemb_options: Any,
+) -> List[ShardingOption]:
+    """Expand a dynamicemb ShardingOption into HYBRID + CACHING variants.
+
+    Sweeps both placement modes (``caching=False`` and ``caching=True``) and,
+    when ``base_option.cache_params`` is unset, ten cache_load_factor values
+    (0.1, 0.2, ..., 1.0). The downstream 2D DP proposer picks per table the
+    best (mode, ratio) that fits both HBM and host topology budgets.
+
+    Each returned ShardingOption owns a freshly deep-copied
+    ``dynamicemb_options`` instance so per-option ``caching`` mutations do not
+    bleed across variants.
+    """
+    if base_option.cache_params is None:
+        load_factors = [(i + 1) / 10 for i in range(10)]
+        stats = None
+    else:
+        load_factors = [base_option.cache_params.load_factor]
+        stats = base_option.cache_params.stats
+    variants: List[ShardingOption] = []
+    for caching_mode in (False, True):
+        for load_factor in load_factors:
+            opt = copy.deepcopy(base_option)
+            opt.cache_params = CacheParams(load_factor=load_factor, stats=stats)
+            # deepcopy(base_option) already produced a fresh dynamicemb_options.
+            opt.dynamicemb_options.caching = caching_mode  # pyre-ignore [16]
+            variants.append(opt)


The dynamicemb_options parameter is never read — the function relies entirely on base_option.dynamicemb_options via copy.deepcopy(base_option) and opt.dynamicemb_options.caching = .... The caller also already assigns sharding_option.dynamicemb_options = dynamicemb_options immediately before invocation, so the argument is doubly redundant.

Either drop the parameter, or use it as the canonical source so callers don't have to mutate base_option first.

github-actions · 2026-05-22T03:13:50Z

+        for device in storage_constraint.devices:
+            max_device_hbm = max(max_device_hbm, device.storage.hbm or 0)
+            hbm_total += device.storage.hbm or 0
+            ddr_total += device.storage.ddr or 0
+        # DDR is host-shared across ranks co-located on one machine, so the
+        # per-option fit check compares against the largest machine's DDR pool
+        # -- not per-device. HBM is GPU-local, so its prune stays per-device.
+        per_host = getattr(storage_constraint, "local_world_size", None) or num_devices
+        per_host = max(per_host, 1)
+        max_machine_ddr = 0
+        for host_start in range(0, num_devices, per_host):


This per-machine DDR pruning is the load-bearing reason for the 2D rewrite, but it's not exercised in tests — _make_topology in plan_util_test.py accepts local_world_size but every test falls through to the default (local_world_size or num_devices), so per_host == num_devices in every case. A bug where max_machine_ddr collapses to ddr_total or the window stride is off-by-one wouldn't be caught.

Worth a multi-host test: e.g. num_devices=4, local_world_size=2, ddr_per_device=500 (so each machine has 1000 DDR, total 2000), then assert an option whose shard.ddr=1500 is pruned even though 1500 < ddr_total.

github-actions · 2026-05-22T03:13:56Z

+                # Per-table mode log on rank 0. cache_load_factor=1.0 forces
+                # the runtime into HBM_ONLY (host tier dropped) regardless of
+                # ``caching`` -- mirror that override here so the log matches
+                # the runtime, not the planner's recorded (caching, factor).
+                if int(os.environ.get("RANK", 0)) == 0:
+                    cache_load_factor = float(sharding_option.cache_load_factor)
+                    if cache_load_factor >= 1.0:
+                        mode = "HBM_ONLY"
+                        dram_bytes = 0  # runtime drops the host tier
+                    else:
+                        mode = "CACHING" if dynamicemb_options.caching else "HYBRID"
+                        dram_bytes = shards[0].storage.ddr
+                    hbm_gib = shards[0].storage.hbm / (1 << 30)
+                    dram_gib = dram_bytes / (1 << 30)
+                    fqn = f"{sharding_option.path}.{sharding_option.name}"
+                    logger.info(
+                        f"[dynamicemb plan] {fqn}: mode={mode} "
+                        f"cache_load_factor={cache_load_factor:.2f} "
+                        f"local_hbm={hbm_gib:.3f}GiB "
+                        f"local_dram={dram_gib:.3f}GiB"
+                    )


This ~25-line log block has zero coverage in dynamicemb_util_test.py / plan_util_test.py. Three independent decision points worth a test each:

cache_load_factor >= 1.0 HBM_ONLY override (float-precision regressions would silently mis-label).

int(os.environ.get("RANK", 0)) == 0 gating (a regression to os.environ.get("RANK") == "0" would fail on the typical single-rank case where RANK is unset).

1 << 30 GiB divisor (a regression to 1e9 would not be caught).

A mock.patch.object(logger, "info") test driving a synthetic ShardingOption through _to_sharding_plan would cover all three cheaply.

github-actions · 2026-05-22T03:14:10Z

Code review summary (5 reviewer agents)

Substantial, well-tested planner change. The 2D DP refactor is clean, the empirical-fit docstring on _dynamicemb_effective_cache_ratio is unusually informative, and the new try/finally-restored monkey-patch on ShardPerfContext is properly tested for the exception path. Inline comments below capture the items worth a look before merge — they cluster into:

Correctness / coherence

Storage estimator vs. runtime vs. planning log disagree at cache_load_factor >= 1.0 (CACHING DDR reported as full backing in the DP but as 0 in the log).
DP "emits Pareto-optimal proposals" docstring overstates what the code does (it emits every reachable (h, d) cell, up to ~800k at the cap).

Robustness

ShardPerfContext.build_shard_perf_contexts monkey-patch hardcodes a 6-positional signature with no version assert.
mem_bins_per_device legacy kwarg silently wins over the new hbm_bins_per_device when both are passed — and the test codifies it. Worth a DeprecationWarning or TypeError.

Test gaps (also flagged inline where relevant)

Per-machine DDR pruning via local_world_size is the motivation for the 2D rewrite but no test ever uses local_world_size != num_devices.
The new per-table [dynamicemb plan] ... log block (incl. HBM_ONLY override + rank-0 gating + GiB formatting) is not covered.
_DP_AXIS_BIN_CAP is never exercised (all DP tests use num_devices ≤ 2 with bins=20).

Minor

(PR #508 review R5) looks like a stray reviewer-ticket reference in a shipping comment.
_emit_dynamicemb_variants(base_option, dynamicemb_options) has a dead second parameter.

Performance note (no inline)

The pure-Python 4-level inner DP loop is acceptable for a one-shot startup call at the stated 30-table/16-GPU workload (~25-70 s; nested-list dp + backtrack peaks at a few GB), but lives within ~2× of unacceptable on heavier worlds. The (PR #508 review R5) TODO is the right follow-up — switching dp / backtrack to NumPy arrays drops both time and memory by ~10×.

🤖 Generated with Claude Code

Four cleanups to ``DynamicProgrammingProposer`` from PR alibaba#508 review round 2: * **Proposal emission (R3):** the previous backtracking enumerated every reachable ``(h, d)`` cell at the last table -- up to ``hbm_bins × ddr_bins`` ≈ 200k+ entries. For each HBM bin only the lowest-perf DDR cell matters (other cells at the same ``h`` are strictly dominated -- same HBM cost, ≥ perf). Emit one proposal per HBM bin in decreasing HBM order; proposal count caps at ``hbm_bins ≤ _DP_AXIS_BIN_CAP``. * **Drop legacy alias (R2):** delete ``mem_bins_per_device`` kwarg from ``__init__``. No in-tree caller passes it; carrying it with a silent precedence rule was a landmine. * **Drop unused parameter (R6):** ``_emit_dynamicemb_variants`` second parameter was never read; the function uses ``copy.deepcopy(base_option).dynamicemb_options`` instead. Drop the parameter and update the single caller. * **Trim comment (R5):** drop the trailing two lines of the ``_DP_AXIS_BIN_CAP`` block (the "stops adding precision" repeat and the dangling PR-ID parenthetical). Also update class docstring to describe the new "one per HBM bin" emission (was "Pareto-optimal proposals", which was inaccurate). Drop the now-obsolete ``test_legacy_mem_bins_per_device_alias``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The enumerator bounds ``cache_load_factor`` to ``{0.1, 0.2, ..., 1.0}``, so the HBM_ONLY-override boundary is the exact value 1.0. Using ``>= 1.0`` would silently relabel any leaked-in ``cache_load_factor > 1.0`` (e.g. from a misconfigured DynamicEmbedding proto) as HBM_ONLY instead of letting the upstream invariant violation surface loudly downstream (a negative HYBRID DDR via ``1.0 - 1.5``). PR alibaba#508 review N1-trimmed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* **N7 -- multi-host DDR prune.** All prior ``DynamicProgrammingProposerTest`` cases used ``local_world_size == num_devices``, so ``per_host == num_devices`` collapsed the per-machine window to the whole world; the load-bearing multi-host pruning was effectively untested. ``test_per_machine_ddr_prune_on_multi_host_topology`` exercises 4 GPUs across 2 machines (each with 1000 MB DDR, total 2000): asserts an option with ``ddr=1500`` per shard is pruned (exceeds per-machine cap) and an option with ``ddr=900`` survives. * **N8 -- plan log coverage.** Extract the per-table log block into ``_log_dynamicemb_table_plan`` so the three decision points (``cache_load_factor == 1.0`` HBM_ONLY override, RANK env gating, GiB divisor) are testable without reconstructing a full ShardingOption pipeline. New ``PlanLogLineTest`` covers: HYBRID/CACHING at partial factor, HBM_ONLY at x=1.0 in both modes (override path), non-rank-0 silence, and a ``cache_load_factor=1.0001`` regression case that asserts the exact-equality check does NOT swallow a >1.0 escape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ocstring * Drop four test classes from ``dynamicemb_util_test.py``: ``EffectiveCacheRatioTest``, ``BuildShardPerfContextsWrapperTest``, ``DynamicEmbCalcShardStoragesTest``, ``PlanLogLineTest``. Coverage of the perf model, the build_shard_perf_contexts monkey-patch, the storage estimator, and the per-table log line is provided end-to-end by ``PlanUtilDynamicEmbE2ETest`` in ``plan_util_test.py``. Only ``StorageFormulaTest`` (the pure HBM/DDR byte arithmetic) is worth keeping as a unit test -- the rest is e2e territory. * Rewrite ``DynamicProgrammingProposer`` class docstring to match master's style: open with a one-line summary, frame the problem as matrices ``A`` / ``B`` with discretized state ``dp[i][f(k)]``, give the state-transition equation in LaTeX, and end with an ``Args`` block typed in parentheses. Extends master's 1D formulation to the 2D (HBM × DDR) case with cost matrices ``A^h``, ``A^d`` and budgets ``K_h``, ``K_d``, and notes the per-HBM-bin proposal emission. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tiankongdeguiji and others added 10 commits May 12, 2026 20:28

[fix] drop duplicate empirical-fit comment above the constants

5280156

The same provenance is already covered in ``_dynamicemb_effective_cache_ratio``'s docstring below; the block comment above the constants was redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tiankongdeguiji added the claude-review Let Claude Review label May 13, 2026

github-actions Bot removed the claude-review Let Claude Review label May 13, 2026

github-actions Bot reviewed May 13, 2026

View reviewed changes

tiankongdeguiji and others added 10 commits May 13, 2026 17:02

Merge remote-tracking branch 'origin/master' into dynamicemb-hybrid-c…

3b7c1bf

…aching-planner

tiankongdeguiji added the claude-review Let Claude Review label May 22, 2026

github-actions Bot removed the claude-review Let Claude Review label May 22, 2026

github-actions Bot reviewed May 22, 2026

View reviewed changes

tiankongdeguiji and others added 4 commits May 22, 2026 14:46

Conversation

tiankongdeguiji commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 13, 2026

Review summary

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 22, 2026

Code review summary (5 reviewer agents)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tiankongdeguiji commented May 12, 2026 •

edited

Loading