[feat] SID: add ResidualQuantizer / BaseSidModel base classes by WhiteSwan1 · Pull Request #538 · alibaba/TorchEasyRec

WhiteSwan1 · 2026-06-02T09:26:17Z

First of three PRs splitting the Semantic-ID models onto a shared base. This one is purely additive — only the backend-agnostic foundation, with no concrete quantizer or model and no edits to existing files. RQ-KMeans follows in PR2, RQ-VAE in PR3.

What this adds:

ResidualQuantizer (abstract): owns the state every residual quantizer shares — embed_dim, per-layer codebook sizes (normalize_n_embed), residual-normalization flag, and the layer list — and the shared residual walk. _residual_pass drives the concrete get_codes / decode_codes / output_dim. Subclasses implement just the two per-layer primitives — _quantize_layer (encode) and _lookup_code (decode) — plus forward and get_codebook_embeddings.
BaseSidModel (abstract): the shared SID model scaffold — embedding-feature extraction, loss/metric initialization, and shared config parsing — that SidRqkmeans and SidRqvae will subclass.
types.py: QuantizeForwardMode, QuantizeOutput, ResidualQuantizerOutput.

Tests: normalize_n_embed, the abstract-base contract (shared state present, backend primitives raise).

No proto changes and no edits to existing modules — the concrete SidRqkmeans / SidRqvae messages, and the BaseModel.on_train_end lifecycle hook, arrive with their models in PR2 / PR3.

github-actions · 2026-06-03T06:51:17Z

Review summary

Clean, well-scoped additive PR — the abstract ResidualQuantizer / BaseSidModel split is sensible, docstrings are thorough and accurate, and update_train_metric correctly wires into the real call site (main.py:438). The detach/accumulate semantics of _residual_pass match the docstring. Nothing blocking.

A few items, left inline:

decode_codes hardcodes dtype=torch.float — pins reconstruction to fp32 and silently upcasts under mixed precision; seed the accumulator from the first lookup so dtype/device follow the codebook.
_update_unique_sid_ratio divides by B without an empty-batch guard (ZeroDivisionError).
Test coverage — only the NotImplementedError paths are exercised; the core residual-walk math (get_codes/decode_codes, the detach invariant, the normalize_residuals=True branch) is untested. A tiny fake subclass would cover it.

Minor polish (no need to block):

update_train_metric(predictions: dict, ...) — every sibling/base uses Dict[str, torch.Tensor].
# noqa: ANN201 on the abstract forward is a no-op (ANN is not in .ruff.toml lint.select); siblings annotate their forward return.

- decode_codes: seed the accumulator from the first lookup so device AND dtype follow the codebook, instead of pinning to fp32 (silently upcasting each layer's add under mixed precision). n_layers >= 1 is guaranteed. - BaseSidModel._update_unique_sid_ratio: guard B == 0 (empty final shard under DDP/TorchRec) to avoid ZeroDivisionError. - residual_quantizer_test: add a fake one-primitive subclass exercising the concrete residual walk the base owns — get_codes shape + aggregate == Σ quantized_i, the detach invariant (codebook grad flows, input gets none), the normalize_residuals branch, and decode_codes sum + codebook dtype. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T09:08:35Z

Code review summary

Reviewed across code-quality, performance, tests, docs, and security. This is clean, well-documented foundation code for the SID model split — the abstraction boundary (_quantize_layer/_lookup_code primitives vs. the shared _residual_pass walk) is cleanly drawn, the detach/grad semantics are correct and tested, and the decode_codes dtype/device-aware seeding and B == 0 empty-shard guard show good edge-case awareness. No blocking issues.

A few minor, optional items posted inline:

Tests — test_normalize_residuals_branch only asserts shapes, so it wouldn't catch the normalization branch being removed; and _update_unique_sid_ratio (incl. its empty-shard guard) is pure-tensor logic that's untestable-free and worth a small unit test now.
Polish — predictions: dict → Dict[str, torch.Tensor] for consistency with BaseModel; an optional assert n_layers >= 1 in ResidualQuantizer.__init__ to self-guard the construction boundary for the upcoming PR2/PR3 subclasses.

Docs were verified accurate (the :class: forward-refs to not-yet-existing subclasses don't break the Sphinx build). LGTM as the additive foundation layer.

_{🤖 Generated with Claude Code}

…aba#538 - ResidualQuantizer.__init__: assert n_layers >= 1, making the boundary self-guarding (matches the "n_layers >= 1 is guaranteed" comment in decode_codes; a future n_layers=0 subclass would otherwise index OOB). - BaseSidModel.update_train_metric: type predictions as Dict[str, Tensor] instead of bare dict, matching BaseModel / sibling models. - sid_model_test.py (new): unit-test _update_unique_sid_ratio via __new__ — empty-batch no-op (guard) and the exact 0.75 ratio on a known-duplicate batch. - residual_quantizer_test: make the normalize_residuals test behavioral — toggle the flag on the same input/codebook and assert the assignments differ, so the normalization branch is actually validated (the old shape assertions passed even with the branch deleted). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Metrics: switch the shared reconstruction metric to torchmetrics.MeanSquaredError (correct (preds, target) aggregation vs the biased mean-of-batch-means a MeanMetric gave), and add a UniqueRatio metric class (tzrec/metrics/unique_ratio.py) for codebook coverage instead of a MeanMetric fed a hand-computed ratio. BaseSidModel._update_unique_sid_ratio now delegates to it; the empty-batch guard + DDP reduction live in the metric. The RQ-VAE train-path mse moves to MeanSquaredError too. - __init__.py: revert tzrec/modules/sid_generation/__init__.py to a bare package marker (no re-exports) and import ResidualKMeansQuantizer from its submodule in sid_rqkmeans, per "avoid adding to __init__.py". Tests: tzrec/metrics/unique_ratio_test.py (ratio, empty no-op, mean over batches); sid_model_test verifies the delegation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Metrics: switch the shared reconstruction metric to torchmetrics.MeanSquaredError (correct (preds, target) aggregation vs the biased mean-of-batch-means a MeanMetric gave), and add a UniqueRatio metric class (tzrec/metrics/unique_ratio.py) for codebook coverage instead of a MeanMetric fed a hand-computed ratio. The empty-batch guard + DDP reduction live in the metric; callers update it directly (the one-line _update_unique_sid_ratio passthrough is removed). The RQ-VAE train-path mse moves to MeanSquaredError too. - __init__.py: revert tzrec/modules/sid_generation/__init__.py to a bare package marker (no re-exports) and import ResidualKMeansQuantizer from its submodule in sid_rqkmeans, per "avoid adding to __init__.py". Tests: tzrec/metrics/unique_ratio_test.py (ratio, empty no-op, mean over batches). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T09:49:12Z

Review summary

Clean, genuinely additive PR — backend-agnostic SID base classes only, no edits to existing files. The abstraction is well-factored: ResidualQuantizer isolates the shared residual walk from the two backend primitives (_quantize_layer / _lookup_code), and BaseSidModel centralizes config parsing and shared metrics. Docstrings are thorough (the decode_codes dtype/device seeding and the detach invariant are nicely documented), and the _FakeQuantizer-based tests exercise the concrete base logic without pulling in real backends. The _residual_pass / decode_codes paths are AMP-safe with no spurious host-device syncs.

Two inline comments on items worth confirming:

sid_model.py — base registers mse/unique_sid_ratio in init_metric but provides no shared update_metric to populate them; docstring says subclasses "extend update_metric" but there's nothing to extend.
unique_ratio.py — metric is a mean of per-batch ratios (batch-size sensitive), not global codebook coverage; worth confirming the intended semantic.

Minor / non-blocking notes:

init_metric hardcodes the metric set rather than reading proto metrics config like other model families. Fine if SID metrics are intentionally fixed — worth a one-line note so it isn't later mistaken for a bug.
The :class: cross-refs to SidRqvae / SidRqkmeans / ResidualVectorQuantizer / ResidualKMeansQuantizer (not yet in the tree) may emit Sphinx "reference target not found" warnings until PR2/PR3 land — expected for the split, just flagging in case the doc build is strict.
A couple of low-priority test gaps: the n_layers >= 1 assertion and the temperature forwarding path are untested. The absence of a BaseSidModel test is justified (its proto fields don't exist yet); _extract_feature / init_metric could be unit-tested now with a stub if you want coverage before the subclass PR.

Nice work — no blocking issues.

Addresses github-actions review on PR alibaba#538 (no behavior change): - UniqueRatio stays the cheap per-batch metric (mean of distinct-rows / batch-size, two scalar states) — deliberately kept per-batch, not global distinct-SID coverage, to avoid accumulating an O(#distinct-SIDs) set during predict. Docstrings now say so plainly (a batch-size-sensitive diversity proxy, not global codebook coverage) instead of overselling it as "codebook coverage". - BaseSidModel docstring: subclasses extend init_metric (via super) and *implement* update_metric (BaseModel.update_metric raises NotImplementedError, so there is nothing to "extend"); update_train_metric defaults to a no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

First of three PRs splitting the Semantic-ID models onto a shared base. Purely additive — only the backend-agnostic foundation, no concrete quantizer or model and no edits to existing files. RQ-KMeans follows in PR2, RQ-VAE in PR3. What this adds: - ResidualQuantizer (abstract): owns the shared state (embed_dim, per-layer codebook sizes via normalize_n_embed, residual-normalization flag, layer list; asserts n_layers >= 1) and the shared residual walk — _residual_pass drives the concrete get_codes / decode_codes / output_dim. Subclasses implement just _quantize_layer (encode) and _lookup_code (decode), plus forward and get_codebook_embeddings. - BaseSidModel (abstract): the shared SID model scaffold — embedding-feature extraction, loss/metric init (reconstruction MSE via torchmetrics.MeanSquaredError + codebook coverage via UniqueRatio), and shared config parsing — that SidRqkmeans / SidRqvae subclass. - UniqueRatio (tzrec/metrics/unique_ratio.py): codebook-coverage metric (mean per-batch unique-row ratio) with empty-batch guard + DDP reduction. Tests: normalize_n_embed; the abstract-base contract; the concrete residual walk via a fake one-primitive subclass; and the UniqueRatio metric. No proto changes and no edits to existing modules; __init__.py is a bare package marker (no re-exports). The QuantizeForwardMode / QuantizeOutput / ResidualQuantizerOutput types and the concrete SidRqkmeans / SidRqvae models ship with the code that uses them in PR2 / PR3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bump version to 1.2.16 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…_abstract Brings the reviewed alibaba#539 foundation onto feat/sid_abstract (which already carries alibaba#538 + an older RQ-VAE/RQ-Kmeans port), and syncs to upstream/master (alibaba#540, alibaba#541, which alibaba#539 already contains). Conflict resolutions: - sid_rqkmeans.py(+test), residual_kmeans_quantizer.py, sid_model.py: take alibaba#539's canonical versions (BaseSidModel now hosts both SID models, with mse/rel_loss/unique_sid_ratio and the unified x_hat recon key). - types.py: union — keep alibaba#539's QuantizeOutput, retain feat's QuantizeForwardMode enum + ResidualQuantizerOutput (RQ-VAE needs them). - protos/models/sid_model.proto: union — alibaba#539's typed FaissKmeansConfig + clean SidRqkmeans, re-add feat's SinkhornConfig/ClipConfig/SidRqvae; drop the now-unused struct.proto import. - protos/model.proto: enable `SidRqvae sid_rqvae = 600;` (the field alibaba#539 reserved for this follow-up). - main.py / model.py on_train_end: take alibaba#539's wording; drop feat's forced tail-checkpoint (SID models rely on the final=True tail save). Transitional state: old modules/sid/kmeans.py still coexists with alibaba#539's kmeans_quantize.py, and the RQ-VAE stack is still on the old abstraction — both retired in the follow-up refactor commit. All SID modules import.

WhiteSwan1 changed the title ~~[refactor] SID: add ResidualQuantizer / BaseSidModel base classes + o…~~ [feat] SID: add ResidualQuantizer / BaseSidModel base classes Jun 2, 2026

WhiteSwan1 force-pushed the sid-1-abstract branch 3 times, most recently from be47e44 to cb7e846 Compare June 3, 2026 05:22

WhiteSwan1 added the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot removed the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/modules/sid_generation/residual_quantizer.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/models/sid_model.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/modules/sid/residual_quantizer_test.py

WhiteSwan1 force-pushed the sid-1-abstract branch from cb7e846 to 2c8ffa4 Compare June 3, 2026 06:59

WhiteSwan1 requested review from eric-gecheng and tiankongdeguiji June 3, 2026 09:00

tiankongdeguiji added the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot removed the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/modules/sid_generation/residual_quantizer_test.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/models/sid_model.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/models/sid_model.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/modules/sid/residual_quantizer.py

WhiteSwan1 force-pushed the sid-1-abstract branch from 2c8ffa4 to 17cadf3 Compare June 3, 2026 09:20

WhiteSwan1 force-pushed the sid-1-abstract branch from 17cadf3 to 7153d29 Compare June 3, 2026 09:23

tiankongdeguiji requested changes Jun 3, 2026

View reviewed changes

Comment thread tzrec/models/sid_model.py Outdated

Comment thread tzrec/models/sid_model.py Outdated

Comment thread tzrec/modules/sid_generation/types.py Outdated

Comment thread tzrec/modules/sid_generation/__init__.py Outdated

WhiteSwan1 force-pushed the sid-1-abstract branch from 7153d29 to 994c321 Compare June 3, 2026 09:36

WhiteSwan1 force-pushed the sid-1-abstract branch from 994c321 to 4f249ee Compare June 3, 2026 09:42

WhiteSwan1 added the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot removed the claude-review Let Claude Review label Jun 3, 2026

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/models/sid_model.py Outdated

github-actions Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread tzrec/metrics/unique_ratio.py

WhiteSwan1 force-pushed the sid-1-abstract branch from 4f249ee to 03a2a7a Compare June 3, 2026 10:43

WhiteSwan1 requested a review from tiankongdeguiji June 4, 2026 02:05

WhiteSwan1 and others added 3 commits June 5, 2026 03:47

Merge remote-tracking branch 'upstream/master' into sid-1-abstract

76f366b

[refactor] rename tzrec/modules/sid_generation -> tzrec/modules/sid; …

a0aecfe

…bump version to 1.2.16 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[fix] update import path in residual_quantizer_test after sid rename

55b32cc

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tiankongdeguiji approved these changes Jun 5, 2026

View reviewed changes

tiankongdeguiji merged commit 7dc1c18 into alibaba:master Jun 5, 2026
7 checks passed

WhiteSwan1 mentioned this pull request Jun 5, 2026

[feat] SID: add SidRqkmeans model (FAISS-trained residual K-Means) #539

Merged

WhiteSwan1 mentioned this pull request Jun 15, 2026

[feat] SID: add SidRqvae (RQ-VAE) model — STE/Gumbel-Softmax + CLIP #545

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[feat] SID: add ResidualQuantizer / BaseSidModel base classes#538

[feat] SID: add ResidualQuantizer / BaseSidModel base classes#538
tiankongdeguiji merged 4 commits into
alibaba:masterfrom
WhiteSwan1:sid-1-abstract

WhiteSwan1 commented Jun 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

WhiteSwan1 commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Review summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Code review summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 3, 2026

Review summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

WhiteSwan1 commented Jun 2, 2026 •

edited

Loading