[feat] SID: add ResidualQuantizer / BaseSidModel base classes#538
Conversation
be47e44 to
cb7e846
Compare
Review summaryClean, well-scoped additive PR — the abstract A few items, left inline:
Minor polish (no need to block):
|
cb7e846 to
2c8ffa4
Compare
- decode_codes: seed the accumulator from the first lookup so device AND dtype follow the codebook, instead of pinning to fp32 (silently upcasting each layer's add under mixed precision). n_layers >= 1 is guaranteed. - BaseSidModel._update_unique_sid_ratio: guard B == 0 (empty final shard under DDP/TorchRec) to avoid ZeroDivisionError. - residual_quantizer_test: add a fake one-primitive subclass exercising the concrete residual walk the base owns — get_codes shape + aggregate == Σ quantized_i, the detach invariant (codebook grad flows, input gets none), the normalize_residuals branch, and decode_codes sum + codebook dtype. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code review summaryReviewed across code-quality, performance, tests, docs, and security. This is clean, well-documented foundation code for the SID model split — the abstraction boundary ( A few minor, optional items posted inline:
Docs were verified accurate (the 🤖 Generated with Claude Code |
2c8ffa4 to
17cadf3
Compare
…aba#538 - ResidualQuantizer.__init__: assert n_layers >= 1, making the boundary self-guarding (matches the "n_layers >= 1 is guaranteed" comment in decode_codes; a future n_layers=0 subclass would otherwise index OOB). - BaseSidModel.update_train_metric: type predictions as Dict[str, Tensor] instead of bare dict, matching BaseModel / sibling models. - sid_model_test.py (new): unit-test _update_unique_sid_ratio via __new__ — empty-batch no-op (guard) and the exact 0.75 ratio on a known-duplicate batch. - residual_quantizer_test: make the normalize_residuals test behavioral — toggle the flag on the same input/codebook and assert the assignments differ, so the normalization branch is actually validated (the old shape assertions passed even with the branch deleted). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
17cadf3 to
7153d29
Compare
…aba#538 - ResidualQuantizer.__init__: assert n_layers >= 1, making the boundary self-guarding (matches the "n_layers >= 1 is guaranteed" comment in decode_codes; a future n_layers=0 subclass would otherwise index OOB). - BaseSidModel.update_train_metric: type predictions as Dict[str, Tensor] instead of bare dict, matching BaseModel / sibling models. - sid_model_test.py (new): unit-test _update_unique_sid_ratio via __new__ — empty-batch no-op (guard) and the exact 0.75 ratio on a known-duplicate batch. - residual_quantizer_test: make the normalize_residuals test behavioral — toggle the flag on the same input/codebook and assert the assignments differ, so the normalization branch is actually validated (the old shape assertions passed even with the branch deleted). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7153d29 to
994c321
Compare
- Metrics: switch the shared reconstruction metric to torchmetrics.MeanSquaredError (correct (preds, target) aggregation vs the biased mean-of-batch-means a MeanMetric gave), and add a UniqueRatio metric class (tzrec/metrics/unique_ratio.py) for codebook coverage instead of a MeanMetric fed a hand-computed ratio. BaseSidModel._update_unique_sid_ratio now delegates to it; the empty-batch guard + DDP reduction live in the metric. The RQ-VAE train-path mse moves to MeanSquaredError too. - __init__.py: revert tzrec/modules/sid_generation/__init__.py to a bare package marker (no re-exports) and import ResidualKMeansQuantizer from its submodule in sid_rqkmeans, per "avoid adding to __init__.py". Tests: tzrec/metrics/unique_ratio_test.py (ratio, empty no-op, mean over batches); sid_model_test verifies the delegation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Metrics: switch the shared reconstruction metric to torchmetrics.MeanSquaredError (correct (preds, target) aggregation vs the biased mean-of-batch-means a MeanMetric gave), and add a UniqueRatio metric class (tzrec/metrics/unique_ratio.py) for codebook coverage instead of a MeanMetric fed a hand-computed ratio. The empty-batch guard + DDP reduction live in the metric; callers update it directly (the one-line _update_unique_sid_ratio passthrough is removed). The RQ-VAE train-path mse moves to MeanSquaredError too. - __init__.py: revert tzrec/modules/sid_generation/__init__.py to a bare package marker (no re-exports) and import ResidualKMeansQuantizer from its submodule in sid_rqkmeans, per "avoid adding to __init__.py". Tests: tzrec/metrics/unique_ratio_test.py (ratio, empty no-op, mean over batches). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
994c321 to
4f249ee
Compare
Review summaryClean, genuinely additive PR — backend-agnostic SID base classes only, no edits to existing files. The abstraction is well-factored: Two inline comments on items worth confirming:
Minor / non-blocking notes:
Nice work — no blocking issues. |
Addresses github-actions review on PR alibaba#538 (no behavior change): - UniqueRatio stays the cheap per-batch metric (mean of distinct-rows / batch-size, two scalar states) — deliberately kept per-batch, not global distinct-SID coverage, to avoid accumulating an O(#distinct-SIDs) set during predict. Docstrings now say so plainly (a batch-size-sensitive diversity proxy, not global codebook coverage) instead of overselling it as "codebook coverage". - BaseSidModel docstring: subclasses extend init_metric (via super) and *implement* update_metric (BaseModel.update_metric raises NotImplementedError, so there is nothing to "extend"); update_train_metric defaults to a no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First of three PRs splitting the Semantic-ID models onto a shared base. Purely additive — only the backend-agnostic foundation, no concrete quantizer or model and no edits to existing files. RQ-KMeans follows in PR2, RQ-VAE in PR3. What this adds: - ResidualQuantizer (abstract): owns the shared state (embed_dim, per-layer codebook sizes via normalize_n_embed, residual-normalization flag, layer list; asserts n_layers >= 1) and the shared residual walk — _residual_pass drives the concrete get_codes / decode_codes / output_dim. Subclasses implement just _quantize_layer (encode) and _lookup_code (decode), plus forward and get_codebook_embeddings. - BaseSidModel (abstract): the shared SID model scaffold — embedding-feature extraction, loss/metric init (reconstruction MSE via torchmetrics.MeanSquaredError + codebook coverage via UniqueRatio), and shared config parsing — that SidRqkmeans / SidRqvae subclass. - UniqueRatio (tzrec/metrics/unique_ratio.py): codebook-coverage metric (mean per-batch unique-row ratio) with empty-batch guard + DDP reduction. Tests: normalize_n_embed; the abstract-base contract; the concrete residual walk via a fake one-primitive subclass; and the UniqueRatio metric. No proto changes and no edits to existing modules; __init__.py is a bare package marker (no re-exports). The QuantizeForwardMode / QuantizeOutput / ResidualQuantizerOutput types and the concrete SidRqkmeans / SidRqvae models ship with the code that uses them in PR2 / PR3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4f249ee to
03a2a7a
Compare
…bump version to 1.2.16 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_abstract Brings the reviewed alibaba#539 foundation onto feat/sid_abstract (which already carries alibaba#538 + an older RQ-VAE/RQ-Kmeans port), and syncs to upstream/master (alibaba#540, alibaba#541, which alibaba#539 already contains). Conflict resolutions: - sid_rqkmeans.py(+test), residual_kmeans_quantizer.py, sid_model.py: take alibaba#539's canonical versions (BaseSidModel now hosts both SID models, with mse/rel_loss/unique_sid_ratio and the unified x_hat recon key). - types.py: union — keep alibaba#539's QuantizeOutput, retain feat's QuantizeForwardMode enum + ResidualQuantizerOutput (RQ-VAE needs them). - protos/models/sid_model.proto: union — alibaba#539's typed FaissKmeansConfig + clean SidRqkmeans, re-add feat's SinkhornConfig/ClipConfig/SidRqvae; drop the now-unused struct.proto import. - protos/model.proto: enable `SidRqvae sid_rqvae = 600;` (the field alibaba#539 reserved for this follow-up). - main.py / model.py on_train_end: take alibaba#539's wording; drop feat's forced tail-checkpoint (SID models rely on the final=True tail save). Transitional state: old modules/sid/kmeans.py still coexists with alibaba#539's kmeans_quantize.py, and the RQ-VAE stack is still on the old abstraction — both retired in the follow-up refactor commit. All SID modules import.
First of three PRs splitting the Semantic-ID models onto a shared base. This one is purely additive — only the backend-agnostic foundation, with no concrete quantizer or model and no edits to existing files. RQ-KMeans follows in PR2, RQ-VAE in PR3.
What this adds:
embed_dim, per-layer codebook sizes (normalize_n_embed), residual-normalization flag, and the layer list — and the shared residual walk._residual_passdrives the concreteget_codes/decode_codes/output_dim. Subclasses implement just the two per-layer primitives —_quantize_layer(encode) and_lookup_code(decode) — plusforwardandget_codebook_embeddings.SidRqkmeansandSidRqvaewill subclass.QuantizeForwardMode,QuantizeOutput,ResidualQuantizerOutput.Tests:
normalize_n_embed, the abstract-base contract (shared state present, backend primitives raise).No proto changes and no edits to existing modules — the concrete
SidRqkmeans/SidRqvaemessages, and theBaseModel.on_train_endlifecycle hook, arrive with their models in PR2 / PR3.