Fix jagged_unique_indices OOB regression from D104827588 (#5799) by AlbertDachiChen · Pull Request #5799 · pytorch/FBGEMM

AlbertDachiChen · 2026-05-29T17:03:23Z

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2727

D104827588 (length kernel) and f2e9cba2e279 (delinearize kernel) both introduced an unstated "indices must lie in [0, per_feature_hash)" assumption that the op never enforced before. ZCH callers that pass raw IDs to jagged_unique_indices (dedup runs before the hash-to-bucket remap) tripped this on the last feature in prod, producing sum(output_lengths) < unique_indices.numel() and crashing downstream dist.all_to_all_single with "Split sizes doesn't match total dim 0 size".

Two fixes in jagged_unique_indices.cu:

1. unique_indices_length_kernel: when the group's hash_end == hash_size_cumsum.size(0) - 1 (last feature), set hi_pos = linear_unique_indices.size(0) instead of doing the upper-bound binary search. The OOB tail of the last feature sorts past hash_size_cumsum[T] and the binary search would otherwise stop short of it.
2. Replace delinearize_unique_from_sorted_kernel (gather: unique_indices[i] = v - hash_size_cumsum[t]) with the original delinearize_unique_index_kernel (scatter: unique_indices[reverse_index[i]] = indices[i]). The gather form misattributes OOB values whose linearized key exceeds hash_size_cumsum[T] to a phantom feature t = T and emits v - hash_size_cumsum[T] instead of the original index value. The scatter form is index-bound-agnostic by construction. Note reverse_index is int64_t (from the cub pipeline) rather than templated index_t (which was the case under at::_unique).
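The two fixes can be illustrated with a small CPU model in plain Python. This is only a sketch under stated assumptions, not the kernel code: it assumes one feature per group, linearizes as hash_size_cumsum[t] + idx, and uses bisect in place of the device-side binary search. The toy inputs and the helper names (unique_lengths, delinearize_gather, delinearize_scatter) are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy setup (all values hypothetical): T = 2 features, 10 hash slots each.
hash_size_cumsum = [0, 10, 20]
# The last feature carries raw IDs 25 and 99, which lie outside [0, 10).
indices_per_feature = [[1, 3, 3], [2, 25, 2, 99]]

flat_indices = [idx for idxs in indices_per_feature for idx in idxs]
# Assumed linearization: offset each index by its feature's cumulative hash size.
linear = [hash_size_cumsum[t] + idx
          for t, idxs in enumerate(indices_per_feature) for idx in idxs]
sorted_unique = sorted(set(linear))
position = {v: i for i, v in enumerate(sorted_unique)}
reverse_index = [position[v] for v in linear]


def unique_lengths(patch_last_feature):
    """Per-feature unique counts from the sorted linearized keys."""
    num_features = len(hash_size_cumsum) - 1
    lengths = []
    for t in range(num_features):
        lo_pos = bisect_left(sorted_unique, hash_size_cumsum[t])
        if patch_last_feature and t + 1 == num_features:
            # Fix 1: the last feature owns everything up to the end,
            # including keys that sort past hash_size_cumsum[T].
            hi_pos = len(sorted_unique)
        else:
            hi_pos = bisect_left(sorted_unique, hash_size_cumsum[t + 1])
        lengths.append(hi_pos - lo_pos)
    return lengths


def delinearize_gather():
    """Gather form: recover each index by subtracting the feature offset."""
    out = []
    for v in sorted_unique:
        # Keys >= hash_size_cumsum[T] resolve to the phantom feature t = T.
        t = bisect_right(hash_size_cumsum, v) - 1
        out.append(v - hash_size_cumsum[t])
    return out


def delinearize_scatter():
    """Scatter form: copy the original index through reverse_index."""
    out = [None] * len(sorted_unique)
    for i, idx in enumerate(flat_indices):
        out[reverse_index[i]] = idx
    return out
```

In this model, unique_lengths(False) undercounts the last feature, so the lengths sum to less than the number of unique keys, which matches the symptom described in the summary. The gather form returns shifted values for the two OOB keys, while the scatter form returns the raw IDs unchanged because it never consults hash_size_cumsum.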

Contract enforcement: added a CUDA_KERNEL_ASSERT in linearize_index_flat_kernel that fires when an intermediate feature (t < T - 1) with per_feature_hash > 0 has idx >= per_feature_hash. Intermediate-feature OOB causes silent per-feature count drift (counts leak from t to t+1 while the total is preserved). This has been broken since the op was written, but no caller hit it; the assert now surfaces violations instead of letting them silently corrupt downstream embedding lookups. The assert exempts (a) the last feature (legitimately supported) and (b) merged/masked features with per_feature_hash == 0 (the hash_size_offsets indirection pattern used by multi_keys and zch_huge_hash_size).
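A rough CPU model of the assert predicate and of the count drift it guards against is sketched below. It assumes per_feature_hash can be read off adjacent hash_size_cumsum entries; the real kernel goes through the hash_size_offsets indirection, which this sketch does not model. The function names and toy inputs are hypothetical.

```python
from bisect import bisect_left


def violates_contract(t, idx, hash_size_cumsum):
    """True where the modelled assert would fire for index idx of feature t."""
    num_features = len(hash_size_cumsum) - 1
    # Assumption: per-feature hash size is the gap between adjacent cumsum entries.
    per_feature_hash = hash_size_cumsum[t + 1] - hash_size_cumsum[t]
    is_intermediate = t < num_features - 1        # last feature is exempt
    is_unmasked = per_feature_hash > 0            # merged/masked features are exempt
    return is_intermediate and is_unmasked and idx >= per_feature_hash


def lengths_with_last_feature_fix(sorted_unique, hash_size_cumsum):
    """Per-feature unique counts, with the last feature extended to the end."""
    num_features = len(hash_size_cumsum) - 1
    lengths = []
    for t in range(num_features):
        lo_pos = bisect_left(sorted_unique, hash_size_cumsum[t])
        if t + 1 == num_features:
            hi_pos = len(sorted_unique)
        else:
            hi_pos = bisect_left(sorted_unique, hash_size_cumsum[t + 1])
        lengths.append(hi_pos - lo_pos)
    return lengths


# Drift demo: feature 0 holds an OOB index 12, feature 1 holds index 5.
hash_size_cumsum = [0, 10, 20]
indices_per_feature = [[1, 12], [5]]
linear = [hash_size_cumsum[t] + idx
          for t, idxs in enumerate(indices_per_feature) for idx in idxs]
sorted_unique = sorted(set(linear))
drifted = lengths_with_last_feature_fix(sorted_unique, hash_size_cumsum)
true_counts = [len(set(idxs)) for idxs in indices_per_feature]
```

Here true_counts and drifted have the same total, but one unique index has moved from feature 0 to feature 1, which is the silent drift the assert is meant to surface.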

The pipeline contract doc comment above jagged_unique_indices_cuda is updated to enumerate the three cases.

Differential Revision: D106305472

meta-codesync · 2026-05-29T17:03:32Z

@AlbertDachiChen has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106305472.


meta-cla Bot added the cla signed label May 29, 2026

meta-codesync Bot added fb-exported meta-exported labels May 29, 2026

AlbertDachiChen force-pushed the export-D106305472 branch from ce9f749 to 4d28a63 Compare June 1, 2026 14:58

meta-codesync Bot changed the title ~~Fix jagged_unique_indices OOB regression from D104827588~~ Fix jagged_unique_indices OOB regression from D104827588 (#5799) Jun 1, 2026

AlbertDachiChen force-pushed the export-D106305472 branch from 4d28a63 to 28d1ec3 Compare June 1, 2026 14:58

AlbertDachiChen wants to merge 1 commit into pytorch:main from AlbertDachiChen:export-D106305472
