Fix off-by-one num_segments in matrix::sort_cols_per_row #3010

Merged
rapids-bot[bot] merged 4 commits into rapidsai:release/26.06 from viclafargue:fix-sort-columns-per-row
May 18, 2026

Conversation

@viclafargue
Contributor

@viclafargue viclafargue commented May 4, 2026

Closes #2049

raft::matrix::detail::sortColumnsPerRow passed n_rows + 1 to CUB as num_segments. Per the CUB contract, an aliased CSR-style offsets array (end offsets = begin offsets + 1) must have length num_segments + 1, so CUB read one int past the end of the offsets allocation. This crashes with cudaErrorIllegalAddress when n_rows + 1 is a power of two ≥ 64 and n_columns ≥ ~16384; otherwise the out-of-bounds read goes unnoticed.

Surfaces in cuvs::stats::trustworthiness_score whenever n % batch_size ∈ {63, 127, 255, …} (e.g. n=76927, batch_size=512).
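For anyone who wants to check whether a given (n, batch_size) pair lands on the row condition described above, here is a minimal sketch. The helper names are hypothetical and not part of RAFT or cuVS. It assumes that the final partial batch holds n % batch_size rows and that this is the n_rows seen by the sort. It does not model the n_columns threshold.

```cpp
#include <cstdint>

// Hypothetical helper (not a RAFT/cuVS function): true if x is a power of two.
constexpr bool is_pow2(std::int64_t x) { return x > 0 && (x & (x - 1)) == 0; }

// Rows left over for the final partial batch, assuming fixed-size batches.
constexpr std::int64_t tail_rows(std::int64_t n, std::int64_t batch_size)
{
  return n % batch_size;
}

// Row-side trigger from the description: n_rows + 1 is a power of two >= 64.
constexpr bool hits_row_condition(std::int64_t n_rows)
{
  return n_rows + 1 >= 64 && is_pow2(n_rows + 1);
}
```

For the example above, tail_rows(76927, 512) is 127, and 127 + 1 = 128 is a power of two, so hits_row_condition(127) holds.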

Fix: pass n_rows to CUB as num_segments; size the offsets array as n_rows + 1.
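To illustrate the offsets contract, here is a CPU-only sketch. It is not the RAFT code and it does not call CUB: std::sort stands in for the device segmented sort, and both function names are hypothetical. It shows why n_rows segments need n_rows + 1 offsets when the end offsets alias the begin offsets shifted by one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row offsets for a dense row-major n_rows x n_columns
// matrix. n_rows segments need n_rows + 1 entries (the last one is the total).
std::vector<int> make_row_offsets(int n_rows, int n_columns)
{
  std::vector<int> offsets(static_cast<std::size_t>(n_rows) + 1);
  for (int i = 0; i <= n_rows; ++i) { offsets[i] = i * n_columns; }
  return offsets;
}

// CPU stand-in for a segmented sort with aliased offsets:
// begin_offsets = offsets, end_offsets = offsets + 1.
void segmented_sort_cpu(std::vector<float>& data,
                        const std::vector<int>& offsets,
                        int num_segments)
{
  // Segment s reads offsets[s] and offsets[s + 1], so the array must hold
  // num_segments + 1 entries. Passing n_rows + 1 here would read offsets[n_rows + 1].
  assert(offsets.size() >= static_cast<std::size_t>(num_segments) + 1);
  const int* begin_offsets = offsets.data();
  const int* end_offsets   = offsets.data() + 1;
  for (int s = 0; s < num_segments; ++s) {
    std::sort(data.begin() + begin_offsets[s], data.begin() + end_offsets[s]);
  }
}
```

Calling segmented_sort_cpu(data, make_row_offsets(n_rows, n_columns), n_rows) sorts each row independently; with num_segments set to n_rows + 1 the assertion fires, which mirrors the out-of-bounds read on the device.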

@viclafargue viclafargue self-assigned this May 4, 2026
@viclafargue viclafargue requested a review from a team as a code owner May 4, 2026 13:25
@viclafargue viclafargue added the bug and non-breaking labels May 4, 2026
@cjnolet
Member

cjnolet commented May 13, 2026

/ok to test a422e94

@cjnolet
Member

cjnolet commented May 13, 2026

/merge

@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing May 13, 2026
@viclafargue viclafargue changed the base branch from main to release/26.06 May 18, 2026 08:03
@viclafargue viclafargue requested review from a team as code owners May 18, 2026 08:04
@viclafargue viclafargue requested a review from bdice May 18, 2026 08:04
@viclafargue viclafargue force-pushed the fix-sort-columns-per-row branch 2 times, most recently from 2642b31 to a422e94 Compare May 18, 2026 13:42
@rapids-bot

rapids-bot Bot commented May 18, 2026

This PR's base branch has been changed since the last /merge command. Please issue the command again to confirm the merge with the new base branch.

@rapids-bot

rapids-bot Bot commented May 18, 2026

This PR's base branch has been changed since the last /merge command. Please issue the command again to confirm the merge with the new base branch.

@cjnolet cjnolet removed request for a team and bdice May 18, 2026 19:37
@cjnolet
Member

cjnolet commented May 18, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 995fe0a into rapidsai:release/26.06 May 18, 2026
79 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing May 18, 2026

Labels

bug, non-breaking

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] cuml.metrics.trustworthiness crashes with cudaErrorIllegalAddress for specific values of n

4 participants