speed update tie correct#701
Conversation
Signed-off-by: Intron7 <severin.dicks@icloud.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds an OVO Wilcoxon analytic-zeros mode
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuh (1)
248-252: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse the named warp-reduction buffer size.
The new analytic kernels hard-code
warp_buf[32]; useWARP_REDUCE_BUFto keep shared-memory sizing aligned with the reduction helper.As per coding guidelines,
src/rapids_singlecell/_cuda/**/*.{cu,cuh}: “All numeric literals for block sizes, tile dimensions, shared memory sizes, and heuristic thresholds MUST be defined as named constants”.Proposed fix
- __shared__ double warp_buf[32]; + __shared__ double warp_buf[WARP_REDUCE_BUF];- __shared__ double warp_buf[32]; + __shared__ double warp_buf[WARP_REDUCE_BUF];Also applies to: 375-376
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuh` around lines 248 - 252, The shared-memory warp reduction buffer is hard-coded in the analytic Wilcoxon kernels, which violates the named-constant guideline and can drift from the helper’s expected size. Update the declarations in the Wilcoxon OVO kernel code to use WARP_REDUCE_BUF instead of the literal 32, and make the same change in the other affected spot, keeping the shared-memory layout aligned with the reduction helper and consistent with the kernel symbols in kernels_wilcoxon_ovo.cuh.Source: Coding guidelines
src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh (1)
78-97: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winSize LARGE shared memory from the actual largest LARGE group.
large_max = min(max_grp_size, OVO_LARGE_MAX)overestimates when a HUGE group exists alongside smaller LARGE groups, which can reduce occupancy or force an unnecessary HUGE fallback.Proposed refactor
OvoTierPlan c; + int large_max = 0; for (int g = 0; g < n_groups; g++) { int sz = h_grp_offsets[g + 1] - h_grp_offsets[g]; if (sz > c.max_grp_size) c.max_grp_size = sz; if (sz <= OVO_MEDIUM_MAX) c.run_medium = true; - else if (sz <= OVO_LARGE_MAX) + else if (sz <= OVO_LARGE_MAX) { c.run_large = true; - else + large_max = std::max(large_max, sz); + } else { c.run_huge = true; + } } @@ // Size smem for the largest LARGE-band group. if (c.run_large) { - int large_max = std::min(c.max_grp_size, OVO_LARGE_MAX); c.large_padded = 1;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh` around lines 78 - 97, The LARGE shared-memory sizing in the kernel config logic is using the global max group size instead of the largest group that actually falls in the LARGE range, which can over-allocate when a HUGE group is present. Update the sizing in the configuration block around c.run_large/c.large_smem to track the largest sz that is <= OVO_LARGE_MAX while iterating groups, and use that value when computing large_max in this setup path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_device_sparse.cuh`:
- Around line 24-29: Gate the huge-mode scratch allocation in the sparse
Wilcoxon OVO path so it only happens when tie correction is actually enabled;
currently `run_huge` and the huge-tier sort buffers are created in
`wilcoxon_ovo_device_sparse` even when `compute_tie_corr` will take the no-tie
fast path. Update the allocation/dispatch logic around `make_sort_group_ids` and
the later huge-buffer setup so `compute_tie_corr=false` skips all huge-mode
sort-ID/scratch setup, while preserving the existing huge path when tie
correction is on.
In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh`:
- Around line 222-235: Validate the analytic-HUGE launch inputs before calling
compact_huge_nonzeros_kernel and cub_segmented_sortkeys in
wilcoxon_ovo_kernels.cuh: guard grp_nz against null before passing it to the
writing kernel, and check that sb_cols * n_all_grp fits the segment offset type
used by grp_seg_offsets and grp_seg_ends. Add these pre-launch validations
alongside the existing analytic_zeros branch so the kernel only runs with valid
scratch buffers and safe index ranges.
- Around line 3-4: The Wilcoxon kernel header uses std::min but only includes
<climits>, so add the missing direct <algorithm> include in
wilcoxon_ovo_kernels.cuh near the existing includes. Keep the change localized
to the header and ensure the std::min usage in the Wilcoxon kernel code remains
valid without relying on transitive includes.
In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cu`:
- Around line 369-375: The huge-tier scratch buffers are being enabled in the
Wilcoxon tier setup even when compute_tie_corr is false, but ovo_dispatch_tiers
exits early on that path, so avoid allocating the huge-mode data unless the
tie-correction path will actually run. Update the run_huge handling in
wilcoxon.cu around the tier setup and the later buffer preparation to gate
make_sort_group_ids and the related huge scratch allocation behind
compute_tie_corr, using the existing symbols run_huge, compute_tie_corr,
t1.run_huge, and ovo_dispatch_tiers to keep the fast path from triggering
unnecessary OOMs.
---
Nitpick comments:
In `@src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuh`:
- Around line 248-252: The shared-memory warp reduction buffer is hard-coded in
the analytic Wilcoxon kernels, which violates the named-constant guideline and
can drift from the helper’s expected size. Update the declarations in the
Wilcoxon OVO kernel code to use WARP_REDUCE_BUF instead of the literal 32, and
make the same change in the other affected spot, keeping the shared-memory
layout aligned with the reduction helper and consistent with the kernel symbols
in kernels_wilcoxon_ovo.cuh.
In `@src/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuh`:
- Around line 78-97: The LARGE shared-memory sizing in the kernel config logic
is using the global max group size instead of the largest group that actually
falls in the LARGE range, which can over-allocate when a HUGE group is present.
Update the sizing in the configuration block around c.run_large/c.large_smem to
track the largest sz that is <= OVO_LARGE_MAX while iterating groups, and use
that value when computing large_max in this setup path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b2c19453-e26f-4e4e-8a32-32957bea7b1c
📒 Files selected for processing (7)
src/rapids_singlecell/_cuda/wilcoxon/kernels_wilcoxon_ovo.cuhsrc/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cusrc/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_device_sparse.cuhsrc/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_host_sparse.cuhsrc/rapids_singlecell/_cuda/wilcoxon/wilcoxon_ovo_kernels.cuhsrc/rapids_singlecell/_cuda/wilcoxon/wilcoxon_sparse.cusrc/rapids_singlecell/tools/_rank_genes_groups/_wilcoxon.py
Signed-off-by: Intron7 <severin.dicks@icloud.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #701 +/- ##
=======================================
Coverage 90.31% 90.31%
=======================================
Files 104 104
Lines 9570 9570
=======================================
Hits 8643 8643
Misses 927 927
|
No description provided.