
fix(cuda): handle validity in GPU kernels #7372

Merged
0ax1 merged 1 commit into develop from ad/cuda-validity on Apr 9, 2026

@0ax1 (Contributor) commented Apr 9, 2026

All GPU kernels now either handle validity or fall back to the CPU for the cases they don't support.

Dynamic dispatch:

  • Propagate root array validity through FusedPlan to output.
  • Reject Dict with nullable codes and RunEnd with nullable ends (garbage codes/ends at null positions risk out-of-bounds reads or unpredictable run lookups).
  • Short-circuit AllInvalid (skip kernel) and empty arrays.
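The three dynamic-dispatch rules above can be read as a single planning decision. The following is a minimal, self-contained Rust sketch, not Vortex's code: `Validity`, `Node`, `Decision`, `check_supported` and `plan` are hypothetical stand-ins, and the sketch conservatively treats any nullable codes or ends as unsupported, even when they currently happen to be all valid.

```rust
/// Hypothetical validity descriptor (a stand-in, not Vortex's `Validity`).
#[derive(Clone, Debug, PartialEq)]
pub enum Validity {
    NonNullable,
    AllValid,
    AllInvalid,
    Array(Vec<bool>), // true = valid
}

impl Validity {
    pub fn is_nullable(&self) -> bool {
        !matches!(self, Validity::NonNullable)
    }
}

/// Hypothetical plan tree covering only the encodings discussed in the PR.
pub enum Node {
    Leaf,
    Dict { codes_validity: Validity, values: Box<Node> },
    RunEnd { ends_validity: Validity, values: Box<Node> },
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Launch the fused kernel; the output carries the root validity.
    Launch { output_validity: Validity },
    /// Zero-length input: return an empty output that keeps the root's nullability.
    Empty { output_validity: Validity },
    /// Every element is null: skip the kernel launch entirely.
    SkipAllInvalid,
}

/// Reject trees whose kernels would use garbage values at null slots
/// as gather indices (Dict codes) or run boundaries (RunEnd ends).
fn check_supported(node: &Node) -> Result<(), String> {
    match node {
        Node::Leaf => Ok(()),
        Node::Dict { codes_validity, values } => {
            if codes_validity.is_nullable() {
                return Err("Dict with nullable codes".to_string());
            }
            check_supported(values)
        }
        Node::RunEnd { ends_validity, values } => {
            if ends_validity.is_nullable() {
                return Err("RunEnd with nullable ends".to_string());
            }
            check_supported(values)
        }
    }
}

/// `Err` means "not fusable on the GPU": the caller would use the CPU path.
pub fn plan(len: usize, root_validity: Validity, root: &Node) -> Result<Decision, String> {
    check_supported(root)?;
    if len == 0 {
        return Ok(Decision::Empty { output_validity: root_validity });
    }
    if root_validity == Validity::AllInvalid {
        return Ok(Decision::SkipAllInvalid);
    }
    Ok(Decision::Launch { output_validity: root_validity })
}
```

In this sketch the guard runs before the early returns; the PR does not say which order the real plan builder uses.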

Standalone kernels:

  • RunEnd/Zstd: replace unreachable!()/unimplemented!() with vortex_bail!()
    for unsupported nullable cases, enabling graceful CPU fallback.
  • ALP: remove stale TODO; patch validity scatter is unnecessary since the
    encoder already strips null positions from the exception list.
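Returning an error instead of panicking is what makes the CPU fallback possible: an `Err` gives the caller a chance to retry elsewhere, whereas `unreachable!()` or `unimplemented!()` panics. A minimal sketch of that pattern, using RunEnd decoding as the example. Everything here is hypothetical: there is no real GPU code, `decode_gpu` only simulates a kernel that rejects per-element value validity by returning `Err` (the role `vortex_bail!()` plays in the PR), and `decode` is an assumed dispatcher, not Vortex's API.

```rust
/// Hypothetical validity descriptor (a stand-in, not Vortex's `Validity`).
#[derive(Clone, Debug, PartialEq)]
pub enum Validity {
    NonNullable,
    AllValid,
    AllInvalid,
    Array(Vec<bool>), // true = valid, one entry per run value
}

#[derive(Debug, PartialEq)]
pub enum Backend {
    Gpu,
    Cpu,
}

/// Shared run-end expansion: `ends[i]` is the exclusive end of run `i`.
fn expand(ends: &[usize], values: &[i32], is_valid: impl Fn(usize) -> bool) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    let mut start = 0;
    for (run, (&end, &v)) in ends.iter().zip(values).enumerate() {
        let item = if is_valid(run) { Some(v) } else { None };
        out.extend(std::iter::repeat(item).take(end - start));
        start = end;
    }
    out
}

/// Simulated GPU path: unsupported input yields `Err` instead of a panic.
fn decode_gpu(ends: &[usize], values: &[i32], validity: &Validity) -> Result<Vec<Option<i32>>, String> {
    match validity {
        Validity::Array(_) => Err("per-element value validity is not supported on the GPU".to_string()),
        Validity::AllInvalid => Ok(expand(ends, values, |_| false)),
        Validity::NonNullable | Validity::AllValid => Ok(expand(ends, values, |_| true)),
    }
}

/// Simulated CPU path: handles every validity variant.
fn decode_cpu(ends: &[usize], values: &[i32], validity: &Validity) -> Vec<Option<i32>> {
    expand(ends, values, |run| match validity {
        Validity::NonNullable | Validity::AllValid => true,
        Validity::AllInvalid => false,
        Validity::Array(mask) => mask[run],
    })
}

/// Try the GPU first; on `Err`, fall back to the CPU rather than aborting.
pub fn decode(ends: &[usize], values: &[i32], validity: &Validity) -> (Vec<Option<i32>>, Backend) {
    match decode_gpu(ends, values, validity) {
        Ok(out) => (out, Backend::Gpu),
        Err(_) => (decode_cpu(ends, values, validity), Backend::Cpu),
    }
}
```

For example, `decode(&[2, 5], &[10, 20], &Validity::Array(vec![true, false]))` takes the CPU path and yields `[Some(10), Some(10), None, None, None]`.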

@0ax1 force-pushed the ad/cuda-validity branch from 95d53b5 to 1e0c716 on April 9, 2026 at 17:39
@0ax1 requested review from a10y and joseph-isaacs on April 9, 2026 at 17:39
@0ax1 added the changelog/feature (A new feature) label on Apr 9, 2026
@codspeed-hq (bot) commented Apr 9, 2026

Merging this PR will improve performance by 29.94%

⚡ 2 improved benchmarks
✅ 1120 untouched benchmarks
⏩ 1455 skipped benchmarks (see footnote 1)

Performance Changes

Mode        Benchmark             BASE      HEAD      Efficiency
Simulation  take_10k_random       251.2 µs  193.3 µs  +29.94%
Simulation  take_10k_contiguous   315 µs    257 µs    +22.54%

Comparing ad/cuda-validity (1aebccf) with develop (ae906c7)


Footnotes

  1. 1455 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@0ax1 force-pushed the ad/cuda-validity branch from 1e0c716 to f75e977 on April 9, 2026 at 17:45
…lone kernel panics

The dynamic dispatch plan builder silently dropped validity (null bitmaps)
from nullable arrays, hardcoding Validity::NonNullable on the output. This
caused silent data corruption whenever nullable arrays were fused,
especially when the output fed into downstream kernels such as the CUB filter.
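A toy reproduction of that failure mode, assuming a fused kernel that writes unspecified ("garbage") values into null slots. `Column`, `fused_decode`, `execute_buggy`, `execute_fixed` and `filter_gt` are hypothetical names; `filter_gt` merely stands in for a downstream filter such as the CUB-based one and is not its implementation.

```rust
/// Hypothetical decoded column. `validity: None` models `Validity::NonNullable`
/// (every slot is treated as valid).
#[derive(Debug, Clone)]
pub struct Column {
    pub values: Vec<i64>,
    pub validity: Option<Vec<bool>>,
}

/// Simulated fused kernel: decodes every slot, null or not.
fn fused_decode(raw: &[i64]) -> Vec<i64> {
    raw.iter().map(|v| v * 2).collect()
}

/// Behaviour before the fix: output validity hardcoded to non-nullable.
pub fn execute_buggy(raw: &[i64], _root_validity: &[bool]) -> Column {
    Column { values: fused_decode(raw), validity: None }
}

/// Behaviour after the fix: the root array's validity is attached to the output.
pub fn execute_fixed(raw: &[i64], root_validity: &[bool]) -> Column {
    Column { values: fused_decode(raw), validity: Some(root_validity.to_vec()) }
}

/// Downstream stand-in filter: keeps valid values greater than `threshold`.
pub fn filter_gt(col: &Column, threshold: i64) -> Vec<i64> {
    col.values
        .iter()
        .enumerate()
        .filter(|(i, v)| col.validity.as_ref().map_or(true, |m| m[*i]) && **v > threshold)
        .map(|(_, v)| *v)
        .collect()
}
```

With raw input `[1, 500, 3]` and validity `[true, false, true]`, the buggy path lets the doubled garbage value `1000` from the null slot pass a `> 4` filter, while the fixed path returns only `6`.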

Dynamic dispatch changes:
- Thread root array validity through FusedPlan -> MaterializedPlan ->
  execute_typed, replacing the hardcoded Validity::NonNullable.
- Guard against Dict with nullable codes (garbage code values could cause
  OOB shared memory reads in the DICT gather scalar op).
- Guard against RunEnd with nullable ends (garbage end values could cause
  unpredictable binary search / forward-scan behavior).
- Skip kernel launch entirely for Validity::AllInvalid arrays.
- Respect nullability in the len==0 early-return path.

Standalone kernel fixes:
- RunEnd: replace unreachable!() with vortex_bail!() when values have
  per-element validity (Validity::Array), allowing graceful CPU fallback
  instead of a panic.
- Zstd: replace unimplemented!() with vortex_bail!() when decompressed
  data contains nulls, allowing graceful CPU fallback instead of a panic.
- ALP: clarify that patch validity does not need scattering (the encoder
  strips null positions from the exception list), remove stale TODO.
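To illustrate why no scatter is needed, here is a toy encoder/decoder with ALP-style exceptions, assuming a single fixed exponent (values stored as `round(x * 10)`). It is not Vortex's ALP implementation; `Patches`, `AlpLike`, `encode` and `decode` are hypothetical. The key line is the `valid &&` condition: because null positions never become exceptions, patching writes only into valid slots and the array's validity passes through untouched.

```rust
/// Hypothetical patch list: sparse (index, value) overrides applied after decoding.
#[derive(Debug, Default)]
pub struct Patches {
    pub indices: Vec<usize>,
    pub values: Vec<f64>,
}

/// Toy ALP-like array with one fixed exponent; not Vortex's ALP layout.
#[derive(Debug)]
pub struct AlpLike {
    pub encoded: Vec<i64>,
    pub patches: Patches,
    pub validity: Vec<bool>, // true = valid
}

pub fn encode(input: &[f64], validity: &[bool]) -> AlpLike {
    let mut encoded = Vec::with_capacity(input.len());
    let mut patches = Patches::default();
    for (i, (&v, &valid)) in input.iter().zip(validity).enumerate() {
        let scaled = (v * 10.0).round();
        encoded.push(scaled as i64);
        // Only valid positions that fail to round-trip become exceptions;
        // null positions are skipped, so no patch ever targets a null slot.
        if valid && scaled / 10.0 != v {
            patches.indices.push(i);
            patches.values.push(v);
        }
    }
    AlpLike { encoded, patches, validity: validity.to_vec() }
}

pub fn decode(a: &AlpLike) -> (Vec<f64>, Vec<bool>) {
    let mut out: Vec<f64> = a.encoded.iter().map(|&e| e as f64 / 10.0).collect();
    for (&i, &v) in a.patches.indices.iter().zip(&a.patches.values) {
        out[i] = v;
    }
    // Validity passes through unchanged: there is no patch validity to scatter.
    (out, a.validity.clone())
}
```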

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 force-pushed the ad/cuda-validity branch from f75e977 to 1aebccf on April 9, 2026 at 17:50
Contributor

@joseph-isaacs left a comment


Do you actually do any work in the plan? Is that in a different PR?

@0ax1 merged commit 256a029 into develop on Apr 9, 2026
62 checks passed
@0ax1 deleted the ad/cuda-validity branch on April 9, 2026 at 18:06
@0ax1 (Contributor, Author) commented Apr 9, 2026

> Do you actually do any work in the plan? Is that in a different PR?

The plan doesn't need to do anything beyond attaching the validity to the output array. The kernels are fine operating on garbage data in null positions. The ones that aren't are Dict with nullable codes and RunEnd with nullable ends, which require non-nullable codes and ends respectively and are therefore rejected.
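The distinction in this reply can be sketched in a few lines. An elementwise op maps each slot independently, so garbage at a null slot only produces garbage at that same slot, which the attached validity then masks. A gather uses the value itself as an index, so a garbage code can point outside the dictionary. The function names below are hypothetical: `for_decode` is a frame-of-reference-style add chosen as an example of an elementwise kernel, and `codes_in_bounds` only reports the hazard rather than modelling GPU memory.

```rust
/// Elementwise op: each output slot depends only on the same input slot.
/// Wrapping arithmetic keeps garbage inputs from triggering overflow panics.
pub fn for_decode(packed: &[u32], reference: u32) -> Vec<u32> {
    packed.iter().map(|&p| p.wrapping_add(reference)).collect()
}

/// Attach validity after the kernel ran: null slots become `None`
/// regardless of what garbage the kernel wrote there.
pub fn attach_validity(values: Vec<u32>, validity: &[bool]) -> Vec<Option<u32>> {
    values
        .into_iter()
        .zip(validity)
        .map(|(v, &ok)| if ok { Some(v) } else { None })
        .collect()
}

/// Gather hazard: a code is used as an address into the dictionary.
/// Returns false if any code (including garbage at a null slot) is out of range.
pub fn codes_in_bounds(codes: &[u32], dict_len: usize) -> bool {
    codes.iter().all(|&c| (c as usize) < dict_len)
}
```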


2 participants