Skip to content

feat: add CUDA cuDF convenience API#8624

Merged
0ax1 merged 1 commit into
developfrom
ad/pycudf4
Jul 1, 2026
Merged

feat: add CUDA cuDF convenience API#8624
0ax1 merged 1 commit into
developfrom
ad/pycudf4

Conversation

@0ax1

@0ax1 0ax1 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

See the README.md added as part of this PR how to use pyVortex CUDA.

@0ax1 0ax1 added the changelog/chore A trivial change label Jun 29, 2026
@0ax1 0ax1 force-pushed the ad/pycudf4 branch 2 times, most recently from 93c8619 to 8ebaae5 Compare June 29, 2026 16:33
@vortex-data vortex-data deleted a comment from github-actions Bot Jun 29, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 29, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
❌ 3 regressed benchmarks
✅ 1590 untouched benchmarks
⏩ 4 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_varbinview_opt_canonical_into[(1000, 10)] 169.6 µs 206.2 µs -17.75%
Simulation chunked_varbinview_into_canonical[(1000, 10)] 169.3 µs 205.6 µs -17.67%
Simulation chunked_varbinview_opt_into_canonical[(1000, 10)] 183.4 µs 219.7 µs -16.52%
Simulation chunked_varbinview_into_canonical[(100, 100)] 306.4 µs 272.1 µs +12.61%
Simulation bitwise_not_vortex_buffer_mut[128] 273.6 ns 244.4 ns +11.93%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/pycudf4 (bbfb83a) with develop (0a45777)

Open in CodSpeed

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@0ax1 0ax1 force-pushed the ad/pycudf4 branch 3 times, most recently from 71f8a1d to b7a5b32 Compare June 30, 2026 12:39
@0ax1 0ax1 changed the title chore: test CUDA Arrow Device capsule exports feat: add CUDA cuDF convenience API Jun 30, 2026
@0ax1 0ax1 force-pushed the ad/pycudf4 branch 2 times, most recently from 5e24a18 to e55b447 Compare June 30, 2026 15:28
@0ax1 0ax1 marked this pull request as ready for review June 30, 2026 15:30
@0ax1 0ax1 requested review from a team and robert3005 June 30, 2026 15:30
@0ax1 0ax1 enabled auto-merge (squash) June 30, 2026 15:39
@0ax1 0ax1 requested a review from AdamGS June 30, 2026 15:53
@0ax1 0ax1 marked this pull request as draft July 1, 2026 09:57
auto-merge was automatically disabled July 1, 2026 09:57

Pull request was converted to draft

@vortex-data vortex-data deleted a comment from github-actions Bot Jul 1, 2026
@0ax1 0ax1 force-pushed the ad/pycudf4 branch 2 times, most recently from da401e4 to c913efa Compare July 1, 2026 14:57
@vortex-data vortex-data deleted a comment from github-actions Bot Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Claude finished @0ax1's task in 3m 24s —— View job


Claude PR Review — CUDA cuDF convenience API

I reviewed the full diff (c913efa) against develop. Overall this is a careful, well-tested change: the validity-padding rework is conservative (it proves a tail is zeroed before reusing a device bitmap, and copies/repacks otherwise), the FFI capsule lifecycle is memory-safe, and the Python surface is small and documented. No blocking issues found. Notes below.


Correctness & safety — looks good

  • Validity tail-padding reuse (export_arrow_validity_bitmap / has_zeroed_tail_padding) is sound. The reuse path only returns the existing bitmap when zeroed_tail_start <= offset + logical_len and offset + padded_len <= allocation_len, so the whole [logical_end, offset+padded_len) window is provably inside the zeroed region. When the tail isn't tracked (zeroed_tail_start == None) or offsets differ, it falls back to copy_arrow_validity_buffer / repack_arrow_validity_buffer. That conservative default is the right call.
  • zeroed_tail_start is allocation-relative and is correctly carried through slice() / with_alignment() / cuda_backing_allocation unchanged, which matches offset also being allocation-relative. Consistent.
  • FFI capsule release is idempotent. release_schema / release_device_array clear the release callback, and release_*_capsule destructors (via Box::from_raw + capsule_pointer_with_name_or_used) run exactly once and see a cleared callback after _debug_consume_..., so no double-free of device buffers. The "used_" capsule renaming + fallback lookup is a nice touch.
  • setattr(vortex.Array, ...) works because Array is a PyO3 heap type (#[pyclass(..., subclass, frozen)]); frozen only affects Rust-side interior mutability, not class-level attribute assignment. The install is idempotent on re-import.

Observations / suggestions

  1. Padding constant bumped 4 → 64 bytes affects all copy_to_device allocations, not just validity. padded_device_allocation_len now rounds every host→device copy up to a multiple of CUDF_VALIDITY_BUFFER_PADDING. For large buffers this is negligible, but workloads with many tiny buffers now pay up to 16× the previous minimum tail. This looks intentional (uniform cuDF-safe padding avoids special-casing validity), but the constant name CUDF_VALIDITY_BUFFER_PADDING undersells that it's now the general device-copy padding. Consider a more general name or a comment on padded_device_allocation_len noting the trade-off for small allocations.

  2. to_cudf(..., fallback=...) is speculative generality. fallback only accepts "error" and otherwise raises NotImplementedError; it's a reserved-for-future no-op parameter. It's documented and harmless, but it's untested behavior surface (YAGNI). Fine to keep given the README documents it explicitly.

  3. Struct column renaming assumes positional name alignment. In to_cudf, dataframe.columns = dtype.names() relies on pylibcudf.Table.from_arrow preserving field order 1:1 with StructDType.names(). That holds for Arrow, but it silently overwrites whatever names cuDF produced — if a future export ever reordered/flattened fields this would mislabel columns. A brief comment noting the order invariant would help the next reader.

  4. Global side effect on import. Importing vortex_cuda mutates the third-party vortex.Array class. This is the documented design and is idempotent, but it's worth being aware that any import of vortex_cuda (even transitively) installs to_cudf/__arrow_c_device_array__ process-wide.

Test coverage — strong

  • New Rust tests cover matching-offset padding, padded-bitmap reuse, repack-without-tail-padding, and empty-array omission, and assert the backing allocation size + zeroed tail. Good edge coverage.
  • Python tests exercise the pure dispatch logic with fakes (no GPU needed) plus a real-cuDF smoke test gated on cuda_available(), including a lifetime/gc.collect() check that a derived Series outlives its parent DataFrame — nice validation of the zero-copy ownership claim.
  • The nullable-struct rejection and CUDA-unavailable paths are verified to not import cuDF, which keeps the fast-fail contract honest.

Nits

  • stream.rs: the new module-level comment block (lines ~26–30) sits detached above struct VortexCudaStream with a blank line between; reads slightly orphaned from what it documents.
  • test_native_bridge.py: a few assertion lines exceed the repo's usual width; confirm ruff/formatting passes (uv run ruff check on the changed test files).

I did not run builds/tests (no CUDA device in this environment and the change is GPU-centric); please ensure cargo clippy --all-targets --all-features -p vortex-cuda, cargo +nightly fmt --check, and the vortex-python-cuda pytest suite were run locally per CLAUDE.md.
· ad/pycudf4

Add vortex_cuda.to_cudf and install optional Vortex Array helpers for
cuDF conversion and Arrow C Device export. Keep the conversion path
CUDA-only by rejecting unsupported fallback policies and routing cuDF
ingestion through fresh Arrow C Device capsules.

Expand CUDA Python tests for the convenience API, installed Array
methods, Arrow Device export smoke coverage, and capsule ownership
paths.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 0ax1 marked this pull request as ready for review July 1, 2026 15:20
@0ax1 0ax1 enabled auto-merge (squash) July 1, 2026 15:20
@0ax1 0ax1 merged commit 890704f into develop Jul 1, 2026
72 of 73 checks passed
@0ax1 0ax1 deleted the ad/pycudf4 branch July 1, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/chore A trivial change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants