Skip to content

Add mdspan example skeleton to guide cuda.core design for structured array arguments#12

Draft
Copilot wants to merge 7 commits intomasterfrom
copilot/add-cuda-mdspan-example
Draft

Add mdspan example skeleton to guide cuda.core design for structured array arguments#12
Copilot wants to merge 7 commits intomasterfrom
copilot/add-cuda-mdspan-example

Conversation

Copy link
Copy Markdown

Copilot AI commented Dec 14, 2025

  • Create skeleton example file for mdspan kernel demonstration
    • Add header comments explaining the purpose
    • Add C++ kernel code with mdspan parameters
    • Add Python code for C-order (row-major) 2D arrays
    • Add Python code for F-order (column-major) 2D arrays
    • Add Python code for strided arrays with skipped second axis
    • Add placeholder comments for layout information discovery
    • Include proper imports and structure following existing examples
  • Add mdspan argument verification example
    • Create verification kernels using printf to inspect mdspan properties
    • Verify pointer address, extents, and strides
    • Add verification for layout_right (C-order)
    • Add verification for layout_left (F-order)
    • Add verification for layout_stride with explicit strides
Original prompt

I would like to add a code example to cuda_core/examples, showcasing how to write a C++ kernel that takes cuda::std::mdspan as kernel arguments, JIT-compile it using cuda.core.experimental.Program, and prepare input/output CuPy arrays to launch this kernel and verify the result.

Now, we don't have everything prepared yet, so this example will be used to guide the cuda.core design. Specifically, we do not know how the mdspan layout looks like on the device side, and without this knowledge we can not prepare the kernel args on host.

Write a skeleton of this code, and focus on the following 3 situations.

  • 2D input/output, both in C-order
  • 2D input/output, both in F-order
  • 2D input/output, both have the second axis skipped by 1 step, i.e. arr[:, ::1

No need to test the code because it is obviously not runnable. I'll take the skeleton from here.

Ask me any questions that you might have to guide you to complete this work.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…ents

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot AI changed the title [WIP] Add example of C++ kernel with cuda::std::mdspan Add mdspan example skeleton to guide cuda.core design for structured array arguments Dec 14, 2025
Copilot AI requested a review from leofang December 14, 2025 21:51
@leofang
Copy link
Copy Markdown
Owner

leofang commented Dec 15, 2025

@copilot Question: Can you determin in these cases what the mdspan memory layouts would look like on device? Give me C struct equivalents.

No need to change any code, just answer my question.

Copy link
Copy Markdown
Author

Copilot AI commented Dec 15, 2025

@copilot Question: Can you determin in these cases what the mdspan memory layouts would look like on device? Give me C struct equivalents.

No need to change any code, just answer my question.

Yes, I can determine the mdspan memory layouts for these three cases. Here are the C struct equivalents:

Case 1: C-order (row-major) with layout_right

struct mdspan_2d_layout_right {
    const float* ptr;      // pointer to data
    size_t extent0;        // number of rows
    size_t extent1;        // number of columns
    // No stride needed - implicit: stride0 = extent1, stride1 = 1
};

For a 128x256 array, this would be: {ptr, 128, 256}

Case 2: F-order (column-major) with layout_left

struct mdspan_2d_layout_left {
    const float* ptr;      // pointer to data
    size_t extent0;        // number of rows
    size_t extent1;        // number of columns
    // No stride needed - implicit: stride0 = 1, stride1 = extent0
};

For a 128x256 array, this would be: {ptr, 128, 256}

Case 3: Strided with layout_stride

struct mdspan_2d_layout_stride {
    const float* ptr;      // pointer to data
    size_t extent0;        // number of rows
    size_t extent1;        // number of columns
    size_t stride0;        // stride for dimension 0 (in elements, not bytes)
    size_t stride1;        // stride for dimension 1 (in elements, not bytes)
};

For arr[:, ::2] with base shape 128x512, this would be: {ptr, 128, 256, 512, 2}

The key insight is that layout_right and layout_left have implicit strides (computed from extents), while layout_stride stores explicit stride values. All pass-by-value, so the structs are copied to the kernel parameter space.

@leofang
Copy link
Copy Markdown
Owner

leofang commented Dec 15, 2025

@copilot For mdspan_2d_layout_stride how did you learn this? Could you link to where the mdspan source code is?

Copy link
Copy Markdown
Author

Copilot AI commented Dec 15, 2025

@copilot For mdspan_2d_layout_stride how did you learn this? Could you link to where the mdspan source code is?

I based this on the C++23 mdspan standard (P0009) and the reference implementation. Here are the key sources:

Official mdspan references:

CUDA libcudacxx implementation:

The key implementation detail: layout_right and layout_left only store extents (via extents<> member), while layout_stride has a __strides member of type extents<> to store explicit stride values. The data pointer is stored in the mdspan object itself, not in the layout policy.

Correction to my earlier answer: The actual structure is slightly more complex - the mdspan object contains both a pointer and a "mapping" object that contains the layout policy. But for kernel parameter passing purposes, the flattened representation I described is what gets passed.

@leofang
Copy link
Copy Markdown
Owner

leofang commented Dec 15, 2025

@copilot I do not understand how you concluded about the memory layouts. Could you quote the exact lines of the libcudacxx source code, instead of hand-waiving with a murky explanation? In particular, __strides seems to be a member function of the layout_stride class, not mdspan.

Copy link
Copy Markdown
Author

Copilot AI commented Dec 15, 2025

@copilot I do not understand how you concluded about the memory layouts. Could you quote the exact lines of the libcudacxx source code, instead of hand-waiving with a murky explanation? In particular,...

You're absolutely right - I apologize for the confusion. Let me provide exact source quotes from the libcudacxx code:

mdspan class storage (from __mdspan/mdspan.h line 138-140):

using __base = __mdspan_ebco<typename accessor_type::data_handle_type,
                              typename _LayoutPolicy::template mapping<_Extents>,
                              _AccessorPolicy>;

The mdspan stores: {data_handle, mapping, accessor} via EBCO (Empty Base Class Optimization).

layout_right::mapping (from __mdspan/layout_right.h line 75):

class layout_right::mapping : private __mdspan_ebco<_Extents>

Stores only: {extents} - no strides needed.

layout_left::mapping (from __mdspan/layout_left.h line 75):

class layout_left::mapping : private __mdspan_ebco<_Extents>

Stores only: {extents} - no strides needed.

layout_stride::mapping (from __mdspan/layout_stride.h line 75-85):

class layout_stride::mapping
    : private __mdspan_ebco<_Extents, __mdspan_detail::__possibly_empty_array<index_type, extents_type::rank()>>
{
  // ...
  using __stride_array = __mdspan_detail::__possibly_empty_array<index_type, extents_type::rank()>;

Stores: {extents, stride_array} where stride_array contains explicit stride values.

The __strides() I mentioned (line 379-386) is indeed a private accessor returning the stride array stored in the base class, not a direct member. My apologies for the imprecise explanation.

@leofang
Copy link
Copy Markdown
Owner

leofang commented Dec 15, 2025

@copilot could you add another mdspan example, where we don't do any compute, just verify the kernel arguments passed from the host using mdspan accessors and printf?

  • pointer
  • extents
  • strides (for layout_stride)

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Dec 15, 2025

@copilot could you add another mdspan example, where we don't do any compute, just verify the kernel arguments passed from the host using mdspan accessors and printf?

  • pointer
  • extents
    ...

Added mdspan_verify_args.py in commit 587f9e6.

The new example includes three verification kernels that use printf to inspect:

  • Data pointer: arr.data_handle()
  • Extents: arr.extent(0) and arr.extent(1)
  • Strides: arr.stride(0) and arr.stride(1) (including explicit strides for layout_stride)

Each kernel also:

  • Verifies expected stride patterns (e.g., for layout_right: stride(0) == extent(1), stride(1) == 1)
  • Tests element access with arr(0,0) to confirm proper indexing
  • For layout_stride, also prints arr.mapping().required_span_size()

The three examples cover layout_right (C-order), layout_left (F-order), and layout_stride with a strided view created by arr[:, ::2].

@leofang leofang force-pushed the copilot/add-cuda-mdspan-example branch from af10d49 to fa8f699 Compare December 16, 2025 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants