Add mdspan example skeleton to guide cuda.core design for structured array arguments#12
Conversation
@copilot Question: Can you determine in these cases what the mdspan memory layouts would look like on device? Give me C struct equivalents. No need to change any code, just answer my question.
Yes, I can determine the mdspan memory layouts for these three cases. Here are the C struct equivalents:

Case 1: C-order (row-major) with `layout_right`:

```c
struct mdspan_2d_layout_right {
    const float* ptr;  // pointer to data
    size_t extent0;    // number of rows
    size_t extent1;    // number of columns
    // No stride needed - implicit: stride0 = extent1, stride1 = 1
};
```

For a 128x256 array, this would be `{ptr, 128, 256}`.

Case 2: F-order (column-major) with `layout_left`:

```c
struct mdspan_2d_layout_left {
    const float* ptr;  // pointer to data
    size_t extent0;    // number of rows
    size_t extent1;    // number of columns
    // No stride needed - implicit: stride0 = 1, stride1 = extent0
};
```

For a 128x256 array, this would be `{ptr, 128, 256}`.

Case 3: Strided with `layout_stride`:

```c
struct mdspan_2d_layout_stride {
    const float* ptr;  // pointer to data
    size_t extent0;    // number of rows
    size_t extent1;    // number of columns
    size_t stride0;    // stride for dimension 0 (in elements, not bytes)
    size_t stride1;    // stride for dimension 1 (in elements, not bytes)
};
```

For `arr[:, ::2]` with base shape 128x512, this would be `{ptr, 128, 256, 512, 2}`.

The key insight is that only `layout_stride` carries explicit stride fields; for `layout_right` and `layout_left`, the strides are fully implied by the extents, so the extents alone describe the layout.
@copilot For `mdspan_2d_layout_stride`, how did you learn this? Could you link to where the mdspan source code is?
I based this on the C++23 mdspan standard (P0009) and the reference implementation. Here are the key sources:

Official mdspan references:

CUDA libcudacxx implementation:

The key implementation detail:

Correction to my earlier answer: the actual structure is slightly more complex. The mdspan object contains both a pointer and a "mapping" object that holds the layout policy, but for kernel parameter passing purposes, the flattened representation I described is what gets passed.
@copilot I do not understand how you concluded about the memory layouts. Could you quote the exact lines of the libcudacxx source code, instead of hand-waving with a murky explanation? In particular,
You're absolutely right - I apologize for the confusion. Let me provide exact source quotes from the libcudacxx code:

mdspan class storage:

```cpp
using __base = __mdspan_ebco<typename accessor_type::data_handle_type,
                             typename _LayoutPolicy::template mapping<_Extents>,
                             _AccessorPolicy>;
```

The mdspan stores the data handle (pointer), the layout mapping, and the accessor.

`layout_right::mapping`:

```cpp
class layout_right::mapping : private __mdspan_ebco<_Extents>
```

Stores only the extents.

`layout_left::mapping`:

```cpp
class layout_left::mapping : private __mdspan_ebco<_Extents>
```

Stores only the extents.

`layout_stride::mapping`:

```cpp
class layout_stride::mapping
  : private __mdspan_ebco<_Extents, __mdspan_detail::__possibly_empty_array<index_type, extents_type::rank()>>
{
  // ...
  using __stride_array = __mdspan_detail::__possibly_empty_array<index_type, extents_type::rank()>;
```

Stores the extents plus a per-rank array of strides.
@copilot Could you add another mdspan example where we don't do any compute, just verify the kernel arguments passed from the host using mdspan accessors and
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Added. The new example includes three verification kernels that use `printf` to inspect:

Each kernel also:

The three examples cover `layout_right` (C-order), `layout_left` (F-order), and `layout_stride` with a strided view created by slicing (`arr[:, ::2]`).