Skip to content

[libcu++] Make argument namespace and wrappers construction public#9251

Open
pciolkosz wants to merge 5 commits into
NVIDIA:mainfrom
pciolkosz:make_arguments_framework_public
Open

[libcu++] Make argument namespace and wrappers construction public#9251
pciolkosz wants to merge 5 commits into
NVIDIA:mainfrom
pciolkosz:make_arguments_framework_public

Conversation

@pciolkosz
Copy link
Copy Markdown
Contributor

@pciolkosz pciolkosz commented Jun 4, 2026

Closes #9254

This PR removes underscores from the cuda::argument namespace and the user facing argument wrappers and bound objects construction.

Traits and other utility functions remain underscored, those are meant mostly to be used inside CCCL APIs to examine the argument wrappers passed by the user, but they will most likely be public later on, once we are more confident with the design.

@pciolkosz pciolkosz requested review from a team as code owners June 4, 2026 06:04
@pciolkosz pciolkosz requested a review from fbusato June 4, 2026 06:04
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 4, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 4, 2026
Copy link
Copy Markdown
Contributor

@davebayer davebayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I'll block this PR. We said there would be a discussion about the naming. I'm still strongly against having a separate namespace for these types.

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 4, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine.

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

This PR makes the user-facing CUDA argument wrapper API public: it adds a public header <cuda/argument> and moves the non‑underscored, public wrapper and bounds types into the cuda::args namespace. Internal inspection helpers and trait machinery remain underscored for now and are available under cuda::args::__* as internal APIs. The change resolves issue #9254.

Key Changes

  • New public header: libcudacxx/include/cuda/argument (forwards to implementation).
  • Public wrapper templates (moved/renamed into cuda::args):
    • __constant → constant
    • __constant_sequence → __constant_sequence (sequence forms kept underscored where still internal)
    • __immediate → immediate
    • __immediate_sequence → __immediate_sequence (sequence form still underscored in implementation)
    • __deferred → deferred
    • __deferred_sequence → deferred_sequence
  • Bounds types and factories (public, under cuda::args):
    • __no_bounds → no_bounds
    • __static_bounds → static_bounds
    • __runtime_bounds → runtime_bounds
    • __bounds(...) → bounds(...)
  • Traits / inspection helpers remain underscored and are accessed as cuda::args::__traits, cuda::args::__unwrap, cuda::args::_lowest, cuda::args::_highest, etc.
  • Namespace macros updated to open/close cuda::args (and tests/usage updated to match).
  • Tests and examples updated to include <cuda/argument> and to use cuda::args::{constant, immediate, deferred, static_bounds, runtime_bounds, bounds} with internal helpers via cuda::args::__*.

Codebase Impact

Widespread rename/usage updates across CUB, benchmarks, kernels, and libcudacxx tests; notable updated files:

  • cub/benchmarks/bench/segmented_topk/{fixed,variable}/keys.cu
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • libcudacxx/include/cuda/__argument/argument.h, argument_bounds.h
  • libcudacxx/include/cuda/argument (new)
  • libcudacxx/test/libcudacxx/cuda/argument/* (many tests updated or added)

Most edits are renames and header switches; some constructor/validation logic for bounds and wrapper construction was tightened/rewired in the argument implementation—tests were updated accordingly.

API/ABI Impact

  • Public API: new public header <cuda/argument> and public wrapper/bounds types available under cuda::args. Callers should migrate to cuda::args::constant/immediate/deferred and cuda::args::bounds/static_bounds/runtime_bounds.
  • Internal APIs: trait/inspection helpers remain underscored (cuda::args::__*). No public trait helpers were promoted beyond the underscored set in this PR.
  • No functional algorithmic changes expected; reviewers should validate bounds-related constructor behavior where tests exercise it.

Review Notes

  • There is an outstanding blocking comment from @pciolkosz requesting discussion about naming and opposing a separate namespace for these types; maintainers should resolve naming/namespace concerns with the team (they tagged @gevtushenko).
  • Reviewers should pay attention to:
    • The public wrapper/bounds API shapes and deduction guides (immediate constructors, deduction guides, bounds factories).
    • Places where sequence wrapper forms remain underscored (intentional for internal trait use).
    • Tests added/updated for compile-fail and runtime-bounds mutation coverage to ensure intended behavior.

suggestion:

Walkthrough

This PR replaces internal cuda::__argument wrapper/bounds/traits names with public cuda::argument / cuda::args equivalents, adds a public forwarding header <cuda/argument>, and updates libcudacxx tests, CUB dispatch/kernels/agent, benchmarks, and test call sites to use the new API. No algorithmic behavior was changed.

Changes

CUDA argument API public-API migration

Layer / File(s) Summary
Core wrapper types and public header
libcudacxx/include/cuda/__argument/argument.h, libcudacxx/include/cuda/__argument/argument_bounds.h, libcudacxx/include/cuda/argument, libcudacxx/include/cuda/std/__internal/namespaces.h
Rename wrapper and bounds types/factories from underscored __* to public * names; add <cuda/argument> forwarding header; update deduction guides, wrapper detection, __is_wrapper_v, __unwrap, __traits_impl, and __lowest_/__highest_.
libcudacxx argument API tests
libcudacxx/test/libcudacxx/cuda/argument/*
Update tests to include <cuda/argument> and reference cuda::args wrapper/traits/bounds helpers; add compile-fail cases exercising misuse.
CUB dispatch, kernel, agent, and param extraction
cub/cub/device/dispatch/dispatch_batched_topk.cuh, cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh, cub/cub/agent/agent_batched_topk.cuh, cub/cub/detail/segmented_params.cuh
Switch CUB batched-topk dispatch/kernel/agent and segmented-params extraction code to use ::cuda::args wrapper types, traits, and __unwrap accessors; update kernel parameter element types derived from those traits.
CUB segmented top-k benchmarks using public API
cub/benchmarks/bench/segmented_topk/fixed/keys.cu, cub/benchmarks/bench/segmented_topk/variable/keys.cu
Update includes and argument construction for benchmark dispatches to use ::cuda::args::immediate, ::cuda::args::__immediate_sequence, ::cuda::args::constant, and ::cuda::args::bounds.
CUB device segmented top-k tests
cub/test/catch2_test_device_segmented_topk_keys.cu, cub/test/catch2_test_device_segmented_topk_pairs.cu
Replace internal ::cuda::__argument::__* wrappers in test launch callsites with ::cuda::args immediate/sequence/bounds wrappers without changing argument values.

Assessment against linked issues

Objective Addressed Explanation
Expose argument annotation framework for public usage in cub::DeviceBatchedTopK (#9254)

Possibly related PRs

  • NVIDIA/cccl#8875: Introduced earlier migrations of argument/traits to the public API used here.
  • NVIDIA/cccl#9074: Prior batched-topk refactor touching segmented-params and argument wrappers.

Suggested labels

libcu++

Suggested reviewers

  • gevtushenko
  • davebayer

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp

libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp:11:10: fatal error: 'cuda/argument' file not found
11 | #include <cuda/argument>
| ^~~~~~~~~~~~~~~
1 error generated.
libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp:74:3-81:3: ERROR translating statement 'CompoundStmt'
Aborting translation of method 'test' in file 'libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp': "Assert_failure src/clang/cAst_utils.ml:249:53"
Uncaught Internal Error: "Assert_failure src/clang/cAst_utils.ml:249:53"
Error backtrace:
Raised at ClangFrontend__CAst_utils.get_decl_from_typ_ptr in file "src/clang/cAst_utils.ml", line 249, characters 53-65
Called from ClangFrontend__CTrans.CTrans_funct.get_destructor_decl_ref in file "src/clang/cTrans.ml", line 658, characters 12-59
Called from ClangFrontend__CTrans.CTrans_funct.destructor_calls.(fun) in file "src/clang/cTrans.ml", line 2048, characters 12-69
Called from Base__List.rev_filter_map.loop in f

... [truncated 2200 characters] ...

ile "src/clang/cTrans.ml", line 4782, characters 12-47
Re-raised at IStdlib__IExn.reraise_after in file "src/istd/IExn.ml" (inlined), line 13, characters 2-50
Called from ClangFrontend__CTrans.CTrans_funct.instruction_log.(fun) in file "src/clang/cTrans.ml", line 4784, characters 10-1023
Called from ClangFrontend__CTrans.CTrans_funct.instruction in file "src/clang/cTrans.ml" (inlined), line 4765, characters 38-71
Called from ClangFrontend__CTrans.CTrans_funct.exec_with_node_creation in file "src/clang/cTrans.ml" (inlined), line 104, characters 20-38
Called from ClangFrontend__CTrans.CTrans_funct.get_clang_stmt_trans in file "src/clang/cTrans.ml" (inlined), line 5395, characters 4-69
Called from ClangFrontend__CTrans.CTrans_funct.get_custom_stmt_trans in file "src/clang/cTrans.ml", line 540

libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp

libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp:11:10: fatal error: 'cuda/argument' file not found
11 | #include <cuda/argument>
| ^~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/9088870b74443d908b05c80e381d6e129b269b7d-07842b3345a172fc/tmp/clang_command_.tmp.50494e.txt
++Contents of '/tmp/coderabbit-infer/9088870b74443d908b05c80e381d6e129b269b7d-07842b3345a172fc/tmp/clang_command_.tmp.50494e.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu"

... [truncated 1160 characters] ...

al-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/07842b3345a172fc/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp" "-O0"
"-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp

libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp:11:10: fatal error: 'cuda/argument' file not found
11 | #include <cuda/argument>
| ^~~~~~~~~~~~~~~
1 error generated.
libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp:158:3-163:3: ERROR translating statement 'CompoundStmt'
Aborting translation of method 'test' in file 'libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp': "Assert_failure src/clang/cAst_utils.ml:249:53"
Uncaught Internal Error: "Assert_failure src/clang/cAst_utils.ml:249:53"
Error backtrace:
Raised at ClangFrontend__CAst_utils.get_decl_from_typ_ptr in file "src/clang/cAst_utils.ml", line 249, characters 53-65
Called from ClangFrontend__CTrans.CTrans_funct.get_destructor_decl_ref in file "src/clang/cTrans.ml", line 658, characters 12-59
Called from ClangFrontend__CTrans.CTrans_funct.destructor_calls.(fun) in file "src/clang/cTrans.ml", line 2048, characters 12-69
Called from Base__List.rev_filter_map.l

... [truncated 2200 characters] ...

langFrontend__CTrans.CTrans_funct.exec_with_node_creation in file "src/clang/cTrans.ml" (inlined), line 104, characters 20-38
Called from ClangFrontend__CTrans.CTrans_funct.get_clang_stmt_trans in file "src/clang/cTrans.ml" (inlined), line 5395, characters 4-69
Called from ClangFrontend__CTrans.CTrans_funct.get_custom_stmt_trans in file "src/clang/cTrans.ml", line 5401, characters 8-55
Called from ClangFrontend__CTrans.CTrans_funct.exec_trans_instrs.exec_trans_instrs_rev in file "src/clang/cTrans.ml" (inlined), line 5365, characters 28-54
Called from ClangFrontend__CTrans.CTrans_funct.exec_trans_instrs in file "src/clang/cTrans.ml" (inlined), line 5389, characters 6-70
Called from ClangFrontend__CTrans.CTrans_funct.instructions_trans in file "src/clang/cTrans.ml", line 5451, characters 25-

  • 8 others

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4424dd10-8027-4d66-97f4-e3d43a7b6cf4

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7ae38 and 06c277f.

📒 Files selected for processing (20)
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/include/cuda/__argument/argument.h
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/include/cuda/argument
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp

Comment thread libcudacxx/include/cuda/__argument/argument.h Outdated
@pciolkosz
Copy link
Copy Markdown
Contributor Author

I'm sorry, but I'll block this PR. We said there would be a discussion about the naming. I'm still strongly against having a separate namespace for these types.

Tagging @gevtushenko

@pciolkosz pciolkosz force-pushed the make_arguments_framework_public branch from 6002bc6 to 3210e3d Compare June 4, 2026 06:36
@github-actions

This comment has been minimized.

auto segment_sizes = ::cuda::__argument::__constant<MaxSegmentSize>{};
auto k = ::cuda::__argument::__constant<MaxNumSelected>{};
auto select_direction = ::cuda::__argument::__constant<cub::detail::topk::select::max>{};
auto segment_sizes = ::cuda::argument::constant<MaxSegmentSize>{};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++26 has constant_wrapper. It's hard to back-port fully to earlier C++ versions (though there are proposals pending review that might fix this), but for now, it's how C++ spells "compile-time constant." Does this cuda::argument::constant do the same thing? If so, then should we consider just making this an alias to constant_wrapper (or a hypothetical cuda::std::constant_wrapper)?

Comment thread cub/cub/agent/agent_batched_topk.cuh Outdated

using segment_size_val_t = typename ::cuda::__argument::__traits<SegmentSizeParameterT>::element_type;
using num_segments_val_t = typename ::cuda::__argument::__traits<NumSegmentsParameterT>::element_type;
using segment_size_val_t = typename ::cuda::argument::__traits<SegmentSizeParameterT>::element_type;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this library usable without exposing __traits to the public API? If not, then should we consider waiting until __traits is ready?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main user facing part is the construction of the wrappers, while our APIs will use the traits to examine them. Long term we want to expose the __traits I think, but we don't have to right now so keeping them private leaves us room to change them easily and make public later

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining! That makes sense.

//! @brief Wraps a compile-time constant argument value.
template <auto _Value>
struct __constant
struct constant
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more or less like constant_wrapper. Should we consider spelling it cuda::std::constant_wrapper?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered that design, cuda::std::constant_wrapper is difficult to back port to older c++ standards and the timeline probably doesn't leave enough room to implement it. I think its fine to just interoperate with constant wrapper later on

static_assert(__valid_static_bounds_v<__element_type, _StaticBounds>,
"static argument bounds cannot be represented by the element type");

_Arg __arg_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a public member. That means it's possible to set the value without validating the bounds. Should we instead consider making the value private, so that every constructed immediate instance has been validated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it a private member and add an underscored __access structure that is a friend or I can just underscore the variable and have less code this way. I don't mind either way, but we need a way to access it at the end of the day.
But I think this question does apply to __unwrap, which currently has two version, const and not-const reference. I could see the non-const reference version being not necessary, but it's also not public yet, so we don't have to resolve it right now. I might as well remove it and we can add it when needed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pciolkosz Thanks for answering! I don't quite understand the argument why validation is separate from construction.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: Why can't users access the value (by value)? Isn't this just a validated value? Why wouldn't this class just have a value() member?

Copy link
Copy Markdown
Contributor Author

@pciolkosz pciolkosz Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to not be stubborn and changed all members of the wrappers to private and added a friend struct to access them. We might end up exposing member functions to access them, but for now there is only __unwrap free function

Comment thread libcudacxx/include/cuda/__argument/argument.h

template <class _ElementType, auto _Lowest, auto _Highest>
inline constexpr bool __valid_static_bounds_v<_ElementType, __static_bounds<_Lowest, _Highest>> =
inline constexpr bool __valid_static_bounds_v<_ElementType, static_bounds<_Lowest, _Highest>> =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are runtime_bounds and static_bounds. Do we also need a representation of half-run-time, half-static bounds?

submdspan represents bounds generally as any type for which structured binding into two elements, both convertible to index_type, is well-formed. This would let users pass in bounds however they like. It also would permit a unified representation of any combination of run-time and static bounds.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?

Copy link
Copy Markdown
Contributor Author

@pciolkosz pciolkosz Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can tighten the static bounds with runtime bounds, so in this case you can do static_bounds<0, 1000> and runtime_bounds(100, 1000) which effectively would be a half-run-time, half-static bounds. It avoids overcomplicating the input types at the expense of this case being a bit more verbose, but I think it should be rare anyway.

Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?

Good point, I added type validation

Comment thread libcudacxx/include/cuda/__argument/argument.h
@github-actions

This comment has been minimized.

Comment thread libcudacxx/include/cuda/__argument/argument_bounds.h
Comment thread libcudacxx/include/cuda/__argument/argument.h Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7cc00549-bb49-459c-96d8-ed4906627049

📥 Commits

Reviewing files that changed from the base of the PR and between 2c740b7 and 9088870.

📒 Files selected for processing (22)
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/include/cuda/__argument/argument.h
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
✅ Files skipped from review due to trivial changes (1)
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
🚧 Files skipped from review as they are similar to previous changes (17)
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp

Comment on lines +242 to 243
static_assert(::cuda::args::__traits<NumSegmentsParameterT>::is_single_value,
"Only uniform segment sizes are currently supported.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

suggestion: Fix the static_assert message to match the checked condition.
Line 243 says “Only uniform segment sizes are currently supported.” but Line 242 checks NumSegmentsParameterT (uniform number of segments). Update the message to avoid misleading compile-time diagnostics.

@pciolkosz
Copy link
Copy Markdown
Contributor Author

Changes from the code review session:

  • cuda::argument -> cuda::args
  • immediate_sequence and constant_sequence will remain underscored for now
  • added a template argument to constant as a work around for c++17 limitations, so you can do constant<10, float>
  • Changed all the member variables of wrappers and bounds to be private

One thing to consider is if we want to expose unwrap

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 27m: Pass: 100%/340 | Total: 6d 16h | Max: 2h 27m | Hits: 94%/500627

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Expose argument annotation framework for public usage in cub::DeviceBatchedTopK

3 participants