[Contrib] Fix CUDA contrib build after FFI/header cleanups #19539
Conversation
Code Review
This pull request adds logging headers to several CUTLASS and vLLM source files and refactors the vLLM kernel registrations to use the Tensor type. The reviewer identified that the unqualified Tensor type in attention_kernels.cu is ambiguous and will likely cause compilation errors, because tvm::runtime::Tensor lacks the GetDLTensorPtr() method. The recommendation is to use ffi::Tensor explicitly to resolve the ambiguity.
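A minimal sketch of the reviewer's suggested fix; the wrapper name and signature are illustrative, and the header path is an assumption:

```cpp
#include <tvm/ffi/container/tensor.h>  // assumed header for tvm::ffi::Tensor

namespace ffi = tvm::ffi;

// Hypothetical helper: qualifying the parameter as ffi::Tensor keeps the
// name from resolving to tvm::runtime::Tensor, which has no GetDLTensorPtr().
const DLTensor* QueryPtr(const ffi::Tensor& query) {
  return query.GetDLTensorPtr();
}
```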
Code Review
This pull request updates several CUDA contrib modules (CUTLASS, Thrust, and vLLM) by adding the tvm/runtime/logging.h header and migrating the vLLM kernels to the new FFI Tensor and ffi::Array types. The changes update function signatures and registration logic to handle the transition from raw DLTensor pointers. Feedback suggests investigating whether the Tensor class provides a more direct way to obtain a mutable DLTensor pointer, which would avoid the verbose const_cast currently used in the attention kernels.
Six CUDA sources in src/runtime/contrib used LOG(FATAL) via transitive includes that #19483 trimmed; add the explicit <tvm/runtime/logging.h> include to thrust.cu, attention_kernels.cu, and the four cutlass kernel headers (fp16/fp8 sm90/sm100, gemm_runner, fp8_groupwise_scaled_gemm).
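For reference, a minimal sketch of the include fix as it would look at the top of thrust.cu; the guard function is illustrative:

```cpp
// LOG(FATAL) previously compiled only because some other TVM header pulled
// in logging.h transitively; after #19483 the include must be explicit.
#include <tvm/runtime/logging.h>

// Illustrative use: the macro aborts with a message on unsupported input.
void CheckDType(bool supported) {
  if (!supported) {
    LOG(FATAL) << "thrust: unsupported data type";
  }
}
```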
cache_kernels.cu used the bare Array{...} alias that #19483 removed; switch to ffi::Array<Tensor>{...}.
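A sketch of that change; the function and variable names are illustrative, and the header paths are assumptions:

```cpp
#include <tvm/ffi/container/array.h>   // assumed header for ffi::Array
#include <tvm/ffi/container/tensor.h>  // assumed header for ffi::Tensor

namespace ffi = tvm::ffi;
using ffi::Tensor;

// Before #19483 a bare `Array{key, value}` alias compiled; now the FFI
// container and its element type must be spelled out.
ffi::Array<Tensor> MakeCachePair(Tensor key, Tensor value) {
  return ffi::Array<Tensor>{key, value};
}
```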
attention_kernels.cu registered FFI functions whose parameters were raw DLTensor*; the new reflection registry requires TypeSchema, so wrap both TVM_FFI_STATIC_INIT_BLOCK registrations to take Tensor and forward to the unchanged launchers via GetDLTensorPtr() (with const_cast for the output tensors, matching the mt_random_engine / cudnn pattern).
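A minimal sketch of that wrapping pattern, assuming a `reflection::GlobalDef().def(...)` registration style; the global name, launcher signature, and parameter list are illustrative:

```cpp
#include <tvm/ffi/container/tensor.h>     // assumed header for ffi::Tensor
#include <tvm/ffi/reflection/registry.h>  // assumed header for the registry

namespace ffi = tvm::ffi;

// Hypothetical launcher kept unchanged: raw DLTensor* parameters,
// mutable for the output, const for the input.
void AttentionLauncher(DLTensor* out, const DLTensor* query);

TVM_FFI_STATIC_INIT_BLOCK({
  namespace refl = tvm::ffi::reflection;
  refl::GlobalDef().def(
      "tvm.contrib.vllm.attention_example",  // illustrative global name
      [](ffi::Tensor out, ffi::Tensor query) {
        // GetDLTensorPtr() is assumed to return const DLTensor*; the output
        // needs a mutable pointer, hence the const_cast (the same pattern
        // the description cites from mt_random_engine / cudnn).
        AttentionLauncher(const_cast<DLTensor*>(out.GetDLTensorPtr()),
                          query.GetDLTensorPtr());
      });
});
```

Registering a typed lambda like this lets the reflection registry derive the TypeSchema from the signature, which the raw DLTensor* launchers could not provide.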