You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (#17094)
* Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling
- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
* Refactor test cases for 2D convolution transpose to support dynamic kernel types
- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
* Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types.
* Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types.
Update test cases to include both F16 and F32 tensor types for comprehensive coverage.
* Update ggml/src/ggml-cuda/conv2d-transpose.cu
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu.c
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
* Refactor conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching with direct kernel launch.
* Enhance cpu conv2d transpose implementation by introducing a templated kernel type for improved flexibility with F16 and F32 data types.
---------
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
0 commit comments