[cccl.c] Introduce minimal compilation support for host freestanding mode #8437
shwina merged 3 commits into NVIDIA:main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test 238c422

Force-pushed from 238c422 to 3979052

/ok to test 3979052

/ok to test dd0aed4
Q: Why do we need this header? Why is <cuda/std/execution> not enough?
/ok to test d9e65a1

/ok to test 3a9db84

/ok to test efd5bdf

/ok to test 8969631
Things left to be done here:
Force-pushed from 8969631 to 7136dc0

/ok to test 7136dc0
/ok to test 079740b
Drop codegen altogether
Rename "clangjit" to something else like "hostjit".
```yaml
# Eventually we will want building with ClangJIT to be the
# default, and will do it across the entire matrix. Currently
# blocked on libnvfatbin availability on Windows containers, and for CUDA <12.4.
- {jobs: ['test'], project: 'cccl_c_parallel_clangjit', ctk: '13.X', cxx: ['gcc13'], gpu: 'rtx2080'}
```
Add a corresponding entry in project_files_and_dependencies.yml.

Ensure that libcudacxx and c_parallel_internal are dependencies of it.
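Putting the two comments above together, a hypothetical entry might look like the sketch below. The actual schema of project_files_and_dependencies.yml is not shown in this thread, so the key names here are assumptions, not the file's real layout:

```yaml
# Hypothetical sketch only — key names assumed, not taken from the repository.
cccl_c_parallel_clangjit:
  dependencies:
    - libcudacxx
    - c_parallel_internal
```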
Force-pushed from 079740b to b3e8366
```cmake
endif()

# Link against LLVM/Clang/LLD
target_link_libraries(
```
Might be worth trying to install libclang/LLVM in the devcontainer if build times are too large.

We shouldn't need any libcudacxx changes.
Force-pushed from b3e8366 to 0aca07f
```cpp
#ifndef _HOSTJIT_CLIMITS
#define _HOSTJIT_CLIMITS

#include "limits.h"
```

We should always avoid relative includes. Please add the proper include path and use <limits.h>. This applies throughout.
```cpp
#include "cstddef"
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
#define RAND_MAX 2147483647
extern "C" {
void* malloc(size_t); void* calloc(size_t, size_t);
void* realloc(void*, size_t); void free(void*);
void abort(void); void exit(int); void _Exit(int);
}
```
```cpp
// resolve to libcudacxx/include/cuda/std/limits, which cascades through
// numeric_limits, bit_cast, popcount, etc. — incompatible with freestanding.
//
// This stub (found first on -internal-isystem) stops that cascade, providing
```
@gevtushenko why do we need those? The cuda::std ones should work perfectly fine.

@miscco you can try dropping it to check, I don't remember.

We can't drop the limits or utility stubs. Doing so makes clang include the libcudacxx versions of those headers, which are not host freestanding. (I tried empirically and got runtime errors.)
```cpp
#define HUGE_VALL __builtin_huge_vall()
#define INFINITY __builtin_inff()
#define NAN __builtin_nanf("")
#define MATH_ERRNO 1
```

We already define a bunch of these internally in fp_classify.h and cmath. We should see whether we can just reuse those or move them to a different header.
```cpp
namespace std {

template <typename _Tp> struct remove_reference { using type = _Tp; };
```
Why do we need this when we have a fully working remove_reference at home?

@miscco please try using it. The more things you can drop from this PR the better. The only criterion is that the test keeps passing.

Let's keep it as is and file an issue when merged.
```cpp
template <typename _Tp>
__host__ __device__ constexpr _Tp&&
forward(remove_reference_t<_Tp>& __t) noexcept { return static_cast<_Tp&&>(__t); }

template <typename _Tp>
__host__ __device__ constexpr _Tp&&
forward(remove_reference_t<_Tp>&& __t) noexcept { return static_cast<_Tp&&>(__t); }

template <typename _Tp>
__host__ __device__ constexpr remove_reference_t<_Tp>&&
move(_Tp&& __t) noexcept { return static_cast<remove_reference_t<_Tp>&&>(__t); }
```
We should just include the necessary headers, or are those required to be in namespace std?
Force-pushed from ba50f21 to 55f9365

Force-pushed from 55f9365 to 597e259
🥳 CI Workflow Results: Finished in 1h 23m. Pass: 100%/480 | Total: 4d 14h | Max: 1h 23m | Hits: 94%/569464.
```cpp
std::unordered_map<std::string, std::string> macro_definitions; // key=macro name, value=macro value (empty for flag
                                                                // macros)
int sm_version = 70;
int optimization_level = 2;
```

Suggestion: use 3 instead.
```cpp
std::vector<std::string> device_bitcode_files; // Paths to .bc files to link into device code
std::unordered_map<std::string, std::string> macro_definitions; // key=macro name, value=macro value (empty for flag
                                                                // macros)
int sm_version = 70;
```

Suggestion: should probably be 75.

Absolutely! We do not support anything < sm75 officially.
```cpp
  return 1;
}

int* d_ptr = nullptr;
```

Suggestion: use pointer_t.
…mode (NVIDIA#8437)

* HostJIT minimal infra
* Fix?
* Apply stylistic changes

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Description
Closes #8401
This PR introduces the minimal infrastructure for compiling and testing CCCL in "host freestanding" mode. The infrastructure being introduced suffices to compile the following code using the included JIT compiler:
A CI job has been added which builds the host JIT infrastructure for a specific combination of arch/compiler/CUDA. Eventually, we will want the entire matrix to build with host JIT, but during initial iteration this is sufficient:
Note that an important consideration/blocker is that nvfatbin (a dependency for host JIT) is available only on CUDA 12.4+.
Checklist