
[cccl.c] Introduce minimal compilation support for host freestanding mode #8437

Merged
shwina merged 3 commits into NVIDIA:main from shwina:clangjit-minimal-infra
Apr 24, 2026

@shwina
Contributor

@shwina shwina commented Apr 15, 2026

Description

Closes #8401

This PR introduces the minimal infrastructure for compiling and testing CCCL in "host freestanding" mode. The infrastructure being introduced suffices to compile the following code using the included JIT compiler:

std::string source = R"(
__global__ void device(int* ptr) {
  *ptr = 42;
}
void host(int* ptr) {
  device<<<1, 1>>>(ptr);
}
)";

A CI job has been added which builds the host JIT infrastructure for a specific combination of arch/compiler/CUDA. Eventually, we will want the entire matrix to build with host JIT, but during initial iteration this is sufficient:

[Screenshot of the CI job, captured 2026-04-24 at 7:54:09 AM]

Note that an important consideration/blocker is that nvfatbin (a dependency for host JIT) is available only on CUDA 12.4+.
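The version constraint above can be sketched as a compile-time check. This is only an illustration, not code from the PR: `encode_cuda_version` and `host_jit_available` are hypothetical names, and the only fact relied on is that the `CUDA_VERSION` / `CUDART_VERSION` macros encode a release as `major * 1000 + minor * 10`.

```cpp
// Hypothetical helpers, not part of CCCL or the CUDA toolkit.
// CUDA encodes its version as major * 1000 + minor * 10 (12.4 -> 12040),
// the same encoding used by the CUDA_VERSION / CUDART_VERSION macros.
constexpr int encode_cuda_version(int major, int minor)
{
  return major * 1000 + minor * 10;
}

// Per the PR description, nvfatbin is available only on CUDA 12.4+,
// so a host JIT build needs at least that toolkit version.
constexpr bool host_jit_available(int cuda_version)
{
  return cuda_version >= encode_cuda_version(12, 4);
}

static_assert(host_jit_available(12040), "CUDA 12.4 satisfies the requirement");
static_assert(!host_jit_available(12030), "CUDA 12.3 does not");
```

In a real build the argument would presumably come from `CUDART_VERSION` or from the CMake toolkit version rather than a literal.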

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Contributor

copy-pr-bot Bot commented Apr 15, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL Apr 15, 2026
@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 238c422

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 238c422

@shwina shwina force-pushed the clangjit-minimal-infra branch from 238c422 to 3979052 Compare April 15, 2026 13:36
@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 3979052

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 3979052

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test github.com//pull/8437/commits/dd0aed4fefc65b68c378a6c0b38762866578090d

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test dd0aed4

Contributor

Q: Why do we need this header? Why is <cuda/std/execution> not enough?

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test d9e65a1

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 3a9db84

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test efd5bdf

@shwina
Contributor Author

shwina commented Apr 15, 2026

/ok to test 8969631

@shwina
Contributor Author

shwina commented Apr 16, 2026

Things left to be done here:

@shwina shwina force-pushed the clangjit-minimal-infra branch from 8969631 to 7136dc0 Compare April 17, 2026 21:11
@shwina shwina changed the title from "[cccl.c] Introduce host freestanding mode and minimal compilation support for it" to "[cccl.c] Introduce minimal compilation support for host freestanding mode" Apr 17, 2026
@shwina
Contributor Author

shwina commented Apr 17, 2026

/ok to test 7136dc0

@shwina
Contributor Author

shwina commented Apr 20, 2026

/ok to test 079740b

@shwina shwina marked this pull request as ready for review April 20, 2026 14:57
@shwina shwina requested review from a team as code owners April 20, 2026 14:57
@shwina shwina requested a review from a team as a code owner April 20, 2026 14:57
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL Apr 20, 2026
Contributor Author

Drop codegen altogether

Contributor Author

Rename "clangjit" to something else like "hostjit".

Comment thread ci/matrix.yaml Outdated
# Eventually we will want building with ClangJIT to be the
# default, and will do it across the entire matrix. Currently
# blocked on libnvfatbin availability on Windows containers, and for CUDA <12.4.
- {jobs: ['test'], project: 'cccl_c_parallel_clangjit', ctk: '13.X', cxx: ['gcc13'], gpu: 'rtx2080'}
Contributor Author

Add corresponding entry in project_files_and_dependencies.yml

Contributor Author

Ensure that libcudacxx and c_parallel_internal are dependencies of it.

@shwina shwina force-pushed the clangjit-minimal-infra branch from 079740b to b3e8366 Compare April 20, 2026 17:25
endif()

# Link against LLVM/Clang/LLD
target_link_libraries(
Contributor Author

It might be worth installing libclang/llvm in the devcontainer if build times are too long.

Contributor Author

We shouldn't need any libcudacxx changes.

@shwina shwina force-pushed the clangjit-minimal-infra branch from b3e8366 to 0aca07f Compare April 21, 2026 10:12
#ifndef _HOSTJIT_CLIMITS
#define _HOSTJIT_CLIMITS

#include "limits.h"
Contributor

We should always avoid relative includes. Please add the proper include path and use <limits.h>. This applies throughout.

Comment on lines +7 to +8


Contributor

Suggested change

Comment on lines +3 to +11
#include "cstddef"
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
#define RAND_MAX 2147483647
extern "C" {
void* malloc(size_t); void* calloc(size_t, size_t);
void* realloc(void*, size_t); void free(void*);
void abort(void); void exit(int); void _Exit(int);
}
Contributor

Suggested change
#include "cstddef"
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
#define RAND_MAX 2147483647
extern "C" {
void* malloc(size_t); void* calloc(size_t, size_t);
void* realloc(void*, size_t); void free(void*);
void abort(void); void exit(int); void _Exit(int);
}
#include "cstddef"
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
#define RAND_MAX 2147483647
extern "C" {
void* malloc(size_t); void* calloc(size_t, size_t);
void* realloc(void*, size_t); void free(void*);
void abort(void); void exit(int); void _Exit(int);
}

// resolve to libcudacxx/include/cuda/std/limits, which cascades through
// numeric_limits, bit_cast, popcount, etc. — incompatible with freestanding.
//
// This stub (found first on -internal-isystem) stops that cascade, providing
Contributor

@gevtushenko Why do we need those? The cuda::std ones should work perfectly fine.

Collaborator

@miscco you can try dropping it to check, I don't remember

Contributor Author

@shwina shwina Apr 23, 2026

We can't drop the limits or utility stubs. That makes clang include the libcudacxx versions of those headers, which are not host freestanding. (I tried empirically and got runtime errors).
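The "found first" behaviour described here can be modelled in a few lines. This is a toy sketch, not how clang is implemented: `Directory` and `resolve_include` are hypothetical names, and the directory called `hostjit_stubs` is made up for illustration. It only shows why a stub placed earlier on the search path shadows the libcudacxx header of the same name.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model of an include search path: each entry is a directory name plus
// the set of header names it contains. All names here are illustrative.
using Directory = std::pair<std::string, std::set<std::string>>;

// Walk the search path in order and return the first directory that has the
// requested header, mimicking "first match wins" include resolution.
std::optional<std::string> resolve_include(const std::vector<Directory>& search_path,
                                           const std::string& header)
{
  for (const auto& [dir, headers] : search_path)
  {
    if (headers.count(header) != 0)
    {
      return dir + "/" + header;
    }
  }
  return std::nullopt; // header not found anywhere on the path
}
```

With a path of `hostjit_stubs` followed by `libcudacxx/include/cuda/std`, a request for `limits` resolves to the stub, while a header that only exists in the second directory still resolves there. Removing the stub entry makes the same request fall through to the libcudacxx header, which is the cascade the comment describes.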

#define HUGE_VALL __builtin_huge_vall()
#define INFINITY __builtin_inff()
#define NAN __builtin_nanf("")
#define MATH_ERRNO 1
Contributor

We already define a bunch of those internally in fp_classify.h and cmath

We should see whether we can just reuse those or move them to a different header
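For context, the macros under discussion rely on compiler builtins rather than a hosted `<cmath>`. The sketch below assumes GCC or Clang, which both provide `__builtin_inff`, `__builtin_nanf` and `__builtin_huge_vall`; the `SKETCH_` prefix and the two helper functions are hypothetical and exist only to avoid clashing with the real macros.

```cpp
// Freestanding-style definitions built on GCC/Clang builtins.
// The SKETCH_ prefix is hypothetical and avoids clashing with <cmath> macros.
#define SKETCH_INFINITY  __builtin_inff()
#define SKETCH_NAN       __builtin_nanf("")
#define SKETCH_HUGE_VALL __builtin_huge_vall()

// NaN is the only floating-point value that compares unequal to itself.
inline bool sketch_isnan(float x)
{
  return x != x;
}

// Infinity compares equal to itself, in either sign.
inline bool sketch_isinf(float x)
{
  return x == SKETCH_INFINITY || x == -SKETCH_INFINITY;
}
```

Whether the existing definitions in fp_classify.h and cmath can replace the stub macros is exactly the open question raised in this comment; the sketch does not answer it.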


namespace std {

template <typename _Tp> struct remove_reference { using type = _Tp; };
Contributor

Why do we need this when we have a fully working remove_reference at home?

Collaborator

@miscco please try using it. The more things you can drop from this PR the better. The only criterion is that the test keeps passing.

Contributor

Let's keep it as is and file an issue when merged.

Comment on lines +22 to +32
template <typename _Tp>
__host__ __device__ constexpr _Tp&&
forward(remove_reference_t<_Tp>& __t) noexcept { return static_cast<_Tp&&>(__t); }

template <typename _Tp>
__host__ __device__ constexpr _Tp&&
forward(remove_reference_t<_Tp>&& __t) noexcept { return static_cast<_Tp&&>(__t); }

template <typename _Tp>
__host__ __device__ constexpr remove_reference_t<_Tp>&&
move(_Tp&& __t) noexcept { return static_cast<remove_reference_t<_Tp>&&>(__t); }
Contributor

We should just include the necessary headers, or are those required to be in namespace std?
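To make the behaviour of the quoted stubs concrete, here is a plain C++ sketch of the same `forward` and `move` definitions. It lives in a hypothetical `stub` namespace rather than `std`, and it drops the `__host__ __device__` annotations so that it compiles without CUDA. The `category` and `relay` helpers are additions for illustration only.

```cpp
namespace stub
{
// Minimal remove_reference, including the reference specializations.
template <typename T> struct remove_reference      { using type = T; };
template <typename T> struct remove_reference<T&>  { using type = T; };
template <typename T> struct remove_reference<T&&> { using type = T; };
template <typename T> using remove_reference_t = typename remove_reference<T>::type;

// Same shape as the stub under review, minus the CUDA annotations.
template <typename T>
constexpr T&& forward(remove_reference_t<T>& t) noexcept
{
  return static_cast<T&&>(t);
}

template <typename T>
constexpr remove_reference_t<T>&& move(T&& t) noexcept
{
  return static_cast<remove_reference_t<T>&&>(t);
}

// Illustrative helpers: report which value category reached the callee.
constexpr int category(int&)  { return 1; } // lvalue
constexpr int category(int&&) { return 2; } // rvalue

template <typename T>
constexpr int relay(T&& t)
{
  return category(stub::forward<T>(t));
}
} // namespace stub
```

Calling `stub::relay` with a named `int` selects the lvalue overload, while calling it with a temporary, or passing `stub::move(x)` straight to `category`, selects the rvalue overload. That is the perfect-forwarding behaviour the stub has to preserve, whichever namespace it ends up in.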

@shwina shwina force-pushed the clangjit-minimal-infra branch from ba50f21 to 55f9365 Compare April 21, 2026 15:38
@shwina shwina force-pushed the clangjit-minimal-infra branch from 55f9365 to 597e259 Compare April 23, 2026 10:46
@shwina shwina enabled auto-merge (squash) April 23, 2026 10:47
@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 23m: Pass: 100%/480 | Total: 4d 14h | Max: 1h 23m | Hits: 94%/569464

See results here.

std::unordered_map<std::string, std::string> macro_definitions; // key=macro name, value=macro value (empty for flag
// macros)
int sm_version = 70;
int optimization_level = 2;
Contributor

Suggestion: use 3 instead

std::vector<std::string> device_bitcode_files; // Paths to .bc files to link into device code
std::unordered_map<std::string, std::string> macro_definitions; // key=macro name, value=macro value (empty for flag
// macros)
int sm_version = 70;
Contributor

Suggestion: should probably be 75

Contributor

Absolutely! We do not support anything < sm75 officially.
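One way to picture how these config fields could feed a clang invocation is sketched below. The struct name `JitConfigSketch` and the function `build_flags` are hypothetical; the PR excerpt does not show how the real config is turned into arguments. The only external facts assumed are that clang accepts `--cuda-gpu-arch=sm_XX`, `-O<n>` and `-D<name>[=<value>]`. The default of 75 follows the reviewers' suggestion rather than the value in the quoted snippet.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative config mirroring the fields quoted in the review.
// The struct name and this subset of fields are assumptions.
struct JitConfigSketch
{
  // key = macro name, value = macro value (empty for flag macros)
  std::unordered_map<std::string, std::string> macro_definitions;
  int sm_version         = 75; // reviewers: nothing below sm75 is supported
  int optimization_level = 2;
};

// Translate the config into clang-style command-line flags.
std::vector<std::string> build_flags(const JitConfigSketch& cfg)
{
  if (cfg.sm_version < 75)
  {
    throw std::invalid_argument("sm_version below 75 is not supported");
  }
  std::vector<std::string> flags;
  flags.push_back("--cuda-gpu-arch=sm_" + std::to_string(cfg.sm_version));
  flags.push_back("-O" + std::to_string(cfg.optimization_level));
  for (const auto& [name, value] : cfg.macro_definitions)
  {
    // Flag macros have no value, so emit -DNAME instead of -DNAME=VALUE.
    flags.push_back(value.empty() ? "-D" + name : "-D" + name + "=" + value);
  }
  return flags;
}
```

Rejecting architectures below 75 at this point is a design choice of the sketch; the PR discussion only settles the default value.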

return 1;
}

int* d_ptr = nullptr;
Contributor

Suggestion: use pointer_t

@shwina shwina merged commit cb93dee into NVIDIA:main Apr 24, 2026
552 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Apr 24, 2026
shwina added a commit to shwina/cccl that referenced this pull request Apr 30, 2026
…mode (NVIDIA#8437)

* HostJIT minimal infra

* Fix?

* Apply stylistic changes

---------

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

[FEA]: Minimal compiler infrastructure for running freestanding tests

5 participants