-
Notifications
You must be signed in to change notification settings - Fork 828
[NFC][DO NOT MERGE] Add scratch design for sycl-non-rdc- mode in NewOffloadModel #21858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
maksimsab
wants to merge
1
commit into
intel:sycl
Choose a base branch
from
maksimsab:sycl-non-rdc-design
base: sycl
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| # Problem: `-fno-sycl-rdc` Support for New Offload Model | ||
|
|
||
| ## Context | ||
|
|
||
| The new SYCL offload model (`--offload-new-driver`) currently treats all device code as RDC (Relocatable Device Code): all TUs are merged in `clang-linker-wrapper` before post-link. `-fno-sycl-rdc` is a high-priority gap (P0) that enables per-TU self-contained device images, important for reducing link time, shipping precompiled device libraries, and scenarios where `SYCL_EXTERNAL` is not used. | ||
|
|
||
| The old model implements this via `shouldDoPerObjectFileLinking()` + per-TU device link steps during the compile phase in the driver. This plan implements the **linker-wrapper-side approach** as it is simpler, consistent with the new model's architecture, and can be done incrementally. | ||
|
|
||
| ## Approach: Linker-Wrapper Non-RDC Mode | ||
|
|
||
| In non-RDC mode, instead of merging all device inputs into one module before post-link, each input file is processed independently through the full pipeline (device lib linking → post-link → SPIR-V/AOT), and all resulting images are wrapped together at the end. | ||
|
|
||
| ### Current RDC flow in `linkAndWrapDeviceFiles()`: | ||
| ``` | ||
| [TU1.bc, TU2.bc, ...] → link all input device code → ONE_MERGED.bc | ||
| → link device libs → ONE_MERGED_WITH_LIBS.bc | ||
| → sycl-post-link → [split0, split1, ...] | ||
| → per-split codegen (SPIR-V, AOT) | ||
| → runWrapperAndCompile() → wrapper.o | ||
| ``` | ||
|
|
||
| ### Non-RDC flow: | ||
| ``` | ||
| For each TU.bc independently: | ||
| TU.bc → link device libs → TU_WITH_LIBS.bc | ||
| → sycl-post-link → [split0, ...] | ||
| → per-split codegen (SPIR-V, AOT) | ||
| → append to AllSplitModules | ||
| runWrapperAndCompile(AllSplitModules) → wrapper.o (one object containing many images) | ||
| ``` | ||
|
|
||
| **Implementation note:** | ||
| * Refactor the existing codegen loop (per-split SPIR-V/AOT) into a helper lambda/function so it can be reused for both RDC and non-RDC paths without duplication. | ||
| * clang-linker-wrapper already contains a suitable command line flag `--relocatable` from upstream which should be reused. It means that Clang Driver should start pass `--relocatable` for the existing RDC mode. Later, an absence of `--relocatable` flag will imply non-rdc sycl mode. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
|
||
| **Implementation note on device lib extraction:** The per-TU device lib linking in non-RDC mode needs to extract compatible device lib files from the device library list. This logic can be factored out of `sycl::linkDevice()` into a helper (e.g., `sycl::collectDeviceLibFiles()`) shared by both paths. | ||
|
|
||
| ## Post-Link Split Mode in Non-RDC | ||
|
|
||
| In non-RDC mode, each module is already a single TU. The split mode should respect the user's `-fsycl-device-code-split` option, but default to `SPLIT_NONE` (equivalent to per-source, since each TU is already separate). | ||
|
|
||
| ## Error Handling | ||
|
|
||
| Following the old model, add validation: when `-fno-sycl-rdc` is used with `--offload-new-driver`, emit an error if AOT target is missing (consistent with `err_drv_no_rdc_sycl_target_missing` in the old model). This check belongs in `Clang.cpp` or `Driver.cpp` near the `--sycl-no-rdc` propagation. | ||
|
|
||
| ## Key Constraints / Non-Goals | ||
|
|
||
| - `SYCL_EXTERNAL` across TUs is disallowed with `-fno-sycl-rdc` (already enforced by frontend, unchanged) | ||
| - SYCLBIN packaging in non-RDC mode: follows same wrapping path; no separate changes needed | ||
| - Non-RDC in `--fsycl-device-only` mode: skip linker wrapper; driver-side packaging unchanged | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The driver changes are not covered by this proposal but based on the provided information I conclude that you propose clang-linker-wrapper to perform offload code linking during "link step". I might be wrong, but I think one of the requirements for
-fno-gpu-rdcis to perform offload code linking at compile step. Please, check if this such requirement.Anyway, this is big step away from the current SYCL workflow and upstream driver workflow for non-SYCL mode.
-fno-gpu-rdcalways links offload executable at compile step, not at the link step.