Skip to content

Commit 367e6ce

Browse files
authored
[SYCL] Clarify cache_config as a kernel launch property (#21872)
Update the sycl_ext_intel_cache_config specification to state explicitly that cache_config is a runtime kernel launch property: it must be passed to the kernel invocation via launch_config and must not be embedded in a kernel functor via get(properties_tag). It looks like property-list overloads for single_task/parallel_for are deprecated, so I have removed those from examples in the extension doc. Instead added examples with launch_config. Also fixed line wrapping to 80 columns. Updated e2e test accordingly and added test cases for the enqueue free functions (parallel_for, nd_launch) from sycl_ext_oneapi_enqueue_functions with a launch_config (including free function kernel case). Kept coverage for deprecated APIs (just removed embeddings into kernel type). Assisted-by: Claude
1 parent 0cdeea0 commit 367e6ce

2 files changed

Lines changed: 244 additions & 100 deletions

File tree

sycl/doc/extensions/experimental/sycl_ext_intel_cache_config.asciidoc

Lines changed: 115 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
== Notice
2121

2222
[%hardbreaks]
23-
Copyright (c) 2023-2023 Intel Corporation. All rights reserved.
23+
Copyright (c) 2023 Intel Corporation. All rights reserved.
2424

2525
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
2626
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
@@ -43,7 +43,7 @@ This extension is written against the SYCL 2020 specification, Revision 6 and
4343
the following extensions:
4444

4545
- link:../experimental/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
46-
- link:sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_kernel_properties]
46+
- link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[sycl_ext_oneapi_enqueue_functions]
4747

4848
== Status
4949

@@ -57,9 +57,10 @@ specification.*
5757
== Overview
5858

5959
There are devices where the same hardware resources are used for shared local
60-
memory (SLM) and L1 data cache. Developers may want more L1 data cache or SLM based
61-
on their application. This extension adds runtime kernel property `cache_config`
62-
which provides a way to set the preferred cache configuration for a kernel.
60+
memory (SLM) and L1 data cache. Developers may want more L1 data cache or SLM
61+
based on their application. This extension adds a kernel launch property
62+
`cache_config` which provides a way to set the preferred cache configuration
63+
for a kernel invocation.
6364

6465
=== Feature Test Macro
6566

@@ -80,15 +81,27 @@ supports.
8081

8182
=== Introduction
8283

83-
This extension introduces new kernel property that can be applied to kernels
84-
using the mechanism defined in link:sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_kernel_properties].
84+
This extension introduces a new kernel launch property that specifies the
85+
preferred cache configuration for a single kernel invocation. The recommended
86+
way to apply the property is to pass it through a `launch_config` object to
87+
one of the enqueue free functions defined in
88+
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
89+
sycl_ext_oneapi_enqueue_functions] (such as `parallel_for` or `nd_launch`).
90+
The property is applicable both to kernels expressed as SYCL lambdas/function
91+
objects and to free function kernels defined by
92+
link:../experimental/sycl_ext_oneapi_free_function_kernels.asciidoc[
93+
sycl_ext_oneapi_free_function_kernels].
94+
95+
Because `cache_config` is a launch-time property rather than a compile-time
96+
kernel property, it is not embedded into the kernel's type and the cache
97+
configuration can be chosen independently for each invocation of the same
98+
kernel.
8599

86100
=== Cache Config Property
87101

88102
```c++
89103
namespace sycl::ext::intel::experimental {
90104

91-
92105
enum class cache_config_enum : /*unspecified*/ { large_slm, large_data };
93106

94107
inline constexpr cache_config_enum large_slm =
@@ -105,71 +118,122 @@ struct cache_config {
105118
```
106119

107120
The `cache_config` property provides a way to set the preferred cache
108-
configuration for a kernel. The following values are supported:
121+
configuration for a kernel invocation. The following values are supported:
109122

110-
* `large_slm`: Prefer having larger shared local memory and smaller L1 data cache.
111-
In this case driver will ensure that all workgroups will have enough
112-
SLM to run.
123+
* `large_slm`: Prefer having larger shared local memory and smaller L1 data
124+
cache. In this case the driver will ensure that all workgroups
125+
will have enough SLM to run.
113126

114-
* `large_data`: Prefer having larger L1 data cache and smaller shared local memory.
115-
In this case SLM size may be shrinked (which may result in workgroups
116-
spawning as there will be not enough SLM to handle multiple workgroups)
117-
but L1 data cache will be bigger. There may be rare use cases when this
118-
is beneficial.
127+
* `large_data`: Prefer having larger L1 data cache and smaller shared local
128+
memory. In this case SLM size may be shrunk (which may result
129+
in workgroups spawning as there will be not enough SLM to
130+
handle multiple workgroups) but L1 data cache will be bigger.
131+
There may be rare use cases when this is beneficial.
119132

120-
These property may be passed to any kernel invocation function (e.g.
121-
`parallel_for`) via the properties parameter. At most, only one of these
122-
values may be passed to any single kernel invocation function.
133+
The `cache_config` property must be passed to a kernel invocation through a
134+
`launch_config` object. At most one `cache_config` value may be passed to
135+
any single kernel invocation.
123136

124137
Backends that do not support this extension may accept and ignore this
125138
property.
126139

127-
=== Adding a Property List to a Kernel Launch
140+
=== Usage Examples
128141

129-
A simple example of using this extension is shown below.
142+
Simple examples of using this extension are shown below.
130143

131-
The example assumes that the kernel will benefit from large SLM and hence uses the property
132-
`cache_config_large_slm`:
144+
==== parallel_for and nd_launch
133145

134146
```c++
135-
using namespace sycl::ext::intel::experimental;
136-
{
137-
...
138-
properties kernel_properties{cache_config{large_slm}};
139-
140-
q.single_task(kernel_properties, [=] {
141-
*a = *b + *c;
142-
}).wait();
147+
#include <sycl/sycl.hpp>
148+
149+
namespace syclex = sycl::ext::oneapi::experimental;
150+
namespace intelex = sycl::ext::intel::experimental;
151+
152+
int main() {
153+
sycl::queue q;
154+
syclex::properties props{intelex::cache_config{intelex::large_slm}};
155+
156+
// Range-based launch.
157+
syclex::parallel_for(q,
158+
syclex::launch_config{sycl::range<1>{1024}, props},
159+
[=](sycl::item<1> i) { /* kernel code */ });
160+
161+
// nd_range-based launch.
162+
syclex::nd_launch(q,
163+
syclex::launch_config{sycl::nd_range<1>{{1024}, {64}}, props},
164+
[=](sycl::nd_item<1> i) { /* kernel code */ });
165+
166+
q.wait();
143167
}
144168
```
145169

146-
=== Embedding Property into a Kernel
170+
==== Free function kernels
147171

148-
The example below shows how the kernel from the previous section could be
149-
rewritten to leverage an embedded property list (see link:sycl_ext_oneapi_kernel_properties.asciidoc#embedding-properties-into-a-kernel[embedding-properties-into-a-kernel]):
172+
For a kernel declared with the free function kernel syntax, the property is
173+
passed to `nd_launch` via the same `launch_config` mechanism:
150174

151175
```c++
152-
using namespace sycl::ext::intel::experimental;
153-
struct KernelFunctor {
176+
#include <sycl/sycl.hpp>
154177

155-
KernelFunctor(int* a, int* b, int* c) : a(a), b(b), c(c) {}
178+
namespace syclex = sycl::ext::oneapi::experimental;
179+
namespace intelex = sycl::ext::intel::experimental;
156180

157-
void operator()() const {
158-
*a = *b + *c;
159-
}
181+
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<1>))
182+
void my_kernel(float *ptr) { /* kernel code */ }
160183

161-
auto get(properties_tag) const {
162-
return properties{cache_config{large_slm}};
163-
}
184+
int main() {
185+
sycl::queue q;
186+
float *ptr = sycl::malloc_shared<float>(1024, q);
164187

165-
int* a;
166-
int* b;
167-
int* c;
168-
};
188+
syclex::launch_config cfg{
189+
sycl::nd_range<1>{{1024}, {64}},
190+
syclex::properties{intelex::cache_config{intelex::large_slm}}
191+
};
192+
syclex::nd_launch(q, cfg, syclex::kernel_function<my_kernel>, ptr);
193+
194+
q.wait();
195+
}
196+
```
197+
198+
==== Different cache configurations for different launches of the same kernel
169199

170-
...
200+
Because `cache_config` is a runtime property, different invocations of the
201+
same kernel can use different cache configurations.
171202

172-
q.single_task(KernelFunctor{a, b, c}).wait();
203+
```c++
204+
#include <sycl/sycl.hpp>
205+
206+
namespace syclex = sycl::ext::oneapi::experimental;
207+
namespace intelex = sycl::ext::intel::experimental;
208+
209+
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<1>))
210+
void my_kernel(float *ptr) { /* kernel code */ }
211+
212+
int main() {
213+
sycl::queue q;
214+
sycl::context ctxt = q.get_context();
215+
float *ptr = sycl::malloc_shared<float>(1024, q);
216+
217+
auto bundle =
218+
syclex::get_kernel_bundle<my_kernel, sycl::bundle_state::executable>(ctxt);
219+
sycl::kernel k = bundle.ext_oneapi_get_kernel<my_kernel>();
220+
221+
sycl::nd_range<1> ndr{{1024}, {64}};
222+
223+
// First launch with large SLM.
224+
syclex::nd_launch(q,
225+
syclex::launch_config{ndr,
226+
syclex::properties{intelex::cache_config{intelex::large_slm}}},
227+
k, ptr);
228+
229+
// Second launch of the same kernel with large data cache.
230+
syclex::nd_launch(q,
231+
syclex::launch_config{ndr,
232+
syclex::properties{intelex::cache_config{intelex::large_data}}},
233+
k, ptr);
234+
235+
q.wait();
236+
}
173237
```
174238

175239
== Revision History
@@ -180,4 +244,5 @@ q.single_task(KernelFunctor{a, b, c}).wait();
180244
|========================================
181245
|Rev|Date|Author|Changes
182246
|1|2022-03-01|Artur Gainullin|*Initial public working draft*
247+
|2|2026-04-24|Artur Gainullin|*Clarify `cache_config` as a kernel launch property; remove embedding via `get(properties_tag)`*
183248
|========================================

0 commit comments

Comments
 (0)