2020== Notice
2121
2222[%hardbreaks]
23- Copyright (c) 2023-2023 Intel Corporation. All rights reserved.
23+ Copyright (c) 2023 Intel Corporation. All rights reserved.
2424
2525Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
2626of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
@@ -43,7 +43,7 @@ This extension is written against the SYCL 2020 specification, Revision 6 and
4343the following extensions:
4444
4545- link:../experimental/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
46- - link:sycl_ext_oneapi_kernel_properties. asciidoc[sycl_ext_oneapi_kernel_properties ]
46+ - link:../experimental/sycl_ext_oneapi_enqueue_functions. asciidoc[sycl_ext_oneapi_enqueue_functions ]
4747
4848== Status
4949
@@ -57,9 +57,10 @@ specification.*
5757== Overview
5858
5959There are devices where the same hardware resources are used for shared local
60- memory (SLM) and L1 data cache. Developers may want more L1 data cache or SLM based
61- on their application. This extension adds runtime kernel property `cache_config`
62- which provides a way to set the preferred cache configuration for a kernel.
60+ memory (SLM) and L1 data cache. Developers may want more L1 data cache or SLM
61+ based on their application. This extension adds a kernel launch property
62+ `cache_config` which provides a way to set the preferred cache configuration
63+ for a kernel invocation.
6364
6465=== Feature Test Macro
6566
@@ -80,15 +81,27 @@ supports.
8081
8182=== Introduction
8283
83- This extension introduces new kernel property that can be applied to kernels
84- using the mechanism defined in link:sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_kernel_properties].
84+ This extension introduces a new kernel launch property that specifies the
85+ preferred cache configuration for a single kernel invocation. The recommended
86+ way to apply the property is to pass it through a `launch_config` object to
87+ one of the enqueue free functions defined in
88+ link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
89+ sycl_ext_oneapi_enqueue_functions] (such as `parallel_for` or `nd_launch`).
90+ The property is applicable both to kernels expressed as SYCL lambdas/function
91+ objects and to free function kernels defined by
92+ link:../experimental/sycl_ext_oneapi_free_function_kernels.asciidoc[
93+ sycl_ext_oneapi_free_function_kernels].
94+
95+ Because `cache_config` is a launch-time property rather than a compile-time
96+ kernel property, it is not embedded into the kernel's type and the cache
97+ configuration can be chosen independently for each invocation of the same
98+ kernel.
8599
86100=== Cache Config Property
87101
88102```c++
89103namespace sycl::ext::intel::experimental {
90104
91-
92105enum class cache_config_enum : /*unspecified*/ { large_slm, large_data };
93106
94107inline constexpr cache_config_enum large_slm =
@@ -105,71 +118,122 @@ struct cache_config {
105118```
106119
107120The `cache_config` property provides a way to set the preferred cache
108- configuration for a kernel. The following values are supported:
121+ configuration for a kernel invocation . The following values are supported:
109122
110- * `large_slm`: Prefer having larger shared local memory and smaller L1 data cache.
111- In this case driver will ensure that all workgroups will have enough
112- SLM to run.
123+ * `large_slm`: Prefer having larger shared local memory and smaller L1 data
124+ cache. In this case the driver will ensure that all workgroups
125+ will have enough SLM to run.
113126
114- * `large_data`: Prefer having larger L1 data cache and smaller shared local memory.
115- In this case SLM size may be shrinked (which may result in workgroups
116- spawning as there will be not enough SLM to handle multiple workgroups)
117- but L1 data cache will be bigger. There may be rare use cases when this
118- is beneficial.
127+ * `large_data`: Prefer having larger L1 data cache and smaller shared local
128+ memory. In this case SLM size may be shrunk (which may result
129+ in workgroups spawning as there will be not enough SLM to
130+ handle multiple workgroups) but L1 data cache will be bigger.
131+ There may be rare use cases when this is beneficial.
119132
120- These property may be passed to any kernel invocation function (e.g.
121- `parallel_for`) via the properties parameter . At most, only one of these
122- values may be passed to any single kernel invocation function .
133+ The `cache_config` property must be passed to a kernel invocation through a
134+ `launch_config` object . At most one `cache_config` value may be passed to
135+ any single kernel invocation.
123136
124137Backends that do not support this extension may accept and ignore this
125138property.
126139
127- === Adding a Property List to a Kernel Launch
140+ === Usage Examples
128141
129- A simple example of using this extension is shown below.
142+ Simple examples of using this extension are shown below.
130143
131- The example assumes that the kernel will benefit from large SLM and hence uses the property
132- `cache_config_large_slm`:
144+ ==== parallel_for and nd_launch
133145
134146```c++
135- using namespace sycl::ext::intel::experimental;
136- {
137- ...
138- properties kernel_properties{cache_config{large_slm}};
139-
140- q.single_task(kernel_properties, [=] {
141- *a = *b + *c;
142- }).wait();
147+ #include <sycl/sycl.hpp>
148+
149+ namespace syclex = sycl::ext::oneapi::experimental;
150+ namespace intelex = sycl::ext::intel::experimental;
151+
152+ int main() {
153+ sycl::queue q;
154+ syclex::properties props{intelex::cache_config{intelex::large_slm}};
155+
156+ // Range-based launch.
157+ syclex::parallel_for(q,
158+ syclex::launch_config{sycl::range<1>{1024}, props},
159+ [=](sycl::item<1> i) { /* kernel code */ });
160+
161+ // nd_range-based launch.
162+ syclex::nd_launch(q,
163+ syclex::launch_config{sycl::nd_range<1>{{1024}, {64}}, props},
164+ [=](sycl::nd_item<1> i) { /* kernel code */ });
165+
166+ q.wait();
143167}
144168```
145169
146- === Embedding Property into a Kernel
170+ ==== Free function kernels
147171
148- The example below shows how the kernel from the previous section could be
149- rewritten to leverage an embedded property list (see link:sycl_ext_oneapi_kernel_properties.asciidoc#embedding-properties-into-a-kernel[embedding-properties-into-a-kernel]) :
172+ For a kernel declared with the free function kernel syntax, the property is
173+ passed to `nd_launch` via the same `launch_config` mechanism :
150174
151175```c++
152- using namespace sycl::ext::intel::experimental;
153- struct KernelFunctor {
176+ #include <sycl/sycl.hpp>
154177
155- KernelFunctor(int* a, int* b, int* c) : a(a), b(b), c(c) {}
178+ namespace syclex = sycl::ext::oneapi::experimental;
179+ namespace intelex = sycl::ext::intel::experimental;
156180
157- void operator()() const {
158- *a = *b + *c;
159- }
181+ SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<1>))
182+ void my_kernel(float *ptr) { /* kernel code */ }
160183
161- auto get(properties_tag) const {
162- return properties{cache_config{large_slm}} ;
163- }
184+ int main() {
185+ sycl::queue q ;
186+ float *ptr = sycl::malloc_shared<float>(1024, q);
164187
165- int* a;
166- int* b;
167- int* c;
168- };
188+ syclex::launch_config cfg{
189+ sycl::nd_range<1>{{1024}, {64}},
190+ syclex::properties{intelex::cache_config{intelex::large_slm}}
191+ };
192+ syclex::nd_launch(q, cfg, syclex::kernel_function<my_kernel>, ptr);
193+
194+ q.wait();
195+ }
196+ ```
197+
198+ ==== Different cache configurations for different launches of the same kernel
169199
170- ...
200+ Because `cache_config` is a runtime property, different invocations of the
201+ same kernel can use different cache configurations.
171202
172- q.single_task(KernelFunctor{a, b, c}).wait();
203+ ```c++
204+ #include <sycl/sycl.hpp>
205+
206+ namespace syclex = sycl::ext::oneapi::experimental;
207+ namespace intelex = sycl::ext::intel::experimental;
208+
209+ SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclex::nd_range_kernel<1>))
210+ void my_kernel(float *ptr) { /* kernel code */ }
211+
212+ int main() {
213+ sycl::queue q;
214+ sycl::context ctxt = q.get_context();
215+ float *ptr = sycl::malloc_shared<float>(1024, q);
216+
217+ auto bundle =
218+ syclex::get_kernel_bundle<my_kernel, sycl::bundle_state::executable>(ctxt);
219+ sycl::kernel k = bundle.ext_oneapi_get_kernel<my_kernel>();
220+
221+ sycl::nd_range<1> ndr{{1024}, {64}};
222+
223+ // First launch with large SLM.
224+ syclex::nd_launch(q,
225+ syclex::launch_config{ndr,
226+ syclex::properties{intelex::cache_config{intelex::large_slm}}},
227+ k, ptr);
228+
229+ // Second launch of the same kernel with large data cache.
230+ syclex::nd_launch(q,
231+ syclex::launch_config{ndr,
232+ syclex::properties{intelex::cache_config{intelex::large_data}}},
233+ k, ptr);
234+
235+ q.wait();
236+ }
173237```
174238
175239== Revision History
@@ -180,4 +244,5 @@ q.single_task(KernelFunctor{a, b, c}).wait();
180244|========================================
181245|Rev|Date|Author|Changes
182246|1|2022-03-01|Artur Gainullin|*Initial public working draft*
247+ |2|2026-04-24|Artur Gainullin|*Clarify `cache_config` as a kernel launch property; remove embedding via `get(properties_tag)`*
183248|========================================
0 commit comments