Commit 9fff1a8

spec source for OpenCL 3.1 (#1554)
1 parent 154cb2b commit 9fff1a8

30 files changed

Lines changed: 1079 additions & 599 deletions

.gitlab-ci.yml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# Copyright 2018-2025 The Khronos Group Inc.
+
+# Gitlab CI file for OpenCL specs
+
+# All stages use the same Docker image, so there are no prerequisites
+# Refer to the container by its SHA instead of the name, to prevent
+# caching problems when updating the image.
+# image: khronosgroup/docker-images:asciidoctor-spec.202506
+# There are no 'before_script' tags for most stages, because all
+# prerequisites are in the default image.
+image: khronosgroup/docker-images@sha256:0f91e60e1af2bdd889783af3907f63279c08f573f2eccbc31094e348d1a32a4f
+
+# Specify which gitlab runner to use
+default:
+  tags:
+    - khrmedium
+
+# Build the OpenCL specification
+spec-generate:
+  stage: build
+  script:
+    - make -C xml validate
+    - python3 makeSpec -clean -spec core OUTDIR=out.core -j -O api c env
+    - python3 makeSpec -clean -spec khr+ext OUTDIR=out.ext -j -O api c env
+    - python3 makeSpec -clean -spec khr+ext OUTDIR=out.refpages -j -O manhtmlpages
+  artifacts:
+    when: always
+    paths:
+      - out.core/
+      - out.ext/
+      - out.refpages/
+    expire_in: 1 week
+
+spec-deploy-pages:
+  stage: deploy
+  script:
+    - mkdir -p public
+    - cp -r out.ext/* public/
+    - cp -r out.refpages/* public/
+  artifacts:
+    paths:
+      - public
+  pages: true
+  rules:
+    - if: '$CI_COMMIT_BRANCH == "github"'
+    - if: '$CI_COMMIT_TAG =~ /^v3\.1/'
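The `spec-deploy-pages` job runs only when one of its two `rules:` entries matches. The minimal Python sketch below models that decision so the effect of the tag pattern is easy to check. `should_deploy` is a hypothetical helper, not part of the repository, and Python's `re` module stands in for GitLab's own regular-expression engine, which should agree with it for a pattern this simple.

```python
import re
from typing import Optional

# Pattern copied from the second `rules:` entry of spec-deploy-pages.
TAG_PATTERN = re.compile(r"^v3\.1")


def should_deploy(branch: Optional[str], tag: Optional[str]) -> bool:
    """Hypothetical model of the spec-deploy-pages `rules:` list.

    GitLab adds the job to a pipeline when any rule matches. `branch` and
    `tag` stand in for $CI_COMMIT_BRANCH and $CI_COMMIT_TAG; None models an
    unset variable (tag pipelines have no branch name, and vice versa).
    """
    # Rule 1: '$CI_COMMIT_BRANCH == "github"'
    if branch == "github":
        return True
    # Rule 2: '$CI_COMMIT_TAG =~ /^v3\.1/', anchored only at the start.
    return tag is not None and TAG_PATTERN.search(tag) is not None


if __name__ == "__main__":
    cases = [("github", None), ("main", None),
             (None, "v3.1.0"), (None, "v3.0.19"), (None, "v3.10")]
    for branch, tag in cases:
        print(branch, tag, should_deploy(branch, tag))
```

The `v3.10` case shows a consequence of anchoring the pattern only at the start: such a tag (hypothetical here) would also deploy. If only `v3.1` and `v3.1.x` tags were intended, a pattern like `/^v3\.1(\.|$)/` would exclude it.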

OpenCL_C.txt

Lines changed: 189 additions & 202 deletions
Large diffs are not rendered by default.

api/acknowledgements.asciidoc

Lines changed: 17 additions & 0 deletions
@@ -38,6 +38,8 @@ MonPing Wang, Apple +
 Tanya Lattner, Apple +
 Mikael Bourges-Sevenier, Aptina +
 Brice Videau, Argonne National Laboratory +
+Thomas Applencourt, Argonne National Laboratory +
+Nevin Liber, Argonne National Laboratory +
 Anton Lokhmotov, ARM +
 Dave Shreiner, ARM +
 Einar Hov, ARM +
@@ -58,11 +60,15 @@ Maria Rovatsou, Codeplay +
 Alistair Donaldson, Codeplay +
 Alastair Murray, Codeplay +
 Ewan Crawford, Codeplay +
+Ewa Gałamon, Cognizant +
+Marcin Hajder, Cognizant +
+Faith Ekstrand, Collabora +
 Stephen Frye, Electronic Arts +
 Eric Schenk, Electronic Arts +
 Daniel Laroche, Freescale +
 David Neto, Google +
 James Price, Google +
+Romaric Jodin, Google +
 Robin Grosman, Huawei +
 Craig Davies, Huawei +
 Brian Horton, IBM +
@@ -73,6 +79,7 @@ Joaquin Madruga, IBM +
 Mark Nutter, IBM +
 Mike Perks, IBM +
 Sean Wagner, IBM +
+Ahmed Amrani Akdi, Imagination Technologies +
 Jeremy Kemp, Imagination Technologies +
 Jon Parr, Imagination Technologies +
 Paul Fradgley, Imagination Technologies +
@@ -98,6 +105,7 @@ Hong Jiang, Intel +
 Jayanth Rao, Intel +
 Josh Fryman, Intel +
 Kevin Stevens, Intel +
+Konrad Trifunovic, Intel +
 Larry Seiler, Intel +
 Michael Kinsner, Intel +
 Michal Mrozek, Intel +
@@ -114,6 +122,7 @@ Roy Ju, Mediatek +
 Bor-Sung Liang, Mediatek +
 Rahul Agarwal, Mediatek +
 Michal Witaszek, Mobica +
+Aharon Abramson, Mobileye +
 JenqKuen Lee, NTHU +
 Amit Rao, NVIDIA +
 Ashish Srivastava, NVIDIA +
@@ -142,9 +151,11 @@ Yuan Lin, NVIDIA +
 Mayuresh Pise, NVIDIA +
 Allan Tzeng, QUALCOMM +
 Alex Bourd, QUALCOMM +
+Alexander Galazin, QUALCOMM +
 Andrew Gruber, QUALCOMM +
 Andrzej Mamona, QUALCOMM +
 Anirudh Acharya, QUALCOMM +
+Arvind Sudarsanam, QUALCOMM +
 Balaji Calidas, QUALCOMM +
 Benedict Gaster, QUALCOMM +
 Bill Torzewski, QUALCOMM +
@@ -157,6 +168,7 @@ David Ligon, QUALCOMM +
 Hongqiang Wang, QUALCOMM +
 Jay Yun, QUALCOMM +
 Jian Liu, QUALCOMM +
+Jose Lopez, QUALCOMM +
 Joshua Kelly, QUALCOMM +
 Lee Howes, QUALCOMM +
 Lihan Bin, QUALCOMM +
@@ -170,7 +182,11 @@ Vineet Goel, QUALCOMM +
 Vlad Shimanskiy, QUALCOMM +
 Yu-Chi Huang, QUALCOMM +
 Yuehai Du, QUALCOMM +
+Karol Herbst, Red Hat +
+Austin Annestrand, Samsung +
 Raun Krisch, Samsung +
+Gowtham Tammana, Samsung +
+Pavan Lanka, Samsung +
 Tasneem Brutch, Samsung +
 Yoonseo Choi, Samsung +
 Dennis Adams, Sony +
@@ -181,6 +197,7 @@ Anton Gorenko, StreamHPC +
 Jakub Szuppe, StreamHPC +
 Máté Ferenc Nagy-Egri, StreamHPC +
 Vincent Hindriksen, StreamHPC +
+Pekka Jääskeläinen, Tampere University +
 Ajay Jayaraj, Texas Instruments +
 Alan Ward, Texas Instruments +
 Yuan Zhao, Texas Instruments +

api/appendix_c.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -165,7 +165,7 @@ For example:
 
 Vector data type components may also be accessed using the `.rgba` field
 naming convention, similar to how they are used within the OpenCL C 3.0
-language.
+or newer language.
 Use of the `.rgba` field naming convention only allows accessing of the
 first 4 component fields.
 Support of these notations is identified by the `CL_HAS_NAMED_RGBA_VECTOR_FIELDS`

api/appendix_e.asciidoc

Lines changed: 57 additions & 6 deletions
@@ -200,7 +200,7 @@ runtime (_sections 4 and 5_):
 
 * Shared virtual memory. The associated API additions are:
 ** {clSetKernelArgSVMPointer} to control which shared virtual memory (SVM)
-pointer to associate with a kernel instance.
+pointer to associate with a kernel-instance.
 ** {clSVMAlloc}, {clSVMFree} and {clEnqueueSVMFree} to allocate and free
 memory for use with SVM.
 ** {clEnqueueSVMMap} and {clEnqueueSVMUnmap} to map and unmap to update
@@ -326,6 +326,7 @@ The OpenCL 2.0 kernel language will still be consumed by OpenCL 2.1
 runtimes.
 
 The SPIR-V and OpenCL SPIR-V Environment specifications have been added.
+OpenCL 2.1 requires support for the SPIR-V 1.0 intermediate language.
 
 == Summary of Changes from OpenCL 2.1 to OpenCL 2.2
 
@@ -353,6 +354,9 @@ runtime (section 4 and 5):
 Added definition of Deprecation and Specialization constants to the
 glossary.
 
+OpenCL 2.2 requires support for the SPIR-V 1.0, SPIR-V 1.1, and SPIR-V 1.2
+intermediate languages.
+
 == Summary of Changes from OpenCL 2.2 to OpenCL 3.0
 
 OpenCL 3.0 is a major revision that breaks backwards compatibility with
@@ -457,8 +461,6 @@ conformance process:
 
 * {CL_DEVICE_LATEST_CONFORMANCE_VERSION_PASSED}
 
-== Summary of Changes to OpenCL 3.0
-
 The first non-experimental version of the OpenCL 3.0 specifications was *v3.0.5*.
 
 Changes from *v3.0.5* to *v3.0.6*:
@@ -597,7 +599,7 @@ Changes from *v3.0.14* to *v3.0.15*:
 ** Added an semaphore-specific device handle list enum, see {khronos-opencl-pr}/956[#956].
 ** Restricted semaphores to a single associated device, see {khronos-opencl-pr}/996[#996].
 * {cl_khr_subgroup_rotate_EXT}:
-** Clarified that only rotating within a subgroup is supported, see {khronos-opencl-pr}/967[#967].
+** Clarified that only rotating within a sub-group is supported, see {khronos-opencl-pr}/967[#967].
 
 Changes from *v3.0.15* to *v3.0.16*:
 
@@ -707,5 +709,54 @@ Changes from *v3.0.18* to *v3.0.19*:
 * The following extension has been finalized and is no longer experimental:
 ** {cl_khr_kernel_clock_EXT}
 * Added new extensions:
-** {cl_khr_external_memory_android_hardware_buffer_EXT}
-** {cl_khr_spirv_queries_EXT}
+** {cl_khr_external_memory_android_hardware_buffer_EXT} (experimental)
+** {cl_khr_spirv_queries_EXT}
+
+== Summary of Changes from OpenCL 3.0 to OpenCL 3.1
+
+OpenCL 3.1 adds the OpenCL 3.1 C kernel language.
+Please refer to the OpenCL C specification for details.
+
+OpenCL 3.1 requires support for the SPIR-V 1.0, SPIR-V 1.1, SPIR-V 1.2, SPIR-V
+1.3, and SPIR-V 1.4 intermediate languages.
+Please refer to the OpenCL SPIR-V Environment specification for details.
+
+OpenCL 3.1 removes the OpenCL Extension Specification.
+The OpenCL Extension Specification is no longer needed now that EXT and KHR extensions are documented in the main OpenCL specifications.
+See {khronos-opencl-pr}/1516[#1516].
+
+Other changes in OpenCL 3.1:
+
+* Added required support for sub-groups, see {CL_DEVICE_MAX_NUM_SUB_GROUPS}.
+* Relaxed the definition of inclusive scopes, see internal issue 367.
+* Clarified and un-deprecated the {CL_DEVICE_HOST_UNIFIED_MEMORY} query, see internal issue 370.
+* Updated the memory model so observing that an event is {CL_COMPLETE} is a synchronization point, see internal issue 373.
+* Allowed {clSetKernelArg} to set a local memory kernel argument to zero, see internal issue 374.
+* Deprecated the confusingly named {CL_DEVICE_MAX_WORK_ITEM_SIZES} query, use {CL_DEVICE_MAX_WORK_GROUP_SIZES} instead, see internal issue 375.
+* Cleaned up inconsistencies in the error condition descriptions for {clSetKernelExecInfo}, see {khronos-opencl-pr}/1419[#1419].
+* Improved error code documentation consistency for many OpenCL APIs, see {khronos-opencl-pr}/1439[#1439] and others.
+* Added missing error conditions for {clCompileProgram} and {clLinkProgram}, see {khronos-opencl-pr}/1453[#1453].
+* Clarified the description of sub-group functions, see {khronos-opencl-pr}/1483[#1483].
+* Refactored and clarified the description for {clSetKernelArg}, see {khronos-opencl-pr}/1493[#1493].
+* Clarified the description of atomic operations, see {khronos-opencl-pr}/1500[#1500].
+* Cleaned up a few more descriptions of custom devices, see {khronos-opencl-pr}/1540[#1540].
+* Added missing error condition for {clEnqueueNDRangeKernel} when the local work-group size is zero, see {khronos-opencl-pr}/1542[#1542].
+* {cl_khr_command_buffer_mutable_dispatch_EXT} (experimental):
+** Relaxed the requirement to set all kernel arguments before recording a command buffer, see {khronos-opencl-pr}/1382[#1382].
+** Redefined and clarified command-buffer simultaneous use, see {khronos-opencl-pr}/1411[#1411].
+* {cl_khr_external_memory_android_hardware_buffer_EXT} (experimental)
+** Clarified that images cannot be created if the format is `AHARDWAREBUFFER_FORMAT_BLOB`, see {khronos-opencl-pr}/1477[#1477].
+* {cl_ext_buffer_device_address}
+** Added a missing error condition for {clSetKernelArgDevicePointerEXT}, see {khronos-opencl-pr}/1493[#1492].
+* Added new extensions:
+** {cl_khr_unified_svm_EXT} (experimental)
+* Promoted the following extensions to the core API:
+** {cl_khr_device_uuid_EXT}
+** {cl_khr_extended_bit_ops_EXT}
+** {cl_khr_integer_dot_product_EXT}
+** {cl_khr_spirv_queries_EXT}
+** {cl_khr_subgroup_extended_types_EXT}
+** {cl_khr_subgroup_rotate_EXT}
+** {cl_khr_subgroup_shuffle_EXT}
+** {cl_khr_subgroup_shuffle_relative_EXT}
+** {cl_khr_suggested_local_work_size_EXT}
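The SPIR-V requirements stated across these summaries (for OpenCL 2.1, 2.2, and 3.1) can be collected into a single lookup. The Python sketch below is illustrative only: `required_spirv_versions` is a hypothetical helper, versions are `(major, minor)` tuples, and OpenCL 3.0 and versions before 2.1 are treated as having no required SPIR-V version, consistent with the glossary text that other devices may support SPIR-V optionally. Treating versions newer than 3.1 as requiring at least the OpenCL 3.1 set is an assumption, not something these summaries state.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor)

# SPIR-V versions a device must support, transcribed from the
# "Summary of Changes" entries for OpenCL 2.1, 2.2, and 3.1.
REQUIRED_SPIRV: Dict[Version, Tuple[Version, ...]] = {
    (2, 1): ((1, 0),),
    (2, 2): ((1, 0), (1, 1), (1, 2)),
    (3, 1): ((1, 0), (1, 1), (1, 2), (1, 3), (1, 4)),
}


def required_spirv_versions(opencl: Version) -> Tuple[Version, ...]:
    """Hypothetical helper: SPIR-V versions an OpenCL device must accept.

    OpenCL 3.0 and versions before 2.1 have no required SPIR-V version.
    Assumption (not stated in the summaries): versions newer than 3.1
    require at least the OpenCL 3.1 set.
    """
    if opencl in REQUIRED_SPIRV:
        return REQUIRED_SPIRV[opencl]
    if opencl > (3, 1):  # tuple comparison: (3, 2) and (4, 0) are newer
        return REQUIRED_SPIRV[(3, 1)]
    return ()


if __name__ == "__main__":
    for v in [(2, 0), (2, 1), (2, 2), (3, 0), (3, 1)]:
        print(v, required_spirv_versions(v))
```

Keeping the requirements as data rather than branching logic makes the table easy to compare line by line with the summary text above.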

api/cl_ext_immutable_memory_objects.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ include::{generated}/meta/interfaces/cl_ext_immutable_memory_objects.txt[]
 1) Can {CL_MEM_READ_ONLY} be used instead of {CL_MEM_IMMUTABLE_EXT}?
 --
 *RESOLVED*: No. Memory objects created with {CL_MEM_READ_ONLY} can be modified
-by copy or fill commands and this behaviour cannot be changed without breaking
+by copy or fill commands and this behavior cannot be changed without breaking
 backwards compatibility.
 --
 

api/cl_khr_extended_versioning.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ Here's a recommended policy:
 Adding {CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION_KHR}.
 --
 
-. What is the behaviour of the queries that return an array of structures when
+. What is the behavior of the queries that return an array of structures when
 there are no elements to return?
 +
 --

api/embedded_profile.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -323,5 +323,5 @@ profile devices is:
 |====
 
 For embedded profiles devices that support reading from and writing to the same
-image object from the same kernel instance (see {CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS})
+image object from the same kernel-instance (see {CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS})
 there is no required minimum list of supported image formats.

api/footnotes.asciidoc

Lines changed: 10 additions & 0 deletions
@@ -48,6 +48,12 @@ The value of {CL_COMPLETE} and {CL_SUCCESS} are the same. \
 {clGetDeviceIDs} may return all or a subset of the actual physical devices present in the platform and that match _device_type_. \
 ]
 
+:fn-host-unified-memory: pass:n[ \
+When the query for {CL_DEVICE_HOST_UNIFIED_MEMORY} is {CL_TRUE}, allocating OpenCL memory will likely reduce the amount of host memory available to the system. \
+Likewise, allocating host memory will likely reduce the amount of memory available to OpenCL. \
+If the memory described by {CL_DEVICE_GLOBAL_MEM_SIZE} is primarily host memory, such as for CPUs, integrated GPUs, and other devices with a relatively small amount of dedicated device memory, then the device should return {CL_TRUE} for {CL_DEVICE_HOST_UNIFIED_MEMORY}. \
+]
+
 :fn-image-array-performance: pass:n[ \
 Note that reading and writing 2D image arrays from a kernel with `image_array_size` equal to one may perform worse than 2D images. \
 ]
@@ -68,6 +74,10 @@ This value for *memory_scope* can only be used with *atomic_work_item_fence* wit
 Note that the performance of 64-bit integer arithmetic can vary significantly between embedded devices. \
 ]
 
+:fn-local-arg-size-zero: pass:n[ \
+When the size of a `local` argument is set to zero, the value of the pointer within the kernel is implementation-defined. \
+]
+
 :fn-map-count-usage: pass:n[ \
 The map count returned should be considered immediately stale. \
 It is unsuitable for general use in applications. \

api/glossary.asciidoc

Lines changed: 12 additions & 15 deletions
@@ -364,7 +364,8 @@ In-order Execution ::
 
 Intermediate Language ::
 A lower-level language that may be used to create programs.
-SPIR-V is a required intermediate language (IL) for OpenCL 2.1 and 2.2 devices.
+SPIR-V is a required intermediate language (IL) for OpenCL 2.1, OpenCL 2.2,
+and OpenCL 3.1 or newer devices.
 Other OpenCL devices may optionally support SPIR-V or other ILs.
 
 Kernel ::
@@ -376,7 +377,7 @@ Kernel ::
 Kernel-instance ::
 The work carried out by an OpenCL program occurs through the execution
 of kernel-instances on devices.
-The kernel instance is the _kernel object_, the values associated with
+The kernel-instance is the _kernel object_, the values associated with
 the arguments to the kernel, and the parameters that define the
 _ND-range_ index space.
 
@@ -493,9 +494,9 @@ Pipe ::
 items.
 A pipe has two endpoints: a write endpoint into which data items are
 inserted, and a read endpoint from which data items are removed.
-At any one time, only one kernel instance may write into a pipe, and
-only one kernel instance may read from a pipe.
-To support the producer consumer design pattern, one kernel instance
+At any one time, only one kernel-instance may write into a pipe, and
+only one kernel-instance may read from a pipe.
+To support the producer consumer design pattern, one kernel-instance
 connects to the write endpoint (the producer) while another kernel
 instance connects to the reading endpoint (the consumer).
 
@@ -636,16 +637,12 @@ Sampler ::
 image coordinate is a normalized or unnormalized value.
 
 Scope inclusion ::
-Two actions *A* and *B* are defined to have an inclusive scope if they
-have the same scope *P* such that: (1) if *P* is
-*memory_scope_sub_group*, and *A* and *B* are executed by work-items
-within the same sub-group, or (2) if *P* is *memory_scope_work_group*,
-and *A* and *B* are executed by work-items within the same work-group,
-or (3) if *P* is *memory_scope_device*, and *A* and *B* are executed by
-work-items on the same device, or (4) if *P* is
-*memory_scope_all_svm_devices* or *memory_scope_all_devices*, if *A* and *B*
-are executed by host threads or by work-items on one or more devices that
-can share SVM memory with each other and the host process.
+Two actions *A* and *B* have an inclusive scope if the work-item or host
+thread performing *A* is in the memory scope of *B* and the work-item or
+host thread performing *B* is in the memory scope of *A*.
+Prior to OpenCL 3.1, the memory scope of *A* and *B* additionally must be
+the same for the actions to have an inclusive scope.
+For OpenCL 3.1 and newer, the memory scope of *A* and *B* may be different.
 
 Sequenced before ::
 A relation between evaluations executed by a single unit of execution.
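The revised *Scope inclusion* definition can be illustrated with a small model. The Python sketch below is a hypothetical illustration, not an OpenCL API. It assumes that agents are either work-items, identified by device, work-group, and sub-group ids (assumed unique within a single ND-range), or host threads; that `Scope.ALL_DEVICES` stands in for both *memory_scope_all_svm_devices* and *memory_scope_all_devices*; and that every device shares SVM with the host. Host threads are modeled only at all-devices scope.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Scope(IntEnum):
    """Memory scopes from narrowest to widest (after the memory_scope_* names)."""
    SUB_GROUP = 0
    WORK_GROUP = 1
    DEVICE = 2
    ALL_DEVICES = 3  # stands in for all_svm_devices and all_devices


@dataclass(frozen=True)
class Agent:
    """A work-item (device, work-group, sub-group ids) or, if device is None, a host thread."""
    device: Optional[int] = None
    work_group: Optional[int] = None
    sub_group: Optional[int] = None

    @property
    def is_host(self) -> bool:
        return self.device is None


@dataclass(frozen=True)
class Action:
    """An action performed by `agent` with memory scope `scope`."""
    agent: Agent
    scope: Scope


def in_scope(agent: Agent, action: Action) -> bool:
    """True if `agent` is in the memory scope of `action`."""
    if action.scope == Scope.ALL_DEVICES:
        return True  # assumption: all devices share SVM with each other and the host
    if agent.is_host or action.agent.is_host:
        return False  # host threads are modeled only at all-devices scope
    same_device = agent.device == action.agent.device
    same_work_group = same_device and agent.work_group == action.agent.work_group
    if action.scope == Scope.DEVICE:
        return same_device
    if action.scope == Scope.WORK_GROUP:
        return same_work_group
    return same_work_group and agent.sub_group == action.agent.sub_group


def inclusive_scope(a: Action, b: Action, opencl_3_1_or_newer: bool = True) -> bool:
    """Each performer must be in the other's scope; before 3.1 the scopes must also match."""
    if not opencl_3_1_or_newer and a.scope != b.scope:
        return False
    return in_scope(a.agent, b) and in_scope(b.agent, a)


if __name__ == "__main__":
    w1 = Agent(device=0, work_group=0, sub_group=0)
    w2 = Agent(device=0, work_group=0, sub_group=1)
    a = Action(w1, Scope.WORK_GROUP)
    b = Action(w2, Scope.DEVICE)
    print(inclusive_scope(a, b))         # True under the OpenCL 3.1 rule
    print(inclusive_scope(a, b, False))  # False: the scopes differ
```

The demo shows the relaxation: a *memory_scope_work_group* action and a *memory_scope_device* action by two work-items in the same work-group have an inclusive scope under the OpenCL 3.1 wording, but not under the earlier wording, which also required equal scopes. If the second work-item were in a different work-group, neither rule would make the actions inclusive, because it would fall outside the first action's work-group scope.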
