
Commit bee2c54

maleadt and claude authored
Decouple SPIRVIntrinsics atomics from OpenCL builtins (#445)
Emit integer atomics and subgroup shuffles through LLVM SPIR-V wrapper builtins, bump SPIRVIntrinsics to 1.0, and keep OpenCL.std usage limited to operations where that is the SPIR-V extended-instruction contract. This avoids depending on driver or translator recognition of OpenCL C builtin spellings, fixes the 64-bit atomic exchange path across backends, and keeps the Intel USM allocation path spec-compliant by passing NULL when no allocation properties are requested.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 8cbaae9 commit bee2c54

12 files changed

Lines changed: 109 additions & 62 deletions


.buildkite/pipeline.yml

Lines changed: 7 additions & 1 deletion
@@ -10,6 +10,8 @@ steps:
 using Pkg

 println("--- :julia: Instantiating project")
+# Julia 1.10 does not support [sources], so dev the in-tree
+# SPIRVIntrinsics; Pkg.test then carries it into the test sandbox.
 Pkg.develop(path="lib/intrinsics")

 println("+++ :julia: Running tests")
@@ -35,9 +37,13 @@ steps:
 using Pkg

 println("--- :julia: Instantiating project")
+# Julia 1.10 does not support [sources], so dev the in-tree
+# SPIRVIntrinsics first; otherwise the Pkg.add below resolves
+# against the registry, which has no SPIRVIntrinsics 1. Pkg.test
+# then carries the in-tree copy into the test sandbox.
+Pkg.develop(path="lib/intrinsics")
 Pkg.add("pocl_jll")
 Pkg.add("InteractiveUtils")
-Pkg.develop(path="lib/intrinsics")

 println("+++ :julia: Running tests")
 using InteractiveUtils

.github/workflows/Test.yml

Lines changed: 2 additions & 0 deletions
@@ -126,6 +126,8 @@ jobs:
 echo 'default_memory_backend="${{ matrix.memory_backend }}"' >> test/LocalPreferences.toml
 julia --project -e '
 using Pkg
+# Julia 1.10 does not support [sources], so dev the in-tree
+# SPIRVIntrinsics; Pkg.test then carries it into the test sandbox.
 Pkg.develop(path="lib/intrinsics")'

 - name: Test OpenCL.jl

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Random = "1"
 Random123 = "1.7.1"
 RandomNumbers = "1.6.0"
 Reexport = "1"
-SPIRVIntrinsics = "0.5.7"
+SPIRVIntrinsics = "1"
 SPIRV_LLVM_Backend_jll = "22"
 SPIRV_Tools_jll = "2025.1"
 StaticArrays = "1"
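The bump from `"0.5.7"` to `"1"` is why the CI scripts above `Pkg.develop` the in-tree package first: under Pkg's caret semantics a compat entry of `"1"` admits `[1.0.0, 2.0.0)`, so no registered 0.5.x release satisfies it. A minimal sketch of that rule, assuming plain release versions (prerelease and build tags are ignored); `caret_upper` and `caret_ok` are hypothetical helpers, not Pkg API:

```julia
# Hypothetical helpers modelling Pkg's caret rule for a single compat entry.
# Assumes release versions only; prerelease/build tags are not handled.

# Exclusive upper bound implied by a caret specifier such as "1" or "0.5.7".
function caret_upper(lower::VersionNumber)
    if lower.major != 0
        return VersionNumber(lower.major + 1)        # "1"     -> [1.0.0, 2.0.0)
    elseif lower.minor != 0
        return VersionNumber(0, lower.minor + 1)     # "0.5.7" -> [0.5.7, 0.6.0)
    else
        return VersionNumber(0, 0, lower.patch + 1)  # "0.0.3" -> [0.0.3, 0.0.4)
    end
end

# Does version `v` satisfy the caret entry whose lower bound is `lower`?
caret_ok(v::VersionNumber, lower::VersionNumber) = lower <= v < caret_upper(lower)
```

With these definitions `caret_ok(v"0.5.9", v"0.5.7")` holds while `caret_ok(v"0.5.9", v"1")` does not, matching the comment in the pipeline about the registry having no SPIRVIntrinsics 1.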

lib/cl/memory/usm.jl

Lines changed: 6 additions & 3 deletions
@@ -1,5 +1,8 @@
 abstract type UnifiedMemory <: AbstractPointerMemory end

+usm_alloc_properties(flags::Integer) =
+    flags == 0 ? C_NULL : cl_mem_properties_intel[CL_MEM_ALLOC_FLAGS_INTEL, flags, 0]
+
 function usm_free(mem::UnifiedMemory; blocking::Bool = false)
     if blocking
         clMemBlockingFreeINTEL(context(mem), mem)
@@ -35,7 +38,7 @@ function device_alloc(bytesize::Integer;
     end

     error_code = Ref{Cint}()
-    props = cl_mem_properties_intel[CL_MEM_ALLOC_FLAGS_INTEL, flags, 0]
+    props = usm_alloc_properties(flags)
     ptr = clDeviceMemAllocINTEL(context(), device(), props, bytesize, alignment, error_code)
     if error_code[] != CL_SUCCESS
         throw(CLError(error_code[]))
@@ -81,7 +84,7 @@ function host_alloc(bytesize::Integer;
     end

     error_code = Ref{Cint}()
-    props = cl_mem_properties_intel[CL_MEM_ALLOC_FLAGS_INTEL, flags, 0]
+    props = usm_alloc_properties(flags)
     ptr = clHostMemAllocINTEL(context(), props, bytesize, alignment, error_code)
     if error_code[] != CL_SUCCESS
         throw(CLError(error_code[]))
@@ -135,7 +138,7 @@ function shared_alloc(bytesize::Integer;
     end

     error_code = Ref{Cint}()
-    props = cl_mem_properties_intel[CL_MEM_ALLOC_FLAGS_INTEL, flags, 0]
+    props = usm_alloc_properties(flags)
     ptr = clSharedMemAllocINTEL(context(), device(), props, bytesize, alignment, error_code)
     if error_code[] != CL_SUCCESS
         throw(CLError(error_code[]))
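The new `usm_alloc_properties` helper returns `C_NULL` when `flags == 0` and otherwise builds the same zero-terminated `[CL_MEM_ALLOC_FLAGS_INTEL, flags, 0]` list as before. A host-only sketch of that choice, where `ALLOC_FLAGS_KEY` is a placeholder for `CL_MEM_ALLOC_FLAGS_INTEL` (its value here is invented) and `UInt64` is assumed as the element type of `cl_mem_properties_intel`; `property_pairs` is a hypothetical helper that reads the list the way a driver would, stopping at the first zero key:

```julia
# Placeholder for CL_MEM_ALLOC_FLAGS_INTEL; the value is invented for illustration.
const ALLOC_FLAGS_KEY = UInt64(0x1)

# Same shape as usm_alloc_properties: no flags -> a NULL pointer, otherwise one
# (key, value) pair followed by the 0 that terminates an OpenCL property list.
alloc_properties(flags::Integer) =
    flags == 0 ? C_NULL : UInt64[ALLOC_FLAGS_KEY, flags, 0]

# Count the (key, value) pairs a consumer would see before the terminator.
function property_pairs(props)
    props === C_NULL && return 0
    n = findfirst(iszero, props[1:2:end])  # first zero in a key slot ends the list
    return n === nothing ? length(props) ÷ 2 : n - 1
end
```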

lib/intrinsics/Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 name = "SPIRVIntrinsics"
 uuid = "71d1d633-e7e8-4a92-83a1-de8814b09ba8"
 authors = ["Tim Besard <tim.besard@gmail.com>"]
-version = "0.5.9"
+version = "1.0.0"

 [deps]
 ExprTools = "e2ba6199-217a-4e67-a87a-7c52f15ade04"

lib/intrinsics/README.md

Lines changed: 10 additions & 5 deletions
@@ -41,9 +41,14 @@ considerations:
 ```


-## OpenCL intrinsics
+## SPIR-V representation

-The current set of intrinsics implemented by this package are OpenCL intrinsics,
-assuming that the generated LLVM IR will be compiled to SPIR-V using the
-Khronos LLVM to SPIR-V translator. That tool will take care of the conversion to
-actual SPIR-V intrinsics.
+Intrinsics that map to core SPIR-V operations should be encoded using LLVM's
+SPIR-V wrapper builtins, such as `__spirv_AtomicIAdd` or
+`__spirv_GroupNonUniformShuffle`. This keeps the emitted LLVM IR independent of
+OpenCL C builtin spellings.
+
+Some math and integer functions intentionally keep OpenCL.std names. In SPIR-V,
+those operations are represented through the OpenCL extended instruction set,
+so the OpenCL spelling is the SPIR-V-level contract rather than an OpenCL.jl API
+dependency.
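As a concrete illustration of this naming scheme, the wrapper builtin used for each integer atomic depends only on the operation, except that min/max also depend on signedness (the `atomic_min_intrinsic`/`atomic_max_intrinsic` selection in `atomic.jl`). A host-side sketch; `spirv_atomic_builtin` and its table are hypothetical and cover only the integer atomics touched by this commit:

```julia
# Hypothetical table: atomic operation -> LLVM SPIR-V wrapper builtin,
# using the builtin names that appear in atomic.jl after this commit.
const SIGN_AGNOSTIC_BUILTINS = Dict(
    :add     => "__spirv_AtomicIAdd",
    :sub     => "__spirv_AtomicISub",
    :and     => "__spirv_AtomicAnd",
    :or      => "__spirv_AtomicOr",
    :xor     => "__spirv_AtomicXor",
    :xchg    => "__spirv_AtomicExchange",
    :cmpxchg => "__spirv_AtomicCompareExchange",
)

function spirv_atomic_builtin(op::Symbol, ::Type{T}) where {T<:Integer}
    # min/max are the only integer atomics whose SPIR-V opcode depends on signedness.
    op === :min && return T <: Signed ? "__spirv_AtomicSMin" : "__spirv_AtomicUMin"
    op === :max && return T <: Signed ? "__spirv_AtomicSMax" : "__spirv_AtomicUMax"
    # inc/dec have no builtin of their own here; they are add/sub of one.
    op === :inc && return SIGN_AGNOSTIC_BUILTINS[:add]
    op === :dec && return SIGN_AGNOSTIC_BUILTINS[:sub]
    return SIGN_AGNOSTIC_BUILTINS[op]  # throws KeyError for unknown operations
end
```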

lib/intrinsics/src/SPIRVIntrinsics.jl

Lines changed: 4 additions & 3 deletions
@@ -12,10 +12,11 @@ using GPUToolbox
 include("pointer.jl")
 include("utils.jl")

-# OpenCL intrinsics
+# SPIR-V intrinsics
 #
-# we currently don't implement SPIR-V intrinsics directly, but rely on
-# the SPIR-V to LLVM translator supporting OpenCL intrinsics
+# Prefer direct SPIR-V wrapper builtins where LLVM's SPIR-V backend supports
+# them. Math and integer library functions still use OpenCL.std names when
+# those are the SPIR-V extended instruction-set operations.
 include("work_item.jl")
 include("synchronization.jl")
 include("memory.jl")

lib/intrinsics/src/atomic.jl

Lines changed: 57 additions & 36 deletions
@@ -1,63 +1,85 @@
 # Atomic Functions

-# provides atomic functions that rely on the OpenCL base atomics, as well as the
-# cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics extensions.
+# Integer atomics are emitted as SPIR-V wrapper builtins, so the LLVM SPIR-V
+# backend lowers them to OpAtomic* instructions directly.

 const atomic_float_types = [Float32, Float64]
 const atomic_integer_types = [UInt32, Int32, UInt64, Int64]
 const atomic_memory_types = [AS.Workgroup, AS.CrossWorkgroup]
-const atomic_types = vcat(atomic_float_types, atomic_integer_types)
+
+const atomic_scope = Scope.Workgroup
+
+atomic_memory_semantics(::Val{AS.Workgroup}) = MemorySemantics.WorkgroupMemory
+atomic_memory_semantics(::Val{AS.CrossWorkgroup}) = MemorySemantics.CrossWorkgroupMemory


 # generically typed

-for gentype in atomic_types, as in atomic_memory_types
+for gentype in atomic_integer_types, as in atomic_memory_types
+    atomic_min_intrinsic = gentype <: Signed ? "__spirv_AtomicSMin" : "__spirv_AtomicUMin"
+    atomic_max_intrinsic = gentype <: Signed ? "__spirv_AtomicSMax" : "__spirv_AtomicUMax"
 @eval begin

 @device_function atomic_add!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_add", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall("__spirv_AtomicIAdd", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_sub!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_sub", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall("__spirv_AtomicISub", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_inc!(p::LLVMPtr{$gentype,$as}) =
-    @builtin_ccall("atomic_inc", $gentype, (LLVMPtr{$gentype,$as},), p)
+    atomic_add!(p, one($gentype))

 @device_function atomic_dec!(p::LLVMPtr{$gentype,$as}) =
-    @builtin_ccall("atomic_dec", $gentype, (LLVMPtr{$gentype,$as},), p)
+    atomic_sub!(p, one($gentype))

 @device_function atomic_min!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_min", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall($atomic_min_intrinsic, $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_max!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_max", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall($atomic_max_intrinsic, $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_and!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_and", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall("__spirv_AtomicAnd", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_or!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_or", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
+    @builtin_ccall("__spirv_AtomicOr", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)

 @device_function atomic_xor!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_xor", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
-end
-if gentype in atomic_integer_types
-@eval begin
-@device_function atomic_xchg!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
-    @builtin_ccall("atomic_xchg", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype), p, val)
-
-@device_function atomic_cmpxchg!(p::LLVMPtr{$gentype,$as}, cmp::$gentype, val::$gentype) =
-    @builtin_ccall("atomic_cmpxchg", $gentype,
-                   (LLVMPtr{$gentype,$as}, $gentype, $gentype), p, cmp, val)
-end
+    @builtin_ccall("__spirv_AtomicXor", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)
+
+@device_function atomic_xchg!(p::LLVMPtr{$gentype,$as}, val::$gentype) =
+    @builtin_ccall("__spirv_AtomicExchange", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))), val)
+
+@device_function atomic_cmpxchg!(p::LLVMPtr{$gentype,$as}, cmp::$gentype, val::$gentype) =
+    @builtin_ccall("__spirv_AtomicCompareExchange", $gentype,
+                   (LLVMPtr{$gentype,$as}, UInt32, UInt32, UInt32, $gentype, $gentype),
+                   p, UInt32(atomic_scope),
+                   UInt32(atomic_memory_semantics(Val($as))),
+                   UInt32(atomic_memory_semantics(Val($as))), val, cmp)
 end
 end
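After this change every read-modify-write wrapper is called with the pointer, a 32-bit memory scope, a 32-bit memory-semantics mask, and the value, while `__spirv_AtomicCompareExchange` takes two semantics masks (for the equal and unequal outcomes) followed by the new value and then the comparator, which is why the call passes `val, cmp` in that order. A host-side sketch of those operand lists, assuming `Scope` and `MemorySemantics` carry the SPIR-V specification's enumerant values; `rmw_operands`, `cmpxchg_operands`, and the address-space symbols are hypothetical stand-ins for `atomic_scope`, `atomic_memory_semantics(Val(as))`, and `AS.*`:

```julia
# SPIR-V enumerant values (assumed to match the package's Scope/MemorySemantics).
const SCOPE_WORKGROUP = UInt32(2)
const SEMANTICS_WORKGROUP_MEMORY = UInt32(0x100)
const SEMANTICS_CROSS_WORKGROUP_MEMORY = UInt32(0x200)

# Stand-in for atomic_memory_semantics(Val(as)): address space -> storage-class bit.
memory_semantics(as::Symbol) =
    as === :workgroup ? SEMANTICS_WORKGROUP_MEMORY :
    as === :cross_workgroup ? SEMANTICS_CROSS_WORKGROUP_MEMORY :
    throw(ArgumentError("unsupported address space: $as"))

# Operands after the pointer for e.g. __spirv_AtomicIAdd: (scope, semantics, value).
rmw_operands(as::Symbol, val) = (SCOPE_WORKGROUP, memory_semantics(as), val)

# Operands after the pointer for __spirv_AtomicCompareExchange:
# (scope, equal semantics, unequal semantics, new value, comparator).
cmpxchg_operands(as::Symbol, cmp, val) =
    (SCOPE_WORKGROUP, memory_semantics(as), memory_semantics(as), val, cmp)
```

Under that assumption neither mask sets an ordering bit (Acquire, Release, AcquireRelease, or SequentiallyConsistent), so these are relaxed atomics in SPIR-V terms; the mask only names the storage class being accessed.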

@@ -248,10 +270,9 @@ end

 # native atomics
 # TODO: support inc/dec
-# TODO: this depends on available extensions
-# - UInt64: requires cl_khr_int64_base_atomics for add/sub/inc/dec,
-#   requires cl_khr_int64_extended_atomics for min/max/and/or/xor
-# - Float64: always should hit the fallback
+# TODO: this depends on backend support for the corresponding SPIR-V atomic
+# operation. Floating-point arithmetic should hit the cmpxchg fallback
+# unless a caller explicitly uses a floating-point atomic extension.
 for (op,impl) in [(+) => atomic_add!,
                   (-) => atomic_sub!,
                   (&) => atomic_and!,
@@ -265,7 +286,7 @@ for (op,impl) in [(+) => atomic_add!,
 end

 # fallback using compare-and-swap
-# TODO: for 64-bit types, this depends on cl_khr_int64_base_atomics
+# TODO: for 64-bit types, this depends on backend support for 64-bit cmpxchg.
 function atomic_arrayset(A::AbstractArray{T}, I::Integer, op::Function, val) where {T}
     ptr = pointer(A, I)
     old = Base.unsafe_load(ptr, 1)
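The compare-and-swap fallback that floating-point arithmetic now relies on is the usual retry loop: read the current value, compute `op(old, val)`, and try to publish it, retrying with the freshly observed value if another thread got there first. A host-side sketch using `Base.Threads.Atomic` and `atomic_cas!` in place of device pointers; `cas_update!` is a hypothetical name, and unlike `atomic_arrayset` it operates on a `Threads.Atomic` cell rather than an array element:

```julia
using Base.Threads: Atomic, atomic_cas!

# Apply `op(old, val)` atomically with a compare-and-swap retry loop and
# return the value that was replaced (like the native atomics do).
function cas_update!(cell::Atomic{T}, op, val) where {T}
    old = cell[]
    while true
        new = convert(T, op(old, val))
        seen = atomic_cas!(cell, old, new)  # returns the value found in the cell
        # `===` compares bit patterns, so the loop also terminates for NaN.
        seen === old && return old
        old = seen  # another thread won the race; retry with the fresh value
    end
end
```

For example, `cas_update!(Threads.Atomic{Float64}(1.5), +, 2.0)` returns `1.5` and leaves the cell holding `3.5`; any binary function works, which is what lets the fallback cover operations with no native SPIR-V atomic.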

lib/intrinsics/src/shuffle.jl

Lines changed: 8 additions & 3 deletions
@@ -4,8 +4,13 @@ const gentypes = [Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64, Floa

 for gentype in gentypes
 @eval begin
-# cl_khr_subgroup_shuffle extension operations
-@device_function sub_group_shuffle(x::$gentype, i::Integer) = @builtin_ccall("sub_group_shuffle", $gentype, ($gentype, Int32), x, i % Int32 - 1i32)
-@device_function sub_group_shuffle_xor(x::$gentype, mask::Integer) = @builtin_ccall("sub_group_shuffle_xor", $gentype, ($gentype, Int32), x, mask % Int32)
+@device_function sub_group_shuffle(x::$gentype, i::Integer) =
+    @builtin_ccall("__spirv_GroupNonUniformShuffle", $gentype,
+                   (UInt32, $gentype, UInt32),
+                   UInt32(Scope.Subgroup), x, UInt32(i - 1))
+@device_function sub_group_shuffle_xor(x::$gentype, mask::Integer) =
+    @builtin_ccall("__spirv_GroupNonUniformShuffleXor", $gentype,
+                   (UInt32, $gentype, UInt32),
+                   UInt32(Scope.Subgroup), x, UInt32(mask))
 end
 end
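The SPIR-V shuffle wrappers take a 0-based invocation id, which is why `sub_group_shuffle` forwards `i - 1` for its 1-based Julia index, while `sub_group_shuffle_xor` passes `mask` through unchanged: the XOR is applied to the calling invocation's own 0-based id. A host-side simulation of one subgroup as a vector (one element per lane) illustrates both; the functions are hypothetical and model only the data movement, not the device intrinsics:

```julia
# Hypothetical host-side model of one subgroup: lanes[k] holds the value of
# the invocation whose 0-based SPIR-V id is k - 1.

# Model of sub_group_shuffle: lane k reads the 1-based lane src[k]
# (on the device the wrapper forwards src[k] - 1 as the invocation id).
simulate_shuffle(lanes::Vector, src::Vector{<:Integer}) =
    [lanes[src[k]] for k in eachindex(lanes)]

# Model of sub_group_shuffle_xor: the lane with 0-based id k - 1 reads the
# lane whose 0-based id is (k - 1) ⊻ mask.
simulate_shuffle_xor(lanes::Vector, mask::Integer) =
    [lanes[((k - 1) ⊻ mask) + 1] for k in eachindex(lanes)]

# Butterfly reduction built from XOR shuffles: after log2(n) steps every
# lane holds the sum over the whole subgroup.
function butterfly_sum(lanes::Vector)
    n = length(lanes)
    ispow2(n) || throw(ArgumentError("subgroup size must be a power of two"))
    acc = copy(lanes)
    mask = n ÷ 2
    while mask > 0
        acc = acc .+ simulate_shuffle_xor(acc, mask)
        mask ÷= 2
    end
    return acc
end
```

With four lanes `[10, 20, 30, 40]`, a mask of 1 swaps neighbours to `[20, 10, 40, 30]`, and the butterfly leaves `100` in every lane.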

lib/intrinsics/src/synchronization.jl

Lines changed: 7 additions & 7 deletions
@@ -67,7 +67,7 @@ end
                    convert(UInt32, memory_scope),
                    convert(UInt32, memory_semantics))

-## OpenCL types
+## OpenCL-compatible fence API

 const cl_mem_fence_flags = UInt32
 const LOCAL_MEM_FENCE = cl_mem_fence_flags(1)
@@ -94,7 +94,7 @@ end
     memory_scope_all_devices
 end

-@inline function cl_scope_to_spirv(scope)
+@inline function memory_scope_to_spirv(scope)
     if scope == memory_scope_work_item
         Scope.Invocation
     elseif scope == memory_scope_sub_group
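Only the first branches of the renamed `memory_scope_to_spirv` are visible in this hunk. For orientation, a sketch of how OpenCL C memory scopes line up with SPIR-V `Scope` enumerant values; the symbol keys, the `scope_to_spirv` name, and the all-devices to `CrossDevice` pairing are assumptions for illustration, not the package's code:

```julia
# SPIR-V Scope enumerant values (SPIR-V specification).
const SPIRV_SCOPE = (CrossDevice = UInt32(0), Device = UInt32(1),
                     Workgroup = UInt32(2), Subgroup = UInt32(3),
                     Invocation = UInt32(4))

# Hypothetical stand-in for memory_scope_to_spirv, keyed by symbols instead
# of the package's memory_scope_* constants.
function scope_to_spirv(scope::Symbol)
    scope === :work_item   && return SPIRV_SCOPE.Invocation
    scope === :sub_group   && return SPIRV_SCOPE.Subgroup
    scope === :work_group  && return SPIRV_SCOPE.Workgroup
    scope === :device      && return SPIRV_SCOPE.Device
    scope === :all_devices && return SPIRV_SCOPE.CrossDevice  # assumed pairing
    throw(ArgumentError("unknown memory scope: $scope"))
end
```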
@@ -119,7 +119,7 @@ end
 end


-## OpenCL memory barriers
+## Memory barriers

 export atomic_work_item_fence, mem_fence, read_mem_fence, write_mem_fence

@@ -138,7 +138,7 @@ export atomic_work_item_fence, mem_fence, read_mem_fence, write_mem_fence
     else
         error("Invalid memory order: $order")
     end
-    memory_barrier(cl_scope_to_spirv(scope), semantics)
+    memory_barrier(memory_scope_to_spirv(scope), semantics)
 end

 # legacy fence functions
@@ -147,16 +147,16 @@ read_mem_fence(flags) = atomic_work_item_fence(flags, memory_order_acquire, memo
 write_mem_fence(flags) = atomic_work_item_fence(flags, memory_order_release, memory_scope_work_group)


-## OpenCL execution barriers
+## Execution barriers

 export barrier, work_group_barrier, sub_group_barrier

 @inline work_group_barrier(flags, scope = memory_scope_work_group) =
-    control_barrier(Scope.Workgroup, cl_scope_to_spirv(scope),
+    control_barrier(Scope.Workgroup, memory_scope_to_spirv(scope),
                     MemorySemantics.SequentiallyConsistent | mem_fence_flags_to_semantics(flags))

 @inline sub_group_barrier(flags, scope = memory_scope_sub_group) =
-    control_barrier(Scope.Subgroup, cl_scope_to_spirv(scope),
+    control_barrier(Scope.Subgroup, memory_scope_to_spirv(scope),
                     MemorySemantics.SequentiallyConsistent | mem_fence_flags_to_semantics(flags))

 barrier(flags) = work_group_barrier(flags)
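Both barrier wrappers OR `SequentiallyConsistent` with storage-class bits derived from the fence flags. Assuming `mem_fence_flags_to_semantics` maps `LOCAL_MEM_FENCE` (1) to `WorkgroupMemory` and a global fence flag of 2 (the value of `CLK_GLOBAL_MEM_FENCE` in OpenCL C) to `CrossWorkgroupMemory`, the resulting mask can be computed on the host as below; the helper names are hypothetical and the bit values are the SPIR-V `MemorySemantics` enumerants:

```julia
# SPIR-V MemorySemantics bits (SPIR-V specification).
const SEQUENTIALLY_CONSISTENT = UInt32(0x10)
const WORKGROUP_MEMORY        = UInt32(0x100)
const CROSS_WORKGROUP_MEMORY  = UInt32(0x200)

const LOCAL_FENCE  = UInt32(1)  # LOCAL_MEM_FENCE in the diff
const GLOBAL_FENCE = UInt32(2)  # assumed value of the global fence flag

# Hypothetical stand-in for mem_fence_flags_to_semantics.
function fence_flags_to_semantics(flags::Integer)
    sem = UInt32(0)
    (flags & LOCAL_FENCE) != 0 && (sem |= WORKGROUP_MEMORY)
    (flags & GLOBAL_FENCE) != 0 && (sem |= CROSS_WORKGROUP_MEMORY)
    return sem
end

# Mask passed to control_barrier by work_group_barrier/sub_group_barrier.
barrier_semantics(flags::Integer) = SEQUENTIALLY_CONSISTENT | fence_flags_to_semantics(flags)
```

So under these assumptions `barrier(LOCAL_MEM_FENCE)` would request `0x110`, and fencing both local and global memory would request `0x310`.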
