Commit 691a550

sycl-ref: expand grad kernel d-loop barrier comment
The terse 'see interp kernel comment' cross-reference was unhelpful. Spell out why the local_space fence is both correct and preferred here.
1 parent 64ae995 commit 691a550

1 file changed

Lines changed: 4 additions & 1 deletion

backends/sycl-ref/ceed-sycl-ref-basis.sycl.cpp

@@ -211,7 +211,10 @@ static int CeedBasisApplyGrad_Sycl(sycl::queue &sycl_queue, const SyclModule_t &
 CeedScalar *cur_v = v + elem * v_stride + dim_1 * v_dim_stride + comp * v_comp_stride;
 
 for (CeedInt dim_2 = 0; dim_2 < dim; dim_2++) {
-  // Full work-group barrier with local-only fence: see interp kernel comment.
+  // Full work-group barrier with local-only fence: s_buffer_1/2 are SLM
+  // (local_accessor), so local_space is sufficient and avoids the cost of
+  // a global memory fence. Do not replace with a sub-group barrier —
+  // work_group_size (= Q) can exceed the hardware sub-group size.
   work_item.barrier(sycl::access::fence_space::local_space);
 
   pre /= P;
