Commit b67c586

try turning off SIMD
1 parent fbd4d23 commit b67c586

2 files changed

Lines changed: 10 additions & 12 deletions

.github/workflows/ci.yml

Lines changed: 8 additions & 0 deletions
@@ -143,6 +143,14 @@ jobs:
 sudo apt-get update && sudo apt-get install -y valgrind
 export PYTHONMALLOC=malloc
 export NO_COLOR=1
+
+# Test whether disabling the Intel CPU runtime's SIMD vectorizer
+# avoids the out-of-bounds tail-lane store at its source. The
+# over-provisioned, bounds-guard-masked work-items only have
+# their stores leak through the vectorized code path; running
+# scalar should honor the guard. If valgrind reports no "Invalid
+# write" here (with no padding), this is a clean global fix.
+export CL_CONFIG_CPU_VECTORIZER_MODE=1
 valgrind \
 --smc-check=all-non-file \
 --leak-check=no --errors-for-leak-kinds=none \

intel_crash_reproducer.py

Lines changed: 2 additions & 12 deletions
@@ -41,19 +41,9 @@
 print()
 print(lp.generate_code_v2(knl).device_code())
 
-# Execute the kernel. Allocate the output through the array context's padding
-# allocator, which over-allocates buffers to absorb the Intel CPU runtime's
-# out-of-bounds tail-lane stores. Under valgrind this should turn the previous
-# "Invalid write ... 0 bytes after a block" into a write that lands inside the
-# (padded) block.
-from pyopencl.tools import ImmediateAllocator
-
-from arraycontext.impl.pytato import _PaddedAllocator
-
-
+# Execute the kernel.
 ctx = cl.create_some_context(interactive=False)
 queue = cl.CommandQueue(ctx)
-allocator = _PaddedAllocator(ImmediateAllocator(queue))
 
-_evt, (out,) = knl(queue, allocator=allocator)
+_evt, (out,) = knl(queue)
 print(out.get())
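
To try the same experiment outside CI, the workflow's environment and the visible part of its valgrind invocation could be mirrored in a small helper. This is only a sketch. The names `build_env`, `build_command`, and `has_invalid_write` are hypothetical, running the reproducer as `python intel_crash_reproducer.py` is an assumption, and any valgrind arguments past the ones shown in the hunk are left out because the diff does not show them.

```python
import os

# Environment variables exported in the workflow hunk.
CI_ENV = {
    "PYTHONMALLOC": "malloc",
    "NO_COLOR": "1",
    # The setting this commit adds to turn off the SIMD vectorizer.
    "CL_CONFIG_CPU_VECTORIZER_MODE": "1",
}

# Only the valgrind flags visible in the hunk. The workflow's remaining
# arguments are cut off in the diff and are deliberately not guessed here.
VALGRIND_FLAGS = [
    "--smc-check=all-non-file",
    "--leak-check=no",
    "--errors-for-leak-kinds=none",
]


def build_env(base=None):
    """Return a copy of the base environment with the CI settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(CI_ENV)
    return env


def build_command(script="intel_crash_reproducer.py", python="python"):
    """Assemble the valgrind command line (assumed invocation, see lead-in)."""
    return ["valgrind", *VALGRIND_FLAGS, python, script]


def has_invalid_write(valgrind_log):
    """Report whether a valgrind log mentions the error this commit targets."""
    return "Invalid write" in valgrind_log
```

One possible use is `subprocess.run(build_command(), env=build_env(), capture_output=True, text=True)` followed by `has_invalid_write(result.stderr)`, since valgrind writes its report to stderr by default. A `False` result with no padding allocator would correspond to the "clean global fix" outcome the workflow comment describes.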
