Commit 7248dee

more debugging
1 parent 92876d0 commit 7248dee

2 files changed

Lines changed: 35 additions & 2 deletions

.github/workflows/ci.yml

Lines changed: 14 additions & 2 deletions
@@ -85,8 +85,20 @@ jobs:
 curl -L -O https://tiker.net/ci-support-v0
 . ./ci-support-v0
 build_py_project_in_conda_env
-export PYTEST_FLAGS=-sv
-export CISUPPORT_PYTEST_NRUNNERS=1
+
+# Diagnose the Intel CPU OpenCL heap corruption: run serially
+# in-process (no xdist workers, via CISUPPORT_PARALLEL_PYTEST=no)
+# so a glibc abort happens at the culprit test with a full
+# faulthandler traceback, instead of detonating later in another
+# worker's GC. ARRAYCONTEXT_INTEL_DIAG makes the conftest force a
+# gc pass at each test teardown so the abort lands on the test
+# that caused it. A fresh cache dir keeps a crash from leaving a
+# poisoned sqlite cache that cascades into spurious failures.
+export CISUPPORT_PARALLEL_PYTEST=no
+export ARRAYCONTEXT_INTEL_DIAG=1
+export PYTHONFAULTHANDLER=1
+export XDG_CACHE_HOME="$(mktemp -d)"
+export PYTEST_FLAGS="-sv -p no:cacheprovider"
 export NO_COLOR=1
 test_py_project
 
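The environment set up in the hunk above can be approximated from Python for a local check. This is a minimal sketch, not part of the commit: `diag_env` and `faulthandler_enabled_in_child` are hypothetical helper names, and `CISUPPORT_PARALLEL_PYTEST` is left out because it is read by the ci-support script rather than by Python. It relies only on the documented behavior that a non-empty `PYTHONFAULTHANDLER` makes the interpreter call `faulthandler.enable()` at startup.

```python
import os
import subprocess
import sys
import tempfile


def diag_env(base=None):
    """Return a copy of the environment with the diagnostic settings applied.

    Hypothetical helper: it mirrors the exports in the CI hunk, using a
    fresh temporary directory in place of "$(mktemp -d)".
    """
    env = dict(os.environ if base is None else base)
    env["PYTHONFAULTHANDLER"] = "1"
    env["ARRAYCONTEXT_INTEL_DIAG"] = "1"
    env["XDG_CACHE_HOME"] = tempfile.mkdtemp()
    return env


def faulthandler_enabled_in_child(env):
    """Start a fresh interpreter with `env` and report faulthandler's state."""
    code = "import faulthandler; print(faulthandler.is_enabled())"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    env = diag_env()
    print("cache dir:", env["XDG_CACHE_HOME"])
    print("faulthandler enabled:", faulthandler_enabled_in_child(env))
```

With faulthandler enabled, a fatal signal such as the SIGABRT raised by a glibc heap check dumps the Python traceback of the running thread to stderr before the process dies, which is what lets the CI log name the test that was running.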

test/conftest.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+from __future__ import annotations
+
+import gc
+import os
+
+import pytest
+
+
+@pytest.hookimpl(hookwrapper=True)
+def pytest_runtest_teardown(item, nextitem):
+    # Diagnostic only (enabled via ARRAYCONTEXT_INTEL_DIAG): force a garbage
+    # collection after each test's fixtures have been torn down. The Intel CPU
+    # OpenCL runtime corrupts the host heap during some kernel executions; the
+    # abort only fires when the smashed block is next freed, which otherwise
+    # happens at an arbitrary later GC (often in an unrelated test's startup).
+    # Collecting here frees this test's OpenCL objects immediately, so the abort
+    # detonates at the teardown of the test that caused it, pinning the culprit.
+    yield
+
+    if os.environ.get("ARRAYCONTEXT_INTEL_DIAG") == "1":
+        gc.collect()
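The timing argument in the hook's comment can be illustrated without OpenCL. The sketch below is an assumption-laden stand-in, not code from the commit: `Resource` is a hypothetical class that sits in a reference cycle, so reference counting alone never frees it and its finalizer runs only when the cyclic collector does. Calling `gc.collect()` explicitly moves that moment to a known point, which is the effect the teardown hook aims for.

```python
import gc


class Resource:
    """Hypothetical stand-in for an object that owns native memory."""

    def __init__(self, log):
        self.log = log
        self.me = self  # reference cycle: refcounting alone cannot free it

    def __del__(self):
        # In the real scenario this is where the native free would happen.
        self.log.append("freed")


def demo():
    """Return the finalizer log before and after an explicit collection."""
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from interfering
    try:
        r = Resource(log)
        del r  # the cycle keeps the object alive
        before = list(log)
        gc.collect()  # the cyclic collector runs the finalizer now
        after = list(log)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Without the explicit collection, the finalizer would run at whatever later allocation happens to trigger the collector, which matches the comment's description of the abort surfacing in an unrelated test.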
