Heap corruption during SYCL kernel JIT on DG2 (Arc A770) — detected via tcmalloc #407

Description

@BrandonRaeder

Summary

Intel Graphics Compiler (libigc.so.2) corrupts glibc heap metadata during SYCL kernel JIT compilation targeting DG2 (Arc A770). The corruption is silent under glibc's default allocator but becomes detectable with LD_PRELOAD=libtcmalloc.so.4: the process then aborts with "munmap_chunk(): invalid pointer" or "double free or corruption (out)" inside free() calls made from libigc.so.2, beneath IGC::IgcOclTranslationCtx::TranslateImpl (see the backtrace below).

Reproducer

// igc_heap_repro.cpp
#include <sycl/sycl.hpp>
#include <complex>
#include <cstdio>
#include <cstdlib>  // malloc, free (used by the heap probe after the kernel)
#include <vector>

using cf = std::complex<float>;

int main() {
    sycl::queue q;
    fprintf(stderr, "device: %s\n",
        q.get_device().get_info<sycl::info::device::name>().c_str());

    constexpr int N = 177147;  // 3^11
    constexpr int G = 9;

    cf *d_sv = sycl::malloc_device<cf>(N * G, q);
    float *d_gate = sycl::malloc_device<float>(G * G, q);
    if (!d_sv || !d_gate) { fprintf(stderr, "alloc failed\n"); return 1; }

    std::vector<float> h_gate(G * G, 0.1f);
    h_gate[0] = 1.0f;
    q.memcpy(d_gate, h_gate.data(), G * G * sizeof(float)).wait();

    std::vector<cf> h_sv(N * G, cf(0.0f));
    h_sv[0] = cf(1.0f, 0.0f);
    q.memcpy(d_sv, h_sv.data(), N * G * sizeof(cf)).wait();

    fprintf(stderr, "dispatching kernel (triggers IGC JIT)...\n");
    q.parallel_for(sycl::range<1>(N), [=](sycl::id<1> id) {
        int bid = id[0];
        cf local[9];
        for (int i = 0; i < G; i++)
            local[i] = d_sv[bid * G + i];

        cf result[9];
        for (int i = 0; i < G; i++) {
            cf sum(0.0f, 0.0f);
            for (int j = 0; j < G; j++)
                sum += cf(d_gate[i * G + j], 0.0f) * local[j];
            result[i] = sum;
        }

        for (int i = 0; i < G; i++)
            d_sv[bid * G + i] = result[i];
    }).wait();
    fprintf(stderr, "kernel complete\n");

    sycl::free(d_gate, q);
    sycl::free(d_sv, q);

    for (int i = 0; i < 100; i++) {
        void *p = malloc(1024 + i * 64);
        free(p);
    }

    fprintf(stderr, "PASS\n");
    return 0;
}

Build & Run

# Build
icpx -std=c++20 -O2 -g -fsycl -fsycl-targets=spir64 \
  -o igc_heap_repro igc_heap_repro.cpp -lze_loader -lsycl -lur_loader

# Normal run — passes (corruption is silent under glibc):
./igc_heap_repro
# Output: ends with "PASS"

# tcmalloc run — crashes (corruption detected):
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 ./igc_heap_repro
# Output: "munmap_chunk(): invalid pointer" then SIGABRT

GDB Backtrace (under tcmalloc)

Thread 1 "igc_heap_repro" received signal SIGABRT, Aborted.
#0  __pthread_kill_implementation at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill at ./nptl/pthread_kill.c:89
#3  __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#4  __GI_abort () at ./stdlib/abort.c:79
#5  __libc_message_impl at ../sysdeps/posix/libc_fatal.c:134
#6  malloc_printerr (str="munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5775
#7  munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3040
#8  __GI___libc_free (mem=0x...) at ./malloc/malloc.c:3388
#9  ?? () from /lib/x86_64-linux-gnu/libigc.so.2
#10 ?? () from /lib/x86_64-linux-gnu/libigc.so.2
...
#20 IGC::IgcOclTranslationCtx<3ul>::TranslateImpl(...) from /lib/x86_64-linux-gnu/libigc.so.2
#21 ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
...
#28 sycl::_V1::detail::ProgramManager::build(...) from libsycl.so.9

All frames #9 through #20 are inside libigc.so.2. The crash occurs during the Translate (kernel compilation) phase, reached from sycl::_V1::detail::ProgramManager::build.

Additional observation: tcmalloc_debug with TCMALLOC_PAGE_FENCE=1

Running with the debug tcmalloc library reveals a related issue at startup:

memory allocation/deallocation mismatch at 0x...: allocated with malloc being deallocated with delete

This fires during SYCL/L0 runtime initialization (before any user kernel dispatch), suggesting a malloc/delete type mismatch inside the runtime libraries.

Environment

  • OS: Ubuntu 24.04 (kernel 6.17.0-19-generic)
  • GPU: Intel Arc A770 (DG2)
  • IGC: libigc2 2.10.11-1133~24.04
  • Compute Runtime: intel-opencl-icd 25.13.33276.22-1133~24.04
  • oneAPI: 2026.0 (icpx, libsycl.so.9)
  • glibc: 2.39-0ubuntu8.7
  • tcmalloc: libgoogle-perftools4 2.15-3build1

Impact

Under glibc's default allocator, the corruption is dormant: it manifests only as "double free or corruption (!prev)" during process exit (destructor chain). This makes it look like an application bug when the corruption actually originates inside IGC. Setting SYCL_CACHE_PERSISTENT=1 (persistent kernel caching) is a partial workaround: it avoids repeated JIT compilation and therefore repeated corruption, but the first run after cache invalidation still triggers it.

Ahead-of-time compilation (-fsycl-targets=intel_gpu_dg2) eliminates the JIT path and sidesteps the bug entirely.
