Commit 17a3073

Author: Mark Saroufim
Add test workflow for ARC GPU isolation
Verifies that 8 concurrent jobs each get exactly 1 GPU, isolated CPU cores, and capped RAM via ARC runner scale set.
1 parent f273dcf commit 17a3073

1 file changed

Lines changed: 41 additions & 0 deletions

@@ -0,0 +1,41 @@
+name: Test ARC GPU Isolation
+
+on:
+  workflow_dispatch:
+
+jobs:
+  gpu-test:
+    runs-on: arc-runner-set
+    strategy:
+      fail-fast: false
+      matrix:
+        job_id: [1, 2, 3, 4, 5, 6, 7, 8]
+    timeout-minutes: 5
+    steps:
+      - name: Report resources
+        run: |
+          echo "=== Job ${{ matrix.job_id }} ==="
+          echo ""
+          echo "--- GPU ---"
+          rocm-smi --showid 2>/dev/null || echo "rocm-smi not available"
+          echo ""
+          echo "--- ROCR_VISIBLE_DEVICES ---"
+          echo "ROCR_VISIBLE_DEVICES=${ROCR_VISIBLE_DEVICES:-not set}"
+          echo "HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES:-not set}"
+          echo "GPU_DEVICE_ORDINAL=${GPU_DEVICE_ORDINAL:-not set}"
+          echo ""
+          echo "--- GPU device files ---"
+          ls -la /dev/dri/ 2>/dev/null || echo "no /dev/dri"
+          ls -la /dev/kfd 2>/dev/null || echo "no /dev/kfd"
+          echo ""
+          echo "--- CPU ---"
+          echo "CPU cores available: $(nproc)"
+          echo "CPU quota: $(cat /sys/fs/cgroup/cpu.max 2>/dev/null || cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us 2>/dev/null || echo 'N/A')"
+          lscpu | grep -E "^CPU\(s\)|Thread|Core|Socket" || true
+          echo ""
+          echo "--- Memory ---"
+          echo "Memory limit: $(cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || echo 'N/A')"
+          free -h
+          echo ""
+          echo "--- Hostname ---"
+          hostname
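
The workflow above only prints each job's resources, so isolation still has to be confirmed by reading the logs of all 8 jobs. As a rough sketch of how the same values could become pass/fail checks, the helpers below count the entries in a visible-devices list and convert a cgroup v2 `cpu.max` value into whole cores. The function names `count_visible_gpus`, `cpu_max_to_cores`, and `assert_single_gpu` are hypothetical and not part of this commit. The sketch also assumes the runner exposes its GPU through `ROCR_VISIBLE_DEVICES`, which the commit does not confirm (it prints three candidate variables).

```shell
# Hypothetical helpers, not part of the commit. Each function body runs in a
# subshell "( ... )" so that changes to IFS and variables do not leak out.

# Count entries in a comma-separated device list such as "0" or "0,1".
# Prints 0 for an empty list.
count_visible_gpus() (
  list="${1:-}"
  if [ -z "$list" ]; then
    echo 0
    return 0
  fi
  IFS=','
  # Unquoted on purpose: split the list on commas into positional parameters.
  set -- $list
  echo "$#"
)

# Convert a cgroup v2 cpu.max value ("<quota> <period>" or "max <period>")
# into whole CPU cores, rounded up. Prints "unlimited" when the quota is "max".
cpu_max_to_cores() (
  # Unquoted on purpose: split the value on whitespace.
  set -- $1
  quota="${1:-}"
  period="${2:-100000}"   # assumption: fall back to 100000 us if no period is given
  if [ -z "$quota" ]; then
    echo "N/A"
    return 1
  fi
  if [ "$quota" = "max" ]; then
    echo "unlimited"
    return 0
  fi
  echo $(( (quota + period - 1) / period ))
)

# Succeed only when exactly one GPU is listed in ROCR_VISIBLE_DEVICES
# (assumed here to be the variable the runner sets).
assert_single_gpu() (
  n="$(count_visible_gpus "${ROCR_VISIBLE_DEVICES:-}")"
  if [ "$n" -ne 1 ]; then
    echo "expected 1 visible GPU, got $n" >&2
    return 1
  fi
)
```

If these functions were added to the `run` block, calling `assert_single_gpu` after the report would fail the job whenever the variable lists zero or several devices, and `cpu_max_to_cores "$(cat /sys/fs/cgroup/cpu.max)"` would print the CPU cap as a core count on cgroup v2 hosts.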
