| 1 | +--- |
| 2 | +name: gpustack-operator-xbuild-and-verify |
| 3 | +description: "Build and verify the GPUStack Operator's accelerator soft-slicing **builder stages** (`xbuild-ascend-cann-*` and `xbuild-nvidia-cuda-*` in `pack/gpustack-operator/Dockerfile`) end to end, either on the local docker host or on a remote accelerator host over ssh. Builds one stage via buildx `--target`, then runs numbered cases against the produced runtime. SCOPE — two backends: **Ascend (vcann-rt: `libvruntime.so` + `enpu-monitor`)** and **NVIDIA (HAMi-core: `libvgpu.so`)**. Ascend cases: (1) artifacts+linking [no NPU], (2) inject + `enpu-monitor`, (3) memory-quota enforcement. NVIDIA cases: (1) artifacts+linking [no GPU], (2) single-card inject + `nvidia-smi`/SM-limit, (3) multi-card per-device limits. The hardware cases need a real accelerator. Proactively offer this whenever a branch changes the Docker build flow — `pack/gpustack-operator/Dockerfile` or `pack/gpustack-operator/external/(ascend|nvidia)/**`. Examples: \"verify my Dockerfile build-stage change\", \"did the vcann-rt / HAMi-core build still link\", \"test the soft-slicing build on the 910B / 4090 host\", \"does enpu-monitor still work in a container\", \"does nvidia-smi show the sliced memory\", \"prove the memory slice is enforced on real hardware\"." |
| 4 | +allowed-tools: "Read, AskUserQuestion, Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/scripts/preflight.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/scripts/build.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/ascend-case-1.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/ascend-case-2.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/ascend-case-3.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/nvidia-case-1.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/nvidia-case-2.sh*), Bash(bash .claude/skills/gpustack-operator-xbuild-and-verify/cases/nvidia-case-3.sh*), Bash(grep*), Bash(git diff*), Bash(git rev-parse*), Bash(ssh*), Bash(docker buildx*), Bash(docker images*), Bash(docker info*), Bash(command -v*)" |
| 5 | +model: sonnet |
| 6 | +--- |
| 7 | + |
| 8 | +# GPUStack Operator — accelerator xbuild & verify |
| 9 | + |
| 10 | +Build one soft-slicing builder stage from `pack/gpustack-operator/Dockerfile` and verify the runtime it |
| 11 | +produces, on the local docker host or on a remote accelerator host over ssh. Two backends: |
| 12 | + |
| 13 | +- **Ascend (vcann-rt).** `xbuild-ascend-cann-*` → `libvruntime.so` + `enpu-monitor`. Verifies artifacts/ |
| 14 | + linking, the `npu_info.config` injection, and **real memory-quota enforcement** on a real NPU. |
| 15 | +- **NVIDIA (HAMi-core).** `xbuild-nvidia-cuda-*` → `libvgpu.so`. Verifies artifacts/linking, single-card |
| 16 | + injection (`nvidia-smi` shows the sliced VRAM + the SM/compute limit is applied), and **multi-card |
| 17 | + per-device limits** on real GPUs. |
| 18 | + |
| 19 | +It is the build+runtime-contract counterpart to the cluster-level `gpustack-operator-e2e` (scheduling |
| 20 | +chain) and `gpustack-operator-chart-e2e` (chart). This is an evolving, e2e-style skill: extend the cases as |
| 21 | +the build flow grows. |
| 22 | + |
| 23 | +## When to offer it |
| 24 | +Proactively suggest this skill when a branch changes the Docker build flow: |
| 25 | +```bash |
| 26 | +git diff --name-only origin/main...HEAD | grep -E 'pack/gpustack-operator/(Dockerfile|external/(ascend|nvidia)/)' |
| 27 | +``` |
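The trigger check above can be wrapped as a small predicate. This is only a minimal sketch: the function name `touches_build_flow` is hypothetical and not part of the skill. It reuses the same pattern and reads the changed paths from stdin.

```bash
# Hypothetical helper (not part of the skill): succeeds when any path on stdin
# belongs to the Docker build flow, using the same pattern as the check above.
touches_build_flow() {
  grep -Eq 'pack/gpustack-operator/(Dockerfile|external/(ascend|nvidia)/)'
}

# Usage: feed it the changed paths of the branch.
#   git diff --name-only origin/main...HEAD | touches_build_flow && echo "offer the skill"
```

Reading from stdin keeps the predicate independent of git, so it can be exercised with a hand-written path list.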
| 28 | + |
| 29 | +## Runner model (local or remote) |
| 30 | +All scripts source `scripts/lib.sh` and run through one runner, selected by env: |
| 31 | +- `XB_MODE=local` — build & verify on this host. |
| 32 | +- `XB_MODE=ssh XB_HOST=user@host` — build & verify on a remote host. Files move via base64-over-ssh |
| 33 | + (never scp — a login banner corrupts it); a remote login banner is filtered from output. |
| 34 | + |
| 35 | +The remote host is **never hardcoded** — always ask the user for it. |
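The runner model can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of `scripts/lib.sh` (which is not shown here): the function names `xb_run` and `xb_put` are hypothetical, defaulting to local mode is an assumption, and `base64 -d` assumes GNU coreutils on both ends.

```bash
# Hypothetical sketch of a runner; the skill's real scripts/lib.sh is not shown here.
# xb_run CMD: run CMD locally or on XB_HOST, selected by XB_MODE.
xb_run() {
  case "${XB_MODE:-local}" in   # defaulting to local is an assumption
    local) sh -c "$1" ;;
    ssh)
      if [ -z "${XB_HOST:-}" ]; then
        echo "XB_HOST is required when XB_MODE=ssh" >&2
        return 2
      fi
      ssh "$XB_HOST" "$1" ;;
    *)
      echo "unknown XB_MODE: ${XB_MODE}" >&2
      return 2 ;;
  esac
}

# xb_put SRC DEST: send a file as base64 text through the runner's stdin,
# so no scp is involved. DEST must not contain single quotes.
xb_put() {
  base64 < "$1" | xb_run "base64 -d > '$2'"
}
```

Because both modes go through the same `xb_run` entry point, the transfer helper can be tried in local mode before pointing it at a real host.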
| 36 | + |
| 37 | +## Hard rules |
| 38 | +- **Never push images** — builds use `buildx --load` into the local/remote docker store only. |
| 39 | +- **Confirm before any remote build or container run** (they consume the host's accelerator/driver). |
| 40 | + Preflight and the build-artifact case (ASCEND-CASE 1 / NVIDIA-CASE 1) are safe once the user names the target. |
| 41 | +- Touch only what the skill creates (the `vcann-build:*` / `vgpu-build:*` image, `${XB_STAGE}` artifacts, |
| 42 | + `${XB_STAGE}/test` config/preload, the remote build context). Never modify the user's other resources. |
| 43 | +- The hardware cases require a **real accelerator** (local or the ssh host): ASCEND-CASE 2/3 need an NPU; |
| 44 | + NVIDIA-CASE 2 needs a GPU, NVIDIA-CASE 3 needs **≥ 2** GPUs. The two CASE-1 builds need only docker+buildx. |
| 45 | + |
| 46 | +## Flow |
| 47 | + |
| 48 | +1. **Discover targets.** List the builder stages and ask which to verify (multi-select): |
| 49 | + ```bash |
| 50 | + grep -nE 'AS xbuild-(ascend-cann|nvidia-cuda)-' pack/gpustack-operator/Dockerfile |
| 51 | + ``` |
| 52 | + Ascend: `xbuild-ascend-cann-8-910b`, `-8-910c`, `-9-910b`, `-9-910c`, `-9-950`. |
| 53 | + NVIDIA: `xbuild-nvidia-cuda-12`, `-13`. |
| 54 | + |
| 55 | +2. **Pick connection (AskUserQuestion).** Local, or ssh — and if ssh, the host. Set `XB_MODE`/`XB_HOST`. |
| 56 | + |
| 57 | +3. **Preflight (read-only, confirm target first).** |
| 58 | + ```bash |
| 59 | + XB_MODE=… XB_HOST=… bash .claude/skills/gpustack-operator-xbuild-and-verify/scripts/preflight.sh |
| 60 | + ``` |
| 61 | + docker+buildx must PASS to build. The hardware rows (`npu-smi`/ascend-runtime/`/dev/davinci*` and |
| 62 | + `nvidia-smi`/nvidia-runtime/`/dev/nvidia*`) WARN when absent — the matching hardware cases are then |
| 63 | + unavailable. If buildx is missing, the table prints the install one-liner. |
| 64 | + |
| 65 | +4. **Build the chosen target (confirm).** `build.sh` infers the backend from the target prefix. The build |
| 66 | +   runs natively on a matching-arch host (fast); a cross-arch build goes through qemu emulation. |
| 67 | + ```bash |
| 68 | + XB_MODE=… XB_HOST=… bash .claude/skills/gpustack-operator-xbuild-and-verify/scripts/build.sh xbuild-nvidia-cuda-13 |
| 69 | + ``` |
| 70 | + Produces `XB_IMAGE` (Ascend `vcann-build:<suffix>` / NVIDIA `vgpu-build:<suffix>`) and stages the |
| 71 | + artifacts under `XB_STAGE` (Ascend `/opt/enpu/vcann-rt`, NVIDIA `/opt/vgpu`). The built image is |
| 72 | + CANN/CUDA-based and doubles as the workload image for the hardware cases (`XB_WORKLOAD_IMAGE` defaults |
| 73 | + to it). |
| 74 | + |
| 75 | +5. **Run cases.** Pass the same target; read each PASS/FAIL table — don't re-derive from raw logs. |
| 76 | + ```bash |
| 77 | + # Ascend |
| 78 | + XB_MODE=… XB_HOST=… bash .../cases/ascend-case-1.sh xbuild-ascend-cann-8-910b |
| 79 | + XB_MODE=… XB_HOST=… XB_NPU=0 bash .../cases/ascend-case-2.sh xbuild-ascend-cann-8-910b |
| 80 | + XB_MODE=… XB_HOST=… XB_NPU=0 XB_MEM=1024 bash .../cases/ascend-case-3.sh xbuild-ascend-cann-8-910b |
| 81 | + # NVIDIA |
| 82 | + XB_MODE=… XB_HOST=… bash .../cases/nvidia-case-1.sh xbuild-nvidia-cuda-13 |
| 83 | + XB_MODE=… XB_HOST=… XB_GPU=0 XB_MEM=4096 XB_SM=50 bash .../cases/nvidia-case-2.sh xbuild-nvidia-cuda-13 |
| 84 | + XB_MODE=… XB_HOST=… XB_GPUS=0,1 XB_MEM=4096 bash .../cases/nvidia-case-3.sh xbuild-nvidia-cuda-13 |
| 85 | + ``` |
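The prefix-based backend inference described in step 4 could look something like the sketch below. `build.sh` itself is not shown, so the function names are hypothetical; the staging directories are the ones documented above, while the image tag and the build context `.` passed to `xb_build_cmd` are placeholders for illustration.

```bash
# Hypothetical sketch; build.sh itself is not shown. Maps a builder-stage name
# to its backend and to the documented staging directory.
xb_backend() {
  case "$1" in
    xbuild-ascend-cann-*) echo ascend ;;
    xbuild-nvidia-cuda-*) echo nvidia ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

xb_stage_dir() {
  case "$(xb_backend "$1")" in
    ascend) echo /opt/enpu/vcann-rt ;;
    nvidia) echo /opt/vgpu ;;
    *) return 1 ;;
  esac
}

# Prints (does not run) a buildx command for one stage. TAG ($2) and the build
# context "." are placeholders; --load keeps the image in the docker store.
xb_build_cmd() {
  echo "docker buildx build --file pack/gpustack-operator/Dockerfile --target $1 --load --tag $2 ."
}
```

Printing the command instead of running it keeps the sketch safe to try on a host where a build has not been confirmed yet.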
| 86 | + |
| 87 | +## Cases (locked titles) |
| 88 | + |
| 89 | +### Ascend (`xbuild-ascend-cann-*`) |
| 90 | +| Case | Title | Needs NPU | Asserts | |
| 91 | +|---|---|---|---| |
| 92 | +| 1 | Build artifacts + linking | no | `libvruntime.so` (0644) + `enpu-monitor` (0755) exist; ELF arch == build platform; build linked (the `--allow-shlib-undefined` path); both `NEEDED` `libc_sec.so`+`libascendcl.so`; notes the weak UND `dcmi_*` syms | |
| 93 | +| 2 | Inject + enpu-monitor | yes | VDie-ID→`shm-id`; render `npu_info.config`; preload (libdcmi×2 + libvruntime); container `enpu-monitor` loads all 6 fields, initializes, and prints `Aicore Limit Quota`/`Memory Limit quota` matching the config | |
| 94 | +| 3 | Memory-quota enforcement | yes | injected HBM alloc capped at `memory-quota` (the `Out of memory! … quota:<bytes>` log); baseline (no inject) exceeds it | |
| 95 | + |
| 96 | +### NVIDIA (`xbuild-nvidia-cuda-*`) |
| 97 | +| Case | Title | Needs GPU | Asserts | |
| 98 | +|---|---|---|---| |
| 99 | +| 1 | Build artifacts + linking | no | `libvgpu.so` (0644) exists; ELF arch == build platform; `NEEDED` `libcuda.so.1`+`libnvidia-ml.so.1` (hard deps the NVIDIA runtime injects — no weak-UND preload, contrast Ascend) | |
| 100 | +| 2 | Single-card inject + nvidia-smi | yes (1 GPU) | preload `libvgpu.so`; with `CUDA_DEVICE_MEMORY_LIMIT_0`+`CUDA_DEVICE_SM_LIMIT`, `nvidia-smi memory.total` == the limit (NVML hook); a CUDA probe logs `core utilization limit = <SM>` and `cuMemGetInfo` total == the limit (real CUDA-level enforcement) | |
| 101 | +| 3 | Multi-card per-device limits | yes (≥2 GPU) | each exposed card gets a **distinct** `CUDA_DEVICE_MEMORY_LIMIT_<i>` and the container's `nvidia-smi` reports each card's `memory.total` at its own limit (skips with WARN if too few GPUs) | |
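To make the "distinct limit per card" idea of case 3 concrete, here is a minimal sketch that derives one `CUDA_DEVICE_MEMORY_LIMIT_<i>` assignment per exposed card. Several things are assumptions: the function name `xb_gpu_limits`, the scheme of lowering each card's limit by a fixed step, the use of the card's position in the list as `<i>`, and the `m` (MiB) unit suffix on the value. The real case script may choose its limits differently.

```bash
# Hypothetical sketch: print one memory-limit assignment per exposed card.
# xb_gpu_limits GPUS_CSV BASE_MIB [STEP_MIB]
#   GPUS_CSV  comma-separated card list, as in XB_GPUS (e.g. 0,1)
#   BASE_MIB  limit for the first card, as in XB_MEM
#   STEP_MIB  assumed decrement per following card (default 1024)
xb_gpu_limits() {
  gpus=$1; base=$2; step=${3:-1024}
  i=0
  for _gpu in $(printf '%s' "$gpus" | tr ',' ' '); do
    # <i> is the position among the exposed cards (assumption).
    printf 'CUDA_DEVICE_MEMORY_LIMIT_%s=%sm\n' "$i" "$((base - i * step))"
    i=$((i + 1))
  done
}
```

Giving every card a different value is what lets a per-card `memory.total` check tell the limits apart; with identical limits a swapped index would go unnoticed.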
| 102 | + |
| 103 | +## Env knobs |
| 104 | +`XB_MODE`/`XB_HOST` (runner); `XB_PLATFORM` (default from target arch); `XB_IMAGE`/`XB_WORKLOAD_IMAGE`; |
| 105 | +`XB_STAGE` (Ascend `/opt/enpu/vcann-rt` | NVIDIA `/opt/vgpu`); `XB_REMOTE_CTX` (remote build-context dir). |
| 106 | +- Ascend: `XB_NPU`/`XB_CHIP` (card/chip, default 0); `XB_VNPU` (0); `XB_AICORE` (20); `XB_MEM` (1024 MB). |
| 107 | +- NVIDIA: `XB_GPU` (single-card index, 0); `XB_GPUS` (multi-card csv, `0,1`); `XB_MEM` (MiB, 4096); |
| 108 | + `XB_SM` (compute %, 50 / 30). |
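The documented defaults can be applied without clobbering values the caller already exported. The sketch below is hypothetical (the function name `xb_defaults` is not part of the skill) and only covers knobs whose default is stated unambiguously above; `XB_SM` is left out because the section lists two values for it (50 / 30).

```bash
# Hypothetical sketch: fill in the documented per-backend defaults, keeping any
# value that is already set. Note that XB_MEM differs by backend.
xb_defaults() {
  case "$1" in
    ascend)
      : "${XB_NPU:=0}" "${XB_CHIP:=0}" "${XB_VNPU:=0}" "${XB_AICORE:=20}" "${XB_MEM:=1024}" ;;
    nvidia)
      : "${XB_GPU:=0}" "${XB_GPUS:=0,1}" "${XB_MEM:=4096}" ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

The `: "${VAR:=value}"` form assigns only when the variable is unset or empty, so a caller's `XB_MEM=2048` survives the call.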
| 109 | + |
| 110 | +## References |
| 111 | +- `references/ascend-npu-info-config.md` — Ascend: the 6 config fields, VDie-ID→shm-id, allocator mapping. |
| 112 | +- `references/ascend-ld-preload-and-libdcmi.md` — Ascend activation via `/etc/ld.so.preload`; **why libdcmi must |
| 113 | + be preloaded** (weak dcmi syms); the `libc_sec`/CANN-image requirement. |
| 114 | +- `references/ascend-npu-smi-and-aicore.md` — Ascend: `npu-smi` shows the physical card; AICore-quota mechanism, |
| 115 | + the benign CANN-8.5.0 warnings, the unverified-throttle gap. |
| 116 | +- `references/nvidia-hami-core-vgpu.md` — NVIDIA: what `libvgpu.so` is, the env+mount injection contract, the |
| 117 | + one-CUDA-major-per-container rule, HAMi-core knobs. |
| 118 | +- `references/nvidia-smi-and-sm-limit.md` — NVIDIA: memory limit is directly visible in `nvidia-smi` (NVML |
| 119 | + hook); the SM/compute limit is a time-slice throttle (HAMi log / under-load only); CUDA-13 probe gotchas. |
| 120 | +- `references/troubleshooting.md` — both backends: scp banner, buildx-missing, Ascend link/segfault/hgemm, |
| 121 | + NVIDIA runtime/preload/stale-cache/cuCtxCreate-v4/stub-lib/SM-visibility. |