Before setting up, the script checks whether the system is a HaloBox. If it is, some steps, such as creating a virtual environment, are skipped because HaloBox comes with the driver pre-installed.
<!-- @os:halobox -->
### HaloBox: Skipping Virtual Environment Setup
No virtual environment setup is needed, as the necessary driver and configurations are pre-installed.
@@ -299,13 +331,14 @@ On Windows, `rocm-smi` is not supported. To track GPU utilization, you can use T
The full manual path: write the kernel and Python binding in a single `.cu` file, compile it as a native extension using PyTorch's build system, then import and call it from Python.
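As a rough illustration of the "compile it as a native extension using PyTorch's build system" step, a minimal `setup.py` could look like the sketch below. It is not the repository's actual `setup.py`; the module name `add_one_kernel` and the guard on `sys.argv` are assumptions made for this example. `CUDAExtension` and `BuildExtension` are the real helpers from `torch.utils.cpp_extension`.

```python
# setup.py (illustrative sketch, not the repository's actual file)
import sys

# Assumed names: the module name is hypothetical; the source file name
# matches the .cu file listed in this section.
EXT_NAME = "add_one_kernel"
SOURCES = ["add_one_kernel.cu"]


def main():
    # Imported lazily so that merely loading this file does not require
    # setuptools or PyTorch to be installed.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name=EXT_NAME,
        ext_modules=[CUDAExtension(name=EXT_NAME, sources=SOURCES)],
        # BuildExtension drives the compiler selected by the PyTorch build.
        cmdclass={"build_ext": BuildExtension},
    )


# Only invoke the build when a setuptools command (e.g. build_ext) is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

After a successful build, the compiled module would be imported from Python under whatever name was passed to `CUDAExtension`.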
**Files:**
#### Windows
<!-- @os:windows -->
| File | Role |
|---|---|
| [add_one_kernel.cu](assets/Vector_Addition/add_one_kernel.cu) | Kernel + launcher + pybind11 binding, everything in one file |
| [setup.py](assets/Vector_Addition/setup.py) | Build script, uses `CUDAExtension` to compile the `.cu` into a `.pyd` |
<!-- @os:end -->
#### Linux
<!-- @os:linux -->
| File | Role |
|---|---|
@@ -349,16 +382,26 @@ the CPU immediately continues executing the next instruction without waiting for
`CUDAExtension` is a CUDA build helper from `torch.utils.cpp_extension`. On AMD with ROCm, PyTorch **remaps `CUDAExtension` to use `hipcc`** instead of `nvcc`, so the same `setup.py` that would build a CUDA extension on NVIDIA compiles to AMD GPU code without any changes. This is the key mechanism that makes CUDA extension code portable to AMD: PyTorch's ROCm build intercepts the build path and routes it through the HIP compiler. The build produces the following in the same directory:
<!-- @os:windows -->
#### Windows
- `build/`: directory with the `.pyd` files
- `add_one_kernel.hip`: the HIP source generated by hipifying the `.cu` file; this is what `hipcc` actually compiled
<!-- @os:end -->
<!-- @os:linux -->
#### Linux
- `build/`: directory with the `.so` files
- `add_one_kernel.hip`: the HIP source generated by hipifying the `.cu` file; this is what `hipcc` actually compiled
<!-- @os:end -->
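To see which path the build described above would take on a given machine, it can help to check whether the installed PyTorch is a ROCm build and which file suffixes Python accepts for native extensions. The following is a minimal sketch; the helper names `torch_gpu_backend` and `expected_extension_suffixes` are hypothetical and introduced only for illustration. It relies on `torch.version.hip` and `torch.version.cuda`, which PyTorch sets to a version string or `None` depending on the build.

```python
import importlib.machinery


def torch_gpu_backend():
    """Return 'rocm', 'cuda', 'cpu-only', or 'torch-missing' (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    # ROCm builds of PyTorch set torch.version.hip to a version string.
    if getattr(torch.version, "hip", None):
        return "rocm"
    # CUDA builds set torch.version.cuda instead.
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu-only"


def expected_extension_suffixes():
    """Suffixes this interpreter accepts for native extension modules
    (these end in .pyd on Windows and .so on Linux)."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


print("PyTorch backend:", torch_gpu_backend())
print("Native extension suffixes:", expected_extension_suffixes())
```

A result of `'rocm'` suggests the extension build would go through the HIP path, and the suffix list indicates what kind of file to look for under `build/`.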
@@ -530,12 +573,14 @@ Average GPU Utilization: 55.00%
The full manual path: write the kernel and Python binding in a `.cu` file, compile it as a native extension, then import and call it from Python. It mirrors the structure of `add_one_kernel.cu` exactly; only the kernel signature and launcher logic differ.