Commit 60e654c

Improve step 4 routing and use nvidia-smi for GPU detection
Signed-off-by: Meng Xin <mxin@nvidia.com>
1 parent d8e1d37 commit 60e654c

2 files changed

Lines changed: 10 additions & 2 deletions

.claude/skills/common/environment-setup.md

Lines changed: 3 additions & 1 deletion
@@ -44,9 +44,11 @@ Run these checks on the **target machine** (local, or via SSH if remote):
 ```bash
 which srun sbatch 2>/dev/null && echo "SLURM"
 docker info 2>/dev/null | grep -qi nvidia && echo "Docker+GPU"
-python -c "import torch; print(torch.cuda.device_count(), 'GPUs')" 2>/dev/null
+nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null
 ```
 
+Use `nvidia-smi` for GPU detection — it's more reliable than `torch.cuda` which depends on the Python environment having CUDA-enabled PyTorch installed.
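
Note on the added line: the new `nvidia-smi` command prints one CSV row per GPU but, unlike the removed `torch` one-liner, no count. A minimal sketch of recovering a count from that output follows. `count_gpus` and `detect_gpus` are hypothetical helper names that do not appear in the commit, and the sketch assumes each non-empty row corresponds to one GPU.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the commit): derive a GPU count from
# `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output.

# Count non-empty rows on stdin; each row is assumed to describe one GPU.
# grep exits non-zero when nothing matches, so `|| true` keeps the "0" result.
count_gpus() {
  grep -c '[^[:space:]]' || true
}

# Print "<n> GPUs"; report 0 when nvidia-smi is missing or fails.
detect_gpus() {
  local out=""
  if command -v nvidia-smi >/dev/null 2>&1; then
    out=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null) || out=""
  fi
  echo "$(printf '%s\n' "$out" | count_gpus) GPUs"
}

detect_gpus
```

Keeping the row counting separate from the `nvidia-smi` call means it can be exercised with canned text on a machine without a GPU.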
+
 ### Execution context summary
 
 After detection, you should know which row you're in:

.claude/skills/ptq/SKILL.md

Lines changed: 7 additions & 1 deletion
@@ -46,7 +46,13 @@ All format definitions: `modelopt/torch/quantization/config.py`.
 
 **Goal: checkpoint on disk** (`.safetensors` + `config.json`). Always smoke test first (`--calib_size 4`), then full calibration.
 
-### 4A — Direct: supported model
+**Which path?** Based on step 1:
+
+- SLURM or Docker+GPU detected → **4B (Launcher)**
+- Bare GPU, no Docker/SLURM → **4A (Direct)**
+- Unsupported model (any env) → **4C (Custom script)**
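
Note on the routing list: the three bullets can be read as a small decision function. The sketch below is one possible reading, not something the skill ships. `route_step4` is a hypothetical name, the `yes`/`no` arguments are an assumed encoding of the step 1 results, and "(any env)" is interpreted as meaning the unsupported-model rule takes precedence over the environment rules. The list does not say what happens when no GPU is present, so the sketch does not cover that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit).
# Usage: route_step4 <model_supported> <has_slurm> <has_docker_gpu>
# Each argument is "yes" or "no" (assumed encoding of the step 1 checks).
route_step4() {
  local supported=$1 slurm=$2 docker_gpu=$3
  if [ "$supported" != "yes" ]; then
    echo "4C"   # unsupported model, any environment: custom script
  elif [ "$slurm" = "yes" ] || [ "$docker_gpu" = "yes" ]; then
    echo "4B"   # SLURM or Docker+GPU detected: launcher
  else
    echo "4A"   # bare GPU, no Docker/SLURM: direct
  fi
}

route_step4 yes no no   # prints 4A
```

Checking model support first keeps the other two rules mutually exclusive, so exactly one path is selected for any input.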
+
+### 4A — Direct: supported model (bare GPU, no Docker/SLURM)
 
 ```bash
 pip install --no-build-isolation "nvidia-modelopt[hf]"
