Commit 645a545

Split credential setup out of slurm-setup.md into credentials.md
Addresses three overlapping review comments on slurm-setup.md:62 from PR #1239:

- @mxinO: NGC/HF/Docker tokens aren't SLURM-specific — wanted a general credential setup guide referenced from multiple skills.
- CodeRabbit: `$oauthtoken` needs to be called out as a literal NGC login string, not a shell variable to substitute.
- Copilot: the previous snippet overwrote `~/.config/enroot/.credentials` unconditionally, clobbering entries for other registries.

New `skills/common/credentials.md` covers HF_TOKEN, NGC API key (Docker + enroot paths), and Docker Hub. The NGC/enroot block uses an append-if-missing pattern (`grep -q ... || echo ... >>`) and spells out that `$oauthtoken` is a literal, kept unexpanded via single quotes.

`slurm-setup.md` now keeps only the pyxis-specific signpost — one paragraph pointing at `credentials.md` for the actual setup.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
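
The append-if-missing behaviour described in the commit message can be tried against a scratch file before touching real credentials. The following is a minimal sketch and is not part of the commit: the helper name `add_ngc_entry`, the temporary file, the `registry.example.com` entry, and the placeholder key `EXAMPLE_KEY` are illustrative assumptions.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not from the commit): add an nvcr.io entry to the
# given credentials file only if no such entry is present yet.
add_ngc_entry() {
    local creds="$1" key="$2"
    touch "$creds"
    # Single-quoted format string: $oauthtoken stays literal, %s takes the key.
    grep -q '^machine nvcr.io ' "$creds" || \
        printf 'machine nvcr.io login $oauthtoken password %s\n' "$key" >> "$creds"
    chmod 600 "$creds"
}

# Scratch file standing in for ~/.config/enroot/.credentials.
CREDS="$(mktemp)"
trap 'rm -f "$CREDS"' EXIT

# Pre-existing entry for another (made-up) registry that must survive.
echo 'machine registry.example.com login alice password not-a-real-secret' > "$CREDS"

add_ngc_entry "$CREDS" EXAMPLE_KEY
add_ngc_entry "$CREDS" EXAMPLE_KEY   # second call finds the entry and appends nothing

cat "$CREDS"
```

Under these assumptions the scratch file ends up with exactly two lines: the original `registry.example.com` entry and a single `nvcr.io` entry whose login field is the literal text `$oauthtoken`. Writing with `>` instead of `>>` would have discarded the first line, which is the problem the Copilot comment raised.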
1 parent 31c4fe8 commit 645a545

2 files changed

Lines changed: 61 additions & 11 deletions

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Credentials Setup
+
+Tokens and registry credentials that ModelOpt workflows need across local and cluster environments. Not SLURM-specific — referenced from PTQ, deployment, evaluation, and slurm-setup skills.
+
+## HuggingFace token (`HF_TOKEN`)
+
+Required for gated models (e.g., Llama, Mistral, some Nemotron variants) and gated datasets (e.g., GPQA, HLE).
+
+Generate at <https://huggingface.co/settings/tokens>, then export:
+
+```bash
+export HF_TOKEN=hf_...
+```
+
+Persist in `~/.bashrc` or a project-local `.env` file. For remote clusters, check whether the cluster's shell config already sets it: `ssh <cluster-login> 'env | grep -c HF_TOKEN'`.
+
+## NGC API key (for `nvcr.io`)
+
+Required for pulling NGC images (`nvcr.io/nvidia/pytorch:...`, `nvcr.io/nvidia/vllm:...`) via Docker, `srun --container-image`, or enroot.
+
+Generate at <https://ngc.nvidia.com/setup/api-key>.
+
+### Docker
+
+```bash
+docker login nvcr.io -u '$oauthtoken' -p <NGC_API_KEY>
+```
+
+### Enroot (SLURM / pyxis)
+
+Add an entry to `~/.config/enroot/.credentials` on the cluster. The file may already hold credentials for other registries — **append rather than overwrite**:
+
+```bash
+mkdir -p ~/.config/enroot
+CREDS=~/.config/enroot/.credentials
+touch "$CREDS"
+grep -q '^machine nvcr.io ' "$CREDS" || \
+echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>' >> "$CREDS"
+chmod 600 "$CREDS"
+```
+
+> **Note**: `$oauthtoken` is a **literal string** required by NGC, not a shell variable. Do not replace it and do not let your shell expand it — the single quotes above keep it literal.
+
+Without this, `srun --container-image=nvcr.io/...` fails with `401 Unauthorized` when the compute node tries to pull.
+
+## Docker Hub login
+
+Only needed if you hit rate limits pulling public images:
+
+```bash
+docker login
+```
+
+## Summary
+
+| Credential | Used for | Set via |
+|---|---|---|
+| `HF_TOKEN` | Gated HF models / datasets | Env var (`export HF_TOKEN=...`) or `.env` |
+| NGC API key | `nvcr.io` image pulls | `docker login` or `~/.config/enroot/.credentials` |
+| Docker Hub | Rate-limited public image pulls | `docker login` |

.claude/skills/common/slurm-setup.md

Lines changed: 1 addition & 11 deletions
Old New   Change
@@ -53,17 +53,7 @@ srun \
 53  53
 54  54   ### Container registry credentials (pyxis)
 55  55
 56     - If `srun --container-image` uses an image from a private registry (e.g., `nvcr.io/nvidia/...`), pyxis/enroot needs credentials on the cluster. Check for existing credentials and add if missing:
 57     -
 58     - ```bash
 59     - cat ~/.config/enroot/.credentials 2>/dev/null || echo "No credentials"
 60     - # To add NGC credentials:
 61     - mkdir -p ~/.config/enroot
 62     - echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>' > ~/.config/enroot/.credentials
 63     - chmod 600 ~/.config/enroot/.credentials
 64     - ```
 65     -
 66     - Without this, `srun` will fail with `401 Unauthorized` when pulling from `nvcr.io`.
     56 + If `srun --container-image` uses an image from a private registry (e.g., `nvcr.io/nvidia/...`), pyxis/enroot needs registry credentials on the cluster in `~/.config/enroot/.credentials`. See `skills/common/credentials.md` for the NGC / Docker / HF token setup. Without this, `srun` fails with `401 Unauthorized` when the compute node pulls.
 67  57
 68  58   Submit and capture the job ID:
 69  59