docs: genericize PPP terminology and lustre path in launching-evals SKILL#1023

Merged: marta-sd merged 1 commit into NVIDIA-NeMo:main from Edwardf0t1:Edwardf0t1/docs-genericize-launching-evals-internal-refs on May 22, 2026

Conversation

@Edwardf0t1

Closes #938. cc @marta-sd

Summary

The "Key Facts" section of launching-evals/SKILL.md had two NVIDIA-internal references that bias the skill toward NVIDIA infra and confuse external users:

1. "PPP" terminology with `coreai_dlalgo_*` example account names. "PPP" is internal NVIDIA jargon; the example values are NVIDIA-specific.
   - Renamed the bullet to "SLURM account" (the universally correct term).
   - Kept "PPP" as a parenthetical alias so internal users still recognize it.
   - Genericized the example values to `<account_name>` / `<new_account_name>`.
2. The HF cache download example hardcoded an NVIDIA-internal lustre path (`/lustre/fsw/portfolios/coreai/users/<username>/cache/...`).
   - Replaced it with a `<your_hf_cache_dir>` placeholder.
   - Added a hint that it should be a shared filesystem accessible from compute nodes: `/lustre/...` for multi-node clusters, or `~/.cache/huggingface` for single-node setups.
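One way to confirm that no internal references of these two kinds remain is a small scan over the skill file. The sketch below is illustrative only: the function name `find_internal_refs` and the two regular expressions are assumptions derived from the strings this PR removes, not part of the repository. "PPP" itself is deliberately not flagged, since the PR keeps it as an alias.

```python
import re

# Hypothetical patterns, derived from the two references this PR removes.
INTERNAL_PATTERNS = {
    "internal account name": re.compile(r"coreai_dlalgo_\w+"),
    "internal lustre path": re.compile(r"/lustre/fsw/portfolios/\S+"),
}


def find_internal_refs(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, matched_text) for each internal reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in INTERNAL_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits


# Example: scan a line as it read before this PR.
before = "update the account value (e.g., `coreai_dlalgo_compeval`)."
for lineno, label, matched in find_internal_refs(before):
    print(f"line {lineno}: {label}: {matched}")
```

To check the real file, read `SKILL.md` into a string and pass it to the function; an empty result means neither pattern was found.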

Diff

- - **PPP** = Slurm account (the `account` field in cluster_config.yaml). When the user says "change PPP to X", update the account value (e.g., `coreai_dlalgo_compeval` → `coreai_dlalgo_llm`).
+ - **SLURM account**: the `account` field in `cluster_config.yaml`. When the user asks to change it (some teams call this a "PPP"), update the value (e.g., `<account_name>` → `<new_account_name>`).
- - **HF cache requirement**: ... then `HF_HOME=/lustre/fsw/portfolios/coreai/users/<username>/cache/huggingface hf download <model>`. Without this, vLLM will fail with `LocalEntryNotFoundError`.
+ - **HF cache requirement**: ... then `HF_HOME=<your_hf_cache_dir> hf download <model>` (typically a shared filesystem accessible from compute nodes — e.g., a `/lustre/...` mount on multi-node clusters or `~/.cache/huggingface` for single-node setups). Without this, vLLM will fail with `LocalEntryNotFoundError`.
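For readers who script the pre-download step, the command in the new bullet can be wrapped as follows. This is a minimal sketch, not code from the skill: the helper names `build_hf_download` and `prefetch_model` are hypothetical, and it assumes the `hf` CLI from `huggingface_hub` is on `PATH`. The cache directory must be the same one the compute nodes will read.

```python
import os
import subprocess


def build_hf_download(model: str, cache_dir: str) -> tuple[list[str], dict[str, str]]:
    """Return the command and environment for `HF_HOME=<cache_dir> hf download <model>`."""
    env = dict(os.environ)
    # Expand "~" so the single-node default (~/.cache/huggingface) works as written.
    env["HF_HOME"] = os.path.expanduser(cache_dir)
    return ["hf", "download", model], env


def prefetch_model(model: str, cache_dir: str) -> None:
    """Run the download; raises subprocess.CalledProcessError if `hf` fails."""
    cmd, env = build_hf_download(model, cache_dir)
    subprocess.run(cmd, env=env, check=True)
```

Splitting command construction from execution keeps the first helper free of side effects, so the resulting `HF_HOME` value can be inspected before anything is downloaded.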

Context

This came up while vendoring launching-evals into NVIDIA/Model-Optimizer (PR #1239). Reviewers flagged the internal references. Issue #938 has the full discussion.

Thanks for the earlier gitlab-master URL fix in #920 — this PR addresses the two leftovers.

The "Key Facts" section of `launching-evals/SKILL.md` had two
NVIDIA-internal references that bias the skill toward NVIDIA infra and
confuse external users:

1. "PPP" terminology with `coreai_dlalgo_*` example account names. "PPP"
   is internal NVIDIA jargon; the example values are NVIDIA-specific.
   Renamed the bullet to "SLURM account" (the universally-correct term)
   and kept "PPP" as a parenthetical alias so internal users still
   recognize it. Genericized the example values to `<account_name>` /
   `<new_account_name>`.

2. The HF cache download example hardcoded an NVIDIA-internal lustre
   path (`/lustre/fsw/portfolios/coreai/users/<username>/cache/...`).
   Replaced with `<your_hf_cache_dir>` placeholder, with a hint that it
   should be a shared filesystem accessible from compute nodes
   (`/lustre/...` for multi-node, `~/.cache/huggingface` for single-node).

Closes NVIDIA-NeMo#938

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@marta-sd
Contributor

/ok to test 54a7f4d

@marta-sd left a comment

LGTM, thank you @Edwardf0t1!

@marta-sd merged commit 1c9488b into NVIDIA-NeMo:main on May 22, 2026
48 checks passed

Development

Successfully merging this pull request may close these issues.

launching-evals SKILL.md: "PPP" terminology and internal lustre path remain NVIDIA-internal

2 participants