Commit 7aadb62

Address feedbacks
Signed-off-by: Kai Xu <kaix@nvidia.com>
1 parent 10d4238 commit 7aadb62

2 files changed, 2 additions and 2 deletions

.claude/skills/evaluation/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -256,7 +256,7 @@ After the dry-run, check the output from `nel` for any problems with the config.
 
 **Monitoring Progress**
 
-After job submission, register the job and set up monitoring per the **monitor skill**.
+After job submission, register the job per the **monitor skill** for durable cross-session tracking. For one-off queries (live status, debugging a failed run, analyzing results) use the **launching-evals skill**; for querying past runs in MLflow use **accessing-mlflow**.
 
 **NEL-specific diagnostics** (for debugging failures):
 

.claude/skills/monitor/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 name: monitor
-description: Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job 12345", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.
+description: Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job <slurm_job_id>", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.
 ---
 
 # Job Monitor
