Add slime-train PATH shim for cwd-agnostic invocation#5066

Closed
neilyan-msft wants to merge 1 commit into main from users/neilyan/slime-train-shim-20260519

@neilyan-msft
Contributor

Summary

Adds a small slime-train shim at /usr/local/bin/slime-train so consumers can launch the slime training entrypoint from any working directory.

Motivation

Slime's entrypoint is the repo-root train.py script (not a Python module), so python train.py resolves against the caller's cwd. AML/Singularity sets cwd to a per-job working directory (e.g. /scratch/azureml/cr/j/{uuid}/exe/wd/) that does not contain train.py. As a result, jobs that compose their command as python train.py ... against this curated environment fail immediately with:

python: can't open file '/scratch/azureml/cr/j/{uuid}/exe/wd/train.py': [Errno 2] No such file or directory

(observed end-to-end on canary, jobs tb-slime-{sft,grpo}-gsm8k-qwen05b-05191606, after the /root/slime -> /opt/slime move in #5046 unblocked image readability.)
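
To illustrate the failure described above, here is a minimal sketch of how a relative script path is resolved against the caller's working directory. The helper `resolve_script` is hypothetical and only mimics the path resolution; it is not part of slime or AML.

```python
import posixpath


def resolve_script(script: str, cwd: str) -> str:
    """Mimic how a script path given to `python` resolves against the cwd."""
    # posixpath.join discards `cwd` when `script` is already absolute.
    return posixpath.normpath(posixpath.join(cwd, script))


# Placeholder job directory, copied from the error message in this PR.
job_cwd = "/scratch/azureml/cr/j/{uuid}/exe/wd"

# Relative path: lands in the job directory, where train.py does not exist.
print(resolve_script("train.py", job_cwd))

# Absolute path: independent of the job directory.
print(resolve_script("/opt/slime/train.py", job_cwd))
```

The first call yields the path from the error message; the second stays at /opt/slime/train.py regardless of the cwd.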

Change

  • Dockerfile: install /usr/local/bin/slime-train (mode 0755) that cds to /opt/slime and execs python /opt/slime/train.py "$@". The cd keeps any relative artefact paths the slime entrypoint resolves at runtime anchored to the slime source tree.
  • smoke_test.py: assert the shim is present, world readable+executable, and references /opt/slime/train.py.
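
The two changes above can be sketched roughly as follows. This is not the code from the PR: the shim body in `SHIM_TEXT` is reconstructed from the description (cd to /opt/slime, then exec the entrypoint), and `write_shim` and `check_shim` are hypothetical helper names standing in for the Dockerfile step and the smoke-test assertions.

```python
import os
import stat

# Assumed shim body, reconstructed from the PR description; the real file may differ.
SHIM_TEXT = """#!/bin/sh
cd /opt/slime || exit 1
exec python /opt/slime/train.py "$@"
"""


def write_shim(path: str, mode: int = 0o755) -> None:
    """Write the shim to `path` with the given mode (stand-in for the Dockerfile step)."""
    with open(path, "w") as f:
        f.write(SHIM_TEXT)
    os.chmod(path, mode)


def check_shim(path: str, entrypoint: str = "/opt/slime/train.py") -> list:
    """Return a list of problems; an empty list means the shim passes the checks."""
    if not os.path.isfile(path):
        return [f"{path} is missing"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    required = stat.S_IROTH | stat.S_IXOTH  # world readable + world executable
    if mode & required != required:
        problems.append(f"{path} has mode {mode:o}, expected world r+x")
    with open(path) as f:
        if entrypoint not in f.read():
            problems.append(f"{path} does not reference {entrypoint}")
    return problems
```

In a smoke test, `assert check_shim("/usr/local/bin/slime-train") == []` would cover all three conditions in one assertion while still reporting which one failed.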

Why not change consumers instead

Every other curated training environment (TRL, VERL, OpenRLHF, NeMo-RL, TorchForge) uses python -m <module> and is therefore cwd-independent. Slime is the only one whose entrypoint is a bare script file, so the cwd contract has to be solved either by hard-coding the absolute path in every consumer or by giving the asset a PATH-resolvable name. The latter keeps the layout knowledge inside the asset that owns it.
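
The two options can be compared with a small sketch of how a consumer might compose its command. The function names and the example arguments are hypothetical; only `slime-train`, `/opt/slime/train.py`, and the `python -m <module>` form come from this PR.

```python
import shlex

SLIME_ENTRYPOINT = "/opt/slime/train.py"  # absolute path stated in this PR
SLIME_SHIM = "slime-train"                # PATH-resolvable name proposed in this PR


def module_command(module: str, args: list) -> list:
    """Command for frameworks launched as `python -m <module>` (cwd-independent)."""
    return ["python", "-m", module, *args]


def slime_command(args: list, use_shim: bool) -> list:
    """Command for slime: either the shim name or the hard-coded absolute path."""
    if use_shim:
        return [SLIME_SHIM, *args]
    return ["python", SLIME_ENTRYPOINT, *args]


args = ["--config", "recipe.yaml"]  # hypothetical arguments for illustration
print(shlex.join(slime_command(args, use_shim=True)))
print(shlex.join(slime_command(args, use_shim=False)))
```

With the shim, the consumer needs no knowledge of where slime lives in the image. With the absolute path, the consumer carries that knowledge, and the process cwd stays at the job directory rather than /opt/slime.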

Validation

Ran a local Python parse check on smoke_test.py. The image build and smoke test in the asset CI build will validate the shim itself.

Slime's entrypoint is the repo-root train.py script (not a Python
module), so `python train.py` resolves against the caller's cwd.
AML/Singularity sets cwd to a per-job working directory (e.g.
/scratch/azureml/cr/j/{uuid}/exe/wd/) that does not contain
train.py, so the slime curated environment can only be exercised
end-to-end if the consumer hard-codes the absolute path
/opt/slime/train.py.

Install a small `slime-train` shim at /usr/local/bin/slime-train
that cds into /opt/slime and execs the slime entrypoint. This lets
consumers invoke slime via a stable PATH-resolvable name regardless
of the job working directory, and keeps any relative artefact paths
the slime entrypoint resolves at runtime anchored to the slime
source tree.

Extend the smoke test to assert the shim is present, world
readable+executable, and references /opt/slime/train.py.
@github-actions

github-actions Bot commented May 20, 2026

Test Results for assets-test

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit ecc2418.

♻️ This comment has been updated with latest results.

@neilyan-msft
Contributor Author

Closing in favor of a consumer-side fix in Vienna MFE: hard-code /opt/slime/train.py in TrainingRecipeCompiler instead of adding a PATH shim in the asset. The slime asset stays cwd-coupled, but that contract already exists implicitly, and slime is the only direct-script framework.

@neilyan-msft neilyan-msft deleted the users/neilyan/slime-train-shim-20260519 branch May 20, 2026 02:54