Use upstream NeMo-RL entrypoints and simplify sandbox integration by wedu-nvidia · Pull Request #1381 · NVIDIA-NeMo/Skills

wedu-nvidia · 2026-04-17T16:30:24Z

Summary by CodeRabbit

New Features
- Added config override and container image options for GRPO and SFT pipelines
- Added Ray sandbox startup templates for multi-node cluster bring-up and optional sandboxed execution
Refactor
- Switched from local build to a pinned upstream runtime image
- Training pipelines now invoke upstream training entrypoints and accept configurable training configs
- Removed legacy on-host training entrypoints and local SFT config files
- Improved sandbox behavior for Ray-enabled heterogeneous jobs
Bug Fixes
- SLURM executor now fails the job if any task exits non-zero (stricter failure handling)
Documentation
- Cleared references to deprecated script paths

coderabbitai · 2026-04-17T16:37:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Removed the local NeMo‑RL Dockerfile and local SFT/GRPO entrypoints; pinned and use an upstream NeMo‑RL container image; updated pipelines to call upstream entrypoints with new CLI options and config mounting; added Ray sandbox SLURM templates and adjusted SLURM/Ray sandbox orchestration and SFT config references.

Changes

Cohort / File(s)	Summary
Dockerfile removed `dockerfiles/Dockerfile.nemo-rl`	Deleted local multi-stage NeMo‑RL Dockerfile and its hermetic build/prefetch steps.
Container mapping / cluster examples `nemo_skills/__init__.py`, `cluster_configs/example-local.yaml`, `tests/gpu-tests/test-local.yaml`	Switched `nemo-rl` executor from a local Dockerfile build to pinned remote image `nvcr.io/nvidian/nemo-rl:733c7d6-48669320`.
Pipeline — GRPO `nemo_skills/pipeline/nemo_rl/grpo.py`	Added upstream path constants; invoke upstream GRPO script/config; added CLI options `config`, `container`, `sandbox_container`; added `NemoRLTask.config_path`; remapped data CLI keys and PYTHONPATH/UV_PROJECT to upstream root; mount absolute configs outside upstream root.
Pipeline — SFT `nemo_skills/pipeline/nemo_rl/sft.py`	Added upstream path constants; switched to upstream SFT entrypoint (`run_sft.py`) using `config_path`; added `config` and `container` CLI options; remapped data CLI keys and environment wiring; unified container selection and config mounting behavior.
Pipeline — executor / sandbox logic `nemo_skills/pipeline/utils/exp.py`	Added `--kill-on-bad-exit` to srun args; adjusted heterogeneous-group counting/order and GPU partitioning; when using Ray, inject `SANDBOX_*` env vars into main task and skip standalone sandbox executor so Ray template manages sandbox.
Removed local training orchestration `nemo_skills/training/nemo_rl/start_grpo.py`, `nemo_skills/training/nemo_rl/start_sft.py`	Deleted both local training entrypoint scripts and their helper classes/functions; orchestration delegated to upstream NeMo‑RL entrypoints.
SFT config removed `nemo_skills/training/nemo_rl/configs/sft.yaml`	Removed the entire SFT YAML configuration file (all training/policy/data/checkpointing/logger/cluster keys deleted).
Docs `docs/basics/cluster-configs.md`, `docs/pipelines/training.md`	Removed references to local `start_sft.py` and a sentence about forwarding extra parameters.
Ray sandbox templates added `nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2`, `..._enroot.sub.j2`	Added two Jinja SLURM templates to provision multi-node Ray clusters (head/workers), optional sandbox containers, port/env management, readiness polling, retries, log sync, attach helpers, and ENSURE worker readiness counting.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(30,144,255,0.5)
    participant CLI as Pipeline CLI
    participant Orchestrator as exp.add_task / Scheduler
    participant SLURM as SLURM Job / srun
    participant RayTemplate as Ray Sandbox Template
    participant Container as Upstream NeMo‑RL Container
  end
  CLI->>Orchestrator: submit job (opts: config, container, sandbox, ray_template)
  Orchestrator->>SLURM: compute het groups, node counts, sandbox strategy
  alt with Ray
    Orchestrator->>SLURM: inject SANDBOX_* env into main task
    SLURM->>RayTemplate: run Ray head/workers + optional sandbox
    RayTemplate->>Container: start head/workers, run SANDBOX_COMMAND inside container
  else without Ray
    Orchestrator->>SLURM: create per-het-group executors and sandbox sidecars
    SLURM->>Container: run executors and sandbox sidecars via srun
  end
  SLURM->>Container: execute upstream entrypoint (run_sft/run_grpo) with --config and mounted paths

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Switch to building containers on-the-fly for local runs #969 — touches container image resolution and dockerfile: handling; directly related to switching between local Dockerfile builds and pinned remote images.

Suggested reviewers

activatedgeek

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.05% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly reflects the main objective: migrating from local NeMo-RL setup to upstream entrypoints and improving sandbox integration with Ray templates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch wedu/nemo-rl-update

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

dockerfiles/Dockerfile.nemo-rl (1)
7-7: Consider adding a non-root user for container runtime.

Trivy flagged that the final USER in this Dockerfile is root. While this may be intentional for training workloads requiring elevated permissions, running as non-root is a security best practice. If root access isn't strictly required at runtime, consider adding a non-root user.
💡 Optional fix
 RUN uv pip install /opt/NeMo-Skills
+
+# Optional: Run as non-root user at runtime
+RUN useradd -m -u 1000 nemorl
+USER nemorl
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/Dockerfile.nemo-rl` at line 7, The Dockerfile currently ends with
"USER root", which keeps the container running as root; replace this by creating
and switching to a non-root user (e.g., create a user/group like "appuser" or
"nemo" during the image build and end with "USER appuser") and ensure any
runtime directories or files used by the process are owned or writable by that
user (use chown/chmod during build for directories the service needs). Verify
that any entrypoint or startup scripts (if referenced) do not require root-only
operations or elevate privileges only when necessary. Ensure the new non-root
username matches the final USER instruction and adjust permissions for any
volumes or bind-mounted paths the container needs to write to.
nemo_skills/pipeline/utils/exp.py (1)
536-539: Variable client_num_gpus is reassigned later in the function.

client_num_gpus is computed at line 536 for the server_goes_first logic, but then reassigned at line 638 with different logic (num_gpus if (server_config is None or num_nodes > 1) else 0). This reassignment appears intentional (different contexts), but the same variable name could cause confusion. Consider renaming one to clarify the distinct purposes.

Also applies to: 638-638
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/utils/exp.py` around lines 536 - 539, The variable
client_num_gpus is computed early for ordering logic and then reassigned later
with different semantics; rename the first use to a distinct, descriptive name
(e.g., client_num_gpus_for_order or client_gpus_hint) and update its reference
in the server_goes_first assignment so it only represents the ordering check
(server_goes_first = server_needs_gpus and not client_num_gpus_for_order). Leave
the later variable (client_num_gpus computed from
num_gpus/num_nodes/server_config) as-is or rename it to client_num_gpus_actual
if you prefer clarity, and update any subsequent uses to the new name
(references: client_num_gpus, server_goes_first, num_gpus, server_config,
num_nodes).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/nemo_rl/grpo.py`:
- Around line 423-425: The guard checking dependent_jobs is inconsistent with
its error text and sft.py; update the conditional that reads "if dependent_jobs
> 0:" to use "if dependent_jobs >= 0:" (the block that raises
ValueError("training_data is required when dependent_jobs >= 0")) so the
condition and message match — look for the dependent_jobs / training_data
validation in grpo.py and make this change.

In `@nemo_skills/training/nemo_rl/configs/sft.yaml`:
- Around line 25-29: The config currently hardcodes absolute paths for
model_name and chat_template (see keys model_name and tokenizer.chat_template
and tokenizer.name), which will break outside the original cluster; change these
to environment-variable or placeholder values (e.g., ${MODEL_NAME} /
${CHAT_TEMPLATE} or a relative/default template name) and update any README or
config docs to state they must be overridden; ensure the code that loads this
config respects overrides (note ++policy.model_name is already used) and add a
validation step that raises a clear error if chat_template is unset or missing
so callers must supply/override tokenizer.chat_template before runtime.
- Around line 165-166: The local nemo_skills/training/nemo_rl/configs/sft.yaml
contains cluster-specific hardcoded paths in train_data_path and val_data_path
that are unused by the pipeline; either remove this local config or turn it into
a template: replace the hardcoded paths with placeholder values (e.g.
"<TRAIN_DATA_PATH>" and "<VAL_DATA_PATH>") or add a comment stating it's a
reference and that runtime overrides use Hydra ++data.train.data_path and
++data.validation.data_path; ensure the file header mentions it is not used by
default and can be overridden via --config so future readers don't rely on the
hardcoded paths.

---

Nitpick comments:
In `@dockerfiles/Dockerfile.nemo-rl`:
- Line 7: The Dockerfile currently ends with "USER root", which keeps the
container running as root; replace this by creating and switching to a non-root
user (e.g., create a user/group like "appuser" or "nemo" during the image build
and end with "USER appuser") and ensure any runtime directories or files used by
the process are owned or writable by that user (use chown/chmod during build for
directories the service needs). Verify that any entrypoint or startup scripts
(if referenced) do not require root-only operations or elevate privileges only
when necessary. Ensure the new non-root username matches the final USER
instruction and adjust permissions for any volumes or bind-mounted paths the
container needs to write to.

In `@nemo_skills/pipeline/utils/exp.py`:
- Around line 536-539: The variable client_num_gpus is computed early for
ordering logic and then reassigned later with different semantics; rename the
first use to a distinct, descriptive name (e.g., client_num_gpus_for_order or
client_gpus_hint) and update its reference in the server_goes_first assignment
so it only represents the ordering check (server_goes_first = server_needs_gpus
and not client_num_gpus_for_order). Leave the later variable (client_num_gpus
computed from num_gpus/num_nodes/server_config) as-is or rename it to
client_num_gpus_actual if you prefer clarity, and update any subsequent uses to
the new name (references: client_num_gpus, server_goes_first, num_gpus,
server_config, num_nodes).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e189349-d2f2-40e2-a87b-0050328fe955

📥 Commits

Reviewing files that changed from the base of the PR and between 420d006 and 2d5133f.

📒 Files selected for processing (9)

dockerfiles/Dockerfile.nemo-rl
docs/basics/cluster-configs.md
docs/pipelines/training.md
nemo_skills/pipeline/nemo_rl/grpo.py
nemo_skills/pipeline/nemo_rl/sft.py
nemo_skills/pipeline/utils/exp.py
nemo_skills/training/nemo_rl/configs/sft.yaml
nemo_skills/training/nemo_rl/start_grpo.py
nemo_skills/training/nemo_rl/start_sft.py

💤 Files with no reviewable changes (3)

docs/pipelines/training.md
nemo_skills/training/nemo_rl/start_grpo.py
nemo_skills/training/nemo_rl/start_sft.py

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2`:
- Around line 100-104: The template hardcodes node geometry (gpus_per_node=8 and
CPUS_PER_WORKER=112) which breaks Ray actor counting and causes CPU
oversubscription for non-8-GPU nodes; change the template to accept/consume
environment/template variables (e.g., GPUS_PER_NODE and CPUS_PER_WORKER) with
sensible defaults and use those instead of literals, and update
nemo_skills/pipeline/nemo_rl/sft.py so NUM_ACTORS and any cpus-per-task
calculations derive from the passed GPUS_PER_NODE and CPUS_PER_WORKER (or the
existing num_gpus input) rather than assuming 8 GPUs and 112 CPUs, ensuring
NUM_ACTORS math and --cpus-per-task use the parameterized values.
- Around line 289-293: The current get_container_pid() returns the first enroot
PID and can pick the wrong container; change get_container_pid(node,
container_name) to accept a container name pattern and use enroot list -f output
to filter by that name (e.g., match "ray-head" or "ray-worker-.*") before
printing the PID so it reliably returns the PID for the requested container;
update all callers (the sandbox start loop, extract_worker_units, and the final
attach/exec paths that currently call get_container_pid) to pass the appropriate
container name pattern (e.g., "ray-head" for head operations and "ray-worker-*"
for workers) so exec/attach and readiness checks target the correct container.

In `@nemo_skills/pipeline/nemo_rl/sft.py`:
- Around line 394-396: The code currently only remaps absolute config paths
outside UPSTREAM_NEMO_RL_ROOT, allowing repo-relative --config values
(training_config) to be passed through and potentially fail at runtime; update
the logic in the training_config handling to either (a) normalize repo-relative
paths by resolving them into the mounted repo location
(/nemo_run/code/<relative>) using get_mounted_path(cluster_config, ...) when
training_config is not absolute and does not equal UPSTREAM_SFT_CONFIG, or (b)
explicitly reject non-upstream relative inputs with a clear error message
(raise/exit) so callers of --config get immediate feedback; modify the block
referencing training_config, UPSTREAM_SFT_CONFIG, UPSTREAM_NEMO_RL_ROOT and
get_mounted_path to implement one of these behaviors and ensure cluster_config
is used to derive the mounted repo path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b6ed4524-6203-4b32-8cb1-f57723936387

📥 Commits

Reviewing files that changed from the base of the PR and between fa59f66 and 66a6480.

📒 Files selected for processing (4)

nemo_skills/pipeline/nemo_rl/grpo.py
nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2
nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2
nemo_skills/pipeline/nemo_rl/sft.py

🚧 Files skipped from review as they are similar to previous changes (1)

nemo_skills/pipeline/nemo_rl/grpo.py

coderabbitai · 2026-04-20T19:49:41Z

+# Number of GPUs per node
+gpus_per_node=8
+
+# Number of CPUs per worker node
+CPUS_PER_WORKER=${CPUS_PER_WORKER:-112}


⚠️ Potential issue | 🟠 Major

Parameterize GPU/CPU geometry instead of baking in one node shape.

This template locks Ray accounting to gpus_per_node=8 and CPUS_PER_WORKER=112, while nemo_skills/pipeline/nemo_rl/sft.py still accepts arbitrary num_gpus. Non-8-GPU jobs will miscount NUM_ACTORS, and smaller nodes can be oversubscribed immediately by --cpus-per-task=$CPUS_PER_WORKER.

Also applies to: 343-343

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2` around lines 100 - 104, The template hardcodes node geometry (gpus_per_node=8 and CPUS_PER_WORKER=112) which breaks Ray actor counting and causes CPU oversubscription for non-8-GPU nodes; change the template to accept/consume environment/template variables (e.g., GPUS_PER_NODE and CPUS_PER_WORKER) with sensible defaults and use those instead of literals, and update nemo_skills/pipeline/nemo_rl/sft.py so NUM_ACTORS and any cpus-per-task calculations derive from the passed GPUS_PER_NODE and CPUS_PER_WORKER (or the existing num_gpus input) rather than assuming 8 GPUs and 112 CPUs, ensuring NUM_ACTORS math and --cpus-per-task use the parameterized values.

coderabbitai · 2026-04-20T19:49:41Z

+# Helper function to get container PID using enroot (workaround for --overlap --container-name bug)
+get_container_pid() {
+  local node=$1
+  srun --overlap --nodes=1 -w "$node" bash -c "enroot list -f | awk 'NR>1 && \$2 ~ /^[0-9]+\$/ {print \$2; exit}'"
+}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files "*nemo_skills_sandbox_ray_enroot*"

Repository: NVIDIA-NeMo/Skills

Length of output: 142

🏁 Script executed:

cat -n nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2 | head -100

Repository: NVIDIA-NeMo/Skills

Length of output: 4686

🏁 Script executed:

sed -n '280,350p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 2857

🏁 Script executed:

sed -n '315,345p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 1327

🏁 Script executed:

sed -n '345,450p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 3739

🏁 Script executed:

sed -n '420,440p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 805

🏁 Script executed:

sed -n '500,510p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 722

🏁 Script executed:

sed -n '495,530p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 2064

🏁 Script executed:

grep -n "container-name" nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 605

🏁 Script executed:

sed -n '280,295p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 583

🏁 Script executed:

sed -n '315,345p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 1327

🏁 Script executed:

sed -n '315,340p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 1245

🏁 Script executed:

enroot list --help 2>&1 | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 106

🏁 Script executed:

rg "enroot list" nemo_skills/pipeline/nemo_rl/ray_templates/

Repository: NVIDIA-NeMo/Skills

Length of output: 482

🌐 Web query:

enroot container name lookup query PID

💡 Result:

To look up the Enroot container name for a given PID, use the enroot list -f command, which scans all potential Enroot processes and displays their associated container names, PIDs, and other details. Example output: NAME PID COMM STATE STARTED TIME MNTNS USERNS COMMAND ubuntu 1583996 S 05:08 01:51:13 4026532603 4026532602 /bin/sh /root/test/1597122491.2123 This lists running Enroot containers with their PIDs. To find a specific PID, grep the output: enroot list -f | grep Internally, Enroot implements this in src/runtime.sh runtime::list function: - Lists directories in $ENROOT_DATA_PATH (default ~/.local/share/enroot/, container rootfs dirs). - Uses lsns -n -r -t mnt -o pid to get PIDs with mount namespaces. - For each PID, checks if /proc/${pid}/root/.lock exists (Enroot lock file indicating container root). - Parses /proc/${pid}/mountinfo to extract the root mount source path. - Strips the $ENROOT_DATA_PATH prefix to get the container name (basename of rootfs dir). - If no match, shows "". - Then runs ps on matching PIDs for details. To manually query for a specific PID without listing all: 1. Check if /proc/PID/root/.lock exists (confirms Enroot container). 2. Extract root mount from /proc/PID/mountinfo: find line where mount_point is "/" , take the "root" field (4th field), which is the path relative to filesystem, typically the full host path to rootfs like /home/user/.local/share/enroot/ubuntu. 3. Container name is basename of that path (e.g., "ubuntu"). Example awk one-liner for mountinfo root: awk '$5 == "/" {for(i=4; i<NF && $i != "-"; i++) printf "%s ", $i; print ""}' /proc/PID/mountinfo | xargs dirname | xargs basename This derives the name from the filesystem path without Enroot.

Citations:

1: is there any way to name enroot container so that the container could be filtered by enroot list -f easier? NVIDIA/enroot#33

2: enroot list -f behaviour on fedora NVIDIA/enroot#84

3: "enroot list -f" can fail if a container PID disappears NVIDIA/enroot#126

4: https://github.com/NVIDIA/enroot/blob/main/src/runtime.sh

5: https://github.com/NVIDIA/enroot/blob/main/enroot.in

6: https://www.man7.org/linux/man-pages/man5/proc_pid_mountinfo.5.html

🏁 Script executed:

sed -n '295,310p' nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2

Repository: NVIDIA-NeMo/Skills

Length of output: 802

Use the Ray container name to look up the correct container PID, not the first enroot PID.

get_container_pid() returns the first container PID on a node from enroot list. When the sandbox container starts on all nodes (lines 317-338) without a --container-name, multiple containers exist on the head node. Since the first PID from enroot list output is non-deterministic, get_container_pid() may select the sandbox container instead of the ray-head container. This causes extract_worker_units() (line 423), the final driver command (line 502), and the attach flow (lines 501-504) to exec into the wrong container, making readiness checks and job execution non-deterministic.

Modify get_container_pid() to filter by container name (e.g., ray-head or ray-worker-*) rather than returning the first result.

Also applies to: 317-338, 423-432, 501-504

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2` around lines 289 - 293, The current get_container_pid() returns the first enroot PID and can pick the wrong container; change get_container_pid(node, container_name) to accept a container name pattern and use enroot list -f output to filter by that name (e.g., match "ray-head" or "ray-worker-.*") before printing the PID so it reliably returns the PID for the requested container; update all callers (the sandbox start loop, extract_worker_units, and the final attach/exec paths that currently call get_container_pid) to pass the appropriate container name pattern (e.g., "ray-head" for head operations and "ray-worker-*" for workers) so exec/attach and readiness checks target the correct container.

coderabbitai · 2026-04-20T19:49:41Z

+    training_config = config or UPSTREAM_SFT_CONFIG
+    if training_config.startswith("/") and not training_config.startswith(UPSTREAM_NEMO_RL_ROOT):
+        training_config = get_mounted_path(cluster_config, training_config)


⚠️ Potential issue | 🟠 Major

Normalize or reject repo-relative --config paths.

Only absolute configs outside /opt/nemo-rl are remapped here. A relative override is passed through unchanged, but this PR’s Ray launch templates run COMMAND from COMMAND_WORKDIR (the submit directory), not /nemo_run/code, so --config some/local.yaml can fail even though the new flag advertises arbitrary YAML overrides. Either rewrite repo-local paths to /nemo_run/code/... or fail fast on non-upstream relative inputs.

Suggested fix

training_config = config or UPSTREAM_SFT_CONFIG - if training_config.startswith("/") and not training_config.startswith(UPSTREAM_NEMO_RL_ROOT): - training_config = get_mounted_path(cluster_config, training_config) + if training_config.startswith("/"): + if not training_config.startswith(UPSTREAM_NEMO_RL_ROOT): + training_config = get_mounted_path(cluster_config, training_config) + elif config is not None: + training_config = f"/nemo_run/code/{training_config}"

As per coding guidelines, "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/nemo_rl/sft.py` around lines 394 - 396, The code currently only remaps absolute config paths outside UPSTREAM_NEMO_RL_ROOT, allowing repo-relative --config values (training_config) to be passed through and potentially fail at runtime; update the logic in the training_config handling to either (a) normalize repo-relative paths by resolving them into the mounted repo location (/nemo_run/code/<relative>) using get_mounted_path(cluster_config, ...) when training_config is not absolute and does not equal UPSTREAM_SFT_CONFIG, or (b) explicitly reject non-upstream relative inputs with a clear error message (raise/exit) so callers of --config get immediate feedback; modify the block referencing training_config, UPSTREAM_SFT_CONFIG, UPSTREAM_NEMO_RL_ROOT and get_mounted_path to implement one of these behaviors and ensure cluster_config is used to derive the mounted repo path.

wedu-nvidia · 2026-04-20T19:54:39Z

I updated the code to support local ray template for sandbox

Kipok · 2026-04-20T20:49:49Z

-# NOTICES.txt file points to where the OSS source code is archived
-RUN echo "This distribution includes open source which is archived at the following URL: https://opensource.nvidia.com/oss/teams/nvidia/nemo-rl/${RC_DATE}:linux-${TARGETARCH}/index.html" > NOTICES.txt && \
-    echo "For further inquiries or assistance, contact us at oss-requests@nvidia.com" >> NOTICES.txt
+WORKDIR /opt/NeMo-Skills


do we need nemo skills to be installed? Ideally we just use the corresponding nightly in cluster configs and not even have this dockerfile

I think we do not need it, I put it in case later we might need nemo-skills for this.
I think we can remove the dockerfile completely now

Kipok · 2026-04-20T21:04:18Z

+    sandbox_container: str = typer.Option(None, help="Override container image for the sandbox sidecar"),
+    ray_template: str = typer.Option(
+        None,
+        help="Override the Ray template for training tasks. Can be a template name or a local .j2 path. Only applies to Ray-backed jobs; local executor ignores it.",


Suggested change

help="Override the Ray template for training tasks. Can be a template name or a local .j2 path. Only applies to Ray-backed jobs; local executor ignores it.",

help="Override the Ray template for training tasks. Can be a template name or a local .j2 path. Only applies to slurm-based jobs; local executor ignores it.",

Kipok · 2026-04-20T21:05:27Z

+            validation_data = get_mounted_path(cluster_config, validation_data)
    if run_conversion_only:
        dependent_jobs = -1
    if dependent_jobs > 0:


Suggested change

if dependent_jobs > 0:

if dependent_jobs >= 0:

Kipok · 2026-04-21T00:30:25Z

@@ -2,118 +2,123 @@
 sft:


is this not available in nemo-rl directly?

The nano-v3 yaml has not merged, and I can put it for reference, I can remove it now or after nemo-rl pr merged.

Kipok · 2026-04-21T00:31:07Z

    het_group = 0
    het_group_indices = []
-    total_het_groups = (n_servers if server_config is not None else 0) + bool(cmd) + with_sandbox
+    het_group_num_nodes = {}


not fully sure i understand this change, @gwarmstrong could you also have a look?

wedu-nvidia · 2026-04-21T03:32:53Z

@Kipok
I removed below

the local Dockerfile.nemo-rl
the local nemo_rl/configs/sft.yaml
SFT-specific template override exposure

coderabbitai

♻️ Duplicate comments (1)

nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2 (1)
289-293: ⚠️ Potential issue | 🟠 Major

get_container_pid() may return wrong container PID when sandbox is running.

This function returns the first container PID from enroot list, but when the sandbox container starts (lines 316-341), multiple containers exist on the same node. The function may return the sandbox container's PID instead of ray-head, causing ray status checks and job execution to fail or behave non-deterministically.

Consider filtering by container name pattern. For example, enroot containers often include the image name in their identifier.
#!/bin/bash
# Check how enroot list -f output looks and if container names are distinguishable
# This helps understand if filtering by name is feasible

# Look for enroot-related documentation or usage in the codebase
rg -n "enroot list" --type-add 'jinja:*.j2' --type jinja
rg -n "container-name" nemo_skills/pipeline/nemo_rl/ray_templates/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2`
around lines 289 - 293, get_container_pid() currently returns the first PID from
enroot list which can pick the sandbox container; change it to select the PID
for the ray-head container by filtering the enroot list output by the container
name or image pattern (look for "ray-head" or the Ray image tag used when
launching the head) before extracting the PID, i.e. modify the logic in
get_container_pid to match container identifiers by name/pattern rather than
taking the first numeric PID so subsequent ray status checks and job runs target
the correct container.

🧹 Nitpick comments (3)

nemo_skills/pipeline/utils/exp.py (1)

611-611: Variable shadowing: client_num_gpus is redefined.

client_num_gpus is first defined at line 537, then redefined at line 611 inside the loop. While the logic appears correct (the inner one uses different conditions), this shadowing can cause confusion during maintenance. Consider using a different name like main_task_gpus for the inner variable.

♻️ Rename to avoid shadowing

-            client_num_gpus = num_gpus if (server_config is None or num_nodes > 1) else 0
+            main_task_gpus = num_gpus if (server_config is None or num_nodes > 1) else 0
             main_env_updates = {"NEMO_SKILLS_SANDBOX_PORT": sandbox_port}
             if with_ray:
-                main_env_updates["GPUS_PER_NODE"] = str(client_num_gpus)
+                main_env_updates["GPUS_PER_NODE"] = str(main_task_gpus)

And update the executor call at line 632:

-                        gpus_per_node=client_num_gpus,
+                        gpus_per_node=main_task_gpus,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/utils/exp.py` at line 611, The inner redefinition of
client_num_gpus inside the loop shadows the earlier variable; rename the inner
variable (e.g., to main_task_gpus or task_client_gpus) where it is assigned with
the expression "num_gpus if (server_config is None or num_nodes > 1) else 0" and
update any subsequent uses (notably the executor call that currently references
client_num_gpus) to use the new name so the outer client_num_gpus remains
unchanged and the intent is clear.

nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2 (1)

411-413: Dead code: condition is never true.

Same issue as the other template. The loop starts at i = 1, so $i -eq 0 is never satisfied.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2`
around lines 411 - 413, The conditional checking if [[ $i -eq 0 ]] inside the
loop is dead because the loop starts at i=1; either remove the unreachable
branch or change the check to the correct index (e.g., [[ $i -eq 1 ]]) depending
on intended behavior; update the template snippet that sets
OVERLAP_HEAD_AND_WORKER_ARG (the block containing
OVERLAP_HEAD_AND_WORKER_ARG="--overlap") so it executes for the actual
first-iteration index or is removed if not needed.

nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2 (1)

403-405: Dead code: condition is never true.

The loop starts at i = 1, so the condition $i -eq 0 can never be true. The OVERLAP_HEAD_AND_WORKER_ARG variable is never set. If workers don't need --overlap, this block can be removed. If overlap was intended for worker 0 (which doesn't exist in this loop since the head is on node 0), the logic needs rethinking.

🧹 Remove dead code

   srun {{ common_srun_args }} ${OVERLAP_HEAD_AND_WORKER_ARG:-} --container-name=ray-worker-$i --exact --nodes=1 --ntasks=1 --cpus-per-task=$CPUS_PER_WORKER -w "$node_i" -o $LOG_DIR/{{ ray_log_prefix }}worker-$i.log bash -x -c "$worker_cmd" &
+  srun {{ common_srun_args }} --container-name=ray-worker-$i --exact --nodes=1 --ntasks=1 --cpus-per-task=$CPUS_PER_WORKER -w "$node_i" -o $LOG_DIR/{{ ray_log_prefix }}worker-$i.log bash -x -c "$worker_cmd" &
-  if [[ $i -eq 0 ]]; then
-    OVERLAP_HEAD_AND_WORKER_ARG="--overlap"
-  fi
-  srun {{ common_srun_args }} ${OVERLAP_HEAD_AND_WORKER_ARG:-} --container-name=ray-worker-$i --exact --nodes=1 --ntasks=1 --cpus-per-task=$CPUS_PER_WORKER -w "$node_i" -o $LOG_DIR/{{ ray_log_prefix }}worker-$i.log bash -x -c "$worker_cmd" &

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2`
around lines 403 - 405, The conditional block that checks "if [[ $i -eq 0 ]]"
inside the worker loop is dead because the loop starts at i=1, so
OVERLAP_HEAD_AND_WORKER_ARG is never set; remove the entire if-block (the lines
setting OVERLAP_HEAD_AND_WORKER_ARG and its fi) from
nemo_skills_sandbox_ray.sub.j2, or if the intent was to enable overlap for the
head, move/convert that logic to the head-creation branch (where node 0/head is
created) instead of inside the worker loop; reference symbols: the loop variable
i and OVERLAP_HEAD_AND_WORKER_ARG.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2`:
- Around line 289-293: get_container_pid() currently returns the first PID from
enroot list which can pick the sandbox container; change it to select the PID
for the ray-head container by filtering the enroot list output by the container
name or image pattern (look for "ray-head" or the Ray image tag used when
launching the head) before extracting the PID, i.e. modify the logic in
get_container_pid to match container identifiers by name/pattern rather than
taking the first numeric PID so subsequent ray status checks and job runs target
the correct container.

---

Nitpick comments:
In
`@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2`:
- Around line 411-413: The conditional checking if [[ $i -eq 0 ]] inside the
loop is dead because the loop starts at i=1; either remove the unreachable
branch or change the check to the correct index (e.g., [[ $i -eq 1 ]]) depending
on intended behavior; update the template snippet that sets
OVERLAP_HEAD_AND_WORKER_ARG (the block containing
OVERLAP_HEAD_AND_WORKER_ARG="--overlap") so it executes for the actual
first-iteration index or is removed if not needed.

In `@nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2`:
- Around line 403-405: The conditional block that checks "if [[ $i -eq 0 ]]"
inside the worker loop is dead because the loop starts at i=1, so
OVERLAP_HEAD_AND_WORKER_ARG is never set; remove the entire if-block (the lines
setting OVERLAP_HEAD_AND_WORKER_ARG and its fi) from
nemo_skills_sandbox_ray.sub.j2, or if the intent was to enable overlap for the
head, move/convert that logic to the head-creation branch (where node 0/head is
created) instead of inside the worker loop; reference symbols: the loop variable
i and OVERLAP_HEAD_AND_WORKER_ARG.

In `@nemo_skills/pipeline/utils/exp.py`:
- Line 611: The inner redefinition of client_num_gpus inside the loop shadows
the earlier variable; rename the inner variable (e.g., to main_task_gpus or
task_client_gpus) where it is assigned with the expression "num_gpus if
(server_config is None or num_nodes > 1) else 0" and update any subsequent uses
(notably the executor call that currently references client_num_gpus) to use the
new name so the outer client_num_gpus remains unchanged and the intent is clear.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5b927823-65af-48cc-989f-8cdd66446574

📥 Commits

Reviewing files that changed from the base of the PR and between 66a6480 and 76ad2be.

📒 Files selected for processing (9)

cluster_configs/example-local.yaml
dockerfiles/Dockerfile.nemo-rl
nemo_skills/__init__.py
nemo_skills/pipeline/nemo_rl/grpo.py
nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray.sub.j2
nemo_skills/pipeline/nemo_rl/ray_templates/nemo_skills_sandbox_ray_enroot.sub.j2
nemo_skills/pipeline/utils/exp.py
nemo_skills/training/nemo_rl/configs/sft.yaml
tests/gpu-tests/test-local.yaml

💤 Files with no reviewable changes (2)

dockerfiles/Dockerfile.nemo-rl
nemo_skills/training/nemo_rl/configs/sft.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

nemo_skills/pipeline/nemo_rl/grpo.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/utils/exp.py`:
- Around line 611-623: main_env_updates currently sets "SANDBOX_PORT" to
sandbox_port which may be an integer; make it consistently a string like
"GPUS_PER_NODE" by converting sandbox_port to str when updating main_env_updates
(update the block that builds main_env_updates before calling
temporary_env_update; reference variables/symbols main_env_updates,
SANDBOX_PORT, sandbox_port, temporary_env_update, and cluster_config to locate
the change).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a959d48a-d304-44e6-9d54-3fa7d8ba200b

📥 Commits

Reviewing files that changed from the base of the PR and between 76ad2be and 0feb908.

📒 Files selected for processing (1)

nemo_skills/pipeline/utils/exp.py

Kipok

lgtm, as long as tests pass. Did we launch slurm test (the omr one) here?

Kipok · 2026-04-23T00:33:02Z

        help="If specified, will reuse the code from this experiment. "
        "Can provide an experiment name or an experiment object if running from code.",
    ),
+    config: str = typer.Option(None, help="Override training config YAML; defaults to the upstream container config"),


Suggested change

config: str = typer.Option(None, help="Override training config YAML; defaults to the upstream container config"),

config: str = typer.Option(None, help="Override training config YAML; defaults to the upstream container config for nano-v3"),

might need more changes

Kipok · 2026-04-23T00:38:52Z

  nemo-skills: dockerfile:dockerfiles/Dockerfile.nemo-skills
  verl: dockerfile:dockerfiles/Dockerfile.verl
-  nemo-rl: dockerfile:dockerfiles/Dockerfile.nemo-rl
+  nemo-rl: nvcr.io/nvidian/nemo-rl:733c7d6-48669320


is this internal container? We ideally want something publicly available here

It is for nightly container published internally, but not sure if public ones are up to date, I will have a check

@Kipok I contacted the team and was told public container is released for every 2 months roughly, it seems too old then?

I guess it's ok to keep it as private as it has a commit in the name. Can you update the readme in dockerfiles folder to mention that it defaults to private container accessible to nvidia employees, but for anyone else they can build it from corresponding dockerfile / commit in nemo-rl repo? But one thing we should make sure is that our ci still works, so can you check what key / setting we need to set there and sync with me offline so that I can update our repo settings and we can test that gpu / gitlab ci passes

Kipok · 2026-04-23T00:41:02Z

@@ -0,0 +1,502 @@
+#!/bin/bash


why do we need two templates? If one was for lax cluster, we can probably not port it as it's not needed anymore

It is for lax, I keep it here in case one day we need for lax types. But I can delete it now

wedu-nvidia · 2026-04-23T14:55:40Z

lgtm, as long as tests pass. Did we launch slurm test (the omr one) here?

not yet, I can test for this.

Signed-off-by: Wei Du <wedu@nvidia.com>

wedu-nvidia · 2026-05-06T16:46:11Z

lgtm, as long as tests pass. Did we launch slurm test (the omr one) here?

sorry for late reply here, it seems omr cannot pass. I checked the sft loss curve which looks good to me, but eval results is bad, I will investigate this.

wedu-nvidia requested review from Kipok and gwarmstrong April 17, 2026 16:30

coderabbitai Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread nemo_skills/pipeline/nemo_rl/grpo.py Outdated

Comment thread nemo_skills/training/nemo_rl/configs/sft.yaml Outdated

Comment thread nemo_skills/training/nemo_rl/configs/sft.yaml Outdated

wedu-nvidia changed the title ~~Wedu/nemo rl update~~ wedu/nemo rl update Apr 17, 2026

wedu-nvidia changed the title ~~wedu/nemo rl update~~ Use upstream NeMo-RL entrypoints and simplify sandbox integration Apr 17, 2026

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

Kipok reviewed Apr 21, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 21, 2026

View reviewed changes

Comment thread nemo_skills/pipeline/utils/exp.py

Kipok previously approved these changes Apr 23, 2026

View reviewed changes

Kipok added the run GPU tests label Apr 23, 2026

Kipok reviewed Apr 23, 2026

View reviewed changes

wedu-nvidia force-pushed the wedu/nemo-rl-update branch from 894cee3 to 1a725b3 Compare April 28, 2026 16:51

wedu-nvidia added 9 commits April 28, 2026 09:53

remove local copy and use nemo-rl directly

62df00e

Signed-off-by: Wei Du <wedu@nvidia.com>

update grpo and exp for tir rl support

871c549

Signed-off-by: Wei Du <wedu@nvidia.com>

delete origianl entry point

dc538fe

Signed-off-by: Wei Du <wedu@nvidia.com>

remove old dockerfile and use updtream container directly

03226cf

Signed-off-by: Wei Du <wedu@nvidia.com>

fix Ray sandbox

0ed75fa

Signed-off-by: Wei Du <wedu@nvidia.com>

add support to load local ray template

6355c33

Signed-off-by: Wei Du <wedu@nvidia.com>

Add task-level Ray template override for GRPO sandbox

415b283

Signed-off-by: Wei Du <wedu@nvidia.com>

Support GRPO-specific Ray template override for sandbox

1e4f5a7

Signed-off-by: Wei Du <wedu@nvidia.com>

Address review feedback on GRPO sandbox integration

0ad7b49

Signed-off-by: Wei Du <wedu@nvidia.com>

wedu-nvidia force-pushed the wedu/nemo-rl-update branch from 1a725b3 to 0ad7b49 Compare April 28, 2026 16:53

	help="Override the Ray template for training tasks. Can be a template name or a local .j2 path. Only applies to Ray-backed jobs; local executor ignores it.",
	help="Override the Ray template for training tasks. Can be a template name or a local .j2 path. Only applies to slurm-based jobs; local executor ignores it.",

	config: str = typer.Option(None, help="Override training config YAML; defaults to the upstream container config"),
	config: str = typer.Option(None, help="Override training config YAML; defaults to the upstream container config for nano-v3"),

Conversation

wedu-nvidia commented Apr 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

wedu-nvidia commented Apr 20, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wedu-nvidia commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Kipok left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wedu-nvidia commented Apr 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 17, 2026 •

edited

Loading

wedu-nvidia commented Apr 21, 2026 •

edited

Loading