Skip to content

Refactor train and play scripts for single entry point#5554

Open
StafaH wants to merge 5 commits intoisaac-sim:developfrom
StafaH:train_script
Open

Refactor train and play scripts for single entry point#5554
StafaH wants to merge 5 commits intoisaac-sim:developfrom
StafaH:train_script

Conversation

@StafaH
Copy link
Copy Markdown

@StafaH StafaH commented May 8, 2026

Description

This PR refactors the reinforcement learning train/play scripts into unified entrypoints while preserving the existing library folder structure and adding a lightweight uv workflow for fresh source checkouts.

The main changes are:

  • Added unified RL entrypoints:
    • scripts/reinforcement_learning/train.py --library <library>
    • scripts/reinforcement_learning/play.py --library <library>
  • Added library-specific implementation files under the existing library folders, for example:
    • scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py
    • scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py
  • Kept the old per-library train.py and play.py scripts intact, with deprecation warnings and migration examples.
  • Added shared RL entrypoint utilities in scripts/reinforcement_learning/common.py.
  • Added direct Isaac Lab CLI commands:
    • ./isaaclab.sh train --library <library> ...
    • ./isaaclab.sh play --library <library> ...
  • Kept bare script aliases for ./isaaclab.sh -p train.py ... and ./isaaclab.sh -p play.py ....
  • Added Python package entry points so installed environments can run:
    • train --library <library> ...
    • play --library <library> ...
  • Added a root source-checkout pyproject.toml project so a fresh clone can run kitless Newton training with:
    • uv run train --library rsl_rl --task Isaac-Cartpole-Direct-v0 presets=newton_mjwarp --num_envs 4096
  • Pinned the source-checkout uv environment to the same PyTorch family used by the Isaac Lab installer to avoid CUDA stack churn when users switch between uv run and ./isaaclab.sh.
  • Updated docs, tests, tools, and pretrained checkpoint helpers to use the unified train/play entrypoints.
  • Fixed CLI Python discovery so ./isaaclab.sh prefers an active or repo-local virtual environment before falling back to system Python.
  • Fixed ./isaaclab.sh --install in uv-created virtual environments that do not include the pip module by using uv pip for venv-targeted pip operations.

Motivation:

The previous RL scripts duplicated substantial train/play setup logic across libraries. This made behavior harder to keep consistent and increased maintenance cost when updating shared functionality. The new structure keeps library-specific logic in each library folder while centralizing shared dispatch and common helpers.

The uv workflow gives users a fast path from a fresh clone to kitless Newton training without manually creating an environment first. Isaac Sim / Kit workflows, including PhysX, continue to use the existing full installation path.

Dependencies:

No new required runtime dependencies are added to Isaac Lab packages. The root source-checkout pyproject.toml describes the local development environment used by uv run.

Fixes # N/A

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Screenshots

Not applicable.

Validation

Ran:

  • uv lock
  • uv lock --check
  • uv run --frozen train --help
  • ./isaaclab.sh --install
  • ./isaaclab.sh -p -m pytest source/isaaclab/test/cli/test_install.py -q
  • ./isaaclab.sh -p -m pytest source/isaaclab/test/cli/test_install.py::TestGetPipCommand source/isaaclab/test/cli/test_install.py::TestExtractPythonExe -q
  • Help smoke tests for unified train/play entrypoints and ./isaaclab.sh -p train.py --help / ./isaaclab.sh -p play.py --help

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team infrastructure labels May 8, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR centralizes the previously duplicated RL training/play entrypoints across six libraries (rl_games, rlinf, rsl_rl, sb3, skrl) into unified train.py / play.py scripts under scripts/reinforcement_learning/. Library-specific logic moves to new train_<lib>.py / play_<lib>.py files, the old per-library scripts get deprecation warnings, and the isaaclab.sh -p CLI gains train.py / play.py shorthands.

  • Unified dispatch (common.py + dispatch_library_entrypoint): train scripts are loaded via import_local_module and call run(argv), while play scripts are executed with runpy.run_path because they still carry module-level argparse code. This asymmetry is functional but undocumented.
  • CLI alias in cli/utils.py maps the bare names \"train.py\" / \"play.py\" to absolute paths; pretrained_checkpoint.py constants are updated to use the new unified paths with --library <lib> args injected by WORKFLOW_TRAINER_ARGS / WORKFLOW_PLAYER_ARGS.

Confidence Score: 4/5

The refactor is primarily additive — existing per-library scripts remain functional with deprecation warnings and the new unified entry-points replicate original logic faithfully.

The core dispatch logic, pretrained-checkpoint constant updates, and CLI alias tests are all consistent. The heavy module-level imports in common.py and the undocumented run_as_script API contract are design-level concerns rather than functional regressions on the standard train/play workflows.

scripts/reinforcement_learning/common.py for module-level import overhead and undocumented run_as_script contract; source/isaaclab/isaaclab/cli/utils.py for the Path normalization edge case in the alias lookup.

Important Files Changed

Filename Overview
scripts/reinforcement_learning/common.py New shared helper module; imports heavy packages (gymnasium, isaaclab.envs, isaaclab_tasks) at module level, adding startup overhead even for --help or bare library-dispatch paths.
scripts/reinforcement_learning/train.py Unified train entrypoint; delegates to library-specific run() functions via dispatch_library_entrypoint (run_as_script=False).
scripts/reinforcement_learning/play.py Unified play entrypoint; dispatches via runpy.run_path (run_as_script=True) to preserve module-level argparse in each play script.
source/isaaclab/isaaclab/cli/utils.py Adds _PYTHON_SCRIPT_ALIASES dict and resolves bare train.py/play.py to full paths; Path normalization can cause Path('./train.py') to match the alias unexpectedly.
scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py New library-specific train implementation; correctly exposes run(argv) and replicates all original rsl_rl/train.py functionality including external_callback, distributed, and IO descriptor support.
source/isaaclab_rl/isaaclab_rl/utils/pretrained_checkpoint.py WORKFLOW_TRAINER/PLAYER updated to unified script paths; new WORKFLOW_TRAINER_ARGS/PLAYER_ARGS dicts correctly inject --library flag for each workflow.
source/isaaclab/test/cli/test_install.py Well-structured tests for train.py/play.py shorthand resolution and explicit relative path passthrough; covers the key alias behaviors.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "isaaclab.sh -p"
    participant utils as "cli/utils.py"
    participant train as "train.py"
    participant common as "common.py"
    participant lib as "train_lib.py"

    User->>CLI: "./isaaclab.sh -p train.py --library rsl_rl --task X"
    CLI->>utils: "run_python_command(train.py, args)"
    utils->>utils: "_PYTHON_SCRIPT_ALIASES lookup -> absolute path"
    utils->>train: "python .../train.py --library rsl_rl --task X"
    train->>common: "dispatch_library_entrypoint(argv, entrypoints)"
    common->>common: "parse --library rsl_rl"
    common->>lib: "import_local_module(train_rsl_rl, path)"
    lib-->>common: "module loaded (top-level code runs)"
    common->>lib: "module.run(library_args)"
    lib->>lib: "_parse_args + launch_simulation + runner.learn()"
    lib-->>User: "training complete"
    Note over common,lib: "Play path uses runpy.run_path() instead of import+run()"
Loading

Reviews (1): Last reviewed commit: "Refactor play scripts and create alias" | Re-trigger Greptile

Comment on lines +8 to +25
from __future__ import annotations

import argparse
import importlib.util
import logging
import os
import runpy
import sys
from pathlib import Path
from types import ModuleType
from typing import Any

import gymnasium as gym

from isaaclab.envs import DirectMARLEnvCfg, ManagerBasedRLEnvCfg
from isaaclab.utils.dict import print_dict
from isaaclab.utils.io import dump_yaml

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Heavy package imports at module level inflate startup time

common.py imports gymnasium, isaaclab.envs (two env base classes), isaaclab.utils.dict, isaaclab.utils.io, and isaaclab_tasks.utils unconditionally at import time. Since train.py and play.py both do from common import dispatch_library_entrypoint, these imports fire even when the user just runs train.py --help (no --library selected) or when any test imports the dispatch helper. Only dispatch_library_entrypoint itself is needed for routing; the remaining helpers (create_isaaclab_env, wrap_record_video, etc.) are only called from individual library scripts. Moving those heavyweight imports to be lazy (inside the functions that need them) or splitting dispatch_library_entrypoint into its own thin module would keep the help/routing path fast.

Comment on lines +29 to +90
def dispatch_library_entrypoint(
argv: list[str] | None,
entrypoints: dict[str, Path],
*,
action: str,
description: str,
library_help: str,
run_as_script: bool = False,
) -> int:
"""Dispatch a unified entrypoint to a library-specific implementation.

Args:
argv: Command-line arguments, excluding the script path.
entrypoints: Mapping from library name to implementation path.
action: Action name used to create a unique module name.
description: Top-level parser description.
library_help: Help text for the ``--library`` argument.
run_as_script: Whether to execute the selected implementation as a script.

Returns:
Process exit code.
"""
if argv is None:
argv = sys.argv[1:]

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--library", choices=sorted(entrypoints))
args_cli, library_args = parser.parse_known_args(argv)

if args_cli.library is None:
help_parser = argparse.ArgumentParser(description=description)
help_parser.add_argument("--library", choices=sorted(entrypoints), required=True, help=library_help)
help_parser.add_argument("args", nargs=argparse.REMAINDER, help="Arguments forwarded to the selected library.")
help_parser.print_help()
return 0 if "-h" in argv or "--help" in argv else 2

module_path = entrypoints[args_cli.library]
if run_as_script:
original_argv = sys.argv
original_path = list(sys.path)
try:
sys.argv = [str(module_path)] + library_args
sys.path.insert(0, str(module_path.parent))
runpy.run_path(str(module_path), run_name="__main__")
finally:
sys.argv = original_argv
sys.path[:] = original_path
return 0

module = import_local_module(f"isaaclab_rl_{action}_{args_cli.library}", module_path)
module.run(library_args)
return 0


def add_common_train_args(
parser: argparse.ArgumentParser,
*,
agent_default: str | None,
agent_help: str,
include_agent: bool = True,
include_distributed: bool = True,
) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Undocumented API contract between dispatch strategy and script structure

dispatch_library_entrypoint supports two incompatible modes keyed on run_as_script:

  • False (train): loads the target file as a module and calls module.run(argv) — the file must expose a run() function and must not parse sys.argv at module level.
  • True (play): uses runpy.run_path(..., run_name="__main__") and sets sys.argv itself — the file must not have a run() function and must parse sys.argv at module level.

Neither mode is validated at call time. If a new contributor creates a play script that exposes run() and uses run_as_script=True, the dispatch silently ignores run(). Conversely, if a new train script parses sys.argv at module level, it will read the wrong argv. Adding an assertion or a clear note in the docstring about the run(argv) requirement for run_as_script=False would prevent future breakage.

Comment on lines +534 to +535
else:
script_or_module = _PYTHON_SCRIPT_ALIASES.get(str(script_or_module), script_or_module)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Path("./train.py") silently matches the alias while "./train.py" string does not

str(Path("./train.py")) normalises to "train.py" on all platforms, so a caller passing Path("./train.py") would be redirected to the global script, whereas the string "./train.py" correctly passes through. The test covers the string form but not the Path form. Guarding on len(Path(_key).parts) == 1 would make the intent explicit and consistent.

Suggested change
else:
script_or_module = _PYTHON_SCRIPT_ALIASES.get(str(script_or_module), script_or_module)
else:
# Only resolve bare filenames (e.g. "train.py"), not explicit paths such as
# "./train.py" or "/abs/path/train.py". Using Path(x).parts ensures a leading
# "./" or directory component prevents a match.
_key = str(script_or_module)
if len(Path(_key).parts) == 1:
script_or_module = _PYTHON_SCRIPT_ALIASES.get(_key, script_or_module)

@StafaH StafaH changed the title Train script Refactor train and play scripts for single entry point May 8, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This is a substantial quality-of-life improvement for the RL workflow. The unified train/play entrypoints with --library dispatch remove significant duplication, and the uv run path for fresh clones is a great developer experience improvement.

Strengths

  • Clean dispatch pattern via dispatch_library_entrypoint — extensible for new libraries
  • common.py consolidates duplicated arg definitions and env creation logic well
  • Deprecation warnings on old scripts with clear migration guidance
  • _PYTHON_SCRIPT_ALIASES is a nice touch for isaaclab.sh -p train.py shorthand
  • Python discovery improvements (active venv → repo-local → kit) are sensible prioritization
  • Root pyproject.toml with uv sources/overrides is well-structured

Concerns

See inline comments for specific issues. High-level:

  1. The common.py relative import (from common import ...) requires the script's parent dir to be on sys.path — this works when run directly but may fail in certain packaging scenarios.
  2. Hard-pinning torch==2.10.0 in the root pyproject.toml will need frequent updates.
  3. The run_as_script path in dispatch_library_entrypoint mutates sys.argv and sys.path which could have side effects if the dispatched script doesn't clean up properly.


from pathlib import Path

from common import dispatch_library_entrypoint
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Relative import fragility: from common import dispatch_library_entrypoint relies on the script being run from its parent directory or with sys.path containing the scripts/reinforcement_learning/ directory. This works for ./isaaclab.sh -p and python scripts/... but would break if someone tries to import this module from elsewhere.

Consider using a relative path-based import pattern:

import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from common import dispatch_library_entrypoint

Or make common.py a proper package (scripts/reinforcement_learning/__init__.py).

return 0 if "-h" in argv or "--help" in argv else 2

module_path = entrypoints[args_cli.library]
if run_as_script:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 sys.argv / sys.path mutation in run_as_script: Mutating global state (sys.argv, sys.path) and relying on runpy.run_path is fragile. If the dispatched script raises an exception, the finally block restores state, but if it calls sys.exit(), the restore runs but the process is exiting anyway.

More concerning: runpy.run_path executes in the caller's process — any global state changes in the dispatched script persist. This is fine for train scripts that exit, but if used in a library context it could leak state.

Consider documenting this behavior or adding a comment that run_as_script=True is only safe for scripts that are expected to terminate the process.

"""
python_path = Path(python_exe)
return (python_path.parent.parent / "pyvenv.cfg").is_file()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Side effect: os.environ["UV_PYTHON"] set globally: Setting UV_PYTHON as an environment variable affects all subsequent subprocess.run calls that use uv. If get_pip_command is called multiple times with different python_exe values, only the last one wins globally.

Consider returning the env var as part of the command context rather than setting it globally:

# Return (cmd, env_override) tuple, or document the mutation

Or at minimum, document that this function has a global side effect.


def train(args: list[str] | None = None) -> None:
"""Run the unified reinforcement learning training script."""
if args is None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 CLI dispatch pattern: The if sys.argv[1] == "train" / "play" check before argparse means these sub-commands bypass the normal argparse help system. Running isaaclab.sh --help would show the epilog mentioning train and play, but isaaclab.sh train --help delegates directly to the train script.

This is fine for now, but if more commands are added, consider using argparse subparsers or a proper CLI framework (click, typer) for the top-level dispatch.


"""Script to train RL agent with RL-Games."""

import warnings
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Deprecation warning fires at import time: Because warnings.warn(...) is at module level, it fires even if someone imports a function from this module without intending to run it as a script. This is probably fine since the old scripts are meant to be run, not imported — but worth noting for anything that does from scripts.reinforcement_learning.rl_games.train import main.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

Well-structured refactoring that unifies RL train/play scripts behind a single dispatch entry point (train.py --library <lib> / play.py --library <lib>). The architecture is sound — clean separation between shared utilities (common.py), unified dispatchers, and library-specific implementations. Old scripts are preserved with deprecation warnings for backwards compatibility. A few design concerns around import fragility, namespace collision, and test coverage should be addressed.

Design Assessment

Acceptable, but import resolution relies on an implicit invariant. The unified dispatch pattern is the right approach. The from common import ... in library-specific implementations works because Python adds the script's directory to sys.path[0] when running train.py as a subprocess — but this invariant isn't explicitly enforced by import_local_module() or dispatch_library_entrypoint(). If these modules are ever imported from a different context (tests, REPL, another script), they'll fail with ModuleNotFoundError.

Findings

🟡 Warning: Generic train/play console_script names risk collisionsource/isaaclab/setup.py

The entry points "train=isaaclab.cli:train" and "play=isaaclab.cli:play" are extremely generic names that could conflict with other ML packages (fairseq, lightning, etc.). Since isaaclab train / isaaclab play already works via the CLI dispatcher, the bare train/play entry points are redundant. Consider namespacing them to isaaclab-train/isaaclab-play, or removing them entirely.

🟡 Warning: Import resolution relies on implicit sys.path[0] invariantscripts/reinforcement_learning/common.py

When dispatch_library_entrypoint() calls import_local_module() to load a library module (e.g., train_rsl_rl.py), the from common import ... statement in that module resolves ONLY because scripts/reinforcement_learning/ is already on sys.path[0] (set by Python when train.py runs as a subprocess). This is fragile — if dispatch_library_entrypoint is ever called from a different context (tests, REPL), it fails. Consider adding explicit path management:

rl_scripts_dir = str(Path(__file__).resolve().parent)
if rl_scripts_dir not in sys.path:
    sys.path.insert(0, rl_scripts_dir)

🟡 Warning: common.py has no unit tests — 12+ shared utility functions (dispatch_library_entrypoint, import_local_module, apply_env_overrides, validate_distributed_device, etc.) have zero direct test coverage. These are the foundation used by all 5 library implementations. A lightweight test_common.py with edge-case tests (missing --library, invalid module path, CPU+distributed rejection) would significantly reduce regression risk.

🔵 Suggestion: env.close() skipped on KeyboardInterruptscripts/reinforcement_learning/rsl_rl/train_rsl_rl.py

The except KeyboardInterrupt: pass at the end of training skips env.close(). Consider using finally to ensure cleanup (logging final metrics, releasing GPU resources):

try:
    runner.learn(...)
    print(f"Training time: ...")
except KeyboardInterrupt:
    pass
finally:
    env.close()

Test Coverage

  • CLI wiring (alias resolution, command dispatch): ✅ Well-tested with mocks
  • common.py utility functions (12+ functions): ❌ No unit tests
  • Dispatch mechanism edge cases: ❌ Not directly tested
  • Deprecated script backwards compatibility: ❌ Not tested
  • Integration tests: ⚠️ Present but currently skipped

CI Status

  • Build Wheel: ✅ pass
  • Most checks: ⏳ pending

Verdict

Minor fixes needed

The architecture and implementation are solid for all standard execution paths. The dispatch pattern is clean, backwards compatibility is preserved, and the code is well-organized. Address the namespace collision concern (train/play console_scripts) and consider adding explicit sys.path management for robustness. Unit tests for common.py would be a valuable addition but aren't strictly blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant