Conversation
f4e79b1 to dfae80a — Compare
Pull request overview
Adds a lightweight benchmark harness (env templates + a Ray RLlib PPO training entrypoint) to support repeatable training runs against specific benchmark environments, and fixes a Basilisk support-data path fallback bug.
Changes:
- Fix Basilisk `dataFetcher` `ImportError` fallback path variable naming in `WorldModel` setup.
- Add `benchmarks/train.py`, an RLlib PPO training script with checkpointing and dynamic benchmark loading.
- Add a first benchmark environment (`nadir_science`) plus a small `Benchmark` dataclass template.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/bsk_rl/sim/world.py` | Fixes the fallback Basilisk path variable used when the dataFetcher API is unavailable. |
| `benchmarks/train.py` | New RLlib PPO training/continue script for benchmark runs (checkpoint management, dynamic env import). |
| `benchmarks/nadir_science.py` | Defines a benchmark environment configuration (satellite model + env/training args). |
| `benchmarks/env_template.py` | Introduces a simple `Benchmark` dataclass container for env/training configuration. |
Comments suppressed due to low confidence (2)
benchmarks/train.py:359

- The script initializes Ray and a PPO algorithm but never calls `ppo.stop()` / `ray.shutdown()` before exiting. This can leave worker processes running (especially in interactive or repeated benchmark runs) and can hold onto temp dirs/object store resources. Consider adding a `try`/`finally` around training to ensure cleanup.
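A minimal sketch of the suggested `try`/`finally` pattern. `FakeAlgo` is a stand-in invented here for illustration; in the real script the cleanup would call `ppo.stop()` and `ray.shutdown()`:

```python
class FakeAlgo:
    """Stand-in for an RLlib PPO algorithm object; illustrative only."""

    def __init__(self):
        self.stopped = False

    def train(self):
        raise RuntimeError("training crashed")

    def stop(self):
        self.stopped = True


def run_with_cleanup(algo, iterations=1):
    """Guarantee cleanup runs even when training raises.

    In benchmarks/train.py the finally block would call ppo.stop()
    and ray.shutdown() so worker processes and temp dirs are released.
    """
    try:
        for _ in range(iterations):
            algo.train()
    finally:
        algo.stop()
```

Even though `train()` raises here, the `finally` block still marks the algorithm as stopped before the exception propagates.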
benchmarks/nadir_science.py:133

- These `print(...)` statements will execute at import time, which is noisy when using the dynamic import in `benchmarks/train.py` and makes it harder to use these benchmarks as a library. Consider removing them or switching to logging guarded by `if __name__ == "__main__"` / a verbosity flag.
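One way the suggestion could look (a sketch; `env_args` is a hypothetical placeholder, not the benchmark's real configuration):

```python
import logging

logger = logging.getLogger(__name__)

env_args = {"duration": 5700.0}  # hypothetical placeholder configuration


def describe_benchmark():
    """Log the configuration instead of printing at import time."""
    logger.debug("env_args: %s", env_args)


if __name__ == "__main__":
    # Output only appears when the file is run directly,
    # not when train.py imports it dynamically.
    logging.basicConfig(level=logging.DEBUG)
    describe_benchmark()
```

Importing the module stays silent; running it directly still shows the configuration dump.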
```python
training_args = dict(
    lr=3e-5,
    gamma=0.999,
    train_batch_size=1000,
    num_sgd_iter=10,
    use_kl_loss=False,
    clip_param=0.1,
    grad_clip=0.5,
)

nadir_science_benchmark = Benchmark(
    env_args=env_args,
    policies=policies,
    policy_mapping_fn=policy_mapping_fn,
    module_specs=module_specs,
    training_args=training_args,
)
```
```python
ray.init(
    ignore_reinit_error=True,
    num_cpus=get_available_cores(),
    object_store_memory=2_000_000_000,  # 2 GB
    _temp_dir=temp_dir,
)
config = (
    PPOConfig()
    .training(**training_args)
    .env_runners(
        num_env_runners=num_env_runners,
        sample_timeout_s=50000.0,
```
The `print` statements dumping `module_specs` (and the hard-coded `RLModuleSpec` dict) look like debugging leftovers and will spam logs on every run, especially on clusters. Consider removing them or switching to `logger.debug(...)` behind a verbosity flag.
```python
def train(
```
The local variable name `iter` shadows Python's built-in `iter()` function, which can be confusing and makes debugging harder (and can break code if `iter` is later needed in this scope). Rename it to something like `iteration`/`checkpoint_iter` in `load_existing_model`/`train`.
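A tiny self-contained demonstration of why the shadowing matters (the function name is invented for illustration):

```python
def bad_checkpoint_scan():
    """Shadowing `iter` works fine until the built-in is actually needed."""
    iter = 3  # local int shadows the built-in iter()
    try:
        iter(range(iter))  # attempts to call the built-in -> TypeError
    except TypeError as exc:
        return str(exc)
```

Calling the shadowed name raises `'int' object is not callable`, which can be a puzzling failure far from the assignment that caused it.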
```python
"--env",
type=str,
default="nadir_science:nadir_science_benchmark",
```
`avs_rl_tools` is imported here but it is not listed in `pyproject.toml` dependencies or optional extras, so running `benchmarks/train.py` will fail in a clean install. Either vendor/inline the small `sanitize_np` functionality, move it into this repo, or add the package to an appropriate optional dependency group.
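If vendoring is chosen, a small inline replacement might look like the following. This is a guess at what `sanitize_np` does (recursively converting NumPy types to plain Python ones); the actual `avs_rl_tools` implementation may differ:

```python
import numpy as np


def sanitize_np(obj):
    """Recursively convert NumPy scalars/arrays to plain Python types.

    Sketch of a possible inline replacement for avs_rl_tools.sanitize_np;
    useful when serializing configs/results that may contain NumPy values.
    """
    if isinstance(obj, np.generic):      # np.float32, np.int64, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: sanitize_np(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(sanitize_np(v) for v in obj)
    return obj
```

Vendoring ~15 lines avoids adding an external dependency for one helper, at the cost of divergence if the upstream utility changes.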
```diff
 except ImportError:
-    bskPath = __path__[0]
+    bsk_path = __path__[0]
     _DATA_FETCHER_API = False
```
The `_DATA_FETCHER_API = False` fallback path (where Basilisk's dataFetcher isn't available) is currently untested, and this change fixes a name mismatch that would only surface in that branch. Consider adding a unit test that forces `_DATA_FETCHER_API` to `False` (e.g., via `monkeypatch`) and asserts `setup_gravity_bodies` uses the fallback paths without raising.
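A runnable sketch of the suggested test. A stub module stands in for `bsk_rl.sim.world` so the pattern is self-contained here; the real test would import the actual module and use pytest's `monkeypatch` fixture:

```python
import types

# Stub standing in for bsk_rl.sim.world; attribute names follow the review
# comment, the setup logic below is a simplified mirror, not the real code.
world = types.SimpleNamespace(_DATA_FETCHER_API=True, bsk_path="/stub/basilisk")


def setup_gravity_bodies():
    """Simplified mirror of the branch under review."""
    if world._DATA_FETCHER_API:
        return "dataFetcher"
    return world.bsk_path  # fallback exercised only when the API is absent


def test_setup_uses_fallback_path(monkeypatch):
    # pytest's monkeypatch restores the attribute after the test
    monkeypatch.setattr(world, "_DATA_FETCHER_API", False)
    assert setup_gravity_bodies() == "/stub/basilisk"
```

With pytest, `monkeypatch.setattr` guarantees the flag is restored afterwards, so the fallback branch can be exercised without affecting other tests.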
```python
    training_args={},
    temp_dir="/tmp",
):
    """Configure a PPO model for training with sMDP discounting and asynchronous multiagent actions."""
```
`training_args={}` as a default argument is a mutable default and will be shared across calls to `create_new_model`, which can lead to surprising cross-run configuration leakage. Use `training_args=None` and initialize to `{}` inside the function (or use an immutable mapping type).
```diff
-    training_args={},
+    training_args=None,
     temp_dir="/tmp",
 ):
     """Configure a PPO model for training with sMDP discounting and asynchronous multiagent actions."""
+    if training_args is None:
+        training_args = {}
```
```python
# TODO remove, for cluster only
torch.set_num_threads(11)
os.environ["MKL_NUM_THREADS"] = "11"
```
This script forces `torch.set_num_threads(11)` and `MKL_NUM_THREADS=11` unconditionally, which can severely underutilize or oversubscribe CPUs depending on the machine/SLURM allocation and makes runs non-reproducible across environments. Consider deriving this from `get_available_cores()` (and/or a CLI flag) and only setting it when explicitly requested.
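One possible shape for the suggestion. The script already has a `get_available_cores()`; the implementation below is an assumption about what such a helper might look like, and the `torch` call is left commented since torch may not be importable everywhere:

```python
import os


def get_available_cores():
    """Prefer the SLURM allocation, then CPU affinity, then cpu_count().

    Sketch only; the real get_available_cores() in train.py may differ.
    """
    if "SLURM_CPUS_PER_TASK" in os.environ:
        return int(os.environ["SLURM_CPUS_PER_TASK"])
    try:
        return len(os.sched_getaffinity(0))  # respects cgroups/taskset limits
    except AttributeError:  # sched_getaffinity unavailable on macOS/Windows
        return os.cpu_count() or 1


def configure_threads(requested=None):
    """Derive thread counts from the allocation instead of hard-coding 11."""
    n = requested if requested is not None else get_available_cores()
    os.environ["MKL_NUM_THREADS"] = str(n)
    # torch.set_num_threads(n)  # enable where torch is a guaranteed dependency
    return n
```

An explicit `--num-threads` CLI flag could feed `requested`, keeping cluster runs reproducible while leaving laptops at their natural core count.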
fe9180d to 0b7b2b1 — Compare

5828acc to ed7c2ff — Compare
Description
Closes #XXX
Type of change
How should this pull request be reviewed?
How Has This Been Tested?
Please describe how tests have been updated to verify your changes.
Future Work
What future tasks are needed, if any?
Checklist