Skip to content

Feature/benchmark envs#338

Open
Mark2000 wants to merge 14 commits intodevelopfrom
feature/benchmark-envs
Open

Feature/benchmark envs#338
Mark2000 wants to merge 14 commits intodevelopfrom
feature/benchmark-envs

Conversation

@Mark2000
Copy link
Copy Markdown
Contributor

Description

Closes #XXX

python '/Users/markstephenson/avslab/bsk_rl/benchmarks/train.py' -o /Users/markstephenson/ray_results/bench_test -e nadir_science:nadir_science_benchmark --restart

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How should this pull request be reviewed?

  • By commit
  • All changes at once

How Has This Been Tested?

Please describe how tests have been updated to verify your changes.

Future Work

What future tasks are needed, if any?

Checklist

  • I have performed a self-review of my code
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation and release notes
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

Copilot AI review requested due to automatic review settings April 21, 2026 23:54
@Mark2000 Mark2000 force-pushed the feature/benchmark-envs branch from f4e79b1 to dfae80a Compare April 21, 2026 23:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a lightweight benchmark harness (env templates + a Ray RLlib PPO training entrypoint) to support repeatable training runs against specific benchmark environments, and fixes a Basilisk support-data path fallback bug.

Changes:

  • Fix Basilisk dataFetcher ImportError fallback path variable naming in WorldModel setup.
  • Add benchmarks/train.py RLlib PPO training script with checkpointing and dynamic benchmark loading.
  • Add a first benchmark environment (nadir_science) plus a small Benchmark dataclass template.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
src/bsk_rl/sim/world.py Fixes fallback Basilisk path variable used when dataFetcher API is unavailable.
benchmarks/train.py New RLlib PPO training/continue script for benchmark runs (checkpoint mgmt, dynamic env import).
benchmarks/nadir_science.py Defines a benchmark environment configuration (satellite model + env/training args).
benchmarks/env_template.py Introduces a simple Benchmark dataclass container for env/training configuration.
Comments suppressed due to low confidence (2)

benchmarks/train.py:359

  • The script initializes Ray and a PPO algorithm but never calls ppo.stop() / ray.shutdown() before exiting. This can leave worker processes running (especially in interactive or repeated benchmark runs) and can hold onto temp dirs/object store resources. Consider adding a try/finally around training to ensure cleanup.
    benchmarks/nadir_science.py:133
  • These print(...) statements will execute at import time, which is noisy when using the dynamic import in benchmarks/train.py and makes it harder to use these benchmarks as a library. Consider removing them or switching to logging guarded by if __name__ == "__main__" / a verbosity flag.

training_args = dict(
    lr=3e-5,
    gamma=0.999,
    train_batch_size=1000,
    num_sgd_iter=10,
    use_kl_loss=False,
    clip_param=0.1,
    grad_clip=0.5,
)

nadir_science_benchmark = Benchmark(
    env_args=env_args,
    policies=policies,
    policy_mapping_fn=policy_mapping_fn,
    module_specs=module_specs,
    training_args=training_args,
)


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread benchmarks/benchmark.py
Comment on lines +53 to +64
ray.init(
ignore_reinit_error=True,
num_cpus=get_available_cores(),
object_store_memory=2_000_000_000, # 2 GB
_temp_dir=temp_dir,
)
config = (
PPOConfig()
.training(**training_args)
.env_runners(
num_env_runners=num_env_runners,
sample_timeout_s=50000.0,
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print statements dumping module_specs (and the hard-coded RLModuleSpec dict) look like debugging leftovers and will spam logs on every run, especially on clusters. Consider removing them or switching to logger.debug(...) behind a verbosity flag.

Copilot uses AI. Check for mistakes.
Comment thread benchmarks/benchmark.py
Comment on lines +154 to +156


def train(
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local variable name iter shadows Python’s built-in iter() function, which can be confusing and makes debugging harder (and can break code if iter is later needed in this scope). Rename to something like iteration/checkpoint_iter in load_existing_model/train.

Copilot uses AI. Check for mistakes.
Comment thread benchmarks/train.py Outdated
Comment on lines +238 to +240
"--env",
type=str,
default="nadir_science:nadir_science_benchmark",
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avs_rl_tools is imported here but it is not listed in pyproject.toml dependencies or optional extras, so running benchmarks/train.py will fail in a clean install. Either vendor/inline the small sanitize_np functionality, move it into this repo, or add the package to an appropriate optional dependency group.

Copilot uses AI. Check for mistakes.
Comment thread src/bsk_rl/sim/world.py
Comment on lines 51 to 53
except ImportError:
bskPath = __path__[0]
bsk_path = __path__[0]
_DATA_FETCHER_API = False
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _DATA_FETCHER_API = False fallback path (where Basilisk’s dataFetcher isn’t available) is currently untested, and this change fixes a name mismatch that would only surface in that branch. Consider adding a unit test that forces _DATA_FETCHER_API to False (e.g., via monkeypatch) and asserts setup_gravity_bodies uses the fallback paths without raising.

Copilot uses AI. Check for mistakes.
Comment thread benchmarks/benchmark.py
Comment on lines +44 to +48
training_args={},
temp_dir="/tmp",
):
"""Configure a PPO model for training with sMDP discounting and asynchronous multiagent actions."""

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

training_args={} as a default argument is a mutable default and will be shared across calls to create_new_model, which can lead to surprising cross-run configuration leakage. Use training_args=None and initialize to {} inside the function (or use an immutable mapping type).

Suggested change
training_args={},
temp_dir="/tmp",
):
"""Configure a PPO model for training with sMDP discounting and asynchronous multiagent actions."""
training_args=None,
temp_dir="/tmp",
):
"""Configure a PPO model for training with sMDP discounting and asynchronous multiagent actions."""
if training_args is None:
training_args = {}

Copilot uses AI. Check for mistakes.
Comment thread benchmarks/benchmark.py
Comment on lines +23 to +25
# TODO remove, for cluster only
torch.set_num_threads(11)
os.environ["MKL_NUM_THREADS"] = "11"
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script forces torch.set_num_threads(11) and MKL_NUM_THREADS=11 unconditionally, which can severely underutilize or oversubscribe CPUs depending on the machine/SLURM allocation and makes runs non-reproducible across environments. Consider deriving this from get_available_cores() (and/or a CLI flag) and only setting it when explicitly requested.

Copilot uses AI. Check for mistakes.
@Mark2000 Mark2000 force-pushed the feature/benchmark-envs branch from fe9180d to 0b7b2b1 Compare May 7, 2026 02:32
@Mark2000 Mark2000 force-pushed the feature/benchmark-envs branch from 5828acc to ed7c2ff Compare May 7, 2026 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants