
Fixes benchmark scripts when tensorboard logs are missing #5564

Open
kellyguo11 wants to merge 2 commits into isaac-sim:develop from kellyguo11:fix-benchmark-scripts

Conversation

@kellyguo11
Contributor

@kellyguo11 kellyguo11 commented May 10, 2026

Description

When the benchmark scripts are executed with num_iterations set below the threshold for reward logging, the run can finish without writing any reward data. However, the scripts are hardcoded to always parse rewards from tensorboard, even when those entries do not exist, which causes a crash. This change patches the RL benchmarking scripts to process reward logs only if they were actually written to tensorboard.
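
For illustration, here is a minimal sketch of the guarded pattern this change introduces. parse_tf_logs, log_rl_policy_rewards, and log_rl_policy_episode_lengths are the scripts' actual helpers (per the reviews below), but their call signatures and the stubbed data here are assumptions, not the exact diff:

# Minimal sketch, not the exact diff. The stubbed log_data keeps the snippet
# self-contained; in the scripts it comes from parse_tf_logs.
log_data = {"rewards/iter": [0.1, 0.5, 1.2]}       # "episode_lengths/iter" absent

rewards = log_data.get("rewards/iter")             # None instead of KeyError when missing
episode_lengths = log_data.get("episode_lengths/iter")

if rewards:                                        # the PR's truthiness guard
    print(f"logging rewards, max={max(rewards)}")  # stands in for log_rl_policy_rewards
else:
    print("[WARNING]: TensorBoard reward logs are missing, skipping reward logging.")

if episode_lengths:
    print("logging episode lengths")               # stands in for log_rl_policy_episode_lengths
else:
    print("[WARNING]: TensorBoard episode length logs are missing, skipping.")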

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@kellyguo11 kellyguo11 requested a review from ooctipus as a code owner May 10, 2026 06:58
@github-actions github-actions Bot added the bug (Something isn't working) and isaac-lab (Related to Isaac Lab team) labels May 10, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 10, 2026

Greptile Summary

This PR guards the rewards and episode_lengths TensorBoard log lookups in both RL benchmarking scripts with .get() and conditional logging, preventing KeyError crashes when training runs fewer iterations than the reward-logging threshold.

  • benchmark_rlgames.py: log_data["rewards/iter"] and log_data["episode_lengths/iter"] replaced with .get() calls; log_rl_policy_rewards, log_rl_policy_episode_lengths, and log_convergence are now skipped with a printed warning when the keys are absent.
  • benchmark_rsl_rl.py: Identical pattern applied to Train/mean_reward and Train/mean_episode_length, and convergence checking is similarly guarded.
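
As a purely illustrative consolidation of the two bullets above (in the PR each script keeps its own copy of the guard; this helper and its signature are assumptions, not code from the diff), the same pattern covers both workflows' tag names:

# Illustrative only: the tag names come from the summary above; the helper is
# not a function in the PR.
TAGS = {
    "rl_games": ("rewards/iter", "episode_lengths/iter"),
    "rsl_rl": ("Train/mean_reward", "Train/mean_episode_length"),
}

def get_reward_logs(log_data: dict, workflow: str):
    """Return (rewards, episode_lengths); each is None when its tag is absent."""
    reward_tag, length_tag = TAGS[workflow]
    return log_data.get(reward_tag), log_data.get(length_tag)

# Example: an rsl_rl run that stopped before the reward-logging threshold.
rewards, lengths = get_reward_logs({}, "rsl_rl")
print(rewards, lengths)  # None None -> both logging steps are skipped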

Confidence Score: 4/5

Safe to merge; the fix correctly targets the crash path and leaves the always-present performance timing metrics untouched.

The change is small and well-scoped. The only concern is the use of if rewards: (a truthiness check on a list) instead of the more explicit if rewards is not None:, which could produce a misleading warning if parse_tf_logs ever returns an empty list for a key that was registered but had no events written — a subtle but unlikely edge case.

Both changed files carry the same if rewards: pattern; no other files need attention.
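
To make the edge case concrete, a standalone snippet (not from the PR) showing how the two guards differ:

# None models a tag absent from log_data; [] models a tag that was registered
# but never received scalar events.
for rewards in (None, [], [0.3, 0.7]):
    truthy = bool(rewards)            # "if rewards:" rejects both None and []
    explicit = rewards is not None    # "if rewards is not None:" rejects only None
    print(f"{rewards!r:>12}  truthy={truthy}  is-not-None={explicit}")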

Important Files Changed

scripts/benchmarks/benchmark_rlgames.py: Switches reward and episode-length log lookups to .get() with truthiness guards; minor concern that if rewards: (falsy check) should be if rewards is not None: to correctly distinguish a missing key from an empty data list.

scripts/benchmarks/benchmark_rsl_rl.py: Same defensive .get() pattern applied to Train/mean_reward and Train/mean_episode_length; the same if rewards: vs if rewards is not None: nit applies here as well.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[parse_tf_logs] --> B{rewards key present?}
    B -- "None (missing)" --> C[Print WARNING: missing rewards]
    B -- "list of values" --> D[log_rl_policy_rewards]
    D --> E[log_convergence]
    C --> F{check_convergence flag?}
    F -- yes --> G[Print WARNING: cannot check convergence]
    F -- no --> H[skip]
    E --> I[log_success / finalize]
    G --> I
    H --> I
    A --> J{episode_lengths key present?}
    J -- "None (missing)" --> K[Print WARNING: missing episode lengths]
    J -- "list of values" --> L[log_rl_policy_episode_lengths]
    L --> I
    K --> I

Comments Outside Diff (4)

  1. scripts/benchmarks/benchmark_rlgames.py, line 292-301 (link)

    P2 Using a plain truthiness check (if rewards:) here conflates two different situations: the key being absent from log_data (returns None) and the key being present but having no events yet (returns []). Both produce the same "TensorBoard log is missing" warning message even though the second case means the key was registered but no data was emitted. Using is not None makes the intent explicit and keeps the two cases distinguishable for future debugging; see the sketch after this list.

  2. scripts/benchmarks/benchmark_rlgames.py, line 306-317 (link)

    P2 Same truthiness-vs-None concern as above: the guard on log_convergence should match the guards above so the two branches stay semantically consistent.

  3. scripts/benchmarks/benchmark_rsl_rl.py, line 291-300 (link)

    P2 Same if rewards: vs if rewards is not None: concern as in benchmark_rlgames.py — applies to both the rewards and episode-lengths guards here.

  4. scripts/benchmarks/benchmark_rsl_rl.py, line 306-317 (link)

    P2 Same consistency concern for the convergence guard in benchmark_rsl_rl.py.
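
A sketch of the guard these four comments are asking for, with the two cases split explicitly (the warning texts and the stand-in print are illustrative, not the PR's exact wording):

# Stub input: tag registered but no events written, the case the comments
# say the current warning message conflates with a missing key.
log_data = {"rewards/iter": []}

rewards = log_data.get("rewards/iter")
if rewards is None:
    print("[WARNING]: rewards tag missing from the TensorBoard logs, skipping.")
elif not rewards:
    print("[WARNING]: rewards tag was registered but no events were written.")
else:
    print(f"logging {len(rewards)} reward points")  # stands in for log_rl_policy_rewards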

Reviews (1): Last reviewed commit: "add CI tests"


@isaaclab-review-bot isaaclab-review-bot Bot left a comment


🤖 Isaac Lab Review Bot

Summary

Fixes a KeyError crash in benchmark scripts when num_iterations is below the reward logging threshold. The fix replaces direct dict access (log_data["key"]) with .get() + conditional processing and appropriate warnings. Clean, well-scoped defensive fix applied consistently to both benchmark_rlgames.py and benchmark_rsl_rl.py.

Design Assessment

Design is sound. The fix correctly converts hard crashes into graceful degradation with user-visible warnings. The warning pattern (print("[WARNING]...")) is consistent with existing conventions in the benchmark scripts and utils.py.

Findings

🔵 Suggestion: Consider if rewards is not None: for consistency (scripts/benchmarks/benchmark_rsl_rl.py:291)

The code uses if rewards: (truthiness) but the adjacent success_rates check uses if success_rates is not None:. Both protect against missing keys, but the truthiness check also treats empty lists as "missing." In practice this is fine — TensorBoard won't produce empty lists for registered tags, and log_rl_policy_rewards() would crash on max([]) anyway — but is not None would be more semantically precise and consistent with the existing pattern. Not a bug, just a consistency nit.
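
The max([]) point is easy to verify in isolation (standalone snippet, not from the PR):

# An empty-but-present reward list would crash in the summary step regardless
# of which guard is used upstream.
try:
    max([])
except ValueError as err:
    print(f"summarizing an empty reward list fails: {err}")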

Test Coverage

  • Tests not practical — standalone GPU benchmark scripts lack unit-test infrastructure
  • The fix is mechanical (.get() + guard) with very low regression risk
  • Acceptable without additional tests

CI Status

  • Check for Broken Links: ✅ pass
  • Most other checks: ⏳ pending

Verdict

Ship it

Clean, correct, and well-scoped bug fix. The if rewards: truthiness check is safe for the actual data patterns produced by parse_tf_logs(). The consistency suggestion is a minor polish, not a blocker.

Inline diff comment on scripts/benchmarks/benchmark_rsl_rl.py:

episode_length_tag="Train/mean_episode_length",
task=args_cli.task,
workflow="rsl_rl",
should_check_convergence=args_cli.check_convergence,


🔵 Suggestion: For consistency with the success_rates is not None check a few lines below, consider using if rewards is not None: here (and for episode_lengths). Both work correctly in practice — parse_tf_logs returns None for missing keys — but is not None makes the intent clearer and matches the existing pattern in this file.

Non-blocking — the truthiness check is also safe since an empty list from TensorBoard is practically impossible, and log_rl_policy_rewards would crash on max([]) anyway.

