
Fix: Episode freeze with staggered agent death in multi-agent environments#2

Open
yoosunghong wants to merge 1 commit into GPUOpen-LibrariesAndSDKs:main from yoosunghong:fix/staggered-death-freeze

Conversation

@yoosunghong

Hello, I would like to express my sincere gratitude for your dedication to this project. Schola has been an invaluable tool for my research.

While using it, I identified a minor issue in a multi-agent setup and prepared a potential fix. I'm not sure if this is the best approach, but I wanted to share it in hopes that it might be useful. I would greatly appreciate any feedback you may have.


Problem

In multi-agent environments using Schola + RLlib with NEXT_STEP autoreset, agents dying at different timesteps cause two compounding failures:

  1. Episode Freeze (Python → C++): When an agent dies, RLlib stops sending its actions. Unreal's Step() receives an incomplete action map, causing the environment to stall.
  2. Stale Data Leak (C++ → Python): Unreal re-emits observations for dead agents because Step() can overwrite bTerminated flags back to false, preventing AllAgentsCompleted() from ever returning true.

Result: Any multi-agent episode with staggered deaths hangs permanently after the first agent dies.
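The failure mode can be reproduced in miniature without Unreal. In this toy sketch (all names are illustrative stand-ins, not Schola APIs), the "RLlib" side only produces actions for live agents, while the "Unreal" side requires a complete action map before it can step:

```python
# Toy reproduction of the deadlock described above.
agents = ["a0", "a1", "a2"]
terminated = {"a1": True}  # a1 died on a previous step

# NEXT_STEP autoreset: RLlib emits actions for live agents only.
rllib_actions = {a: 0 for a in agents if not terminated.get(a, False)}

# Unreal's Step() expects an action for every registered agent;
# with a1 missing, the environment stalls waiting for it.
missing = set(agents) - set(rllib_actions)
assert missing == {"a1"}  # incomplete action map -> freeze
```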


Root Cause

  • Python (RayEnv.step): Raw RLlib actions (live agents only) were forwarded directly without padding for dead agents.
  • C++ (AbstractGymConnector): Step() was called with all received actions, allowing environment implementations to accidentally clear terminal flags on dead agents.

Fix Details

Python Side (schola/rllib/env.py)

  • _make_noop_action(): Generates zero-valued actions matching any action space structure.
  • Action Padding: RayEnv and RayVecEnv now pad previously-dead agents with no-op actions.
  • Data Filtering: Observations/rewards for already-dead agents are filtered out from the response.

C++ Side (AbstractGymConnector.cpp)


  • Snapshot & Filter: In the NextStep branch, the connector now snapshots which agents are already terminal before Step() and builds a LiveActions map containing only live agents' actions.
  • State Restoration: Manually restores terminal flags for previously-dead agents post-step to prevent accidental "revival."
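The snapshot-and-restore control flow is shown here as a Python sketch of the C++ logic (names and the state representation are illustrative, not the AbstractGymConnector API):

```python
def next_step(step_fn, agent_states, received_actions):
    """Sketch of the NextStep branch: snapshot dead agents, step only
    the live ones, then restore any terminal flags Step() cleared."""
    # Snapshot agents that were already terminal before this step.
    dead_before = {a for a, s in agent_states.items() if s["terminated"]}

    # LiveActions: drop actions addressed to already-dead agents.
    live_actions = {a: v for a, v in received_actions.items()
                    if a not in dead_before}

    step_fn(live_actions, agent_states)

    # Restore terminal flags so a sloppy Step() cannot "revive" dead agents.
    for a in dead_before:
        agent_states[a]["terminated"] = True
    return live_actions
```

Restoring the flags after the step, rather than trusting every environment implementation to preserve them, keeps AllAgentsCompleted() monotone once an agent has died.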

Testing & Verification

Unit Tests (No Unreal Required)

python -m pytest Test/rllib/test_staggered_death.py -v
  • Covers: No-op generation, staggered death flow, padding logic, and __all__ computation.
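For reference, the __all__ computation the tests exercise amounts to the standard RLlib convention (this is an illustrative one-liner, not the actual test file):

```python
def compute_all_done(terminateds, truncateds, agent_ids):
    """RLlib's `__all__` flag: True once every agent has terminated
    or been truncated."""
    return all(terminateds.get(a, False) or truncateds.get(a, False)
               for a in agent_ids)
```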

Integration Test (UE5.6 Environment)

Verified against a real UE5.6 environment with 3 agents dying at steps 5, 10, and 15.

  • ep_len_mean: 14.0 (expected 15, tolerance ±2)
  • Episodes completed: 274 (30 PPO iterations)
  • Hang detected: No
  • Reproduction Project: https://github.com/yoosunghong/ScholaStaggeredTest

📝 Compliance Checklist

  • Python code is formatted using Black.
  • C++ code follows the Unreal Style Guide.
  • All new tests pass locally.
