fix(saute): coerce tensors to correct device in SauteAdapter #387
Open
96sanjay wants to merge 1 commit into
When the environment returns CPU tensors while training on GPU, SauteAdapter crashes on the first step because _safety_obs lives on the training device but env outputs (obs, reward, cost, etc.) are still on CPU. Coerce all env outputs to self._device at the step() and reset() boundary. Also guard _augment_obs for final_observation from the info dict. No-op when tensors are already on the correct device.
Hit a device-mismatch crash running PPOSaute on GPU with a custom env that returns CPU tensors.
`_safety_obs` is on CUDA but env outputs stay on CPU, so the in-place ops in `_safety_step` fail immediately with `RuntimeError: Expected all tensors to be on the same device`.

Fix coerces all env outputs to `self._device` at the top of `step()` and `reset()`, plus a guard in `_augment_obs` for `final_observation`. No-op if everything is already on the right device.
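For reviewers who want a picture of the pattern without opening the diff, here is a rough sketch of coercing env outputs at the `step()`/`reset()` boundary. It is not the code in this PR: `coerce_to_device` and `DeviceCoercingAdapter` are hypothetical names made up for illustration, and the wrapped env is assumed to return tensors (possibly nested in tuples or an info dict).

```python
import torch


def coerce_to_device(value, device):
    """Recursively move tensors inside `value` to `device` (hypothetical helper)."""
    if isinstance(value, torch.Tensor):
        # Tensor.to returns the tensor itself when it is already on `device`,
        # so this is a no-op for envs that already do the right thing.
        return value.to(device)
    if isinstance(value, dict):
        # Covers info-dict entries such as final_observation.
        return {k: coerce_to_device(v, device) for k, v in value.items()}
    if isinstance(value, tuple):
        return tuple(coerce_to_device(v, device) for v in value)
    if isinstance(value, list):
        return [coerce_to_device(v, device) for v in value]
    # Non-tensor values (ints, strings, None, ...) pass through unchanged.
    return value


class DeviceCoercingAdapter:
    """Illustrative stand-in for an adapter; NOT the actual SauteAdapter."""

    def __init__(self, env, device):
        self._env = env
        self._device = torch.device(device)

    def reset(self):
        # Coerce everything the env hands back before any adapter state touches it.
        return coerce_to_device(self._env.reset(), self._device)

    def step(self, action):
        return coerce_to_device(self._env.step(action), self._device)
```

Doing the coercion once at the boundary means later in-place ops only ever see tensors on `self._device`, which is the property the crash above was missing.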
Motivation and Context
Affects any custom env that doesn't move its outputs to the training device — pretty common outside of mujoco/safety-gym. Crashes on the very first step.
Types of changes
Checklist
- `make format`. (required)
- `make lint`. (required)
- `make test` pass. (required)