[BugFix] Fix GAE compact path bias on recurrent value nets at internal truncations by vmoens · Pull Request #3771 · pytorch/rl

vmoens · 2026-05-17T16:39:18Z

Stack from ghstack (oldest at bottom):

`_call_value_net_compact` previously built `data_in = [root[0:T], boundary]`
along the time dim and read `value_[t] = V(data_in[t+1])`. For every internal
`done[t]=True` (`t < T-1`), that read was `V(root_obs[t+1])`, the
post-reset first observation of the next episode, rather than the
env-returned `("next", obs)[t]` (the true pre-reset truncation observation).
GAE bootstraps with `(1 - terminated)`, so on envs that truncate without
terminating (Isaac-Ant, Pendulum-on-timeout, ...) the wrong
`next_state_value` was not masked out and flowed straight into the value
target. With recurrent value nets the wrong observation also corrupted the
LSTM hidden state going forward, cascading into downstream slots. The wandb
runs `g71pk34w` / `x05igvw7` (`shifted='compact'`) trailed `sln6yf2a` /
`6c8ihgh7` (`shifted=False`) by ~20% end-of-traj reward at iter 1000 for
exactly this reason.
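
To make the off-by-one concrete, here is a minimal sketch in plain Python. It is not the TorchRL implementation: scalars stand in for observations, `value_net` is a hypothetical stateless stand-in, and the rollout values are invented for illustration. It reproduces the pre-fix read `V(data_in[t+1])` and compares it with `V(("next", obs)[t])` under a bootstrap masked only by `terminated`.

```python
# Toy rollout: T = 6 steps, one episode truncated (not terminated) at t = 2.
GAMMA = 0.99
T = 6


def value_net(obs):
    # Hypothetical stateless value function, for illustration only.
    return 10.0 * obs


root_obs = [0.0, 0.1, 0.2, 0.0, 0.1, 0.2]  # root_obs[3] is the post-reset observation
next_obs = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3]  # env-returned ("next", obs)
terminated = [False] * T                   # truncation only: done at t = 2 and t = 5
reward = [1.0] * T

# Pre-fix compact read: append the boundary observation and shift by one.
data_in = root_obs + [next_obs[-1]]
shifted_next_value = [value_net(data_in[t + 1]) for t in range(T)]

# Reference read: evaluate the env-returned next observation directly.
true_next_value = [value_net(obs) for obs in next_obs]


def td_target(r, next_v, term):
    # The bootstrap is masked by `terminated` only, so truncated steps still bootstrap.
    return r + GAMMA * (1.0 - float(term)) * next_v


wrong = [td_target(reward[t], shifted_next_value[t], terminated[t]) for t in range(T)]
right = [td_target(reward[t], true_next_value[t], terminated[t]) for t in range(T)]

# Steps where the two reads disagree.
mismatch = [t for t in range(T) if abs(wrong[t] - right[t]) > 1e-9]
print(mismatch)
```

In this toy setup the two reads agree everywhere except at the internal truncation, where the shifted read picks up the post-reset observation. The final slot is unaffected because the boundary observation is appended explicitly.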

The fix replaces the `T+1` interleave with a fused batched call: the root
and `("next", ...)` streams are concatenated along a non-time batch dim
into a constant-shape `[2*B, T, *F]` tensor and the value net is invoked
once. Reads of `value` and `value_` are simple batch-half slices. The
`("final", k)` collector contract still overrides the next side at slot
`T-1`. For recurrent value nets `("next", "is_init") |= root_is_init` is
applied so the LSTM resets at every trajectory boundary, exactly matching
the `shifted=False` reference (verified byte-exact on the regression
fixture). `_fill_missing_next_inputs` handles `compact_obs=True` rollouts
that don't populate `("next", k)`. A `time_idx == 0` guard keeps 1D
rollouts correct by unsqueezing a batch dim before the cat.
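
The fused call can be sketched as follows, again in plain Python with nested lists standing in for `[B, T]` tensors (the feature dims `*F` are dropped). The `value_net` here is a hypothetical recurrent stand-in (a running sum that resets on `is_init`), and the data is invented. The `("final", k)` override and `_fill_missing_next_inputs` are not modeled.

```python
B, T = 2, 4
calls = [0]  # counts value-net invocations


def value_net(obs_batch, is_init_batch):
    """Hypothetical recurrent value net: a running sum reset where is_init is True."""
    calls[0] += 1
    out = []
    for traj, inits in zip(obs_batch, is_init_batch):
        hidden, values = 0.0, []
        for obs, reset in zip(traj, inits):
            if reset:
                hidden = 0.0  # restart the recurrent state at a trajectory boundary
            hidden += obs
            values.append(hidden)
        out.append(values)
    return out


root_obs = [[0.0, 0.1, 0.2, 0.0], [0.5, 0.6, 0.0, 0.1]]
next_obs = [[0.1, 0.2, 0.3, 0.1], [0.6, 0.7, 0.1, 0.2]]
root_is_init = [[True, False, False, True], [True, False, True, False]]
next_is_init = [[False] * T for _ in range(B)]

# ("next", "is_init") |= root_is_init, so the next stream resets at the same slots.
next_is_init = [
    [n or r for n, r in zip(n_row, r_row)]
    for n_row, r_row in zip(next_is_init, root_is_init)
]

# Concatenate along the batch dim: [B, T] + [B, T] -> [2*B, T], then call once.
fused_obs = root_obs + next_obs
fused_init = root_is_init + next_is_init
fused_out = value_net(fused_obs, fused_init)

# Batch-half slices recover the two streams.
value, next_value = fused_out[:B], fused_out[B:]
```

Because the fused input always has shape `[2*B, T]`, nothing in this sketch depends on where the truncations fall, which is the property the constant-shape design relies on.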

The input shape stays constant across calls in a training run (no `.item()`
syncs, no Python branching on tensor values), so the path stays compatible
with `torch.compile` and `vmap`.

The regression test `test_gae_recurrent_shifted_compact_matches_unshifted_isaac_shape`
asserts compact matches `shifted=False` to within `rel < 0.05` on an
Isaac-shaped multi-trajectory rollout (`B=4`, `T=16`, truncations every 4
steps, `compact_obs=False` semantics). Pre-fix LSTM/GRU rel-err: 0.52 /
0.52. Post-fix: < 1e-7 (FP noise) for both. All other 1972 tests in
`test/objectives/test_values.py` continue to pass.
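
The description does not say how `rel` is computed. As an illustration only, one common choice is the L2 norm of the difference divided by the L2 norm of the reference. The helper below is a hypothetical sketch of that metric and not the code used in the regression test.

```python
import math


def relative_error(candidate, reference, eps=1e-12):
    """L2 relative error ||candidate - reference|| / ||reference|| (assumed metric)."""
    num = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)))
    den = math.sqrt(sum(r ** 2 for r in reference))
    return num / (den + eps)  # eps guards against an all-zero reference


# Invented values standing in for value targets from the two code paths.
reference = [1.0, 2.0, 3.0]
compact = [1.0, 2.0, 3.0]
passes = relative_error(compact, reference) < 0.05
```

Under this definition, identical outputs give a relative error of zero, and an output that is entirely wrong relative to a non-zero reference gives a value near one.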

[ghstack-poisoned]

pytorch-bot · 2026-05-17T16:39:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3771

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

❌ 1 New Failure, 1 Cancelled Job, 2 Unrelated Failures

As of commit 45c9ef4 with merge base 5d11fa3:

NEW FAILURE - The following job has failed:

Libs Tests on Linux / unittests-sklearn (3.10, 12.8) / linux-job (gh)
test/libs/test_datasets.py::TestOpenML::test_data[mushroom_onehot]

CANCELLED JOB - The following job was cancelled. Please retry:

Unit-tests on Windows / unittests-cpu (3.10, windows.4xlarge, cpu) / windows-job (gh)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Build Windows Wheels / pytorch/rl / build-wheel-py3_10-cpu (gh) (trunk failure)
Process completed with exit code 1.
Build Windows Wheels / pytorch/rl / upload / upload-wheel-py3_10-cpu (gh) (trunk failure)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_x64

This comment was automatically generated by Dr. CI and updates every 15 minutes.


…l truncations

ghstack-source-id: f2d1a3e
Pull-Request: #3771

Update

c865bca


meta-cla Bot added the CLA Signed label May 17, 2026

github-actions Bot added the BugFix label May 17, 2026

vmoens mentioned this pull request May 17, 2026

[BugFix] Recurrent policy auto-register with policy_factory #3753

Merged

github-actions Bot added the Objectives label May 17, 2026

Update

b190811


This was referenced May 18, 2026

[Example] Expose compact GAE cat dimension #3775

Merged

[Doc] Migrate shifted=True callers to legacy/compact + docstring polish #3776

Merged

vmoens added 3 commits May 18, 2026 19:43

Update

23367d7


Update

9189736


Update

45c9ef4


vmoens merged commit 45c9ef4 into gh/vmoens/285/base May 19, 2026
107 of 113 checks passed

vmoens deleted the gh/vmoens/285/head branch May 19, 2026 07:45

vmoens merged 5 commits into gh/vmoens/285/base from gh/vmoens/285/head
