[Feature] Collector final_obs: store true boundary next-obs for shifted-GAE by vmoens · Pull Request #3758 · pytorch/rl

vmoens · 2026-05-15T07:20:47Z

Stack from ghstack (oldest at bottom):

When compact_obs=True is paired with shifted-GAE, the last step of
each rollout window has no ("next", obs) to read; GAE was falling
back to V(s_{T-1}) via _fill_missing_next_inputs, which biases the
bootstrap for one step out of every T, at each window boundary that
is not a real done.
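
To make the boundary effect concrete, here is a minimal pure-Python sketch of GAE over a single non-terminating window. It is not TorchRL code: the `gae` helper, the rewards, the values, and `true_v_T` are all made up for illustration. It compares bootstrapping from the true V(s_T) against the V(s_{T-1}) fallback.

```python
def gae(rewards, values, bootstrap_value, gamma=0.99, lmbda=0.95):
    """Plain GAE over one window of T steps with no done inside it.

    values[t] is V(s_t) for t = 0..T-1; bootstrap_value stands in for V(s_T).
    """
    T = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # TD residual for step t, then the usual backward recursion.
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lmbda * running
        advantages[t] = running
    return advantages


# Illustrative numbers only (assumptions, not taken from the PR).
rewards = [1.0, 1.0, 1.0, 1.0]
values = [4.0, 3.0, 2.0, 1.0]  # V(s_0) .. V(s_3)
true_v_T = 5.0                 # V(s_4), the value at the true boundary obs

exact = gae(rewards, values, bootstrap_value=true_v_T)
fallback = gae(rewards, values, bootstrap_value=values[-1])  # V(s_T) ~ V(s_{T-1})

# Per-step advantage error introduced by the fallback bootstrap.
errors = [f - e for f, e in zip(fallback, exact)]
```

Only the last TD residual is wrong, by gamma * (V(s_{T-1}) - V(s_T)), but the backward recursion carries that error to earlier steps with weight (gamma * lmbda)^k, so it is largest at the boundary and decays going back in time.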

Add a final_obs flag (requires compact_obs=True) on the single
and multi-process collectors. When on, the collector maintains a
per-env side buffer that mirrors the most recent ("next", k) in
place; the rollout returned to the consumer carries each leaf under
("final", k) wrapped in :class:tensordict.UnbatchedTensor so the
[*envs, T] batch shape is preserved (the buffer has no time dim).
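
The side-buffer idea can be sketched without any TorchRL types. In this hypothetical version, `FinalObsBuffer`, `collect_window`, and `step_fn` are illustrative names, observations are plain lists, and the rollout is laid out as [T, envs] rather than [*envs, T] for brevity. The point is only that the buffer holds one slot per env, is overwritten in place at every step, and is attached to the rollout under a "final" entry.

```python
class FinalObsBuffer:
    """Hypothetical per-env side buffer: one slot per env, no time dimension."""

    def __init__(self, num_envs, obs_dim):
        self.slots = [[0.0] * obs_dim for _ in range(num_envs)]

    def update(self, next_obs):
        # Overwrite each env's slot in place with the latest next-observation.
        for slot, obs in zip(self.slots, next_obs):
            slot[:] = obs


def collect_window(step_fn, obs, T, buffer):
    """Roll out T steps, keeping root obs per step plus one boundary obs per env."""
    rollout = []
    for _ in range(T):
        rollout.append([list(o) for o in obs])  # root obs for this step
        obs = step_fn(obs)                      # next obs is not stored per step
        buffer.update(obs)
    window = {"obs": rollout, "final": {"obs": [list(s) for s in buffer.slots]}}
    return window, obs


def step_fn(obs):
    # Toy dynamics: every env's observation increases by 1 each step.
    return [[x + 1.0 for x in o] for o in obs]


buffer = FinalObsBuffer(num_envs=2, obs_dim=1)
window, last_obs = collect_window(step_fn, [[0.0], [10.0]], T=3, buffer=buffer)
```

After the window, `window["final"]["obs"]` holds the observation each env reached after its last step, which is exactly the entry a compact rollout would otherwise be missing.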

On the consumer side, ValueEstimatorBase gets two new static
methods:

- _apply_final_obs_to_next_done finds the synthetic last-step
  positions inside the interleaved bootstrap batch and substitutes
  ("final", k) values for the bootstrap input. It is a no-op when
  ("final", ...) is absent, so non-final-obs callers are unaffected.
- _maybe_drop_final_obs deletes ("final", ...) from the
  consumed tensordict. UnbatchedTensor leaves are incompatible
  with contiguous-storage replay buffers (LazyTensorStorage,
  LazyMemmapStorage); dropping after consumption keeps the
  collector -> GAE -> ReplayBuffer.extend() pipeline clean.
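
A rough sketch of the substitute-then-drop pattern, using plain dicts for a single env. The helpers `next_obs_for_bootstrap` and `drop_final` are hypothetical stand-ins, not the actual static methods, and the sketch ignores done/truncation handling and the interleaved bootstrap batch.

```python
def next_obs_for_bootstrap(td):
    """Rebuild per-step next-obs for one env from a compact rollout.

    Steps 0..T-2 reuse the following root obs. The last step uses
    td["final"]["obs"] when present, else falls back to the last root obs.
    """
    obs = td["obs"]
    final = td.get("final")
    last = final["obs"] if final is not None else obs[-1]
    return obs[1:] + [last]


def drop_final(td):
    """Remove the "final" entry so downstream storage never sees it."""
    td.pop("final", None)
    return td


with_final = {"obs": [0.0, 1.0, 2.0], "final": {"obs": 3.0}}
without_final = {"obs": [0.0, 1.0, 2.0]}

next_with = next_obs_for_bootstrap(with_final)        # uses the true boundary obs
next_without = next_obs_for_bootstrap(without_final)  # falls back to obs[-1]
cleaned = drop_final(with_final)
```

Keeping the substitution a no-op when "final" is missing means callers that never set the flag go through the same code path unchanged.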

All five estimators (TD0 / TD1 / TDLambda / GAE / VTrace) invoke
_maybe_drop_final_obs at the end of forward.

Also documents the memmap-vs-on-device tradeoff on
LazyMemmapStorage — the example below the stack picks memmap to
keep large rollout buffers off-device.

Tests cover: final_obs=True without compact_obs raises;
boundary obs match a compact_obs=False reference (non-done envs);
UnbatchedTensor survives indexing along time; bootstrap parity
between compact+final and a non-compact reference for TD0 / TD1 /
TDLambda / GAE; and ("final", ...) is dropped from the returned
tensordict after consumption.
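
The first test case amounts to a flag-dependency check. A minimal sketch of that kind of guard follows. `check_collector_flags` is a hypothetical helper, and `ValueError` is an assumed exception type, since the description does not say which exception the collector raises.

```python
def check_collector_flags(compact_obs: bool, final_obs: bool) -> None:
    """Hypothetical guard: final_obs only makes sense on top of compact_obs."""
    if final_obs and not compact_obs:
        # Assumed exception type; the real collector may raise something else.
        raise ValueError("final_obs=True requires compact_obs=True")


def raises_value_error(compact_obs: bool, final_obs: bool) -> bool:
    """Return True when the guard rejects this flag combination."""
    try:
        check_collector_flags(compact_obs, final_obs)
    except ValueError:
        return True
    return False


rejected = raises_value_error(compact_obs=False, final_obs=True)
accepted = not raises_value_error(compact_obs=True, final_obs=True)
```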

[ghstack-poisoned]

pytorch-bot · 2026-05-15T07:20:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3758

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

❌ 18 New Failures, 1 Unrelated Failure, 20 Unclassified Failures

As of commit 843af6f with merge base 0a01ee8:

NEW FAILURES - The following jobs have failed:

Build Aarch64 Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / build-wheel-py3_10-cpu-aarch64 (gh)
Build Aarch64 Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / upload / upload-wheel-py3_10-cpu-aarch64 (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_aarch64
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / build-manywheel-py3_10-cpu (gh)
/__w/_temp/conda_environment_25983286306/lib/python3.10/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / build-manywheel-py3_10-cuda12_6 (gh)
/__w/_temp/conda_environment_25983286306/lib/python3.10/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / build-manywheel-py3_10-cuda13_0 (gh)
/__w/_temp/conda_environment_25983286306/lib/python3.10/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / build-manywheel-py3_10-cuda13_2 (gh)
/__w/_temp/conda_environment_25983286306/lib/python3.10/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / upload / upload-manywheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_x86_64
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / upload / upload-manywheel-py3_10-cuda12_6 (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cu126_x86_64
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / upload / upload-manywheel-py3_10-cuda13_0 (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cu130_x86_64
Build Linux Wheels / pytorch/rl (pytorch/rl, test/smoke_test.py, torchrl, .github/scripts/pre-build-script.sh) / upload / upload-manywheel-py3_10-cuda13_2 (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cu132_x86_64
Build M1 Wheels / pytorch/rl / build-wheel-py3_10-cpu (gh)
/Users/ec2-user/runner/_work/_temp/conda_environment_25983286325/lib/python3.10/site-packages/torch/include/ATen/core/stack.h:160:14: error: use of undeclared identifier 'Types'
Build M1 Wheels / pytorch/rl / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_
Build Windows Wheels / pytorch/rl / build-wheel-py3_10-cpu (gh)
Process completed with exit code 1.
Build Windows Wheels / pytorch/rl / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_x64
Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Process completed with exit code 1.
Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Process completed with exit code 1.
Unit-tests on Linux / tests-olddeps (3.10, 11.8) / linux-job (gh)
test/transforms/test_env_transforms.py::TestTargetReturn::test_parallel_trans_env_check[device0-constant]
Unit-tests on Linux / tests-optdeps (3.12, 13.0) / linux-job (gh)
RuntimeError: Command docker exec -t 9037bf744ff71dd1292be44f93a96d2bfa678c0a5f937a654d0ecfdccc38b27f /exec failed with exit code 1

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Push Binary Nightly / build-wheel-unix (linux, ubuntu-22.04, 3.10, cp310-cp310, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (linux, ubuntu-22.04, 3.11, cp311-cp311, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (linux, ubuntu-22.04, 3.12, cp312-cp312, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Push Binary Nightly / build-wheel-unix (linux, ubuntu-22.04, 3.13, cp313-cp313, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/opt/hostedtoolcache/Python/3.13.13/x64/lib/python3.13/site-packages/torch/include/ATen/core/stack.h:158:11: error: expected constructor, destructor, or type conversion before ‘(’ token
Push Binary Nightly / build-wheel-unix (linux, ubuntu-22.04, 3.14, cp314-cp314, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (macos, macos-latest, 3.10, cp310-cp310, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (macos, macos-latest, 3.11, cp311-cp311, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (macos, macos-latest, 3.12, cp312-cp312, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / build-wheel-unix (macos, macos-latest, 3.13, cp313-cp313, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/torch/include/ATen/core/stack.h:160:14: error: use of undeclared identifier 'Types'
Push Binary Nightly / build-wheel-unix (macos, macos-latest, 3.14, cp314-cp314, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Push Binary Nightly / test-wheel-unix (linux, ubuntu-22.04, 3.10, cp310-cp310, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-linux-3.10_cpu.whl
Push Binary Nightly / test-wheel-unix (linux, ubuntu-22.04, 3.11, cp311-cp311, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-linux-3.11_cpu.whl
Push Binary Nightly / test-wheel-unix (linux, ubuntu-22.04, 3.12, cp312-cp312, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-linux-3.12_cpu.whl
Push Binary Nightly / test-wheel-unix (linux, ubuntu-22.04, 3.13, cp313-cp313, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-linux-3.13_cpu.whl
Push Binary Nightly / test-wheel-unix (linux, ubuntu-22.04, 3.14, cp314-cp314, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-linux-3.14_cpu.whl
Push Binary Nightly / test-wheel-unix (macos, macos-latest, 3.10, cp310-cp310, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-macos-3.10_cpu.whl
Push Binary Nightly / test-wheel-unix (macos, macos-latest, 3.11, cp311-cp311, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-macos-3.11_cpu.whl
Push Binary Nightly / test-wheel-unix (macos, macos-latest, 3.12, cp312-cp312, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-macos-3.12_cpu.whl
Push Binary Nightly / test-wheel-unix (macos, macos-latest, 3.13, cp313-cp313, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-macos-3.13_cpu.whl
Push Binary Nightly / test-wheel-unix (macos, macos-latest, 3.14, cp314-cp314, cpu, cpu) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: torchrl-macos-3.14_cpu.whl

FLAKY - The following job failed but was likely due to flakiness present on trunk:

Unit-tests on Windows / unittests-cpu (3.10, windows.4xlarge, cpu) / windows-job (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens · 2026-05-15T08:00:09Z

+        final_obs (bool, optional): if ``True`` (and ``compact_obs=True``), the
+            collector additionally stores the true next-observation reached
+            after the last step of the rollout under a top-level ``("final", k)``
+            sub-tensordict for each observation/state key ``k`` that was
+            compacted away. The value is wrapped in
+            :class:`tensordict.UnbatchedTensor` (one obs per env, no time
+            dimension) so the rollout's batch shape ``[*envs, T]`` is preserved.
+
+            This closes the bootstrap-correctness gap when running with short
+            rollout windows: under ``compact_obs=True``, the ``("next", obs)``
+            of the very last step of each window is dropped, and shifted-GAE
+            falls back to bootstrapping ``V(s_T) ≈ V(s_{T-1})`` for that step
+            (a 1/T fraction of corruption). With ``final_obs=True``, GAE reads
+            the true ``s_T`` from ``("final", obs)`` instead.
+
+            The pipeline assumption is:
+            ``collector -> GAE(shifted=True) -> ReplayBuffer.extend()``.
+            :class:`~torchrl.objectives.value.advantages.GAE` consumes and
+            drops ``("final", ...)`` from the returned tensordict, so the
+            downstream replay buffer never sees an
+            :class:`~tensordict.UnbatchedTensor` (which would otherwise be
+            incompatible with a contiguous storage). Defaults to ``False``.


Mark this as experimental (it can go away at any time).
You mention GAE, but it could be any value estimator.

This feature is for correctness; in practice, using the last root observation as the trailing obs works well.

[ghstack-poisoned]

Update

5589424

[ghstack-poisoned]

github-actions Bot added Feature New feature Objectives Collectors ReplayBuffers Integrations/torch_geometric Integrations and removed Feature New feature labels May 15, 2026

github-actions Bot added the Feature New feature label May 15, 2026

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2026

vmoens mentioned this pull request May 15, 2026

[Feature] Collector.fake_tensordict() / MultiCollector.fake_tensordict() #3761

Closed

vmoens commented May 15, 2026


Update

1d9e602

[ghstack-poisoned]

Update

843af6f

[ghstack-poisoned]

This was referenced May 17, 2026

[Test] Enable scan compile RNN tests on Windows #3770

Closed

[BugFix] Fix GAE compact path bias on recurrent value nets at internal truncations #3771

Merged

vmoens wants to merge 3 commits into gh/vmoens/274/base from gh/vmoens/274/head
