Skip to content

[Feature Request] Implementing HER(Hindsight Experience Replay) #3713

@theap06

Description

@theap06

Motivation
Goal-conditioned RL is sparse-reward by nature. An agent manipulating objects, navigating to targets, or following language instructions almost always fails early in training — the desired goal is never reached, so the reward signal is nearly always zero and gradient flow stalls.

HER (Andrychowicz et al., 2017, https://arxiv.org/abs/1707.01495) solves this by relabeling failed trajectories at sample time: the achieved state is substituted as the goal, the reward is recomputed under that relabeled goal, and the transition becomes a valid positive training example. This turns failures into learning signal with no architectural changes to the policy or loss.

This is the standard baseline for any robotics or manipulation task. OpenAI Gym's GoalEnv was designed around it. Every major RL library (Stable-Baselines3, CleanRL, RLlib) has an implementation. TorchRL does not.

HER belongs in the replay buffer layer, not the loss or env. Relabeling happens at sample time, which is consistent with how PrioritizedSampler and SliceSampler extend sampling behavior without touching storage.

from torchrl.data import ReplayBuffer, LazyMemmapStorage
from torchrl.data.replay_buffers.samplers import HERSampler, HindsightStrategy

sampler = HERSampler(
strategy=HindsightStrategy.FUTURE, # "future" | "episode" | "random" | "final"
her_ratio=0.8, # fraction of sampled transitions to relabel
goal_key="desired_goal",
achieved_goal_key="achieved_goal",
reward_fn=recompute_reward, # Callable[[achieved, desired, info], Tensor]
)

rb = ReplayBuffer(storage=LazyMemmapStorage(100_000), sampler=sampler)
The buffer stores transitions with the original goal. HERSampler intercepts the sample call, selects relabeling candidates according to strategy, substitutes achieved_goal into desired_goal, calls reward_fn to recompute the scalar reward, and returns the modified tensordict. The policy and loss see nothing unusual.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions