Skip to content

[Question] Why env_change[batch_inds] is not considered during _get_samples(*) in RecurrentRolloutBuffer? #284

@Hhannzch

Description

@Hhannzch

❓ Question

When getting batches from a well-collected RecurrentRolloutBuffer, only episode_starts[batch_inds] will be returned to the sequence data. And this "episode_starts" is important for lstm policy to reset the hidden state during the training.
However, I have a question about the behavior here. As the seq_start_indices are decided together by both episode_starts and env_change, why are only episode_starts returned?
To be more clear, why the line 240 in common.recurrent.buffers is like "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?

Thank you for the explanation in advance.

Checklist

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions