On GAE calculation math #583

@michael-lutz

In losses.py, I noticed that the code includes the following step before returning value targets and advantages:

advantages = (rewards + discount * (1 - termination) * vs_t_plus_1 - values) * truncation_mask

From what I understand, compute_vs_minus_v_xs should return the standard GAE result. Why do we perform an additional TD computation at the end?
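To make the question concrete, here is a minimal JAX sketch of how I read the two quantities. It assumes that compute_vs_minus_v_xs yields the standard GAE recursion, and it ignores truncation (truncation_mask = 1). The function name gae_and_td_advantages and its arguments are hypothetical, not taken from losses.py.

```python
import jax
import jax.numpy as jnp


def gae_and_td_advantages(rewards, values, bootstrap_value, termination,
                          discount=0.99, lam=0.95):
    """Returns (standard GAE, advantages after the extra TD step).

    Hypothetical helper for illustration only; truncation is ignored.
    """
    bootstrap = jnp.asarray(bootstrap_value, dtype=values.dtype).reshape(1)
    not_done = 1.0 - termination
    values_t_plus_1 = jnp.concatenate([values[1:], bootstrap])
    # One-step TD errors: delta_t = r_t + gamma * (1 - done_t) * V_{t+1} - V_t
    deltas = rewards + discount * not_done * values_t_plus_1 - values

    def step(acc, xs):
        delta, nd = xs
        # Standard GAE recursion: A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        acc = delta + discount * lam * nd * acc
        return acc, acc

    # reverse=True walks the trajectory backwards but keeps outputs in time order.
    _, gae = jax.lax.scan(step, jnp.zeros((), dtype=values.dtype),
                          (deltas, not_done), reverse=True)

    # The extra step from the line quoted above: build value targets vs,
    # shift them by one step, and take a one-step TD error against them.
    vs = gae + values
    vs_t_plus_1 = jnp.concatenate([vs[1:], bootstrap])
    td_advantages = rewards + discount * not_done * vs_t_plus_1 - values
    return gae, td_advantages
```

Under these assumptions, substituting vs = A + V gives td_advantages_t = delta_t + discount * (1 - termination_t) * A_{t+1}, so the result matches the standard GAE only when lambda = 1. For lambda < 1 the two differ by discount * (1 - lambda) * (1 - termination_t) * A_{t+1}.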

Second, why does the value loss include an extra factor of 0.5?

v_loss = jnp.mean(v_error * v_error) * 0.5 * 0.5
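As a quick check of what the two 0.5 factors do, here is a minimal sketch comparing the gradient of this loss with that of a plain mean squared error. The function names mse and scaled_v_loss and the sample v_error values are made up for illustration.

```python
import jax
import jax.numpy as jnp


def mse(v_error):
    # Plain mean squared error, with no extra scaling.
    return jnp.mean(v_error * v_error)


def scaled_v_loss(v_error):
    # Same expression as the v_loss line quoted above.
    return jnp.mean(v_error * v_error) * 0.5 * 0.5


v_error = jnp.array([0.5, -1.0, 2.0])  # made-up value errors
g_mse = jax.grad(mse)(v_error)
g_scaled = jax.grad(scaled_v_loss)(v_error)
# g_scaled is 0.25 * g_mse elementwise.
```

So the expression is equivalent to a plain MSE with a value-loss coefficient of 0.25: the minimizer is unchanged, and only the weight of the value term relative to the other loss terms differs.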

Both of these choices seem non-standard. Did you find that they improved performance empirically?
