Does the ActorNetwork module have a problem? #3

@cycjdt

Description

In hopper-v2_3bsac/networks.py/ActorNetwork1:
def sample_normal(self, state, reparameterize=True):
    mu, sigma = self.forward(state)
    probabilities = Normal(mu, sigma)

    if reparameterize:
        actions = probabilities.rsample()        # trick: mean+std*N(0,1)
    else:
        actions = probabilities.sample()

    action = T.tanh(actions)*T.FloatTensor(self.max_action).to(self.device)
    action = T.tanh(actions)         ############################################
    log_probs = probabilities.log_prob(actions)
    log_probs -= T.log(1-action.pow(2)+self.reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)

    return action, log_probs

Here the range of the action is [-1, 1], because the line marked with # overwrites the scaled action with T.tanh(actions).

In hopper-v2_3bsac/networks.py/ActorNetwork2:
def sample_normal(self, state, action, reparameterize=True):
    mu, sigma = self.forward(state, action)
    probabilities = Normal(mu, sigma)

    if reparameterize:
        actions = probabilities.rsample()
    else:
        actions = probabilities.sample()

    action = T.tanh(actions)*T.FloatTensor(self.max_action).to(self.device)
    log_probs = probabilities.log_prob(actions)
    log_probs -= T.log(1-action.pow(2)+self.reparam_noise)  ###################################
    log_probs = log_probs.sum(1, keepdim=True)

    return action, log_probs

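Here, please look at the line marked with #: action has already been scaled by max_action, so whenever max_action is greater than 1 the value of 1-action.pow(2)+self.reparam_noise may be less than 0, and T.log of a negative value is NaN.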
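
The concern about the marked line can be checked numerically. The following is a minimal sketch in plain Python, not the repository's code: `log_argument` is a hypothetical helper that mirrors the expression `1-action.pow(2)+self.reparam_noise` for a single scalar, and the values `max_action=2.0` and `reparam_noise=1e-6` are assumptions chosen only for illustration.

```python
import math

def log_argument(u, max_action, reparam_noise=1e-6):
    """Scalar version of 1 - action**2 + reparam_noise, where
    action = tanh(u) * max_action (the scaling used in ActorNetwork2)."""
    action = math.tanh(u) * max_action
    return 1.0 - action ** 2 + reparam_noise

# With max_action = 1 the argument stays positive for these pre-tanh samples.
unscaled = [log_argument(u, max_action=1.0) for u in (-3.0, 0.0, 3.0)]

# With max_action = 2 (assumed value) the argument turns negative once
# |tanh(u)| exceeds roughly 0.5, so its logarithm is undefined there.
scaled = [log_argument(u, max_action=2.0) for u in (-3.0, 0.0, 3.0)]

print(unscaled)
print(scaled)
```

So the problem only shows up for environments whose action limit is larger than 1; with a limit of exactly 1 the two networks evaluate the same expression.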

Therefore, should we change the way log_probs is evaluated, following the implementation at this address: https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/sac/core.py
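
For reference, the linked Spinning Up file evaluates the tanh correction on the pre-tanh sample, using the identity log(1 - tanh(u)^2) = 2(log 2 - u - softplus(-2u)), and applies the action limit only afterwards. The sketch below restates that identity for a single scalar in plain Python. The helper names `softplus` and `tanh_log_det` are hypothetical, and this is an illustration of the idea rather than a drop-in patch for networks.py.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def tanh_log_det(u):
    """log(1 - tanh(u)**2), computed without forming 1 - tanh(u)**2."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))

# Agrees with the direct formula where the direct formula is still accurate ...
direct = math.log(1.0 - math.tanh(1.5) ** 2)
stable = tanh_log_det(1.5)

# ... and stays finite where 1 - tanh(u)**2 rounds to 0 in floating point.
saturated = tanh_log_det(20.0)

print(direct, stable, saturated)
```

Because the correction depends only on the pre-tanh sample, it never takes the logarithm of a negative number, whatever the value of max_action. In tensor code the same expression would be summed over the action dimension, as log_probs already is in both networks.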
