In hopper-v2_3bsac/networks.py/ActorNetwork1:
def sample_normal(self, state, reparameterize=True):
    mu, sigma = self.forward(state)
    probabilities = Normal(mu, sigma)
    if reparameterize:
        actions = probabilities.rsample()  # trick: mean+std*N(0,1)
    else:
        actions = probabilities.sample()
    action = T.tanh(actions)*T.FloatTensor(self.max_action).to(self.device)
    action = T.tanh(actions)  ############################################
    log_probs = probabilities.log_prob(actions)
    log_probs -= T.log(1-action.pow(2)+self.reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)
    return action, log_probs
Here, because of the line marked with #, the action lies in [-1, 1].
In hopper-v2_3bsac/networks.py/ActorNetwork2:
def sample_normal(self, state, action, reparameterize=True):
    mu, sigma = self.forward(state, action)
    probabilities = Normal(mu, sigma)
    if reparameterize:
        actions = probabilities.rsample()
    else:
        actions = probabilities.sample()
    action = T.tanh(actions)*T.FloatTensor(self.max_action).to(self.device)
    log_probs = probabilities.log_prob(actions)
    log_probs -= T.log(1-action.pow(2)+self.reparam_noise)  ###################################
    log_probs = log_probs.sum(1, keepdim=True)
    return action, log_probs
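Here, please look at the line marked with #. Because action has already been multiplied by max_action, its magnitude can exceed 1 whenever max_action is greater than 1, so 1-action.pow(2)+self.reparam_noise may be less than 0, and T.log of a negative value is NaN.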
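The concern can be checked with plain arithmetic. The snippet below is only an illustrative sketch: the values of max_action, reparam_noise and the pre-tanh sample u are assumptions chosen for the demonstration, not values taken from the repository.

```python
import math

max_action = 2.0      # hypothetical bound; any value above 1 shows the effect
reparam_noise = 1e-6  # assumed value of self.reparam_noise
u = 3.0               # a pre-tanh Gaussian sample (assumed)

# ActorNetwork2 style: the scaled action is fed into the correction term.
action = math.tanh(u) * max_action
term = 1 - action ** 2 + reparam_noise

# ActorNetwork1 style: the unscaled tanh output is used instead.
unscaled_term = 1 - math.tanh(u) ** 2 + reparam_noise

print(term)           # negative, so its log is undefined
print(unscaled_term)  # small but positive
```

With max_action equal to 1 the two terms coincide, so the problem would only show up in environments whose action bound is larger than 1.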
Therefore, should we change the way log_probs is evaluated, following the implementation at https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/sac/core.py?
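One possible way to do this is sketched below. It is not the repository's code: squashed_sample is a hypothetical standalone helper that takes mu, sigma and max_action as tensors instead of reading them from the network. It applies the correction to the unscaled tanh output and uses the identity log(1 - tanh(u)^2) = 2*(log 2 - u - softplus(-2u)), which is the form used in the Spinning Up file linked above, so no reparam_noise is needed.

```python
import math

import torch
import torch.nn.functional as F
from torch.distributions import Normal


def squashed_sample(mu, sigma, max_action, reparameterize=True):
    """Sample a tanh-squashed Gaussian action and its log-probability.

    Hypothetical helper: mu and sigma are (batch, act_dim) tensors,
    max_action is an (act_dim,) tensor of positive bounds.
    """
    dist = Normal(mu, sigma)
    u = dist.rsample() if reparameterize else dist.sample()

    # Scale only the action returned to the environment.
    action = torch.tanh(u) * max_action

    log_probs = dist.log_prob(u)
    # Tanh correction computed from u, not from the scaled action:
    # log(1 - tanh(u)^2) == 2 * (log 2 - u - softplus(-2u)).
    log_probs = log_probs - 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))
    # The scaling by max_action would only add a constant -log(max_action)
    # per dimension; it is left out here, as in the code quoted above.
    return action, log_probs.sum(1, keepdim=True)


# Usage with a deliberately wide Gaussian and a bound above 1.
torch.manual_seed(0)
mu = torch.zeros(4, 3)
sigma = torch.full((4, 3), 5.0)
max_action = torch.tensor([2.0, 2.0, 2.0])
action, log_probs = squashed_sample(mu, sigma, max_action)
```

The returned action is still in [-max_action, max_action], while the log-probability stays finite even for large pre-tanh samples.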