I think the code below may have a problem:

    # train actor-critic by target loss
    self.actor_network.train(
        self.critic_network.train(
            y_batch, action_batch, state_batch
        )
    )

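The actor and critic gradients need to be computed separately because their loss functions differ: the critic is trained to match the targets in y_batch, while the actor is trained to maximize the critic's value of its own actions.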
I think it should be changed to the following:
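```python
# Critic update: fit Q(s, a) to the targets in y_batch.
self.critic_network.train(y_batch, action_batch, state_batch)

# Actor update: maximize the critic's value of the actor's own actions,
# i.e. minimize the negated mean Q-value.
actor_loss = -self.critic_network.critic(
    self.actor_network.actor_action(state_batch), state_batch
).mean()
self.actor_network.optimizer.zero_grad()
actor_loss.backward()
self.actor_network.optimizer.step()
```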
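To show why the two updates cannot share one loss, here is a minimal, dependency-free sketch with a scalar linear actor and critic and hand-derived gradients. It is only a toy under my own assumptions, not the repository's code: the names `critic_q`, `update_critic`, `update_actor`, `w_s`, `w_a`, and `theta` are all hypothetical, and a real DDPG implementation would use neural networks and an autograd library instead.

```python
# Toy DDPG-style update with scalar linear models (all names hypothetical).
# Actor:  a = theta * s
# Critic: Q(s, a) = w_s * s + w_a * a


def critic_q(w_s, w_a, state, action):
    """Linear critic value Q(s, a)."""
    return w_s * state + w_a * action


def update_critic(w_s, w_a, states, actions, targets, lr=0.1):
    """One gradient step on the critic loss mean((Q(s, a) - y)^2)."""
    n = len(states)
    grad_s = 0.0
    grad_a = 0.0
    for s, a, y in zip(states, actions, targets):
        err = critic_q(w_s, w_a, s, a) - y
        grad_s += 2.0 * err * s / n
        grad_a += 2.0 * err * a / n
    return w_s - lr * grad_s, w_a - lr * grad_a


def update_actor(theta, w_a, states, lr=0.1):
    """One gradient step on the actor loss -mean(Q(s, theta * s)).

    The derivative of -(w_s * s + w_a * theta * s) with respect to theta
    is -w_a * s; the critic weights are held constant in this step.
    """
    n = len(states)
    grad = sum(-w_a * s for s in states) / n
    return theta - lr * grad


states = [1.0, 2.0]
actions = [0.5, -0.5]  # actions replayed from the buffer (action_batch)
targets = [1.0, 0.0]   # target values (y_batch)
w_s, w_a, theta = 0.0, 1.0, 0.0

# Critic step: uses the replayed actions and the targets.
w_s, w_a = update_critic(w_s, w_a, states, actions, targets)

# Actor step: uses only the states and the critic; the targets never
# appear in the actor's loss.
theta = update_actor(theta, w_a, states)
```

The critic step changes only the critic weights and depends on `targets`, while the actor step changes only `theta` and never sees `targets`, which is why nesting one `train` call inside the other does not express both updates.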
Thanks!
Code referenced: PyTorch-DDPG-Stock-Trading/DDPG.py, lines 85 to 90 at commit 0c5c2f9