A question about DDPG.train_on_batch #2

I think the code below may have a problem:
https://github.com/JoshuaWu1997/PyTorch-DDPG-Stock-Trading/blob/0c5c2f9095f5871d26b573fd960fb772f4ea050b/DDPG.py#L85-L90

The actor and critic gradients need to be computed separately, because the two networks are trained on different loss functions.
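For reference, here are the two objectives from the original DDPG paper (Lillicrap et al., 2015), written as losses to minimize. The notation is the paper's, not this repo's: $s_i$, $a_i$ and $y_i$ presumably correspond to `state_batch`, `action_batch` and `y_batch`, and terminal-state masking is omitted. The critic loss does not contain the actor parameters $\theta^{\mu}$, so it gives the actor no gradient. The actor needs its own loss, which routes through $\mu(s_i)$:

$$
L(\theta^{Q}) = \frac{1}{N}\sum_{i}\bigl(y_i - Q(s_i, a_i \mid \theta^{Q})\bigr)^2,
\qquad
y_i = r_i + \gamma\, Q'\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\bigr)
$$

$$
L(\theta^{\mu}) = -\frac{1}{N}\sum_{i} Q\bigl(s_i, \mu(s_i \mid \theta^{\mu}) \mid \theta^{Q}\bigr)
$$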
I think it should be changed to the following:

```python
# Critic: fit Q(s, a) to the TD targets y_batch
self.critic_network.train(y_batch, action_batch, state_batch)

# Actor: maximize Q(s, mu(s)) by minimizing its negative mean
actor_loss = -self.critic_network.critic(
    self.actor_network.actor_action(state_batch), state_batch
).mean()
self.actor_network.optimizer.zero_grad()
actor_loss.backward()
self.actor_network.optimizer.step()
```
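Below is a minimal, self-contained sketch of the same separated update, in case it helps to try the idea outside the repo. It is an illustration under assumptions, not the repo's code. The `nn.Sequential` networks, the Adam optimizers, the concatenated (state, action) critic input, and the names `q_value`, `train_on_batch`, `STATE_DIM`, `ACTION_DIM` and `BATCH` are all hypothetical stand-ins. The actor optimizer is assumed to hold only actor parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

STATE_DIM, ACTION_DIM, BATCH = 4, 2, 8  # hypothetical sizes

# Hypothetical stand-ins for the repo's actor and critic networks.
actor = nn.Sequential(nn.Linear(STATE_DIM, 16), nn.ReLU(),
                      nn.Linear(16, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 16), nn.ReLU(),
                       nn.Linear(16, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)    # actor params only
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # critic params only


def q_value(state, action):
    # This sketch's critic takes the concatenated (state, action) pair.
    return critic(torch.cat([state, action], dim=1))


def train_on_batch(state_batch, action_batch, y_batch):
    # 1) Critic step: regress Q(s, a) on the TD targets y (MSE loss).
    critic_loss = nn.functional.mse_loss(q_value(state_batch, action_batch), y_batch)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # 2) Actor step: maximize Q(s, mu(s)) by minimizing its negative mean.
    actor_loss = -q_value(state_batch, actor(state_batch)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()  # updates actor parameters only
    return critic_loss.item(), actor_loss.item()


s = torch.randn(BATCH, STATE_DIM)
a = torch.rand(BATCH, ACTION_DIM) * 2 - 1
y = torch.randn(BATCH, 1)
print(train_on_batch(s, a, y))
```

`actor_loss.backward()` also writes gradients into the critic's parameters. In this sketch they never reach a critic update, because `critic_opt.zero_grad()` clears them at the start of the next critic step. Another option is to call `requires_grad_(False)` on the critic parameters around the actor step.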
Thanks!

The referenced lines (DDPG.py, L85-L90 at the permalink above):

```python
# train actor-critic by target loss
self.actor_network.train(
    self.critic_network.train(
        y_batch, action_batch, state_batch
    )
)
```
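One way to see why the nested call above cannot train the actor on the critic's loss: the critic's TD loss is computed on the replayed `action_batch`, so the actor is not in its autograd graph at all. The small sketch below checks this with toy `nn.Linear` networks. The networks, shapes and random tensors are hypothetical and not the repo's classes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks, not the repository's classes.
actor = nn.Linear(4, 2)       # mu(s)
critic = nn.Linear(4 + 2, 1)  # Q(s, a) on the concatenated input

state_batch = torch.randn(8, 4)
action_batch = torch.randn(8, 2)  # actions replayed from the buffer
y_batch = torch.randn(8, 1)       # TD targets, fixed here for illustration

# Critic TD loss: uses the stored actions, never the actor's output.
q = critic(torch.cat([state_batch, action_batch], dim=1))
critic_loss = nn.functional.mse_loss(q, y_batch)
critic_loss.backward()
actor_grad_from_critic_loss = any(p.grad is not None for p in actor.parameters())

# Actor loss: the gradient flows through mu(s) into the actor.
actor_loss = -critic(torch.cat([state_batch, actor(state_batch)], dim=1)).mean()
actor_loss.backward()
actor_grad_from_actor_loss = all(p.grad is not None for p in actor.parameters())

print(actor_grad_from_critic_loss)  # False
print(actor_grad_from_actor_loss)   # True
```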
