A question about DDPG.train_on_batch #2

@lbgitjp

I think the code below may have a problem:

    # train actor-critic by target loss
    self.actor_network.train(
        self.critic_network.train(
            y_batch, action_batch, state_batch
        )
    )

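The two gradients need to be computed separately because the two networks have different loss functions: the critic minimizes the error between its Q-value estimate and the target `y_batch`, while the actor maximizes the critic's Q-value for the actions it outputs.
I think it should be changed to the following: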

```python
# Critic update: fit Q(s, a) to the target y
self.critic_network.train(y_batch, action_batch, state_batch)

# Actor update: maximize Q(s, actor(s)) by minimizing its negation
actor_loss = -self.critic_network.critic(
    self.actor_network.actor_action(state_batch), state_batch
).mean()
self.actor_network.optimizer.zero_grad()
actor_loss.backward()
self.actor_network.optimizer.step()
```
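To make the difference between the two losses concrete, here is a minimal sketch in plain Python, with no PyTorch, so the gradients can be checked by hand. It assumes a toy scalar setup that is not from this repository: a linear critic Q(s, a) = w_s * s + w_a * a and a linear actor mu(s) = theta * s. All function and parameter names here are hypothetical.

```python
# Toy DDPG-style update with scalar linear models (hypothetical names,
# not this repository's API).
# Critic: Q(s, a) = w_s * s + w_a * a        Actor: mu(s) = theta * s


def critic_q(w_s, w_a, state, action):
    """Q-value of the toy linear critic."""
    return w_s * state + w_a * action


def critic_step(w_s, w_a, states, actions, targets, lr):
    """One gradient step on the critic loss mean((Q(s, a) - y)^2)."""
    n = len(states)
    grad_w_s = 0.0
    grad_w_a = 0.0
    for s, a, y in zip(states, actions, targets):
        err = critic_q(w_s, w_a, s, a) - y
        grad_w_s += 2.0 * err * s / n
        grad_w_a += 2.0 * err * a / n
    return w_s - lr * grad_w_s, w_a - lr * grad_w_a


def actor_step(theta, w_a, states, lr):
    """One gradient step on the actor loss -mean(Q(s, mu(s))).

    The critic weights are held fixed; only theta is updated.
    """
    n = len(states)
    # d/dtheta of -(w_s * s + w_a * theta * s) is -w_a * s
    grad_theta = sum(-w_a * s for s in states) / n
    return theta - lr * grad_theta


def train_on_batch(params, states, actions, targets, lr=0.1):
    """Update the critic first, then the actor, each with its own loss."""
    w_s, w_a, theta = params
    w_s, w_a = critic_step(w_s, w_a, states, actions, targets, lr)
    theta = actor_step(theta, w_a, states, lr)
    return w_s, w_a, theta
```

The critic gradient depends on the targets and on the actions stored in the batch, while the actor gradient depends only on the states and the critic's sensitivity to the action. That is why the return value of one update cannot simply be fed into the other.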
Thanks!
