
The problem of different action time scales #864

@battery7

I want to use HPPO for a problem with a mixed (hybrid) action space, but the discrete and continuous actions operate on different time scales. For example, each continuous action is executed once, while the discrete action has to be executed many times over the same period. How should I handle this? I expect it will be hard to design a reward function that converges in this setting. Are there other algorithms that support this, or should I discretize the continuous actions and use a discrete action mask instead?
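To make the setup concrete, here is a minimal sketch of the two time scales described above: the discrete action changes on every step, while the continuous action is only refreshed every `hold` steps and is held fixed in between. The class name `HeldContinuousAction`, the `hold` parameter, and the `refreshed` flag are all hypothetical and introduced only for illustration; this is not part of HPPO or of any library's API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HeldContinuousAction:
    """Hypothetical helper: hold the continuous action for `hold` steps."""

    hold: int                        # assumed number of discrete steps per continuous action
    _steps_left: int = 0             # steps remaining before the next refresh
    _held: Tuple[float, ...] = ()    # continuous action currently being held

    def reset(self) -> None:
        """Forget the held action, e.g. at the start of an episode."""
        self._steps_left = 0
        self._held = ()

    def select(self, discrete: int, proposed: Sequence[float]):
        """Return (discrete, continuous, refreshed) for the current step.

        `proposed` is whatever the continuous head outputs on this step;
        it is only adopted when the previous hold period has expired.
        """
        refreshed = self._steps_left == 0
        if refreshed:
            self._held = tuple(proposed)
            self._steps_left = self.hold
        self._steps_left -= 1
        return discrete, self._held, refreshed


# Usage: five discrete steps, continuous action refreshed every 3 steps.
holder = HeldContinuousAction(hold=3)
log = [holder.select(d, (0.1 * t,)) for t, d in enumerate([1, 0, 2, 1, 0])]
flags = [refreshed for _, _, refreshed in log]
print(flags)
```

The `refreshed` flag marks the steps on which the continuous output was actually used. One possible use, stated as an assumption rather than a recommendation, is to mask the continuous head's loss on the other steps so it is only trained on decisions it really made.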

Labels: algo (Add new algorithm or improve old one)
