DuelingDQN by qgallouedec · Pull Request #127 · Stable-Baselines-Team/stable-baselines3-contrib

qgallouedec · 2022-12-10T10:21:28Z

Description

TODOs

Make learning curves
Complete DuelingDQN short presentation
Fill the result tabular
Results validation against another implementation
Create a branch in rl-zoo3
Check if that command lines in the doc work

Naming convention

Method : Dueling DQN
Class name : DuelingDQN
Lowercase : dueling_dqn
branch : dueling-dqn

Context

I have raised an issue to propose this change (required)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

Note: we are using a maximum length of 127 characters per line

araffin · 2022-12-10T11:17:18Z

Just so you know, I had in mind to implement rainbow: DLR-RM/stable-baselines3#622
and make double dqn / dueling dqn special case of it.
But so far I had no time, so I think it's good to have reference implementation here =)

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176

Related: DLR-RM/stable-baselines3#487

qgallouedec · 2022-12-10T11:20:32Z

Just so you know, I had in mind to implement rainbow: DLR-RM/stable-baselines3#622
and make double dqn / dueling dqn special case of it.

Agree, it would be much better

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176

Thanks for the ref!

qgallouedec · 2022-12-10T13:19:56Z

@araffin I want your opinion on this:
Wouldn't it make more sense to modify DQN to make DuelingDQN a special case of DQN? Rather than rewriting DQN, as I am doing on this PR.
Eventually, we would have the same structure as DDPG/TD3.
Moreover, the code written would be more easily reusable for Rainbow.

araffin · 2022-12-12T17:00:29Z

+        "MultiInputPolicy": MultiInputPolicy,
+    }
+
+    def __init__(


if init is the same as DQN, I guess you can drop it.

The only (but necessary) diff is the policy type hint.

araffin · 2022-12-12T17:01:54Z

e to modify DQN to make DuelingDQN a special case of DQN?

you mean DQN a special case of DuelingDQN?
how would that look like?

Rather than rewriting DQN, as I am doing on this PR.

you are mainly changing the policy, no?

qgallouedec · 2022-12-12T17:26:55Z

you are mainly changing the policy, no?

I'm changing only the policy.

qgallouedec · 2022-12-12T17:41:41Z

you mean DQN a special case of DuelingDQN?

Yes, it makes more sense in that order. But this would require putting DuelingDQN in the main repo.
That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW (in which all tricks except dueling are disabled), and DQN is another special case of RAINBOW (in which all tricks are disabled).

how would that look like?

I'll make a draft PR in SB3

araffin · 2022-12-16T10:51:45Z

That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW

yes, so for now, I would keep it as is. My goal with Vanilla DQN in SB3 was to keep it as simple as possible to be a reference when learning or implementing other things.

qgallouedec · 2023-01-09T12:57:51Z

On second thought, it might be more practical to work in reverse. I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default. And at the renaming DQN to RAINBOW, changing the defaults. Roughly we would have:

Currently:

class DQN:
    def __init__(self, policy, env, ...): ...

We implement dueling:

class DQN:
    def __init__(self, policy, env, ..., dueling=False): ...

class DuelingDQN(DQN):
    def __init__(self, policy, env, ...):
        super().__init__(policy, env, ..., dueling=True)

Then we implement PER

class DQN:
   def __init__(self, policy, env, ..., dueling=False, per=False): ...

class DuelingDQN(DQN):
   def __init__(self, policy, env, ..., per=False):
       super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(DQN):
   def __init__(self, policy, env, ..., dueling=False):
       super().__init__(policy, env, ..., dueling=dueling, per=True)

And so on.

Once all the features have been implemented, we can rename DQN to RAINBOW, changing only the default values, and add a DQN class that inherits from RAINBOW:

class RAINBOW:
   def __init__(self, policy, env, ..., dueling=True, per=True):  ...

class DuelingDQN(RAINBOW):
   def __init__(self, policy, env, ..., per=False):
       super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(RAINBOW):
   def __init__(self, policy, env, ..., dueling=False):
       super().__init__(policy, env, ..., dueling=dueling, per=True)

class DQN(RAINBOW):
   def __init__(self, policy, env, ...):
       super().__init__(policy, env, ..., dueling=False, per=False)

The main constraint is that when renaming DQN -> RAINBOW, and then creating a DQN class that inherits from RAINBOW, we introduce a breaking change, since arguments from DQN are removed (per, dueling, etc.). They would have to be deprecated first.

araffin · 2023-01-13T09:25:31Z

I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default.

why not implement all at once?

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong). As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions).
And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not DQN.

What you propose is what I had in mind (so Dueling special case of RAINBOW) except for DQN.

qgallouedec · 2023-01-13T10:25:21Z

why not implement all at once?

We could.
I don't know exactly what changes will be needed in the library to implement RAINBOW. But let's take PER as an example. I think it would be best to implement it as HER, i.e. as a subclass of ReplayBuffer. To make PER compatible with all off-policy algorithms, we will have to find a way to update the weights after the TD error calculation. Probably we will have to add in all algorithms a call to the method self.replay_buffer.update_weight(td_errors) or (or self.replay_buffer.update_weight(idxs, td_errors), depending on how we go about it), and thus add this method to the parent class ReplayBuffer.
All in all, it shouldn't be complicated but it concerns a lot of files, so I think it would be much easier to work on it on a dedicated PR.

I don't know if the same reasoning applies to other rainbow features.

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong).

Right now I imagine that it should not be complicated, but I may be wrong, I would have to look into it in detail.

As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not DQN.

So DQN would be a special case for which we would keep a readable implementation by not deriving it from RAINBOW, right? It seems legitimate.

DuelingDQN

61a24de

qgallouedec mentioned this pull request Dec 10, 2022

DuelingDQN #126

Closed

18 tasks

qgallouedec added 10 commits December 10, 2022 11:36

dueling_dqn.rst

888a06b

Update changelog

0d19cfe

Add example in example.rst

0a30312

add dueling to index.rst

726d4b9

Add policy_aliases

400e636

test-cnn

b2ee629

typo

823760b

simplification

bf44d99

typo

372da3e

Rm policy_kwargs as error from copying from DRDQN

d93f00b

qgallouedec added 4 commits December 10, 2022 12:26

Update README

bd8755a

Update setup.py

9b23663

Add dueling to the list of algorithm

7a34a77

ignore mypy error

989f31f

Merge branch 'master' into feat/dueling-dqn

738f09f

araffin reviewed Dec 12, 2022

View reviewed changes

qgallouedec mentioned this pull request Nov 22, 2023

Prioritized experience replay DLR-RM/stable-baselines3#1622

Open

16 tasks

Conversation

qgallouedec commented Dec 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

TODOs

Context

Types of changes

Checklist:

Uh oh!

araffin commented Dec 10, 2022

Uh oh!

qgallouedec commented Dec 10, 2022

Uh oh!

qgallouedec commented Dec 10, 2022

Uh oh!

araffin Dec 12, 2022

Choose a reason for hiding this comment

Uh oh!

qgallouedec Dec 12, 2022

Choose a reason for hiding this comment

Uh oh!

araffin Dec 16, 2022

Choose a reason for hiding this comment

Uh oh!

araffin commented Dec 12, 2022

Uh oh!

qgallouedec commented Dec 12, 2022

Uh oh!

qgallouedec commented Dec 12, 2022

Uh oh!

araffin commented Dec 16, 2022

Uh oh!

qgallouedec commented Jan 9, 2023

Uh oh!

araffin commented Jan 13, 2023

Uh oh!

qgallouedec commented Jan 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qgallouedec commented Dec 10, 2022 •

edited

Loading

qgallouedec commented Jan 13, 2023 •

edited

Loading