[WIP] ReinforcementLearning.jl integration#9
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main       #9      +/-   ##
==========================================
- Coverage   92.48%   92.31%   -0.17%
==========================================
  Files          81       81
  Lines        4005     3761     -244
==========================================
- Hits         3704     3472     -232
+ Misses        301      289      -12
```
Force-pushed from 22e4549 to b606aa1
```julia
actor = Chain(
    Dense(ns, 256, relu; init = glorot_uniform(rng)),
    Dense(256, na; init = glorot_uniform(rng)),
),
```
Note that you are using the discrete version of PPO here, but the cart pole env here seems to be a continuous version (the action space is [-1.0, 1.0]). So you may refer to https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/935f68b6cb378f9929a8d9914eb388e86213c86d/src/ReinforcementLearningExperiments/deps/experiments/experiments/Policy%20Gradient/JuliaRL_PPO_Pendulum.jl#L43-L50
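For a continuous action space, the actor would be a stochastic Gaussian policy rather than a plain `Chain` emitting logits. A minimal sketch along the lines of the linked Pendulum experiment, assuming `GaussianNetwork` from ReinforcementLearningCore; the layer widths and the `ns`/`na` values are illustrative assumptions, not part of this PR:

```julia
using Flux
using ReinforcementLearningCore
using Random

rng = Random.default_rng()
ns, na = 4, 1  # assumed state/action dims for a continuous cart pole

# Gaussian actor: shared trunk `pre`, then separate heads for the mean μ
# and the log standard deviation logσ of the action distribution.
actor = GaussianNetwork(
    pre = Chain(
        Dense(ns, 64, relu; init = glorot_uniform(rng)),
        Dense(64, 64, relu; init = glorot_uniform(rng)),
    ),
    μ = Chain(Dense(64, na, tanh; init = glorot_uniform(rng))),
    logσ = Chain(Dense(64, na; init = glorot_uniform(rng))),
)
```

With `tanh` on the mean head, the mean action stays inside [-1.0, 1.0], matching the env's action space.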
Good point! Thanks for checking in. Currently I also need to define the reward/cost function for the cart pole on the Dojo side.
We should probably rethink the interface to ReinforcementLearning.jl once their updates are done (JuliaReinforcementLearning/ReinforcementLearning.jl#614)
I realized that CommonRLInterface.jl never settled on what to do with continuous action spaces, so I'm integrating directly with RLBase from ReinforcementLearning.jl. Will add tests and examples with PPO and DDPG.
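Integrating directly with RLBase amounts to implementing its trait methods on an environment type. A self-contained sketch of the shape such a wrapper could take, with placeholder dynamics standing in for the Dojo simulator; the type name `DojoCartpoleEnv` and all numbers are hypothetical, not from this PR:

```julia
using ReinforcementLearningBase
using IntervalSets

# Hypothetical wrapper: a continuous-action cart-pole-like env
# exposing the RLBase interface directly.
mutable struct DojoCartpoleEnv <: AbstractEnv
    state::Vector{Float64}
    done::Bool
end

DojoCartpoleEnv() = DojoCartpoleEnv(zeros(4), false)

RLBase.action_space(::DojoCartpoleEnv) = -1.0..1.0          # continuous actions
RLBase.state_space(::DojoCartpoleEnv) = Space(fill(-Inf..Inf, 4))
RLBase.state(env::DojoCartpoleEnv) = env.state
RLBase.reward(env::DojoCartpoleEnv) = env.done ? 0.0 : 1.0  # placeholder reward
RLBase.is_terminated(env::DojoCartpoleEnv) = env.done

function RLBase.reset!(env::DojoCartpoleEnv)
    env.state .= 0.0
    env.done = false
    nothing
end

# Environments act as functors on actions; the real version would
# step the Dojo mechanism here instead of these placeholder dynamics.
function (env::DojoCartpoleEnv)(action)
    env.state[1] += action
    env.done = abs(env.state[1]) > 2.4
    nothing
end
```

Once these methods exist, the env plugs into RL.jl's `run(policy, env, stop_condition, hook)` loop without a CommonRLInterface shim.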