diff --git a/CITATION.bib b/CITATION.bib index 1f24b4040c..0b60adbd49 100644 --- a/CITATION.bib +++ b/CITATION.bib @@ -6,5 +6,5 @@ @article{stable-baselines3 volume = {22}, number = {268}, pages = {1-8}, - url = {http://jmlr.org/papers/v22/20-1364.html} + url = {https://jmlr.org/papers/v22/20-1364.html} } diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 54a05a828f..d0a36351aa 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -18,7 +18,7 @@ https://github.com/DLR-RM/stable-baselines3 Note: If you do not follow the template (and its mandatory steps), your pull request will be ignored. If you are not familiar with creating a Pull Request, here are some guides: -- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request +- https://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request - https://help.github.com/articles/creating-a-pull-request/ diff --git a/NOTICE b/NOTICE index 6dbbda6480..5b480ad342 100644 --- a/NOTICE +++ b/NOTICE @@ -2,7 +2,7 @@ Large portion of the code of Stable-Baselines3 (in `common/`) were ported from S both licensed under the MIT License: before the fork (June 2018): -Copyright (c) 2017 OpenAI (http://openai.com) +Copyright (c) 2017 OpenAI (https://openai.com) after the fork (June 2018): Copyright (c) 2018-2019 Stable-Baselines Team diff --git a/README.md b/README.md index ccb5706104..1c16597757 100644 --- a/README.md +++ b/README.md @@ -256,7 +256,7 @@ To cite this repository in publications: volume = {22}, number = {268}, pages = {1-8}, - url = {http://jmlr.org/papers/v22/20-1364.html} + url = {https://jmlr.org/papers/v22/20-1364.html} } ``` @@ -264,7 +264,7 @@ Note: If you need to refer to a specific version of SB3, you can also use the [Z ## Maintainers -Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka 
@ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://gallouedec.com/) (@qgallouedec). +Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://github.com/qgallouedec) (@qgallouedec). **Important Note: We do not provide technical support, or consulting** and do not answer personal questions via email. Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/), or [Stack Overflow](https://stackoverflow.com/) in that case. @@ -279,7 +279,7 @@ If you want to contribute, please read [**CONTRIBUTING.md**](./CONTRIBUTING.md) The initial work to develop Stable Baselines3 was partially funded by the project *Reduced Complexity Models* from the *Helmholtz-Gemeinschaft Deutscher Forschungszentren*, and by the EU Horizon 2020 Research and Innovation Programme under grant number 951992 ([VeriDream](https://www.veridream.eu/)). -The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paristech.fr/index.php?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](http://www.ensta-paristech.fr/en). +The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paris.fr/?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](https://www.ensta.fr/en). Logo credits: [L.M. 
Tenkes](https://www.instagram.com/lucillehue/) diff --git a/docs/_static/img/colab-badge.svg b/docs/_static/img/colab-badge.svg index c08066ee33..98bb8a3786 100644 --- a/docs/_static/img/colab-badge.svg +++ b/docs/_static/img/colab-badge.svg @@ -1 +1 @@ - Open in ColabOpen in Colab + Open in ColabOpen in Colab diff --git a/docs/_static/img/colab.svg b/docs/_static/img/colab.svg index c2d30e973a..ce487f4407 100644 --- a/docs/_static/img/colab.svg +++ b/docs/_static/img/colab.svg @@ -1,7 +1,7 @@ - + - \ No newline at end of file + diff --git a/docs/conf.py b/docs/conf.py index 3db4240137..23062c2fea 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -3,7 +3,7 @@ # # This file does only contain a selection of the most common options. For a # full list see the documentation: -# http://www.sphinx-doc.org/en/master/config +# https://www.sphinx-doc.org/en/master/config # -- Path setup -------------------------------------------------------------- @@ -223,6 +223,6 @@ def setup(app): # Example configuration for intersphinx: refer to the Python standard library. 
# intersphinx_mapping = { # 'python': ('https://docs.python.org/3/', None), -# 'numpy': ('http://docs.scipy.org/doc/numpy/', None), -# 'torch': ('http://pytorch.org/docs/master/', None), +# 'numpy': ('https://docs.scipy.org/doc/numpy/', None), +# 'torch': ('https://pytorch.org/docs/master/', None), # } diff --git a/docs/guide/checking_nan.md b/docs/guide/checking_nan.md index 4c0b3ad30c..4c6fbdb8b7 100644 --- a/docs/guide/checking_nan.md +++ b/docs/guide/checking_nan.md @@ -152,4 +152,4 @@ As some datasets will sometimes fill missing values with NaNs as a surrogate val Here is some reading material about finding NaNs: -And filling the missing values with something else (imputation): +And filling the missing values with something else (imputation): diff --git a/docs/guide/examples.md b/docs/guide/examples.md index 4144d92c92..1958971a72 100644 --- a/docs/guide/examples.md +++ b/docs/guide/examples.md @@ -448,7 +448,7 @@ model = PPO.load(log_dir / "ppo_halfcheetah", env=vec_env) ## Hindsight Experience Replay (HER) -For this example, we use [Highway-Env](https://github.com/eleurent/highway-env) by [@eleurent](https://github.com/eleurent). +For this example, we use [Highway-Env](https://github.com/Farama-Foundation/HighwayEnv) by [@eleurent](https://github.com/eleurent). ```{image} ../_static/img/colab-badge.svg :target: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/stable_baselines_her.ipynb @@ -639,7 +639,7 @@ Policies also offers a simple way to save/load weights as a NumPy vector, using and `load_from_vector()` method. Following example demonstrates reading parameters, modifying some of them and loading them to model -by implementing [evolution strategy (es)](http://blog.otoro.net/2017/10/29/visual-evolution-strategies/) +by implementing [evolution strategy (es)](https://blog.otoro.net/2017/10/29/visual-evolution-strategies/) for solving the `CartPole-v1` environment. 
The initial guess for parameters is obtained by running A2C policy gradient updates on the model. @@ -722,7 +722,7 @@ Some massively parallel simulation environments such as [EnvPool](https://github To use SB3 with these tools, you need to wrap the environment with tool-specific `VecEnvWrapper` that preprocesses the data for SB3, you can find links to some of these wrappers in [issue #772](https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002). -- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/sb3.py) +- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/3e73d6dd79080fd7632488c061052a6edd52e230/source/isaaclab_rl/isaaclab_rl/sb3.py#L93) - Brax: [link](https://gist.github.com/araffin/a7a576ec1453e74d9bb93120918ef7e7) - EnvPool: [link](https://github.com/sail-sg/envpool/blob/main/examples/sb3_examples/ppo.py) - Getting SAC to Work on a Massive Parallel Simulator: diff --git a/docs/guide/export.md b/docs/guide/export.md index 2c4526fcd4..57386f2588 100644 --- a/docs/guide/export.md +++ b/docs/guide/export.md @@ -338,7 +338,7 @@ motivation for the code example above). The Coral chip is fast, with very low power consumption, but only has limited on-device training abilities. More information is on the webpage here: -. +. To deploy to a Coral, one must work via TFLite, and quantize the network to reflect the Coral's capabilities. The full chain to go from diff --git a/docs/guide/imitation.md b/docs/guide/imitation.md index d9aad6d818..ec4b65248a 100644 --- a/docs/guide/imitation.md +++ b/docs/guide/imitation.md @@ -13,5 +13,5 @@ imitation learning algorithms on top of Stable-Baselines3, including: You can install imitation with `pip install imitation`. 
The [imitation documentation](https://imitation.readthedocs.io/en/latest/) has more details -on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first-steps.html) +on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first_steps.html) for the impatient. diff --git a/docs/guide/install.md b/docs/guide/install.md index 5574feace5..1ad9ed08fa 100644 --- a/docs/guide/install.md +++ b/docs/guide/install.md @@ -8,7 +8,7 @@ Stable-Baselines3 requires python 3.10+ and PyTorch >= 2.3 ### Windows -We recommend using [Anaconda](https://conda.io/docs/user-guide/install/windows.html) for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above. +We recommend using [miniforge](https://github.com/conda-forge/miniforge#windows) for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.10 or above. For a quick start you can move straight to installing Stable-Baselines3 in the next step. @@ -22,7 +22,7 @@ issue with atari-py package. 
[See this discussion for more information](https:// To install Stable Baselines3 with pip, execute: ```bash -pip install stable-baselines3[extra] +pip install 'stable-baselines3[extra]' ``` :::{note} @@ -58,7 +58,7 @@ To contribute to Stable-Baselines3, with support for running tests and building ```bash git clone https://github.com/DLR-RM/stable-baselines3 && cd stable-baselines3 -pip install -e .[docs,tests,extra] +pip install -e '.[docs,tests,extra]' ``` ## Using Docker Images @@ -101,7 +101,7 @@ Note: if you are using a proxy, you need to pass extra params during build and do some [tweaks]: ```bash ---network=host --build-arg HTTP_PROXY=http://your.proxy.fr:8080/ --build-arg http_proxy=http://your.proxy.fr:8080/ --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/ +--network=host --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/ ``` ### Run the images (CPU/GPU) diff --git a/docs/guide/rl.md b/docs/guide/rl.md index 951dfaf02a..bade373f96 100644 --- a/docs/guide/rl.md +++ b/docs/guide/rl.md @@ -13,6 +13,6 @@ However, if you want to learn about RL, there are several good resources to get - [RL103: From Deep Q-Learning (DQN) to Soft Actor-Critic (SAC) and Beyond](https://araffin.github.io/post/rl103/) - [Lilian Weng's blog](https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html) - [Berkeley's Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/lectures) -- [Berkeley's Deep Reinforcement Learning course](http://rail.eecs.berkeley.edu/deeprlcourse/) +- [Berkeley's Deep Reinforcement Learning course](https://rail.eecs.berkeley.edu/deeprlcourse/) - [Decisions & Dragons - FAQ for RL foundations](https://www.decisionsanddragons.com) - [More resources](https://github.com/dennybritz/reinforcement-learning) diff --git a/docs/guide/sb3_contrib.md b/docs/guide/sb3_contrib.md index bac53bf481..4f866ab215 100644 --- 
a/docs/guide/sb3_contrib.md +++ b/docs/guide/sb3_contrib.md @@ -35,7 +35,7 @@ See documentation for the full list of included features. - [Augmented Random Search (ARS)](https://arxiv.org/abs/1803.07055) - [Quantile Regression DQN (QR-DQN)] - [PPO with invalid action masking (Maskable PPO)](https://arxiv.org/abs/2006.14171) -- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://ppo-details.cleanrl.dev//2021/11/05/ppo-implementation-details/) +- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) - [Truncated Quantile Critics (TQC)] - [Trust Region Policy Optimization (TRPO)](https://arxiv.org/abs/1502.05477) - [Batch Normalization in Deep Reinforcement Learning (CrossQ)](https://openreview.net/forum?id=PczQtTsTIX) diff --git a/docs/guide/tensorboard.md b/docs/guide/tensorboard.md index 5c90464500..66499daca5 100644 --- a/docs/guide/tensorboard.md +++ b/docs/guide/tensorboard.md @@ -124,7 +124,7 @@ class ImageRecorderCallback(BaseCallback): image = self.training_env.render(mode="rgb_array") # "HWC" specify the dataformat of the image, here channel last # (H for height, W for width, C for channel) - # See https://pytorch.org/docs/stable/tensorboard.html + # See https://docs.pytorch.org/docs/stable/tensorboard.html # for supported formats self.logger.record("trajectory/image", Image(image, "HWC"), exclude=("stdout", "log", "json", "csv")) return True @@ -223,7 +223,7 @@ class VideoRecorderCallback(BaseCallback): """ # We expect `render()` to return a uint8 array with values in [0, 255] or a float array # with values in [0, 1], as described in - # https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video + # https://docs.pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video screen = self._eval_env.render(mode="rgb_array") # PyTorch uses CxHxW vs HxWxC gym (and tensorflow) image convention 
screens.append(screen.transpose(2, 0, 1)) @@ -297,7 +297,7 @@ model.learn(total_timesteps=int(5e4), callback=HParamCallback()) ## Directly Accessing The Summary Writer -If you would like to log arbitrary data (in one of the formats supported by [pytorch](https://pytorch.org/docs/stable/tensorboard.html)), you +If you would like to log arbitrary data (in one of the formats supported by [PyTorch](https://docs.pytorch.org/docs/stable/tensorboard.html)), you can get direct access to the underlying SummaryWriter in a callback: :::{warning} @@ -306,7 +306,7 @@ This is method is not recommended and should only be used by advanced users. :::{note} If you want a concrete example, you can watch [how to log lap time with donkeycar env](https://www.youtube.com/watch?v=v8j2bpcE4Rg&t=4619s), -or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/feat/gym-donkeycar/rl_zoo3/callbacks.py#L251-L270). +or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/eb5d9c7770abe9a60f5511193ebcb260dfdc2706/rl_zoo3/callbacks.py#L262). You might also want to take a look at [issue #1160](https://github.com/DLR-RM/stable-baselines3/issues/1160) and [issue #1219](https://github.com/DLR-RM/stable-baselines3/issues/1219). ::: diff --git a/docs/index.rst b/docs/index.rst index 6f1a919edc..60a3dc4bee 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -113,7 +113,7 @@ To cite this project in publications: volume = {22}, number = {268}, pages = {1-8}, - url = {http://jmlr.org/papers/v22/20-1364.html} + url = {https://jmlr.org/papers/v22/20-1364.html} } Note: If you need to refer to a specific version of SB3, you can also use the `Zenodo DOI `_. diff --git a/docs/make.bat b/docs/make.bat index 22b5fff4ee..3adf0a243b 100644 --- a/docs/make.bat +++ b/docs/make.bat @@ -22,7 +22,7 @@ if errorlevel 9009 ( echo.may add the Sphinx directory to PATH. echo. 
echo.If you don't have Sphinx installed, grab it from - echo.http://sphinx-doc.org/ + echo.https://sphinx-doc.org/ exit /b 1 ) diff --git a/docs/misc/changelog.md b/docs/misc/changelog.md index d58c42090f..971a004a20 100644 --- a/docs/misc/changelog.md +++ b/docs/misc/changelog.md @@ -29,6 +29,7 @@ ### Documentation: - Added example for using torch.compile +- Fixed many broken links and updated links to https whenever possible ## Release 2.8.0 (2026-04-01) @@ -684,7 +685,7 @@ We highly recommended you to upgrade to Python >= 3.8. :::{warning} Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). -You can find a migration guide here: . +You can find a migration guide here: . If you want to try the SB3 v2.0 alpha version, you can take a look at [PR #1327](https://github.com/DLR-RM/stable-baselines3/pull/1327). ::: @@ -1872,7 +1873,7 @@ And all the contributors: [antonin raffin]: https://araffin.github.io/ [ashley hill]: https://github.com/hill-a [maximilian ernestus]: https://github.com/ernestum -[quentin gallouédec]: https://gallouedec.com/ +[quentin gallouédec]: https://github.com/qgallouedec [rl zoo]: https://github.com/DLR-RM/rl-baselines3-zoo [sb3-contrib]: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib [sbx]: https://github.com/araffin/sbx diff --git a/docs/misc/projects.md b/docs/misc/projects.md index f3569e367a..56fcc1d320 100644 --- a/docs/misc/projects.md +++ b/docs/misc/projects.md @@ -13,7 +13,7 @@ Authors: Parth Kothari, Christian Perone, Luca Bergamini, Alexandre Alahi, Peter Github: - + Paper: @@ -100,7 +100,7 @@ Author: Jacopo Panerati Github: - + Paper: @@ -118,11 +118,8 @@ Author: Justin Terry GitHub: - + -Tutorial on multi-agent support in stable baselines: - - ## Rocket League Gym @@ -140,9 +137,6 @@ GitHub: -Website: - - ## gym-electric-motor @@ -193,11 +187,11 @@ Authors: Junyeob Baek 
GitHub: - + Demo: -[link](https://github.com/CUN-bjy/policy-distillation-baselines/issues/3#issuecomment-817730173) +[link](https://github.com/dion-jy/policy-distillation-baselines/issues/3#issuecomment-817730173) ## highway-env @@ -211,11 +205,11 @@ Author: GitHub: - + Examples: -[Colab Links](https://github.com/eleurent/highway-env/tree/master/scripts#using-stable-baselines3) +[Colab Links](https://github.com/Farama-Foundation/HighwayEnv/tree/master/scripts#using-stable-baselines3) ## tactile-gym @@ -231,9 +225,6 @@ Paper: -Website: - -[tactile-gym website](https://sites.google.com/my.bristol.ac.uk/tactile-gym-sim2real/home) ## RLeXplore @@ -247,7 +238,7 @@ Author: Mingqi Yuan GitHub: - + ## UAV_Navigation_DRL_AirSim diff --git a/docs/modules/a2c.md b/docs/modules/a2c.md index de16aab87d..784022afb0 100644 --- a/docs/modules/a2c.md +++ b/docs/modules/a2c.md @@ -20,7 +20,7 @@ Read more [here](https://github.com/DLR-RM/stable-baselines3/pull/110#issuecomme ## Notes - Original paper: -- OpenAI blog post: +- OpenAI blog post: ## Can I use? @@ -82,7 +82,7 @@ For more information, see [Vectorized Environments](../guide/vec_envs.md), [Issu ::: :::{note} -Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)): +Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)): When using A2C models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`. diff --git a/docs/modules/ddpg.md b/docs/modules/ddpg.md index a0d16434a7..a52520c7d5 100644 --- a/docs/modules/ddpg.md +++ b/docs/modules/ddpg.md @@ -31,7 +31,7 @@ they share the same policies and same implementation. 
## Notes -- Deterministic Policy Gradient: +- Deterministic Policy Gradient: - DDPG Paper: - OpenAI Spinning Guide for DDPG: diff --git a/docs/modules/dqn.md b/docs/modules/dqn.md index 2befcbb3ec..e11eb15796 100644 --- a/docs/modules/dqn.md +++ b/docs/modules/dqn.md @@ -7,7 +7,7 @@ # DQN -[Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602) builds on [Fitted Q-Iteration (FQI)](http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf) +[Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602) builds on [Fitted Q-Iteration (FQI)](https://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf) and make use of different tricks to stabilize the learning with neural networks: it uses a replay buffer, a target network and gradient clipping. ```{eval-rst} diff --git a/docs/modules/ppo.md b/docs/modules/ppo.md index f9e6e68d4e..87903932ea 100644 --- a/docs/modules/ppo.md +++ b/docs/modules/ppo.md @@ -96,7 +96,7 @@ For more information, see [Vectorized Environments](../guide/vec_envs.md), [Issu ::: :::{note} -Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)): +Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)): When using PPO models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`. 
diff --git a/docs/modules/sac.md b/docs/modules/sac.md index 097fc95b8e..c18c9c56aa 100644 --- a/docs/modules/sac.md +++ b/docs/modules/sac.md @@ -91,7 +91,7 @@ while True: ``` :::{note} -Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)): +Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)): When using SAC models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`. diff --git a/stable_baselines3/common/noise.py b/stable_baselines3/common/noise.py index 550dbb4255..9b203784d1 100644 --- a/stable_baselines3/common/noise.py +++ b/stable_baselines3/common/noise.py @@ -51,7 +51,7 @@ class OrnsteinUhlenbeckActionNoise(ActionNoise): """ An Ornstein Uhlenbeck action noise, this is designed to approximate Brownian motion with friction. - Based on http://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab + Based on https://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab :param mean: Mean of the noise :param sigma: Scale of the noise diff --git a/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py b/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py index 62b4dc4d32..73d4b4260c 100644 --- a/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py +++ b/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py @@ -20,7 +20,7 @@ class RMSpropTFLike(Optimizer): - Initialize squared gradient to ones rather than zeros Proposed by G. Hinton in his - `course `_. + `course `_. The centered version first appears in `Generating Sequences With Recurrent Neural Networks `_.
diff --git a/stable_baselines3/ddpg/ddpg.py b/stable_baselines3/ddpg/ddpg.py index 0d76f62654..b5e449dbf6 100644 --- a/stable_baselines3/ddpg/ddpg.py +++ b/stable_baselines3/ddpg/ddpg.py @@ -15,7 +15,7 @@ class DDPG(TD3): """ Deep Deterministic Policy Gradient (DDPG). - Deterministic Policy Gradient: http://proceedings.mlr.press/v32/silver14.pdf + Deterministic Policy Gradient: https://proceedings.mlr.press/v32/silver14.pdf DDPG Paper: https://arxiv.org/abs/1509.02971 Introduction to DDPG: https://spinningup.openai.com/en/latest/algorithms/ddpg.html