diff --git a/CITATION.bib b/CITATION.bib
index 1f24b4040c..0b60adbd49 100644
--- a/CITATION.bib
+++ b/CITATION.bib
@@ -6,5 +6,5 @@ @article{stable-baselines3
volume = {22},
number = {268},
pages = {1-8},
- url = {http://jmlr.org/papers/v22/20-1364.html}
+ url = {https://jmlr.org/papers/v22/20-1364.html}
}
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 54a05a828f..d0a36351aa 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -18,7 +18,7 @@ https://github.com/DLR-RM/stable-baselines3
Note: If you do not follow the template (and its mandatory steps), your pull request will be ignored.
If you are not familiar with creating a Pull Request, here are some guides:
-- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
+- https://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
- https://help.github.com/articles/creating-a-pull-request/
diff --git a/NOTICE b/NOTICE
index 6dbbda6480..5b480ad342 100644
--- a/NOTICE
+++ b/NOTICE
@@ -2,7 +2,7 @@ Large portion of the code of Stable-Baselines3 (in `common/`) were ported from S
both licensed under the MIT License:
before the fork (June 2018):
-Copyright (c) 2017 OpenAI (http://openai.com)
+Copyright (c) 2017 OpenAI (https://openai.com)
after the fork (June 2018):
Copyright (c) 2018-2019 Stable-Baselines Team
diff --git a/README.md b/README.md
index ccb5706104..1c16597757 100644
--- a/README.md
+++ b/README.md
@@ -256,7 +256,7 @@ To cite this repository in publications:
volume = {22},
number = {268},
pages = {1-8},
- url = {http://jmlr.org/papers/v22/20-1364.html}
+ url = {https://jmlr.org/papers/v22/20-1364.html}
}
```
@@ -264,7 +264,7 @@ Note: If you need to refer to a specific version of SB3, you can also use the [Z
## Maintainers
-Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://gallouedec.com/) (@qgallouedec).
+Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://github.com/qgallouedec) (@qgallouedec).
**Important Note: We do not provide technical support, or consulting** and do not answer personal questions via email.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/), or [Stack Overflow](https://stackoverflow.com/) in that case.
@@ -279,7 +279,7 @@ If you want to contribute, please read [**CONTRIBUTING.md**](./CONTRIBUTING.md)
The initial work to develop Stable Baselines3 was partially funded by the project *Reduced Complexity Models* from the *Helmholtz-Gemeinschaft Deutscher Forschungszentren*, and by the EU Horizon 2020 Research and Innovation Programme under grant number 951992 ([VeriDream](https://www.veridream.eu/)).
-The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paristech.fr/index.php?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](http://www.ensta-paristech.fr/en).
+The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paris.fr/?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](https://www.ensta.fr/en).
Logo credits: [L.M. Tenkes](https://www.instagram.com/lucillehue/)
diff --git a/docs/_static/img/colab-badge.svg b/docs/_static/img/colab-badge.svg
index c08066ee33..98bb8a3786 100644
--- a/docs/_static/img/colab-badge.svg
+++ b/docs/_static/img/colab-badge.svg
@@ -1 +1 @@
-
+Open in ColabOpen in Colab
diff --git a/docs/_static/img/colab.svg b/docs/_static/img/colab.svg
index c2d30e973a..ce487f4407 100644
--- a/docs/_static/img/colab.svg
+++ b/docs/_static/img/colab.svg
@@ -1,7 +1,7 @@
-
+
-
\ No newline at end of file
+
diff --git a/docs/conf.py b/docs/conf.py
index 3db4240137..23062c2fea 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -3,7 +3,7 @@
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
-# http://www.sphinx-doc.org/en/master/config
+# https://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
@@ -223,6 +223,6 @@ def setup(app):
# Example configuration for intersphinx: refer to the Python standard library.
# intersphinx_mapping = {
# 'python': ('https://docs.python.org/3/', None),
-# 'numpy': ('http://docs.scipy.org/doc/numpy/', None),
-# 'torch': ('http://pytorch.org/docs/master/', None),
+# 'numpy': ('https://docs.scipy.org/doc/numpy/', None),
+# 'torch': ('https://pytorch.org/docs/master/', None),
# }
diff --git a/docs/guide/checking_nan.md b/docs/guide/checking_nan.md
index 4c0b3ad30c..4c6fbdb8b7 100644
--- a/docs/guide/checking_nan.md
+++ b/docs/guide/checking_nan.md
@@ -152,4 +152,4 @@ As some datasets will sometimes fill missing values with NaNs as a surrogate val
Here is some reading material about finding NaNs:
-And filling the missing values with something else (imputation):
+And filling the missing values with something else (imputation):
diff --git a/docs/guide/examples.md b/docs/guide/examples.md
index 4144d92c92..1958971a72 100644
--- a/docs/guide/examples.md
+++ b/docs/guide/examples.md
@@ -448,7 +448,7 @@ model = PPO.load(log_dir / "ppo_halfcheetah", env=vec_env)
## Hindsight Experience Replay (HER)
-For this example, we use [Highway-Env](https://github.com/eleurent/highway-env) by [@eleurent](https://github.com/eleurent).
+For this example, we use [Highway-Env](https://github.com/Farama-Foundation/HighwayEnv) by [@eleurent](https://github.com/eleurent).
```{image} ../_static/img/colab-badge.svg
:target: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/stable_baselines_her.ipynb
@@ -639,7 +639,7 @@ Policies also offers a simple way to save/load weights as a NumPy vector, using
and `load_from_vector()` method.
Following example demonstrates reading parameters, modifying some of them and loading them to model
-by implementing [evolution strategy (es)](http://blog.otoro.net/2017/10/29/visual-evolution-strategies/)
+by implementing [evolution strategy (es)](https://blog.otoro.net/2017/10/29/visual-evolution-strategies/)
for solving the `CartPole-v1` environment. The initial guess for parameters is obtained by running
A2C policy gradient updates on the model.
@@ -722,7 +722,7 @@ Some massively parallel simulation environments such as [EnvPool](https://github
To use SB3 with these tools, you need to wrap the environment with tool-specific `VecEnvWrapper` that preprocesses the data for SB3,
you can find links to some of these wrappers in [issue #772](https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002).
-- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/sb3.py)
+- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/3e73d6dd79080fd7632488c061052a6edd52e230/source/isaaclab_rl/isaaclab_rl/sb3.py#L93)
- Brax: [link](https://gist.github.com/araffin/a7a576ec1453e74d9bb93120918ef7e7)
- EnvPool: [link](https://github.com/sail-sg/envpool/blob/main/examples/sb3_examples/ppo.py)
- Getting SAC to Work on a Massive Parallel Simulator:
diff --git a/docs/guide/export.md b/docs/guide/export.md
index 2c4526fcd4..57386f2588 100644
--- a/docs/guide/export.md
+++ b/docs/guide/export.md
@@ -338,7 +338,7 @@ motivation for the code example above).
The Coral chip is fast, with very low power consumption, but only has limited
on-device training abilities. More information is on the webpage here:
-.
+.
To deploy to a Coral, one must work via TFLite, and quantize the
network to reflect the Coral's capabilities. The full chain to go from
diff --git a/docs/guide/imitation.md b/docs/guide/imitation.md
index d9aad6d818..ec4b65248a 100644
--- a/docs/guide/imitation.md
+++ b/docs/guide/imitation.md
@@ -13,5 +13,5 @@ imitation learning algorithms on top of Stable-Baselines3, including:
You can install imitation with `pip install imitation`. The [imitation
documentation](https://imitation.readthedocs.io/en/latest/) has more details
-on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first-steps.html)
+on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first_steps.html)
for the impatient.
diff --git a/docs/guide/install.md b/docs/guide/install.md
index 5574feace5..1ad9ed08fa 100644
--- a/docs/guide/install.md
+++ b/docs/guide/install.md
@@ -8,7 +8,7 @@ Stable-Baselines3 requires python 3.10+ and PyTorch >= 2.3
### Windows
-We recommend using [Anaconda](https://conda.io/docs/user-guide/install/windows.html) for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above.
+We recommend [Miniforge](https://github.com/conda-forge/miniforge#windows) for Windows users, as it simplifies installing Python packages and required libraries. You need an environment with Python 3.10 or above.
For a quick start you can move straight to installing Stable-Baselines3 in the next step.
@@ -22,7 +22,7 @@ issue with atari-py package. [See this discussion for more information](https://
To install Stable Baselines3 with pip, execute:
```bash
-pip install stable-baselines3[extra]
+pip install 'stable-baselines3[extra]'
```
:::{note}
@@ -58,7 +58,7 @@ To contribute to Stable-Baselines3, with support for running tests and building
```bash
git clone https://github.com/DLR-RM/stable-baselines3 && cd stable-baselines3
-pip install -e .[docs,tests,extra]
+pip install -e '.[docs,tests,extra]'
```
## Using Docker Images
@@ -101,7 +101,7 @@ Note: if you are using a proxy, you need to pass extra params during
build and do some [tweaks]:
```bash
---network=host --build-arg HTTP_PROXY=http://your.proxy.fr:8080/ --build-arg http_proxy=http://your.proxy.fr:8080/ --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
+--network=host --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
```
### Run the images (CPU/GPU)
diff --git a/docs/guide/rl.md b/docs/guide/rl.md
index 951dfaf02a..bade373f96 100644
--- a/docs/guide/rl.md
+++ b/docs/guide/rl.md
@@ -13,6 +13,6 @@ However, if you want to learn about RL, there are several good resources to get
- [RL103: From Deep Q-Learning (DQN) to Soft Actor-Critic (SAC) and Beyond](https://araffin.github.io/post/rl103/)
- [Lilian Weng's blog](https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html)
- [Berkeley's Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/lectures)
-- [Berkeley's Deep Reinforcement Learning course](http://rail.eecs.berkeley.edu/deeprlcourse/)
+- [Berkeley's Deep Reinforcement Learning course](https://rail.eecs.berkeley.edu/deeprlcourse/)
- [Decisions & Dragons - FAQ for RL foundations](https://www.decisionsanddragons.com)
- [More resources](https://github.com/dennybritz/reinforcement-learning)
diff --git a/docs/guide/sb3_contrib.md b/docs/guide/sb3_contrib.md
index bac53bf481..4f866ab215 100644
--- a/docs/guide/sb3_contrib.md
+++ b/docs/guide/sb3_contrib.md
@@ -35,7 +35,7 @@ See documentation for the full list of included features.
- [Augmented Random Search (ARS)](https://arxiv.org/abs/1803.07055)
- [Quantile Regression DQN (QR-DQN)]
- [PPO with invalid action masking (Maskable PPO)](https://arxiv.org/abs/2006.14171)
-- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://ppo-details.cleanrl.dev//2021/11/05/ppo-implementation-details/)
+- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)
- [Truncated Quantile Critics (TQC)]
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/abs/1502.05477)
- [Batch Normalization in Deep Reinforcement Learning (CrossQ)](https://openreview.net/forum?id=PczQtTsTIX)
diff --git a/docs/guide/tensorboard.md b/docs/guide/tensorboard.md
index 5c90464500..66499daca5 100644
--- a/docs/guide/tensorboard.md
+++ b/docs/guide/tensorboard.md
@@ -124,7 +124,7 @@ class ImageRecorderCallback(BaseCallback):
image = self.training_env.render(mode="rgb_array")
# "HWC" specify the dataformat of the image, here channel last
# (H for height, W for width, C for channel)
- # See https://pytorch.org/docs/stable/tensorboard.html
+ # See https://docs.pytorch.org/docs/stable/tensorboard.html
# for supported formats
self.logger.record("trajectory/image", Image(image, "HWC"), exclude=("stdout", "log", "json", "csv"))
return True
@@ -223,7 +223,7 @@ class VideoRecorderCallback(BaseCallback):
"""
# We expect `render()` to return a uint8 array with values in [0, 255] or a float array
# with values in [0, 1], as described in
- # https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video
+ # https://docs.pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video
screen = self._eval_env.render(mode="rgb_array")
# PyTorch uses CxHxW vs HxWxC gym (and tensorflow) image convention
screens.append(screen.transpose(2, 0, 1))
@@ -297,7 +297,7 @@ model.learn(total_timesteps=int(5e4), callback=HParamCallback())
## Directly Accessing The Summary Writer
-If you would like to log arbitrary data (in one of the formats supported by [pytorch](https://pytorch.org/docs/stable/tensorboard.html)), you
+If you would like to log arbitrary data (in one of the formats supported by [PyTorch](https://docs.pytorch.org/docs/stable/tensorboard.html)), you
can get direct access to the underlying SummaryWriter in a callback:
:::{warning}
@@ -306,7 +306,7 @@ This is method is not recommended and should only be used by advanced users.
:::{note}
If you want a concrete example, you can watch [how to log lap time with donkeycar env](https://www.youtube.com/watch?v=v8j2bpcE4Rg&t=4619s),
-or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/feat/gym-donkeycar/rl_zoo3/callbacks.py#L251-L270).
+or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/eb5d9c7770abe9a60f5511193ebcb260dfdc2706/rl_zoo3/callbacks.py#L262).
You might also want to take a look at [issue #1160](https://github.com/DLR-RM/stable-baselines3/issues/1160) and [issue #1219](https://github.com/DLR-RM/stable-baselines3/issues/1219).
:::
diff --git a/docs/index.rst b/docs/index.rst
index 6f1a919edc..60a3dc4bee 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -113,7 +113,7 @@ To cite this project in publications:
volume = {22},
number = {268},
pages = {1-8},
- url = {http://jmlr.org/papers/v22/20-1364.html}
+ url = {https://jmlr.org/papers/v22/20-1364.html}
}
Note: If you need to refer to a specific version of SB3, you can also use the `Zenodo DOI `_.
diff --git a/docs/make.bat b/docs/make.bat
index 22b5fff4ee..3adf0a243b 100644
--- a/docs/make.bat
+++ b/docs/make.bat
@@ -22,7 +22,7 @@ if errorlevel 9009 (
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
- echo.http://sphinx-doc.org/
+ echo.https://sphinx-doc.org/
exit /b 1
)
diff --git a/docs/misc/changelog.md b/docs/misc/changelog.md
index d58c42090f..971a004a20 100644
--- a/docs/misc/changelog.md
+++ b/docs/misc/changelog.md
@@ -29,6 +29,7 @@
### Documentation:
- Added example for using torch.compile
+- Fixed many broken links and updated links to HTTPS wherever possible
## Release 2.8.0 (2026-04-01)
@@ -684,7 +685,7 @@ We highly recommended you to upgrade to Python >= 3.8.
:::{warning}
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
-You can find a migration guide here: .
+You can find a migration guide here: .
If you want to try the SB3 v2.0 alpha version, you can take a look at [PR #1327](https://github.com/DLR-RM/stable-baselines3/pull/1327).
:::
@@ -1872,7 +1873,7 @@ And all the contributors:
[antonin raffin]: https://araffin.github.io/
[ashley hill]: https://github.com/hill-a
[maximilian ernestus]: https://github.com/ernestum
-[quentin gallouédec]: https://gallouedec.com/
+[quentin gallouédec]: https://github.com/qgallouedec
[rl zoo]: https://github.com/DLR-RM/rl-baselines3-zoo
[sb3-contrib]: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
[sbx]: https://github.com/araffin/sbx
diff --git a/docs/misc/projects.md b/docs/misc/projects.md
index f3569e367a..56fcc1d320 100644
--- a/docs/misc/projects.md
+++ b/docs/misc/projects.md
@@ -13,7 +13,7 @@ Authors: Parth Kothari, Christian Perone, Luca Bergamini, Alexandre Alahi, Peter
Github:
-
+
Paper:
@@ -100,7 +100,7 @@ Author: Jacopo Panerati
Github:
-
+
Paper:
@@ -118,11 +118,8 @@ Author: Justin Terry
GitHub:
-
+
-Tutorial on multi-agent support in stable baselines:
-
-
## Rocket League Gym
@@ -140,9 +137,6 @@ GitHub:
-Website:
-
-
## gym-electric-motor
@@ -193,11 +187,11 @@ Authors: Junyeob Baek
GitHub:
-
+
Demo:
-[link](https://github.com/CUN-bjy/policy-distillation-baselines/issues/3#issuecomment-817730173)
+[link](https://github.com/dion-jy/policy-distillation-baselines/issues/3#issuecomment-817730173)
## highway-env
@@ -211,11 +205,11 @@ Author:
GitHub:
-
+
Examples:
-[Colab Links](https://github.com/eleurent/highway-env/tree/master/scripts#using-stable-baselines3)
+[Colab Links](https://github.com/Farama-Foundation/HighwayEnv/tree/master/scripts#using-stable-baselines3)
## tactile-gym
@@ -231,9 +225,6 @@ Paper:
-Website:
-
-[tactile-gym website](https://sites.google.com/my.bristol.ac.uk/tactile-gym-sim2real/home)
## RLeXplore
@@ -247,7 +238,7 @@ Author: Mingqi Yuan
GitHub:
-
+
## UAV_Navigation_DRL_AirSim
diff --git a/docs/modules/a2c.md b/docs/modules/a2c.md
index de16aab87d..784022afb0 100644
--- a/docs/modules/a2c.md
+++ b/docs/modules/a2c.md
@@ -20,7 +20,7 @@ Read more [here](https://github.com/DLR-RM/stable-baselines3/pull/110#issuecomme
## Notes
- Original paper:
-- OpenAI blog post:
+- OpenAI blog post:
## Can I use?
@@ -82,7 +82,7 @@ For more information, see [Vectorized Environments](../guide/vec_envs.md), [Issu
:::
:::{note}
-Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)):
+Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)):
When using A2C models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`.
diff --git a/docs/modules/ddpg.md b/docs/modules/ddpg.md
index a0d16434a7..a52520c7d5 100644
--- a/docs/modules/ddpg.md
+++ b/docs/modules/ddpg.md
@@ -31,7 +31,7 @@ they share the same policies and same implementation.
## Notes
-- Deterministic Policy Gradient:
+- Deterministic Policy Gradient:
- DDPG Paper:
- OpenAI Spinning Guide for DDPG:
diff --git a/docs/modules/dqn.md b/docs/modules/dqn.md
index 2befcbb3ec..e11eb15796 100644
--- a/docs/modules/dqn.md
+++ b/docs/modules/dqn.md
@@ -7,7 +7,7 @@
# DQN
-[Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602) builds on [Fitted Q-Iteration (FQI)](http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf)
+[Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602) builds on [Fitted Q-Iteration (FQI)](https://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf)
and make use of different tricks to stabilize the learning with neural networks: it uses a replay buffer, a target network and gradient clipping.
```{eval-rst}
diff --git a/docs/modules/ppo.md b/docs/modules/ppo.md
index f9e6e68d4e..87903932ea 100644
--- a/docs/modules/ppo.md
+++ b/docs/modules/ppo.md
@@ -96,7 +96,7 @@ For more information, see [Vectorized Environments](../guide/vec_envs.md), [Issu
:::
:::{note}
-Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)):
+Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)):
When using PPO models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`.
diff --git a/docs/modules/sac.md b/docs/modules/sac.md
index 097fc95b8e..c18c9c56aa 100644
--- a/docs/modules/sac.md
+++ b/docs/modules/sac.md
@@ -91,7 +91,7 @@ while True:
```
:::{note}
-Using gSDE (Generalized State-Dependent Exploration) during inference (see [PR #1767](https://github.com/DLR-RM/stable-baselines3/pull/1767)):
+Using gSDE (Generalized State-Dependent Exploration) during inference (see [issue #1767](https://github.com/DLR-RM/stable-baselines3/issues/1767)):
When using SAC models trained with `use_sde=True`, the automatic noise resetting that occurs during training (controlled by `sde_sample_freq`) does not happen when using `model.predict()` for inference. This results in deterministic behavior even when `deterministic=False`.
diff --git a/stable_baselines3/common/noise.py b/stable_baselines3/common/noise.py
index 550dbb4255..9b203784d1 100644
--- a/stable_baselines3/common/noise.py
+++ b/stable_baselines3/common/noise.py
@@ -51,7 +51,7 @@ class OrnsteinUhlenbeckActionNoise(ActionNoise):
"""
An Ornstein Uhlenbeck action noise, this is designed to approximate Brownian motion with friction.
- Based on http://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab
+ Based on https://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab
:param mean: Mean of the noise
:param sigma: Scale of the noise
diff --git a/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py b/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py
index 62b4dc4d32..73d4b4260c 100644
--- a/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py
+++ b/stable_baselines3/common/sb2_compat/rmsprop_tf_like.py
@@ -20,7 +20,7 @@ class RMSpropTFLike(Optimizer):
- Initialize squared gradient to ones rather than zeros
Proposed by G. Hinton in his
- `course `_.
+ `course `_.
The centered version first appears in `Generating Sequences
With Recurrent Neural Networks `_.
diff --git a/stable_baselines3/ddpg/ddpg.py b/stable_baselines3/ddpg/ddpg.py
index 0d76f62654..b5e449dbf6 100644
--- a/stable_baselines3/ddpg/ddpg.py
+++ b/stable_baselines3/ddpg/ddpg.py
@@ -15,7 +15,7 @@ class DDPG(TD3):
"""
Deep Deterministic Policy Gradient (DDPG).
- Deterministic Policy Gradient: http://proceedings.mlr.press/v32/silver14.pdf
+ Deterministic Policy Gradient: https://proceedings.mlr.press/v32/silver14.pdf
DDPG Paper: https://arxiv.org/abs/1509.02971
Introduction to DDPG: https://spinningup.openai.com/en/latest/algorithms/ddpg.html