Merged
2 changes: 1 addition & 1 deletion CITATION.bib
@@ -6,5 +6,5 @@ @article{stable-baselines3
volume = {22},
number = {268},
pages = {1-8},
url = {http://jmlr.org/papers/v22/20-1364.html}
url = {https://jmlr.org/papers/v22/20-1364.html}
}
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -18,7 +18,7 @@ https://github.com/DLR-RM/stable-baselines3
Note: If you do not follow the template (and its mandatory steps), your pull request will be ignored.

If you are not familiar with creating a Pull Request, here are some guides:
- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
- https://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
- https://help.github.com/articles/creating-a-pull-request/


2 changes: 1 addition & 1 deletion NOTICE
@@ -2,7 +2,7 @@ Large portion of the code of Stable-Baselines3 (in `common/`) were ported from S
both licensed under the MIT License:

before the fork (June 2018):
Copyright (c) 2017 OpenAI (http://openai.com)
Copyright (c) 2017 OpenAI (https://openai.com)

after the fork (June 2018):
Copyright (c) 2018-2019 Stable-Baselines Team
6 changes: 3 additions & 3 deletions README.md
@@ -256,15 +256,15 @@ To cite this repository in publications:
volume = {22},
number = {268},
pages = {1-8},
url = {http://jmlr.org/papers/v22/20-1364.html}
url = {https://jmlr.org/papers/v22/20-1364.html}
}
```

Note: If you need to refer to a specific version of SB3, you can also use the [Zenodo DOI](https://doi.org/10.5281/zenodo.8123988).

## Maintainers

Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://gallouedec.com/) (@qgallouedec).
Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://github.com/qgallouedec) (@qgallouedec).

**Important Note: We do not provide technical support, or consulting** and do not answer personal questions via email.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/), or [Stack Overflow](https://stackoverflow.com/) in that case.
@@ -279,7 +279,7 @@ If you want to contribute, please read [**CONTRIBUTING.md**](./CONTRIBUTING.md)

The initial work to develop Stable Baselines3 was partially funded by the project *Reduced Complexity Models* from the *Helmholtz-Gemeinschaft Deutscher Forschungszentren*, and by the EU Horizon 2020 Research and Innovation Programme under grant number 951992 ([VeriDream](https://www.veridream.eu/)).

The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paristech.fr/index.php?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](http://www.ensta-paristech.fr/en).
The original version, Stable Baselines, was created in the [robotics lab U2IS](http://u2is.ensta-paris.fr/?lang=en) ([INRIA Flowers](https://flowers.inria.fr/) team) at [ENSTA ParisTech](https://www.ensta.fr/en).


Logo credits: [L.M. Tenkes](https://www.instagram.com/lucillehue/)
2 changes: 1 addition & 1 deletion docs/_static/img/colab-badge.svg
(File diff not displayed.)
4 changes: 2 additions & 2 deletions docs/_static/img/colab.svg
(File diff not displayed.)
6 changes: 3 additions & 3 deletions docs/conf.py
@@ -3,7 +3,7 @@
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# https://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

@@ -223,6 +223,6 @@ def setup(app):
# Example configuration for intersphinx: refer to the Python standard library.
# intersphinx_mapping = {
# 'python': ('https://docs.python.org/3/', None),
# 'numpy': ('http://docs.scipy.org/doc/numpy/', None),
# 'torch': ('http://pytorch.org/docs/master/', None),
# 'numpy': ('https://docs.scipy.org/doc/numpy/', None),
# 'torch': ('https://pytorch.org/docs/master/', None),
# }
2 changes: 1 addition & 1 deletion docs/guide/checking_nan.md
@@ -152,4 +152,4 @@ As some datasets will sometimes fill missing values with NaNs as a surrogate val

Here is some reading material about finding NaNs: <https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html>

And filling the missing values with something else (imputation): <https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4>
And filling the missing values with something else (imputation): <https://towardsdatascience.com/missing-data-in-time-series-machine-learning-techniques-6b2273ff8b45/>
6 changes: 3 additions & 3 deletions docs/guide/examples.md
@@ -448,7 +448,7 @@ model = PPO.load(log_dir / "ppo_halfcheetah", env=vec_env)

## Hindsight Experience Replay (HER)

For this example, we use [Highway-Env](https://github.com/eleurent/highway-env) by [@eleurent](https://github.com/eleurent).
For this example, we use [Highway-Env](https://github.com/Farama-Foundation/HighwayEnv) by [@eleurent](https://github.com/eleurent).

```{image} ../_static/img/colab-badge.svg
:target: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/stable_baselines_her.ipynb
@@ -639,7 +639,7 @@ Policies also offers a simple way to save/load weights as a NumPy vector, using
and `load_from_vector()` method.

Following example demonstrates reading parameters, modifying some of them and loading them to model
by implementing [evolution strategy (es)](http://blog.otoro.net/2017/10/29/visual-evolution-strategies/)
by implementing [evolution strategy (es)](https://blog.otoro.net/2017/10/29/visual-evolution-strategies/)
for solving the `CartPole-v1` environment. The initial guess for parameters is obtained by running
A2C policy gradient updates on the model.

@@ -722,7 +722,7 @@ Some massively parallel simulation environments such as [EnvPool](https://github
To use SB3 with these tools, you need to wrap the environment with tool-specific `VecEnvWrapper` that preprocesses the data for SB3,
you can find links to some of these wrappers in [issue #772](https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002).

- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/sb3.py)
- Isaac Lab wrapper: [link](https://github.com/isaac-sim/IsaacLab/blob/3e73d6dd79080fd7632488c061052a6edd52e230/source/isaaclab_rl/isaaclab_rl/sb3.py#L93)
- Brax: [link](https://gist.github.com/araffin/a7a576ec1453e74d9bb93120918ef7e7)
- EnvPool: [link](https://github.com/sail-sg/envpool/blob/main/examples/sb3_examples/ppo.py)
- Getting SAC to Work on a Massive Parallel Simulator: <https://araffin.github.io/post/sac-massive-sim/>
2 changes: 1 addition & 1 deletion docs/guide/export.md
@@ -338,7 +338,7 @@ motivation for the code example above).

The Coral chip is fast, with very low power consumption, but only has limited
on-device training abilities. More information is on the webpage here:
<https://coral.ai>.
<https://developers.google.com/coral>.

To deploy to a Coral, one must work via TFLite, and quantize the
network to reflect the Coral's capabilities. The full chain to go from
2 changes: 1 addition & 1 deletion docs/guide/imitation.md
@@ -13,5 +13,5 @@ imitation learning algorithms on top of Stable-Baselines3, including:

You can install imitation with `pip install imitation`. The [imitation
documentation](https://imitation.readthedocs.io/en/latest/) has more details
on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first-steps.html)
on how to use the library, including [a quick start guide](https://imitation.readthedocs.io/en/latest/getting-started/first_steps.html)
for the impatient.
8 changes: 4 additions & 4 deletions docs/guide/install.md
@@ -8,7 +8,7 @@ Stable-Baselines3 requires python 3.10+ and PyTorch >= 2.3

### Windows

We recommend using [Anaconda](https://conda.io/docs/user-guide/install/windows.html) for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above.
We recommend using [miniforge](https://github.com/conda-forge/miniforge#windows) for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.10 or above.

For a quick start you can move straight to installing Stable-Baselines3 in the next step.

@@ -22,7 +22,7 @@ issue with atari-py package. [See this discussion for more information](https://
To install Stable Baselines3 with pip, execute:

```bash
pip install stable-baselines3[extra]
pip install 'stable-baselines3[extra]'
```

:::{note}
@@ -58,7 +58,7 @@ To contribute to Stable-Baselines3, with support for running tests and building

```bash
git clone https://github.com/DLR-RM/stable-baselines3 && cd stable-baselines3
pip install -e .[docs,tests,extra]
pip install -e '.[docs,tests,extra]'
```

## Using Docker Images
@@ -101,7 +101,7 @@ Note: if you are using a proxy, you need to pass extra params during
build and do some [tweaks]:

```bash
--network=host --build-arg HTTP_PROXY=http://your.proxy.fr:8080/ --build-arg http_proxy=http://your.proxy.fr:8080/ --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
--network=host --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
```

### Run the images (CPU/GPU)
2 changes: 1 addition & 1 deletion docs/guide/rl.md
@@ -13,6 +13,6 @@ However, if you want to learn about RL, there are several good resources to get
- [RL103: From Deep Q-Learning (DQN) to Soft Actor-Critic (SAC) and Beyond](https://araffin.github.io/post/rl103/)
- [Lilian Weng's blog](https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html)
- [Berkeley's Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/lectures)
- [Berkeley's Deep Reinforcement Learning course](http://rail.eecs.berkeley.edu/deeprlcourse/)
- [Berkeley's Deep Reinforcement Learning course](https://rail.eecs.berkeley.edu/deeprlcourse/)
- [Decisions & Dragons - FAQ for RL foundations](https://www.decisionsanddragons.com)
- [More resources](https://github.com/dennybritz/reinforcement-learning)
2 changes: 1 addition & 1 deletion docs/guide/sb3_contrib.md
@@ -35,7 +35,7 @@ See documentation for the full list of included features.
- [Augmented Random Search (ARS)](https://arxiv.org/abs/1803.07055)
- [Quantile Regression DQN (QR-DQN)]
- [PPO with invalid action masking (Maskable PPO)](https://arxiv.org/abs/2006.14171)
- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://ppo-details.cleanrl.dev//2021/11/05/ppo-implementation-details/)
- [PPO with recurrent policy (RecurrentPPO aka PPO LSTM)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)
- [Truncated Quantile Critics (TQC)]
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/abs/1502.05477)
- [Batch Normalization in Deep Reinforcement Learning (CrossQ)](https://openreview.net/forum?id=PczQtTsTIX)
8 changes: 4 additions & 4 deletions docs/guide/tensorboard.md
@@ -124,7 +124,7 @@ class ImageRecorderCallback(BaseCallback):
image = self.training_env.render(mode="rgb_array")
# "HWC" specify the dataformat of the image, here channel last
# (H for height, W for width, C for channel)
# See https://pytorch.org/docs/stable/tensorboard.html
# See https://docs.pytorch.org/docs/stable/tensorboard.html
# for supported formats
self.logger.record("trajectory/image", Image(image, "HWC"), exclude=("stdout", "log", "json", "csv"))
return True
@@ -223,7 +223,7 @@ class VideoRecorderCallback(BaseCallback):
"""
# We expect `render()` to return a uint8 array with values in [0, 255] or a float array
# with values in [0, 1], as described in
# https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video
# https://docs.pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_video
screen = self._eval_env.render(mode="rgb_array")
# PyTorch uses CxHxW vs HxWxC gym (and tensorflow) image convention
screens.append(screen.transpose(2, 0, 1))
@@ -297,7 +297,7 @@ model.learn(total_timesteps=int(5e4), callback=HParamCallback())

## Directly Accessing The Summary Writer

If you would like to log arbitrary data (in one of the formats supported by [pytorch](https://pytorch.org/docs/stable/tensorboard.html)), you
If you would like to log arbitrary data (in one of the formats supported by [PyTorch](https://docs.pytorch.org/docs/stable/tensorboard.html)), you
can get direct access to the underlying SummaryWriter in a callback:

:::{warning}
@@ -306,7 +306,7 @@ This is method is not recommended and should only be used by advanced users.

:::{note}
If you want a concrete example, you can watch [how to log lap time with donkeycar env](https://www.youtube.com/watch?v=v8j2bpcE4Rg&t=4619s),
or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/feat/gym-donkeycar/rl_zoo3/callbacks.py#L251-L270).
or read the code in the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/eb5d9c7770abe9a60f5511193ebcb260dfdc2706/rl_zoo3/callbacks.py#L262).
You might also want to take a look at [issue #1160](https://github.com/DLR-RM/stable-baselines3/issues/1160) and [issue #1219](https://github.com/DLR-RM/stable-baselines3/issues/1219).
:::
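
Reviewer note: to check whether any plain `http://` links remain after a change like this one, something along the following lines could be run over the repository's text files. This is a minimal sketch and not part of the PR; the function name `find_insecure_links` and the regular expression are illustrative assumptions. It only flags candidates and does not verify that the `https://` variant actually resolves.

```python
import re

# Illustrative pattern (an assumption, not taken from the PR): match http:// URLs
# up to the first whitespace or common closing delimiter.
HTTP_URL = re.compile(r"http://[^\s<>()\[\]{}'\"]+")


def find_insecure_links(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, found_url, suggested_https_url) for each http:// link."""
    results = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in HTTP_URL.finditer(line):
            url = match.group(0)
            # Swap only the scheme; the rest of the URL is kept unchanged.
            results.append((line_number, url, "https://" + url[len("http://"):]))
    return results


# Small sample built from two lines that appear in this diff.
sample = "url = {http://jmlr.org/papers/v22/20-1364.html}\nCopyright (c) 2017 OpenAI (https://openai.com)"
for line_number, old, new in find_insecure_links(sample):
    print(f"line {line_number}: {old} -> {new}")
```

Each hit still needs a manual check, since some hosts do not serve HTTPS and some links (as in several hunks above) have moved to a different address altogether.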
