
Commit cda81cc

Deploy PR #23 preview
1 parent 0c92759 commit cda81cc

15 files changed

Lines changed: 125 additions & 72 deletions

pr-23/_sources/algorithms.rst.txt

Lines changed: 1 addition & 1 deletion
@@ -167,7 +167,7 @@ Hyperparameter Tuning
 General Guidelines
 ~~~~~~~~~~~~~~~~~~

-1. **Start with defaults**: See ``twisterl/defaults.py`` for sensible default parameters
+1. **Start with defaults**: See ``src/twisterl/defaults.py`` for sensible default parameters
 2. **Adjust learning rate first**: This usually has the biggest impact
 3. **Monitor training curves**: Use TensorBoard to track progress (logs saved to ``runs/`` by default)

pr-23/_sources/api/environments.rst.txt

Lines changed: 33 additions & 20 deletions
@@ -66,12 +66,19 @@ The ``PyEnv`` class wraps Python environments for use with TwisteRL's Rust train
 }
 }

-The Python environment class should implement:
-
-- ``reset() -> observation``
-- ``step(action) -> (observation, reward, done, info)``
-- ``obs_shape() -> list[int]``
-- ``num_actions() -> int``
+The Python environment class must implement:
+
+- ``reset(difficulty: int)``: Reset the environment to initial state with given difficulty
+- ``next(action: int)``: Execute an action (advances the environment state)
+- ``observe() -> list[int]``: Return the current observation
+- ``obs_shape() -> list[int]``: Return observation dimensions
+- ``num_actions() -> int``: Return number of valid actions
+- ``is_final() -> bool``: Return True if current state is terminal
+- ``success() -> bool``: Return True if the goal was achieved
+- ``value() -> float``: Return the reward value for current state
+- ``masks() -> list[bool]``: Return action mask (True if action is valid)
+- ``set_state(state: list[int])``: Set environment to specific state
+- ``copy()``: Return a copy of the environment (for parallel collection)
 - ``twists() -> (obs_perms, act_perms)`` (optional, for symmetry-aware training)

 Creating Custom Environments
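
The Python interface added in the hunk above can be sketched with a toy environment. Everything about the environment itself is hypothetical (the ``LineWalkEnv`` name, the line-walk task, the ``size`` and ``max_steps`` parameters); only the method names and signatures are taken from the list in the diff. The sketch also assumes that ``observe()`` returns the active index of a one-hot vector and that ``difficulty`` means the starting distance from the goal, neither of which this diff confirms.

```python
import copy


class LineWalkEnv:
    """Hypothetical toy environment: walk left along a line to reach cell 0."""

    def __init__(self, size=8, max_steps=32):
        self.size = size
        self.max_steps = max_steps
        self.pos = 1
        self.steps = 0

    def reset(self, difficulty):
        # Assumption: difficulty is the starting distance from the goal cell.
        self.pos = max(1, min(difficulty, self.size - 1))
        self.steps = 0

    def next(self, action):
        # Action 0 moves left, action 1 moves right; moves are clamped to the line.
        delta = -1 if action == 0 else 1
        self.pos = max(0, min(self.pos + delta, self.size - 1))
        self.steps += 1

    def observe(self):
        # Assumption: the observation lists the active index of a one-hot vector.
        return [self.pos]

    def obs_shape(self):
        return [self.size]

    def num_actions(self):
        return 2

    def is_final(self):
        # Terminal when the goal is reached or the step budget is exhausted.
        return self.success() or self.steps >= self.max_steps

    def success(self):
        return self.pos == 0

    def value(self):
        return 1.0 if self.success() else 0.0

    def masks(self):
        # Moving left is invalid at cell 0, moving right at the last cell.
        return [self.pos > 0, self.pos < self.size - 1]

    def set_state(self, state):
        self.pos = int(state[0])
        self.steps = 0

    def copy(self):
        return copy.deepcopy(self)


env = LineWalkEnv()
env.reset(3)
while not env.is_final():
    env.next(0)  # always walk left towards the goal
```

Note that ``next`` returns nothing: in this interface the caller queries ``observe()``, ``value()``, and ``is_final()`` separately rather than receiving a ``(observation, reward, done, info)`` tuple as in the removed lines.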
@@ -87,25 +94,31 @@ For best performance, implement environments in Rust. See the ``examples/grid_wo

 See :doc:`../examples` for detailed instructions.

-Environment Interface
----------------------
+Environment Interface (Rust Trait)
+-----------------------------------
+
+Rust environments implement the ``twisterl::rl::env::Env`` trait. The required methods are:
+
+- ``num_actions() -> usize``: Return number of possible actions
+- ``obs_shape() -> Vec<usize>``: Return observation dimensions
+- ``set_state(state: Vec<i64>)``: Set environment to a specific state
+- ``reset()``: Reset to a random initial state
+- ``step(action: usize)``: Execute an action (evolve the state)
+- ``is_final() -> bool``: Return True if current state is terminal
+- ``success() -> bool``: Return True if the goal was achieved
+- ``reward() -> f32``: Return the reward value for current state
+- ``observe() -> Vec<usize>``: Return current state as sparse observation

-All environments must provide these methods (called from Rust or Python):
+Optional methods with default implementations:

-- ``reset()``: Reset to initial state, return observation
-- ``step(action)``: Take action, return (obs, reward, done, info)
-- ``obs_shape()``: Return observation dimensions
-- ``num_actions()``: Return number of valid actions
-- ``is_final()``: Return True if current state is terminal
-- ``success()``: Return True if the goal was achieved (episode ended successfully)
-- ``reward()``: Return the reward value for the current state
-- ``twists()``: Return permutation symmetries (optional)
-- ``set_state(state)``: Set environment to specific state (for inference)
-- ``difficulty``: Property to get/set difficulty level
+- ``set_difficulty(difficulty: usize)``: Set difficulty level (default: no-op)
+- ``get_difficulty() -> usize``: Get current difficulty (default: 1)
+- ``masks() -> Vec<bool>``: Return action mask (default: all True)
+- ``twists() -> (Vec<Vec<usize>>, Vec<Vec<usize>>)``: Return permutation symmetries (default: empty)

 Permutation Symmetries (Twists)
 -------------------------------

 TwisteRL supports symmetry-aware training through "twists" - permutations of observations and actions that represent equivalent states.

-See ``docs/twists.md`` for detailed documentation on implementing twists in your environments.
+See `twists.md <https://github.com/AI4quantum/twisteRL/blob/main/docs/twists.md>`_ for detailed documentation on implementing twists in your environments.
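
To make the idea of a twist concrete, here is a minimal sketch of one observation permutation paired with one action permutation. The four-cell ring world, the helper names, and the indexing convention (``output[i] = values[perm[i]]``) are all assumptions made for illustration; the linked ``twists.md`` is the authority on the convention TwisteRL actually uses. The mirror permutation chosen here is its own inverse, so the example gives the same result under either indexing convention.

```python
def is_permutation(perm, n):
    """Check that perm contains each index 0..n-1 exactly once."""
    return sorted(perm) == list(range(n))


def apply_perm(values, perm):
    # Assumed convention: output[i] = values[perm[i]].
    return [values[p] for p in perm]


def one_hot(pos, n=4):
    return [1 if i == pos else 0 for i in range(n)]


def step(pos, action, n=4):
    # Hypothetical ring world: action 0 moves clockwise, action 1 counter-clockwise.
    return (pos + 1) % n if action == 0 else (pos - 1) % n


# Mirror symmetry of the ring: cell i maps to cell -i mod 4, and the two actions swap.
obs_perms = [[0, 3, 2, 1]]
act_perms = [[1, 0]]


def twist_is_consistent(obs_perm, act_perm, n=4):
    """True if stepping then twisting equals twisting then stepping."""
    for pos in range(n):
        twisted_pos = apply_perm(one_hot(pos, n), obs_perm).index(1)
        for action in range(len(act_perm)):
            stepped_then_twisted = apply_perm(one_hot(step(pos, action, n), n), obs_perm)
            twisted_then_stepped = one_hot(step(twisted_pos, act_perm[action], n), n)
            if stepped_then_twisted != twisted_then_stepped:
                return False
    return True


checks = [
    is_permutation(o, 4) and is_permutation(a, 2) and twist_is_consistent(o, a)
    for o, a in zip(obs_perms, act_perms)
]
```

A check like ``twist_is_consistent`` is a cheap way to catch a mismatched pair before training: mirroring the observations while leaving the actions unpermuted fails it.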

pr-23/_sources/docs-guide.rst.txt

Lines changed: 0 additions & 2 deletions
@@ -368,8 +368,6 @@ Troubleshooting Deployment
 - Click "Run workflow"
 - Choose the branch and click "Run workflow"

-The documentation will be available at http://localhost:8000
-
 Custom Domain (Optional)
 ------------------------

pr-23/_sources/examples.rst.txt

Lines changed: 15 additions & 6 deletions
@@ -76,8 +76,8 @@ TwisteRL supports custom environments implemented in Rust. The ``examples/grid_w
 crate-type = ["cdylib"]

 [dependencies]
-pyo3 = { version = "0.20", features = ["extension-module"] }
-twisterl = { path = "path/to/twisterl/rust", features = ["python_bindings"] }
+pyo3 = { version = "0.24", features = ["extension-module"] }
+twisterl = { path = "../../rust", features = ["python_bindings"] }

 3. **Implement the environment** by implementing the ``twisterl::rl::env::Env`` trait.

@@ -86,20 +86,27 @@ TwisteRL supports custom environments implemented in Rust. The ``examples/grid_w
 .. code-block:: rust

 use pyo3::prelude::*;
+use twisterl::rl::env::Env;
 use twisterl::python_interface::env::PyBaseEnv;

-#[pyclass(name = "MyEnv", extends = PyBaseEnv)]
+#[pyclass(name="MyEnv", extends=PyBaseEnv)]
 struct PyMyEnv;

 #[pymethods]
 impl PyMyEnv {
 #[new]
-fn new(...) -> (Self, PyBaseEnv) {
-let env = MyEnv::new(...);
+fn new(/* your params */) -> (Self, PyBaseEnv) {
+let env = MyEnv::new(/* ... */);
 (PyMyEnv, PyBaseEnv { env: Box::new(env) })
 }
 }

+#[pymodule]
+fn my_env(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
+m.add_class::<PyMyEnv>()?;
+Ok(())
+}
+
 5. **Build and install** the module:

 .. code-block:: bash
@@ -124,7 +131,9 @@ TwisteRL also supports Python environments through the ``PyEnv`` wrapper:
 }
 }

-Note that Python environments may be slower than native Rust environments.
+Your Python environment class must implement the required interface (``reset``, ``next``, ``observe``, etc.). See :doc:`api/environments` for the complete list of required methods.
+
+Note that Python environments may be slower than native Rust environments due to the Python-Rust interop overhead.

 Use Cases
 ---------

pr-23/_sources/index.rst.txt

Lines changed: 6 additions & 4 deletions
@@ -45,17 +45,19 @@ This example trains a model to play the popular "8 puzzle" where numbers have to

 This model can be trained on a single CPU in under 1 minute (no GPU required!).

-🏗️ Current State (PoC)
------------------------
+Current State (Proof of Concept)
+---------------------------------

-- Hybrid rust-python implementation:
+- Hybrid Rust-Python implementation:
 - Data collection and inference in Rust
 - Training in Python (PyTorch)
 - Supported algorithms:
 - PPO (Proximal Policy Optimization)
 - AlphaZero
 - Focus on discrete observation and action spaces
-- Support for native Rust environments and for Python environments through a wrapper
+- Support for native Rust environments and Python environments through a wrapper
+
+**Repository:** `GitHub <https://github.com/AI4quantum/twisteRL>`_

 Getting Started
 ---------------

pr-23/_sources/quickstart.rst.txt

Lines changed: 6 additions & 1 deletion
@@ -56,7 +56,12 @@ The training configuration is specified in JSON format. Here's an example based
 "num_epochs": 10,
 "vf_coef": 0.8,
 "ent_coef": 0.01,
-"clip_ratio": 0.1
+"clip_ratio": 0.1,
+"normalize_advantage": true
+},
+"learning": {
+"diff_threshold": 0.85,
+"diff_max": 32
 },
 "optimizer": {
 "lr": 0.00015

pr-23/_sources/twists.md.txt

Lines changed: 2 additions & 2 deletions
@@ -11,10 +11,10 @@ lightweight form of regularization because the agent sees equivalent states unde
 - Every environment implements the `twisterl::rl::env::Env` trait. The trait includes a `twists`
 method that returns `(Vec<Vec<usize>>, Vec<Vec<usize>>)` representing valid permutations on the
 flattened observation array and matching permutations on the discrete action space
-(`rust/src/rl/env.rs:33`).
+(`rust/src/rl/env.rs:59`).
 - When an environment is instantiated from Python via `prepare_algorithm`, twisteRL immediately calls
 `env.twists()` and forwards the returned permutations to the policy constructor
-(`src/twisterl/utils.py:126`). The policy can then symmetrize logits, average values, or augment
+(`src/twisterl/utils.py:194`). The policy can then symmetrize logits, average values, or augment
 rollouts without extra environment queries.

 ## Data Contract

pr-23/algorithms.html

Lines changed: 1 addition & 1 deletion
@@ -292,7 +292,7 @@ <h2>Hyperparameter Tuning<a class="headerlink" href="#hyperparameter-tuning" tit
 <section id="general-guidelines">
 <h3>General Guidelines<a class="headerlink" href="#general-guidelines" title="Link to this heading"></a></h3>
 <ol class="arabic simple">
-<li><p><strong>Start with defaults</strong>: See <code class="docutils literal notranslate"><span class="pre">twisterl/defaults.py</span></code> for sensible default parameters</p></li>
+<li><p><strong>Start with defaults</strong>: See <code class="docutils literal notranslate"><span class="pre">src/twisterl/defaults.py</span></code> for sensible default parameters</p></li>
 <li><p><strong>Adjust learning rate first</strong>: This usually has the biggest impact</p></li>
 <li><p><strong>Monitor training curves</strong>: Use TensorBoard to track progress (logs saved to <code class="docutils literal notranslate"><span class="pre">runs/</span></code> by default)</p></li>
 </ol>

pr-23/api/environments.html

Lines changed: 33 additions & 20 deletions
@@ -62,7 +62,7 @@
 </li>
 <li class="toctree-l2"><a class="reference internal" href="#python-environment-wrapper">Python Environment Wrapper</a></li>
 <li class="toctree-l2"><a class="reference internal" href="#creating-custom-environments">Creating Custom Environments</a></li>
-<li class="toctree-l2"><a class="reference internal" href="#environment-interface">Environment Interface</a></li>
+<li class="toctree-l2"><a class="reference internal" href="#environment-interface-rust-trait">Environment Interface (Rust Trait)</a></li>
 <li class="toctree-l2"><a class="reference internal" href="#permutation-symmetries-twists">Permutation Symmetries (Twists)</a></li>
 </ul>
 </li>
@@ -154,12 +154,19 @@ <h2>Python Environment Wrapper<a class="headerlink" href="#python-environment-wr
 <span class="p">}</span>
 </pre></div>
 </div>
-<p>The Python environment class should implement:</p>
+<p>The Python environment class must implement:</p>
 <ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">reset()</span> <span class="pre">-&gt;</span> <span class="pre">observation</span></code></p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">step(action)</span> <span class="pre">-&gt;</span> <span class="pre">(observation,</span> <span class="pre">reward,</span> <span class="pre">done,</span> <span class="pre">info)</span></code></p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">obs_shape()</span> <span class="pre">-&gt;</span> <span class="pre">list[int]</span></code></p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span> <span class="pre">-&gt;</span> <span class="pre">int</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">reset(difficulty:</span> <span class="pre">int)</span></code>: Reset the environment to initial state with given difficulty</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">next(action:</span> <span class="pre">int)</span></code>: Execute an action (advances the environment state)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">observe()</span> <span class="pre">-&gt;</span> <span class="pre">list[int]</span></code>: Return the current observation</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">obs_shape()</span> <span class="pre">-&gt;</span> <span class="pre">list[int]</span></code>: Return observation dimensions</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span> <span class="pre">-&gt;</span> <span class="pre">int</span></code>: Return number of valid actions</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span> <span class="pre">-&gt;</span> <span class="pre">bool</span></code>: Return True if current state is terminal</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">success()</span> <span class="pre">-&gt;</span> <span class="pre">bool</span></code>: Return True if the goal was achieved</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">value()</span> <span class="pre">-&gt;</span> <span class="pre">float</span></code>: Return the reward value for current state</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">masks()</span> <span class="pre">-&gt;</span> <span class="pre">list[bool]</span></code>: Return action mask (True if action is valid)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">set_state(state:</span> <span class="pre">list[int])</span></code>: Set environment to specific state</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">copy()</span></code>: Return a copy of the environment (for parallel collection)</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">twists()</span> <span class="pre">-&gt;</span> <span class="pre">(obs_perms,</span> <span class="pre">act_perms)</span></code> (optional, for symmetry-aware training)</p></li>
 </ul>
 </section>
@@ -174,26 +181,32 @@ <h2>Creating Custom Environments<a class="headerlink" href="#creating-custom-env
 </ol>
 <p>See <a class="reference internal" href="../examples.html"><span class="doc">Examples</span></a> for detailed instructions.</p>
 </section>
-<section id="environment-interface">
-<h2>Environment Interface<a class="headerlink" href="#environment-interface" title="Link to this heading"></a></h2>
-<p>All environments must provide these methods (called from Rust or Python):</p>
+<section id="environment-interface-rust-trait">
+<h2>Environment Interface (Rust Trait)<a class="headerlink" href="#environment-interface-rust-trait" title="Link to this heading"></a></h2>
+<p>Rust environments implement the <code class="docutils literal notranslate"><span class="pre">twisterl::rl::env::Env</span></code> trait. The required methods are:</p>
 <ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">reset()</span></code>: Reset to initial state, return observation</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">step(action)</span></code>: Take action, return (obs, reward, done, info)</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">obs_shape()</span></code>: Return observation dimensions</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span></code>: Return number of valid actions</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span></code>: Return True if current state is terminal</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">success()</span></code>: Return True if the goal was achieved (episode ended successfully)</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">reward()</span></code>: Return the reward value for the current state</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">twists()</span></code>: Return permutation symmetries (optional)</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">set_state(state)</span></code>: Set environment to specific state (for inference)</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">difficulty</span></code>: Property to get/set difficulty level</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span> <span class="pre">-&gt;</span> <span class="pre">usize</span></code>: Return number of possible actions</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">obs_shape()</span> <span class="pre">-&gt;</span> <span class="pre">Vec&lt;usize&gt;</span></code>: Return observation dimensions</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">set_state(state:</span> <span class="pre">Vec&lt;i64&gt;)</span></code>: Set environment to a specific state</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">reset()</span></code>: Reset to a random initial state</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">step(action:</span> <span class="pre">usize)</span></code>: Execute an action (evolve the state)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span> <span class="pre">-&gt;</span> <span class="pre">bool</span></code>: Return True if current state is terminal</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">success()</span> <span class="pre">-&gt;</span> <span class="pre">bool</span></code>: Return True if the goal was achieved</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">reward()</span> <span class="pre">-&gt;</span> <span class="pre">f32</span></code>: Return the reward value for current state</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">observe()</span> <span class="pre">-&gt;</span> <span class="pre">Vec&lt;usize&gt;</span></code>: Return current state as sparse observation</p></li>
+</ul>
+<p>Optional methods with default implementations:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span class="pre">set_difficulty(difficulty:</span> <span class="pre">usize)</span></code>: Set difficulty level (default: no-op)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">get_difficulty()</span> <span class="pre">-&gt;</span> <span class="pre">usize</span></code>: Get current difficulty (default: 1)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">masks()</span> <span class="pre">-&gt;</span> <span class="pre">Vec&lt;bool&gt;</span></code>: Return action mask (default: all True)</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">twists()</span> <span class="pre">-&gt;</span> <span class="pre">(Vec&lt;Vec&lt;usize&gt;&gt;,</span> <span class="pre">Vec&lt;Vec&lt;usize&gt;&gt;)</span></code>: Return permutation symmetries (default: empty)</p></li>
 </ul>
 </section>
 <section id="permutation-symmetries-twists">
 <h2>Permutation Symmetries (Twists)<a class="headerlink" href="#permutation-symmetries-twists" title="Link to this heading"></a></h2>
 <p>TwisteRL supports symmetry-aware training through “twists” - permutations of observations and actions that represent equivalent states.</p>
-<p>See <code class="docutils literal notranslate"><span class="pre">docs/twists.md</span></code> for detailed documentation on implementing twists in your environments.</p>
+<p>See <a class="reference external" href="https://github.com/AI4quantum/twisteRL/blob/main/docs/twists.md">twists.md</a> for detailed documentation on implementing twists in your environments.</p>
 </section>
 </section>

pr-23/docs-guide.html

Lines changed: 0 additions & 1 deletion
@@ -458,7 +458,6 @@ <h2>Troubleshooting Deployment<a class="headerlink" href="#troubleshooting-deplo
 - Select “Build and Deploy Documentation”
 - Click “Run workflow”
 - Choose the branch and click “Run workflow”</p>
-<p>The documentation will be available at <a class="reference external" href="http://localhost:8000">http://localhost:8000</a></p>
 </section>
 <section id="custom-domain-optional">
 <h2>Custom Domain (Optional)<a class="headerlink" href="#custom-domain-optional" title="Link to this heading"></a></h2>
