TwisteRL supports symmetry-aware training through "twists": permutations of observations and actions that map a state to an equivalent one.

- See ``docs/twists.md`` for detailed documentation on implementing twists in your environments.
+ See `twists.md <https://github.com/AI4quantum/twisteRL/blob/main/docs/twists.md>`_ for detailed documentation on implementing twists in your environments.
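To make the idea concrete, here is a small self-contained sketch. It is not TwisteRL code: the helpers (``twist_obs``, ``twist_action``, ``step``, ``is_symmetry``), the toy bit-toggling dynamics, and the convention that a permutation sends index ``i`` to ``perm[i]`` are all assumptions, and ``docs/twists.md`` remains the authority on what TwisteRL expects. The check encodes the property that makes a twist valid: applying it before a step or after the step must give the same result.

```python
import itertools

# Hypothetical sketch: a "twist" is a pair (obs_perm, act_perm) of index
# permutations. The convention new_index = perm[old_index] is an assumption.


def twist_obs(obs: list[int], obs_perm: list[int]) -> list[int]:
    # Relabel every active index of a sparse observation.
    return sorted(obs_perm[i] for i in obs)


def twist_action(action: int, act_perm: list[int]) -> int:
    # Relabel the action index.
    return act_perm[action]


def step(obs: list[int], action: int) -> list[int]:
    # Toy dynamics: action i toggles bit i in a sparse bit-set observation.
    active = set(obs)
    active ^= {action}
    return sorted(active)


def is_symmetry(obs_perm: list[int], act_perm: list[int], n_bits: int) -> bool:
    # A valid twist commutes with the dynamics for every state and action:
    # step(twist(s), twist(a)) == twist(step(s, a)).
    for bits in itertools.product([0, 1], repeat=n_bits):
        obs = [i for i, b in enumerate(bits) if b]
        for a in range(n_bits):
            lhs = step(twist_obs(obs, obs_perm), twist_action(a, act_perm))
            rhs = twist_obs(step(obs, a), obs_perm)
            if lhs != rhs:
                return False
    return True


# Swapping bits 0 and 1 is a symmetry only if actions 0 and 1 are swapped too.
swap = [1, 0, 2]
print(is_symmetry(swap, swap, 3))       # True
print(is_symmetry(swap, [0, 1, 2], 3))  # False
```

The interface's optional ``twists()`` method returns ``(obs_perms, act_perms)``; under the assumptions above, each paired observation/action permutation should pass a check like this one.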
@@ -124,7 +131,9 @@ TwisteRL also supports Python environments through the ``PyEnv`` wrapper:
}
}

- Note that Python environments may be slower than native Rust environments.
+ Your Python environment class must implement the required interface (``reset``, ``next``, ``observe``, etc.). See :doc:`api/environments` for the complete list of required methods.
+
+ Note that Python environments may be slower than native Rust environments due to the Python-Rust interop overhead.
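As an illustration of that interface, the class below is a hypothetical toy environment; ``BitFlipEnv`` is not part of TwisteRL. Its method names follow the Python interface listed in the API docs (``reset``, ``next``, ``observe``, ``num_actions``, ``is_final``, ``success``, ``value``, ``masks``, ``set_state``, ``copy``), while the dynamics, the sparse-index observation encoding, and the reward values are assumptions chosen only to keep the sketch small. It is driven by hand rather than through ``PyEnv``, since the wrapper's constructor is not shown here.

```python
import random
from copy import deepcopy


class BitFlipEnv:
    """Hypothetical toy environment: clear every bit of an n-bit state.

    Method names mirror the Python environment interface listed in the
    docs; the dynamics, observation encoding and rewards are illustrative.
    """

    def __init__(self, n_bits: int = 4, max_steps: int = 8, seed: int = 0):
        self.n_bits = n_bits
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.state = [0] * n_bits
        self.steps = 0

    def reset(self, difficulty: int) -> None:
        # Start from the solved state and flip `difficulty` distinct bits.
        self.state = [0] * self.n_bits
        for i in self.rng.sample(range(self.n_bits), min(difficulty, self.n_bits)):
            self.state[i] = 1
        self.steps = 0

    def next(self, action: int) -> None:
        # Action i toggles bit i and counts as one step.
        self.state[action] ^= 1
        self.steps += 1

    def observe(self) -> list[int]:
        # Sparse observation (assumed encoding): indices of the set bits.
        return [i for i, bit in enumerate(self.state) if bit]

    def num_actions(self) -> int:
        return self.n_bits

    def success(self) -> bool:
        return not any(self.state)

    def is_final(self) -> bool:
        # The episode ends on success or when the step budget runs out.
        return self.success() or self.steps >= self.max_steps

    def value(self) -> float:
        # Arbitrary reward choice: +1 on success, a small penalty otherwise.
        return 1.0 if self.success() else -0.1

    def masks(self) -> list[bool]:
        # Any bit can be toggled, so every action is valid.
        return [True] * self.n_bits

    def set_state(self, state: list[int]) -> None:
        self.state = list(state)
        self.steps = 0

    def copy(self) -> "BitFlipEnv":
        # Independent copy, e.g. for collecting episodes in parallel.
        return deepcopy(self)


env = BitFlipEnv(n_bits=4, seed=0)
env.set_state([1, 0, 1, 0])
env.next(0)
env.next(2)
print(env.observe(), env.success(), env.value())  # [] True 1.0
```

Because ``copy`` deep-copies the random generator as well, each clone evolves independently of the original.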
<h3>General Guidelines<a class="headerlink" href="#general-guidelines" title="Link to this heading"></a></h3>
<ol class="arabic simple">
- <li><p><strong>Start with defaults</strong>: See <code class="docutils literal notranslate"><span class="pre">twisterl/defaults.py</span></code> for sensible default parameters</p></li>
+ <li><p><strong>Start with defaults</strong>: See <code class="docutils literal notranslate"><span class="pre">src/twisterl/defaults.py</span></code> for sensible default parameters</p></li>
<li><p><strong>Adjust learning rate first</strong>: This usually has the biggest impact</p></li>
<li><p><strong>Monitor training curves</strong>: Use TensorBoard to track progress (logs saved to <code class="docutils literal notranslate"><span class="pre">runs/</span></code> by default)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">reset(difficulty:</span> <span class="pre">int)</span></code>: Reset the environment to an initial state with the given difficulty</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">next(action:</span> <span class="pre">int)</span></code>: Execute an action (advances the environment state)</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">observe()</span> <span class="pre">-></span> <span class="pre">list[int]</span></code>: Return the current observation</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span> <span class="pre">-></span> <span class="pre">int</span></code>: Return the number of valid actions</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span> <span class="pre">-></span> <span class="pre">bool</span></code>: Return True if the current state is terminal</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">success()</span> <span class="pre">-></span> <span class="pre">bool</span></code>: Return True if the goal was achieved</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">value()</span> <span class="pre">-></span> <span class="pre">float</span></code>: Return the reward value for the current state</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">masks()</span> <span class="pre">-></span> <span class="pre">list[bool]</span></code>: Return the action mask (True if the action is valid)</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">set_state(state:</span> <span class="pre">list[int])</span></code>: Set the environment to a specific state</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">copy()</span></code>: Return a copy of the environment (for parallel collection)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">twists()</span> <span class="pre">-></span> <span class="pre">(obs_perms,</span> <span class="pre">act_perms)</span></code> (optional, for symmetry-aware training)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span></code>: Return the number of valid actions</p></li>
- <li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span></code>: Return True if current state is terminal</p></li>
- <li><p><code class="docutils literal notranslate"><span class="pre">success()</span></code>: Return True if the goal was achieved (episode ended successfully)</p></li>
- <li><p><code class="docutils literal notranslate"><span class="pre">reward()</span></code>: Return the reward value for the current state</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">set_state(state)</span></code>: Set the environment to a specific state (for inference)</p></li>
- <li><p><code class="docutils literal notranslate"><span class="pre">difficulty</span></code>: Property to get/set difficulty level</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">num_actions()</span> <span class="pre">-></span> <span class="pre">usize</span></code>: Return the number of possible actions</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">set_state(state:</span> <span class="pre">Vec&lt;i64&gt;)</span></code>: Set the environment to a specific state</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">reset()</span></code>: Reset to a random initial state</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">step(action:</span> <span class="pre">usize)</span></code>: Execute an action (evolve the state)</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">is_final()</span> <span class="pre">-></span> <span class="pre">bool</span></code>: Return True if the current state is terminal</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">success()</span> <span class="pre">-></span> <span class="pre">bool</span></code>: Return True if the goal was achieved</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">reward()</span> <span class="pre">-></span> <span class="pre">f32</span></code>: Return the reward value for the current state</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">observe()</span> <span class="pre">-></span> <span class="pre">Vec&lt;usize&gt;</span></code>: Return the current state as a sparse observation</p></li>
+ </ul>
+ <p>Optional methods with default implementations:</p>
+ <ul class="simple">
+ <li><p><code class="docutils literal notranslate"><span class="pre">set_difficulty(difficulty:</span> <span class="pre">usize)</span></code>: Set the difficulty level (default: no-op)</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">get_difficulty()</span> <span class="pre">-></span> <span class="pre">usize</span></code>: Get the current difficulty (default: 1)</p></li>
+ <li><p><code class="docutils literal notranslate"><span class="pre">masks()</span> <span class="pre">-></span> <span class="pre">Vec&lt;bool&gt;</span></code>: Return the action mask (default: all True)</p></li>
<h2>Permutation Symmetries (Twists)<a class="headerlink" href="#permutation-symmetries-twists" title="Link to this heading"></a></h2>
<p>TwisteRL supports symmetry-aware training through “twists”: permutations of observations and actions that map a state to an equivalent one.</p>
- <p>See <code class="docutils literal notranslate"><span class="pre">docs/twists.md</span></code> for detailed documentation on implementing twists in your environments.</p>
+ <p>See <a class="reference external" href="https://github.com/AI4quantum/twisteRL/blob/main/docs/twists.md">twists.md</a> for detailed documentation on implementing twists in your environments.</p>