
:orphan:

.. currentmodule:: torchrl.envs.llm

LLM Environments
================

The environment layer orchestrates data loading, tool execution, reward computation, and formatting.

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    ChatEnv
    CountdownEnv
    CountdownRewardParser
    DatasetChatEnv
    GSM8KEnv
    make_gsm8k_env
    GSM8KPrepareQuestion
    GSM8KRewardParser
    IFEvalEnv
    IfEvalScorer
    IFEvalScoreData
    MATHEnv
    MATHRewardParser
    LLMEnv
    LLMHashingEnv
    make_mlgym
    MLGymWrapper

Agentic toolkit (preview)
-------------------------

.. currentmodule:: torchrl.envs.llm.agentic

The :mod:`torchrl.envs.llm.agentic` package provides an async-first substrate for tool-calling agents on top of an unmodified :class:`~torchrl.envs.llm.ChatEnv`: structured parsers for the major provider protocols (XML, JSON-block, OpenAI ``tool_calls``, Anthropic ``tool_use``), hardened :class:`Sandbox` backends, and stateful :class:`Repl` sessions.

This preview ships the substrate that the headline orchestrator (``ToolCompose``) is built on. A minimal end-to-end sketch -- usable today against the substrate, formalised by the orchestrator -- looks like:

.. code-block:: python

    import asyncio

    from torchrl.envs.llm.agentic.parsers import XMLToolCallParser
    from torchrl.envs.llm.agentic.sandbox import default_sandbox, ResourceLimits
    from torchrl.envs.llm.agentic.repl import SubprocessRepl

    # Parse one XML-style tool call out of raw model text.
    parser = XMLToolCallParser()
    parsed = parser.parse('<tool name="python" tag="c1">{"code": "print(2+2)"}</tool>')
    # -> parsed.calls[0].tool == "python", parsed.calls[0].call_id == "c1"

    # Pick the platform default backend with a wall-clock limit and no network.
    sandbox = default_sandbox(ResourceLimits(wall_seconds=10, network="none"))

    async def run():
        # Enter the sandbox, then a REPL session bound to it.
        async with sandbox, SubprocessRepl(sandbox) as repl:
            result = await repl.execute("print(2+2)")
            assert result.stdout.strip() == "4"

    asyncio.run(run())
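To make the parse step above concrete, here is a minimal standard-library sketch of what an XML-style tool-call parser does. It is not the ``XMLToolCallParser`` implementation: ``SketchCall`` and ``parse_tool_calls`` are hypothetical names, and the position-based fallback id is an assumption about how a stable ``call_id`` could be produced when the model omits the ``tag`` attribute.

```python
import json
import re
from dataclasses import dataclass


# Hypothetical stand-in for a parsed call; not the torchrl ParsedCall class.
@dataclass
class SketchCall:
    tool: str
    call_id: str
    args: dict


# Matches <tool name="..." [tag="..."]>JSON</tool>, mirroring the example string above.
_TOOL_RE = re.compile(
    r'<tool\s+name="(?P<name>[^"]+)"(?:\s+tag="(?P<tag>[^"]+)")?\s*>(?P<body>.*?)</tool>',
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return one SketchCall per <tool> block found in text."""
    calls = []
    for index, match in enumerate(_TOOL_RE.finditer(text)):
        # Assumption: fall back to a position-based id when no tag is given,
        # so the same text always yields the same call_id.
        call_id = match.group("tag") or f"call_{index}"
        args = json.loads(match.group("body"))
        calls.append(SketchCall(match.group("name"), call_id, args))
    return calls


calls = parse_tool_calls(
    '<tool name="python" tag="c1">{"code": "print(2+2)"}</tool>'
    '<tool name="stop">{}</tool>'
)
```

Here the first call keeps the model-supplied id ``c1`` and the second receives the deterministic fallback ``call_1``.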

Tool contracts
~~~~~~~~~~~~~~

A :class:`Tool` is a pure async object with a ``name``, a JSON Schema ``input_schema``, and an ``async run(args, ctx)`` method returning a :class:`ToolResult`. Calls flow through a :class:`ToolCallParser` (one of the four built-ins below), which guarantees a stable ``call_id`` for every invocation.
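The shape of that contract can be sketched without torchrl installed. ``SketchToolResult``, ``SketchToolContext`` and ``WordCountTool`` below are hypothetical stand-ins; the real :class:`ToolResult` and :class:`ToolContext` may carry different fields, so treat this as an illustration of the name / schema / ``run`` triple and nothing more.

```python
import asyncio
from dataclasses import dataclass, field


# Hypothetical stand-ins so the sketch runs on the standard library alone;
# the real ToolResult and ToolContext may look different.
@dataclass
class SketchToolResult:
    call_id: str
    text: str
    is_error: bool = False


@dataclass
class SketchToolContext:
    call_id: str
    metadata: dict = field(default_factory=dict)


class WordCountTool:
    """Tool-shaped object: a name, a JSON Schema, and an async run()."""

    name = "word_count"
    input_schema = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    async def run(self, args, ctx):
        # Pure: the result depends only on the arguments and the context.
        if "text" not in args:
            return SketchToolResult(ctx.call_id, "missing 'text'", is_error=True)
        return SketchToolResult(ctx.call_id, str(len(args["text"].split())))


tool = WordCountTool()
result = asyncio.run(tool.run({"text": "a b c"}, SketchToolContext("c1")))
error = asyncio.run(tool.run({}, SketchToolContext("c2")))
```

Echoing the ``call_id`` from the context into the result is one way to keep each result matched to the invocation that produced it.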

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    Tool
    ToolContext
    ToolResult
    TextPart
    JsonPart
    ImagePart
    FileRefPart
    ParsedCall
    ParseResult
    ToolCallParser

Parsers
~~~~~~~

.. currentmodule:: torchrl.envs.llm.agentic.parsers

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    XMLToolCallParser
    JSONToolCallParser
    OpenAIToolCallParser
    AnthropicToolUseParser

Sandboxing
~~~~~~~~~~

.. currentmodule:: torchrl.envs.llm.agentic.sandbox

A :class:`Sandbox` is an async context manager that runs subprocess commands with bounded resources, controlled filesystem access, and opt-in network egress. The default backends are :class:`BubblewrapSandbox` on Linux and :class:`SeatbeltSandbox` on macOS; pick one explicitly or use :func:`default_sandbox`.

For environments without those binaries, :class:`UnsafeSubprocessSandbox` provides a no-isolation fallback that warns loudly on every ``open()``. Do not use it with untrusted model output.

.. note::

    Apple has officially deprecated ``sandbox-exec``, but it still ships with macOS 14+ and remains the most portable in-process isolation primitive on that platform. For stronger or cross-platform isolation, prefer :class:`DockerSandbox` (currently a stub -- contributions welcome).
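As an illustration of the "bounded resources" idea only, the sketch below enforces a wall-clock limit on a subprocess using ``asyncio`` from the standard library. It provides no filesystem or network isolation and is not how the :class:`Sandbox` backends are implemented; ``run_with_wall_limit`` is a hypothetical helper name.

```python
import asyncio
import sys


async def run_with_wall_limit(argv, wall_seconds):
    """Run argv as a subprocess and kill it if it exceeds wall_seconds."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(
            proc.communicate(), timeout=wall_seconds
        )
    except asyncio.TimeoutError:
        # Budget exceeded: terminate the child and reap it.
        proc.kill()
        await proc.wait()
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {
        "timed_out": False,
        "returncode": proc.returncode,
        "stdout": stdout.decode(),
    }


ok = asyncio.run(
    run_with_wall_limit([sys.executable, "-c", "print(2+2)"], wall_seconds=10)
)
slow = asyncio.run(
    run_with_wall_limit(
        [sys.executable, "-c", "import time; time.sleep(30)"], wall_seconds=0.5
    )
)
```

A real backend layers filesystem and network restrictions on top of this, which is why the unsafe fallback should stay away from untrusted model output.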

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    Sandbox
    SandboxResult
    ResourceLimits
    BubblewrapSandbox
    SeatbeltSandbox
    UnsafeSubprocessSandbox
    DockerSandbox
    E2BSandbox
    ModalSandbox
    default_sandbox

Stateful REPLs
~~~~~~~~~~~~~~

.. currentmodule:: torchrl.envs.llm.agentic.repl

A :class:`Repl` runs stateful code inside a :class:`Sandbox` so an agent can build up variables across multiple tool calls. The default :class:`JupyterRepl` uses an IPython kernel for rich outputs (images, JSON, plots) and clean restarts (optional dependency: ``jupyter_client``). :class:`SubprocessRepl` is a dependency-free fallback that trades rich display for portability.
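What "stateful" means here can be shown with a toy in-process session that keeps one namespace alive between calls. ``ToyRepl`` is a hypothetical name, it is synchronous, and it executes code in the current process with no isolation at all, so it only illustrates the session semantics and not the :class:`Repl` API.

```python
import contextlib
import io


class ToyRepl:
    """In-process stand-in: one namespace survives across execute() calls."""

    def __init__(self):
        self._namespace = {}

    def execute(self, code):
        # Capture anything the snippet prints and return it as a string.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self._namespace)
        return buffer.getvalue()

    def restart(self):
        # A clean restart drops every variable defined so far.
        self._namespace = {}


repl = ToyRepl()
repl.execute("x = 20")
out = repl.execute("x += 1; print(x * 2)")  # sees x from the previous call
```

After ``repl.restart()``, referencing ``x`` raises ``NameError`` again, which is the behaviour a clean restart is meant to give.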

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    Repl
    ReplResult
    ReplDisplay
    ReplError
    JupyterRepl
    SubprocessRepl

Built-in tools and adapters
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: torchrl.envs.llm.agentic

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    ToolCompose
    DispatchResult
    PythonTool
    ShellTool
    FileReadTool
    StopTool
    HttpTool
    MCPServerConfig
    MCPToolset
    RateLimiter
    as_tool

Migration from legacy tool transforms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Existing code built on :mod:`torchrl.envs.llm.transforms` keeps working: no ``DeprecationWarning`` is emitted in this release. Each legacy class has a ``.. seealso::`` block in its docstring pointing at the recommended replacement, summarised here.

.. list-table:: Legacy transform → agentic counterpart
    :header-rows: 1

    * - Legacy
      - Agentic
      - Adapter recipe
    * - ``ExecuteToolsInOrder``
      - :class:`ToolCompose`
      - Replace at the env stack level. ``ToolCompose`` runs calls concurrently; pin sequential execution per-tool with :class:`RateLimiter` ``max_concurrent=1`` if you depend on ordering.
    * - ``PythonInterpreter``
      - :class:`PythonTool` + :class:`Sandbox` + :class:`Repl`
      - For a soft migration, lift the existing transform: ``as_tool(PythonInterpreter(persistent=True), name="python", input_schema=...)``.
    * - ``SimpleToolTransform``
      - Native :class:`Tool` subclass
      - Or ``as_tool(transform, name=..., input_schema=...)``.
    * - ``BrowserTransform``
      - :func:`tools.as_tool` of the existing transform
      - A native :class:`Tool` for browser automation may land later; until then the adapter is the recommended path.
    * - ``MCPToolTransform``
      - :class:`MCPToolset`
      - One :class:`Tool` per remote tool, schemas auto-discovered. Drops directly into ``ToolCompose``.
    * - ``XMLBlockParser`` / ``JSONCallParser``
      - :class:`parsers.XMLToolCallParser` / :class:`parsers.JSONToolCallParser`
      - Same syntax; the agentic versions enforce a stable ``call_id``.
    * - ``ToolService`` / ``ToolRegistry``
      - The ``tools=[...]`` argument to :class:`ToolCompose`
      - The registry pattern collapses into the compose container.
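The ordering caveat in the ``ExecuteToolsInOrder`` row can be illustrated with plain ``asyncio``. In this sketch an ``asyncio.Semaphore`` stands in for the effect of ``max_concurrent=1``; it is an assumption about the behaviour, not the :class:`RateLimiter` implementation, and ``call_tool`` and ``dispatch`` are hypothetical helpers.

```python
import asyncio


async def call_tool(name, delay, log, limiter):
    """Simulate a tool call that takes `delay` seconds, then records its name."""
    if limiter is None:
        await asyncio.sleep(delay)
        log.append(name)
    else:
        # With a limit of 1, the next call starts only after this one finishes.
        async with limiter:
            await asyncio.sleep(delay)
            log.append(name)


async def dispatch(max_concurrent=None):
    # Create the semaphore inside the running loop.
    limiter = asyncio.Semaphore(max_concurrent) if max_concurrent else None
    log = []
    await asyncio.gather(
        call_tool("first", 0.05, log, limiter),
        call_tool("second", 0.01, log, limiter),
    )
    return log


concurrent_order = asyncio.run(dispatch())               # ["second", "first"]
serial_order = asyncio.run(dispatch(max_concurrent=1))   # ["first", "second"]
```

When calls run concurrently the faster one finishes first, so code that relied on in-order side effects needs the limit of one.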

For a guided walkthrough, see the :ref:`agentic ChatEnv tutorial <llm_agentic>`.
