
Examples

These examples show how to leverage slime in your own RL workflow. Some are purely demonstrative, but most are verifiable against a concrete performance score.

Directory Structure

  • eval_multi_task: Example of evaluating multiple tasks, each with its own config.
  • fully_async: Demonstrates fully asynchronous rollout generation for higher efficiency.
  • geo3k_vlm: Training VLMs on a single-turn reasoning task using GRPO on the GEO3K dataset.
  • geo3k_vlm_multi_turn: Multi-turn VLM training on the GEO3K dataset.
  • low_precision: Examples of FP8 training and inference for improved throughput and stability.
  • multi_agent: Example of running multi-agent RL with slime.
  • on_policy_distillation: Example implementation for on-policy distillation, extending the reinforcement learning pipeline to support teacher–student distillation directly within on-policy training.
  • reproducibility: Guides on achieving bitwise experiment reproduction using deterministic modes.
  • retool: Demonstrates the ReTool-style functionality for tool-enabled language model generation.
  • search-r1: A minimal reproduction of Search-R1, featuring multi-turn conversation and tool-calling.
  • strands_sglang: Integration example with the Strands-Agents scaffolding framework.
  • tau-bench: Training in an agentic multi-turn tool use environment (Tau-bench).
  • train_infer_mismatch_helper: Algorithmic methods for rollout correction (e.g., TIS, MIS).