assert removed

slikhite-1 · pengdurice · commit 96f9b2e70b74 · 2026-05-20T18:39:55.000Z
Signed-off-by: slikhite-1 <slikhite@nvidia.com>
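
For reviewers, the guard that appears in the `loss_functions.py` hunk can be sketched in isolation. This is a minimal illustration, not the repository's code: `LossConfigSketch` and `validate` are hypothetical names, and only the two field names and the assertion message are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LossConfigSketch:
    """Hypothetical stand-in holding only the two fields named in the diff."""

    use_importance_sampling_correction: bool = False
    truncated_importance_sampling_ratio: Optional[float] = None


def validate(cfg: LossConfigSketch) -> None:
    """Hypothetical helper mirroring the guard shown in the hunk."""
    # A truncation ratio is only accepted when importance-sampling
    # correction is switched on; otherwise the assertion fails.
    if cfg.truncated_importance_sampling_ratio is not None:
        assert cfg.use_importance_sampling_correction, (
            "truncated_importance_sampling_ratio is only supported when "
            "use_importance_sampling_correction is True"
        )
```

Under this sketch, a config that sets `truncated_importance_sampling_ratio` while leaving `use_importance_sampling_correction` false fails with an `AssertionError`, and a config that leaves the ratio unset always passes.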
diff --git a/docs/guides/grpo.md b/docs/guides/grpo.md
@@ -456,20 +456,6 @@ grpo:
 
 Set `overlong_filtering` to true when training on tasks where truncation at the maximum sequence length is expected, such as long-form reasoning or mathematical proofs.
 
-#### CISPO (Clipped IS-weight Policy Optimization)
-
-CISPO introduced in [MiniMax-M1 paper](https://arxiv.org/abs/2506.13585) clips the importance sampling weight itself and applies stop-gradient.
-
-The loss is:
-
-$$
-L(\theta) = E_{x \sim \pi_{\theta_{\text{old}}}} \Big[ \text{sg}\big(\text{clip}(r(\theta), 1-\varepsilon_{\text{low}}, 1+\varepsilon_{\text{high}})\big) \cdot A_t \cdot \log \pi_\theta(x) \Big]
-$$
-
-where $r(\theta) = \frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}$, $\text{sg}$ denotes stop-gradient, and $\varepsilon_{\text{low}}$, $\varepsilon_{\text{high}}$ are the IS-weight clipping bounds. Dual-clipping (`ratio_clip_c`) is ignored when CISPO is enabled.
-
-To use CISPO, set `loss_fn.use_cispo: true` in your config. Tune `ratio_clip_min` and `ratio_clip_max` (mapping to $\varepsilon_{\text{low}}$ and $\varepsilon_{\text{high}}$). It is recommended to use a large `ratio_clip_min` (e.g. 1.0) and tune `ratio_clip_max` (e.g. 0.8). Example: [examples/configs/cispo_math_8B.yaml](../../examples/configs/cispo_math_8B.yaml).
-
 #### Top-p and top-k filtering
 
 The implementation aligns with vLLM’s top-p and top-k filtering by applying an equivalent process to the logits.
diff --git a/examples/configs/cispo_math_8B.yaml b/examples/configs/cispo_math_8B.yaml
@@ -1,8 +1,7 @@
-# CISPO Algorithm Configuration
 defaults: "grpo_math_1B.yaml"
 
   # ============================================================================
-  # CISPO: Clipped IS-weight Policy Optimization
+  # CISPO: Clipped Importance Sampling Policy Optimization
   # CISPO clips the IS weight itself and applies stop-gradient, then multiplies by
   # advantage and log-probability. 
   # ratio_clip_min / ratio_clip_max control the IS weight clipping bounds (ε_IS_low / ε_IS_high).
diff --git a/nemo_rl/algorithms/loss/loss_functions.py b/nemo_rl/algorithms/loss/loss_functions.py
@@ -239,6 +239,9 @@ def __init__(self, cfg: ClippedPGLossConfig):
             assert self.ratio_clip_c is None, (
                 "use_cispo is incompatible with ratio_clip_c; "
                 "ratio_clip_c is not supported when use_cispo=True"
+        if self.truncated_importance_sampling_ratio is not None:
+            assert self.use_importance_sampling_correction, (
+                "truncated_importance_sampling_ratio is only supported when use_importance_sampling_correction is True"
             )
         if self.truncated_importance_sampling_type is not None:
             assert self.use_importance_sampling_correction, (