- Even with Jackpot, two-model joint training still crashes eventually after 300 steps with batch size 64.
- The paper does not validate Jackpot on very large models (e.g., 32B variants) due to limited resources.

## Code Structure
We based our implementation on [Verl](https://github.com/verl-project/verl).
For installation, simply run:
```shell
pip install -e .[vllm]
```
An example script for starting Jackpot:
```
<<placeholder>>
```
Below is a detailed explanation of the parameters we added to the Verl arguments for Jackpot:
```yaml
# inside actor.yaml
use_jackpot: false                # Whether to enable the Jackpot (OBRS) loss
jackpot_log_probs_to_keep: 20     # Number of top-k log-probs to keep for Jackpot
jackpot_lambda: 1.0               # Scaling factor for the Jackpot loss
jackpot_clip_ratio: 3.0           # Clipping ratio for Jackpot importance weights
jackpot_use_latest_logits: false  # Whether to recompute Jackpot with the latest logits instead of cached top-k
jackpot_use_topk_renorm: true     # Whether to renormalize Jackpot weights using the top-k slice
jackpot_mask_only: false          # Mask Jackpot weights without renormalization outside the top-k slice
```
100+
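To make the parameter semantics concrete, here is a minimal, hypothetical sketch of how a top-k importance-weighted loss could consume `jackpot_lambda`, `jackpot_clip_ratio`, and `jackpot_use_topk_renorm`. This is not the actual Verl implementation or the paper's OBRS loss; the function name, tensor shapes, and loss form are assumptions inferred from the parameter comments above.

```python
import torch

def jackpot_loss_sketch(topk_old_log_probs, topk_new_log_probs, advantages,
                        jackpot_lambda=1.0, jackpot_clip_ratio=3.0,
                        jackpot_use_topk_renorm=True):
    """Hypothetical sketch of a Jackpot-style loss over the cached top-k slice.

    topk_old_log_probs, topk_new_log_probs: (batch, k) log-probs of the top-k
    tokens under the cached (old) and current (new) policies.
    advantages: (batch,) per-sample advantage estimates.
    """
    # Importance weights between the new and cached top-k log-probs.
    weights = torch.exp(topk_new_log_probs - topk_old_log_probs)
    # Clip the weights at jackpot_clip_ratio to bound the update.
    weights = torch.clamp(weights, max=jackpot_clip_ratio)
    if jackpot_use_topk_renorm:
        # Renormalize the weights within the top-k slice.
        weights = weights / weights.sum(dim=-1, keepdim=True)
    # Advantage-weighted objective, scaled by jackpot_lambda.
    loss = -(weights * advantages.unsqueeze(-1)).sum(dim=-1).mean()
    return jackpot_lambda * loss
```

`jackpot_mask_only` and `jackpot_use_latest_logits` would change how the top-k slice is built and normalized rather than the loss form itself.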
## **Bibliography**

If you find our work helpful, please consider citing us with the following BibTeX.