- Even with Jackpot, two-model joint training still crashes eventually after 300 steps with batch size 64.
- The paper does not validate Jackpot on very large models (e.g., 32B variants) due to limited resources.

## Code Structure
We based our implementation on [Verl](https://github.com/verl-project/verl).
For installation, simply run:
```shell
pip install -e .[vllm]
```
An example script for starting Jackpot:
```
<<placeholder>>
```
Below is a detailed explanation of the parameters we added to the Verl arguments for Jackpot:
```yaml
# inside actor.yaml
use_jackpot: false                # Whether to enable the Jackpot (OBRS) loss
jackpot_log_probs_to_keep: 20     # Number of top-k log-probs to keep for Jackpot
jackpot_lambda: 1.0               # Scaling factor for the Jackpot loss
jackpot_clip_ratio: 3.0           # Clipping ratio for Jackpot importance weights
jackpot_use_latest_logits: false  # Whether to recompute Jackpot with the latest logits instead of cached top-k
jackpot_use_topk_renorm: true     # Whether to renormalize Jackpot weights using the top-k slice
jackpot_mask_only: false          # Mask Jackpot weights without renormalization outside the top-k slice
```
100+
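To make the parameter semantics concrete, here is a minimal, hypothetical sketch of how a top-k importance-weighted loss could consume `jackpot_lambda`, `jackpot_clip_ratio`, and `jackpot_use_topk_renorm`. This is not the actual Verl implementation or the paper's OBRS loss; the function name, tensor shapes, and loss form are assumptions inferred from the parameter comments above.

```python
import torch

def jackpot_loss_sketch(topk_old_log_probs, topk_new_log_probs, advantages,
                        jackpot_lambda=1.0, jackpot_clip_ratio=3.0,
                        jackpot_use_topk_renorm=True):
    """Hypothetical sketch of a Jackpot-style loss over the cached top-k slice.

    topk_old_log_probs, topk_new_log_probs: (batch, k) log-probs of the top-k
    tokens under the cached (old) and current (new) policies.
    advantages: (batch,) per-sample advantage estimates.
    """
    # Importance weights between the new and cached top-k log-probs.
    weights = torch.exp(topk_new_log_probs - topk_old_log_probs)
    # Clip the weights at jackpot_clip_ratio to bound the update.
    weights = torch.clamp(weights, max=jackpot_clip_ratio)
    if jackpot_use_topk_renorm:
        # Renormalize the weights within the top-k slice.
        weights = weights / weights.sum(dim=-1, keepdim=True)
    # Advantage-weighted objective, scaled by jackpot_lambda.
    loss = -(weights * advantages.unsqueeze(-1)).sum(dim=-1).mean()
    return jackpot_lambda * loss
```

`jackpot_mask_only` and `jackpot_use_latest_logits` would change how the top-k slice is built and normalized rather than the loss form itself.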
## **Bibliography**

If you find our work helpful, please consider citing us with the following BibTeX.