Commit 0b714e5

1 parent 4c77535 commit 0b714e5

1 file changed

+22
-0
lines changed

README.md

Lines changed: 22 additions & 0 deletions
@@ -76,6 +76,28 @@ it does not introduce extra model forward passes or re-rollouts.
- Even with Jackpot, two-model joint training still eventually crashes after 300 steps with a batch size of 64.
- The paper does not validate Jackpot on very large models (e.g., 32B variants) because of limited resources.

## Code Structure

Our implementation is based on verl (https://github.com/verl-project/verl).

To install, run:
```bash
# Editable install with the vLLM extra; the quotes stop shells such as zsh from globbing the brackets.
pip install -e ".[vllm]"
```
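
As a quick sanity check after installation, the sketch below reports whether the two packages can be found by the current Python environment. It assumes the import names are `verl` and `vllm`; the helper name `check_installed` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec


def check_installed(packages=("verl", "vllm")):
    """Return a mapping of package name -> whether Python can locate it."""
    # find_spec returns None when a top-level package is not importable.
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If either package is reported as missing, re-running the install command inside the intended virtual environment is usually the first thing to try.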

An example script for launching Jackpot:
```
<<placeholder>>
```

The parameters we added to the verl arguments for Jackpot are explained below:
```yaml
# inside actor.yaml
use_jackpot: false                # Whether to enable Jackpot (OBRS) loss
jackpot_log_probs_to_keep: 20     # Number of top-k log-probs to keep for Jackpot
jackpot_lambda: 1.0               # Scaling factor for Jackpot loss
jackpot_clip_ratio: 3.0           # Clipping ratio for Jackpot importance weights
jackpot_use_latest_logits: false  # Whether to recompute Jackpot with the latest logits instead of cached top-k
jackpot_use_topk_renorm: true     # Whether to renormalize Jackpot weights using the top-k slice
jackpot_mask_only: false          # Mask Jackpot weights without renormalization outside the top-k slice
```
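
To make the defaults above easier to work with from a launcher, here is a minimal sketch that mirrors them in a Python dataclass and renders them as `key=value` command-line overrides. Several things here are assumptions rather than facts about this repository: the names `JackpotArgs` and `to_overrides` are hypothetical, the `actor_rollout_ref.actor` prefix is a guess based on the "inside actor.yaml" comment and verl's usual config layout, and the only validation applied (at least one log-prob kept) is a plain sanity check, not a documented constraint.

```python
from dataclasses import dataclass, fields


@dataclass
class JackpotArgs:
    """Mirror of the Jackpot entries listed above, with the same defaults."""
    use_jackpot: bool = False
    jackpot_log_probs_to_keep: int = 20
    jackpot_lambda: float = 1.0
    jackpot_clip_ratio: float = 3.0
    jackpot_use_latest_logits: bool = False
    jackpot_use_topk_renorm: bool = True
    jackpot_mask_only: bool = False

    def validate(self):
        # Assumption: keeping fewer than one log-prob is never meaningful.
        if self.jackpot_log_probs_to_keep < 1:
            raise ValueError("jackpot_log_probs_to_keep must be >= 1")


def to_overrides(args, prefix="actor_rollout_ref.actor"):
    """Render the fields as 'prefix.name=value' strings (prefix is assumed)."""
    args.validate()
    overrides = []
    for field in fields(args):
        value = getattr(args, field.name)
        if isinstance(value, bool):
            # YAML-style lowercase booleans instead of Python's True/False.
            value = str(value).lower()
        overrides.append(f"{prefix}.{field.name}={value}")
    return overrides


if __name__ == "__main__":
    # Enable Jackpot and keep every other default.
    for line in to_overrides(JackpotArgs(use_jackpot=True)):
        print(line)
```

Keeping the defaults in one typed place makes it harder for a launch script to drift from `actor.yaml`; check the prefix against the actual config tree before relying on the generated strings.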

## **Bibliography**

If you find our work helpful, please consider citing it using the following BibTeX.
