skills/configs/SKILL.md
Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ In `opd`, rollouts are generated by the student. The orchestrator scores the stu
`[inference]` is required for the usual online path because it starts the student inference server and auto-configures `orchestrator.student.client.base_url`. The student pool is used for online evals and policy weight sync. For externally started student inference, set `orchestrator.student.client.base_url` explicitly instead.
-Teacher logprob scoring supports both self-hosted vLLM and Prime API teacher clients: `/inference/v1/generate` for vLLM server roots, `/api/v1/generate` when the teacher client base URL ends in `/api/v1`.
+Teacher logprob scoring uses PrimeRL's vLLM-native `/inference/v1/generate` route. The request field is `token_ids`: the prompt plus completion tokens to score. The response's `choices[].token_ids` still holds the generated completion tokens and is not used for OPD scoring.
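The added line says the teacher is scored by sending prompt plus completion tokens as `token_ids` to `/inference/v1/generate`, and that `choices[].token_ids` in the response is not what OPD scores. A minimal sketch of the request body and of picking out the completion positions, under loudly stated assumptions: the helper names are hypothetical, any other request fields the route needs are omitted, and the per-token logprob list follows the vLLM `prompt_logprobs` convention (entry `i` is the logprob of token `i` given the tokens before it, entry 0 is `None`). How this route actually returns logprobs is not stated in the diff.

```python
from typing import Optional


def build_teacher_score_request(prompt_ids: list[int], completion_ids: list[int]) -> dict:
    """Hypothetical body for POST <teacher base URL>/inference/v1/generate.

    Per the diff, `token_ids` is the prompt plus the completion tokens to score.
    Other fields the route may require are intentionally omitted.
    """
    return {"token_ids": list(prompt_ids) + list(completion_ids)}


def completion_logprobs(
    per_token_logprobs: list[Optional[float]], prompt_len: int, completion_len: int
) -> list[float]:
    """Hypothetical helper: keep only the teacher logprobs of completion tokens.

    ASSUMPTION: one entry per scored token, vLLM prompt_logprobs style.
    Do not read response choices[].token_ids here: per the diff, those are
    generated completion tokens and are not used for OPD scoring.
    """
    if len(per_token_logprobs) != prompt_len + completion_len:
        raise ValueError("expected one logprob entry per scored token")
    scored = per_token_logprobs[prompt_len : prompt_len + completion_len]
    # Entry 0 has no context, so it is None; that only reaches this slice
    # when the prompt is empty.
    if any(lp is None for lp in scored):
        raise ValueError("completion position has no logprob (empty prompt?)")
    return scored
```

With a 3-token prompt and a 2-token completion, `completion_logprobs([None, -0.1, -0.2, -0.3, -0.4], prompt_len=3, completion_len=2)` returns `[-0.3, -0.4]`: the prompt positions are dropped and only the completion tokens are scored against the teacher.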