Hi! I'm training gpt-oss-120b with CISPO loss. I get the following grad norm returned on a regular batch of on-policy data:
OptimStepResponse(metrics={'unclipped_grad_l2:mean': 13468.94921875})
How exactly is unclipped_grad_l2:mean computed? The reported grad norm is dramatically too large.
Thanks in advance for any help with this!