Grad norm issue

Hi! I'm training gpt-oss-120b with CISPO loss. I get the following grad norm returned on a regular batch of on-policy data:

`OptimStepResponse(metrics={'unclipped_grad_l2:mean': 13468.94921875})`

How exactly is `unclipped_grad_l2:mean` computed? The reported grad norm looks dramatically too large.
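For reference, here is a minimal sketch of the computation I would expect. It assumes the metric is a global L2 norm taken over all parameter gradients as one flat vector. That is only my assumption, not something I have confirmed; `global_grad_l2` and the toy gradients are hypothetical names and values used purely for illustration.

```python
import math

def global_grad_l2(grads):
    """L2 norm over all gradient entries, treating every tensor as one flat vector.

    `grads` is a list of flattened gradient tensors (lists of floats).
    Assumption: this is what a "global grad norm" usually means; the actual
    metric may be computed differently.
    """
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

# Hypothetical toy gradients for two parameter tensors (already flattened).
toy_grads = [[3.0, 0.0], [0.0, 4.0]]
print(global_grad_l2(toy_grads))  # 5.0
```

If the metric is instead a per-parameter or per-shard norm that is then averaged (as the `:mean` suffix might suggest), I would like to know which reduction is used.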

Thanks in advance for any help with this!