`osuT5/utils/train_utils.py`: `ValueError: Out of range float values are not JSON compliant` when saving checkpoint to WandB

Traceback:

```
[2026-03-06 08:56:47,366][osuT5.utils.train_utils][INFO] - {'train/loss': 0.10812649875879288, 'train/grad_l2':
   0.23848098367452622, 'train/weights_l2': 323.96180091922207, 'train/lr': 0.0010721064521583958,
  'train/seconds_per_step': 1.6197779893875122, 'train/epoch': 97, 'step': 5000}
  [2026-03-06 08:56:47,368][accelerate.accelerator][INFO] - Saving current state to checkpoint-5000
  [2026-03-06 08:56:48,079][accelerate.checkpointing][INFO] - Model weights saved in
  checkpoint-5000/pytorch_model.bin
  [2026-03-06 08:56:48,161][accelerate.checkpointing][INFO] - Optimizer state saved in
  checkpoint-5000/optimizer.bin
  [2026-03-06 08:56:48,162][accelerate.checkpointing][INFO] - Scheduler state saved in
  checkpoint-5000/scheduler.bin
  [2026-03-06 08:56:48,163][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in
  checkpoint-5000/sampler.bin
  [2026-03-06 08:56:48,164][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in
  checkpoint-5000/sampler_1.bin
  [2026-03-06 08:56:48,165][accelerate.checkpointing][INFO] - Random states saved in
  checkpoint-5000/random_states_0.pkl
  [2026-03-06 08:56:48,166][accelerate.checkpointing][INFO] - Saving the state of Tokenizer to
  checkpoint-5000/custom_checkpoint_0.pkl
  Error executing job with overrides: ['data.train_dataset_path=/test/', 'data.test_dataset_path=/test/', 'compile=false',
  'data.train_dataset_start=0', 'data.train_dataset_end=66', 'data.test_dataset_start=0',
  'data.test_dataset_end=0', 'optim.base_lr=0.0025', 'optim.base_lr_2=0.0005', 'optim.batch_size=8',
  'optim.grad_acc=1', 'optim.total_steps=9000', 'optim.warmup_steps=200']
  Traceback (most recent call last):
    File "/workspace/Mapperatorinator/osuT5/train.py", line 111, in main
      func(
    File "/workspace/Mapperatorinator/osuT5/osuT5/utils/train_utils.py", line 377, in train
      maybe_save_checkpoint(model, accelerator, args, shared)
    File "/workspace/Mapperatorinator/osuT5/osuT5/utils/train_utils.py", line 70, in maybe_save_checkpoint
      art = wandb.Artifact(
            ^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/site-packages/wandb/sdk/artifacts/artifact.py", line 217, in __init__
      self._metadata: dict[str, Any] = validate_metadata(metadata)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/functools.py", line 909, in wrapper
      return dispatch(args[0].__class__)(*args, **kw)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/site-packages/wandb/sdk/artifacts/_validators.py", line 218, in _
      metadata = from_json(json.dumps(json_friendly_val(metadata), allow_nan=False))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/json/__init__.py", line 238, in dumps
      **kw).encode(obj)
            ^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/json/encoder.py", line 200, in encode
      chunks = self.iterencode(o, _one_shot=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.11/json/encoder.py", line 258, in iterencode
      return _iterencode(o, 0)
             ^^^^^^^^^^^^^^^^^
  ValueError: Out of range float values are not JSON compliant

  Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
  wandb:
  wandb: 🚀 View run wandering-wood-117 at:
  wandb: Find logs at: ...
```

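Workaround:
Resume the training run. According to the log above, the local `checkpoint-5000` files are written before the `wandb.Artifact` call fails.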
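
Possible fix (a sketch, not tested against this repo): the traceback shows `validate_metadata` calling `json.dumps(..., allow_nan=False)`, so the metadata passed to `wandb.Artifact` presumably contains a `NaN` or `inf` float. The traceback does not show which value it is. Assuming the metadata is a plain dict of Python numbers, it could be cleaned before the artifact is created. `sanitize_metadata` and the example values below are hypothetical and are not taken from `train_utils.py`.

```python
import json
import math
from typing import Any


def sanitize_metadata(value: Any) -> Any:
    """Recursively replace NaN/inf floats with None (hypothetical helper)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize_metadata(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_metadata(item) for item in value]
    return value


# Made-up metadata that reproduces the same ValueError when left unsanitized.
metadata = {
    "train/loss": 0.108,
    "test/loss": float("nan"),
    "best_loss": float("inf"),
    "step": 5000,
}

clean = sanitize_metadata(metadata)

# This is the same strict encoding that fails in the traceback; it now succeeds.
encoded = json.dumps(clean, allow_nan=False)
```

The cleaned dict would then be passed as `metadata=` to `wandb.Artifact`. Replacing non-finite values with `None` keeps the keys visible in the artifact. Tensors or other non-float types would need to be converted to Python numbers first.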
