security: replace dill.load() with SafeUnpickler allowlist in mmv/utils/checkpoint.py #716

Open
brodmart wants to merge 1 commit into google-deepmind:master from brodmart:fix/mmv-checkpoint-safe-unpickler

@brodmart

Summary

mmv/utils/checkpoint.py:24 calls dill.load() on the file named by the --checkpoint_path flag, which an attacker may control. dill extends pickle, and both formats let the serialized stream import and call arbitrary Python callables during deserialization, so a malicious checkpoint achieves full RCE on the loading host with no further prerequisites.

# Before (line 24) — arbitrary code execution
checkpoint_data = dill.load(checkpoint_file)

Fix

Replace dill.load() with _SafeUnpickler, a stdlib pickle.Unpickler subclass that overrides find_class() with an explicit allowlist. Only types actually present in MMV parameter/state dicts are permitted:

  • numpy.ndarray, numpy.dtype, numpy scalar/reconstruct helpers
  • builtins: dict, list, tuple, str, int, float, bool, bytes

Any global outside the allowlist causes find_class() to raise pickle.UnpicklingError before the object is instantiated.
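
For readers unfamiliar with the pattern, the following is a minimal sketch of an allowlist-gated unpickler, not the code from this PR. The class name SafeUnpickler, the helper safe_load, the EvalPayload demo class, and the exact allowlist entries are assumptions for illustration; in particular, the NumPy helper paths are guesses at what a checkpoint might reference (NumPy 1.x pickles name numpy.core.*, NumPy 2.x names numpy._core.*).

```python
import io
import pickle


class SafeUnpickler(pickle.Unpickler):
    """Unpickler that resolves only allowlisted (module, name) globals."""

    # Assumed allowlist for illustration; the PR's real _ALLOWED may differ.
    # NumPy 1.x pickles reference numpy.core.*, NumPy 2.x numpy._core.*.
    _ALLOWED = frozenset({
        ("builtins", "dict"), ("builtins", "list"), ("builtins", "tuple"),
        ("builtins", "str"), ("builtins", "int"), ("builtins", "float"),
        ("builtins", "bool"), ("builtins", "bytes"),
        ("numpy", "ndarray"), ("numpy", "dtype"),
        ("numpy.core.multiarray", "_reconstruct"),
        ("numpy.core.multiarray", "scalar"),
        ("numpy._core.multiarray", "_reconstruct"),
        ("numpy._core.multiarray", "scalar"),
    })

    def find_class(self, module, name):
        # Called for every global the stream references; reject unknown ones
        # before they are imported or called.
        if (module, name) not in self._ALLOWED:
            raise pickle.UnpicklingError(
                f"blocked global during unpickling: {module}.{name}")
        return super().find_class(module, name)


def safe_load(file_obj):
    """Load a pickle stream from an open binary file through the allowlist."""
    return SafeUnpickler(file_obj).load()


class EvalPayload:
    """Harmless stand-in for a malicious object: asks the loader to call eval."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Plain containers round-trip because they need no global lookups.
benign = pickle.dumps({"weights": [1.0, 2.0], "step": 3})
restored = safe_load(io.BytesIO(benign))

# The eval payload is rejected inside find_class(), so eval never runs.
try:
    safe_load(io.BytesIO(pickle.dumps(EvalPayload())))
    blocked_message = None
except pickle.UnpicklingError as exc:
    blocked_message = str(exc)

print(restored)
print(blocked_message)
```

The check runs on the (module, name) pair rather than on the resolved object, so a rejected module is never imported.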

# After — allowlist-gated deserialization
checkpoint_data = _SafeUnpickler(checkpoint_file).load()

This also removes the dill dependency from the mmv module entirely.

Impact

CVSS 3.1: AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H — 7.8 HIGH

A user who runs eval_ucf101.py --checkpoint_path <attacker-controlled-path> loads and executes the malicious payload. This scenario is typical in shared compute environments (university clusters, cloud notebooks) where users exchange checkpoint files.

Testing

The allowlist covers all types written by the original MMV checkpoint-saving code. If any additional numpy or JAX types are encountered, extend _SafeUnpickler._ALLOWED accordingly — the error message names the blocked module.class for easy diagnosis.
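
The diagnose-and-extend loop described above can be exercised with a small check. This is a sketch under stated assumptions: AllowlistUnpickler, ExtendedUnpickler, and try_load are hypothetical names rather than the PR's _SafeUnpickler, the allowlist is deliberately tiny, and collections.OrderedDict merely stands in for whatever extra type a real checkpoint might contain.

```python
import collections
import io
import pickle


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical stand-in for the PR's _SafeUnpickler."""

    _ALLOWED = frozenset({("builtins", "dict"), ("builtins", "list")})

    def find_class(self, module, name):
        if (module, name) not in self._ALLOWED:
            # Name the blocked module.class so the allowlist gap is obvious.
            raise pickle.UnpicklingError(f"blocked: {module}.{name}")
        return super().find_class(module, name)


class ExtendedUnpickler(AllowlistUnpickler):
    """Same gate, with one extra type added after reading the error."""

    _ALLOWED = AllowlistUnpickler._ALLOWED | {("collections", "OrderedDict")}


def try_load(unpickler_cls, payload):
    """Return (value, None) on success or (None, error message) if blocked."""
    try:
        return unpickler_cls(io.BytesIO(payload)).load(), None
    except pickle.UnpicklingError as exc:
        return None, str(exc)


payload = pickle.dumps(collections.OrderedDict(step=1))

# First attempt fails and reports which global was rejected.
value, error = try_load(AllowlistUnpickler, payload)
print(error)

# After extending the allowlist, the same payload loads.
value2, error2 = try_load(ExtendedUnpickler, payload)
print(value2)
```

Extending through a subclass keeps the base allowlist unchanged, which makes each addition easy to review on its own.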

Like pickle.load(), dill.load() can execute arbitrary Python code embedded
in the checkpoint file it loads. A malicious or compromised checkpoint
at --checkpoint_path will achieve full RCE on the loading host.

Replace with _SafeUnpickler, a stdlib-pickle subclass that restricts
find_class() to only the types present in MMV params/state dicts
(nested dicts of numpy arrays). No dill dependency needed.
@polarbe

polarbe commented Apr 30, 2026 via email
