Skip to content

fix: harden checkpoint config target validation#4419

Closed
chtruong814 wants to merge 4 commits into
mainfrom
chtruong/register-allowed
Closed

fix: harden checkpoint config target validation#4419
chtruong814 wants to merge 4 commits into
mainfrom
chtruong/register-allowed

Conversation

@chtruong814

@chtruong814 chtruong814 commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Harden Bridge config instantiation by validating all target entries against a stable allowlist before any target can run.
  • Block config-driven allowlist mutation primitives, including register_allowed_target_prefix and target_allowlist mutators.
  • Validate builder checkpoint metadata before ModelConfig.from_dict so checkpoint run_config.yaml cannot bypass the instantiate allowlist.
  • Add regression coverage for sibling-order allowlist mutation, OmegaConf configs, kwargs, instantiate_node, args/list traversal, upstream allowlist mutators, and checkpoint metadata target/builder rejection.

Validation

  • PRE_COMMIT_HOME=/private/tmp/nemo-lm-pre-commit-cache pre-commit run --all-files
  • PRE_COMMIT_HOME=/private/tmp/nemo-lm-pre-commit-cache pre-commit run --files tests/unit_tests/utils/test_instantiate_utils.py
  • UV_CACHE_DIR=.uv-cache uv run pre-commit run --all-files was attempted but blocked on macOS/arm64 because nvidia-resiliency-ext==0.6.0 has no compatible wheel.
  • Targeted uv pytest commands were attempted but hit the same local macOS/arm64 dependency resolution blocker before collection.

Notes

  • The current branch head is 0a18996. If CI does not start automatically, use /ok to test 0a18996.

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814 chtruong814 changed the title fix(security): harden checkpoint config target validation fix: harden checkpoint config target validation Jun 19, 2026
Comment thread src/megatron/bridge/utils/instantiate_utils.py
Comment thread src/megatron/bridge/utils/instantiate_utils.py
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Light Review

The snapshot-based preflight approach is well designed — capturing the allowlist state before traversal closes the TOCTOU window where a config entry could extend the allowlist for a later sibling. The blocked-targets/blocked-prefixes denylist covers both Bridge and upstream MLM mutation vectors (register_allowed_target_prefix, target_allowlist.*). Test coverage for the core attack vectors is thorough.

Two coverage gaps flagged as inline comments:

  1. instantiate_node has no test — the wrapper does preflight validation but if it were accidentally removed, no test would catch the regression.
  2. kwargs preflight in instantiate() has no test — line 192 validates kwargs, which is a real attack surface, but no test exercises it.

No bugs or logic errors found in the implementation.

Suggested test cases

No perf tests impacted.

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant