Skip to content

fix: flag import-only custom pickle globals#1499

Draft
mldangelo-oai wants to merge 1 commit into
mainfrom
mdangelo/codex/fix-picklescan-import-only-globals-c095
Draft

fix: flag import-only custom pickle globals#1499
mldangelo-oai wants to merge 1 commit into
mainfrom
mdangelo/codex/fix-picklescan-import-only-globals-c095

Conversation

@mldangelo-oai
Copy link
Copy Markdown
Contributor

Summary:\n- flag non-allowlisted import-only pickle GLOBAL references in native picklescan\n- suppress source-available benign custom modules in the Python wrapper\n- add bytecode-only custom-module and allowlisted OrderedDict regressions\n\nValidation:\n- maturin develop --release\n- targeted pytest regressions\n- cargo fmt/check/clippy/test\n- ruff format/check, mypy\n- full non-slow pytest: 7712 passed, 16 skipped

@mldangelo-oai
Copy link
Copy Markdown
Contributor Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 765.85ms -> 783.30ms (+2.3%).

Workload Benchmark Target Size Files Baseline Current Change Status
suspicious-pickle-intake tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake suspicious-intake 183.8 KiB 4 101.38ms 109.43ms +7.9% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] nested_base64 98 B 1 490.8us 476.2us -3.0% stable
chunked-upload-stream tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream chunked_stream 278.2 KiB 1 23.10ms 23.75ms +2.8% stable
duplicate-heavy-registry tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot registry-snapshot 915.2 KiB 13 212.03ms 217.01ms +2.3% stable
mixed-model-repository tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository release-candidate 547.3 KiB 32 292.16ms 295.59ms +1.2% stable
single-checkpoint-preflight tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load single_checkpoint.pkl 183.0 KiB 1 39.94ms 40.33ms +1.0% stable
direct-malicious-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload malicious_reduce 52 B 1 1.59ms 1.57ms -1.0% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] nested_raw 78 B 1 485.5us 481.6us -0.8% stable
clean-training-checkpoint tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint safe_large 278.2 KiB 1 20.38ms 20.23ms -0.8% stable
padded-multi-stream-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload multi_stream_padded 4.1 KiB 1 1.67ms 1.65ms -0.7% stable
warm-cache-rescan tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan release-candidate 547.3 KiB 32 72.14ms 72.29ms +0.2% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] nested_hex 130 B 1 484.0us 484.8us +0.2% stable

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7b0c36a49

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1118 to +1120
if source_unavailable_reason is None:
changed = True
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep source-available import globals under review

When a pickle contains only a GLOBAL for a custom module whose .py source is present, this branch drops the Rust warning solely because the source can be parsed. But unpickling that GLOBAL still imports the module and executes top-level initialization code, and the call-graph enrichment here does not inspect import-time statements for import-only references; I verified a payload cmodule\nGadget\n. with a top-level file write returns CLEAN from scan_bytes() while pickle.loads() performs the write. This reopens the import-time execution bypass for source-available custom modules, so the warning should only be suppressed after proving module initialization is inert.

Useful? React with 👍 / 👎.

Comment on lines +1118 to +1120
if source_unavailable_reason is None:
changed = True
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not suppress globals for unresolved modules

If the referenced custom module is not importable in the scanner's environment, _call_graph_source_unavailable_reason() returns None after both spec lookups miss, so this branch removes the new NON_ALLOWLISTED_GLOBAL warning and the report becomes CLEAN. That is a common deployment case when scanning third-party model artifacts without their private code installed, but unpickling the same GLOBAL in the target environment will still import that module and run its initialization. Missing source/specs should keep the warning (or fail closed), not be treated the same as a proven-safe built-in.

Useful? React with 👍 / 👎.

Comment on lines +11 to +12
!IMPORT_ONLY_GLOBAL_ALLOWLIST_MODULES.contains(&module)
&& !IMPORT_ONLY_GLOBAL_ALLOWLIST_MODULES.contains(&top_level_module)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid allowlisting every submodule of broad packages

Because this checks the top-level package name, any import-only global under packages like numpy, torch, joblib, or click is treated as allowlisted even when the actual module is an arbitrary submodule such as numpy.evil. A pickle GLOBAL imports that full submodule name before resolving the attribute, so this skips the new review warning for non-standard submodules that can still execute import-time code in the target environment. The allowlist should be exact or prefix-specific for known-safe modules rather than trusting every child of these large packages.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant