fix: flag import-only custom pickle globals#1499
Conversation
|
@codex review |
Performance BenchmarksCompared
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7b0c36a49
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if source_unavailable_reason is None: | ||
| changed = True | ||
| continue |
There was a problem hiding this comment.
Keep source-available import globals under review
When a pickle contains only a GLOBAL for a custom module whose .py source is present, this branch drops the Rust warning solely because the source can be parsed. But unpickling that GLOBAL still imports the module and executes top-level initialization code, and the call-graph enrichment here does not inspect import-time statements for import-only references; I verified a payload cmodule\nGadget\n. with a top-level file write returns CLEAN from scan_bytes() while pickle.loads() performs the write. This reopens the import-time execution bypass for source-available custom modules, so the warning should only be suppressed after proving module initialization is inert.
Useful? React with 👍 / 👎.
| if source_unavailable_reason is None: | ||
| changed = True | ||
| continue |
There was a problem hiding this comment.
Do not suppress globals for unresolved modules
If the referenced custom module is not importable in the scanner's environment, _call_graph_source_unavailable_reason() returns None after both spec lookups miss, so this branch removes the new NON_ALLOWLISTED_GLOBAL warning and the report becomes CLEAN. That is a common deployment case when scanning third-party model artifacts without their private code installed, but unpickling the same GLOBAL in the target environment will still import that module and run its initialization. Missing source/specs should keep the warning (or fail closed), not be treated the same as a proven-safe built-in.
Useful? React with 👍 / 👎.
| !IMPORT_ONLY_GLOBAL_ALLOWLIST_MODULES.contains(&module) | ||
| && !IMPORT_ONLY_GLOBAL_ALLOWLIST_MODULES.contains(&top_level_module) |
There was a problem hiding this comment.
Avoid allowlisting every submodule of broad packages
Because this checks the top-level package name, any import-only global under packages like numpy, torch, joblib, or click is treated as allowlisted even when the actual module is an arbitrary submodule such as numpy.evil. A pickle GLOBAL imports that full submodule name before resolving the attribute, so this skips the new review warning for non-standard submodules that can still execute import-time code in the target environment. The allowlist should be exact or prefix-specific for known-safe modules rather than trusting every child of these large packages.
Useful? React with 👍 / 👎.
Summary:\n- flag non-allowlisted import-only pickle GLOBAL references in native picklescan\n- suppress source-available benign custom modules in the Python wrapper\n- add bytecode-only custom-module and allowlisted OrderedDict regressions\n\nValidation:\n- maturin develop --release\n- targeted pytest regressions\n- cargo fmt/check/clippy/test\n- ruff format/check, mypy\n- full non-slow pytest: 7712 passed, 16 skipped