Skip to content

fix: detect newline-separated picklescan calls#1481

Open
mldangelo-oai wants to merge 2 commits into
mainfrom
mdangelo/codex/fix-picklescan-newline-code-string-c105
Open

fix: detect newline-separated picklescan calls#1481
mldangelo-oai wants to merge 2 commits into
mainfrom
mdangelo/codex/fix-picklescan-newline-code-string-c105

Conversation

@mldangelo-oai
Copy link
Copy Markdown
Contributor

Summary

  • treat newline-leading suffixes that start another call expression as code-like, including dotted calls
  • add malicious positive and prose near-match Rust regression coverage
  • update the picklescan changelog

Validation

  • cargo fmt --check
  • cargo check
  • cargo clippy --all-targets -- -D warnings
  • cargo test
  • VIRTUAL_ENV=/Users/mdangelo/code/modelaudit/.venv UV_CACHE_DIR=/private/tmp/modelaudit-uv-cache /Users/mdangelo/code/modelaudit/.venv/bin/python -m maturin develop --release
  • PYTHONPATH=/private/tmp/modelaudit-c105 PROMPTFOO_DISABLE_TELEMETRY=1 /Users/mdangelo/code/modelaudit/.venv/bin/pytest packages/modelaudit-picklescan/tests/test_native_interface.py -q
  • ruff format/check and mypy for touched Python package paths
  • git diff --check

@mldangelo-oai
Copy link
Copy Markdown
Contributor Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 31, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 755.81ms -> 765.98ms (+1.3%).

Workload Benchmark Target Size Files Baseline Current Change Status
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] nested_raw 78 B 1 394.9us 415.2us +5.1% stable
chunked-upload-stream tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream chunked_stream 278.2 KiB 1 22.41ms 23.33ms +4.1% stable
suspicious-pickle-intake tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake suspicious-intake 183.8 KiB 4 99.53ms 102.07ms +2.6% stable
mixed-model-repository tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository release-candidate 547.3 KiB 32 288.25ms 294.59ms +2.2% stable
single-checkpoint-preflight tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load single_checkpoint.pkl 183.0 KiB 1 40.97ms 40.42ms -1.3% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] nested_base64 98 B 1 412.4us 417.9us +1.3% stable
clean-training-checkpoint tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint safe_large 278.2 KiB 1 19.83ms 19.57ms -1.3% stable
padded-multi-stream-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload multi_stream_padded 4.1 KiB 1 1.53ms 1.51ms -1.1% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] nested_hex 130 B 1 415.4us 419.9us +1.1% stable
duplicate-heavy-registry tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot registry-snapshot 915.2 KiB 13 215.58ms 217.06ms +0.7% stable
warm-cache-rescan tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan release-candidate 547.3 KiB 32 65.07ms 64.73ms -0.5% stable
direct-malicious-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload malicious_reduce 52 B 1 1.44ms 1.44ms +0.1% stable

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0247e2e2dc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/modelaudit-picklescan/rust/src/strings.rs Outdated
@mldangelo-oai mldangelo-oai marked this pull request as ready for review May 31, 2026 12:37
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 441ca7729a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return false;
}
}
_ => return false,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat subscripted follow-up calls as code

When the newline-separated statement starts with a subscripted callable, e.g. eval(payload)\ncallbacks[0](), this branch returns false after the first identifier because only ( and . are recognized. That sends control back to call_like_has_prose_suffix, which treats the alphabetic suffix as prose and skips the dangerous eval match, so a valid two-line code string can still scan clean.

Useful? React with 👍 / 👎.

Comment on lines +615 to +617
while chars.peek().is_some_and(|ch| is_python_word_char(*ch)) {
chars.next();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound suffix identifier lookahead

For a literal such as eval(x)\n followed by a long alphanumeric run that is not a call, the new suffix parser consumes the entire remaining identifier before returning false. Because this runs inside contains_call_like for each candidate/pattern and string scanning can pass multi-megabyte windows, a crafted pickle can force repeated full-window scans just to classify the suffix as prose; capping this lookahead would preserve the existing bounded-resource behavior.

Useful? React with 👍 / 👎.

@ianw-oai
Copy link
Copy Markdown
Contributor

ianw-oai commented Jun 1, 2026

Potential regression: the new newline heuristic appears to widen false positives for prose-like parenthetical text, and I did not see coverage for that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants