[Feat] Vllm Dumper#1507
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis PR adds a vLLM-based hidden states extraction script and a corresponding parity test. The script loads conversations, tokenizes with loss masks, uses vLLM speculative decoding to extract hidden states, reshapes outputs, and saves results as ChangesvLLM Hidden States Collection and Parity Testing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1507 +/- ##
==========================================
- Coverage 76.93% 76.75% -0.19%
==========================================
Files 474 475 +1
Lines 51506 51635 +129
==========================================
+ Hits 39625 39630 +5
- Misses 11881 12005 +124
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0761d1b to
4252397
Compare
|
e5c46c7 to
2bc8e4b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@examples/speculative_decoding/collect_hidden_states/compute_hidden_states_vllm.py`:
- Around line 242-247: Check and guard for output.kv_transfer_params being None
before calling .get(): inside the loop where hidden_states_path =
output.kv_transfer_params.get("hidden_states_path") is used, first verify that
output.kv_transfer_params is not None (and is a dict-like object) and if it is
None, print the same warning referencing conversation_id and continue; otherwise
retrieve hidden_states_path and proceed as before.
In `@tests/examples/speculative_decoding/test_collect_hidden_states_parity.py`:
- Around line 96-97: The test unsafely deserializes artifacts using
torch.load(..., weights_only=False) for variables pt_hf and pt_vl; change those
calls to torch.load(..., weights_only=True) to avoid executing arbitrary pickled
objects (or, if full object deserialization is genuinely required, add an
explicit inline comment documenting the security rationale and why
weights_only=False is necessary). Update the two calls that set pt_hf and pt_vl
accordingly so the test only loads tensor weights unless a documented exception
is provided.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d92dafc3-8572-4c12-aacd-7afa56e100f3
📒 Files selected for processing (2)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_vllm.pytests/examples/speculative_decoding/test_collect_hidden_states_parity.py
| hidden_states_path = output.kv_transfer_params.get("hidden_states_path") | ||
| if hidden_states_path is None: | ||
| print( | ||
| f"Warning: no hidden_states_path for conversation {conversation_id}, skipping" | ||
| ) | ||
| continue |
There was a problem hiding this comment.
Guard against kv_transfer_params being None.
If output.kv_transfer_params is None (e.g., when kv_transfer is not configured or an error occurs), calling .get() on it will raise an AttributeError. Consider checking for None before accessing the dictionary method.
Proposed fix
- hidden_states_path = output.kv_transfer_params.get("hidden_states_path")
- if hidden_states_path is None:
+ kv_params = getattr(output, "kv_transfer_params", None) or {}
+ hidden_states_path = kv_params.get("hidden_states_path")
+ if not hidden_states_path:
print(
f"Warning: no hidden_states_path for conversation {conversation_id}, skipping"
)
continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hidden_states_path = output.kv_transfer_params.get("hidden_states_path") | |
| if hidden_states_path is None: | |
| print( | |
| f"Warning: no hidden_states_path for conversation {conversation_id}, skipping" | |
| ) | |
| continue | |
| kv_params = getattr(output, "kv_transfer_params", None) or {} | |
| hidden_states_path = kv_params.get("hidden_states_path") | |
| if not hidden_states_path: | |
| print( | |
| f"Warning: no hidden_states_path for conversation {conversation_id}, skipping" | |
| ) | |
| continue |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@examples/speculative_decoding/collect_hidden_states/compute_hidden_states_vllm.py`
around lines 242 - 247, Check and guard for output.kv_transfer_params being None
before calling .get(): inside the loop where hidden_states_path =
output.kv_transfer_params.get("hidden_states_path") is used, first verify that
output.kv_transfer_params is not None (and is a dict-like object) and if it is
None, print the same warning referencing conversation_id and continue; otherwise
retrieve hidden_states_path and proceed as before.
| pt_hf = torch.load(f_hf, map_location="cpu", weights_only=False) | ||
| pt_vl = torch.load(f_vl, map_location="cpu", weights_only=False) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify there are no explicit unsafe torch.load calls in Python files.
rg -nP --type=py 'torch\.load\([^)]*weights_only\s*=\s*False' -C2Repository: NVIDIA/Model-Optimizer
Length of output: 2186
Fix unsafe test artifact deserialization in speculative decoding parity test
Line 96-97 use torch.load(..., weights_only=False) without an inline documented exception rationale; update to weights_only=True (or add an explicit safety justification if full deserialization is required).
🔒 Suggested fix
- pt_hf = torch.load(f_hf, map_location="cpu", weights_only=False)
- pt_vl = torch.load(f_vl, map_location="cpu", weights_only=False)
+ pt_hf = torch.load(f_hf, map_location="cpu", weights_only=True)
+ pt_vl = torch.load(f_vl, map_location="cpu", weights_only=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/examples/speculative_decoding/test_collect_hidden_states_parity.py`
around lines 96 - 97, The test unsafely deserializes artifacts using
torch.load(..., weights_only=False) for variables pt_hf and pt_vl; change those
calls to torch.load(..., weights_only=True) to avoid executing arbitrary pickled
objects (or, if full object deserialization is genuinely required, add an
explicit inline comment documenting the security rationale and why
weights_only=False is necessary). Update the two calls that set pt_hf and pt_vl
accordingly so the test only loads tensor weights unless a documented exception
is provided.
What does this PR do?
Type of change: ?
Usage
# Add a code snippet demonstrating how to use thisTesting
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
Release Notes
New Features
Tests