[#12699][feat] AutoDeploy: Support Piecewise CG for VLMs #12749
Closed
nvchenghaoz wants to merge 30 commits into NVIDIA:main from nv-auto-deploy:chenghao/piecewise_update_0402
Commits (30):
7543abb  ep1 (taylor-yb-lee)
b65229b  moe sharding tp8 (taylor-yb-lee)
9ead146  Exclude lm_head from cuda graph (taylor-yb-lee)
0ed7977  Turn on lm_head_sharding (taylor-yb-lee)
8bcf4dd  Fix tp8 sharding for fused moe checkpoint (taylor-yb-lee)
7e0172a  Added qwen3.5 config for long context length (taylor-yb-lee)
f4cb16f  Qwen3.5 configs (taylor-yb-lee)
693ed16  Added comment (taylor-yb-lee)
331af8a  Added moe sharding for tp8/ep1 (taylor-yb-lee)
4a0cf8d  Added unittest for tp sharding for NVFP4 MoE (taylor-yb-lee)
93b48a2  - Revert freemem size (taylor-yb-lee)
3777753  revert graphting of lm_head and add lm_head in the model graph (taylor-yb-lee)
e872bb8  The fix adds a text-only fast path at the top of forward() that: (taylor-yb-lee)
22e7154  Allow quwen3.5 to use AutoModelr (taylor-yb-lee)
ee96ad4  Fix piecewise CUDA graph for Qwen3.5 MoE (taylor-yb-lee)
0a0ccf8  config for text only case (taylor-yb-lee)
aaf33aa  removed unnecessary code (taylor-yb-lee)
22ed629  Rename variable (taylor-yb-lee)
f92e78e  Add assert (taylor-yb-lee)
5d589ae  Revised comment and added a method to clarify is_full_model (taylor-yb-lee)
7cc985d  Revert fast path (it does not affect performance) (taylor-yb-lee)
14c5351  Fixed set_output_embeddings() no longer updates module actually used … (taylor-yb-lee)
c26c050  remove tp8 config (taylor-yb-lee)
bac09ce  Extract text-model (graph module) in Qwen3.5 model for enabling piece… (taylor-yb-lee)
fa5f485  Remove picewise cudagraph w/a (taylor-yb-lee)
c44662d  Revert unnecessary change for using AutoModel for Qwen3.5 text model (taylor-yb-lee)
ce45322  Update Piecewise CG to support VLMs (nvchenghaoz)
0041583  address CR's reviews (nvchenghaoz)
d2e6fa4  Merge branch 'main' into chenghao/piecewise_update_0402 (nvchenghaoz)
ee798a3  further address CR's review (nvchenghaoz)
Files changed:
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (335 changes: 282 additions & 53 deletions)