Update vllm rule for better EP performance#3718

Open
NuojCheng wants to merge 1 commit into main from chengnuojin-vllm-rule-update

@NuojCheng NuojCheng commented Apr 22, 2026

Description

This PR refactors the vLLM logical rules and enables attn dp expert for optimal inference performance.

It also renames calls from create_nnx_model to from_pretrained, following the changes in #3450.
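For readers unfamiliar with logical axis rules, the following is a minimal sketch of the general idea: each rule maps a logical axis name to one or more mesh axes, and the first usable rule wins. It is a simplified illustration, not Flax's or MaxText's implementation, and the rule names and mesh axis names (`activation_batch`, `exp`, `attn_dp`, `expert`, and so on) are hypothetical placeholders rather than the rules changed in this PR.

```python
from typing import Optional, Sequence, Tuple, Union

# A mesh assignment is a single mesh axis, a tuple of mesh axes, or None (replicated).
MeshAxes = Union[str, Tuple[str, ...], None]
Rule = Tuple[str, MeshAxes]


def resolve_logical_axes(
    logical_names: Sequence[Optional[str]], rules: Sequence[Rule]
) -> Tuple[MeshAxes, ...]:
  """Maps each logical axis name to mesh axes using first-match-wins rules.

  A mesh axis may shard at most one dimension of a tensor, so a rule whose
  mesh axes were already claimed by an earlier dimension is skipped.
  Dimensions with no usable rule stay replicated (None).
  """
  used = set()
  resolved = []
  for name in logical_names:
    choice = None
    for rule_name, mesh_axes in rules:
      if rule_name != name or mesh_axes is None:
        continue
      axes = (mesh_axes,) if isinstance(mesh_axes, str) else tuple(mesh_axes)
      if used.isdisjoint(axes):
        used.update(axes)
        choice = mesh_axes
        break
    resolved.append(choice)
  return tuple(resolved)


# Hypothetical rules: names are illustrative only, not taken from this PR.
EXAMPLE_RULES = [
    ("activation_batch", ("data", "attn_dp")),
    ("activation_batch", "data"),
    ("exp", "expert"),
    ("mlp", "model"),
    ("embed", None),
]

# Sharding for a hypothetical expert weight of shape [exp, embed, mlp].
expert_weight_axes = resolve_logical_axes(("exp", "embed", "mlp"), EXAMPLE_RULES)
```

In this sketch the order of the rules matters: listing a more aggressive mapping first and a fallback second lets the same rule table serve tensors with different layouts.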

Tests

vLLM test: https://paste.googleplex.com/4812933185011712

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@NuojCheng NuojCheng force-pushed the chengnuojin-vllm-rule-update branch from ad39294 to 92cc4b5 on April 22, 2026 16:36

@NicoGrande NicoGrande left a comment

LGTM!

@NuojCheng NuojCheng force-pushed the chengnuojin-vllm-rule-update branch from 92cc4b5 to 9e3afc1 on April 22, 2026 21:39

  with self.mesh, nn.logical_axis_rules(self.maxtext_config.logical_axis_rules):
-   model, _ = model_creation_utils.create_nnx_model(
+   model = model_creation_utils.from_pretrained(
Collaborator

why is this part of the PR? Is this some artifact of rebasing?

Collaborator Author

Not from rebasing. vLLM decoding currently fails at head, so I put the fix in the same PR to be able to run the decoding test.

Collaborator

Mind adding this info to the PR description?

Collaborator Author

> Mind adding this info to the PR description?

Done

4 participants