rollout ep#9392
Code Review
This pull request introduces the vllm_enable_expert_parallel parameter to support Expert Parallel (EP) for MoE models, including documentation updates and validation logic in deploy_args.py. Feedback was provided regarding the ambiguity of the warning message when automatically overriding configurations and the inconsistency of this auto-fix approach compared to other argument validations that raise errors for incompatible settings.
logger.warning('vllm_enable_expert_parallel is only supported with vllm_use_async_engine, '
               'set vllm_use_async_engine to True.')
The warning message is slightly ambiguous as it sounds like an instruction to the user, even though the code has already performed the override. It would be clearer to state that the setting is being changed automatically.
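A minimal sketch of the clearer wording, assuming the override happens on a simple dataclass. Only the two field names come from the PR; the class `DeployArgsSketch` and the method `apply_expert_parallel_override` are hypothetical stand-ins, not the real deploy_args.py code. The message reports what the code already did instead of telling the user what to do.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class DeployArgsSketch:
    # Hypothetical stand-in for the real deploy arguments; only these
    # two field names are taken from the PR under review.
    vllm_enable_expert_parallel: bool = False
    vllm_use_async_engine: bool = False

    def apply_expert_parallel_override(self) -> None:
        # Expert parallel needs the async engine: switch it on and say
        # explicitly that the value was changed automatically.
        if self.vllm_enable_expert_parallel and not self.vllm_use_async_engine:
            self.vllm_use_async_engine = True
            logger.warning('vllm_enable_expert_parallel requires vllm_use_async_engine; '
                           'vllm_use_async_engine has been set to True automatically.')


args = DeployArgsSketch(vllm_enable_expert_parallel=True)
args.apply_expert_parallel_override()
print(args.vllm_use_async_engine)  # True
```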
Additionally, this auto-fix logic is inconsistent with the handling of multi_turn_scheduler in _check_args (lines 136-137), which raises a ValueError for the same incompatibility. For a more robust and consistent implementation, consider integrating this dependency into _set_default_engine_type for defaulting and _check_args for validation.
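One way the suggested split could look, sketched under assumptions: the real signatures of `_set_default_engine_type` and `_check_args` are not shown in the review, so the dataclass below is hypothetical, and treating `vllm_use_async_engine=None` as "not set by the user" (with a non-EP default of False) is an assumed convention. Defaulting fills in the async engine only when the user left it unset; validation raises a ValueError when the user explicitly chose a conflicting value, mirroring the multi_turn_scheduler check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployArgsSketch:
    # Hypothetical stand-in; field names come from the PR, the rest is assumed.
    vllm_enable_expert_parallel: bool = False
    # None means "not set by the user", so defaulting can still decide.
    vllm_use_async_engine: Optional[bool] = None

    def __post_init__(self) -> None:
        self._set_default_engine_type()
        self._check_args()

    def _set_default_engine_type(self) -> None:
        # Defaulting: only fill in a value the user did not set explicitly.
        # Assumption: without expert parallel the default is False.
        if self.vllm_use_async_engine is None:
            self.vllm_use_async_engine = self.vllm_enable_expert_parallel

    def _check_args(self) -> None:
        # Validation: an explicit conflicting setting is an error, not a silent fix.
        if self.vllm_enable_expert_parallel and not self.vllm_use_async_engine:
            raise ValueError('vllm_enable_expert_parallel requires vllm_use_async_engine=True.')


print(DeployArgsSketch(vllm_enable_expert_parallel=True).vllm_use_async_engine)  # True
try:
    DeployArgsSketch(vllm_enable_expert_parallel=True, vllm_use_async_engine=False)
except ValueError as err:
    print(err)
```

With this split, users who never touch vllm_use_async_engine get a working configuration, while an explicit contradiction fails loudly instead of being overridden.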
References
- Maintain consistency in argument validation and defaulting logic across similar parameters.
No description provided.