[core] feat: support async scheduling with structured outputs#1582
Conversation
Signed-off-by: AlpinDale <alpindale@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a significant refactoring to support asynchronous scheduling with structured outputs. The core change involves splitting the model execution into a two-step process: execute_model for the forward pass and sample_tokens for sampling. This is a clean approach that allows for grammar bitmask computation between the two steps. The changes are consistently applied across the scheduler, engine core, executors, and model runners. I've found one critical issue that could lead to a crash in distributed setups.
| assert model_runner_output is not None | ||
| kv_output = model_runner_output.kv_connector_output | ||
| if not kv_output: | ||
| continue |
There was a problem hiding this comment.
The assertion assert model_runner_output is not None can fail. The outputs list can contain None values from workers that did not produce an output, which is a valid scenario with the new changes. This assertion will cause a crash in distributed setups. You should handle None values gracefully by skipping them.
| assert model_runner_output is not None | |
| kv_output = model_runner_output.kv_connector_output | |
| if not kv_output: | |
| continue | |
| if model_runner_output is None: | |
| continue | |
| kv_output = model_runner_output.kv_connector_output | |
| if not kv_output: | |
| continue |
…Runner Signed-off-by: AlpinDale <alpindale@gmail.com>
No description provided.