Commit 78c936d
[Integration] Expose length-aware batching in all ModelHandler subclasses
Completes the smart bucketing feature (#37531) by exposing the
batch_length_fn and batch_bucket_boundaries parameters across all
concrete ModelHandler implementations.
This allows users to enable length-aware batching on supported
inference backends by passing these parameters directly to the handler
constructor.
- adds batch_length_fn / batch_bucket_boundaries to 16 handler classes
- wires Gemini and Vertex AI batching params into _batching_kwargs
- adds end-to-end RunInference coverage for length-aware batching
- adds per-handler forwarding regression tests and fixes them to be
  hermetic
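To illustrate what the two parameters control, here is a minimal, self-contained sketch of length-aware bucketed batching. The parameter names batch_length_fn and batch_bucket_boundaries come from this commit; the bucket_batches helper itself is a hypothetical illustration of the technique, not Beam's actual implementation.

```python
import bisect
from typing import Callable, Iterable, Iterator, List, Sequence


def bucket_batches(
    elements: Iterable[str],
    batch_length_fn: Callable[[str], int],
    batch_bucket_boundaries: Sequence[int],
    max_batch_size: int = 4,
) -> Iterator[List[str]]:
    """Group elements into batches of similar length.

    Each element is assigned to a bucket based on where its length falls
    relative to batch_bucket_boundaries; only elements in the same bucket
    are batched together, which keeps padding overhead low for backends
    that pad a batch to its longest element.
    """
    buckets: dict[int, List[str]] = {}
    for element in elements:
        # Find the first boundary >= the element's length.
        idx = bisect.bisect_left(batch_bucket_boundaries, batch_length_fn(element))
        batch = buckets.setdefault(idx, [])
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            buckets[idx] = []
    # Flush any partially filled buckets.
    for batch in buckets.values():
        if batch:
            yield batch


# Short and long strings land in different buckets, so they are
# never padded against each other.
batches = list(
    bucket_batches(
        ["a", "bb", "ccc", "dddd"],
        batch_length_fn=len,
        batch_bucket_boundaries=[2, 4],
        max_batch_size=2,
    )
)
print(batches)  # → [['a', 'bb'], ['ccc', 'dddd']]
```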