Commit 78c936d
[Integration] Expose length-aware batching in all ModelHandler subclasses
Completes the smart bucketing feature (#37531) by exposing the
batch_length_fn and batch_bucket_boundaries parameters across all
concrete ModelHandler implementations.
This allows users to enable length-aware batching on supported
inference backends by passing these parameters directly to the handler
constructor.
- adds batch_length_fn / batch_bucket_boundaries to 16 handler classes
- wires Gemini and Vertex AI batching params into _batching_kwargs
- adds end-to-end RunInference coverage for length-aware batching
- adds per-handler forwarding regression tests and fixes them to be
  hermetic
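To illustrate what the two parameters control, here is a minimal, self-contained sketch of length-aware bucketed batching. The parameter names batch_length_fn and batch_bucket_boundaries come from this commit; the bucket_batches helper itself is a hypothetical illustration of the technique, not Beam's actual implementation.

```python
import bisect
from typing import Callable, Iterable, Iterator, List, Sequence


def bucket_batches(
    elements: Iterable[str],
    batch_length_fn: Callable[[str], int],
    batch_bucket_boundaries: Sequence[int],
    max_batch_size: int = 4,
) -> Iterator[List[str]]:
    """Group elements into batches of similar length.

    Each element is assigned to a bucket based on where its length falls
    relative to batch_bucket_boundaries; only elements in the same bucket
    are batched together, which keeps padding overhead low for backends
    that pad a batch to its longest element.
    """
    buckets: dict[int, List[str]] = {}
    for element in elements:
        # Find the first boundary >= the element's length.
        idx = bisect.bisect_left(batch_bucket_boundaries, batch_length_fn(element))
        batch = buckets.setdefault(idx, [])
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            buckets[idx] = []
    # Flush any partially filled buckets.
    for batch in buckets.values():
        if batch:
            yield batch


# Short and long strings land in different buckets, so they are
# never padded against each other.
batches = list(
    bucket_batches(
        ["a", "bb", "ccc", "dddd"],
        batch_length_fn=len,
        batch_bucket_boundaries=[2, 4],
        max_batch_size=2,
    )
)
print(batches)  # → [['a', 'bb'], ['ccc', 'dddd']]
```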