Add bounded batch parity for embeddings and transcription #3
Merged
Pass micro_batch_size matching the sweep batch size; aggregate per-vector success, latency, and tokens across full batch results (deduping shared-request metrics); fix vectors/s.
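The accounting described in this commit message can be illustrated with a small sketch. It is not the code in `src/infermesh/_cli_bench.py`; the `VectorResult` record, its fields, and the `summarize` function are hypothetical names chosen for illustration. The assumption is that every vector in a micro-batch carries the latency and token count of the request it shared, so those metrics have to be counted once per request while successes are counted once per vector.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VectorResult:
    """Hypothetical per-vector outcome; several vectors may share one request."""

    request_id: str
    ok: bool
    latency_s: float  # latency of the shared request
    tokens: int  # tokens reported for the shared request


def summarize(results: list[VectorResult], wall_time_s: float) -> dict[str, float]:
    succeeded = sum(1 for r in results if r.ok)
    # Request-level metrics repeat on every vector of a micro-batch,
    # so keep a single copy per request before aggregating them.
    per_request = {r.request_id: r for r in results if r.ok}
    latencies = [r.latency_s for r in per_request.values()]
    return {
        "submitted": len(results),
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
        "tokens": sum(r.tokens for r in per_request.values()),
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
        # Throughput counts successful vectors, not requests.
        "vectors_per_s": succeeded / wall_time_s if wall_time_s > 0 else 0.0,
    }
```

Without the dedupe step, a micro-batch of four vectors would report its token count four times, which is the kind of inflation the commit message appears to address.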
Pull request overview
This PR adds bounded/lazy batch execution parity across embeddings and transcription, including resilient embedding micro-batching with per-item failure isolation and per-item callbacks, plus updates to docs and CLI benchmark accounting to match the new behavior.
Changes:
- Refactors embedding + transcription request paths into private helpers and adds bounded batch runners with ordered results and per-item callbacks.
- Introduces `transcribe_batch`/`atranscribe_batch` APIs and extends embedding batch APIs with `micro_batch_size`, `on_progress`, and `on_result`.
- Updates CLI embed benchmarking to account for micro-batching/partial failures per vector, and adds tests/docs for the new batch behavior.
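As a rough illustration of what a bounded, lazy batch runner with ordered results and per-item callbacks might look like, here is a standalone sketch. It does not use the infermesh API: `run_bounded`, `max_in_flight`, and the callback signature are assumptions made for this example, and the real `on_result` contract may differ.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def run_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    *,
    max_in_flight: int = 4,
    on_result: Callable[[int, str | None, BaseException | None], None] | None = None,
) -> list[str | None]:
    """Run worker over items with at most max_in_flight tasks alive at once.

    Assumes max_in_flight >= 1. Failed items are left as None in the output.
    """
    results: list[str | None] = [None] * len(items)
    pending: dict[asyncio.Task[str], int] = {}
    next_index = 0

    while next_index < len(items) or pending:
        # Lazy admission: a task is created only when a slot is free.
        while next_index < len(items) and len(pending) < max_in_flight:
            task = asyncio.ensure_future(worker(items[next_index]))
            pending[task] = next_index
            next_index += 1
        done, _ = await asyncio.wait(list(pending), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            index = pending.pop(task)
            error = task.exception()
            if error is None:
                results[index] = task.result()
            # The callback fires in completion order; results stay in input order.
            if on_result is not None:
                on_result(index, results[index], error)
    return results
```

The design point is that a failure in one item is recorded and reported through the callback without stopping the rest of the batch, while the returned list keeps the input order regardless of which request finishes first.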
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_client_limits.py | Adds coverage ensuring embedding queue-wait metrics include scheduler delay under bounded admission. |
| tests/test_client_batch.py | Expands batch tests for embedding failure isolation, callbacks, strict-mode cancellation, and transcription batch behavior. |
| tests/test_cli_bench.py | Adds benchmark tests for micro-batch sizing propagation and partial-failure accounting. |
| tests/fakes.py | Extends fake CLI client to record micro_batch_size for embed batch calls. |
| src/infermesh/types.py | Updates batch-result docs and introduces callback type aliases for batch methods. |
| src/infermesh/client.py | Wires embedding/transcription through new helpers; adds transcription batch APIs and embedding micro-batch/callback options. |
| src/infermesh/_transcription.py | New bounded/lazy async transcription batch runner + single-item helper. |
| src/infermesh/_generation.py | Switches to shared task-cancellation helper for bounded generation batches. |
| src/infermesh/_embedding.py | New resilient embedding micro-batch runner with recursive failure isolation and per-item callbacks. |
| src/infermesh/_cli_bench.py | Adjusts embed benchmark accounting to count per-vector submissions/successes/failures; dedupes per-request stats. |
| src/infermesh/_batch_utils.py | Adds shared cancel_tasks helper for internal batch runners. |
| docs/guide.md | Documents callback contract parity and new embed/transcribe batch behavior. |
| README.md | Updates examples and guidance for micro-batched embedding and transcription batch usage/callback parity. |
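The "recursive failure isolation" mentioned for `src/infermesh/_embedding.py` can be pictured as a bisection: when a micro-batch request fails, it is split in half and each half is retried until the failing inputs are isolated. The sketch below is a synchronous illustration under that assumption; `embed_isolating` and the `embed_many` callable are hypothetical and are not taken from the PR.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

Vector = list[float]


def embed_isolating(
    texts: Sequence[str],
    embed_many: Callable[[Sequence[str]], list[Vector]],
) -> list[Vector | Exception]:
    """Return one vector or one exception per input, in input order."""
    if not texts:
        return []
    try:
        return list(embed_many(texts))
    except Exception as exc:
        if len(texts) == 1:
            # A single failing input cannot be split further; record its error.
            return [exc]
        mid = len(texts) // 2
        # Retry each half separately so good inputs are not lost
        # because they shared a request with a bad one.
        return embed_isolating(texts[:mid], embed_many) + embed_isolating(
            texts[mid:], embed_many
        )
```

One consequence of this approach is extra requests on failure: a single bad input in a batch of n costs on the order of log n additional calls, which is presumably why a later commit in this PR skips the split for retryable errors.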
…sult typing
- Skip recursive micro-batch split on retryable errors in `embed_batch`
- Use bare `raise` in transcription batch error handler
- Define `OnBatchResult` as PEP 695 generic type alias

Made-with: Cursor
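The bare-`raise` item can be sketched together with a task-cancellation helper of the kind the PR adds to `src/infermesh/_batch_utils.py`. The signature of `cancel_tasks` below is an assumption, and `run_strict` is a hypothetical stand-in for a strict-mode batch runner; neither is copied from the PR.

```python
from __future__ import annotations

import asyncio
from collections.abc import Coroutine, Iterable
from typing import Any


async def cancel_tasks(tasks: Iterable[asyncio.Task]) -> None:
    """Cancel unfinished tasks and wait until they have all settled."""
    pending = [t for t in tasks if not t.done()]
    for task in pending:
        task.cancel()
    # return_exceptions=True keeps CancelledError from propagating here.
    await asyncio.gather(*pending, return_exceptions=True)


async def run_strict(coros: Iterable[Coroutine[Any, Any, Any]]) -> list[Any]:
    """Run all coroutines; on the first failure, cancel the rest and re-raise."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        return await asyncio.gather(*tasks)
    except Exception:
        await cancel_tasks(tasks)
        # Bare raise re-raises the exception being handled with its
        # original traceback, instead of rebinding it via `raise exc`.
        raise
```

Waiting for the cancelled tasks to settle before re-raising avoids leaving work running in the background after the batch call has already reported failure.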
446a356 to e6cf0c1
Summary
`LMClient` methods explicit, and update docs plus benchmark accounting
Testing
uv run pytest -q
uv run pre-commit run -a