Add AudioData.split() for chunking large audio data #896
Open
ftnext wants to merge 2 commits into
`AudioData.split(max_bytes, *, silence_aware=False)` returns a list of chunks whose WAV-serialized size is within `max_bytes`. Useful for feeding oversized recordings to APIs with strict upload limits (e.g., OpenAI Whisper's 25MB cap).

Two strategies:

- silence_aware=False (default): mechanical fixed-time split. No optional dependency required. Strict size cap.
- silence_aware=True: snaps boundaries to nearby silences via librosa.effects.split, looking only backward from the size-derived target so the cap is preserved. Requires the new `audio-split` extra (librosa, numpy); surfaces lazy/numba init failures as SetupError.

Sample-aligned frame_data is required so the byte budget is a hard ceiling in both modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
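The "looking only backward" rule can be illustrated with a small standalone sketch. It is not the code in this PR: the PR detects silences with `librosa.effects.split`, whereas this sketch substitutes a plain amplitude threshold so it runs without extra dependencies. The function name `snap_back_to_silence` and the parameters `lookback`, `threshold`, and `min_run` are hypothetical. Because the returned cut index never exceeds the size-derived `target`, a chunk that ends there cannot grow past the byte budget.

```python
def snap_back_to_silence(samples, target, lookback, threshold=500, min_run=160):
    """Return a cut index <= target, preferring the nearest quiet run before it.

    samples   -- sequence of signed PCM sample values (hypothetical input format)
    target    -- size-derived boundary: the latest index where the chunk may end
    lookback  -- how many samples before target to search for silence
    threshold -- absolute amplitude at or below which a sample counts as quiet
    min_run   -- number of consecutive quiet samples that counts as a silence
    """
    start = max(0, target - lookback)
    run = 0
    # Walk backward from the target so the first quiet run found is the nearest one.
    for i in range(target - 1, start - 1, -1):
        if abs(samples[i]) <= threshold:
            run += 1
            if run >= min_run:
                # samples[i : i + min_run] are all quiet; cut in the middle of them.
                return i + min_run // 2
        else:
            run = 0
    # No silence within the lookback window: keep the mechanical boundary.
    return target
```

Searching forward as well could find a closer silence, but it would let a chunk exceed the cap, which is why only earlier positions are considered.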
- Add an audio-split entry to the extra-contracts matrix so the silence-aware code path runs against `.[dev,audio-split]` each PR.
- Include audio-split in the ubuntu all-extras install spec (skipped on 3.14 to match whisper-local).
- Ignore `.claude/` and `.cursor/` worktree state from local AI tooling so it cannot be committed by mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds `AudioData.split(max_bytes, *, silence_aware=False)` so users can chunk oversized recordings into pieces that fit within API upload limits (e.g., the 25 MB cap on OpenAI's Whisper transcription endpoint).

- `silence_aware=False` (default): mechanical fixed-time split. No optional dependency. Each chunk's WAV-serialized size is `<= max_bytes`.
- `silence_aware=True`: snaps chunk boundaries to nearby silences via `librosa.effects.split`, looking only backward from the size-derived target so the cap is preserved. Requires the new `audio-split` extra (`librosa`, `numpy`). Lazy / numba-cache initialization failures are translated into `SetupError`.

Sample-aligned `frame_data` is required (`split()` raises `ValueError` on unaligned input) so the byte budget is a hard ceiling in both modes.

Usage
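As a rough, standalone illustration of the fixed-time strategy and its byte budget, the sketch below uses only the standard library. It is not the implementation in this PR: `wav_size` and `split_fixed` are hypothetical helpers, and mono PCM audio is assumed. The idea is to subtract the WAV header size from `max_bytes`, round the remainder down to a whole number of samples, and slice `frame_data` in steps of that size.

```python
import io
import wave


def wav_size(frame_data: bytes, sample_rate: int, sample_width: int) -> int:
    """Return the size in bytes of frame_data serialized as a mono WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # assumption: mono audio
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(frame_data)
    return len(buf.getvalue())


def split_fixed(frame_data: bytes, sample_rate: int, sample_width: int,
                max_bytes: int) -> list[bytes]:
    """Slice frame_data so each slice, serialized as WAV, fits in max_bytes."""
    if len(frame_data) % sample_width != 0:
        raise ValueError("frame_data is not aligned to sample_width")
    # Measure the header overhead instead of hard-coding it.
    header = wav_size(b"", sample_rate, sample_width)
    samples_per_chunk = (max_bytes - header) // sample_width
    if samples_per_chunk < 1:
        raise ValueError("max_bytes cannot hold the WAV header plus one sample")
    step = samples_per_chunk * sample_width
    return [frame_data[i:i + step] for i in range(0, len(frame_data), step)]


# One second of 16-bit silence at 16 kHz, split under a 10,000-byte cap.
data = bytes(2 * 16000)
chunks = split_fixed(data, 16000, 2, 10_000)
for chunk in chunks:
    print(len(chunk), wav_size(chunk, 16000, 2))
```

Rounding down to whole samples is what makes alignment matter: if `frame_data` ended mid-sample, the final slice could not be serialized cleanly, so the sketch rejects such input up front, mirroring the `ValueError` described above.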
Test plan
- `tests/test_audio.py`:
  - `SetupError` on missing `librosa`, on lazy import failures, and on call-time numba / runtime errors
  - `ValueError` on unaligned `frame_data` and on too-small `max_bytes`
- `audio-split` entry in the `extra-contracts` matrix runs `pytest tests/test_audio.py` against `.[dev,audio-split]`.
- `audio-split` added to the Ubuntu `all-extras` install spec (skipped on 3.14 to match `whisper-local`).

🤖 Generated with Claude Code