Add AudioData.split() for chunking large audio data #896

Open
ftnext wants to merge 2 commits into Uberi:master from ftnext:feature/audio-data-split

Conversation

ftnext (Collaborator) commented on May 17, 2026

Summary

Adds AudioData.split(max_bytes, *, silence_aware=False) so users can chunk oversized recordings into pieces that fit within API upload limits (e.g., the 25 MB cap on OpenAI's Whisper transcription endpoint).

  • silence_aware=False (default): mechanical fixed-time split. No optional dependency. Each chunk's WAV-serialized size is <= max_bytes.
  • silence_aware=True: snaps chunk boundaries to nearby silences detected with librosa.effects.split, searching only backward from the size-derived target so the cap is preserved. Requires the new audio-split extra (librosa, numpy). Failures during lazy import or numba cache initialization are raised as SetupError.
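
To make the backward-only snapping concrete, here is a minimal sketch of the idea, not the code in this PR. The function name `snap_boundary_backward`, the `look_back` parameter, and the plain list of `(start, end)` tuples are assumptions made for illustration; the tuples mirror the shape of the non-silent intervals that `librosa.effects.split` returns.

```python
from typing import Optional, Sequence, Tuple


def snap_boundary_backward(
    target: int,
    chunk_start: int,
    speech_intervals: Sequence[Tuple[int, int]],
    look_back: int,
) -> int:
    """Return a cut point <= target, preferring the end of a speech interval.

    All values are sample indices. `speech_intervals` holds (start, end)
    non-silent regions. `look_back` is a hypothetical window size in samples.
    """
    # Never move before the chunk start, and never further back than the window.
    earliest = max(chunk_start + 1, target - look_back)
    best: Optional[int] = None
    for _start, end in speech_intervals:
        # Accept only speech ends inside the window; looking forward would
        # make the chunk larger than the size-derived target allows.
        if earliest <= end <= target:
            best = end if best is None else max(best, end)
    # No speech end nearby: keep the size-derived target as the boundary.
    return target if best is None else best
```

Because the returned index is never greater than `target`, a chunk cut this way cannot exceed the size that the target was derived from.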

Sample-aligned frame_data is required (split() raises ValueError on unaligned input) so the byte budget is a hard ceiling in both modes.
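
The arithmetic behind the hard ceiling can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `plan_fixed_chunks` is a hypothetical helper, "sample-aligned" is taken to mean that the byte length is a multiple of the sample width, and the WAV header is assumed to be the canonical 44 bytes for PCM data.

```python
from typing import List, Tuple

WAV_HEADER_BYTES = 44  # assumption: canonical PCM WAV header size


def plan_fixed_chunks(
    frame_data_len: int, sample_width: int, max_bytes: int
) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets so each chunk plus header fits max_bytes."""
    if frame_data_len % sample_width != 0:
        # Unaligned input would force a cut through the middle of a sample.
        raise ValueError("frame_data length is not a multiple of sample_width")
    payload_budget = max_bytes - WAV_HEADER_BYTES
    samples_per_chunk = payload_budget // sample_width
    if samples_per_chunk < 1:
        raise ValueError("max_bytes is too small to hold a header and one sample")
    # Whole samples only, so every boundary stays sample-aligned.
    step = samples_per_chunk * sample_width
    return [
        (start, min(start + step, frame_data_len))
        for start in range(0, frame_data_len, step)
    ]
```

Rounding the per-chunk payload down to whole samples is what keeps the budget a ceiling: a chunk may come in under `max_bytes`, but never over it.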

Usage

import speech_recognition as sr

audio = sr.AudioData.from_file("large.wav")
r = sr.Recognizer()

# 24 MB target leaves a small buffer under Whisper's 25 MB request limit.
chunks = audio.split(max_bytes=24 * 1024 * 1024, silence_aware=True)
text = " ".join(r.recognize_openai(c, model="whisper-1") for c in chunks)

Test plan

  • 17 new tests in tests/test_audio.py:
    • Strict size cap in both modes
    • Sample-width coverage (8 / 16 / 24 / 32-bit) and explicit little-endian decoding (host-byte-order independence)
    • SetupError on missing librosa, on lazy import failures, and on call-time numba / runtime errors
    • Silence-aware boundary snaps to speech end within the look-back window
    • ValueError on unaligned frame_data and on too-small max_bytes
  • CI coverage:
    • New audio-split entry in the extra-contracts matrix runs pytest tests/test_audio.py against .[dev,audio-split].
    • audio-split added to the Ubuntu all-extras install spec (skipped on Python 3.14 to match whisper-local).
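
For readers wondering what "explicit little-endian decoding" across sample widths might look like, the following is a rough sketch and not the code under test. `decode_pcm_le` is a hypothetical name, and the sketch assumes signed samples at every width, including 8-bit (WAV files on disk store 8-bit samples as unsigned).

```python
from typing import List


def decode_pcm_le(frame_data: bytes, sample_width: int) -> List[int]:
    """Decode signed little-endian PCM samples of 1, 2, 3, or 4 bytes each."""
    if sample_width not in (1, 2, 3, 4):
        raise ValueError("sample_width must be 1, 2, 3, or 4 bytes")
    if len(frame_data) % sample_width != 0:
        raise ValueError("frame_data length is not a multiple of sample_width")
    # Naming the byte order explicitly keeps the result identical on
    # big-endian and little-endian hosts, and handles 24-bit samples,
    # which have no native integer type.
    return [
        int.from_bytes(frame_data[i:i + sample_width], "little", signed=True)
        for i in range(0, len(frame_data), sample_width)
    ]
```

For instance, `decode_pcm_le(b"\x01\x00\xff\xff", 2)` yields `[1, -1]` regardless of the host's byte order.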

🤖 Generated with Claude Code

ftnext and others added 2 commits May 17, 2026 23:24
`AudioData.split(max_bytes, *, silence_aware=False)` returns a list
of chunks whose WAV-serialized size is within `max_bytes`. Useful
for feeding oversized recordings to APIs with strict upload limits
(e.g., OpenAI Whisper's 25MB cap).

Two strategies:
- silence_aware=False (default): mechanical fixed-time split. No
  optional dependency required. Strict size cap.
- silence_aware=True: snaps boundaries to nearby silences via
  librosa.effects.split, looking only backward from the size-derived
  target so the cap is preserved. Requires the new `audio-split`
  extra (librosa, numpy); surfaces lazy/numba init failures as
  SetupError.

Sample-aligned frame_data is required so the byte budget is a hard
ceiling in both modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add an audio-split entry to the extra-contracts matrix so the
  silence-aware code path runs against `.[dev,audio-split]` each PR.
- Include audio-split in the ubuntu all-extras install spec
  (skipped on 3.14 to match whisper-local).
- Ignore `.claude/` and `.cursor/` worktree state from local AI
  tooling so it cannot be committed by mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>