Avoid default MP3 transcoding for transcription benchmarks #623

@ushaket

Problem Statement

When benchmarking transcription/translation workloads, GuideLLM currently transcodes input audio to MP3 by default in the encode_media preprocessing path (encode_audio defaults to audio_format="mp3"). As a result, the dataset audio is re-encoded into a different, lossy format before the request is submitted, so the benchmark no longer sends the audio as it exists in the dataset.

Proposed Solution

Change default audio encoding behavior to avoid forced MP3 transcoding.

Suggested behavior:

  1. If audio_format is explicitly provided by the user, use it.
  2. Otherwise, infer format from source metadata/path (file suffix, URL suffix, or dataset-provided format).
  3. If format cannot be inferred, default to wav (safe/common fallback), not mp3.

Optional:

  • add a warning/log when fallback-to-WAV is used due to missing format metadata,
  • document this behavior in benchmarking docs and CLI examples.
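
The suggested resolution order could be sketched roughly as follows. This is only an illustration of the proposal, not GuideLLM's actual code: `resolve_audio_format`, `KNOWN_FORMATS`, and `FALLBACK_FORMAT` are hypothetical names, the list of recognized formats is an assumption, and checking the path/URL suffix before the dataset-provided format is just one possible ordering within step 2.

```python
import logging
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# Hypothetical allow-list; the formats GuideLLM actually supports may differ.
KNOWN_FORMATS = {"wav", "mp3", "flac", "ogg", "m4a"}
FALLBACK_FORMAT = "wav"


def _normalize(fmt: str) -> str:
    """Lowercase a format string and drop a leading dot (".WAV" -> "wav")."""
    return fmt.strip().lower().lstrip(".")


def resolve_audio_format(
    explicit_format: Optional[str] = None,
    source: Optional[str] = None,
    dataset_format: Optional[str] = None,
) -> str:
    """Pick an audio format without forcing a transcode to MP3."""
    # 1. An explicitly requested format always wins.
    if explicit_format:
        return _normalize(explicit_format)

    # 2a. Infer from the file path or URL suffix (query strings are ignored).
    if source:
        suffix = _normalize(PurePosixPath(urlparse(source).path).suffix)
        if suffix in KNOWN_FORMATS:
            return suffix

    # 2b. Infer from format metadata supplied by the dataset.
    if dataset_format and _normalize(dataset_format) in KNOWN_FORMATS:
        return _normalize(dataset_format)

    # 3. Nothing could be inferred: fall back to WAV and say so.
    logger.warning(
        "Could not infer audio format for %r; falling back to %s",
        source,
        FALLBACK_FORMAT,
    )
    return FALLBACK_FORMAT
```

With a helper like this, an explicit format would still override everything, while a source such as `https://example.com/a/clip.ogg?token=1` would keep its original `ogg` format instead of being transcoded.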

Alternatives Considered

Always force WAV instead of MP3: better than MP3 for fidelity, but it still transcodes audio whose source format is already known, which is unnecessary.

Usage Examples

Additional Context

No response
