improved text & audio processing #53

Open
ther3zz wants to merge 8 commits into travisvn:main from ther3zz:patch-1
Conversation

@ther3zz ther3zz commented Oct 31, 2025

  • better sentence splitting
  • normalization for different symbols

I noticed that sentences such as "you can expect sunny skies with a high of 73°F and a low of 54°F" were not spoken properly when it came to the "°F" part, so I figured I would work on improving that bit.
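
As an illustration of the kind of normalization described here, the following is a minimal sketch that expands "°F" and "°C" into words before the text reaches the TTS engine. The function name `normalize_temperatures`, the regex, and the output wording are assumptions made for this example; they are not taken from the PR's code.

```python
import re

# Hypothetical pattern: a number, optional whitespace, a degree sign, then F or C.
_TEMP_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°\s*([FC])\b")
_SCALES = {"F": "Fahrenheit", "C": "Celsius"}


def normalize_temperatures(text: str) -> str:
    """Replace tokens like '73°F' with '73 degrees Fahrenheit'."""

    def repl(match: re.Match) -> str:
        value = match.group(1)
        scale = _SCALES[match.group(2)]
        # Use the singular form only for exactly one degree.
        unit = "degree" if value in ("1", "-1") else "degrees"
        # Spell out the sign so the engine does not read it as a hyphen.
        if value.startswith("-"):
            value = "minus " + value[1:]
        return f"{value} {unit} {scale}"

    return _TEMP_RE.sub(repl, text)


if __name__ == "__main__":
    print(normalize_temperatures("a high of 73°F and a low of 54°F"))
```

The digits are left as digits here; a later pass (for example one based on num2words, which this PR adds) could turn them into words.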

better sentence splitting
normalization for different symbols
Refactor text processing functions for improved performance and error handling. Enhance regex patterns for better normalization of various units, temperatures, and time formats.
Updated text processing functions to use verbalization for numbers, enhancing the conversion of various units and formats to their verbal representations.
Added num2words library for number conversion.
- Normalizes dates ("November 4" -> "November fourth") and number ranges ("2018-2019" -> "2018 to 2019").
- Handles full dates ("November 3, 2025") as a single unit for natural prosody.
- Splits sentences at headline-style colons for natural pauses.
- Includes phonetic hints, scientific notation, chemical formulas, and other advanced edge cases.
- Verbalizes appended symbols like in "Disney+".
- Converts parentheticals to comma-separated clauses for improved prosody.
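
The date and range rules listed above could look roughly like the sketch below. It is self-contained and uses a small hand-written ordinal table instead of num2words so it runs without extra dependencies; the function names, the regexes, and the choice to cover only days 1 to 31 and four-digit year ranges are assumptions for illustration, not the PR's implementation.

```python
import re

_MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)
_LOW_ORDINALS = [
    "", "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth",
]

_DATE_RE = re.compile(rf"\b({_MONTHS}) (\d{{1,2}})\b")
_YEAR_RANGE_RE = re.compile(r"\b(\d{4})\s*[-–]\s*(\d{4})\b")


def day_ordinal(day: int) -> str:
    """Return the spoken ordinal for a day of the month (1 to 31)."""
    if day < 20:
        return _LOW_ORDINALS[day]
    if day == 20:
        return "twentieth"
    if day == 30:
        return "thirtieth"
    tens = "twenty" if day < 30 else "thirty"
    return f"{tens}-{_LOW_ORDINALS[day % 10]}"


def normalize_dates_and_ranges(text: str) -> str:
    """Verbalize 'Month D' dates and 'YYYY-YYYY' ranges."""

    def date_repl(match: re.Match) -> str:
        day = int(match.group(2))
        if not 1 <= day <= 31:
            return match.group(0)  # not a plausible day; leave untouched
        return f"{match.group(1)} {day_ordinal(day)}"

    # Ranges first, so the hyphen is gone before any other rule sees it.
    text = _YEAR_RANGE_RE.sub(r"\1 to \2", text)
    return _DATE_RE.sub(date_repl, text)


if __name__ == "__main__":
    print(normalize_dates_and_ranges("Open November 4, seasons 2018-2019"))
```

With num2words installed, `num2words(day, to="ordinal")` could replace the hand-written `day_ordinal` helper.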
Refactor audio processing module to enhance configurability and performance. Added environment variable support for cache management, parallel processing, and audio file limits.
@ther3zz ther3zz changed the title from "improved text processing" to "improved text & audio processing" Nov 3, 2025
ther3zz commented Nov 3, 2025

added the following for audio processing:

Cache Configuration:

  • AUDIO_CACHE_MAX_SIZE_MB: Max in-memory cache size in MB. Evicts old entries when full. (Default: 256)
  • AUDIO_CACHE_CLEAR_INTERVAL_S: Automatically clear cache periodically (in seconds). 0 disables. (Default: 3600)

Performance & Limits:

  • AUDIO_SILENCE_PADDING_MS: Default silence duration in ms. (Default: 250)
  • AUDIO_MAX_FILES_TO_CONCATENATE: Max number of files per job. (Default: 5000)
  • AUDIO_MAX_TOTAL_SIZE_MB: Max combined file size in MB. (Default: 2048)
  • AUDIO_USE_PARALLEL_PROCESSING: Set 'true' or '1' to enable parallel mode. (Default: false)
  • AUDIO_MAX_PARALLEL_WORKERS: Max threads for parallel mode. (Default: CPU cores)
  • AUDIO_LARGE_FILE_THRESHOLD_MB: Warn if a single file exceeds this size. (Default: 100)
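
For reference, the settings above could be read along the lines of the sketch below. The variable names and defaults come from the list; the `AudioConfig` dataclass, the helper names, and the fallback to the default on an unparsable value are assumptions for this example and may differ from what the PR actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting, falling back to the default if unset or invalid."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Treat 'true' or '1' (case-insensitive) as enabled, per the list above."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1")


@dataclass(frozen=True)
class AudioConfig:  # hypothetical container, not part of the PR
    cache_max_size_mb: int
    cache_clear_interval_s: int
    silence_padding_ms: int
    max_files_to_concatenate: int
    max_total_size_mb: int
    use_parallel_processing: bool
    max_parallel_workers: int
    large_file_threshold_mb: int


def load_audio_config(env: Optional[Mapping[str, str]] = None) -> AudioConfig:
    env = os.environ if env is None else env
    return AudioConfig(
        cache_max_size_mb=_env_int(env, "AUDIO_CACHE_MAX_SIZE_MB", 256),
        cache_clear_interval_s=_env_int(env, "AUDIO_CACHE_CLEAR_INTERVAL_S", 3600),
        silence_padding_ms=_env_int(env, "AUDIO_SILENCE_PADDING_MS", 250),
        max_files_to_concatenate=_env_int(env, "AUDIO_MAX_FILES_TO_CONCATENATE", 5000),
        max_total_size_mb=_env_int(env, "AUDIO_MAX_TOTAL_SIZE_MB", 2048),
        use_parallel_processing=_env_bool(env, "AUDIO_USE_PARALLEL_PROCESSING", False),
        max_parallel_workers=_env_int(env, "AUDIO_MAX_PARALLEL_WORKERS", os.cpu_count() or 1),
        large_file_threshold_mb=_env_int(env, "AUDIO_LARGE_FILE_THRESHOLD_MB", 100),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_audio_config({"AUDIO_USE_PARALLEL_PROCESSING": "1"})`.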

Cache hit scenario: Near-instant (just memory copy)
Parallel cold start: ~0.5-2 seconds per 100 files
Sequential cold start: ~2-8 seconds per 100 files
Memory usage: Bounded by cache + parallel worker limits
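
To show how a size-bounded in-memory cache keeps memory usage bounded, here is a minimal sketch. The class name `ByteBoundedCache` is hypothetical, and least-recently-used eviction is an assumption: the description above only says that old entries are evicted when the cache is full.

```python
from collections import OrderedDict
from typing import Optional


class ByteBoundedCache:
    """In-memory cache of audio bytes, capped by total size in bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0

    def get(self, key: str) -> Optional[bytes]:
        data = self._items.get(key)
        if data is not None:
            # Mark as recently used so it is evicted last.
            self._items.move_to_end(key)
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # a single entry larger than the cap is never cached
        old = self._items.pop(key, None)
        if old is not None:
            self._size -= len(old)
        self._items[key] = data
        self._size += len(data)
        # Evict least recently used entries until the cap is respected.
        while self._size > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)

    def clear(self) -> None:
        self._items.clear()
        self._size = 0

    @property
    def size_bytes(self) -> int:
        return self._size
```

A cap of `AUDIO_CACHE_MAX_SIZE_MB * 1024 * 1024` bytes would match the setting listed earlier, and a periodic call to `clear()` would correspond to `AUDIO_CACHE_CLEAR_INTERVAL_S`.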

TTS Use Case Optimization 🎤:

  • Common phrases get cached (greetings, numbers, etc.)
  • Parallel processing handles large documents efficiently
  • Silence padding creates natural speech rhythm
  • Normalization ensures consistent volume
  • Multiple format support for different deployment targets

Enhanced text processing capabilities by expanding regex patterns for various symbols, units, and mathematical operators. Improved prosody handling by converting additional punctuation and symbols into more natural speech.