improved text & audio processing by ther3zz · Pull Request #53 · travisvn/chatterbox-tts-api

ther3zz · 2025-10-31T20:41:13Z

better sentence splitting
normalization for different symbols

I noticed that sentences such as "you can expect sunny skies with a high of 73°F and a low of 54°F" were not spoken properly at the "°F" part, so I worked on improving that.
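For illustration, here is a minimal sketch of the kind of temperature normalization described above. It is not the code from this PR: the function name, the unit table, and the regex are assumptions made for the example, and it leaves the digits alone (turning numbers into words is a separate step).

```python
import re

# Hypothetical unit table for this sketch; the PR's own patterns may differ.
_TEMP_UNITS = {"F": "Fahrenheit", "C": "Celsius"}

# A signed integer or decimal, optional spaces, the degree sign, then F or C.
_TEMP_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°\s*([FC])\b")


def verbalize_temperatures(text: str) -> str:
    """Replace forms like '73°F' with '73 degrees Fahrenheit'."""

    def _replace(match: "re.Match[str]") -> str:
        value, unit = match.group(1), match.group(2)
        # Use the singular noun only for exactly 1 or -1.
        noun = "degree" if value in ("1", "-1") else "degrees"
        return f"{value} {noun} {_TEMP_UNITS[unit]}"

    return _TEMP_RE.sub(_replace, text)


if __name__ == "__main__":
    print(verbalize_temperatures("a high of 73°F and a low of 54°F"))
```

Spelling the unit out before the text reaches the model means pronunciation no longer depends on how the model handles the "°" symbol.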

Refactor text processing functions for improved performance and error handling. Enhance regex patterns for better normalization of various units, temperatures, and time formats.

Updated text processing functions to use verbalization for numbers, enhancing the conversion of various units and formats to their verbal representations.

Added num2words library for number conversion.

- Normalizes dates ("November 4" -> "November fourth") and number ranges ("2018-2019" -> "2018 to 2019").
- Handles full dates ("November 3, 2025") as a single unit for natural prosody.
- Splits sentences at headline-style colons for natural pauses.
- Includes phonetic hints, scientific notation, chemical formulas, and other advanced edge cases.
- Verbalizes appended symbols like in "Disney+".
- Converts parentheticals to comma-separated clauses for improved prosody.
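As a rough sketch of the first item in the list above, the snippet below rewrites month-plus-day dates and four-digit year ranges using only the standard library. The helper names, the hand-written ordinal tables, and the regexes are assumptions for this example; the PR itself relies on num2words and its own patterns, which may behave differently.

```python
import re

# Ordinal words for days 1-31, built from small hand-written tables.
_ONES = ["", "first", "second", "third", "fourth", "fifth",
         "sixth", "seventh", "eighth", "ninth"]
_TEENS = ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
          "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"]

_MONTHS = ("January|February|March|April|May|June|July|August|"
           "September|October|November|December")
_DATE_RE = re.compile(r"\b(" + _MONTHS + r") (\d{1,2})\b")
# Two four-digit numbers joined by a hyphen or an en dash.
_YEAR_RANGE_RE = re.compile(r"\b(\d{4})\s*[-\u2013]\s*(\d{4})\b")


def day_ordinal(day: int) -> str:
    """Return the ordinal word for a day of the month (1-31)."""
    if not 1 <= day <= 31:
        raise ValueError(f"day out of range: {day}")
    if day < 10:
        return _ONES[day]
    if day < 20:
        return _TEENS[day - 10]
    if day == 20:
        return "twentieth"
    if day == 30:
        return "thirtieth"
    tens = "twenty" if day < 30 else "thirty"
    return f"{tens}-{_ONES[day % 10]}"


def normalize_dates_and_ranges(text: str) -> str:
    """Rewrite 'November 4' and '2018-2019' into speakable forms."""

    def _date(match: "re.Match[str]") -> str:
        day = int(match.group(2))
        if not 1 <= day <= 31:
            return match.group(0)  # leave implausible days untouched
        return f"{match.group(1)} {day_ordinal(day)}"

    text = _YEAR_RANGE_RE.sub(r"\1 to \2", text)
    return _DATE_RE.sub(_date, text)
```

Note that this sketch treats any pair of four-digit numbers joined by a hyphen as a range, which would also rewrite things like phone number fragments; a real implementation would need more context checks.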

Refactor audio processing module to enhance configurability and performance. Added environment variable support for cache management, parallel processing, and audio file limits.

ther3zz · 2025-11-03T22:28:58Z

Added the following environment variables for audio processing:

Cache Configuration:

AUDIO_CACHE_MAX_SIZE_MB: Max in-memory cache size in MB. Evicts old entries when full. (Default: 256)
AUDIO_CACHE_CLEAR_INTERVAL_S: Automatically clear cache periodically (in seconds). 0 disables. (Default: 3600)

Performance & Limits:

AUDIO_SILENCE_PADDING_MS: Default silence duration in ms. (Default: 250)
AUDIO_MAX_FILES_TO_CONCATENATE: Max number of files per job. (Default: 5000)
AUDIO_MAX_TOTAL_SIZE_MB: Max combined file size in MB. (Default: 2048)
AUDIO_USE_PARALLEL_PROCESSING: Set 'true' or '1' to enable parallel mode. (Default: false)
AUDIO_MAX_PARALLEL_WORKERS: Max threads for parallel mode. (Default: CPU cores)
AUDIO_LARGE_FILE_THRESHOLD_MB: Warn if a single file exceeds this size. (Default: 100)
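The variables above could be read along these lines. This is only a sketch: the variable names and defaults come from the list, but the `AudioSettings` class, the helper functions, and the choice to fall back to the default on an unparsable value are assumptions, not the PR's actual code.

```python
import os
from dataclasses import dataclass


def _env_int(name: str, default: int) -> int:
    """Read an integer variable, falling back to the default if unset or invalid."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def _env_bool(name: str, default: bool = False) -> bool:
    """Treat 'true' or '1' (case-insensitive) as enabled."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1")


@dataclass(frozen=True)
class AudioSettings:  # hypothetical container for this example
    cache_max_size_mb: int
    cache_clear_interval_s: int
    silence_padding_ms: int
    max_files_to_concatenate: int
    max_total_size_mb: int
    use_parallel_processing: bool
    max_parallel_workers: int
    large_file_threshold_mb: int


def load_audio_settings() -> AudioSettings:
    return AudioSettings(
        cache_max_size_mb=_env_int("AUDIO_CACHE_MAX_SIZE_MB", 256),
        cache_clear_interval_s=_env_int("AUDIO_CACHE_CLEAR_INTERVAL_S", 3600),
        silence_padding_ms=_env_int("AUDIO_SILENCE_PADDING_MS", 250),
        max_files_to_concatenate=_env_int("AUDIO_MAX_FILES_TO_CONCATENATE", 5000),
        max_total_size_mb=_env_int("AUDIO_MAX_TOTAL_SIZE_MB", 2048),
        use_parallel_processing=_env_bool("AUDIO_USE_PARALLEL_PROCESSING"),
        # os.cpu_count() can return None, so fall back to a single worker.
        max_parallel_workers=_env_int("AUDIO_MAX_PARALLEL_WORKERS", os.cpu_count() or 1),
        large_file_threshold_mb=_env_int("AUDIO_LARGE_FILE_THRESHOLD_MB", 100),
    )
```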

Cache hit scenario: Near-instant (just memory copy)
Parallel cold start: ~0.5-2 seconds per 100 files
Sequential cold start: ~2-8 seconds per 100 files
Memory usage: Bounded by cache + parallel worker limits
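To show what "bounded by cache" can mean in practice, here is a small sketch of a size-limited in-memory cache. The notes above only say that old entries are evicted when the cache is full; the least-recently-used policy, the class name, and the byte-based accounting below are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Optional


class BoundedAudioCache:
    """In-memory cache capped by total payload size, evicting least recently used entries."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0

    def get(self, key: str) -> Optional[bytes]:
        data = self._entries.get(key)
        if data is not None:
            # Mark as most recently used.
            self._entries.move_to_end(key)
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # can never fit, so do not cache it
        old = self._entries.pop(key, None)
        if old is not None:
            self._used -= len(old)
        self._entries[key] = data
        self._used += len(data)
        # Evict from the least recently used end until under the limit.
        while self._used > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)

    @property
    def used_bytes(self) -> int:
        return self._used
```

This sketch is not thread-safe; if it were shared between parallel workers, `get` and `put` would need to be guarded by a lock.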

TTS Use Case Optimization 🎤
Common phrases get cached (greetings, numbers, etc.)
Parallel processing handles large documents efficiently
Silence padding creates natural speech rhythm
Normalization ensures consistent volume
Multiple format support for different deployment targets
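The silence padding point can be illustrated with raw PCM data. The sketch below assumes signed 16-bit mono PCM at 24000 Hz, where zero bytes are silence; the function names and the format defaults are assumptions for the example and not taken from the PR, which may work on decoded audio objects instead.

```python
from typing import Iterable


def silence_bytes(duration_ms: int, sample_rate: int = 24000,
                  channels: int = 1, sample_width: int = 2) -> bytes:
    """Return zero-valued PCM frames covering roughly duration_ms milliseconds."""
    frames = sample_rate * duration_ms // 1000
    return b"\x00" * (frames * channels * sample_width)


def join_with_silence(chunks: Iterable[bytes], padding_ms: int = 250,
                      sample_rate: int = 24000, channels: int = 1,
                      sample_width: int = 2) -> bytes:
    """Concatenate PCM chunks, inserting a silence gap between neighbours."""
    gap = silence_bytes(padding_ms, sample_rate, channels, sample_width)
    # bytes.join places the gap between chunks only, not at the ends.
    return gap.join(chunks)
```

The default of 250 ms mirrors AUDIO_SILENCE_PADDING_MS; all chunks must share the same sample rate, channel count, and sample width for the result to be valid audio.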

Enhanced text processing capabilities by expanding regex patterns for various symbols, units, and mathematical operators. Improved prosody handling by converting additional punctuation and symbols into more natural speech.

ther3zz added 7 commits October 31, 2025 16:40

improved text processing

8c810f3

Refactor text processing utilities for TTS

a61f1f6

Enhance number verbalization in text processing

9dbbfb4

Add num2words library to requirements

fbbf43a

Fix verbalization of appended symbols and cleanup

b830e01

Enhance audio processing with environment variable support

46b0c71

ther3zz changed the title from "improved text processing" to "improved text & audio processing" on Nov 3, 2025

Enhance text processing and prosody handling

024ea82

ther3zz wants to merge 8 commits into travisvn:main from ther3zz:patch-1
