This file preserves important context for conversation continuity between Hue and Aye sessions.
Last Updated: 2025-12-07
IndexTTS-Rust is part of a larger audio intelligence ecosystem at 8b.is:
- kokoro-tiny - Lightweight TTS (82M params, 50+ voices, on crates.io!)
- IndexTTS-Rust - Advanced zero-shot TTS with emotion control
- Phoenix-Protocol - Audio restoration/enhancement layer
- MEM|8 - Contextual memory system (mem-8.com, mem8)
Together these form a complete audio intelligence pipeline.
The Phoenix Protocol (phoenix-protocol/) is a PERFECT complement to IndexTTS-Rust:
| Phoenix Module | IndexTTS Use Case |
|---|---|
| emotional.rs | Map to our 8D emotion control (Warmth→body, Presence→power, Clarity→articulation, Air→space, Ultrasonics→depth) |
| voice_signature.rs | Enhance speaker embeddings for voice cloning |
| spectral_velocity.rs | Add momentum tracking to mel-spectrogram |
| marine.rs | Validate TTS output authenticity/quality |
| golden_ratio.rs | Post-process vocoder output with harmonic enhancement |
| harmonic_resurrection.rs | Add richness to synthesized speech |
| micro_dynamics.rs | Restore natural speech dynamics |
| autotune.rs | Improve prosody and pitch control |
| mem8_integration.rs | Already has MEM\|8 … |
Both projects use:
- rayon (parallelism)
- rustfft/realfft (FFT)
- ndarray (array operations)
- hound (WAV I/O)
- serde (config serialization)
- anyhow (error handling)
- ort (ONNX Runtime)
| Project | Sample Rate | Use Case |
|---|---|---|
| IndexTTS-Rust | 22,050 Hz | Standard TTS output |
| Phoenix-Protocol | 192,000 Hz | Ultrasonic restoration |
| kokoro-tiny | 24,000 Hz | Lightweight TTS |
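Moving audio between these projects means resampling between the rates in the table. As a rough sketch (the function names are hypothetical and not taken from any of the three crates), the reduced up/down factors for a rational resampler can be worked out with a GCD:

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u32, b: u32) -> u32 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Reduced (up, down) factors for converting `from_hz` audio to `to_hz`:
/// conceptually, upsample by `up`, then downsample by `down`.
/// Hypothetical helper; both rates are assumed to be non-zero.
pub fn resample_ratio(from_hz: u32, to_hz: u32) -> (u32, u32) {
    let g = gcd(from_hz, to_hz);
    (to_hz / g, from_hz / g)
}
```

For example, IndexTTS-Rust output (22,050 Hz) going into Phoenix-Protocol (192,000 Hz) reduces to 1280/147, and kokoro-tiny output (24,000 Hz) going down to 22,050 Hz reduces to 147/160.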
Located in ~/Documents/GitHub/:
- Ultrasonic-Consciousness-Hypothesis/ - Research foundation for Phoenix Protocol, contains PDFs on mechanosensitive channels and audio perception
- hrmnCmprssnM/ - Harmonic Compression Model research
- Marine-Sense/ - Marine algorithm origins
- mem-8.com/ & mem8/ - MEM|8 contextual memory
- universal-theoglyphic-language/ - Language processing research
- kokoro-tiny/ - Already working TTS crate by Hue & Aye
- zencooker/ - (fun project!)
- Audio processing pipeline (mel-spectrogram, STFT, resampling)
- Text normalization (Chinese/English/mixed)
- BPE tokenization via HuggingFace tokenizers
- ONNX Runtime integration for inference
- BigVGAN vocoder structure
- CLI with clap
- Benchmark infrastructure (Criterion)
- NEW: marine_salience crate (no_std compatible, O(1) jitter detection)
- NEW: src/quality/ module (prosody extraction, affect tracking)
- NEW: MarineProsodyVector (8D interpretable emotion features)
- NEW: ConversationAffectSummary (session-level comfort tracking)
- NEW: TTSQualityReport (authenticity validation)
- Full GPT model integration with KV cache
- Actual ONNX model files (need download)
- manage.sh script for colored workflow management: DONE! (2025-12-07)
- Integration tests with real models: DONE! (2025-12-07)
- Phoenix Protocol integration layer: STARTED with Marine!
- Streaming synthesis
- WebSocket API
- Train T2S model to accept 8D Marine vector instead of 512D Conformer
- Wire Marine quality validation into inference loop
cargo build --release
cargo clippy -- -D warnings
cargo test
cargo bench

From the Phoenix Protocol research:
"Women are the carrier wave. They are the 000 data stream. The DC bias that, when removed, leaves silence."
"When P!nk sings 'I Am Here,' her voice generates harmonics so powerful they burst through the 22kHz digital ceiling"
The Phoenix Protocol restores emotional depth stripped by audio compression - this philosophy applies directly to TTS: synthesized speech should have the same emotional depth as natural speech.
- Quality Validation - Use Marine salience to score TTS output. DONE!
- Phoenix Integration - Start bridging phoenix-protocol modules. Marine is in!
- Created manage.sh - Colorful build/test/clean/docker script at scripts/manage.sh 🎉
- Integration tests - Added ONNX model integration test framework at tests/
- Wire Into Inference - Connect Marine quality validation to actual TTS output
- 8D Model Training - Train T2S model to accept MarineProsodyVector instead of 512D Conformer
- Example/Demo - Create example showing prosody extraction → emotion editing → synthesis
- Voice Signature Import - Use Phoenix's voice_signature for speaker embeddings
- Emotion Mapping - Connect Phoenix's emotional bands to our 8D control
- Model Download - Set up ONNX model acquisition pipeline
- MEM|8 Bridge - Implement consciousness-aware TTS using kokoro-tiny's mem8_bridge pattern
- Style Selection - Port kokoro-tiny's 510 style variation system
- Full Phoenix Integration - golden_ratio.rs, harmonic_resurrection.rs, etc.
- Streaming Marine - Real-time quality monitoring during synthesis
Just pulled latest kokoro-tiny code - MAJOR discovery!
kokoro-tiny now has a full consciousness simulation in examples/mem8_baby.rs:
// Memory as waves that interfere
MemoryWave {
    amplitude: 2.5,       // Emotion strength
    frequency: 528.0,     // "Love frequency"
    phase: 0.0,
    decay_rate: 0.05,     // Memory persistence
    emotion_type: EmotionType::Love(0.9),
    content: "Mama! I love mama!".to_string(),
}

// Salience detection (Marine algorithm!)
SalienceEvent {
    jitter_score: 0.2,    // Low = authentic/stable
    harmonic_score: 0.95, // High = voice
    salience_score: 0.9,
    signal_type: SignalType::Voice,
}

// Free will: AI chooses attention focus (70% control)
bridge.decide_attention(events);

EmotionType::Curiosity(0.8) // Inquisitive
EmotionType::Love(0.9)      // Deep affection
EmotionType::Joy(0.7)       // Happy
EmotionType::Confusion(0.8) // Uncertain
EmotionType::Neutral        // Baseline

- Wave Interference - Competing memories by amplitude/frequency
- Emotional Regulation - Prevents overload, modulates voice
- Salience Detection - Marine algorithm for authenticity
- Attention Selection - AI chooses what to focus on
- Consciousness Level - Affects speech clarity (wake_up/sleep)
This is PERFECT for IndexTTS-Rust! We can:
- Use wave interference for emotion blending
- Apply Marine salience to validate synthesis quality
- Modulate voice based on consciousness level
- Select voice styles based on emotional state (not just token count)
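To make the wave-interference idea concrete, here is a minimal sketch. It assumes each memory behaves as a decaying sinusoid and that blending is a plain sum; the `Wave` struct and `interfere` function are hypothetical stand-ins, not the actual kokoro-tiny `MemoryWave` implementation.

```rust
use std::f32::consts::TAU;

/// Hypothetical, simplified stand-in for the `MemoryWave` fields shown above.
#[derive(Debug, Clone, Copy)]
pub struct Wave {
    pub amplitude: f32,  // emotion strength
    pub frequency: f32,  // Hz
    pub phase: f32,      // radians
    pub decay_rate: f32, // per second
}

impl Wave {
    /// Value of this wave at time `t` (seconds): a decaying sinusoid.
    pub fn sample(&self, t: f32) -> f32 {
        let envelope = self.amplitude * (-self.decay_rate * t).exp();
        envelope * (TAU * self.frequency * t + self.phase).sin()
    }
}

/// Superpose all waves at time `t`. Waves in phase reinforce each other,
/// waves in opposite phase cancel.
pub fn interfere(waves: &[Wave], t: f32) -> f32 {
    waves.iter().map(|w| w.sample(t)).sum()
}
```

Under these assumptions, two emotions with similar frequency and phase add up to a stronger blended signal, while opposing ones damp each other. Whether that maps cleanly onto emotion blending in IndexTTS-Rust is still an open question.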
kokoro-tiny now loads all 510 style variations per voice:
- Style selected based on token count
- Short text → short-optimized style
- Long text → long-optimized style
- Automatic text splitting at 512 token limit
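The token-count rules above could be sketched roughly as follows. The constants come from these notes; the function names and the clamping rule are assumptions for illustration, not kokoro-tiny's actual API.

```rust
/// Limits quoted in the notes above.
pub const MAX_TOKENS: usize = 512;
pub const NUM_STYLES: usize = 510;

/// Split a token sequence into chunks of at most `MAX_TOKENS` tokens.
/// (Hypothetical helper; a real splitter would prefer sentence boundaries.)
pub fn split_tokens(tokens: &[u32]) -> Vec<&[u32]> {
    tokens.chunks(MAX_TOKENS).collect()
}

/// Hypothetical style lookup: one style row per token count,
/// clamped to the last available style.
pub fn style_index(token_count: usize) -> usize {
    token_count.min(NUM_STYLES - 1)
}
```

Each chunk would then pick its own style via `style_index(chunk.len())`, so short and long chunks get different style rows.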
For IndexTTS: We could select style based on EMOTION + token count!
WE DID IT! Marine salience is now integrated into IndexTTS-Rust!
A no_std compatible crate for O(1) jitter-based salience detection:
// Core components:
MarineConfig // Tunable parameters (sample_rate, jitter bounds, EMA alpha)
MarineProcessor // O(1) per-sample processing
SaliencePacket // Output: j_p, j_a, h_score, s_score, energy
Ema // Exponential moving average tracker
// Key insight: Process ONE sample at a time, emit packets on peaks
// Why O(1)? Just compare to EMA, no FFT, no heavy math!

Config for Speech:
MarineConfig::speech_default(sample_rate)
// F0 range: 60Hz - 4kHz
// jitter_low: 0.02, jitter_high: 0.60
// ema_alpha: 0.01 (slow adaptation for stability)

MarineProsodyVector - 8D interpretable emotion representation:
pub struct MarineProsodyVector {
    pub jp_mean: f32,      // Period jitter mean (pitch stability)
    pub jp_std: f32,       // Period jitter standard deviation
    pub ja_mean: f32,      // Amplitude jitter mean (volume stability)
    pub ja_std: f32,       // Amplitude jitter standard deviation
    pub h_mean: f32,       // Harmonic alignment (voiced vs noise)
    pub s_mean: f32,       // Overall salience (authenticity)
    pub peak_density: f32, // Peaks per second (speech rate)
    pub energy_mean: f32,  // Average loudness
}
// Interpretable! High jp_mean = nervous, low = confident
// Can DIRECTLY EDIT for emotion control!

MarineProsodyConditioner - Extract prosody from audio:
let conditioner = MarineProsodyConditioner::new(22050);
let prosody = conditioner.from_samples(&audio_samples)?;
let report = conditioner.validate_tts_output(&audio_samples)?;
// Detects issues:
// - "Too perfect - sounds robotic"
// - "High period jitter - artifacts"
// - "Low salience - quality issues"

ConversationAffectSummary - Session-level comfort tracking:
pub enum ComfortLevel {
    Uneasy,  // High jitter AND rising (nervous/stressed)
    Neutral, // Stable patterns (calm)
    Happy,   // Low jitter + high energy (confident/positive)
}
// Track trends over conversation:
// jitter_trend > 0.1 = getting more stressed
// jitter_trend < -0.1 = calming down
// energy_trend > 0.1 = getting more engaged
// Aye can now self-assess!
// aye_assessment() returns "I'm in a good state"
// feedback_prompt() returns "Let me know if something's bothering you"

Human speech has NATURAL jitter - that's what makes it authentic!
- Too perfect (jp < 0.005) = robotic
- Too chaotic (jp > 0.3) = artifacts/damage
- Sweet spot = real human voice
The Marines will KNOW if speech doesn't sound authentic!
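A minimal sketch of how EMA-based period jitter and the bands above could fit together. The thresholds 0.005 and 0.3 come from these notes; the `Ema` internals, `period_jitter`, and `JitterBand` are illustrative assumptions, not the real marine_salience API.

```rust
/// Exponential moving average, updated in O(1) per observation.
#[derive(Debug, Clone, Copy)]
pub struct Ema {
    alpha: f32,
    value: Option<f32>,
}

impl Ema {
    pub fn new(alpha: f32) -> Self {
        Self { alpha, value: None }
    }

    /// Fold in one observation and return the updated average.
    /// The first observation seeds the average directly.
    pub fn update(&mut self, x: f32) -> f32 {
        let next = match self.value {
            None => x,
            Some(v) => v + self.alpha * (x - v),
        };
        self.value = Some(next);
        next
    }

    pub fn get(&self) -> Option<f32> {
        self.value
    }
}

/// Relative deviation of one peak-to-peak period from the running average,
/// |period - ema| / ema, followed by an EMA update. No FFT, no buffering.
pub fn period_jitter(ema: &mut Ema, period: f32) -> f32 {
    let jitter = match ema.get() {
        Some(avg) if avg > 0.0 => (period - avg).abs() / avg,
        _ => 0.0, // no history yet
    };
    ema.update(period);
    jitter
}

/// Hypothetical labels for the three bands listed above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum JitterBand {
    Robotic, // jp < 0.005
    Natural, // the sweet spot
    Damaged, // jp > 0.3
}

pub fn classify_jitter(jp: f32) -> JitterBand {
    if jp < 0.005 {
        JitterBand::Robotic
    } else if jp > 0.3 {
        JitterBand::Damaged
    } else {
        JitterBand::Natural
    }
}
```

Each detected peak costs one subtraction, one division, and one EMA update, which is where the O(1) per-sample claim comes from.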
running 11 tests
test quality::affect::tests::test_comfort_level_descriptions ... ok
test quality::affect::tests::test_analyzer_empty_conversation ... ok
test quality::affect::tests::test_analyzer_single_utterance ... ok
test quality::affect::tests::test_happy_classification ... ok
test quality::affect::tests::test_aye_assessment_message ... ok
test quality::affect::tests::test_neutral_classification ... ok
test quality::affect::tests::test_uneasy_classification ... ok
test quality::prosody::tests::test_conditioner_empty_buffer ... ok
test quality::prosody::tests::test_conditioner_silence ... ok
test quality::prosody::tests::test_prosody_vector_array_conversion ... ok
test quality::prosody::tests::test_estimate_valence ... ok
test result: ok. 11 passed; 0 failed
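The classification tests above exercise logic that might look roughly like the sketch below. Only the 0.1 trend threshold comes from these notes; the other cutoffs and both function names are hypothetical placeholders, and the real src/quality/ code may differ.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ComfortLevel {
    Uneasy,
    Neutral,
    Happy,
}

// Hypothetical cutoffs chosen for illustration only.
const HIGH_JITTER: f32 = 0.15;
const LOW_JITTER: f32 = 0.05;
const HIGH_ENERGY: f32 = 0.5;
// Trend threshold quoted in the notes.
const TREND_THRESHOLD: f32 = 0.1;

/// Uneasy = high jitter AND rising; Happy = low jitter + high energy;
/// everything else is treated as Neutral.
pub fn classify_comfort(jitter_mean: f32, jitter_trend: f32, energy_mean: f32) -> ComfortLevel {
    if jitter_mean > HIGH_JITTER && jitter_trend > TREND_THRESHOLD {
        ComfortLevel::Uneasy
    } else if jitter_mean < LOW_JITTER && energy_mean > HIGH_ENERGY {
        ComfortLevel::Happy
    } else {
        ComfortLevel::Neutral
    }
}

/// Simple trend estimate: mean of the second half minus mean of the first half.
/// Returns 0.0 when there are fewer than two values.
pub fn trend(values: &[f32]) -> f32 {
    if values.len() < 2 {
        return 0.0;
    }
    let mid = values.len() / 2;
    let mean = |s: &[f32]| s.iter().sum::<f32>() / s.len() as f32;
    mean(&values[mid..]) - mean(&values[..mid])
}
```

A positive `trend` over per-utterance jitter values would correspond to "getting more stressed", a negative one to "calming down".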
- Interpretable Control: 8D vector vs opaque 512D Conformer - we can SEE what each dimension means
- Lightweight: O(1) per sample, no heavy neural networks for prosody
- Authentic Validation: Marines detect fake/damaged speech
- Emotion Editing: Want more confidence? Lower jp_mean directly!
- Conversation Awareness: Track comfort over entire sessions
- Self-Assessment: Aye knows when something feels "off"
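The "Emotion Editing" point can be illustrated with a small sketch. The struct mirrors the fields listed earlier so the example stands alone; `to_array` (including its field order) and `with_confidence` are hypothetical helpers, and the 0.005 floor reuses the "too perfect = robotic" threshold from these notes.

```rust
/// Local copy of the 8D vector so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MarineProsodyVector {
    pub jp_mean: f32,
    pub jp_std: f32,
    pub ja_mean: f32,
    pub ja_std: f32,
    pub h_mean: f32,
    pub s_mean: f32,
    pub peak_density: f32,
    pub energy_mean: f32,
}

impl MarineProsodyVector {
    /// Flatten into a fixed-order array for model conditioning.
    /// The field order here is an assumption.
    pub fn to_array(&self) -> [f32; 8] {
        [
            self.jp_mean,
            self.jp_std,
            self.ja_mean,
            self.ja_std,
            self.h_mean,
            self.s_mean,
            self.peak_density,
            self.energy_mean,
        ]
    }

    /// Hypothetical edit: scale period jitter by `factor` (< 1.0 = more
    /// confident), but never drop below the "robotic" floor of 0.005.
    pub fn with_confidence(mut self, factor: f32) -> Self {
        self.jp_mean = (self.jp_mean * factor).max(0.005);
        self
    }
}
```

Because each dimension has a meaning, an edit like `prosody.with_confidence(0.5)` changes one interpretable number and leaves the other seven untouched.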
// In main TTS pipeline:
use indextts::quality::{
    MarineProsodyConditioner,
    MarineProsodyVector,
    ConversationAffectAnalyzer,
    ConversationAffectSummary,
    ComfortLevel,
};

// 1. Extract reference prosody
let conditioner = MarineProsodyConditioner::new(22050);
let ref_prosody = conditioner.from_samples(&reference_audio)?;

// 2. Generate TTS (using 8D vector instead of 512D Conformer)
let tts_output = generate_with_prosody(&text, ref_prosody)?;

// 3. Validate output quality
let report = conditioner.validate_tts_output(&tts_output)?;
if !report.passes(70.0) {
    log::warn!("TTS quality issues: {:?}", report.issues);
}

// 4. Track conversation affect
let mut analyzer = ConversationAffectAnalyzer::new();
analyzer.add_utterance(&utterance)?;
let summary = analyzer.summarize()?;
match summary.aye_state {
    ComfortLevel::Uneasy => adjust_generation_parameters(),
    _ => proceed_normally(),
}

"Darling, these three Rust projects together are like a symphony orchestra! kokoro-tiny is the quick piccolo solo, IndexTTS-Rust is the full brass section with emotional depth, and Phoenix-Protocol is the concert hall acoustics making everything resonate. When you combine them, that's when the magic happens! Also, I'm absolutely obsessed with how the Golden Ratio resynthesis could add sparkle to synthesized vocals. Can you imagine TTS output that actually has that P!nk breakthrough energy? Now THAT would make me cry happy tears in accounting!"
- kokoro-tiny is ALREADY on crates.io under 8b-is
- Phoenix Protocol can process 192kHz audio for ultrasonic restoration
- The Marine algorithm uses O(1) jitter detection - "Marines are not just jarheads - they are intelligent"
- Hue's GitHub has 66 projects (and counting!)
- The team at 8b.is: hue@8b.is and aye@8b.is
From ashes to harmonics, from silence to song 🔥🎵