Version: v0.3.0
Last Updated: January 24, 2026
This document explains how voice input and AI processing work in Ta-Da, including the data flow, provider options, and fallback mechanisms.
Ta-Da's voice feature allows users to speak naturally and have their accomplishments ("tadas") automatically extracted. The system uses a tiered approach:
- Speech-to-Text (STT): Browser's Web Speech API (free, runs via Google/Apple)
- AI Extraction: Server-side LLM processing with client-side fallback
Voice & AI settings are located in Settings → Voice & AI (gear icon → Voice & AI section).
The voice recording feature is integrated into:
- Ta-Da! page - Record tadas by voice
- Timer page - Add voice notes after sessions
- /voice - Dedicated voice recording page (hidden from main nav)
┌─────────────────┐ ┌──────────────────┐ ┌───────────────────┐
│ User speaks │───▶│ Browser Speech │───▶│ Text transcript │
│ into mic │ │ API (Google/ │ │ │
│ │ │ Apple backend) │ │ │
└─────────────────┘ └──────────────────┘ └─────────┬─────────┘
│
▼
┌──────────────────────────────────────────┐
│ Ta-Da Server │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ POST /api/voice/structure │ │
│ │ │ │
│ │ Has X-User-Api-Key header? │ │
│ │ ├─ YES: Use user's OpenAI/Anthropic│ │
│ │ └─ NO: Use server's GROQ_API_KEY │ │
│ │ (Llama 3.3 70B - fast/cheap) │ │
│ └─────────────────────────────────────┘ │
└──────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Extracted │
│ Ta-Das │
└─────────────────┘
| Component | Location | Provider | Cost |
|---|---|---|---|
| Web Speech API | Browser | Google (Chrome/Edge) or Apple (Safari) | Free |
How it works:
- User taps the microphone button
- Browser requests microphone permission
- Audio is streamed to Google/Apple's speech recognition service
- Transcribed text is returned in real-time (interim + final results)
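The interim/final accumulation in the last step can be modeled with a small pure function. This is a simplified sketch, not the Web Speech API's actual event shape; `assembleTranscript` and `SpeechChunk` are hypothetical names for illustration:

```ts
// Simplified model of Web Speech API result handling: interim results
// overwrite each other; final results are appended permanently.
// (Hypothetical helper for illustration, not Ta-Da's actual code.)
interface SpeechChunk {
  text: string;
  isFinal: boolean;
}

function assembleTranscript(chunks: SpeechChunk[]): string {
  let finalText = "";
  let interim = "";
  for (const chunk of chunks) {
    if (chunk.isFinal) {
      finalText += (finalText ? " " : "") + chunk.text; // commit final segment
      interim = ""; // discard the interim text it supersedes
    } else {
      interim = chunk.text; // latest interim replaces the previous one
    }
  }
  return interim ? `${finalText} ${interim}`.trim() : finalText;
}
```

In the real API, interim results arrive continuously while the user speaks and are replaced until the recognizer commits a final segment, which is the behavior modeled above.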
Privacy Note: Audio is processed by Google or Apple depending on the browser. This is a browser-level API and cannot be avoided without using a custom STT solution (Whisper WASM - planned for future).
| Scenario | Provider | Model | Who Pays |
|---|---|---|---|
| Default | Groq | Llama 3.3 70B | Developer/Operator |
| BYOK (OpenAI) | OpenAI | gpt-4o-mini | User |
| BYOK (Anthropic) | Anthropic | claude-3-haiku | User |
Endpoint: `POST /api/voice/structure`

Request:

```ts
{
  text: string;                                // Transcribed speech
  mode: "tada" | "journal" | "timer-note";
  provider?: "groq" | "openai" | "anthropic";  // Optional, defaults to groq
}
```

Headers:

```
X-User-Api-Key: User's BYOK key (optional)
```
Response:
```ts
{
  tadas: Array<{
    name: string;
    category?: string;
    significance?: "minor" | "normal" | "major";
  }>;
  journalType?: "dream" | "reflection" | "note";
  provider: string;
  tokensUsed?: number;
}
```

┌─────────────────────────────────────────────────────────────┐
│ AI Extraction Request │
└─────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ Try Server API (3 retries) │
│ with exponential backoff │
└───────────────────────────────┘
│
┌────────────┴────────────┐
│ │
✅ Success ❌ Fails
│ │
▼ ▼
┌──────────────┐ ┌──────────────────────────┐
│ LLM Result │ │ Rule-Based Fallback │
│ (high qual) │ │ (client-side, offline) │
└──────────────┘ └──────────────────────────┘
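The retry-then-fallback flow above can be sketched generically. `withRetries` and `extractWithFallback` are hypothetical helpers written for illustration, not Ta-Da's actual implementation; the "server call" is injected so the sketch is self-contained:

```ts
// Sketch of the retry strategy: 3 attempts with exponential backoff,
// then fall through to the client-side rule-based extractor.
async function withRetries<T>(
  attempt: () => Promise<T>,
  retries = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < retries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 250ms, 500ms, 1000ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

// Usage shape: try the LLM endpoint; on total failure, use rule-based extraction.
async function extractWithFallback(
  callServer: () => Promise<string[]>,
  ruleBased: () => string[],
): Promise<{ tadas: string[]; source: "llm" | "rules" }> {
  try {
    return { tadas: await withRetries(callServer), source: "llm" };
  } catch {
    return { tadas: ruleBased(), source: "rules" }; // offline-safe path
  }
}
```

Because the fallback is synchronous and local, the user always gets a result even when every network attempt fails.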
Rule-Based Fallback (`extractTadasRuleBased()`):
- Runs entirely in the browser (no network needed)
- Splits text by conjunctions ("and", "then", "also")
- Detects action verbs: finished, completed, fixed, cleaned, called, etc.
- Detects significance from keywords ("finally" = major)
- Detects category from context keywords
- Returns 60% confidence score (vs 85%+ for LLM)
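A minimal sketch of the heuristics above. The verb list and keyword checks here are illustrative and far from exhaustive; the real `extractTadasRuleBased()` lives in `app/utils/tadaExtractor.ts`:

```ts
// Rule-based fallback sketch: split on conjunctions, keep clauses with an
// action verb, and treat "finally" as a signal of a major accomplishment.
const ACTION_VERBS = ["finished", "completed", "fixed", "cleaned", "called"];

interface RuleBasedTada {
  name: string;
  significance: "normal" | "major";
  confidence: number;
}

function extractTadasRuleBased(text: string): RuleBasedTada[] {
  return text
    .split(/\b(?:and then|and|then|also)\b/i) // split compound sentences
    .map((clause) => clause.trim())
    .filter((clause) =>
      ACTION_VERBS.some((verb) => clause.toLowerCase().includes(verb)),
    )
    .map((clause): RuleBasedTada => ({
      name: clause,
      // "finally" signals a long-awaited win, so bump significance
      significance: /\bfinally\b/i.test(clause) ? "major" : "normal",
      confidence: 0.6, // rule-based results score lower than LLM output
    }));
}
```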
When Fallback Activates:
- Server returns 503 (LLM not configured)
- Server is offline/unreachable
- All 3 retry attempts fail
- Network is completely unavailable
Set in `.env`:
```bash
# Primary LLM - RECOMMENDED
# Fast, cheap, reliable. Get yours at https://console.groq.com/keys
GROQ_API_KEY=gsk_...

# Optional fallbacks (if Groq unavailable)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Rate limiting
VOICE_FREE_LIMIT=50  # Monthly limit for free tier
```

Provider Priority:
- If user sends BYOK header → use user's key with requested provider
- Else if `provider=groq` and `GROQ_API_KEY` set → use Groq
- Else if `provider=openai` and `OPENAI_API_KEY` set → use OpenAI
- Else if `provider=anthropic` and `ANTHROPIC_API_KEY` set → use Anthropic
- Else → return 503 (triggers client fallback)
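The priority rules can be expressed as a pure function. This is a sketch of the decision logic only, not the actual server code; the env-variable names mirror the docs:

```ts
// Provider selection sketch: BYOK wins, then the requested provider's
// server key, else 503 (which triggers the client-side fallback).
type Provider = "groq" | "openai" | "anthropic";

interface SelectionInput {
  userApiKey?: string;          // from the X-User-Api-Key header
  requestedProvider?: Provider; // defaults to "groq"
  env: Partial<Record<"GROQ_API_KEY" | "OPENAI_API_KEY" | "ANTHROPIC_API_KEY", string>>;
}

function selectProvider(
  input: SelectionInput,
): { provider: Provider; key: string } | { status: 503 } {
  const requested = input.requestedProvider ?? "groq";
  if (input.userApiKey) {
    return { provider: requested, key: input.userApiKey }; // BYOK wins
  }
  const envKey = {
    groq: input.env.GROQ_API_KEY,
    openai: input.env.OPENAI_API_KEY,
    anthropic: input.env.ANTHROPIC_API_KEY,
  }[requested];
  // No usable key: return 503 so the client falls back to rules
  return envKey ? { provider: requested, key: envKey } : { status: 503 };
}
```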
Users configure in Settings → Voice & AI:
| Setting | Options | Description |
|---|---|---|
| Speech Recognition | Auto, Browser, On-Device, Cloud | How speech is transcribed |
| AI Processing | Auto, OpenAI, Anthropic | Which LLM to use |
| Prefer Offline | Toggle | Prioritize on-device processing |
| BYOK Keys | OpenAI, Anthropic | User's own API keys |
BYOK Flow:
- User adds API key in Settings
- Key stored in browser localStorage (lightly protected for the MVP; proper Web Crypto encryption planned)
- On extraction, key sent in `X-User-Api-Key` header
- Server uses user's key instead of server's Groq key
- User billed directly by their provider
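The header handling in the flow above might look like this sketch; `buildExtractionHeaders` is a hypothetical helper, and only the header name comes from the docs:

```ts
// Attach the user's BYOK key to the extraction request when present.
// (Illustrative helper; not Ta-Da's actual code.)
function buildExtractionHeaders(userKey?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (userKey) {
    headers["X-User-Api-Key"] = userKey; // user is billed by their own provider
  }
  return headers;
}
```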
| Tier | Limit | Enforcement |
|---|---|---|
| Free (no BYOK) | 50/month | Server rejects with 402 |
| BYOK | Unlimited | Billed to user's account |
Rate Limit Response:
```json
{
  "statusCode": 402,
  "statusMessage": "Free tier limit reached (50/month). Add your own API key in settings to continue."
}
```

| Usage | Cost |
|---|---|
| Per extraction | ~$0.003 (Llama 3.3 70B) |
| 100 users × 50/month | ~$15/month |
| 1000 users × 50/month | ~$150/month |
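The table's figures follow from simple multiplication at the free-tier cap; the constants below are the doc's own estimates:

```ts
// Back-of-envelope check: monthly cost scales linearly with users
// when everyone hits the free-tier cap.
const COST_PER_EXTRACTION = 0.003; // USD, Llama 3.3 70B via Groq (estimate)
const FREE_TIER_LIMIT = 50;        // extractions per user per month

function monthlyCost(users: number): number {
  return users * FREE_TIER_LIMIT * COST_PER_EXTRACTION;
}
```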
| Provider | Model | Cost per extraction |
|---|---|---|
| OpenAI | gpt-4o-mini | ~$0.002 |
| Anthropic | claude-3-haiku | ~$0.003 |
- Audio Privacy: Audio never reaches Ta-Da servers. The browser sends it directly to Google/Apple for STT.
- Text Privacy: Transcribed text is sent to the Ta-Da server, then to the LLM provider. It is not stored permanently.
- BYOK Keys: Stored in browser localStorage. Sent to the Ta-Da server in a header, then used to call the provider API. Keys are never logged or stored server-side.
- Rate Limiting: Prevents abuse of the server's Groq quota. There is a 10-second cooldown between requests per user.
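The per-user cooldown can be sketched as a small in-memory throttle. This is illustrative only (the real enforcement is server-side); the clock is injected so the logic is testable:

```ts
// Per-user cooldown sketch: reject requests arriving within windowMs
// of the user's last accepted request.
function makeCooldown(windowMs: number, now: () => number = Date.now) {
  const lastRequest = new Map<string, number>();
  return function allow(userId: string): boolean {
    const t = now();
    const last = lastRequest.get(userId);
    if (last !== undefined && t - last < windowMs) return false; // still cooling down
    lastRequest.set(userId, t); // record only accepted requests
    return true;
  };
}
```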
| Browser | Web Speech API | Fallback |
|---|---|---|
| Chrome | ✅ Full support | N/A |
| Edge | ✅ Full support | N/A |
| Safari | ✅ Full support (webkit prefix) | N/A |
| Firefox | ❌ Not supported | Show error message |
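The support table maps to a standard feature-detection pattern. This sketch takes a `window`-like object as a parameter so it can run outside a browser; in real code you would pass `window` itself:

```ts
// Feature detection matching the table: Chrome/Edge expose
// SpeechRecognition, Safari the webkit-prefixed variant, Firefox neither.
function getSpeechRecognition(globalLike: Record<string, unknown>): unknown {
  return (
    globalLike.SpeechRecognition ??
    globalLike.webkitSpeechRecognition ?? // Safari's prefixed implementation
    null // Firefox: caller should show an error message
  );
}
```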
- Whisper WASM (T196-T203): On-device transcription for full offline support and privacy
- WebLLM: On-device LLM for extraction without any network calls
- Streaming: Real-time extraction as user speaks
- Server doesn't have `GROQ_API_KEY` set
  - Solution: Add the key to `.env`, or the user adds a BYOK key
- User hit the 50/month limit
  - Solution: User adds a BYOK key in settings
- Rule-based fallback is being used
  - Check server logs for LLM errors
  - Verify `GROQ_API_KEY` is valid
- User is on Firefox
  - Solution: Use Chrome, Edge, or Safari
- `app/composables/useLLMStructure.ts` - Client-side extraction orchestration
- `app/server/api/voice/structure.post.ts` - Server endpoint
- `app/utils/tadaExtractor.ts` - Rule-based fallback + LLM prompt
- `app/components/settings/VoiceSettings.vue` - User settings UI
- `app/composables/useTranscription.ts` - Web Speech API wrapper