Updated the TTS inference and made it work #4
Closed
AdvityaDua wants to merge 1 commit into
Description
This PR upgrades the interview backend in three ways: it migrates speech-to-text inference to Groq's hosted API, extends the voice analysis scoring system, and introduces LLM-based CV evaluation.
Key Changes
🎤 Speech-to-Text (ASR)
Groq Integration: Replaced the local Whisper implementation with Groq-hosted Whisper (whisper-large-v3). This removes the need for local model loading and significantly reduces latency.
Dependency Removal: Removed the ffmpeg normalization steps, since Groq handles audio processing. This makes the backend lighter and easier to deploy.
Robustness: Added safety checks for empty audio, "None" hallucinations, and safe Groq client initialization.
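For reviewers, the three safety checks above could look roughly like the sketch below. This is not the code in this PR: `safe_transcribe`, `get_api_key`, `HALLUCINATED_OUTPUTS`, and the error strings are hypothetical names, and the transcription call is passed in as a plain callable so the sketch runs without the Groq SDK. The `GROQ_API_KEY` variable name is an assumption about how the key is supplied.

```python
import os
from typing import Callable, Optional

# Hypothetical set of outputs treated as "nothing was said".
HALLUCINATED_OUTPUTS = {"", "none"}


def get_api_key(env_var: str = "GROQ_API_KEY") -> Optional[str]:
    """Return the API key, or None so the caller can skip client creation."""
    key = os.environ.get(env_var, "").strip()
    return key or None


def safe_transcribe(
    audio: bytes, transcribe: Optional[Callable[[bytes], str]]
) -> dict:
    """Apply the guards around an injected transcription callable."""
    # Guard 1: the client could not be initialised (e.g. no API key).
    if transcribe is None:
        return {"ok": False, "text": "", "error": "client_unavailable"}
    # Guard 2: nothing was recorded.
    if not audio:
        return {"ok": False, "text": "", "error": "empty_audio"}
    try:
        text = (transcribe(audio) or "").strip()
    except Exception as exc:  # network or API failure
        return {"ok": False, "text": "", "error": f"transcription_failed: {exc}"}
    # Guard 3: drop transcripts that are empty or only a stray "None".
    if text.strip(" .!?").lower() in HALLUCINATED_OUTPUTS:
        return {"ok": False, "text": "", "error": "no_speech_detected"}
    return {"ok": True, "text": text, "error": None}
```

In a real backend the `transcribe` argument would wrap the Groq request, and would be left as `None` when `get_api_key()` returns nothing.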
📊 Voice Analysis
Scaled Scoring: The VoiceAnalyzer now returns scores scaled out of 10 (for Fluency, Clarity, Confidence, and Pace) alongside the original raw weighted scores. This aligns with the frontend's requirement for a 0-10 scale.
Improved Metrics:
WPM Calculation: Speech rate (WPM) is now computed from the actual transcript when one is provided, which gives a more accurate value.
Pitch & Energy: Integrated librosa.pyin for robust pitch tracking and RMS for energy/clarity analysis.
Error Handling: Added an explicit analysis_ok flag and structured error codes for better frontend feedback.
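To make the scoring changes concrete, here is a minimal sketch of transcript-based WPM, 0-10 scaling, and a result payload carrying `analysis_ok`. It is illustrative only: the function names, the `error_code` and `scores_out_of_10` keys, and the assumption that raw scores lie in [0, `raw_max`] are not taken from the PR, and the librosa-based pitch and energy extraction is omitted.

```python
from typing import Optional


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speech rate from the transcript's word count and the clip duration."""
    if duration_seconds <= 0:
        return 0.0
    return len(transcript.split()) * 60.0 / duration_seconds


def scale_to_ten(raw: float, raw_max: float = 1.0) -> float:
    """Map a raw score in [0, raw_max] onto 0-10, clamped, one decimal."""
    if raw_max <= 0:
        raise ValueError("raw_max must be positive")
    clamped = min(max(raw, 0.0), raw_max)
    return round(10.0 * clamped / raw_max, 1)


def build_result(
    raw_scores: Optional[dict], error_code: Optional[str] = None
) -> dict:
    """Return raw and scaled scores, or a structured error for the frontend."""
    if raw_scores is None:
        return {
            "analysis_ok": False,
            "error_code": error_code or "analysis_failed",
            "raw_scores": {},
            "scores_out_of_10": {},
        }
    return {
        "analysis_ok": True,
        "error_code": None,
        "raw_scores": raw_scores,
        "scores_out_of_10": {k: scale_to_ten(v) for k, v in raw_scores.items()},
    }
```

Keeping the raw scores next to the scaled ones means existing consumers of the weighted values keep working while the frontend reads only the 0-10 fields.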
📄 CV Evaluation
LLM Scorer: Implemented LLMScorer using Groq to intelligently evaluate CVs against Job Descriptions.
Heuristic Fallback: Added graceful fallbacks to heuristic scoring if the LLM API is unavailable.
Structured Output: Ensures JSON parsing reliability for LLM responses.
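The LLM-first, heuristic-fallback flow described above can be sketched as follows. This is an assumption-laden illustration, not the `LLMScorer` from this PR: `extract_json`, `heuristic_score`, `score_cv`, the prompt text, and the `score` field are hypothetical, the keyword-overlap heuristic is a stand-in for whatever the real fallback does, and the LLM call is injected as a callable so no API access is needed.

```python
import json
import re
from typing import Callable, Optional


def extract_json(text: str) -> Optional[dict]:
    """Parse the outermost {...} span of an LLM reply; None if not a JSON object."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def heuristic_score(cv_text: str, jd_text: str) -> float:
    """Fallback: share of distinct JD words that also appear in the CV, 0-10."""
    jd_words = set(re.findall(r"[a-z0-9+#]+", jd_text.lower()))
    cv_words = set(re.findall(r"[a-z0-9+#]+", cv_text.lower()))
    if not jd_words:
        return 0.0
    return round(10.0 * len(jd_words & cv_words) / len(jd_words), 1)


def score_cv(
    cv_text: str, jd_text: str, ask_llm: Optional[Callable[[str], str]]
) -> dict:
    """Try the LLM first; fall back to the heuristic on any failure."""
    if ask_llm is not None:
        try:
            prompt = (
                'Reply with JSON like {"score": <0-10>}.\n'
                f"Job description:\n{jd_text}\n\nCV:\n{cv_text}"
            )
            data = extract_json(ask_llm(prompt))
            if data is not None and isinstance(data.get("score"), (int, float)):
                return {"score": float(data["score"]), "source": "llm"}
        except Exception:
            pass  # API unavailable or reply unusable: use the heuristic
    return {"score": heuristic_score(cv_text, jd_text), "source": "heuristic"}
```

Returning a `source` field alongside the score lets callers and tests tell which path produced the result.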
🧪 Testing
Added comprehensive unit tests for LLMScorer and CVEvaluationEngine.
Added debug_groq.py and debug_voice_import.py utilities for environment verification.
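As a rough idea of what an environment verification utility can check, the sketch below reports missing packages and environment variables. It is not the content of `debug_groq.py` or `debug_voice_import.py`: `check_environment` and its default module and variable names are assumptions made for illustration.

```python
import importlib.util
import os
from typing import Mapping, Optional, Sequence


def check_environment(
    modules: Sequence[str] = ("groq", "librosa"),
    env_vars: Sequence[str] = ("GROQ_API_KEY",),
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """List top-level modules that cannot be found and env vars that are unset."""
    environ = os.environ if environ is None else environ
    return {
        "missing_modules": [
            name for name in modules if importlib.util.find_spec(name) is None
        ],
        "missing_env_vars": [name for name in env_vars if not environ.get(name)],
    }


if __name__ == "__main__":
    print(check_environment())
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.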