Closed
52 changes: 52 additions & 0 deletions agent_notes.md
@@ -0,0 +1,52 @@
# Speaker Segment Persistence & Error Handling Fixes

## Changes Summary

1. **Backend Models**: Added `SpeakerSegment` model to `internal/models/transcription.go` to persist timestamped audio segments for each identified speaker. Added to GORM auto-migration.
2. **Database Layer**:
   - Updated `JobRepository` interface in `internal/repository/implementations.go` with `SaveSpeakerSegments` and `GetSegmentsBySpeakerID`.
   - Implemented these methods in `jobRepository`.
3. **Transcription Pipeline**:
   - Updated `UnifiedTranscriptionService.saveTranscriptionResults` in `internal/transcription/unified_service.go` to automatically extract and save speaker segments after successful transcription.
4. **API Layer**:
   - Added `GET /api/v1/speakers/:id/segments` endpoint in `internal/api/speaker_handlers.go`.
   - Registered the new route in `internal/api/router.go`.
5. **Speaker Management Fixes**:
   - Fixed a syntax error in `internal/transcription/adapters/py/nvidia/titanet_manage.py`, where a function was declared with Go's `func` keyword instead of Python's `def`.
   - Enhanced `TitanetAdapter` to capture and return `stderr` from Python commands for better diagnostics.
   - Updated API handlers to return these descriptive error messages to the frontend.
6. **Frontend Enhancements**:
   - Updated `web/frontend/src/lib/speakersApi.ts` to include the `getSegments` method and improved error parsing from API responses.
   - Updated `AudioFilesTable.tsx` to display speaker names in the table view.
7. **Tests**: Updated `MockJobRepository` in test suites to match the new interface; all `internal/transcription` tests passed.
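
The repository changes in items 1 and 2 can be sketched roughly as follows. This is a minimal in-memory stand-in, not the GORM-backed implementation: the struct fields follow the generated `models.SpeakerSegment` definition, while the `SegmentStore` interface, the `memoryStore` type, the method signatures, and the start/end validation are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// SpeakerSegment mirrors the scalar fields of models.SpeakerSegment in the
// Swagger output. The embedding, created_at, GORM tags, and the
// TranscriptionJob relation are omitted to keep the sketch short.
type SpeakerSegment struct {
	ID                 uint
	TranscriptionJobID string
	SpeakerID          string
	Start              float64
	End                float64
	Text               string
}

// SegmentStore is a hypothetical subset of JobRepository; the real method
// signatures may differ (for example, they may take a context).
type SegmentStore interface {
	SaveSpeakerSegments(segments []SpeakerSegment) error
	GetSegmentsBySpeakerID(speakerID string) ([]SpeakerSegment, error)
}

// memoryStore is an in-memory stand-in for the database-backed repository.
type memoryStore struct {
	mu       sync.RWMutex
	nextID   uint
	segments []SpeakerSegment
}

// SaveSpeakerSegments validates the whole batch first, then appends it,
// so a bad segment never leaves a partial batch behind.
func (s *memoryStore) SaveSpeakerSegments(segments []SpeakerSegment) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, seg := range segments {
		if seg.End < seg.Start {
			return fmt.Errorf("segment for %q ends before it starts", seg.SpeakerID)
		}
	}
	for _, seg := range segments {
		s.nextID++
		seg.ID = s.nextID
		s.segments = append(s.segments, seg)
	}
	return nil
}

// GetSegmentsBySpeakerID returns every stored segment for one speaker,
// in insertion order.
func (s *memoryStore) GetSegmentsBySpeakerID(speakerID string) ([]SpeakerSegment, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []SpeakerSegment
	for _, seg := range s.segments {
		if seg.SpeakerID == speakerID {
			out = append(out, seg)
		}
	}
	return out, nil
}

func main() {
	var store SegmentStore = &memoryStore{}
	err := store.SaveSpeakerSegments([]SpeakerSegment{
		{TranscriptionJobID: "job-1", SpeakerID: "spk-a", Start: 0.0, End: 2.5, Text: "hello"},
		{TranscriptionJobID: "job-1", SpeakerID: "spk-b", Start: 2.5, End: 4.0, Text: "hi"},
	})
	if err != nil {
		fmt.Println("save failed:", err)
		return
	}
	segs, _ := store.GetSegmentsBySpeakerID("spk-a")
	fmt.Println(len(segs), "segment(s) for spk-a")
}
```

A store shaped like this is also what a `MockJobRepository` in the test suites would need to satisfy once the interface gains the two methods.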

## Environment Resolution
- The `uv run` issue in `data/whisperx-env/parakeet/` was resolved by running `uv lock` (performed by user), fixing dependency resolution for the private registry.
- Syntax error in `titanet_manage.py` was manually patched in both the source and the active environment.


* * *
# Speaker Persistence Implementation

Implemented global speaker identity tracking with high-dimensional embedding storage.

## Backend Changes
- Added `SpeakerSegment` and `SpeakerJobCentroid` models in SQLite.
- Updated `UnifiedTranscriptionService` to save reference segments and job-level centroids.
- Enhanced `titanet_identify.py` to extract and return segment-level embeddings and the calculated centroid.
- Added `SaveSpeakerJobCentroids` to `JobRepository`.
- Updated API routes and handlers for speaker management (Rename, List, Delete).
- Fixed build errors in `unified_service.go` related to variable scope and function signatures.

## Frontend Changes
- Created a "Speakers" tab in the Settings page.
- Implemented an `AudioChip` component that plays speaker voice samples using browser-side seeking.
- Added global speaker renaming and deletion capabilities.
- Optimized API calls to handle large transcript payloads by removing redundant preloads in the segments endpoint.

## Format & Consistency
- Standardized speaker IDs in the database (supporting multiple prefix formats like `Speaker-` and `Spk-`).
- Implemented trailing slash consistency for Gin routing.
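
As a rough illustration of the speaker ID standardization, the sketch below maps IDs carrying either prefix onto a single form. The notes name the `Speaker-` and `Spk-` prefixes but not the canonical format, so the `normalizeSpeakerID` helper, the choice of `Speaker-` as the canonical prefix, and the case-insensitive matching are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPrefixes holds the two prefixes mentioned in the notes; the real
// list may be longer.
var knownPrefixes = []string{"Speaker-", "Spk-"}

// canonicalPrefix is an assumption made for this sketch.
const canonicalPrefix = "Speaker-"

// normalizeSpeakerID rewrites any known prefix to the canonical one and
// leaves other IDs (for example, UUIDs) unchanged apart from trimming.
func normalizeSpeakerID(id string) string {
	id = strings.TrimSpace(id)
	for _, p := range knownPrefixes {
		if len(id) >= len(p) && strings.EqualFold(id[:len(p)], p) {
			return canonicalPrefix + id[len(p):]
		}
	}
	return id
}

func main() {
	for _, id := range []string{"Spk-07", "speaker-3", " Speaker-12 ", "3f2a9c"} {
		fmt.Printf("%q -> %q\n", id, normalizeSpeakerID(id))
	}
}
```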


* * *
249 changes: 249 additions & 0 deletions api-docs/docs.go
@@ -1602,6 +1602,163 @@ const docTemplate = `{
}
}
},
"/api/v1/speakers": {
"get": {
"description": "Get a list of all identified speakers",
"produces": [
"application/json"
],
"tags": [
"speakers"
],
"summary": "List speakers",
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "array",
"items": {}
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/api.ErrorResponse"
}
}
}
}
},
"/api/v1/speakers/{id}": {
"put": {
"description": "Rename an identified speaker and update past transcripts",
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"tags": [
"speakers"
],
"summary": "Rename speaker",
"parameters": [
{
"type": "string",
"description": "Speaker ID",
"name": "id",
"in": "path",
"required": true
},
{
"description": "New Name",
"name": "request",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/api.RenameSpeakerRequest"
}
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"400": {
"description": "Bad Request",
"schema": {
"$ref": "#/definitions/api.ErrorResponse"
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/api.ErrorResponse"
}
}
}
},
"delete": {
"description": "Delete a speaker identity",
"produces": [
"application/json"
],
"tags": [
"speakers"
],
"summary": "Delete speaker",
"parameters": [
{
"type": "string",
"description": "Speaker ID",
"name": "id",
"in": "path",
"required": true
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/api.ErrorResponse"
}
}
}
}
},
"/api/v1/speakers/{id}/segments": {
"get": {
"description": "Get all audio segments and their associated transcription jobs for a speaker",
"produces": [
"application/json"
],
"tags": [
"speakers"
],
"summary": "Get speaker segments",
"parameters": [
{
"type": "string",
"description": "Speaker ID",
"name": "id",
"in": "path",
"required": true
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"type": "array",
"items": {
"$ref": "#/definitions/models.SpeakerSegment"
}
}
},
"500": {
"description": "Internal Server Error",
"schema": {
"$ref": "#/definitions/api.ErrorResponse"
}
}
}
}
},
"/api/v1/summaries": {
"get": {
"security": [
@@ -4256,6 +4413,17 @@ const docTemplate = `{
}
}
},
"api.RenameSpeakerRequest": {
"type": "object",
"required": [
"name"
],
"properties": {
"name": {
"type": "string"
}
}
},
"api.SetUserDefaultProfileRequest": {
"type": "object",
"required": [
@@ -4535,6 +4703,81 @@ const docTemplate = `{
}
}
},
"models.SpeakerMapping": {
"type": "object",
"properties": {
"created_at": {
"type": "string"
},
"custom_name": {
"description": "e.g., \"John Doe\"",
"type": "string"
},
"id": {
"type": "integer"
},
"original_speaker": {
"description": "e.g., \"speaker_00\"",
"type": "string"
},
"transcription_job": {
"description": "Relationships",
"allOf": [
{
"$ref": "#/definitions/models.TranscriptionJob"
}
]
},
"transcription_job_id": {
"type": "string"
},
"updated_at": {
"type": "string"
}
}
},
"models.SpeakerSegment": {
"type": "object",
"properties": {
"created_at": {
"type": "string"
},
"embedding": {
"description": "JSON-serialized float32 array",
"type": "array",
"items": {
"type": "integer"
}
},
"end": {
"type": "number"
},
"id": {
"type": "integer"
},
"speaker_id": {
"description": "The global speaker ID (UUID) or local name",
"type": "string"
},
"start": {
"type": "number"
},
"text": {
"type": "string"
},
"transcription_job": {
"description": "Relationships",
"allOf": [
{
"$ref": "#/definitions/models.TranscriptionJob"
}
]
},
"transcription_job_id": {
"type": "string"
}
}
},
"models.Summary": {
"type": "object",
"properties": {
@@ -4654,6 +4897,12 @@ const docTemplate = `{
}
]
},
"speaker_mappings": {
"type": "array",
"items": {
"$ref": "#/definitions/models.SpeakerMapping"
}
},
"status": {
"$ref": "#/definitions/models.JobStatus"
},