rishikanthc · paulirish · Mar 2, 2026
diff --git a/agent_notes.md b/agent_notes.md
@@ -0,0 +1,52 @@
+# Speaker Segment Persistence & Error Handling Fixes
+
+## Changes Summary
+
+1.  **Backend Models**: Added a `SpeakerSegment` model to `internal/models/transcription.go` to persist timestamped audio segments for each identified speaker, and registered it in the GORM auto-migration.
+2.  **Database Layer**:
+    - Updated `JobRepository` interface in `internal/repository/implementations.go` with `SaveSpeakerSegments` and `GetSegmentsBySpeakerID`.
+    - Implemented these methods in `jobRepository`.
+3.  **Transcription Pipeline**:
+    - Updated `UnifiedTranscriptionService.saveTranscriptionResults` in `internal/transcription/unified_service.go` to automatically extract and save speaker segments after successful transcription.
+4.  **API Layer**:
+    - Added `GET /api/v1/speakers/:id/segments` endpoint in `internal/api/speaker_handlers.go`.
+    - Registered the new route in `internal/api/router.go`.
+5.  **Speaker Management Fixes**:
+    - Fixed a syntax error in `internal/transcription/adapters/py/nvidia/titanet_manage.py`, where a function was declared with Go's `func` keyword instead of Python's `def`.
+    - Enhanced `TitanetAdapter` to capture and return `stderr` from Python commands for better diagnostics.
+    - Updated API handlers to return these descriptive error messages to the frontend.
+6.  **Frontend Enhancements**:
+    - Updated `web/frontend/src/lib/speakersApi.ts` to include the `getSegments` method and improved error parsing from API responses.
+    - Updated `AudioFilesTable.tsx` to display speaker names in the table view.
+7.  **Tests**: Updated `MockJobRepository` in test suites to match the new interface; all `internal/transcription` tests passed.
+
+## Environment Resolution
+- The `uv run` issue in `data/whisperx-env/parakeet/` was resolved by running `uv lock` (performed by the user), which fixed dependency resolution for the private registry.
+- The syntax error in `titanet_manage.py` was manually patched in both the source and the active environment.
+
+
+* * *
+# Speaker Persistence Implementation
+
+Implemented global speaker identity tracking with high-dimensional embedding storage.
+
+## Backend Changes
+- Added `SpeakerSegment` and `SpeakerJobCentroid` models, persisted in SQLite.
+- Updated `UnifiedTranscriptionService` to save reference segments and job-level centroids.
+- Enhanced `titanet_identify.py` to extract and return segment-level embeddings and the calculated centroid.
+- Added `SaveSpeakerJobCentroids` to `JobRepository`.
+- Updated API routes and handlers for speaker management (Rename, List, Delete).
+- Fixed build errors in `unified_service.go` related to variable scope and function signatures.
+
+## Frontend Changes
+- Created a "Speakers" tab in the Settings page.
+- Implemented an `AudioChip` component that plays speaker voice samples using browser-side seeking.
+- Added global speaker renaming and deletion capabilities.
+- Optimized API calls to handle large transcript payloads by removing redundant preloads in the segments endpoint.
+
+## Format & Consistency
+- Standardized speaker IDs in the database (supporting multiple prefix formats like `Speaker-` and `Spk-`).
+- Made trailing-slash handling consistent across Gin routes.
+
+
+* * *
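The notes above mention job-level centroids calculated from segment-level embeddings, but the diff shown here does not include that calculation. As a rough illustration only, a centroid is usually the element-wise mean of a speaker's segment embeddings. The sketch below assumes that definition; the `centroid` function name is hypothetical, and whether the project also L2-normalizes the result is not stated in the notes.

```go
package main

import (
	"errors"
	"fmt"
)

// centroid returns the element-wise mean of a set of equal-length embeddings.
// Hypothetical helper for illustration; not taken from the PR.
func centroid(embeddings [][]float32) ([]float32, error) {
	if len(embeddings) == 0 {
		return nil, errors.New("no embeddings")
	}
	dim := len(embeddings[0])
	// Accumulate in float64 to limit rounding error over many segments.
	sums := make([]float64, dim)
	for i, e := range embeddings {
		if len(e) != dim {
			return nil, fmt.Errorf("embedding %d has dimension %d, want %d", i, len(e), dim)
		}
		for j, v := range e {
			sums[j] += float64(v)
		}
	}
	out := make([]float32, dim)
	for j, s := range sums {
		out[j] = float32(s / float64(len(embeddings)))
	}
	return out, nil
}

func main() {
	// Two toy 3-dimensional embeddings for one speaker in one job.
	c, err := centroid([][]float32{{1, 2, 3}, {3, 4, 5}})
	if err != nil {
		panic(err)
	}
	fmt.Println(c) // prints [2 3 4]
}
```

Rejecting mismatched dimensions up front keeps a malformed segment from silently skewing the stored centroid.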
diff --git a/api-docs/docs.go b/api-docs/docs.go
@@ -1602,6 +1602,163 @@ const docTemplate = `{
                 }
             }
         },
+        "/api/v1/speakers": {
+            "get": {
+                "description": "Get a list of all identified speakers",
+                "produces": [
+                    "application/json"
+                ],
+                "tags": [
+                    "speakers"
+                ],
+                "summary": "List speakers",
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "array",
+                            "items": {}
+                        }
+                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "schema": {
+                            "$ref": "#/definitions/api.ErrorResponse"
+                        }
+                    }
+                }
+            }
+        },
+        "/api/v1/speakers/{id}": {
+            "put": {
+                "description": "Rename an identified speaker and update past transcripts",
+                "consumes": [
+                    "application/json"
+                ],
+                "produces": [
+                    "application/json"
+                ],
+                "tags": [
+                    "speakers"
+                ],
+                "summary": "Rename speaker",
+                "parameters": [
+                    {
+                        "type": "string",
+                        "description": "Speaker ID",
+                        "name": "id",
+                        "in": "path",
+                        "required": true
+                    },
+                    {
+                        "description": "New Name",
+                        "name": "request",
+                        "in": "body",
+                        "required": true,
+                        "schema": {
+                            "$ref": "#/definitions/api.RenameSpeakerRequest"
+                        }
+                    }
+                ],
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "object",
+                            "additionalProperties": {
+                                "type": "string"
+                            }
+                        }
+                    },
+                    "400": {
+                        "description": "Bad Request",
+                        "schema": {
+                            "$ref": "#/definitions/api.ErrorResponse"
+                        }
+                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "schema": {
+                            "$ref": "#/definitions/api.ErrorResponse"
+                        }
+                    }
+                }
+            },
+            "delete": {
+                "description": "Delete a speaker identity",
+                "produces": [
+                    "application/json"
+                ],
+                "tags": [
+                    "speakers"
+                ],
+                "summary": "Delete speaker",
+                "parameters": [
+                    {
+                        "type": "string",
+                        "description": "Speaker ID",
+                        "name": "id",
+                        "in": "path",
+                        "required": true
+                    }
+                ],
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "object",
+                            "additionalProperties": {
+                                "type": "string"
+                            }
+                        }
+                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "schema": {
+                            "$ref": "#/definitions/api.ErrorResponse"
+                        }
+                    }
+                }
+            }
+        },
+        "/api/v1/speakers/{id}/segments": {
+            "get": {
+                "description": "Get all audio segments and their associated transcription jobs for a speaker",
+                "produces": [
+                    "application/json"
+                ],
+                "tags": [
+                    "speakers"
+                ],
+                "summary": "Get speaker segments",
+                "parameters": [
+                    {
+                        "type": "string",
+                        "description": "Speaker ID",
+                        "name": "id",
+                        "in": "path",
+                        "required": true
+                    }
+                ],
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "array",
+                            "items": {
+                                "$ref": "#/definitions/models.SpeakerSegment"
+                            }
+                        }
+                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "schema": {
+                            "$ref": "#/definitions/api.ErrorResponse"
+                        }
+                    }
+                }
+            }
+        },
         "/api/v1/summaries": {
             "get": {
                 "security": [
@@ -4256,6 +4413,17 @@ const docTemplate = `{
                 }
             }
         },
+        "api.RenameSpeakerRequest": {
+            "type": "object",
+            "required": [
+                "name"
+            ],
+            "properties": {
+                "name": {
+                    "type": "string"
+                }
+            }
+        },
         "api.SetUserDefaultProfileRequest": {
             "type": "object",
             "required": [
@@ -4535,6 +4703,81 @@ const docTemplate = `{
                 }
             }
         },
+        "models.SpeakerMapping": {
+            "type": "object",
+            "properties": {
+                "created_at": {
+                    "type": "string"
+                },
+                "custom_name": {
+                    "description": "e.g., \"John Doe\"",
+                    "type": "string"
+                },
+                "id": {
+                    "type": "integer"
+                },
+                "original_speaker": {
+                    "description": "e.g., \"speaker_00\"",
+                    "type": "string"
+                },
+                "transcription_job": {
+                    "description": "Relationships",
+                    "allOf": [
+                        {
+                            "$ref": "#/definitions/models.TranscriptionJob"
+                        }
+                    ]
+                },
+                "transcription_job_id": {
+                    "type": "string"
+                },
+                "updated_at": {
+                    "type": "string"
+                }
+            }
+        },
+        "models.SpeakerSegment": {
+            "type": "object",
+            "properties": {
+                "created_at": {
+                    "type": "string"
+                },
+                "embedding": {
+                    "description": "JSON-serialized float32 array",
+                    "type": "array",
+                    "items": {
+                        "type": "integer"
+                    }
+                },
+                "end": {
+                    "type": "number"
+                },
+                "id": {
+                    "type": "integer"
+                },
+                "speaker_id": {
+                    "description": "The global speaker ID (UUID) or local name",
+                    "type": "string"
+                },
+                "start": {
+                    "type": "number"
+                },
+                "text": {
+                    "type": "string"
+                },
+                "transcription_job": {
+                    "description": "Relationships",
+                    "allOf": [
+                        {
+                            "$ref": "#/definitions/models.TranscriptionJob"
+                        }
+                    ]
+                },
+                "transcription_job_id": {
+                    "type": "string"
+                }
+            }
+        },
         "models.Summary": {
             "type": "object",
             "properties": {
@@ -4654,6 +4897,12 @@ const docTemplate = `{
                         }
                     ]
                 },
+                "speaker_mappings": {
+                    "type": "array",
+                    "items": {
+                        "$ref": "#/definitions/models.SpeakerMapping"
+                    }
+                },
                 "status": {
                     "$ref": "#/definitions/models.JobStatus"
                 },