Description
Domain-specific vocabulary (financial terms, technical jargon) causes catastrophic keyword recall failures because stemming does not normalize the morphological variants and compound words typical of such terms. Add a third-tier fallback using PostgreSQL's pg_trgm extension for character-level trigram matching, which catches typos, unstemmed terms, and partial matches that full-text search misses.
Evidence: exp-040 showed +6.57% retrieval quality and +26.5% chunking quality improvement.
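To make the trigram idea concrete, here is a minimal Python sketch that approximates how pg_trgm scores two strings: each word is lowercased, padded with two leading spaces and one trailing space, cut into 3-character windows, and the two trigram sets are compared by shared trigrams over total distinct trigrams. The function names are illustrative only, and the word splitting is a simplification of what PostgreSQL does; this is not the project's implementation.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (simplified word splitting)."""
    grams: set[str] = set()
    # Treat runs of letters/digits as words; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A spelling variant that exact term matching would miss still scores well:
# 10 shared trigrams out of 16 distinct ones.
print(similarity("amortization", "amortisation"))  # 0.625
```

A score of 0.625 is well above pg_trgm's default similarity threshold of 0.3, so a variant like this would be picked up by a trigram match even when the full-text tiers return nothing.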
Acceptance Criteria
Implementation Notes
Key files: src/Connapse.Search/Keyword/KeywordSearchService.cs, src/Connapse.Storage/Vectors/VectorColumnManager.cs
Pattern: Add RunTrigramFallbackAsync method as third tier after AND-then-OR
Depends on: tiered AND-then-OR fallback (#278, "Add tiered AND-then-OR fallback to keyword search") being implemented first
Reference: auto-rag-eval/results/exp-040-index_type_optimizat-93f292-changespec.json
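The control flow described in these notes (strict AND, then OR, then trigram as a last resort) can be sketched as follows. This is a toy in-memory model in Python, not the C# code in KeywordSearchService.cs: the tier functions, the `search` helper, and the per-word scoring are all assumptions made for illustration, and the real third tier would issue a pg_trgm query against PostgreSQL instead.

```python
import re
from typing import Callable, Sequence

# (tier name, search function) pairs, tried in order. All names are hypothetical.
Tier = tuple[str, Callable[[str, Sequence[str]], list[str]]]


def words(text: str) -> set[str]:
    return set(re.findall(r"[^\W_]+", text.lower()))


def trigrams(word: str) -> set[str]:
    padded = f"  {word} "  # pg_trgm-style padding
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def and_tier(query: str, docs: Sequence[str]) -> list[str]:
    """Tier 1: every query term must appear in the document."""
    terms = words(query)
    return [d for d in docs if terms and terms <= words(d)]


def or_tier(query: str, docs: Sequence[str]) -> list[str]:
    """Tier 2: at least one query term appears in the document."""
    terms = words(query)
    return [d for d in docs if terms & words(d)]


def trigram_tier(query: str, docs: Sequence[str], threshold: float = 0.3) -> list[str]:
    """Tier 3 (toy scoring): some query term is trigram-similar to some document word."""
    def best(term: str, doc: str) -> float:
        t = trigrams(term)
        return max(
            (len(t & trigrams(w)) / len(t | trigrams(w)) for w in words(doc)),
            default=0.0,
        )
    return [d for d in docs if any(best(t, d) >= threshold for t in words(query))]


TIERS: list[Tier] = [("and", and_tier), ("or", or_tier), ("trigram", trigram_tier)]


def search(query: str, docs: Sequence[str], tiers: Sequence[Tier] = TIERS) -> tuple[str, list[str]]:
    """Return the first tier that produces hits, with those hits."""
    for name, run in tiers:
        hits = run(query, docs)
        if hits:
            return name, hits
    return "none", []


DOCS = ["quarterly amortisation schedule", "revenue recognition policy"]
print(search("amortization", DOCS))  # falls through to the trigram tier
```

Stopping at the first non-empty tier keeps the cheaper, more precise matches preferred, so the fuzzier trigram tier only runs when both full-text tiers come back empty.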
Size Estimate
M — ~150 lines across 2 files (search logic + index creation)
Related
Depends on: #278