
Add pg_trgm trigram similarity fallback tier to keyword search #279

@Destrayon

Description


Domain-specific vocabulary (financial terms, technical jargon) causes catastrophic keyword recall failures because full-text stemming does not reliably handle the morphological variants and compound words found in that vocabulary. Add a third-tier fallback using PostgreSQL's pg_trgm extension for character-level trigram matching, which catches typos, unstemmed terms, and partial matches that full-text search misses.

Evidence: exp-040 showed +6.57% retrieval quality and +26.5% chunking quality improvement.

Acceptance Criteria

  • pg_trgm extension created on database initialization
  • GIN trigram index created on chunks.content
  • Trigram fallback tier activates when AND+OR tiers return insufficient results
  • No regression on retrieval speed (p95 < 300ms)
  • Retrieval quality improves on eval harness
  • Tests pass (dotnet test)
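
The first two criteria could be met with idempotent DDL along the following lines. This is a minimal sketch, not the project's actual migration: the index name `idx_chunks_content_trgm` is hypothetical, and it assumes the `chunks` table with a text `content` column already exists when the statements run.

```sql
-- Enable trigram support; IF NOT EXISTS makes this safe to run on every startup.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN index on chunks.content using the trigram operator class.
-- The index name is a placeholder chosen for this sketch.
CREATE INDEX IF NOT EXISTS idx_chunks_content_trgm
    ON chunks USING gin (content gin_trgm_ops);
```

On a large existing table, `CREATE INDEX CONCURRENTLY` avoids blocking writes while the index builds, but it cannot run inside a transaction block, so it may not fit an initialization step that wraps its DDL in a transaction.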

Implementation Notes

  • Key files: src/Connapse.Search/Keyword/KeywordSearchService.cs, src/Connapse.Storage/Vectors/VectorColumnManager.cs
  • Pattern: Add RunTrigramFallbackAsync method as third tier after AND-then-OR
  • Depends on: tiered AND-then-OR fallback (#278) being implemented first
  • Reference: auto-rag-eval/results/exp-040-index_type_optimizat-93f292-changespec.json
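
As a rough illustration of what the third tier might execute, the query below ranks chunks by word-level trigram similarity to the raw query text. It is a sketch under assumptions: the `id` column, the positional parameters `$1` (query text) and `$2` (row limit), and the choice of `word_similarity` over plain `similarity` are not taken from the issue or from exp-040.

```sql
-- $1 = raw query text, $2 = maximum rows to return (hypothetical parameters).
SELECT id,                                    -- assumed primary key column
       content,
       word_similarity($1, content) AS score  -- best match of $1 against any extent of content
FROM chunks
WHERE $1 <% content                           -- true when word similarity exceeds the threshold;
                                              -- can use a gin_trgm_ops index
ORDER BY score DESC
LIMIT $2;
```

The `<%` operator compares against the `pg_trgm.word_similarity_threshold` setting (default 0.6), which can be adjusted per session with `SET`. Word similarity is used here because a short query compared against an entire chunk with plain `similarity` tends to score very low; whether that trade-off holds for this corpus would need to be checked on the eval harness.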

Size Estimate

M — ~150 lines across 2 files (search logic + index creation)

Related

Depends on: #278

Metadata

Assignees

No one assigned

Status

Todo

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions