| 1 | +# PostgreSQL Extensions and Text Search Optimization |
| 2 | + |
| 3 | +## Required Extensions |
| 4 | + |
| 5 | +Marten's NGram search relies on the `pg_trgm` PostgreSQL extension. The extension must exist in the database before Marten creates NGram indexes; otherwise the failure surfaces during schema migration or, worse, only at runtime. |
| 6 | + |
| 7 | +| Extension | Purpose | Required by | |
| 8 | +|-----------|---------|------------| |
| 9 | +| `pg_trgm` | Trigram-based fuzzy/partial-word matching | `NgramIndex`, `NgramSearch` | |
| 10 | +| `unaccent` | Strips diacritics/accents from strings | `UseNGramSearchWithUnaccent` (optional) | |
| 11 | + |
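| 12 | +> Both extensions are part of PostgreSQL's bundled contrib modules, so the official Docker image and most managed services already include them and only `CREATE EXTENSION` is needed. Some Linux distributions package them separately (for example as `postgresql-contrib`). |
| 13 | + |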
| 14 | +### Register extensions with Marten |
| 15 | + |
| 16 | +Marten can create the extensions automatically via `Weasel.Postgresql.Extension`: |
| 17 | + |
| 18 | +```csharp |
| 19 | +using Weasel.Postgresql; |
| 20 | + |
| 21 | +// In AddMarten() options setup: |
| 22 | +options.Storage.ExtendedSchemaObjects.Add(new Extension("pg_trgm")); |
| 23 | + |
| 24 | +// Optional: for accent-insensitive NGram search |
| 25 | +options.Storage.ExtendedSchemaObjects.Add(new Extension("unaccent")); |
| 26 | +``` |
| 27 | + |
| 28 | +With `AutoCreateSchemaObjects = AutoCreate.All` (development), Marten runs `CREATE EXTENSION IF NOT EXISTS pg_trgm` when it applies schema changes (for example on startup). |
| 29 | +In production (`AutoCreate.CreateOnly`), missing extensions are created as well, provided the database role is allowed to run `CREATE EXTENSION`. |
| 30 | + |
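A quick way to confirm that the extensions ended up in the right database is to ask PostgreSQL directly. This is a minimal sketch using the built-in `pg_available_extensions` view; run it while connected to the database Marten uses.

```sql
-- No row for a name means the extension files are not present on the server.
-- installed_version is NULL when the extension is available on the server
-- but has not been created in the current database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_trgm', 'unaccent');
```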
| 31 | +### Aspire: provision the extension via a creation script |
| 32 | + |
| 33 | +When running with Aspire using the PostgreSQL container, pass a SQL creation script through `WithCreationScript`: |
| 34 | + |
| 35 | +```csharp |
| 36 | +// AppHost.cs |
| 37 | +var postgres = builder.AddPostgres(ResourceNames.Postgres) |
| 38 | + .WithCreationScript("sql/create-extensions.sql"); |
| 39 | +``` |
| 40 | + |
| 41 | +```sql |
| 42 | +-- sql/create-extensions.sql |
| 43 | +CREATE EXTENSION IF NOT EXISTS pg_trgm; |
| 44 | +CREATE EXTENSION IF NOT EXISTS unaccent; -- only if using UseNGramSearchWithUnaccent |
| 45 | +``` |
| 46 | + |
| 47 | +> The comment `// Add PostgreSQL with pg_trgm extension for ngram search` in `AppHost.cs` serves as a reminder that `pg_trgm` is a runtime dependency of the API service. |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Search Strategies |
| 52 | + |
| 53 | +### 1. NGram search (`NgramIndex` + `NgramSearch`) — **preferred for partial-word matching** |
| 54 | + |
| 55 | +Uses `pg_trgm` to index every 3-character sequence (trigram) of a string. A query term is also broken into trigrams and compared against the index. |
| 56 | + |
| 57 | +**Why to use it:** Matches partial words and substrings ("clea" → "clean", "agil" → "agile") and suits typo-tolerant search and autocomplete. It does not require full words or specific word boundaries. |
| 58 | + |
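To see what a trigram comparison works with, `pg_trgm` exposes the functions `show_trgm` and `similarity`. The queries below call the extension directly and only illustrate the matching principle; the exact SQL that Marten generates for `NgramSearch` may differ.

```sql
-- Requires: CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigrams extracted from 'clean'. pg_trgm lowercases the word and pads it with
-- two leading spaces and one trailing space, which gives the set:
-- '  c', ' cl', 'cle', 'lea', 'ean', 'an '
SELECT show_trgm('clean');

-- Similarity score based on shared trigrams (0 = nothing in common, 1 = identical sets).
SELECT similarity('clea', 'clean') AS score;
```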
| 59 | +**Configuration:** |
| 60 | +```csharp |
| 61 | +// Register index when configuring Marten |
| 62 | +options.Schema.For<BookSearchProjection>() |
| 63 | + .NgramIndex(x => x.Title) |
| 64 | + .NgramIndex(x => x.AuthorNames); |
| 65 | +``` |
| 66 | + |
| 67 | +```csharp |
| 68 | +// LINQ query |
| 69 | +var results = await session.Query<BookSearchProjection>() |
| 70 | + .Where(b => b.Title.NgramSearch("clea")) |
| 71 | + .ToListAsync(); |
| 72 | +``` |
| 73 | + |
| 74 | +**Multi-field pattern — use a computed `SearchText` property:** |
| 75 | + |
| 76 | +Instead of querying multiple NGram indexes, concatenate searchable fields into one property and put a single index on it. This keeps query code simple and index count low: |
| 77 | + |
| 78 | +```csharp |
| 79 | +// Projection property |
| 80 | +public string SearchText { get; set; } = string.Empty; |
| 81 | + |
| 82 | +// In projection logic |
| 83 | +static void UpdateSearchText(BookSearchProjection p) => |
| 84 | + p.SearchText = $"{p.Title} {p.Isbn ?? string.Empty} {p.PublisherName ?? string.Empty} {p.AuthorNames}".Trim(); |
| 85 | + |
| 86 | +// Single index covers all fields |
| 87 | +options.Schema.For<BookSearchProjection>() |
| 88 | + .NgramIndex(x => x.SearchText); |
| 89 | +``` |
| 90 | + |
| 91 | +**Accent-insensitive variant:** |
| 92 | + |
| 93 | +When users may search with or without diacritics (e.g., "bjork" → "Björk"), enable unaccent: |
| 94 | + |
| 95 | +```csharp |
| 96 | +// Requires unaccent extension to be installed first |
| 97 | +options.Advanced.UseNGramSearchWithUnaccent = true; |
| 98 | +``` |
| 99 | + |
| 100 | +This wraps the indexed column and the query term in `unaccent()`, so "bjork" matches "Björk". Matching is otherwise unchanged: "uðmu" still does not match "umut". |
| 101 | + |
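The effect of `unaccent()` can be checked in isolation. This sketch calls the extension function directly, independent of Marten, using the sample name from the text above.

```sql
-- Requires: CREATE EXTENSION IF NOT EXISTS unaccent;
SELECT unaccent('Björk') AS stripped;                              -- Bjork
SELECT lower(unaccent('Björk')) = 'bjork' AS matches_plain_term;   -- true
```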
| 102 | +--- |
| 103 | + |
| 104 | +### 2. Full-Text Search (`FullTextIndex` + `PlainTextSearch` / `PhraseSearch` / `WebStyleSearch`) |
| 105 | + |
| 106 | +Uses PostgreSQL's native `tsvector`/`tsquery` full-text search, which relies on lexemes (stemmed word roots), stop-word removal, and language-aware dictionaries. It does **not** support partial words: a search for "clean" matches "cleaned" and "cleaning", but a search for "clea" does not match "clean". |
| 107 | + |
| 108 | +```csharp |
| 109 | +// Index (GIN over tsvector) |
| 110 | +options.Schema.For<BlogPost>() |
| 111 | + .FullTextIndex(d => d.Body) // "english" language config by default |
| 112 | + .FullTextIndex(index => index.RegConfig = "portuguese", d => d.Body); |
| 113 | + |
| 114 | +// Query variants |
| 115 | +var plain = await session.Query<BlogPost>().Where(x => x.Body.PlainTextSearch("software design")).ToListAsync(); // plainto_tsquery |
| 116 | +var phrase = await session.Query<BlogPost>().Where(x => x.Body.PhraseSearch("software design")).ToListAsync(); // phraseto_tsquery |
| 117 | +var web = await session.Query<BlogPost>().Where(x => x.Body.WebStyleSearch("software OR design")).ToListAsync(); // websearch_to_tsquery (PG11+) |
| 118 | +var raw = await session.Query<BlogPost>().Where(x => x.Body.Search("software & design")).ToListAsync(); // to_tsquery (raw operators) |
| 119 | +``` |
| 120 | + |
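The stemming behaviour described above can be reproduced with PostgreSQL's built-in full-text functions. This is a hand-written sketch against literal strings, not the SQL Marten emits.

```sql
-- 'cleaned' and 'cleaning' are both reduced to the lexeme 'clean'; 'the' is a stop word.
SELECT to_tsvector('english', 'cleaned the cleaning rooms');

-- A whole-word query matches through the shared stem.
SELECT to_tsvector('english', 'cleaned the cleaning rooms')
       @@ plainto_tsquery('english', 'clean') AS whole_word_match;   -- true

-- A partial term is not a lexeme in the document, so it does not match.
SELECT to_tsvector('english', 'clean code')
       @@ plainto_tsquery('english', 'clea') AS partial_match;       -- false
```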
| 121 | +**When to prefer full-text over NGram:** |
| 122 | +- Body text, descriptions, long-form content — documents where word semantics matter |
| 123 | +- Multiple languages with language-specific stemming |
| 124 | +- Users type full words, not partial terms |
| 125 | + |
| 126 | +--- |
| 127 | + |
| 128 | +### 3. GIN Index on JSON Data (`GinIndexJsonData`) |
| 129 | + |
| 130 | +Indexes the entire JSONB column with a GIN index. This accelerates ad-hoc containment queries (`@>`) on any JSON key, including nested paths, without needing individual indexes per field. |
| 131 | + |
| 132 | +```csharp |
| 133 | +options.Schema.For<BookSearchProjection>().GinIndexJsonData(); |
| 134 | +``` |
| 135 | + |
| 136 | +**When to use:** Useful for ad-hoc queries against many fields, or when fields queried are not predefined. Not needed if you have explicit computed indexes (`Index(x => x.Field)`) on every queried property — those are more selective. |
| 137 | + |
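A GIN index over the whole document mainly helps JSONB containment (`@>`) predicates. The query below is a hand-written sketch of such a predicate. The table name `mt_doc_booksearchprojection` assumes Marten's default `mt_doc_<type>` naming in the current schema, and the publisher value is made up. Which LINQ expressions Marten translates into `@>` is not covered here.

```sql
-- Assumed table name and sample value; adjust both to your database.
-- EXPLAIN shows whether the planner picks the GIN index for this predicate.
EXPLAIN
SELECT id
FROM mt_doc_booksearchprojection
WHERE data @> '{"PublisherName": "Example Press"}';
```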
| 138 | +--- |
| 139 | + |
| 140 | +## Index Strategy in This Project |
| 141 | + |
| 142 | +The project's `ConfigureIndexes` method applies this strategy consistently across projections: |
| 143 | + |
| 144 | +| Projection | B-tree (sorting/exact) | NGram (search) | GIN JSON | |
| 145 | +|-----------|----------------------|---------------|---------| |
| 146 | +| `BookSearchProjection` | `Title`, `PublisherId`, `Deleted` | `Title`, `AuthorNames` | ✓ | |
| 147 | +| `AuthorProjection` | `Name`, `Deleted` | `Name` | — | |
| 148 | +| `PublisherProjection` | `Name`, `Deleted` | `Name` | — | |
| 149 | +| `ApplicationUser` | `NormalizedEmail`, `NormalizedUserName`, `CreatedAt` | `Email` | ✓ | |
| 150 | + |
| 151 | +**Pattern:** Index the field twice — once with `Index()` for exact-match and sort operations, once with `NgramIndex()` for search. These are independent indexes and serve different queries. |
| 152 | + |
| 153 | +```csharp |
| 154 | +options.Schema.For<AuthorProjection>() |
| 155 | + .Index(x => x.Name) // ORDER BY / exact match |
| 156 | + .NgramIndex(x => x.Name) // WHERE Name.NgramSearch(...) |
| 157 | + .Index(x => x.Deleted); // WHERE Deleted = false |
| 158 | +``` |
| 159 | + |
| 160 | +**Partial index for filtered queries:** For columns always queried with a constant predicate (e.g., `Deleted = false`), a partial index reduces index size. The example below indexes `CreatedAt` only for users whose email is not yet confirmed: |
| 161 | + |
| 162 | +```csharp |
| 163 | +options.Schema.For<ApplicationUser>() |
| 164 | + .Index(x => x.CreatedAt, idx => |
| 165 | + { |
| 166 | + idx.Predicate = "data ->> 'EmailConfirmed' = 'false'"; |
| 167 | + idx.Name = "idx_application_user_unverified_created_at"; |
| 168 | + }); |
| 169 | +``` |
| 170 | + |
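To check which indexes were actually created, including the `WHERE` clause of a partial index, the built-in `pg_indexes` view can be queried. The table name below assumes Marten's default `mt_doc_<type>` naming and is an assumption about this project's schema.

```sql
-- indexdef contains the full CREATE INDEX statement, so the index method
-- (btree, gin) and any partial-index predicate are visible.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'mt_doc_applicationuser'
ORDER BY indexname;
```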
| 171 | +--- |
| 172 | + |
| 173 | +## Index Type Reference |
| 174 | + |
| 175 | +| Index | SQL type | Best for | Requires | |
| 176 | +|-------|---------|----------|---------| |
| 177 | +| `NgramIndex` | GIN (pg_trgm) | Partial/fuzzy word matching | `pg_trgm` extension | |
| 178 | +| `FullTextIndex` | GIN (tsvector) | Whole-word linguistic search | Built-in PG FTS | |
| 179 | +| `GinIndexJsonData` | GIN (jsonb_path_ops) | Ad-hoc JSON queries | — | |
| 180 | +| `Index` (default) | B-tree | Equality, range, sort | — | |
| 181 | +| `Index(..., idx => idx.Method = IndexMethod.gin)` | GIN (custom) | Custom GIN expressions | depends |
| 182 | + |
| 183 | +--- |
| 184 | + |
| 185 | +## Common Mistakes |
| 186 | + |
| 187 | +| Problem | Cause | Fix | |
| 188 | +|---------|-------|-----| |
| 189 | +| `NgramSearch` returns nothing | `pg_trgm` extension not installed | Add `new Extension("pg_trgm")` to `ExtendedSchemaObjects`, or use `WithCreationScript` in Aspire | |
| 190 | +| Partial search misses accented names | `unaccent` not enabled | Set `options.Advanced.UseNGramSearchWithUnaccent = true` and install the `unaccent` extension | |
| 191 | +| Slow search across many fields | Query hits several separate `NgramIndex` indexes | Consolidate into one computed `SearchText` field with a single `NgramIndex` |
| 192 | +| `FullTextIndex` doesn't match substrings | tsvector uses whole-word lexemes | Switch to `NgramIndex` for partial/autocomplete use cases | |
| 193 | +| Too many GIN indexes slow writes | Separate NGram + GIN JSON indexes per projection | Remove `GinIndexJsonData` when explicit `Index()` columns cover all query paths | |