Commit 7d6a66c

committed
feat: Add documentation for PostgreSQL extensions and text search optimization in Marten
1 parent ebd594e commit 7d6a66c

3 files changed

Lines changed: 199 additions & 18 deletions

File tree

.github/skills/jasperfx-marten/SKILL.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Marten turns PostgreSQL into both a document database and an event store. It is
1818
| [marten-events.md](references/marten-events.md) | Events design, `StartStream/Append`, `AggregateStreamAsync`, `FetchStreamStateAsync`, aggregates, `ISoftDeleted` |
1919
| [marten-projections.md](references/marten-projections.md) | `SingleStreamProjection`, `MultiStreamProjection`, lifecycle, registration, enrichment, composite projections |
2020
| [marten-multi-tenancy.md](references/marten-multi-tenancy.md) | Conjoined tenancy config, per-tenant sessions, middleware pattern, DI session registration, global documents (`[DoNotPartition]`), cross-tenant queries, projection tenancy, table partitioning, indexes, `DeleteAllTenantDataAsync`, `MultiTenancyConstants` |
21+
| [marten-postgres-extensions.md](references/marten-postgres-extensions.md) | `pg_trgm`, `unaccent` extensions, `NgramIndex`/`NgramSearch`, `FullTextIndex`/`PlainTextSearch`/`PhraseSearch`/`WebStyleSearch`, `GinIndexJsonData`, `UseNGramSearchWithUnaccent`, Aspire `WithCreationScript`, index strategy, common mistakes |
2122
| [marten-advanced.md](references/marten-advanced.md) | Async daemon, commit listeners, side effects, metadata, natural keys, performance |
2223

2324
## Quick Reference

.github/skills/jasperfx-marten/references/marten-documents.md

Lines changed: 5 additions & 18 deletions
@@ -61,34 +61,21 @@ var titles = await session.Query<BookSearchProjection>()
6161

6262
## Full-Text Search
6363

64-
Marten supports PostgreSQL full-text search with NGram (handles partial matches):
64+
See [marten-postgres-extensions.md](marten-postgres-extensions.md) for the full reference — `NgramIndex`/`NgramSearch`, `FullTextIndex`/`PlainTextSearch`/`PhraseSearch`/`WebStyleSearch`, `GinIndexJsonData`, required extensions (`pg_trgm`, `unaccent`), index strategy, and common pitfalls.
6565

66+
Quick reference:
6667
```csharp
67-
// PlainTextSearch — word-based, no partial matching
68-
var results = await session.Query<BookSearchProjection>()
69-
.Where(b => b.SearchText.PlainTextSearch("clean code"))
70-
.ToListAsync();
71-
72-
// NgramSearch — handles partial word matching (needs GIN index)
68+
// Partial-word / trigram search (requires pg_trgm + NgramIndex)
7369
var results = await session.Query<BookSearchProjection>()
7470
.Where(b => b.SearchText.NgramSearch("clea"))
7571
.ToListAsync();
7672

77-
// WebStyleSearch — handles natural language queries
73+
// Whole-word / linguistic search (requires FullTextIndex)
7874
var results = await session.Query<BookSearchProjection>()
79-
.Where(b => b.SearchText.WebStyleSearch("clean OR agile"))
75+
.Where(b => b.SearchText.PlainTextSearch("clean code"))
8076
.ToListAsync();
8177
```
8278

83-
> Full-text search requires a GIN index on the field. Configure in `AddMarten()`:
84-
>
85-
> ```csharp
86-
> options.Schema.For<BookSearchProjection>()
87-
> .Index(x => x.SearchText, idx => idx.Method = IndexMethod.GIN);
88-
> ```
89-
>
90-
> For multilingual support (accented characters), use `UseNGramSearchWithUnaccent()`.
91-
9279
## CollectionContains and JSON Queries
9380

9481
```csharp
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
# PostgreSQL Extensions and Text Search Optimization
2+
3+
## Required Extensions
4+
5+
Marten's NGram search relies on the `pg_trgm` PostgreSQL extension. It must exist in the database before Marten tries to create NGram indexes; otherwise index creation can fail during schema migration, or search queries can fail later at runtime.
6+
7+
| Extension | Purpose | Required by |
8+
|-----------|---------|------------|
9+
| `pg_trgm` | Trigram-based fuzzy/partial-word matching | `NgramIndex`, `NgramSearch` |
10+
| `unaccent` | Strips diacritics/accents from strings | `UseNGramSearchWithUnaccent` (optional) |
11+
12+
> Both extensions are part of PostgreSQL's bundled contrib modules. In most installations no extra install is needed: just run `CREATE EXTENSION`.
13+
14+
### Register extensions with Marten
15+
16+
Marten can create the extensions automatically via `Weasel.Postgresql.Extension`:
17+
18+
```csharp
19+
using Weasel.Postgresql;
20+
21+
// In AddMarten() options setup:
22+
options.Storage.ExtendedSchemaObjects.Add(new Extension("pg_trgm"));
23+
24+
// Optional: for accent-insensitive NGram search
25+
options.Storage.ExtendedSchemaObjects.Add(new Extension("unaccent"));
26+
```
27+
28+
With `AutoCreateSchemaObjects = AutoCreate.All` (development) Marten runs
29+
`CREATE EXTENSION IF NOT EXISTS pg_trgm` on startup. In production (`AutoCreate.CreateOnly`) extensions are also created if missing.
30+
31+
### Aspire: provision the extension via a creation script
32+
33+
When running with Aspire using the PostgreSQL container, pass a SQL creation script through `WithCreationScript`:
34+
35+
```csharp
36+
// AppHost.cs
37+
var postgres = builder.AddPostgres(ResourceNames.Postgres)
38+
.WithCreationScript("sql/create-extensions.sql");
39+
```
40+
41+
```sql
42+
-- sql/create-extensions.sql
43+
CREATE EXTENSION IF NOT EXISTS pg_trgm;
44+
CREATE EXTENSION IF NOT EXISTS unaccent; -- only if using UseNGramSearchWithUnaccent
45+
```
46+
47+
> The comment `// Add PostgreSQL with pg_trgm extension for ngram search` in `AppHost.cs` serves as a reminder that `pg_trgm` is a runtime dependency of the API service.
48+
49+
---
50+
51+
## Search Strategies
52+
53+
### 1. NGram search (`NgramIndex` + `NgramSearch`) — **preferred for partial-word matching**
54+
55+
Uses `pg_trgm` to index every 3-character sequence (trigram) of a string. A query term is also broken into trigrams and compared against the index.
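
To make the trigram idea concrete, here is a minimal C# sketch of how a string can be broken into trigrams and how a query's trigrams can be compared against a candidate. It is an illustration only, not Marten's or PostgreSQL's actual implementation: `TrigramSketch`, `Trigrams`, and `Overlap` are hypothetical names, and the padding rule (two spaces before each word, one after) follows the convention described in the `pg_trgm` documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of trigram matching; not part of Marten or pg_trgm.
public static class TrigramSketch
{
    // Lowercases the text, treats non-alphanumeric characters as separators,
    // pads each word ("  word ") and collects every 3-character window.
    public static HashSet<string> Trigrams(string text)
    {
        var result = new HashSet<string>();
        var cleaned = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());

        foreach (var word in cleaned.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            var padded = "  " + word + " ";
            for (var i = 0; i + 3 <= padded.Length; i++)
            {
                result.Add(padded.Substring(i, 3));
            }
        }

        return result;
    }

    // Fraction of the query's trigrams that also occur in the candidate.
    public static double Overlap(string query, string candidate)
    {
        var queryTrigrams = Trigrams(query);
        if (queryTrigrams.Count == 0) return 0;

        var candidateTrigrams = Trigrams(candidate);
        var shared = queryTrigrams.Count(t => candidateTrigrams.Contains(t));
        return (double)shared / queryTrigrams.Count;
    }
}
```

Under these assumptions, `TrigramSketch.Overlap("clea", "clean")` finds that 4 of the query's 5 trigrams also occur in "clean", which is why a partial term can still rank as a strong match.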
56+
57+
**Why to use it:** Matches partial words ("clea" → "clean", "agil" → "agile"), tolerates typos, and suits autocomplete. It does not require full words or specific word boundaries.
58+
59+
**Configuration:**
60+
```csharp
61+
// Register index when configuring Marten
62+
options.Schema.For<BookSearchProjection>()
63+
.NgramIndex(x => x.Title)
64+
.NgramIndex(x => x.AuthorNames);
65+
```
66+
67+
```csharp
68+
// LINQ query
69+
var results = await session.Query<BookSearchProjection>()
70+
.Where(b => b.Title.NgramSearch("clea"))
71+
.ToListAsync();
72+
```
73+
74+
**Multi-field pattern — use a computed `SearchText` property:**
75+
76+
Instead of querying multiple NGram indexes, concatenate searchable fields into one property and put a single index on it. This keeps query code simple and index count low:
77+
78+
```csharp
79+
// Projection property
80+
public string SearchText { get; set; } = string.Empty;
81+
82+
// In projection logic
83+
static void UpdateSearchText(BookSearchProjection p) =>
84+
p.SearchText = $"{p.Title} {p.Isbn ?? string.Empty} {p.PublisherName ?? string.Empty} {p.AuthorNames}".Trim();
85+
86+
// Single index covers all fields
87+
options.Schema.For<BookSearchProjection>()
88+
.NgramIndex(x => x.SearchText);
89+
```
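
One detail of the interpolated string above: when an optional field such as `Isbn` is null, it leaves consecutive spaces inside `SearchText`. A possible alternative, sketched here with a hypothetical `SearchTextBuilder` helper that is not part of the project or of Marten, is to join only the non-empty parts:

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper for composing a SearchText value; not part of Marten.
public static class SearchTextBuilder
{
    // Skips null or whitespace-only parts, trims the rest,
    // and joins them with a single space.
    public static string Build(params string?[] parts) =>
        string.Join(' ', parts
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .Select(p => p!.Trim()));
}
```

A projection could then assign `p.SearchText = SearchTextBuilder.Build(p.Title, p.Isbn, p.PublisherName, p.AuthorNames);`, assuming those properties are strings as in the snippet above.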
90+
91+
**Accent-insensitive variant:**
92+
93+
When users may search with or without diacritics (e.g., "bjork" → "Björk"), enable unaccent:
94+
95+
```csharp
96+
// Requires unaccent extension to be installed first
97+
options.Advanced.UseNGramSearchWithUnaccent = true;
98+
```
99+
100+
This wraps both the indexed value and the query term in `unaccent()`, so "bjork" matches "Björk", while unrelated terms such as "uðmu" and "umut" still do not match.
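
The effect of accent folding can be approximated in plain C# for unit tests or client-side normalization. The sketch below is an assumption-laden stand-in, not what PostgreSQL does internally: `unaccent` uses its own rules file, whereas this hypothetical `AccentFolding.Fold` helper relies on Unicode decomposition and simply drops combining marks.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical approximation of accent folding; PostgreSQL's unaccent
// uses a rules file and may map some characters differently.
public static class AccentFolding
{
    public static string Fold(string input)
    {
        // Decompose characters such as "ö" into "o" plus a combining diaeresis.
        var decomposed = input.Normalize(NormalizationForm.FormD);

        // Drop the combining marks, keeping only the base characters.
        var kept = decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();

        return new string(kept).Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }
}
```

With this helper, `AccentFolding.Fold("Björk")` and `AccentFolding.Fold("bjork")` produce the same string, which mirrors why the accent-insensitive index lets either spelling find the same document.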
101+
102+
---
103+
104+
### 2. Full-Text Search (`FullTextIndex` + `PlainTextSearch` / `PhraseSearch` / `WebStyleSearch`)
105+
106+
Uses PostgreSQL's native `tsvector`/`tsquery` full-text search, which works on lexemes (stemmed word roots), removes stop words, and applies language-aware dictionaries. It does **not** support partial words: "clean" matches "cleaned" and "cleaning", but "clea" matches none of them.
107+
108+
```csharp
109+
// Index (GIN over tsvector)
110+
options.Schema.For<BlogPost>()
111+
.FullTextIndex(d => d.Body) // "english" language config by default
112+
.FullTextIndex(index => index.RegConfig = "portuguese", d => d.Body);
113+
114+
// Query variants
115+
session.Query<BlogPost>().Where(x => x.Body.PlainTextSearch("software design")) // plainto_tsquery
116+
session.Query<BlogPost>().Where(x => x.Body.PhraseSearch("software design")) // phraseto_tsquery
117+
session.Query<BlogPost>().Where(x => x.Body.WebStyleSearch("software OR design")) // websearch_to_tsquery (PG11+)
118+
session.Query<BlogPost>().Where(x => x.Body.Search("software & design")) // to_tsquery (raw operators)
119+
```
120+
121+
**When to prefer full-text over NGram:**
122+
- Body text, descriptions, long-form content — documents where word semantics matter
123+
- Multiple languages with language-specific stemming
124+
- Users type full words, not partial terms
125+
126+
---
127+
128+
### 3. GIN Index on JSON Data (`GinIndexJsonData`)
129+
130+
Indexes the entire JSONB column with a GIN index. This accelerates ad-hoc queries on any JSON key, including nested paths, without needing individual indexes per field.
131+
132+
```csharp
133+
options.Schema.For<BookSearchProjection>().GinIndexJsonData();
134+
```
135+
136+
**When to use:** Useful for ad-hoc queries against many fields, or when fields queried are not predefined. Not needed if you have explicit computed indexes (`Index(x => x.Field)`) on every queried property — those are more selective.
137+
138+
---
139+
140+
## Index Strategy in This Project
141+
142+
The project's `ConfigureIndexes` method applies this strategy consistently across projections:
143+
144+
| Projection | B-tree (sorting/exact) | NGram (search) | GIN JSON |
145+
|-----------|----------------------|---------------|---------|
146+
| `BookSearchProjection` | `Title`, `PublisherId`, `Deleted` | `Title`, `AuthorNames` ||
147+
| `AuthorProjection` | `Name`, `Deleted` | `Name` ||
148+
| `PublisherProjection` | `Name`, `Deleted` | `Name` ||
149+
| `ApplicationUser` | `NormalizedEmail`, `NormalizedUserName`, `CreatedAt` | `Email` ||
150+
151+
**Pattern:** Index the field twice — once with `Index()` for exact-match and sort operations, once with `NgramIndex()` for search. These are independent indexes and serve different queries.
152+
153+
```csharp
154+
options.Schema.For<AuthorProjection>()
155+
.Index(x => x.Name) // ORDER BY / exact match
156+
.NgramIndex(x => x.Name) // WHERE Name.NgramSearch(...)
157+
.Index(x => x.Deleted); // WHERE Deleted = false
158+
```
159+
160+
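**Partial index for filtered queries:** When a column is always queried together with a constant predicate (e.g., `Deleted = false`), a partial index covers only the matching rows, which reduces index size. The example below indexes `CreatedAt` only for users whose e-mail is not yet confirmed: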
161+
162+
```csharp
163+
options.Schema.For<ApplicationUser>()
164+
.Index(x => x.CreatedAt, idx =>
165+
{
166+
idx.Predicate = "data ->> 'EmailConfirmed' = 'false'";
167+
idx.Name = "idx_application_user_unverified_created_at";
168+
});
169+
```
170+
171+
---
172+
173+
## Index Type Reference
174+
175+
| Index | SQL type | Best for | Requires |
176+
|-------|---------|----------|---------|
177+
| `NgramIndex` | GIN (pg_trgm) | Partial/fuzzy word matching | `pg_trgm` extension |
178+
| `FullTextIndex` | GIN (tsvector) | Whole-word linguistic search | Built-in PG FTS |
179+
| `GinIndexJsonData` | GIN (jsonb_path_ops) | Ad-hoc JSON queries ||
180+
| `Index` (default) | B-tree | Equality, range, sort ||
181+
| `Index(..., IndexMethod.GIN)` | GIN (custom) | Custom GIN expressions | depends |
182+
183+
---
184+
185+
## Common Mistakes
186+
187+
| Problem | Cause | Fix |
188+
|---------|-------|-----|
189+
| `NgramSearch` returns nothing | `pg_trgm` extension not installed | Add `new Extension("pg_trgm")` to `ExtendedSchemaObjects`, or use `WithCreationScript` in Aspire |
190+
| Partial search misses accented names | `unaccent` not enabled | Set `options.Advanced.UseNGramSearchWithUnaccent = true` and install the `unaccent` extension |
191+
| Slow search across many fields | Multiple `NgramIndex` hits | Consolidate into one computed `SearchText` field with a single `NgramIndex` |
192+
| `FullTextIndex` doesn't match substrings | tsvector uses whole-word lexemes | Switch to `NgramIndex` for partial/autocomplete use cases |
193+
| Too many GIN indexes slow writes | Separate NGram + GIN JSON indexes per projection | Remove `GinIndexJsonData` when explicit `Index()` columns cover all query paths |
