Commit a99cfdf

Document stopword stripping behavior for = and fulltext()
Update README to reflect that both exact phrase (=) and tokenized search (fulltext()) strip default Redis stopwords before sending queries. Adds a dedicated Stopword Handling section with examples and the STOPWORDS 0 workaround. Replaces outdated 'preserves stopwords' language.
1 parent 16b49ee commit a99cfdf

1 file changed

Lines changed: 34 additions & 20 deletions

README.md

@@ -172,28 +172,29 @@ The layered approach emerged from TDD — writing tests first revealed natural b
 
 Full-text search on TEXT fields with multiple search modes:
 
-| Feature | SQL Syntax | RediSearch Output |
-|---------|-----------|-------------------|
-| Exact phrase | `title = 'gaming laptop'` | `@title:"gaming laptop"` |
-| Tokenized search | `fulltext(title, 'gaming laptop')` | `@title:(gaming laptop)` |
-| Fuzzy LD=1 | `fuzzy(title, 'laptap')` | `@title:%laptap%` |
-| Fuzzy LD=2 | `fuzzy(title, 'laptap', 2)` | `@title:%%laptap%%` |
-| Fuzzy LD=3 | `fuzzy(title, 'laptap', 3)` | `@title:%%%laptap%%%` |
-| OR / union | `fulltext(title, 'laptop OR tablet')` | `@title:(laptop\|tablet)` |
-| Prefix | `title LIKE 'lap%'` | `@title:lap*` |
-| Suffix | `title LIKE '%top'` | `@title:*top` |
-| Contains | `title LIKE '%apt%'` | `@title:*apt*` |
-| Proximity (slop) | `fulltext(title, 'gaming laptop', 2)` | `@title:(gaming laptop) => { $slop: 2; }` |
-| Proximity + order | `fulltext(title, 'gaming laptop', 2, true)` | `@title:(gaming laptop) => { $slop: 2; $inorder: true; }` |
-| Optional term | `fulltext(title, 'laptop ~gaming')` | `@title:(laptop ~gaming)` |
-| BM25 score | `SELECT score() AS relevance FROM idx` | `FT.SEARCH ... WITHSCORES` |
-| Negation | `NOT fulltext(title, 'refurbished')` | `-@title:refurbished` |
+| Feature | SQL Syntax | RediSearch Output | Notes |
+|---------|-----------|-------------------|-------|
+| Exact phrase | `title = 'gaming laptop'` | `@title:"gaming laptop"` | Stopwords stripped |
+| Tokenized search | `fulltext(title, 'gaming laptop')` | `@title:(gaming laptop)` | Stopwords stripped |
+| Fuzzy LD=1 | `fuzzy(title, 'laptap')` | `@title:%laptap%` | |
+| Fuzzy LD=2 | `fuzzy(title, 'laptap', 2)` | `@title:%%laptap%%` | |
+| Fuzzy LD=3 | `fuzzy(title, 'laptap', 3)` | `@title:%%%laptap%%%` | |
+| OR / union | `fulltext(title, 'laptop OR tablet')` | `@title:(laptop\|tablet)` | |
+| Prefix | `title LIKE 'lap%'` | `@title:lap*` | |
+| Suffix | `title LIKE '%top'` | `@title:*top` | |
+| Contains | `title LIKE '%apt%'` | `@title:*apt*` | |
+| Proximity (slop) | `fulltext(title, 'gaming laptop', 2)` | `@title:(gaming laptop) => { $slop: 2; }` | |
+| Proximity + order | `fulltext(title, 'gaming laptop', 2, true)` | `@title:(gaming laptop) => { $slop: 2; $inorder: true; }` | |
+| Optional term | `fulltext(title, 'laptop ~gaming')` | `@title:(laptop ~gaming)` | |
+| BM25 score | `SELECT score() AS relevance FROM idx` | `FT.SEARCH ... WITHSCORES` | |
+| Negation | `NOT fulltext(title, 'refurbished')` | `-@title:refurbished` | |
 
 **Examples:**
 
 ```sql
--- Exact phrase match (stopwords preserved)
+-- Exact phrase match (stopwords like "of" are stripped automatically)
 SELECT * FROM products WHERE title = 'bank of america'
+-- Produces: @title:"bank america"
 
 -- Fuzzy search for typos (Levenshtein distance 2)
 SELECT * FROM products WHERE fuzzy(title, 'laptap', 2)
@@ -214,10 +215,23 @@ SELECT title, score() AS relevance FROM products WHERE fulltext(title, 'laptop')
 SELECT * FROM products WHERE fulltext(title, 'laptop') OR fulltext(description, 'laptop')
 ```
 
+**Stopword handling:**
+
+Both `=` (exact phrase) and `fulltext()` (tokenized search) automatically strip [Redis default stopwords](https://redis.io/docs/latest/develop/ai/search-and-query/advanced-concepts/stopwords/) before sending queries to RediSearch. This is necessary because RediSearch does not index stopwords, so including them in queries causes syntax errors or failed matches. A `UserWarning` is emitted when stopwords are removed.
+
+For example, `WHERE title = 'bank of america'` produces `@title:"bank america"` because "of" is a default stopword and is never stored in the inverted index. The stripped phrase still matches correctly because the indexer assigns consecutive token positions after dropping stopwords.
+
+To include stopwords in your queries, create your index with `STOPWORDS 0`:
+
+```
+FT.CREATE myindex ON HASH PREFIX 1 doc: STOPWORDS 0 SCHEMA title TEXT
+```
+
 **Notes:**
-- `=` on TEXT fields performs **exact phrase** matching (preserves stopwords)
-- `fulltext()` performs **tokenized** search (stopwords are filtered with a warning)
-- `fuzzy()` and `fulltext()` only work on TEXT fields — using them on TAG or NUMERIC raises `ValueError`
+- `=` on TEXT fields performs **exact phrase** matching (double-quoted)
+- `fulltext()` performs **tokenized** AND search (parenthesized)
+- Both operators strip stopwords and emit a warning when they do
+- `fuzzy()` and `fulltext()` only work on TEXT fields; using them on TAG or NUMERIC raises `ValueError`
 - OR must be **uppercase**: `'laptop OR tablet'` triggers union; lowercase `'laptop or tablet'` is treated as a regular three-word AND search
 - Special characters (`@`, `|`, `-`, `*`, `+`, etc.) in search terms are automatically escaped
 
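The stopword behavior this commit documents can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names `strip_stopwords`, `exact_phrase`, and `tokenized` are hypothetical, and `DEFAULT_STOPWORDS` is only a small subset of the Redis default list. The sketch also ignores `OR` handling and special-character escaping, which the README describes separately.

```python
import warnings

# Assumption: a small illustrative subset of the Redis default stopwords,
# not the complete list.
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to", "with"})


def strip_stopwords(text, stopwords=DEFAULT_STOPWORDS):
    """Return the non-stopword tokens of `text`, warning if any were dropped."""
    tokens = text.split()
    kept = [t for t in tokens if t.lower() not in stopwords]
    removed = [t for t in tokens if t.lower() in stopwords]
    if removed:
        warnings.warn(
            f"Stopwords removed from query: {removed}",
            UserWarning,
            stacklevel=2,
        )
    return kept


def exact_phrase(field, text):
    """Hypothetical translation of `field = 'text'` into a quoted phrase."""
    phrase = " ".join(strip_stopwords(text))
    return f'@{field}:"{phrase}"'


def tokenized(field, text):
    """Hypothetical translation of `fulltext(field, 'text')` into an AND group."""
    terms = " ".join(strip_stopwords(text))
    return f"@{field}:({terms})"


if __name__ == "__main__":
    # Emits a UserWarning because "of" is dropped.
    print(exact_phrase("title", "bank of america"))  # @title:"bank america"
    print(tokenized("title", "gaming laptop"))       # @title:(gaming laptop)
```

Both helpers share one stripping function, so `=` and `fulltext()` treat stopwords identically and differ only in whether the remaining terms are double-quoted or parenthesized.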