@@ -232,18 +232,38 @@ Since this interface only returns rowid and distance, if you need to access addi
232
232
* `table` (TEXT): Name of the target table.
233
233
* `column` (TEXT): Column containing vectors.
234
234
* `vector` (BLOB or JSON): The query vector.
235
-
* `k` (INTEGER): Number of nearest neighbors to return.
235
+
* `k` (INTEGER, optional): Number of nearest neighbors to return. When provided, the module collects the top-k results sorted by distance. When omitted, the module operates in **streaming mode**: rows are returned progressively as they are scanned, so standard SQL clauses such as `WHERE` and `LIMIT` can control filtering and result count.
236
236
237
-
**Example:**
237
+
**Examples:**
238
238
239
239
```sql
240
+
-- Top-k mode: return the 5 nearest neighbors, sorted by distance
240
241
SELECT rowid, distance
241
242
FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 5);
242
243
```
243
244
245
+
```sql
246
+
-- Streaming mode: progressively scan all rows, apply SQL filters
247
+
SELECT rowid, distance
248
+
FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'))
249
+
LIMIT 5;
250
+
```
251
+
252
+
```sql
253
+
-- Streaming mode with JOIN and filtering
254
+
SELECT
255
+
v.rowid,
256
+
row_number() OVER (ORDER BY v.distance) AS rank_number,
257
+
v.distance
258
+
FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')) AS v
@@ -257,81 +277,31 @@ You **must run `vector_quantize()`** before using `vector_quantize_scan()` and w
257
277
* `table` (TEXT): Name of the target table.
258
278
* `column` (TEXT): Column containing vectors.
259
279
* `vector` (BLOB or JSON): The query vector.
260
-
* `k` (INTEGER): Number of nearest neighbors to return.
280
+
* `k` (INTEGER, optional): Number of nearest neighbors to return. When provided, the module collects the top-k results sorted by distance. When omitted, the module operates in **streaming mode**: rows are returned progressively, so standard SQL clauses such as `WHERE` and `LIMIT` can control filtering and result count.
261
281
262
282
**Performance Highlights:**
263
283
264
284
* Handles **1M vectors** of dimension 768 in a few milliseconds.
265
285
* Uses **<50MB** of RAM.
266
286
* Achieves **>0.95 recall**.
267
287
268
-
**Example:**
269
-
270
-
```sql
271
-
SELECT rowid, distance
272
-
FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 10);
273
-
```
274
-
275
-
---
276
-
277
-
## 🔁 Streaming Interfaces
278
-
279
-
### `vector_full_scan_stream` and `vector_quantize_scan_stream`
280
-
281
-
**Returns:** `Virtual Table (rowid, distance)`
282
-
283
-
**Description:**
284
-
These streaming interfaces provide the same functionality as `vector_full_scan` and `vector_quantize_scan`, respectively, but are designed for incremental or filtered processing of results.
285
-
286
-
Unlike their non-streaming counterparts, these functions **omit the fourth parameter (`k`)** and allow you to use standard SQL clauses such as `WHERE` and `LIMIT` to control filtering and result count. Since this interface only returns rowid and distance, if you need to access additional columns from the original table, you must use a SELF JOIN.
287
-
288
-
This makes them ideal for combining vector search with additional query conditions or progressive result consumption in streaming applications.
289
-
290
-
**Parameters:**
291
-
292
-
* `table` (TEXT): Name of the target table.
293
-
* `column` (TEXT): Column containing vectors.
294
-
* `vector` (BLOB or JSON): The query vector.
295
-
296
-
**Key Differences from Non-Streaming Variants:**
297
-
298
-
| Function | Equivalent To | Requires `k` | Supports `WHERE` | Supports `LIMIT` |
-- Top-k mode: return the 10 nearest neighbors, sorted by distance
307
292
SELECT rowid, distance
308
-
FROM vector_full_scan_stream('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'))
309
-
LIMIT 5;
293
+
FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 10);
310
294
```
311
295
312
296
```sql
313
-
-- Perform a filtered approximate scan using quantized data
297
+
-- Streaming mode: progressively scan using quantized data
314
298
SELECT rowid, distance
315
-
FROM vector_quantize_scan_stream('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'))
316
-
LIMIT 10;
317
-
```
318
-
319
-
**Accessing Additional Columns:**
320
-
321
-
```sql
322
-
-- Perform a filtered full scan with additional columns
323
-
SELECT
324
-
v.rowid,
325
-
row_number() OVER (ORDER BY v.distance) AS rank_number,
326
-
v.distance
327
-
FROM vector_full_scan_stream('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')) AS v
328
-
JOIN documents ON documents.rowid = v.rowid
329
-
WHERE documents.category = 'science'
299
+
FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'))
330
300
LIMIT 10;
331
301
```
332
302
333
303
**Usage Notes:**
334
304
335
-
* These interfaces return rows progressively and can efficiently combine vector similarity with SQL-level filters.
336
-
* The `LIMIT` clause can be used to control how many rows are read or returned.
337
-
* The query planner integrates the streaming virtual table into the overall SQL execution plan, enabling hybrid filtering and ranking operations.
305
+
* In **top-k mode** (with `k`), results are sorted by distance. The query planner knows the output is pre-sorted, so no additional `ORDER BY` is needed.
306
+
* In **streaming mode** (without `k`), rows are returned in scan order, not distance order. Use `ORDER BY distance` and `LIMIT` as needed.
307
+
* Streaming mode is ideal for combining vector similarity with additional SQL-level filters or progressive result consumption.
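
The difference between the two modes described in the usage notes can be modelled outside SQLite. The following is a minimal pure-Python sketch of the semantics only, not the extension's implementation: `scan_stream`, `scan_top_k`, the L2 distance, and rowids starting at 1 are all assumptions made for illustration.

```python
import heapq
import math
from itertools import islice
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, float]  # (rowid, distance), mirroring the virtual table columns


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric, chosen only for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_stream(vectors: Sequence[Sequence[float]], query: Sequence[float]) -> Iterator[Row]:
    """Streaming mode (hypothetical model): yield rows lazily, in scan order."""
    for rowid, vec in enumerate(vectors, start=1):
        yield rowid, l2(vec, query)


def scan_top_k(vectors: Sequence[Sequence[float]], query: Sequence[float], k: int) -> List[Row]:
    """Top-k mode (hypothetical model): k nearest rows, sorted by ascending distance."""
    return heapq.nsmallest(k, scan_stream(vectors, query), key=lambda row: row[1])


vectors = [[0.9, 0.9], [0.1, 0.2], [0.5, 0.5], [0.1, 0.1]]
query = [0.1, 0.2]

# Top-k mode: already sorted by distance, no extra ordering step needed.
top2 = scan_top_k(vectors, query, 2)

# Streaming mode with only a limit: the first rows scanned, not the nearest ones.
first2 = list(islice(scan_stream(vectors, query), 2))

# Streaming mode with the equivalent of ORDER BY distance LIMIT 2.
sorted2 = sorted(scan_stream(vectors, query), key=lambda row: row[1])[:2]

print([rowid for rowid, _ in top2])
print([rowid for rowid, _ in first2])
print([rowid for rowid, _ in sorted2])
```

In this model, a bare limit on the stream stops the scan early but returns whichever rows come first, which is why streaming-mode queries that need the nearest rows should pair `LIMIT` with `ORDER BY distance`.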