Skip to content

Commit 7dcf2ff

Browse files
zxqfd555Manul from Pathway
authored andcommitted
duckdb output connector (#10482)
GitOrigin-RevId: 24cb5947215a9175c6b66aed273667dcddc7f932
1 parent 30790be commit 7dcf2ff

11 files changed

Lines changed: 2964 additions & 134 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
66
## [Unreleased]
77

88
### Added
9+
- `pw.io.duckdb.write` writes a Pathway table into a DuckDB database file through a native, in-process connector, in either `"stream_of_changes"` or `"snapshot"` mode. Embeddings stored as `numpy` arrays or `list[float]` columns land in native `DOUBLE[]` list columns, so the result is directly searchable with DuckDB's vector-distance functions for RAG retrieval.
910
- `pw.io.weaviate.write` writes a Pathway table to a [Weaviate](https://weaviate.io/) collection, keeping it in sync with the table: additions and updates upsert objects (keyed by a UUID derived from the required `primary_key`), and deletions remove them. An optional `vector` column is stored as the object's embedding; the remaining columns become object properties. The target collection must already exist. The connector writes in parallel across workers (`pathway spawn -n`), with `batch_size` and `concurrency` to tune throughput, and supports `api_key`/`headers` authentication for self-hosted and Weaviate Cloud deployments.
1011
- `pw.io.pinecone.write` writes a Pathway table to a [Pinecone](https://www.pinecone.io/) index for use as a vector store in RAG pipelines. It keeps the index in sync with the current state of the table: a row is upserted under its record id and a row removed from the table is deleted from the index. By default the id is the table's internal row key (unique, and written in parallel across workers); pass a `primary_key` column to use your own ids instead, in which case the values must uniquely identify rows (a collision raises an error) and writing runs on a single worker. The `vector` column holds the dense embedding (`list[float]` or a 1-D `numpy.ndarray`); the remaining columns — or just those listed in `metadata_columns` — are stored as record metadata. Targets both Pinecone cloud and the local Pinecone emulator via the `host` parameter; the index must already exist with a matching dimension.
1112

0 commit comments

Comments
 (0)