feat: add Apache Pinot vector search client by xiangfu0 · Pull Request #757 · zilliztech/VectorDBBench

xiangfu0 · 2026-04-15T11:52:09Z

Summary

Adds a complete Apache Pinot client for VectorDBBench.

4 index types: HNSW (Lucene-based), IVF_FLAT, IVF_PQ, IVF_ON_DISK
3 metrics: L2, IP, COSINE
Filter support: NumGE and StrEqual via Pinot range/equality predicates
Parallel loading: thread_safe=True. Each worker thread maintains its own row buffer and flushes to Pinot via a fresh HTTP session. Pinot's ingestFromFile is synchronous (it blocks until the HNSW index is built, ~6 min per 100K×768D segment), so running flushes concurrently across threads reduces load time significantly compared to sequential flushing.
Optional dep: pip install "vectordb-bench[pinot]"
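
The parallel-loading idea above can be sketched roughly as follows. This is a minimal illustration, not the client code from this PR: `flush_segment`, `load_shard`, `load_parallel`, and `FLUSH_SIZE` are hypothetical names, and the blocking ingest call is replaced by a short `time.sleep` so the sketch runs without a Pinot cluster.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

FLUSH_SIZE = 4  # hypothetical rows per segment; real segments are far larger

flushed_segments = []     # (thread name, row count) for every flush
_lock = threading.Lock()  # guards the shared record list only


def flush_segment(rows):
    """Stand-in for a blocking ingest request (assumption: one call per segment)."""
    time.sleep(0.05)  # simulates waiting for the index build to finish
    with _lock:
        flushed_segments.append((threading.current_thread().name, len(rows)))


def load_shard(batches):
    """Buffer rows privately in this worker and flush whenever a segment is full."""
    buffer = []  # never shared with other threads
    loaded = 0
    for batch in batches:
        buffer.extend(batch)
        while len(buffer) >= FLUSH_SIZE:
            flush_segment(buffer[:FLUSH_SIZE])
            del buffer[:FLUSH_SIZE]
            loaded += FLUSH_SIZE
    if buffer:  # flush the final partial segment
        flush_segment(buffer)
        loaded += len(buffer)
    return loaded


def load_parallel(shards, workers):
    """Run one load_shard call per shard so blocking flushes overlap in time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_shard, shards))


if __name__ == "__main__":
    shards = [[list(range(i, i + 5))] for i in range(0, 20, 5)]
    start = time.perf_counter()
    total = load_parallel(shards, workers=4)
    elapsed = time.perf_counter() - start
    print(f"loaded {total} rows in {len(flushed_segments)} flushes, {elapsed:.2f}s")
```

Because every flush blocks its own thread, the wall-clock time in this sketch is bounded by the slowest shard rather than by the sum of all flushes, which is the effect the summary describes.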

Benchmark Results

Small dataset (OpenAI 50K, 768D, L2)

Index	QPS	Recall
HNSW	798	1.000
IVF_FLAT	800	1.000
IVF_PQ	795	1.000
IVF_ON_DISK	691	1.000

Large dataset (Cohere 1M, 768D, COSINE)

Index	QPS	Recall
HNSW m=16, ef_construction=100	74	0.982

Filter benchmark (Cohere 1M, COSINE, HNSW m=32)

Filter	QPS	Recall
1% NumGE	71	0.977
99% NumGE	97	0.649

Low recall on the 99% filter is expected: the HNSW graph is built on the full dataset, so restricting results to the roughly 1% of vectors that pass the filter prunes many graph neighbors at query time.
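
The recall drop can be illustrated with a deliberately simplified model. The sketch assumes the filter is applied after a fixed number of unfiltered nearest neighbors has been collected; this is an assumption made for illustration and may not match how Pinot actually executes filtered HNSW queries. The function name and parameters are hypothetical.

```python
def post_filter_recall(ranked_ids, passes_filter, candidates, k):
    """Recall@k when a filter is applied to the nearest `candidates` ids.

    ranked_ids: all ids sorted by true distance to the query (nearest first).
    passes_filter: predicate that returns True for ids the filter keeps.
    """
    # Ground truth: the k nearest ids among those that pass the filter.
    truth = [i for i in ranked_ids if passes_filter(i)][:k]
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    # Post-filtering: fetch the nearest candidates first, then drop non-matching ids.
    found = [i for i in ranked_ids[:candidates] if passes_filter(i)][:k]
    return len(set(found) & set(truth)) / len(truth)


if __name__ == "__main__":
    ids = list(range(1000))  # id order doubles as distance order here

    def keeps_1_percent(i):
        return i % 100 == 0

    def keeps_99_percent(i):
        return i % 100 != 0

    print(post_filter_recall(ids, keeps_1_percent, candidates=250, k=5))
    print(post_filter_recall(ids, keeps_99_percent, candidates=250, k=5))
```

With a filter that keeps only 1% of ids, just three of the five true neighbors fall inside the 250 candidates, so recall is 0.6; with a filter that keeps 99%, all five are found.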

Test plan

make lint passes on all modified files
make unittest passes
Verify Pinot Docker cluster starts with docker compose (see Pinot quickstart)
Run vectordbbench pinot-hnsw --db-label test against a local Pinot cluster

🤖 Generated with Claude Code

xiangfu0 · 2026-04-15T12:32:13Z

/assign @XuanYang-cn

Adds a complete Apache Pinot client for VectorDBBench.

Index types: HNSW (Lucene), IVF_FLAT, IVF_PQ, IVF_ON_DISK
Metrics: L2, IP, COSINE
Filters: NumGE, StrEqual
Optional dep: pip install "vectordb-bench[pinot]"

Parallel loading: thread_safe=True. Each worker thread maintains its own row buffer and flushes to Pinot via a fresh HTTP session. Since Pinot's ingestFromFile is synchronous (blocks until the HNSW index is built, ~6 min per 100K×768D segment), concurrent flushes across threads reduce load time significantly vs sequential flushing.

Benchmark results:

Small dataset (OpenAI 50K, 768D, L2):
HNSW: 798 QPS, recall=1.000
IVF_FLAT: 800 QPS, recall=1.000
IVF_PQ: 795 QPS, recall=1.000
IVF_ON_DISK: 691 QPS, recall=1.000

Large dataset (Cohere 1M, 768D, COSINE):
HNSW m=16: 74 QPS, recall=0.982

Filter benchmark (Cohere 1M, COSINE, HNSW m=32):
1% NumGE: 71 QPS, recall=0.977
99% NumGE: 97 QPS, recall=0.649

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

xiangfu0 · 2026-04-19T22:20:30Z

@XuanYang-cn Please review

sre-ci-robot · 2026-04-20T06:17:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xiangfu0, XuanYang-cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [XuanYang-cn]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

xiangfu0 force-pushed the feat/pinot-support branch from 1d51fbf to 9c9d434 Compare April 15, 2026 12:07

xiangfu0 force-pushed the feat/pinot-support branch 2 times, most recently from 0f01a75 to 4296e07 Compare April 15, 2026 12:38

xiangfu0 force-pushed the feat/pinot-support branch from 4296e07 to 5568d1f Compare April 15, 2026 12:43

XuanYang-cn self-assigned this Apr 20, 2026

XuanYang-cn approved these changes Apr 20, 2026


XuanYang-cn merged commit c4083f3 into zilliztech:main Apr 20, 2026
4 checks passed

XuanYang-cn merged 1 commit into zilliztech:main from xiangfu0:feat/pinot-support
