Commit a1743c7

add cli, evals, and agent skills
1 parent 202125a commit a1743c7

23 files changed

Lines changed: 1902 additions & 28 deletions
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
name: tiledb-vector-search-cli
description: >-
  TileDB Vector Search CLI (`tiledb vs`): build indexes from markdown/Quarto/HTML
  directories and run semantic search. Use when the user mentions tiledb vs,
  vector search CLI, building a vector index from docs, or searching an index
  built with this repo.
---

# TileDB Vector Search CLI

## Install

From the TileDB-Vector-Search repo root (native extension must build):

```bash
pip install -e ".[cli]"
```

For **LangGraph + Anthropic evals** (`tiledb vs eval`), `langgraph`, `langchain-anthropic`, and `langchain-core` ship with the base package; add **`[eval]`** if you want an explicit `anthropic` dependency. Evals also need `ANTHROPIC_API_KEY` set in the environment.

```bash
pip install -e ".[cli,eval]"
```

Entry point: `tiledb` (Click). Subcommand group: `vs` (vector search).

Implementation lives under `apis/python/src/tiledb/vector_search/cli/` (`main.py`, `vs.py`, `eval_commands.py`). Eval logic lives under `apis/python/src/tiledb/vector_search/evals/`.

## Commands

### `tiledb vs build SOURCE_DIR OUTPUT_URI`

Recursively indexes documents under `SOURCE_DIR` (local path or `s3://`) with these extensions: `.md`, `.qmd`, `.html`, `.htm`, `.pdf`, `.txt`, `.rst`, `.doc`, `.docx`. Files with other extensions are skipped and logged in yellow. Parsing uses the same `TileDBLoader` stack as `DirectoryTextReader` (PyMuPDF for PDF, BS4 for HTML, etc.).

**Incremental updates:** If `OUTPUT_URI` already exists as a TileDB group, only files whose paths are **not** already present in index metadata (`file_path`) are embedded and merged. If every file is already indexed, the command exits with nothing to do.

**Caveat:** Updates are keyed by **file URI/path**. Editing a file in place does **not** re-embed it; only new paths are added.
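
The path-keyed rule above can be sketched in a few lines. This is only an illustration, not the CLI's implementation: `select_new_files` is a hypothetical helper, and the sketch assumes the set of already-indexed paths has been read from the index's `file_path` metadata.

```python
from typing import Iterable, List, Set


def select_new_files(candidates: Iterable[str], indexed_paths: Set[str]) -> List[str]:
    """Return candidate paths that are not yet present in the index metadata.

    Hypothetical helper: comparison is by path string only, so a file that was
    edited in place (same path, new content) is treated as already indexed.
    """
    return [path for path in candidates if path not in indexed_paths]


# Made-up paths for illustration.
already_indexed = {"docs/intro.md", "docs/setup.md"}
on_disk = ["docs/intro.md", "docs/setup.md", "docs/new_page.md"]

to_embed = select_new_files(on_disk, already_indexed)
print(to_embed)  # ['docs/new_page.md']
```

Because the key is the path, `docs/intro.md` would be skipped even if its content changed, which is the behavior the caveat describes.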

| Option | Default | Notes |
|--------|---------|--------|
| `--index-type` | `FLAT` | `FLAT`, `IVF_FLAT`, `VAMANA` |
| `--embedding-model` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `--chunk-size` | `500` | Characters per chunk |
| `--chunk-overlap` | `50` | Characters of overlap between consecutive chunks |
| `--s3-region` | `us-east-1` | Passed as TileDB config `vfs.s3.region` |
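
To make `--chunk-size` and `--chunk-overlap` concrete, here is a minimal sliding-window sketch. It is an illustration under assumptions: the CLI depends on `langchain-text-splitters`, whose splitters also take separators into account, so real chunk boundaries will likely differ. `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper, not the CLI's splitter: it ignores sentence and
    paragraph boundaries and only demonstrates the size/overlap arithmetic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("x" * 1200)  # defaults: 500-character chunks, 50 overlap
print([len(c) for c in chunks])  # [500, 500, 300]
```

With the defaults, each window starts 450 characters after the previous one, so the last 50 characters of a chunk reappear at the start of the next.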

**Phases (fresh build):** progress bar while embedding per file/partition, then spinner while building the index.

**Phases (incremental):** progress bar for new files, then spinner for `consolidate_updates`.

Examples:

```bash
tiledb vs build ./docs ./out/my_index
tiledb vs build ./docs s3://bucket/prefix/index --s3-region us-west-2
tiledb vs build ./docs ./out/index --index-type IVF_FLAT --embedding-model all-mpnet-base-v2
```

### `tiledb vs search INDEX_URI QUERY`

Semantic search over an existing index (local or `s3://`). Queries are embedded with the embedding model recorded in the index.

| Option | Default |
|--------|---------|
| `--topk` | `10` |

Quote multi-word queries in the shell:

```bash
tiledb vs search ./out/my_index "how do I configure auth"
tiledb vs search s3://bucket/prefix/index "deployment" --topk 5
```

### `tiledb vs eval agent DATASET.jsonl`

Runs a small **LangGraph** pipeline (retrieve → generate) with **ChatAnthropic**, using your eval JSONL file. Each line is one JSON object with at least `question` and `gold_answer`; `id` and `relevant_file_paths` are optional (the latter is used for recall@k when an index is used).

**With an index (default):** runs **both** passes per example, **with** retrieval from `--index-uri` and **without**, and prints JSON with `with_index` / `without_index` aggregates (exact match rate, mean token F1, optional LLM judge).
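
Exact match and token F1 are commonly computed as in SQuAD-style QA evaluation. The sketch below shows one such formulation. The normalization used by this repo's `evals` package is not shown here, so the lowercasing and whitespace tokenization are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 when the normalized token sequences are identical, else 0.0."""
    return 1.0 if _tokens(prediction) == _tokens(gold) else 0.0


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-tokens overlap)."""
    pred, ref = _tokens(prediction), _tokens(gold)
    if not pred or not ref:
        return 1.0 if pred == ref else 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("TileDB arrays", "tiledb  arrays"))  # 1.0
print(round(token_f1("vectors are stored in TileDB arrays", "stored in TileDB arrays"), 3))  # 0.8
```

In the second example the prediction contains all four gold tokens plus two extra ones, so recall is 1.0, precision is 4/6, and F1 is 0.8.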

**Baseline only:** pass `--no-index-only`; `--index-uri` is not required.

| Option | Notes |
|--------|--------|
| `--index-uri` | Vector index URI (required unless `--no-index-only`) |
| `--top-k` | Chunks passed to the model (default 10) |
| `--model` | Anthropic model id (default `claude-haiku-4-5-20251001`) |
| `--llm-judge` | Extra model call to score prediction vs gold (0–1) |
| `--skip-without-index` / `--skip-with-index` | Run only one side when comparing |
| `--out` | Write JSON report to a file |

Example:

```bash
export ANTHROPIC_API_KEY=...
tiledb vs eval agent ./examples/evals/sample.jsonl --index-uri ./out/my_index --out report.json
```

Example datasets live under `examples/evals/`; see `examples/evals/README.md` for **`academy_vector_embeddings_foundation.jsonl`** (build the index from the TileDB-Documentation `academy` directory so `relevant_file_paths` align with metadata).
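
For orientation, this is roughly what a dataset in the JSONL shape expected by `eval agent` looks like, together with a small validator. The rows are invented for illustration (they are not taken from `examples/evals/`), and `parse_eval_rows` is a hypothetical helper, not the loader used by the eval runner.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("question", "gold_answer")


def parse_eval_rows(jsonl_text: str) -> List[Dict[str, Any]]:
    """Parse JSONL text and check that each row has the required fields."""
    rows: List[Dict[str, Any]] = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        rows.append(row)
    return rows


# Invented sample rows: the first carries the optional fields, the second does not.
sample = "\n".join(
    [
        json.dumps(
            {
                "id": "q1",
                "question": "What does a vector index store?",
                "gold_answer": "Embeddings of document chunks.",
                "relevant_file_paths": ["academy/vectors/index.qmd"],
            }
        ),
        json.dumps({"question": "What is recall@k?", "gold_answer": "Hit rate in the top k."}),
    ]
)

rows = parse_eval_rows(sample)
print(len(rows), [row.get("id") for row in rows])  # 2 ['q1', None]
```

Rows without `relevant_file_paths` can still be scored on answers, but they contribute nothing to recall@k.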

### `tiledb vs eval compare-indexes DATASET.jsonl`

Compares **two** indexes (e.g. builds with different `--embedding-model` values) on rows that include `relevant_file_paths` and reports **recall@k** per index: whether any top-`k` result path matches a labeled path. Matching uses normalized equality, path suffix overlap, or same parent directory plus filename. Basename alone is not enough, so shared names like `index.qmd` are not all treated as the same file. With `--answer-metrics`, it also runs the full agent per index and scores answers against `gold_answer`.

```bash
tiledb vs eval compare-indexes ./eval.jsonl --index-a ./idx_minilm --index-b ./idx_mpnet --top-k 10
tiledb vs eval compare-indexes ./eval.jsonl --index-a ./idx_a --index-b ./idx_b --answer-metrics --llm-judge
```
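
The path-matching rules described for recall@k can be sketched as follows. This is one plausible reading of those rules, not the repo's code: the normalization (dropping a URI scheme, unifying slashes) is an assumption, and `paths_match` / `recall_at_k` are hypothetical names.

```python
from typing import Sequence, Tuple


def _parts(path: str) -> Tuple[str, ...]:
    # Assumed normalization: unify slashes, drop a scheme like s3://, drop "." segments.
    cleaned = path.replace("\\", "/")
    if "://" in cleaned:
        cleaned = cleaned.split("://", 1)[1]
    return tuple(p for p in cleaned.split("/") if p not in ("", "."))


def paths_match(result_path: str, labeled_path: str) -> bool:
    a, b = _parts(result_path), _parts(labeled_path)
    if not a or not b:
        return False
    if a == b:  # rule 1: normalized equality
        return True
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    # Rule 2: one path is a trailing segment of the other (at least dir + file).
    if len(shorter) >= 2 and longer[-len(shorter):] == shorter:
        return True
    # Rule 3: same parent directory name and filename. With this normalization
    # it also covers rule 2; both are kept to mirror the rules in the text.
    return len(a) >= 2 and len(b) >= 2 and a[-2:] == b[-2:]


def recall_at_k(result_paths: Sequence[str], labeled_paths: Sequence[str], k: int) -> float:
    """1.0 if any of the top-k result paths matches any labeled path, else 0.0."""
    top = list(result_paths)[:k]
    return 1.0 if any(paths_match(r, g) for r in top for g in labeled_paths) else 0.0


print(paths_match("s3://bucket/docs/academy/vectors/index.qmd", "academy/vectors/index.qmd"))  # True
print(paths_match("docs/a/index.qmd", "docs/b/index.qmd"))  # False
print(paths_match("index.qmd", "docs/a/index.qmd"))  # False: basename alone is not enough
```

Averaging `recall_at_k` over all labeled rows gives a per-index score that can be compared between the two indexes.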

## Scores in search output

The default index metric is an L2-based distance on normalized embeddings, so **lower scores are better** (0 = closest). Results are ordered best-first.
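
For unit-length vectors, squared L2 distance and cosine similarity are tied together by ‖a − b‖² = 2 − 2·cos(a, b), which is why a lower distance means a closer match. The sketch below checks that identity with made-up vectors using only the standard library. Whether the index reports squared or plain L2 is not stated here, so treat the absolute values as illustrative.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # For unit vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors standing in for embeddings.
query = normalize([1.0, 2.0, 3.0])
doc_close = normalize([1.0, 2.0, 2.5])
doc_far = normalize([-3.0, 0.5, 1.0])

for name, doc in (("close", doc_close), ("far", doc_far)):
    distance = squared_l2(query, doc)
    from_cosine = 2 - 2 * dot(query, doc)
    print(name, round(distance, 4), round(from_cosine, 4))
```

The two printed columns agree for each document, and the "close" document has the smaller distance, so it would be ranked first.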

## Dependencies and runtime

- **CLI extras:** `sentence-transformers`, `langchain-text-splitters`, `langchain-community`, `beautifulsoup4`, `pymupdf`, `python-docx` (`pyproject.toml`, `[project.optional-dependencies].cli`).
- **Eval stack (core):** `langgraph`, `langchain-anthropic`, and `langchain-core` are **main** dependencies. **`[eval]`** adds an explicit `anthropic` pin for API clients; `pip install .[eval]` is still recommended for eval workflows alongside `[cli]`.
- **S3:** Standard TileDB/AWS credential and endpoint configuration; `--s3-region` sets `vfs.s3.region` only.
- **Sentence-transformers:** Model weights cache under the usual Hugging Face / sentence-transformers cache dirs; no API key for public models.

## Extending file types

Supported suffixes are defined in `SUPPORTED_SUFFIXES` in `apis/python/src/tiledb/vector_search/cli/vs.py`. New types must be parseable by `TileDBLoader` in `object_readers/directory_reader.py` (add a MIME handler there if needed).

apis/python/src/tiledb/vector_search/cli/__init__.py

Whitespace-only changes.
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
from __future__ import annotations

import json
from pathlib import Path

import click


def _ensure_eval_deps() -> None:
    try:
        import langchain_anthropic  # noqa: F401
        import langgraph.graph  # noqa: F401
    except ImportError as e:
        raise click.ClickException(
            "Eval commands need optional dependencies. Install with:\n"
            " pip install 'tiledb-vector-search[eval]'\n"
            "(Use [cli] as well to build indexes from documents.)"
        ) from e


@click.group(name="eval", invoke_without_command=True)
@click.pass_context
def eval_cli(ctx: click.Context) -> None:
    """LangGraph + Anthropic evals and index comparison."""
    _ensure_eval_deps()
    if ctx.invoked_subcommand is None:
        click.echo(ctx.get_help(), color=ctx.color)


@eval_cli.command("agent")
@click.argument(
    "dataset",
    type=click.Path(path_type=Path, exists=True, dir_okay=False),
)
@click.option(
    "--index-uri",
    default=None,
    help="TileDB vector index URI. With the default flags, runs both with and without retrieval.",
)
@click.option(
    "--no-index-only",
    is_flag=True,
    help="Only run the baseline (no retrieval). Does not require --index-uri.",
)
@click.option(
    "--skip-without-index",
    is_flag=True,
    help="When using --index-uri, do not run the no-retrieval pass.",
)
@click.option(
    "--skip-with-index",
    is_flag=True,
    help="When using --index-uri, do not run the retrieval pass.",
)
@click.option("--top-k", default=10, show_default=True, type=int)
@click.option(
    "--model",
    default="claude-haiku-4-5-20251001",
    show_default=True,
    help="Anthropic model id (langchain-anthropic).",
)
@click.option(
    "--llm-judge",
    is_flag=True,
    help="Also call the model to score each answer vs gold (0–1).",
)
@click.option(
    "--out",
    type=click.Path(path_type=Path, dir_okay=False),
    default=None,
    help="Write JSON report to this path.",
)
def eval_agent(
    dataset: Path,
    index_uri: str | None,
    no_index_only: bool,
    skip_without_index: bool,
    skip_with_index: bool,
    top_k: int,
    model: str,
    llm_judge: bool,
    out: Path | None,
) -> None:
    """Run eval examples: compare answers with vs without vector retrieval."""
    from tiledb.vector_search.evals.runner import run_agent_eval_report

    if no_index_only:
        run_with = False
        run_without = True
        if index_uri and (skip_with_index or skip_without_index):
            raise click.UsageError(
                "--no-index-only cannot be combined with --index-uri skip flags meaningfully."
            )
    else:
        if not index_uri:
            raise click.UsageError(
                "Provide --index-uri, or use --no-index-only for baseline-only."
            )
        run_with = not skip_with_index
        run_without = not skip_without_index
        if not run_with and not run_without:
            raise click.UsageError(
                "Choose at least one of retrieval or baseline (check skip flags)."
            )

    report = run_agent_eval_report(
        dataset_path=dataset,
        index_uri=index_uri,
        top_k=top_k,
        anthropic_model=model,
        llm_judge=llm_judge,
        run_without_index=run_without,
        run_with_index=run_with,
        show_progress=True,
    )
    text = json.dumps(report, indent=2)
    click.echo(text)
    if out is not None:
        out.write_text(text, encoding="utf-8")
        click.echo(f"Wrote {out}", err=True)


@eval_cli.command("compare-indexes")
@click.argument(
    "dataset",
    type=click.Path(path_type=Path, exists=True, dir_okay=False),
)
@click.option("--index-a", required=True, help="First index URI.")
@click.option("--index-b", required=True, help="Second index URI.")
@click.option("--top-k", default=10, show_default=True, type=int)
@click.option(
    "--answer-metrics",
    is_flag=True,
    help="Run the full agent per index and score answers vs gold (calls the API).",
)
@click.option(
    "--model",
    default="claude-haiku-4-5-20251001",
    show_default=True,
)
@click.option("--llm-judge", is_flag=True)
@click.option(
    "--out",
    type=click.Path(path_type=Path, dir_okay=False),
    default=None,
)
def eval_compare_indexes(
    dataset: Path,
    index_a: str,
    index_b: str,
    top_k: int,
    answer_metrics: bool,
    model: str,
    llm_judge: bool,
    out: Path | None,
) -> None:
    """Compare two indexes: retrieval recall@k (labeled rows) and optional answer metrics."""
    from tiledb.vector_search.evals.runner import run_compare_indexes_report

    report = run_compare_indexes_report(
        dataset_path=dataset,
        index_uri_a=index_a,
        index_uri_b=index_b,
        top_k=top_k,
        anthropic_model=model,
        answer_metrics=answer_metrics,
        llm_judge=llm_judge,
        show_progress=True,
    )
    r = report.get("retrieval") or {}
    if r.get("per_example") == []:
        click.secho(
            "No rows with relevant_file_paths; retrieval section is empty. "
            "Add labels or use eval agent for answer-only runs.",
            fg="yellow",
            err=True,
        )
    text = json.dumps(report, indent=2)
    click.echo(text)
    if out is not None:
        out.write_text(text, encoding="utf-8")
        click.echo(f"Wrote {out}", err=True)
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
import click

from tiledb.vector_search.cli.vs import vs


@click.group()
@click.version_option(package_name="tiledb-vector-search")
def cli():
    """TileDB command-line interface."""


cli.add_command(vs)
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
"""CLI progress bar (stderr), shared by ``vs build`` and ``vs eval``."""

from __future__ import annotations

import sys
import time

BAR_WIDTH = 30


def progress_bar(current: int, total: int, label: str, start_time: float) -> None:
    """Write a single-line progress bar to stderr."""
    elapsed = time.monotonic() - start_time
    # Treat an empty workload (total == 0) as complete to avoid division by zero.
    frac = current / total if total else 1.0
    filled = int(BAR_WIDTH * frac)
    bar = "█" * filled + "░" * (BAR_WIDTH - filled)
    pct = int(frac * 100)
    # "\r" returns to the start of the line so each call overwrites the previous bar.
    sys.stderr.write(f"\r {label} [{current}/{total}] {bar} {pct}% {elapsed:.0f}s")
    sys.stderr.flush()


def progress_done(current: int, total: int, label: str, start_time: float) -> None:
    """Write the completed bar to stderr and end the line."""
    elapsed = time.monotonic() - start_time
    bar = "█" * BAR_WIDTH
    sys.stderr.write(f"\r{label} [{total}/{total}] {bar} 100% {elapsed:.1f}s\n")
    sys.stderr.flush()
