TileDB-Inc
diff --git a/.agents/skills/tiledb-vector-search-cli/SKILL.md b/.agents/skills/tiledb-vector-search-cli/SKILL.md
Lines changed: 123 additions & 0 deletions
diff --git a/apis/python/src/tiledb/vector_search/cli/__init__.py b/apis/python/src/tiledb/vector_search/cli/__init__.py
diff --git a/apis/python/src/tiledb/vector_search/cli/eval_commands.py b/apis/python/src/tiledb/vector_search/cli/eval_commands.py
Lines changed: 182 additions & 0 deletions
diff --git a/apis/python/src/tiledb/vector_search/cli/main.py b/apis/python/src/tiledb/vector_search/cli/main.py
Lines changed: 12 additions & 0 deletions
diff --git a/apis/python/src/tiledb/vector_search/cli/progress.py b/apis/python/src/tiledb/vector_search/cli/progress.py
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,123 @@
+---
+name: tiledb-vector-search-cli
+description: >-
+  TileDB Vector Search CLI (`tiledb vs`): build indexes from markdown/Quarto/HTML
+  directories and run semantic search. Use when the user mentions tiledb vs,
+  vector search CLI, building a vector index from docs, or searching an index
+  built with this repo.
+---
+
+# TileDB Vector Search CLI
+
+## Install
+
+From the TileDB-Vector-Search repo root (native extension must build):
+
+```bash
+pip install -e ".[cli]"
+```
+
+For **LangGraph + Anthropic evals** (`tiledb vs eval`), `langgraph` / `langchain-anthropic` / `langchain-core` ship with the base package; use **`[eval]`** if you want an explicit `anthropic` dependency. Set `ANTHROPIC_API_KEY` before running evals. To install both extras:
+
+```bash
+pip install -e ".[cli,eval]"
+```
+
+Entry point: `tiledb` (Click). Subcommand group: `vs` (vector search).
+
+Implementation lives under `apis/python/src/tiledb/vector_search/cli/` (`main.py`, `vs.py`, `eval_commands.py`, `progress.py`). Eval logic lives under `apis/python/src/tiledb/vector_search/evals/`.
+
+## Commands
+
+### `tiledb vs build SOURCE_DIR OUTPUT_URI`
+
+Recursively indexes docs under `SOURCE_DIR` (local path or `s3://`): `.md`, `.qmd`, `.html`, `.htm`, `.pdf`, `.txt`, `.rst`, `.doc`, `.docx`. Other extensions are skipped (logged in yellow). Parsing uses the same `TileDBLoader` stack as `DirectoryTextReader` (PyMuPDF for PDF, BS4 for HTML, etc.).
+
+**Incremental updates:** If `OUTPUT_URI` already exists as a TileDB group, only files whose paths are **not** already present in index metadata (`file_path`) are embedded and merged. If every file is already indexed, the command exits with nothing to do.
+
+**Caveat:** Updates are keyed by **file URI/path**. Editing a file in place does **not** re-embed it; only new paths are added.
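The path-keyed behaviour described above can be sketched in a few lines. This is only an illustration, not the code in `vs.py`: `select_new_files` and the `indexed_paths` set are hypothetical names standing in for whatever the command reads from the index's `file_path` metadata.

```python
# Hypothetical helper: keep only candidate files whose path is not yet indexed.
def select_new_files(candidates, indexed_paths):
    """Return candidates whose path string is absent from indexed_paths.

    Only the path is compared, so a file edited in place (same path,
    new contents) counts as already indexed and is skipped.
    """
    seen = set(indexed_paths)
    return [path for path in candidates if path not in seen]


indexed = {"docs/intro.md", "docs/setup.md"}          # assumed metadata contents
found = ["docs/intro.md", "docs/setup.md", "docs/auth.md"]
new_files = select_new_files(found, indexed)
print(new_files)
```

Under this model, the only way to refresh an edited file is to make it appear under a path the index has not seen, which is the limitation the caveat points at.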
+
+| Option | Default | Notes |
+|--------|---------|--------|
+| `--index-type` | `FLAT` | `FLAT`, `IVF_FLAT`, `VAMANA` |
+| `--embedding-model` | `all-MiniLM-L6-v2` | Sentence-transformers model |
+| `--chunk-size` | `500` | Characters per chunk |
+| `--chunk-overlap` | `50` | Chunk overlap |
+| `--s3-region` | `us-east-1` | Passed as TileDB config `vfs.s3.region` |
+
+**Phases (fresh build):** progress bar while embedding per file/partition, then spinner while building the index.
+
+**Phases (incremental):** progress bar for new files, then spinner for `consolidate_updates`.
+
+Examples:
+
+```bash
+tiledb vs build ./docs ./out/my_index
+tiledb vs build ./docs s3://bucket/prefix/index --s3-region us-west-2
+tiledb vs build ./docs ./out/index --index-type IVF_FLAT --embedding-model all-mpnet-base-v2
+```
+
+### `tiledb vs search INDEX_URI QUERY`
+
+Semantic search over an existing index (local or `s3://`). Uses the same embedding model stored in the index.
+
+| Option | Default |
+|--------|---------|
+| `--topk` | `10` |
+
+Quote multi-word queries in the shell:
+
+```bash
+tiledb vs search ./out/my_index "how do I configure auth"
+tiledb vs search s3://bucket/prefix/index "deployment" --topk 5
+```
+
+### `tiledb vs eval agent DATASET.jsonl`
+
+Runs a small **LangGraph** pipeline (retrieve → generate) with **ChatAnthropic**, using your eval JSONL file. Each line is one object with at least `question` and `gold_answer`; optional `id` and `relevant_file_paths` (for recall@k when an index is used).
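As a rough illustration of the dataset shape described above, the sketch below writes two rows as JSONL and checks the required keys. The field names come from the description; the row values and the helpers `to_jsonl` and `missing_keys` are made up for the example and are not part of the eval runner.

```python
import json

# Example rows: values are invented; only the field names come from the docs.
rows = [
    {
        "id": "q1",
        "question": "How do I configure auth?",
        "gold_answer": "Set the API token in the config file.",
        "relevant_file_paths": ["docs/auth.md"],
    },
    # Optional fields may be omitted.
    {"question": "What is a vector index?", "gold_answer": "A nearest-neighbor search structure."},
]


def to_jsonl(items):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(item) for item in items) + "\n"


def missing_keys(row):
    """Return the required keys that a row lacks."""
    return [key for key in ("question", "gold_answer") if key not in row]


text = to_jsonl(rows)
parsed = [json.loads(line) for line in text.splitlines() if line.strip()]
problems = [missing_keys(row) for row in parsed]
print(problems)
```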
+
+**With an index (default):** runs **both** passes per example—**with** retrieval from `--index-uri` and **without**—and prints JSON with `with_index` / `without_index` aggregates (exact match rate, mean token F1, optional LLM judge).
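The diff does not show how the runner computes exact match or token F1, so the following is a sketch of the common SQuAD-style definitions rather than the project's implementation. The normalization (lowercase, whitespace split) is an assumption; the real runner may also strip punctuation or articles.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction, gold):
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall over the multiset overlap."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a perfect match; one empty scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"), token_f1("the cat sat", "the cat"))
```

Averaging `exact_match` and `token_f1` over all examples would give aggregates of the kind the report lists for `with_index` and `without_index`.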
+
+**Baseline only:** pass `--no-index-only` to run just the no-retrieval pass; `--index-uri` is not required.
+
+| Option | Notes |
+|--------|--------|
+| `--index-uri` | Vector index URI (required unless `--no-index-only`) |
+| `--top-k` | Chunks passed to the model (default 10) |
+| `--model` | Anthropic model id (default `claude-haiku-4-5-20251001`) |
+| `--llm-judge` | Extra model call to score prediction vs gold (0–1) |
+| `--skip-without-index` / `--skip-with-index` | Run only one side when comparing |
+| `--out` | Write JSON report to a file |
+
+Example:
+
+```bash
+export ANTHROPIC_API_KEY=...
+tiledb vs eval agent ./examples/evals/sample.jsonl --index-uri ./out/my_index --out report.json
+```
+
+Example datasets live under `examples/evals/`; see `examples/evals/README.md` for **`academy_vector_embeddings_foundation.jsonl`** (build the index from the TileDB-Documentation `academy` directory so `relevant_file_paths` align with metadata).
+
+### `tiledb vs eval compare-indexes DATASET.jsonl`
+
+Compares **two** indexes (e.g. builds with different `--embedding-model` values) on rows that include `relevant_file_paths` and reports **recall@k** per index: whether any top-`k` result path matches a labeled path. Matching uses normalized equality, path suffix overlap, or same parent directory plus filename. Basename alone is not enough, so shared names like `index.qmd` are not all treated as the same file. With `--answer-metrics`, it also runs the full agent per index and scores answers against `gold_answer`.
+
+```bash
+tiledb vs eval compare-indexes ./eval.jsonl --index-a ./idx_minilm --index-b ./idx_mpnet --top-k 10
+tiledb vs eval compare-indexes ./eval.jsonl --index-a ./idx_a --index-b ./idx_b --answer-metrics --llm-judge
+```
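The matching rules for recall@k can be approximated as follows. This is one interpretation of the description, not the code in `evals/`: `paths_match`, `hit_at_k`, and `recall_at_k` are hypothetical names, and the sketch collapses "suffix overlap" and "same parent dir + filename" into a single check on the last two path segments.

```python
def _parts(path):
    """Normalize separators and drop empty or '.' segments."""
    return [seg for seg in path.replace("\\", "/").split("/") if seg not in ("", ".")]


def paths_match(a, b):
    """Match on normalized equality, or on agreeing parent dir + filename."""
    pa, pb = _parts(a), _parts(b)
    if not pa or not pb:
        return False
    if pa == pb:
        return True
    # Require at least two trailing segments to agree, so a bare basename
    # such as 'index.qmd' never matches on its own.
    return len(pa) >= 2 and len(pb) >= 2 and pa[-2:] == pb[-2:]


def hit_at_k(result_paths, labeled_paths, k):
    """True if any of the top-k result paths matches any labeled path."""
    return any(paths_match(r, g) for r in result_paths[:k] for g in labeled_paths)


def recall_at_k(examples, k):
    """examples: list of (result_paths, labeled_paths); fraction with a hit."""
    if not examples:
        return 0.0
    return sum(hit_at_k(r, g, k) for r, g in examples) / len(examples)


examples = [
    (["x/a/index.qmd", "x/b/index.qmd"], ["b/index.qmd"]),
    (["x/a/index.qmd"], ["c/index.qmd"]),
]
print(recall_at_k(examples, 1), recall_at_k(examples, 2))
```

Requiring the parent directory to agree is what keeps the many `index.qmd` files in a documentation tree from all counting as hits for one another.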
+
+## Scores in search output
+
+Default index metric is L2-related distance on normalized embeddings: **lower scores are better** (0 = closest). Results are ordered best-first.
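To see why lower is better, note that for unit-length vectors the squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by ascending distance is the same as ranking by descending cosine similarity. The sketch below checks that identity on toy vectors. Whether the CLI prints the squared or the plain distance is not stated in this diff, so the use of squared L2 here is an assumption.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


query = normalize([1.0, 2.0, 2.0])
docs = {
    "same": normalize([2.0, 4.0, 4.0]),   # same direction as the query
    "near": normalize([1.0, 2.0, 1.0]),
    "far": normalize([-1.0, 0.0, 0.5]),   # orthogonal to the query
}
scores = {name: squared_l2(query, vec) for name, vec in docs.items()}
ranked = sorted(scores, key=scores.get)   # best-first: smallest distance first
print(ranked)
```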
+
+## Dependencies and runtime
+
+- **CLI extras:** `sentence-transformers`, `langchain-text-splitters`, `langchain-community`, `beautifulsoup4`, `pymupdf`, `python-docx` (`pyproject.toml` → `[project.optional-dependencies].cli`).
+- **Eval stack (core):** `langgraph`, `langchain-anthropic`, and `langchain-core` are **main** dependencies. **`[eval]`** adds an explicit `anthropic` pin for API clients; `pip install ".[eval]"` is still recommended for eval workflows alongside `[cli]`.
+- **S3:** Standard TileDB/AWS credential and endpoint configuration; `--s3-region` sets `vfs.s3.region` only.
+- **Sentence-transformers:** Model weights cache under the usual Hugging Face / sentence-transformers cache dirs; no API key for public models.
+
+## Extending file types
+
+Supported suffixes are defined in `SUPPORTED_SUFFIXES` in `apis/python/src/tiledb/vector_search/cli/vs.py`. New types must be parseable by `TileDBLoader` in `object_readers/directory_reader.py` (add a MIME handler there if needed).
@@ -0,0 +1,182 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import click
+
+
+def _ensure_eval_deps() -> None:
+    try:
+        import langchain_anthropic  # noqa: F401
+        import langgraph.graph  # noqa: F401
+    except ImportError as e:
+        raise click.ClickException(
+            "Eval commands need optional dependencies. Install with:\n"
+            "  pip install 'tiledb-vector-search[eval]'\n"
+            "(Use [cli] as well to build indexes from documents.)"
+        ) from e
+
+
+@click.group(name="eval", invoke_without_command=True)
+@click.pass_context
+def eval_cli(ctx: click.Context) -> None:
+    """LangGraph + Anthropic evals and index comparison."""
+    _ensure_eval_deps()
+    if ctx.invoked_subcommand is None:
+        click.echo(ctx.get_help(), color=ctx.color)
+
+
+@eval_cli.command("agent")
+@click.argument(
+    "dataset",
+    type=click.Path(path_type=Path, exists=True, dir_okay=False),
+)
+@click.option(
+    "--index-uri",
+    default=None,
+    help="TileDB vector index URI. With the default flags, runs both with and without retrieval.",
+)
+@click.option(
+    "--no-index-only",
+    is_flag=True,
+    help="Only run the baseline (no retrieval). Does not require --index-uri.",
+)
+@click.option(
+    "--skip-without-index",
+    is_flag=True,
+    help="When using --index-uri, do not run the no-retrieval pass.",
+)
+@click.option(
+    "--skip-with-index",
+    is_flag=True,
+    help="When using --index-uri, do not run the retrieval pass.",
+)
+@click.option("--top-k", default=10, show_default=True, type=int)
+@click.option(
+    "--model",
+    default="claude-haiku-4-5-20251001",
+    show_default=True,
+    help="Anthropic model id (langchain-anthropic).",
+)
+@click.option(
+    "--llm-judge",
+    is_flag=True,
+    help="Also call the model to score each answer vs gold (0–1).",
+)
+@click.option(
+    "--out",
+    type=click.Path(path_type=Path, dir_okay=False),
+    default=None,
+    help="Write JSON report to this path.",
+)
+def eval_agent(
+    dataset: Path,
+    index_uri: str | None,
+    no_index_only: bool,
+    skip_without_index: bool,
+    skip_with_index: bool,
+    top_k: int,
+    model: str,
+    llm_judge: bool,
+    out: Path | None,
+) -> None:
+    """Run eval examples: compare answers with vs without vector retrieval."""
+    from tiledb.vector_search.evals.runner import run_agent_eval_report
+
+    if no_index_only:
+        run_with = False
+        run_without = True
+        if index_uri and (skip_with_index or skip_without_index):
+            raise click.UsageError(
+                "--no-index-only cannot be combined with "
+                "--skip-with-index or --skip-without-index."
+            )
+    else:
+        if not index_uri:
+            raise click.UsageError(
+                "Provide --index-uri, or use --no-index-only for baseline-only."
+            )
+        run_with = not skip_with_index
+        run_without = not skip_without_index
+        if not run_with and not run_without:
+            raise click.UsageError(
+                "Choose at least one of retrieval or baseline (check skip flags)."
+            )
+
+    report = run_agent_eval_report(
+        dataset_path=dataset,
+        index_uri=index_uri,
+        top_k=top_k,
+        anthropic_model=model,
+        llm_judge=llm_judge,
+        run_without_index=run_without,
+        run_with_index=run_with,
+        show_progress=True,
+    )
+    text = json.dumps(report, indent=2)
+    click.echo(text)
+    if out is not None:
+        out.write_text(text, encoding="utf-8")
+        click.echo(f"Wrote {out}", err=True)
+
+
+@eval_cli.command("compare-indexes")
+@click.argument(
+    "dataset",
+    type=click.Path(path_type=Path, exists=True, dir_okay=False),
+)
+@click.option("--index-a", required=True, help="First index URI.")
+@click.option("--index-b", required=True, help="Second index URI.")
+@click.option("--top-k", default=10, show_default=True, type=int)
+@click.option(
+    "--answer-metrics",
+    is_flag=True,
+    help="Run the full agent per index and score answers vs gold (calls the API).",
+)
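+@click.option(
+    "--model",
+    default="claude-haiku-4-5-20251001",
+    show_default=True,
+    help="Anthropic model id (langchain-anthropic).",
+)
+@click.option(
+    "--llm-judge",
+    is_flag=True,
+    help="Also call the model to score each answer vs gold (0–1).",
+)
+@click.option(
+    "--out",
+    type=click.Path(path_type=Path, dir_okay=False),
+    default=None,
+    help="Write JSON report to this path.",
+)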
+def eval_compare_indexes(
+    dataset: Path,
+    index_a: str,
+    index_b: str,
+    top_k: int,
+    answer_metrics: bool,
+    model: str,
+    llm_judge: bool,
+    out: Path | None,
+) -> None:
+    """Compare two indexes: retrieval recall@k (labeled rows) and optional answer metrics."""
+    from tiledb.vector_search.evals.runner import run_compare_indexes_report
+
+    report = run_compare_indexes_report(
+        dataset_path=dataset,
+        index_uri_a=index_a,
+        index_uri_b=index_b,
+        top_k=top_k,
+        anthropic_model=model,
+        answer_metrics=answer_metrics,
+        llm_judge=llm_judge,
+        show_progress=True,
+    )
+    r = report.get("retrieval") or {}
+    if r.get("per_example") == []:
+        click.secho(
+            "No rows with relevant_file_paths; retrieval section is empty. "
+            "Add labels or use eval agent for answer-only runs.",
+            fg="yellow",
+            err=True,
+        )
+    text = json.dumps(report, indent=2)
+    click.echo(text)
+    if out is not None:
+        out.write_text(text, encoding="utf-8")
+        click.echo(f"Wrote {out}", err=True)
@@ -0,0 +1,12 @@
+import click
+
+from tiledb.vector_search.cli.vs import vs
+
+
+@click.group()
+@click.version_option(package_name="tiledb-vector-search")
+def cli():
+    """TileDB command-line interface."""
+
+
+cli.add_command(vs)
@@ -0,0 +1,26 @@
+"""CLI progress bar (stderr), shared by ``vs build`` and ``vs eval``."""
+
+from __future__ import annotations
+
+import sys
+import time
+
+BAR_WIDTH = 30
+
+
+def progress_bar(current: int, total: int, label: str, start_time: float) -> None:
+    """Write a single-line progress bar to stderr."""
+    elapsed = time.monotonic() - start_time
+    frac = current / total if total else 1.0
+    filled = int(BAR_WIDTH * frac)
+    bar = "█" * filled + "░" * (BAR_WIDTH - filled)
+    pct = int(frac * 100)
+    sys.stderr.write(f"\r  {label} [{current}/{total}] {bar} {pct}%  {elapsed:.0f}s")
+    sys.stderr.flush()
+
+
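+def progress_done(current: int, total: int, label: str, start_time: float) -> None:
+    """Overwrite the bar with a completed line and end it with a newline.
+
+    The line always shows ``total/total`` and 100%; ``current`` is not used.
+    """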
+    elapsed = time.monotonic() - start_time
+    bar = "█" * BAR_WIDTH
+    sys.stderr.write(f"\r  ✓ {label} [{total}/{total}] {bar} 100%  {elapsed:.1f}s\n")
+    sys.stderr.flush()
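The fill arithmetic in `progress_bar` is easier to test when it is separated from the stderr write. The sketch below does that with a hypothetical pure helper, `render_bar`, which is not part of this PR. It reuses the same `int(BAR_WIDTH * frac)` rule and adds a clamp to `[0, 1]`, which the original does not have, so that `current > total` cannot overfill the bar.

```python
BAR_WIDTH = 30


def render_bar(current, total, width=BAR_WIDTH):
    """Return (bar, percent) for the given progress.

    Mirrors progress_bar's arithmetic; the clamp to [0, 1] is an addition.
    """
    frac = current / total if total else 1.0
    frac = max(0.0, min(1.0, frac))
    filled = int(width * frac)
    bar = "█" * filled + "░" * (width - filled)
    return bar, int(frac * 100)


bar, pct = render_bar(15, 30)
print(f"[15/30] {bar} {pct}%")
```

With a helper like this, the stderr-writing functions would only need to format the returned pair.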