prosdevlab · prosdev · Mar 30, 2026 · Mar 29, 2026 · Mar 29, 2026 · Mar 29, 2026
diff --git a/.changeset/antfly-migration.md b/.changeset/antfly-migration.md
@@ -0,0 +1,11 @@
+---
+"@prosdevlab/dev-agent": minor
+---
+
+Replace LanceDB + @xenova/transformers with Antfly for hybrid search
+
+- **Hybrid search**: `dev_search` now uses BM25 + vector + RRF fusion — exact keyword matches AND semantic understanding in one query
+- **New command**: `dev setup` handles search backend installation (Docker-first, native fallback)
+- **Auto-embedding**: Antfly generates embeddings locally via Termite — no separate embedding pipeline
+- **Direct key lookup**: Replaces O(n) zero-vector scan with instant key fetch
+- **Breaking**: Requires Antfly server running (`dev setup` handles this). Existing LanceDB indexes are not migrated — run `dev index . --force` to rebuild.
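The changeset's "BM25 + vector + RRF fusion" can be pictured with a minimal sketch of reciprocal rank fusion. This is the textbook formulation only. How Antfly implements fusion, and the constant `k = 60`, are assumptions here and are not confirmed by this PR. The function name `rrfFuse` and the sample keys other than `func-retryBackoff` are hypothetical.

```typescript
// Reciprocal rank fusion: every ranking adds 1 / (k + rank) to a document's score.
// k = 60 is a commonly used default, NOT a value confirmed for Antfly.
type Fused = { id: string; score: number };

function rrfFuse(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical result lists: one from BM25, one from vector similarity.
const bm25Ranking = ['func-retryBackoff', 'func-parseConfig'];
const vectorRanking = ['func-handleError', 'func-retryBackoff'];
const fusedHits = rrfFuse([bm25Ranking, vectorRanking]);
console.log(fusedHits.map((hit) => hit.id));
```

In this sketch `func-retryBackoff` ranks first because it collects a contribution from both lists, so its score is roughly twice that of a document found by only one index.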
diff --git a/.claude/da-plans/README.md b/.claude/da-plans/README.md
@@ -9,9 +9,9 @@ Implementation deviations are logged at the bottom of each plan file.
 
 | Track | Description | Status |
 |-------|-------------|--------|
-| [Core](core/) | Scanner, vector storage, services, indexer | Not started |
+| [Core](core/) | Scanner, vector storage, services, indexer | Phase 1: Draft |
 | [CLI](cli/) | Command-line interface | Not started |
-| [MCP Server](mcp-server/) | Model Context Protocol server + adapters | Not started |
+| [MCP Server](mcp-server/) | Model Context Protocol server + adapters | Phase 1: Draft (blocked on core/phase-1) |
 | [Subagents](subagents/) | Coordinator, explorer, planner, GitHub agents | Not started |
 | [Integrations](integrations/) | Claude Code, VS Code, Cursor | Not started |
 | [Logger](logger/) | @prosdevlab/kero centralized logging | Not started |

diff --git a/.claude/da-plans/core/phase-1-antfly-migration/1.1-spike-findings.md b/.claude/da-plans/core/phase-1-antfly-migration/1.1-spike-findings.md
@@ -0,0 +1,122 @@
+# Part 1.1 — Spike Findings
+
+**Date:** 2026-03-29
+**Antfly version:** 0.1.0 (native binary, macOS ARM64)
+**SDK version:** @antfly/sdk 0.0.14
+
+## Results
+
+| # | Question | Answer |
+|---|----------|--------|
+| 1 | Does batch insert overwrite existing keys (upsert)? | **Yes.** Re-inserting same key overwrites the document. Confirmed via lookup after upsert. |
+| 2 | How long does background embedding take? | **~2 seconds** for a single document to become searchable. First batch (10 docs) searchable within 5-8s. |
+| 3 | Can we query immediately after insert? | **No — ~2s delay.** Embeddings are generated asynchronously. `dev index` should wait or poll for completion. |
+| 4 | What does `client.tables.get()` return? | Returns table info including `storage_status.disk_usage` (bytes), index configs, and shard info. **No direct doc count** — need to use a query with limit to count. |
+| 5 | Latency of lookup vs vector search? | Lookup is near-instant. Semantic search ~1-2ms for 10 docs. Both fast at this scale. |
+| 6 | Can we full-scan without a query vector? | **Yes** — use the global `/api/v1/query` endpoint with just `table` and `limit`, no `semantic_search`. Returns all docs. |
+| 7 | Does the SDK handle connection errors gracefully? | **SDK works fine via ESM** (our default). CJS build has a bug with `openapi-fetch` default export — only affects CJS consumers. See SDK notes below. |
+| 8 | What happens when antfly server is not running? | curl gets `ECONNREFUSED`. Clear and fast failure. |
+| 9 | Does `getAll()` paginate beyond 10000 docs? | **Unknown.** Not tested at scale in this spike. The query endpoint accepts `limit`, but behavior above 10000 docs is unverified. Need to test with a real repo index. |
+| 10 | Does `dev index` need to wait for embedding completion? | **Yes.** There's a ~2s delay between insert and searchability. For a full index run, we should wait for all embeddings to complete before declaring success. Poll embedding status or add a brief wait. |
+
+## API Endpoint Reference (verified)
+
+| Operation | Method | Endpoint |
+|-----------|--------|----------|
+| Create table | POST | `/api/v1/tables/{name}` |
+| Get table info | GET | `/api/v1/tables/{name}` |
+| Drop table | DELETE | `/api/v1/tables/{name}` |
+| List tables | GET | `/api/v1/tables` |
+| Batch insert/delete | POST | `/api/v1/tables/{name}/batch` |
+| Lookup by key | GET | `/api/v1/tables/{name}/lookup/{key}` |
+| Query (table-specific) | POST | `/api/v1/tables/{name}/query` |
+| Query (global) | POST | `/api/v1/query` |
+
+**Important:** The global query endpoint (`/api/v1/query`) returns results in `responses[0].hits.hits[]` format. Table-specific query (`/api/v1/tables/{name}/query`) returns in `hits.hits[]` format.
+
+## Key Findings
+
+### 1. Table creation auto-creates full-text index
+
+When creating a table with an embeddings index, antfly automatically adds a
+`full_text_index_v0` full-text index. This means **every table gets hybrid search
+for free** — no extra configuration needed.
+
+### 2. Hybrid search with RRF works beautifully
+
+Tested: `semantic_search: "error handling and retry"` + `full_text_search: "retryWithBackoff"`
+
+Result: `func-retryBackoff` ranked #1 with scores from BOTH BM25 and vector similarity.
+The `_index_scores` object shows which indexes contributed. RRF doubled its score vs
+semantic-only results. This is exactly the upgrade we wanted for `dev_search`.
+
+### 3. Document structure is flexible (schemaless)
+
+Documents are JSON objects. No predefined schema required. We can store `text`, `metadata`,
+`type`, `file`, `line` — whatever we want. The embedding index uses the `template` field
+(Handlebars) to know which field(s) to embed.
+
+### 4. Embedding model confirmed: bge-small-en-v1.5, dimension 384
+
+Table info shows `dimension: 384` and `model: BAAI/bge-small-en-v1.5`. Same dimension
+as our current all-MiniLM-L6-v2 (384), so result structures don't change.
+
+Note: i8 variant 404'd during model pull. f32 variant (127.8MB) works. The plan should
+use default variant (no `--variants i8` flag) until i8 is fixed.
+
+### 5. Lookup by key replaces O(n) zero-vector hack
+
+`GET /api/v1/tables/{name}/lookup/{key}` returns the document directly. Returns 404 if
+not found. This is a massive improvement over the current `get()` implementation in
+`LanceDBVectorStore` which does a full vector scan with a zero vector.
+
+### 6. Storage info available
+
+`client.tables.get()` returns `storage_status.disk_usage` in bytes. This can replace
+the `storageSize` field in `VectorStats` (currently reads local LanceDB directory).
+
+## SDK Notes
+
+### CJS build has a bug (doesn't affect us)
+
+The SDK's CJS bundle (`dist/index.cjs`) fails because `openapi-fetch` is ESM-only.
+`tsup` wraps it with `__toESM(require("openapi-fetch"))` and accesses `.default`,
+which is `undefined` in CJS context.
+
+**This doesn't affect dev-agent.** All our packages use `"type": "module"` (ESM).
+The ESM import path (`dist/index.js`) works correctly.
+
+The spike error was from `npx tsx` which loaded the CJS path — not representative
+of our actual runtime.
+
+**Recommendation:** Use `@antfly/sdk` directly. It's type-safe, auto-generated from
+OpenAPI spec, and works fine via ESM. Worth mentioning the CJS bug to the antfly
+team (fix: `noExternal: ['openapi-fetch']` in tsup config) for other consumers.
+
+## Docker Findings
+
+### `ghcr.io/antflydb/antfly:omni`
+- No ARM64 image available. Runs under Rosetta with `--platform linux/amd64`.
+- Pull succeeded but entrypoint errored: `Error: unknown flag: --api-url`
+
+### `ghcr.io/antflydb/antfly:latest`
+- Pulls successfully on ARM64 (via amd64 emulation)
+- Does NOT auto-start — just shows help. Needs explicit `swarm` command.
+- Would need: `docker run -d ... ghcr.io/antflydb/antfly:latest swarm`
+
+### Port conflict on native
+- `antfly swarm` binds to ports 8080, 9017, 9021, 12380, 11433
+- If any are occupied (e.g., old Docker container), it crashes with `bind: address already in use`
+- Docker is preferred because it isolates ports inside the container
+
+**Recommendation:** Docker-first with `antfly swarm` as the command, native fallback.
+Need to verify Docker image + `swarm` command works end-to-end.
+
+## Impact on Plan
+
+1. **Use `@antfly/sdk` directly** — ESM works fine, type-safe, auto-generated from OpenAPI
+2. **Model pull: use default variant** (not `--variants i8`) until i8 is fixed
+3. **`dev index` must wait for embeddings** — poll or add brief sleep after batch insert
+4. **Table info provides disk_usage** — can populate `VectorStats.storageSize`
+5. **Auto full-text index** — every table gets BM25 for free, simplifies table creation
+6. **Docker needs `swarm` command** — `docker run ... antfly swarm` not just `docker run ... antfly`
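Item 3 above ("poll or add brief sleep") could take the shape of a small polling helper. The sketch below is an assumption about how `dev index` might wait, not code from this PR. `waitUntilSearchable`, its option names, and the default interval and timeout are hypothetical. The probe is injected by the caller, so no particular Antfly SDK call is implied.

```typescript
// Options for the polling loop; both defaults are illustrative guesses.
type WaitOptions = { intervalMs?: number; timeoutMs?: number };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves true once the probe reports success,
// or false if the next wait would run past the timeout.
async function waitUntilSearchable(
  isSearchable: () => Promise<boolean>,
  { intervalMs = 500, timeoutMs = 30000 }: WaitOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await isSearchable()) return true;
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}

// Demo with a fake probe that succeeds on its third call (no server needed).
let probeCalls = 0;
waitUntilSearchable(async () => ++probeCalls >= 3, {
  intervalMs: 10,
  timeoutMs: 1000,
}).then((ready) => console.log(`searchable: ${ready}, probes: ${probeCalls}`));
```

A real probe might query for the last key in the batch and return true once it appears in the hits. Returning `false` on timeout, rather than throwing, leaves the caller to decide whether a slow embedding run is a warning or a failure.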
diff --git a/.claude/da-plans/core/phase-1-antfly-migration/1.1-spike-validate-api.md b/.claude/da-plans/core/phase-1-antfly-migration/1.1-spike-validate-api.md
@@ -0,0 +1,154 @@
+# Part 1.1 — Spike: Validate Antfly API
+
+## Goal
+
+Install antfly locally and confirm it can satisfy every operation our `VectorStore` interface
+needs. This is a throwaway spike — no code is committed.
+
+## Prerequisites
+
+```bash
+brew install --cask antflydb/antfly/antfly
+antfly termite pull --variants i8 BAAI/bge-small-en-v1.5
+antfly swarm
+```
+
+## Tasks
+
+### 1. Create a table with embedding index
+
+```typescript
+import { AntflyClient } from '@antfly/sdk';
+
+const client = new AntflyClient({ baseUrl: 'http://localhost:8080' });
+
+await client.tables.create('spike-code', {
+  indexes: {
+    content: {
+      type: 'embeddings',
+      template: '{{text}}',
+      embedder: {
+        provider: 'termite',
+        model: 'BAAI/bge-small-en-v1.5',
+      },
+    },
+  },
+});
+```
+
+**Confirm:** Table created, index active.
+
+### 2. Batch insert documents
+
+Insert 100 code documents matching our `EmbeddingDocument` shape:
+
+```typescript
+const inserts: Record<string, { text: string; metadata: string }> = {};
+for (const doc of documents) {
+  inserts[doc.id] = {
+    text: doc.text,
+    metadata: JSON.stringify(doc.metadata),
+  };
+}
+await client.tables.batch('spike-code', { inserts });
+```
+
+**Confirm:** Documents inserted. Check background embedding progress:
+```bash
+antfly index list --table spike-code
+```
+
+### 3. Test upsert behavior
+
+Re-insert a document with the same key but different text.
+
+**Confirm:** Does it overwrite? Or error? We need overwrite (upsert) semantics.
+
+### 4. Run hybrid search
+
+```typescript
+const results = await client.query({
+  table: 'spike-code',
+  semantic_search: 'authentication middleware',
+  full_text_search: { query: 'validateUser' },
+  indexes: ['content'],
+  fields: ['text', 'metadata'],
+  limit: 10,
+});
+```
+
+**Confirm:** Results come back. Both semantic and keyword matches appear.
+
+### 5. Test semantic-only search
+
+```typescript
+const results = await client.query({
+  table: 'spike-code',
+  semantic_search: 'error handling patterns',
+  indexes: ['content'],
+  limit: 10,
+});
+```
+
+**Confirm:** Pure semantic search works (our current default path).
+
+### 6. Test key lookup
+
+```typescript
+const doc = await client.tables.lookup('spike-code', 'some-doc-id');
+```
+
+**Confirm:** Returns the document by key. Fast (not a vector scan).
+
+### 7. Test batch delete
+
+```typescript
+await client.tables.batch('spike-code', { deletes: ['doc-1', 'doc-2'] });
+```
+
+**Confirm:** Documents removed. Search no longer returns them.
+
+### 8. Test count / table stats
+
+```typescript
+const info = await client.tables.get('spike-code');
+```
+
+**Confirm:** Can we get document count from table info?
+
+### 9. Test full scan (no query vector)
+
+```typescript
+const allViaTable = await client.tables.query('spike-code', { limit: 1000 });
+// or, via the global query endpoint:
+const allViaGlobal = await client.query({ table: 'spike-code', limit: 1000 });
+```
+
+**Confirm:** Can we retrieve all documents without a search query? This maps to `getAll()`.
+
+### 10. Test embedding availability timing
+
+Insert a batch, then immediately search for it.
+
+**Confirm:** How long until newly-inserted docs appear in search results?
+If there's a delay, we need to handle this in the index command (wait for embedding completion).
+
+## Questions to answer
+
+| # | Question | Answer |
+|---|----------|--------|
+| 1 | Does batch insert overwrite existing keys (upsert)? | |
+| 2 | How long does background embedding take for 100/1000/10000 docs? | |
+| 3 | Can we query immediately after insert? | |
+| 4 | What does `client.tables.get()` return? (need count) | |
+| 5 | Latency of `client.tables.lookup()` vs vector search? | |
+| 6 | Can we full-scan without a query vector? | |
+| 7 | Does the SDK handle connection errors gracefully? | |
+| 8 | What happens when antfly server is not running? | |
+| 9 | Does `getAll()` paginate beyond 10000 docs? How? | |
+| 10 | Does `dev index` need to wait for embedding completion before returning? | |
+
+## Exit criteria
+
+All 10 questions answered. If any answer blocks the migration, document it and reassess.
+If all answers are compatible, proceed to Part 1.2.
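Task 9 tries both query forms, and the findings file earlier in this diff reports that they return different shapes: `responses[0].hits.hits[]` for the global endpoint and `hits.hits[]` for the table-specific one. A small normalizer, sketched here under that assumption, would let the rest of the store code ignore the difference. The type names and `extractHits` are hypothetical, and the hit objects are placeholders because the spike notes do not list the per-hit fields.

```typescript
// Minimal shapes based on the spike findings; real responses carry more fields.
type Hit = Record<string, unknown>;
type HitList = { hits?: { hits?: Hit[] } };
type TableQueryResponse = HitList; // shape reported for /api/v1/tables/{name}/query
type GlobalQueryResponse = { responses: HitList[] }; // shape reported for /api/v1/query

// Returns the hit array from either response shape, or [] when nothing matched.
function extractHits(response: TableQueryResponse | GlobalQueryResponse): Hit[] {
  if ('responses' in response) {
    return response.responses[0]?.hits?.hits ?? [];
  }
  return response.hits?.hits ?? [];
}

// Placeholder payloads; the `id` field is illustrative only.
const fromTable = { hits: { hits: [{ id: 'doc-1' }] } };
const fromGlobal = { responses: [{ hits: { hits: [{ id: 'doc-1' }] } }] };
console.log(extractHits(fromTable).length, extractHits(fromGlobal).length);
```

Treating a missing `hits` object as an empty result keeps callers such as a `getAll()` implementation free of shape checks.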