11 changes: 11 additions & 0 deletions .changeset/antfly-migration.md
@@ -0,0 +1,11 @@
---
"@prosdevlab/dev-agent": minor
---

Replace LanceDB + @xenova/transformers with Antfly for hybrid search

- **Hybrid search**: `dev_search` now uses BM25 + vector + RRF fusion — exact keyword matches AND semantic understanding in one query
- **New command**: `dev setup` handles search backend installation (Docker-first, native fallback)
- **Auto-embedding**: Antfly generates embeddings locally via Termite — no separate embedding pipeline
- **Direct key lookup**: Replaces the O(n) zero-vector scan in `get()` with a direct key fetch
- **Breaking**: Requires Antfly server running (`dev setup` handles this). Existing LanceDB indexes are not migrated — run `dev index . --force` to rebuild.
4 changes: 2 additions & 2 deletions .claude/da-plans/README.md
@@ -9,9 +9,9 @@ Implementation deviations are logged at the bottom of each plan file.

| Track | Description | Status |
|-------|-------------|--------|
| [Core](core/) | Scanner, vector storage, services, indexer | Not started |
| [Core](core/) | Scanner, vector storage, services, indexer | Phase 1: Draft |
| [CLI](cli/) | Command-line interface | Not started |
| [MCP Server](mcp-server/) | Model Context Protocol server + adapters | Not started |
| [MCP Server](mcp-server/) | Model Context Protocol server + adapters | Phase 1: Draft (blocked on core/phase-1) |
| [Subagents](subagents/) | Coordinator, explorer, planner, GitHub agents | Not started |
| [Integrations](integrations/) | Claude Code, VS Code, Cursor | Not started |
| [Logger](logger/) | @prosdevlab/kero centralized logging | Not started |
122 changes: 122 additions & 0 deletions .claude/da-plans/core/phase-1-antfly-migration/1.1-spike-findings.md
@@ -0,0 +1,122 @@
# Part 1.1 — Spike Findings

**Date:** 2026-03-29
**Antfly version:** 0.1.0 (native binary, macOS ARM64)
**SDK version:** @antfly/sdk 0.0.14

## Results

| # | Question | Answer |
|---|----------|--------|
| 1 | Does batch insert overwrite existing keys (upsert)? | **Yes.** Re-inserting same key overwrites the document. Confirmed via lookup after upsert. |
| 2 | How long does background embedding take? | **~2 seconds** for a single document to become searchable. First batch (10 docs) searchable within 5-8s. |
| 3 | Can we query immediately after insert? | **No — ~2s delay.** Embeddings are generated asynchronously. `dev index` should wait or poll for completion. |
| 4 | What does `client.tables.get()` return? | Returns table info including `storage_status.disk_usage` (bytes), index configs, and shard info. **No direct doc count**; counting requires running a query with a `limit`. |
| 5 | Latency of lookup vs vector search? | Lookup is near-instant. Semantic search ~1-2ms for 10 docs. Both fast at this scale. |
| 6 | Can we full-scan without a query vector? | **Yes** — use the global `/api/v1/query` endpoint with just `table` and `limit`, no `semantic_search`. Returns all docs. |
| 7 | Does the SDK handle connection errors gracefully? | **SDK works fine via ESM** (our default). CJS build has a bug with `openapi-fetch` default export — only affects CJS consumers. See SDK notes below. |
| 8 | What happens when antfly server is not running? | curl gets `ECONNREFUSED`. Clear and fast failure. |
| 9 | Does `getAll()` paginate beyond 10000 docs? | Not tested at scale in this spike. The query endpoint accepts `limit` — likely works up to a reasonable size. Need to test with a real repo index. |
| 10 | Does `dev index` need to wait for embedding completion? | **Yes.** There's a ~2s delay between insert and searchability. For a full index run, we should wait for all embeddings to complete before declaring success. Poll embedding status or add a brief wait. |

## API Endpoint Reference (verified)

| Operation | Method | Endpoint |
|-----------|--------|----------|
| Create table | POST | `/api/v1/tables/{name}` |
| Get table info | GET | `/api/v1/tables/{name}` |
| Drop table | DELETE | `/api/v1/tables/{name}` |
| List tables | GET | `/api/v1/tables` |
| Batch insert/delete | POST | `/api/v1/tables/{name}/batch` |
| Lookup by key | GET | `/api/v1/tables/{name}/lookup/{key}` |
| Query (table-specific) | POST | `/api/v1/tables/{name}/query` |
| Query (global) | POST | `/api/v1/query` |

**Important:** The global query endpoint (`/api/v1/query`) returns results in `responses[0].hits.hits[]` format. Table-specific query (`/api/v1/tables/{name}/query`) returns in `hits.hits[]` format.
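Because the two endpoints nest hits differently, a small normalizer keeps callers independent of which one was used. A minimal sketch: only the two nestings (`hits.hits` and `responses[0].hits.hits`) come from the spike; the type names and `extractHits` are hypothetical, and hit contents are left opaque since their fields were not documented here.

```typescript
// Hypothetical types: only the nesting comes from the spike findings.
type Hit = Record<string, unknown>;

interface TableQueryResponse {
  hits?: { hits?: Hit[] };
}

interface GlobalQueryResponse {
  responses?: TableQueryResponse[];
}

// Returns the hit array from either endpoint's response, or [] if absent.
function extractHits(resp: TableQueryResponse | GlobalQueryResponse): Hit[] {
  if ("responses" in resp && Array.isArray(resp.responses)) {
    // Global /api/v1/query: results live under responses[0]
    return resp.responses[0]?.hits?.hits ?? [];
  }
  // Table-specific /api/v1/tables/{name}/query: results at the top level
  return (resp as TableQueryResponse).hits?.hits ?? [];
}
```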

## Key Findings

### 1. Table creation auto-creates full-text index

When creating a table with an embeddings index, antfly automatically adds a
`full_text_index_v0` full-text index. This means **every table gets hybrid search
for free** — no extra configuration needed.

### 2. Hybrid search with RRF works beautifully

Tested: `semantic_search: "error handling and retry"` + `full_text_search: "retryWithBackoff"`

Result: `func-retryBackoff` ranked #1 with scores from BOTH BM25 and vector similarity.
The `_index_scores` object shows which indexes contributed. RRF doubled its score vs
semantic-only results. This is exactly the upgrade we wanted for `dev_search`.
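For reference, the textbook Reciprocal Rank Fusion formula scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in, so a document ranked well by both BM25 and vector search outranks one that only a single index likes. A minimal sketch of that standard formulation, assuming `k = 60` (the usual default in the RRF literature); Antfly's actual constant and any per-index weighting were not confirmed by the spike.

```typescript
// Standard Reciprocal Rank Fusion with 1-based ranks:
//   score(d) = sum over lists of 1 / (k + rank(d))
// k = 60 is an assumption, not Antfly's confirmed setting.
function rrf(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With a BM25 list `["func-retryBackoff", "func-parseConfig"]` and a vector list `["func-handleError", "func-retryBackoff"]` (all but `func-retryBackoff` are made-up IDs), `func-retryBackoff` fuses to `1/61 + 1/62` and ranks first even though it topped only one list.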

### 3. Document structure is flexible (schemaless)

Documents are JSON objects; no predefined schema is required. We can store `text`, `metadata`,
`type`, `file`, `line`, or whatever else we need. The embedding index's `template` setting
(a Handlebars template such as `{{text}}`) determines which document field(s) get embedded.

### 4. Embedding model confirmed: bge-small-en-v1.5, dimension 384

Table info shows `dimension: 384` and `model: BAAI/bge-small-en-v1.5`. Same dimension
as our current all-MiniLM-L6-v2 (384), so result structures don't change.

Note: the i8 variant returned a 404 during model pull; the f32 variant (127.8MB) works. The plan
should use the default variant (no `--variants i8` flag) until i8 is fixed.

### 5. Lookup by key replaces O(n) zero-vector hack

`GET /api/v1/tables/{name}/lookup/{key}` returns the document directly, or 404 if it is
not found. This is a massive improvement over the current `get()` implementation in
`LanceDBVectorStore`, which does a full vector scan with a zero vector.
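A replacement `get()` mostly needs to map the lookup route's 404 to "not found". A minimal sketch against the verified `GET /api/v1/tables/{name}/lookup/{key}` route; `getByKey`, the `LookupFetch` type, and the assumption that the response body is the stored document as JSON are illustrative only (the migration itself would likely go through the plan's `client.tables.lookup()`).

```typescript
// Minimal response surface we rely on; the global fetch Response satisfies it.
interface LookupResponse {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
}
type LookupFetch = (url: string) => Promise<LookupResponse>;

// Fetches one document by key. Returns null when the key does not exist (404)
// instead of scanning the table with a zero vector.
async function getByKey(
  baseUrl: string,
  table: string,
  key: string,
  fetchFn: LookupFetch,
): Promise<unknown> {
  // Encode both path segments: document IDs may contain '/' or ':'.
  const url = `${baseUrl}/api/v1/tables/${encodeURIComponent(table)}/lookup/${encodeURIComponent(key)}`;
  const res = await fetchFn(url);
  if (res.status === 404) return null; // not found
  if (!res.ok) throw new Error(`lookup failed: HTTP ${res.status}`);
  return res.json(); // assumed: body is the stored document
}
```

Against a running server this could be called as `getByKey("http://localhost:8080", "spike-code", "some-doc-id", fetch)`.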

### 6. Storage info available

`client.tables.get()` returns `storage_status.disk_usage` in bytes. This can replace
the `storageSize` field in `VectorStats` (currently reads local LanceDB directory).

## SDK Notes

### CJS build has a bug (doesn't affect us)

The SDK's CJS bundle (`dist/index.cjs`) fails because `openapi-fetch` is ESM-only.
`tsup` wraps it with `__toESM(require("openapi-fetch"))` and accesses `.default`,
which is `undefined` in CJS context.

**This doesn't affect dev-agent.** All our packages use `"type": "module"` (ESM).
The ESM import path (`dist/index.js`) works correctly.

The spike error came from `npx tsx`, which loaded the CJS path, so it is not representative
of our actual runtime.

**Recommendation:** Use `@antfly/sdk` directly. It's type-safe, auto-generated from
OpenAPI spec, and works fine via ESM. It is worth reporting the CJS bug to the antfly
team (fix: `noExternal: ['openapi-fetch']` in the tsup config) for the benefit of other consumers.

## Docker Findings

### `ghcr.io/antflydb/antfly:omni`
- No ARM64 image available. Runs under Rosetta with `--platform linux/amd64`.
- Pull succeeded but entrypoint errored: `Error: unknown flag: --api-url`

### `ghcr.io/antflydb/antfly:latest`
- Pulls successfully on ARM64 (via amd64 emulation)
- Does NOT auto-start — just shows help. Needs explicit `swarm` command.
- Would need: `docker run -d ... ghcr.io/antflydb/antfly:latest swarm`

### Port conflict on native
- `antfly swarm` binds to ports 8080, 9017, 9021, 12380, 11433
- If any are occupied (e.g., old Docker container), it crashes with `bind: address already in use`
- Docker is preferred because it isolates ports inside the container
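Because one occupied port is enough to crash `antfly swarm`, a native-fallback path could check the five ports up front and report which are taken. A minimal sketch using Node's `net` module; `isPortFree` and `findBusyPorts` are hypothetical helpers, and binding on the default host may miss a port held only on a specific interface on some platforms.

```typescript
import { createServer } from "node:net";

// Ports `antfly swarm` binds to, per the spike.
const ANTFLY_PORTS = [8080, 9017, 9021, 12380, 11433];

// Resolves true if we can bind the port ourselves (i.e. nothing else holds it).
function isPortFree(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const server = createServer();
    server.once("error", () => resolve(false)); // EADDRINUSE, EACCES, ...
    server.once("listening", () => server.close(() => resolve(true)));
    server.listen(port);
  });
}

// Returns the subset of ports that are already in use.
async function findBusyPorts(ports: number[] = ANTFLY_PORTS): Promise<number[]> {
  const busy: number[] = [];
  for (const port of ports) {
    if (!(await isPortFree(port))) busy.push(port);
  }
  return busy;
}
```

Checking sequentially keeps at most one probe socket open at a time; `dev setup` could print the busy list instead of letting `antfly swarm` fail with `bind: address already in use`.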

**Recommendation:** Docker-first with `antfly swarm` as the command, native fallback.
We still need to verify that the Docker image with the `swarm` command works end-to-end.
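The Docker-first path boils down to building a `docker run` invocation that ends in `swarm`. A minimal sketch: the image tag, the explicit `swarm` command, and the `--platform linux/amd64` workaround for ARM64 hosts come from the findings above, while `buildDockerRunArgs`, the container name, and publishing only `8080:8080` are assumptions.

```typescript
interface DockerRunOptions {
  image?: string;
  containerName?: string; // hypothetical default below
  hostArch?: string; // e.g. process.arch
}

// Builds argv for `docker run`. The image's default entrypoint only prints
// help, so `swarm` must be passed explicitly as the command.
function buildDockerRunArgs(opts: DockerRunOptions = {}): string[] {
  const image = opts.image ?? "ghcr.io/antflydb/antfly:latest";
  const args = ["run", "-d", "--name", opts.containerName ?? "dev-agent-antfly"];
  // No native ARM64 image was found, so run the amd64 image under emulation.
  if (opts.hostArch === "arm64") args.push("--platform", "linux/amd64");
  args.push("-p", "8080:8080"); // assumed: only the API port needs publishing
  args.push(image, "swarm");
  return args;
}
```

A setup command could pass the result to `child_process.spawn("docker", args)` and fall back to the native binary if Docker is unavailable.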

## Impact on Plan

1. **Use `@antfly/sdk` directly** — ESM works fine, type-safe, auto-generated from OpenAPI
2. **Model pull: use default variant** (not `--variants i8`) until i8 is fixed
3. **`dev index` must wait for embeddings**: poll, or add a brief sleep after batch insert
4. **Table info provides disk_usage** — can populate `VectorStats.storageSize`
5. **Auto full-text index** — every table gets BM25 for free, simplifies table creation
6. **Docker needs `swarm` command** — `docker run ... antfly swarm` not just `docker run ... antfly`
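Item 3 can be handled with a generic poll-until-ready loop instead of a fixed sleep, so small indexes finish quickly and large ones are not cut off early. A minimal sketch; `waitUntil`, the 250 ms interval, and the 60 s timeout are illustrative choices, and the readiness check is supplied by the caller (for example, a search that should return a just-inserted document). Returning the elapsed time also makes it usable for measuring embedding latency.

```typescript
interface WaitOptions {
  intervalMs?: number;
  timeoutMs?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `isReady` until it resolves true or the timeout elapses.
// Returns the elapsed milliseconds; throws on timeout.
async function waitUntil(
  isReady: () => Promise<boolean>,
  { intervalMs = 250, timeoutMs = 60_000 }: WaitOptions = {},
): Promise<number> {
  const start = Date.now();
  for (;;) {
    if (await isReady()) return Date.now() - start;
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`not ready after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```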
@@ -0,0 +1,154 @@
# Part 1.1 — Spike: Validate Antfly API

## Goal

Install antfly locally and confirm it can satisfy every operation our `VectorStore` interface
needs. This is a throwaway spike — no code is committed.

## Prerequisites

```bash
brew install --cask antflydb/antfly/antfly
# The i8 variant 404s on pull (see spike findings); pull the default f32 variant instead.
antfly termite pull BAAI/bge-small-en-v1.5
antfly swarm
```

## Tasks

### 1. Create a table with embedding index

```typescript
import { AntflyClient } from '@antfly/sdk';

const client = new AntflyClient({ baseUrl: 'http://localhost:8080' });

await client.tables.create('spike-code', {
  indexes: {
    content: {
      type: 'embeddings',
      template: '{{text}}', // Handlebars template: embed the `text` field
      embedder: {
        provider: 'termite',
        model: 'BAAI/bge-small-en-v1.5',
      },
    },
  },
});
```

**Confirm:** Table created, index active.

### 2. Batch insert documents

Insert 100 code documents matching our `EmbeddingDocument` shape:

```typescript
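// Assumes `client` from step 1 and `documents: EmbeddingDocument[]` are in scope.
const inserts: Record<string, { text: string; metadata: string }> = {};
for (const doc of documents) {
  inserts[doc.id] = {
    text: doc.text, // embedded via the '{{text}}' template
    metadata: JSON.stringify(doc.metadata), // stored as a JSON string
  };
}
await client.tables.batch('spike-code', { inserts });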
```

**Confirm:** Documents inserted. Check background embedding progress:
```bash
antfly index list --table spike-code
```

### 3. Test upsert behavior

Re-insert a document with the same key but different text.

**Confirm:** Does it overwrite? Or error? We need overwrite (upsert) semantics.
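The upsert check can be written against a tiny interface so the same steps run against Antfly or a fake. A minimal sketch; `KeyValueStore` and `checkUpsert` are hypothetical, with `batch` and `lookup` standing in for the `client.tables.batch()` and `client.tables.lookup()` calls used in these tasks.

```typescript
interface Doc {
  text: string;
}

// Hypothetical adapter over the table; wire it to the SDK during the spike.
interface KeyValueStore {
  batch(req: { inserts: Record<string, Doc> }): Promise<void>;
  lookup(key: string): Promise<Doc | null>;
}

// Inserts a key twice with different text and reports what a read sees.
async function checkUpsert(
  store: KeyValueStore,
  key = "upsert-probe",
): Promise<"overwrite" | "kept-original"> {
  await store.batch({ inserts: { [key]: { text: "first" } } });
  // A store without upsert semantics may reject this call; let that error surface.
  await store.batch({ inserts: { [key]: { text: "second" } } });
  const doc = await store.lookup(key);
  return doc?.text === "second" ? "overwrite" : "kept-original";
}
```

If the second `batch` throws instead, that answers the "or error?" half of the question.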

### 4. Run hybrid search

```typescript
const results = await client.query({
  table: 'spike-code',
  semantic_search: 'authentication middleware',
  full_text_search: { query: 'validateUser' },
  indexes: ['content'],
  fields: ['text', 'metadata'],
  limit: 10,
});
```

**Confirm:** Results come back. Both semantic and keyword matches appear.

### 5. Test semantic-only search

```typescript
const results = await client.query({
table: 'spike-code',
semantic_search: 'error handling patterns',
indexes: ['content'],
limit: 10,
});
```

**Confirm:** Pure semantic search works (our current default path).

### 6. Test key lookup

```typescript
const doc = await client.tables.lookup('spike-code', 'some-doc-id');
```

**Confirm:** Returns the document by key. Fast (not a vector scan).

### 7. Test batch delete

```typescript
await client.tables.batch('spike-code', { deletes: ['doc-1', 'doc-2'] });
```

**Confirm:** Documents removed. Search no longer returns them.

### 8. Test count / table stats

```typescript
const info = await client.tables.get('spike-code');
```

**Confirm:** Can we get document count from table info?

### 9. Test full scan (no query vector)

```typescript
// Option A: table-specific query endpoint
const viaTable = await client.tables.query('spike-code', { limit: 1000 });
// Option B: global query endpoint
const viaGlobal = await client.query({ table: 'spike-code', limit: 1000 });
```

**Confirm:** Can we retrieve all documents without a search query? This maps to `getAll()`.
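If a single `limit` turns out to be capped (question 9), `getAll()` would need to page. A minimal sketch of a page-until-short loop; `fetchPage(offset, limit)` is a hypothetical callback because this spike has not confirmed whether Antfly's query endpoints accept an offset or cursor, and the page size of 1000 is arbitrary.

```typescript
// Hypothetical: returns up to `limit` documents starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>;

// Collects every document by requesting fixed-size pages until a short page arrives.
async function getAllPaged<T>(fetchPage: FetchPage<T>, pageSize = 1000): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page);
    if (page.length < pageSize) return all; // a short (or empty) page is the last one
  }
}
```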

### 10. Test embedding availability timing

Insert a batch, then immediately search for it.

**Confirm:** How long until newly-inserted docs appear in search results?
If there's a delay, we need to handle this in the index command (wait for embedding completion).

## Questions to answer

| # | Question | Answer |
|---|----------|--------|
| 1 | Does batch insert overwrite existing keys (upsert)? | |
| 2 | How long does background embedding take for 100/1000/10000 docs? | |
| 3 | Can we query immediately after insert? | |
| 4 | What does `client.tables.get()` return? (need count) | |
| 5 | Latency of `client.tables.lookup()` vs vector search? | |
| 6 | Can we full-scan without a query vector? | |
| 7 | Does the SDK handle connection errors gracefully? | |
| 8 | What happens when antfly server is not running? | |
| 9 | Does `getAll()` paginate beyond 10000 docs? How? | |
| 10 | Does `dev index` need to wait for embedding completion before returning? | |

## Exit criteria

All 10 questions answered. If any answer blocks the migration, document it and reassess.
If all answers are compatible, proceed to Part 1.2.