
Commit b2fd983

Copilot and devlux76 committed
fix: FastNeighborInsert safety, MetroidBuilder m2 free-dim medoid, remove vectorBackend, ONE Book per ingest, DESIGN.md update
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
1 parent de785cd, commit b2fd983

10 files changed: 127 additions & 110 deletions

DESIGN.md

Lines changed: 14 additions & 6 deletions

@@ -442,7 +442,10 @@ interface Page {
 ```
 
 #### Book
-Ordered sequence of pages with representative medoid.
+Ordered sequence of pages from a **single ingest call** with a representative medoid.
+One `ingestText()` call always produces exactly one Book — the entire ingested document.
+A collection of Books forms a Volume; a collection of Volumes forms a Shelf.
+Books are identified by `SHA-256(sorted pageIds)` so their identity is content-addressed.
 
 ```typescript
 interface Book {
@@ -634,14 +637,19 @@ Rather than returning nearest neighbors by similarity, Cortex traces a coherent
 2. **Generate Embeddings** — Batch embed with selected provider
 3. **Persist Vectors** — Append to OPFS vector file
 4. **Persist Pages** — Write page metadata to IndexedDB; initialise `PageActivity` record
-5. **Build/Attach Hierarchy** — Construct/update books, volumes, shelves; attempt hotpath admission for each level's medoid/prototype using tier quota via `SalienceEngine`
-6. **Fast Semantic Neighbor Insert** — Update semantic neighbor graph incrementally; bounded degree via `HotpathPolicy`; check new page for hotpath admission
+5. **Create Ingest Book** — Build exactly one Book for the entire ingest: compute the medoid page (minimum total cosine distance to all other pages in the document), derive `bookId = SHA-256(sorted pageIds)`, persist. Hotpath admission for the book runs via `SalienceEngine`. Volumes and Shelves are assembled lazily by the Daydreamer from accumulated Books.
+6. **Fast Semantic Neighbor Insert** — Update semantic neighbor graph incrementally; bounded degree via `HotpathPolicy`; check new pages for hotpath admission
 7. **Mark Dirty** — Flag volumes for full recalc by Daydreamer
 
-**Incremental Strategy:**
-Fast local semantic neighbor insertion keeps ingest-time latency low. At ingest time, only the initial forward and reverse edges are created — neighbors are selected by cosine similarity within Williams-cutoff **distance** (not a fixed K; the cutoff is derived from `HotpathPolicy`). On degree overflow, the lowest-cosine-similarity neighbor is evicted.
+**Incremental Strategy (fast and lightweight):**
+Ingest must remain fast and lightweight. At ingest time only two classes of edges are created:
+- **Document-order adjacency** — Forward and reverse `SemanticNeighbor` edges between each consecutive page pair within the book slice, inserted unconditionally (document-adjacent chunks are always related). This uses a pre-built `Map<pageId, embedding>` for O(1) lookups; no O(n²) index scans.
+- **Proximity edges** — Additional `SemanticNeighbor` edges to nearby pages already in the corpus, bounded by cosine-distance cutoff and `maxDegree` eviction.
 
-Full cross-edge reconnection is intentionally deferred: Daydreamer walks the graph during idle passes to build additional edges, strengthening or pruning connections via LTP/LTD. This avoids a full graph recalculation on every insert while still converging to a well-connected graph over time. Hotpath admission runs at ingest time for new pages and hierarchy prototypes.
+Full cross-edge reconnection is intentionally deferred: Daydreamer walks the graph during idle passes to build additional edges — connections we never noticed at ingest time — and strengthens or prunes them via LTP/LTD. This keeps ingest cost sublinear while converging to a well-connected graph over time.
+
+**IndexedDB Schema Upgrade Strategy:**
+During early development (pre-v1.0) the schema upgrade path intentionally drops and recreates object stores rather than migrating data. This keeps upgrade code minimal and avoids cruft until the data model stabilises. The neighbor graph is rebuilt from scratch after any ingest replay.
 
 ## Consolidation Design
 
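The content-addressed Book identity described in the DESIGN.md change can be sketched in isolation. This is a hypothetical stand-alone helper, not the project's code: the real implementation uses the project's own `hashText`, while this sketch substitutes Node's `createHash`. The scheme itself matches the diff: SHA-256 over the sorted, joined page IDs, so the same set of pages always yields the same Book regardless of ingest order.

```typescript
import { createHash } from "node:crypto";

// Sketch of the content-addressed Book identity: bookId = SHA-256 over the
// sorted page IDs. Sorting makes the ID independent of page order, so Book
// identity depends only on the *set* of member pages.
function deriveBookId(pageIds: string[]): string {
  const sorted = [...pageIds].sort();
  return createHash("sha256").update(sorted.join("|")).digest("hex");
}

// Same pages in a different order produce the same bookId.
const a = deriveBookId(["p2", "p1", "p3"]);
const b = deriveBookId(["p1", "p3", "p2"]);
console.log(a === b); // true
```

Joining with a separator before hashing avoids ambiguity between, say, `["ab", "c"]` and `["a", "bc"]`, assuming the separator cannot appear inside a page ID (page IDs here are hex hashes, so `|` is safe).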

cortex/MetroidBuilder.ts

Lines changed: 1 addition & 1 deletion

@@ -103,7 +103,7 @@ function searchM2(
 
   if (oppositeSet.length === 0) return null;
 
-  const medoidIdx = findMedoidIndex(oppositeSet.map((s) => s.candidate.embedding));
+  const medoidIdx = findMedoidIndex(oppositeSet.map((s) => s.candidate.embedding.slice(protectedDim)));
  return oppositeSet[medoidIdx].candidate;
 }
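The one-line fix above changes which dimensions the medoid is computed over: each embedding has its first `protectedDim` entries sliced off, so only the free dimensions decide. A minimal sketch of why this matters, under assumptions: `findMedoidIndex` here is a local stand-in mirroring the helper named in the diff (its real signature may differ), and the vectors are illustrative.

```typescript
// Cosine distance between two vectors (1 - cosine similarity).
function cosineDist(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : 1 - dot / denom;
}

// Stand-in for the findMedoidIndex helper: index minimising total distance.
function findMedoidIndex(vectors: Float32Array[]): number {
  let bestIdx = 0, bestTotal = Infinity;
  for (let i = 0; i < vectors.length; i++) {
    let total = 0;
    for (let j = 0; j < vectors.length; j++) {
      if (i !== j) total += cosineDist(vectors[i], vectors[j]);
    }
    if (total < bestTotal) { bestTotal = total; bestIdx = i; }
  }
  return bestIdx;
}

// The first `protectedDim` entries are identical across the m2 candidate set,
// so including them washes out the differences in the free dimensions.
// Slicing them off lets the free dims [0.9,0.1] / [0.5,0.5] / [0.1,0.9]
// decide: the balanced middle vector is the medoid.
const protectedDim = 2;
const embeddings = [
  Float32Array.from([1, 1, 0.9, 0.1]),
  Float32Array.from([1, 1, 0.5, 0.5]),
  Float32Array.from([1, 1, 0.1, 0.9]),
];
const medoidIdx = findMedoidIndex(embeddings.map((e) => e.slice(protectedDim)));
console.log(medoidIdx); // 1
```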

cortex/Query.ts

Lines changed: 0 additions & 3 deletions

@@ -1,6 +1,5 @@
 import type { ModelProfile } from "../core/ModelProfile";
 import type { Hash, MetadataStore, Page, VectorStore } from "../core/types";
-import type { VectorBackend } from "../VectorBackend";
 import type { EmbeddingRunner } from "../embeddings/EmbeddingRunner";
 import { runPromotionSweep } from "../core/SalienceEngine";
 import type { QueryResult } from "./QueryResult";
@@ -14,7 +13,6 @@ export interface QueryOptions {
   embeddingRunner: EmbeddingRunner;
   vectorStore: VectorStore;
   metadataStore: MetadataStore;
-  vectorBackend: VectorBackend;
   topK?: number;
   /** BFS depth for semantic neighbor subgraph expansion. 2 hops covers direct
    * neighbors and their neighbors, which is the minimum needed to surface
@@ -34,7 +32,6 @@ export async function query(
     topK = 10,
     maxHops = 2,
   } = options;
-
  const nowIso = new Date().toISOString();
 
  const embeddings = await embeddingRunner.embed([queryText]);

hippocampus/FastNeighborInsert.ts

Lines changed: 27 additions & 4 deletions

@@ -100,11 +100,34 @@ export async function insertSemanticNeighbors(
     if (p) offsetMap.set(allPageIds[i], p.embeddingOffset);
   }
 
-  const allOffsets = allPageIds.map((id) => offsetMap.get(id) ?? 0);
-  const allVectors = await vectorStore.readVectors(allOffsets, dim);
+  // (a) Throw if any newPageId is missing from the store — a missing new page
+  // is always a programming error (it should have been persisted before calling
+  // insertSemanticNeighbors) and would silently corrupt the graph.
+  for (const newId of newPageIds) {
+    if (!offsetMap.has(newId)) {
+      throw new Error(
+        `Page ${newId} not found in metadata store; persist it before inserting semantic neighbors`,
+      );
+    }
+  }
+
+  // (b) Filter allPageIds to only those that are present in the store.
+  // Missing entries are silently dropped — they may have been deleted between
+  // the getAllPages() call and this point. The vector/id arrays stay aligned.
+  const resolvedPageIds: Hash[] = [];
+  const resolvedOffsets: number[] = [];
+  for (const id of allPageIds) {
+    const offset = offsetMap.get(id);
+    if (offset !== undefined) {
+      resolvedPageIds.push(id);
+      resolvedOffsets.push(offset);
+    }
+  }
+
+  const allVectors = await vectorStore.readVectors(resolvedOffsets, dim);
   const vectorMap = new Map<Hash, Float32Array>();
-  for (let i = 0; i < allPageIds.length; i++) {
-    vectorMap.set(allPageIds[i], allVectors[i]);
+  for (let i = 0; i < resolvedPageIds.length; i++) {
+    vectorMap.set(resolvedPageIds[i], allVectors[i]);
   }
 
   // Collect all (pageId, neighborPageId) pairs that need their stored neighbor

hippocampus/Ingest.ts

Lines changed: 58 additions & 16 deletions

@@ -1,12 +1,11 @@
-import type { Book, MetadataStore, Shelf, Volume, VectorStore } from "../core/types";
+import type { Book, MetadataStore, VectorStore } from "../core/types";
 import type { ModelProfile } from "../core/ModelProfile";
 import { hashText } from "../core/crypto/hash";
 import type { KeyPair } from "../core/crypto/sign";
 import { EmbeddingRunner } from "../embeddings/EmbeddingRunner";
 import { chunkText } from "./Chunker";
 import { buildPage } from "./PageBuilder";
 import { runPromotionSweep } from "../core/SalienceEngine";
-import { buildHierarchy } from "./HierarchyBuilder";
 import { insertSemanticNeighbors } from "./FastNeighborInsert";
 
 export interface IngestOptions {
@@ -20,9 +19,46 @@ export interface IngestOptions {
 
 export interface IngestResult {
   pages: Array<Awaited<ReturnType<typeof buildPage>>>;
+  /** The single Book representing everything ingested by this call.
+   * One ingest call = one Book, always. All pages are members.
+   * A collection of Books becomes a Volume; a collection of Volumes
+   * becomes a Shelf — those tiers are assembled by the Daydreamer. */
   book?: Book;
-  volumes?: Volume[];
-  shelves?: Shelf[];
+}
+
+function cosineDistance(a: Float32Array, b: Float32Array): number {
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    normA += a[i] * a[i];
+    normB += b[i] * b[i];
+  }
+  const denom = Math.sqrt(normA) * Math.sqrt(normB);
+  if (denom === 0) return 0;
+  return 1 - dot / denom;
+}
+
+/**
+ * Selects the index of the medoid: the element that minimises total cosine
+ * distance to every other element in the set.
+ */
+function selectMedoidIndex(vectors: Float32Array[]): number {
+  if (vectors.length === 1) return 0;
+  let bestIdx = 0;
+  let bestTotal = Infinity;
+  for (let i = 0; i < vectors.length; i++) {
+    let total = 0;
+    for (let j = 0; j < vectors.length; j++) {
+      if (i !== j) total += cosineDistance(vectors[i], vectors[j]);
+    }
+    if (total < bestTotal) {
+      bestTotal = total;
+      bestIdx = i;
+    }
+  }
+  return bestIdx;
 }
 
 export async function ingestText(
@@ -88,17 +124,23 @@ export async function ingestText(
     });
   }
 
-  // Build hierarchy (books, volumes, shelves) from the ingested pages.
-  const { books, volumes, shelves } = await buildHierarchy(pageIds, {
-    modelProfile,
-    vectorStore,
-    metadataStore,
-  });
-
-  // Use the first book from the hierarchy as the primary book for backward compatibility.
-  const book = books[0];
+  // Build ONE Book for the entire ingest.
+  // A Book = the document we just ingested; its identity is the sorted set of
+  // its pages. Its representative is the page whose embedding is the medoid
+  // (minimum total cosine distance to all other pages in the document).
+  const medoidIdx = selectMedoidIndex(embeddings);
+  const sortedPageIds = [...pageIds].sort();
+  const bookId = await hashText(sortedPageIds.join("|"));
+  const book: Book = {
+    bookId,
+    pageIds,
+    medoidPageId: pageIds[medoidIdx],
+    meta: {},
+  };
+  await metadataStore.putBook(book);
 
   // Insert semantic neighbor edges for the new pages against all stored pages.
+  // Volumes and Shelves are assembled by the Daydreamer from accumulated Books.
   const allPages = await metadataStore.getAllPages();
   const allPageIds = allPages.map((p) => p.pageId);
   await insertSemanticNeighbors(pageIds, allPageIds, {
@@ -107,8 +149,8 @@ export async function ingestText(
     metadataStore,
   });
 
-  // Run hotpath promotion for the newly ingested pages.
-  await runPromotionSweep(pageIds, metadataStore);
+  // Run hotpath promotion for the newly ingested pages and book.
+  await runPromotionSweep([...pageIds, bookId], metadataStore);
 
-  return { pages, book, volumes, shelves };
+  return { pages, book };
 }

storage/IndexedDbMetadataStore.ts

Lines changed: 4 additions & 1 deletion

@@ -75,7 +75,10 @@ function applyUpgrade(db: IDBDatabase): void {
   if (!db.objectStoreNames.contains(STORE.metroidNeighbors)) {
     db.createObjectStore(STORE.metroidNeighbors, { keyPath: "pageId" });
   }
-  // v3: renamed metroid_neighbors → neighbor_graph; drop old store if present
+  // v3: renamed metroid_neighbors → neighbor_graph (SemanticNeighbor).
+  // At this stage of development no one has live data, so we intentionally
+  // drop the old store and let the graph be rebuilt from scratch on next
+  // ingest. No migration is needed or warranted yet.
   if (db.objectStoreNames.contains("metroid_neighbors")) {
     db.deleteObjectStore("metroid_neighbors");
   }

tests/cortex/Query.test.ts

Lines changed: 0 additions & 49 deletions

@@ -8,48 +8,7 @@ import { EmbeddingRunner } from "../../embeddings/EmbeddingRunner";
 import { generateKeyPair } from "../../core/crypto/sign";
 import { ingestText } from "../../hippocampus/Ingest";
 import { query } from "../../cortex/Query";
-import { topKByScore } from "../../TopK";
-import type { BackendKind } from "../../BackendKind";
 import type { ModelProfile } from "../../core/ModelProfile";
-import type { VectorBackend } from "../../VectorBackend";
-
-class TestVectorBackend implements VectorBackend {
-  readonly kind: BackendKind = "wasm";
-
-  async dotMany(
-    query: Float32Array,
-    matrix: Float32Array,
-    dim: number,
-    count: number,
-  ): Promise<Float32Array> {
-    const out = new Float32Array(count);
-    for (let i = 0; i < count; i++) {
-      let sum = 0;
-      const offset = i * dim;
-      for (let j = 0; j < dim; j++) {
-        sum += query[j] * matrix[offset + j];
-      }
-      out[i] = sum;
-    }
-    return out;
-  }
-
-  async project(): Promise<Float32Array> {
-    throw new Error("Not implemented");
-  }
-
-  async hashToBinary(): Promise<Uint32Array> {
-    throw new Error("Not implemented");
-  }
-
-  async hammingTopK(): Promise<any> {
-    throw new Error("Not implemented");
-  }
-
-  async topKFromScores(scores: Float32Array, k: number) {
-    return topKByScore(scores, k);
-  }
-}
 
 let dbCounter = 0;
 function freshDbName(): string {
@@ -67,7 +26,6 @@ describe("cortex query (dialectical orchestrator)", () => {
   const vectorStore = new MemoryVectorStore();
 
   const backend = new DeterministicDummyEmbeddingBackend({ dimension: 4 });
-  const vectorBackend = new TestVectorBackend();
 
   const runner = new EmbeddingRunner(async () => ({
     backend,
@@ -91,7 +49,6 @@ describe("cortex query (dialectical orchestrator)", () => {
     embeddingRunner: runner,
     vectorStore,
     metadataStore,
-    vectorBackend,
     topK: 5,
   });
 
@@ -111,7 +68,6 @@ describe("cortex query (dialectical orchestrator)", () => {
   const keyPair = await generateKeyPair();
 
   const backend = new DeterministicDummyEmbeddingBackend({ dimension: 4 });
-  const vectorBackend = new TestVectorBackend();
 
   const runner = new EmbeddingRunner(async () => ({
     backend,
@@ -148,7 +104,6 @@ describe("cortex query (dialectical orchestrator)", () => {
     embeddingRunner: runner,
     vectorStore,
     metadataStore,
-    vectorBackend,
     topK: 1,
   });
 
@@ -179,7 +134,6 @@ describe("cortex query (dialectical orchestrator)", () => {
   const keyPair = await generateKeyPair();
 
   const backend = new DeterministicDummyEmbeddingBackend({ dimension: 4 });
-  const vectorBackend = new TestVectorBackend();
 
   const runner = new EmbeddingRunner(async () => ({
     backend,
@@ -216,7 +170,6 @@ describe("cortex query (dialectical orchestrator)", () => {
     embeddingRunner: runner,
     vectorStore,
     metadataStore,
-    vectorBackend,
     topK: ingestResult.pages.length,
   });
 
@@ -239,7 +192,6 @@ describe("cortex query (dialectical orchestrator)", () => {
   const keyPair = await generateKeyPair();
 
   const backend = new DeterministicDummyEmbeddingBackend({ dimension: 4 });
-  const vectorBackend = new TestVectorBackend();
 
   const runner = new EmbeddingRunner(async () => ({
     backend,
@@ -274,7 +226,6 @@ describe("cortex query (dialectical orchestrator)", () => {
     embeddingRunner: runner,
     vectorStore,
     metadataStore,
-    vectorBackend,
     topK: 2,
   });
 

tests/cortex/Ranking.test.ts

Lines changed: 2 additions & 2 deletions

@@ -134,7 +134,7 @@ describe("Ranking", () => {
     keyPair,
   });
 
-  const volumeIds = (ingestResult.volumes ?? []).map((v) => v.volumeId);
+  const volumeIds = ((ingestResult as { volumes?: Array<{ volumeId: string }> }).volumes ?? []).map((v) => v.volumeId);
   if (volumeIds.length === 0) {
     // No volumes built — skip the scoring assertions; the structure test still passes
     return;
@@ -211,7 +211,7 @@ describe("Ranking", () => {
     keyPair,
   });
 
-  const shelfIds = (ingestResult.shelves ?? []).map((s) => s.shelfId);
+  const shelfIds = ((ingestResult as { shelves?: Array<{ shelfId: string }> }).shelves ?? []).map((s) => s.shelfId);
   if (shelfIds.length === 0) {
     return;
   }

tests/hippocampus/HierarchyBuilder.test.ts

Lines changed: 12 additions & 5 deletions

@@ -257,7 +257,7 @@ describe("HierarchyBuilder", () => {
     expect(result.shelves).toHaveLength(0);
   });
 
-  it("ingestText result includes volumes and shelves", async () => {
+  it("ingestText produces exactly one Book covering all ingested pages", async () => {
     const metadataStore = await IndexedDbMetadataStore.open(freshDbName());
     const vectorStore = new MemoryVectorStore();
     const keyPair = await generateKeyPair();
@@ -280,11 +280,18 @@ describe("HierarchyBuilder", () => {
       keyPair,
     });
 
+    // Exactly one Book — the entire ingest
     expect(result.book).toBeDefined();
-    expect(result.volumes).toBeDefined();
-    expect(result.shelves).toBeDefined();
-    expect(result.volumes!.length).toBeGreaterThanOrEqual(1);
-    expect(result.shelves!.length).toBeGreaterThanOrEqual(1);
+    // The book must contain every ingested page
+    for (const page of result.pages) {
+      expect(result.book!.pageIds).toContain(page.pageId);
+    }
+    expect(result.book!.pageIds.length).toBe(result.pages.length);
+    // The medoid must be one of the ingested pages
+    expect(result.book!.pageIds).toContain(result.book!.medoidPageId);
+    // Volumes and Shelves are Daydreamer responsibilities, not created at ingest time
+    expect((result as { volumes?: unknown }).volumes).toBeUndefined();
+    expect((result as { shelves?: unknown }).shelves).toBeUndefined();
   });
 
   it("adds SemanticNeighbor edges between consecutive pages within each book slice", async () => {
