DESIGN.md: 14 additions & 6 deletions (+14 -6)
@@ -442,7 +442,10 @@ interface Page {
```
#### Book
-Ordered sequence of pages with representative medoid.
+Ordered sequence of pages from a **single ingest call** with a representative medoid.
+One `ingestText()` call always produces exactly one Book: the entire ingested document.
+A collection of Books forms a Volume; a collection of Volumes forms a Shelf.
+Books are identified by `SHA-256(sorted pageIds)`, so their identity is content-addressed.
```typescript
interface Book {
@@ -634,14 +637,19 @@ Rather than returning nearest neighbors by similarity, Cortex traces a coherent
2. **Generate Embeddings**: Batch embed with selected provider
3. **Persist Vectors**: Append to OPFS vector file
4. **Persist Pages**: Write page metadata to IndexedDB; initialise `PageActivity` record
-5. **Build/Attach Hierarchy**: Construct/update books, volumes, shelves; attempt hotpath admission for each level's medoid/prototype using tier quota via `SalienceEngine`
-6. **Fast Semantic Neighbor Insert**: Update semantic neighbor graph incrementally; bounded degree via `HotpathPolicy`; check new page for hotpath admission
+5. **Create Ingest Book**: Build exactly one Book for the entire ingest. Compute the medoid page (the page with the minimum total cosine distance to all other pages in the document), derive `bookId = SHA-256(sorted pageIds)`, and persist the Book. Hotpath admission for the Book runs via `SalienceEngine`. Volumes and Shelves are assembled lazily by the Daydreamer from accumulated Books.
+6. **Fast Semantic Neighbor Insert**: Update semantic neighbor graph incrementally; bounded degree via `HotpathPolicy`; check new pages for hotpath admission
7. **Mark Dirty**: Flag volumes for full recalc by Daydreamer
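
The medoid selection and content-addressed identity in step 5 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `EmbeddedPage`, `selectMedoid`, and `bookIdHashInput` are hypothetical names, and the newline-joined serialization of the sorted IDs is an assumption, since the design only states `SHA-256(sorted pageIds)`.

```typescript
// Hypothetical shapes for illustration; the real Page/Book interfaces are defined elsewhere in DESIGN.md.
type PageId = string;

interface EmbeddedPage {
  pageId: PageId;
  embedding: number[];
}

/** Cosine distance = 1 - cosine similarity. Assumes non-zero vectors of equal length. */
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Medoid: the page whose summed cosine distance to every other page is smallest. */
function selectMedoid(pages: EmbeddedPage[]): PageId {
  if (pages.length === 0) {
    throw new Error("cannot pick a medoid from an empty ingest");
  }
  let bestId = pages[0].pageId;
  let bestTotal = Infinity;
  for (const candidate of pages) {
    let total = 0;
    for (const other of pages) {
      if (other !== candidate) {
        total += cosineDistance(candidate.embedding, other.embedding);
      }
    }
    if (total < bestTotal) {
      bestTotal = total;
      bestId = candidate.pageId;
    }
  }
  return bestId;
}

/**
 * Canonical string to feed into SHA-256. Sorting a copy makes the result
 * independent of the order in which page IDs were collected.
 * Assumption: IDs are joined with "\n"; the design does not specify a serialization.
 */
function bookIdHashInput(pageIds: PageId[]): string {
  return [...pageIds].sort().join("\n");
}
```

Hashing the returned string with SHA-256 (in a browser, for example through the Web Crypto `crypto.subtle.digest` call) would then yield the same `bookId` for the same set of pages regardless of input order. The pairwise loop is quadratic in the number of pages of one document, which is bounded by the size of a single ingest rather than by the corpus.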
-**Incremental Strategy:**
-Fast local semantic neighbor insertion keeps ingest-time latency low. At ingest time, only the initial forward and reverse edges are created; neighbors are selected by cosine similarity within Williams-cutoff **distance** (not a fixed K; the cutoff is derived from `HotpathPolicy`). On degree overflow, the lowest-cosine-similarity neighbor is evicted.
+**Incremental Strategy (fast and lightweight):**
+Ingest must remain fast and lightweight. At ingest time only two classes of edges are created:
+- **Document-order adjacency**: Forward and reverse `SemanticNeighbor` edges between each consecutive page pair within the book slice, inserted unconditionally (document-adjacent chunks are always related). Embeddings are read from a pre-built `Map<pageId, embedding>` with O(1) lookups, so no O(n²) index scans are needed.
+- **Proximity edges**: Additional `SemanticNeighbor` edges to nearby pages already in the corpus, bounded by a cosine-distance cutoff and `maxDegree` eviction.
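
The two edge classes above might look roughly like this in code. It is a sketch under stated assumptions: the `SemanticNeighbor` fields, the in-memory `NeighborGraph` map, and all function names are hypothetical; adjacency edges are assumed to skip the degree bound; and the "evict the lowest-similarity neighbor" rule is carried over from the superseded wording rather than confirmed by the new text.

```typescript
type PageId = string;

// Hypothetical edge shape; the real SemanticNeighbor record may carry more fields.
interface SemanticNeighbor {
  neighborId: PageId;
  similarity: number;
}

// Hypothetical in-memory adjacency list standing in for the persisted graph.
type NeighborGraph = Map<PageId, SemanticNeighbor[]>;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Add or refresh a directed edge; with a finite maxDegree, evict the weakest edges on overflow. */
function upsertEdge(graph: NeighborGraph, from: PageId, to: PageId, similarity: number, maxDegree = Infinity): void {
  const edges = (graph.get(from) ?? []).filter((e) => e.neighborId !== to);
  edges.push({ neighborId: to, similarity });
  if (edges.length > maxDegree) {
    edges.sort((x, y) => y.similarity - x.similarity); // strongest first
    while (edges.length > maxDegree) edges.pop(); // drop the weakest
  }
  graph.set(from, edges);
}

/** Class 1: forward and reverse edges between consecutive pages, with no cutoff check. */
function linkDocumentOrder(graph: NeighborGraph, orderedIds: PageId[], embeddings: Map<PageId, number[]>): void {
  for (let i = 0; i + 1 < orderedIds.length; i++) {
    const a = orderedIds[i];
    const b = orderedIds[i + 1];
    const ea = embeddings.get(a); // O(1) lookup in the pre-built map
    const eb = embeddings.get(b);
    if (!ea || !eb) continue; // skip pages that have no embedding
    const sim = cosineSimilarity(ea, eb);
    upsertEdge(graph, a, b, sim);
    upsertEdge(graph, b, a, sim);
  }
}

/** Class 2: edges to candidate corpus pages, kept only within the distance cutoff and degree bound. */
function linkProximity(
  graph: NeighborGraph,
  pageId: PageId,
  embedding: number[],
  candidates: Map<PageId, number[]>, // assumed to be pre-narrowed by the caller
  maxDistance: number,
  maxDegree: number,
): void {
  candidates.forEach((otherEmbedding, otherId) => {
    if (otherId === pageId) return;
    const sim = cosineSimilarity(embedding, otherEmbedding);
    if (1 - sim > maxDistance) return; // outside the cosine-distance cutoff
    upsertEdge(graph, pageId, otherId, sim, maxDegree);
    upsertEdge(graph, otherId, pageId, sim, maxDegree);
  });
}
```

Note that in this sketch a proximity insert can evict an earlier adjacency edge when the degree bound is tight, and eviction is one-sided: the evicted neighbor keeps its own edge back. Whether adjacency edges should be protected from eviction is a design decision the text above leaves open.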
-Full cross-edge reconnection is intentionally deferred: Daydreamer walks the graph during idle passes to build additional edges, strengthening or pruning connections via LTP/LTD. This avoids a full graph recalculation on every insert while still converging to a well-connected graph over time. Hotpath admission runs at ingest time for new pages and hierarchy prototypes.
+Full cross-edge reconnection is intentionally deferred: Daydreamer walks the graph during idle passes to build additional edges (connections never noticed at ingest time) and strengthens or prunes them via LTP/LTD. This keeps ingest cost sublinear while still converging to a well-connected graph over time.
+
+**IndexedDB Schema Upgrade Strategy:**
+During early development (pre-v1.0) the schema upgrade path intentionally drops and recreates object stores rather than migrating data. This keeps upgrade code minimal and avoids cruft until the data model stabilises. The neighbor graph is rebuilt from scratch after any ingest replay.
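
A drop-and-recreate upgrade of this kind could be sketched as below. The store names, key paths, and the `resetSchema` helper are hypothetical and only illustrate the pattern; `UpgradableDb` is a minimal structural stand-in for the parts of `IDBDatabase` the sketch touches, so the logic can be exercised outside a browser.

```typescript
// Minimal structural subset of IDBDatabase used during an upgrade.
interface UpgradableDb {
  objectStoreNames: ArrayLike<string>;
  deleteObjectStore(name: string): void;
  createObjectStore(name: string, options?: { keyPath?: string }): unknown;
}

interface StoreSpec {
  name: string;
  keyPath: string;
}

// Hypothetical store layout, not taken from the design document.
const STORES: StoreSpec[] = [
  { name: "pages", keyPath: "pageId" },
  { name: "books", keyPath: "bookId" },
];

/** Drop every existing object store, then recreate the current layout. No data is migrated. */
function resetSchema(db: UpgradableDb, stores: StoreSpec[] = STORES): void {
  // Copy the names first: in a real IDBDatabase the list is live and shrinks as stores are deleted.
  for (const name of Array.from(db.objectStoreNames)) {
    db.deleteObjectStore(name);
  }
  for (const store of stores) {
    db.createObjectStore(store.name, { keyPath: store.keyPath });
  }
}

// Browser wiring (shown as a comment; "cortex" is a placeholder database name):
//   const request = indexedDB.open("cortex", 2);
//   request.onupgradeneeded = () => resetSchema(request.result);
```

Because `deleteObjectStore` and `createObjectStore` are only legal inside the version-change transaction, the helper is meant to be called from the `upgradeneeded` handler, and bumping the version number is what triggers the reset.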