# ARCHITECTURE-REVIEW.md (8 additions, 3 deletions)
## Components with Zero Drift
The following components are correctly implemented (or partially implemented in the correct direction) and require no changes related to this naming review:
- `core/HotpathPolicy.ts` — Williams Bound policy implementation; correct
> **Note:** PLAN.md v1.2 has been updated to reflect the actual implementation status of all Hippocampus and Cortex modules. The initial v1.1 plan incorrectly marked `Chunker.ts`, `PageBuilder.ts`, `Ingest.ts`, `Query.ts`, and `QueryResult.ts` as missing; this has been corrected.
# DESIGN.md (42 additions, 13 deletions)

The Metroid is constructed at query time by the `MetroidBuilder`:
1. **Select m1** — Identify the topic medoid most relevant to the query embedding.
2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis.
3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1.
4. **Compute centroid** — Compute `c` as follows:
   - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful.
   - Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`.
   - The result is a full-dimensional vector that can be used directly as a scoring anchor.
5. **Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole.
6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search.
7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search.
When a knowledge gap is detected, CORTEX broadcasts the incomplete Metroid as a curiosity probe to connected peers:
Where:
- **m1** — the thesis medoid (the topic for which antithesis was not found)
- **partialMetroid** — the incomplete Metroid at the boundary of local knowledge
- **queryContext** — the original query embedding, used for scoring by the responding peer
- **knowledgeBoundary** — the Matryoshka dimensional layer at which antithesis search failed
- **mimeType** — the MIME type of the embedded content (e.g. `text/plain`, `image/jpeg`). Required so receiving peers can validate commensurability of their graph sections.
- **modelUrn** — a URN identifying the specific embedding model and version used to produce the vectors (e.g. `urn:model:onnx-community/embeddinggemma-300m-ONNX:v1`). Peers **must** reject probes whose `modelUrn` does not match a model they can compare against. Accepting graph fragments embedded by a different model would produce incommensurable similarity scores at the dimensional boundaries where the models' Matryoshka layers overlap.
> **Why `mimeType` and `modelUrn` are required:**
> Embedding models project content into incompatible latent spaces. A fragment embedded with `nomic-embed-text-v1.5` (matryoshkaProtectedDim=64) cannot be meaningfully compared against a fragment embedded with `embeddinggemma-300m` (matryoshkaProtectedDim=128). Without explicit model and content-type identity on the probe, a peer could return graph sections that appear similar by cosine score but are semantically incommensurable — introducing hallucination-equivalent errors at the knowledge boundary.
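Taken together, the fields above suggest a probe shape along these lines. This is a TypeScript sketch only; the interface name, the vector representation, and the shape of the partial Metroid are assumptions, not the actual wire format.

```typescript
// Hypothetical shape of a curiosity probe, assembled from the fields
// described above. Vectors are plain number arrays for illustration.
interface CuriosityProbe {
  m1: number[];               // thesis medoid embedding
  partialMetroid: {           // incomplete Metroid at the knowledge boundary
    m1: number[];
    m2: number[] | null;      // null: antithesis not found locally
    c: number[] | null;
  };
  queryContext: number[];     // original query embedding, for peer-side scoring
  knowledgeBoundary: number;  // Matryoshka layer where antithesis search failed
  mimeType: string;           // e.g. "text/plain"
  modelUrn: string;           // e.g. "urn:model:onnx-community/embeddinggemma-300m-ONNX:v1"
}

// Example probe from a node that could not find m2 locally:
const probe: CuriosityProbe = {
  m1: [0.1, 0.2],
  partialMetroid: { m1: [0.1, 0.2], m2: null, c: null },
  queryContext: [0.05, 0.3],
  knowledgeBoundary: 128,
  mimeType: "text/plain",
  modelUrn: "urn:model:onnx-community/embeddinggemma-300m-ONNX:v1",
};
```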
Peers receiving this probe:
1. Verify `mimeType` and `modelUrn` match a supported local model.
2. Search their own memory graphs for medoids that could serve as `m2` using the same embedding space.
3. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails).
4. The originating node integrates the received fragment and may retry MetroidBuilder.
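The commensurability check in step 1 might look like this sketch. The registry contents and function name are hypothetical; the point is only that rejection happens before any similarity search.

```typescript
// Hypothetical local registry of embedding models and content types this
// peer can compare against. A real node would derive this from its
// configured embedding providers rather than hardcode it.
const supportedModels: ReadonlySet<string> = new Set([
  "urn:model:onnx-community/embeddinggemma-300m-ONNX:v1",
]);
const supportedMimeTypes: ReadonlySet<string> = new Set(["text/plain"]);

// Step 1: reject probes whose model or content type is incommensurable
// with the local graph, before attempting any m2 search.
function canHandleProbe(probe: { mimeType: string; modelUrn: string }): boolean {
  return supportedModels.has(probe.modelUrn) && supportedMimeTypes.has(probe.mimeType);
}
```

A probe that fails this check is simply not answered; responding with vectors from a different latent space would reintroduce exactly the incommensurability the `modelUrn` field exists to prevent.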
This mechanism enables **distributed learning without hallucination**: the system discovers knowledge through structured peer exchange rather than generating plausible-sounding but ungrounded content.
**Medoid** (mathematical term): The existing memory node selected as the statistical representative of a cluster. Selected by minimising the sum of distances to all other nodes in the cluster. Used throughout algorithmic descriptions and internal implementation comments.
**Centroid** (mathematical term): In MetroidBuilder, the centroid `c` is a full-dimensional vector where protected dimensions are copied from m1 (domain invariant) and unfrozen dimensions are the element-wise average of m1 and m2. Used as the balanced search origin in dialectical scoring.
**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem.
**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`.
## Model-Derived Numerics
**Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds, and Matryoshka sub-dimension boundaries) must **never** be hardcoded as magic numbers.
| Model | `matryoshkaProtectedDim` | Status |
| --- | --- | --- |
| `nomic-ai/nomic-embed-text-v1.5` | 64 | To be added when nomic provider is wired |
**Enforcement:** `npm run guard:model-derived` scans for violations before CI merge. The guard now checks for `matryoshkaProtectedDim` in addition to the standard embedding dimension and context length fields.
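The core of such a guard can be sketched as a literal scan. This is illustrative only: the real `guard:model-derived` script's rules, allowlist handling, and file walking are not described in this document, and the value list below is an assumption.

```typescript
// Flag hardcoded occurrences of known model-derived numerics in a source
// string. A real guard would walk the repo, parse rather than regex-match,
// and honour an allowlist; this sketch shows only the core check.
// Illustrative values: embedding dims, context length, protected dims.
const modelDerivedValues = [768, 2048, 64, 128];

function findMagicNumbers(source: string): number[] {
  return modelDerivedValues.filter((v) =>
    // Match the literal only when not embedded in a longer token
    // (e.g. flag `768` but not `x768` or `1.768`).
    new RegExp(`(?<![\\w.])${v}(?![\\w.])`).test(source)
  );
}
```

Usage: running `findMagicNumbers` over a file like `const dim = 768;` reports the hardcoded `768`, which should instead be read from the model's configuration.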