Commit cda985c

Copilot and devlux76 committed
feat(core): add matryoshkaProtectedDim to ModelProfile; fix CuriosityProbe fields; fix doc accuracy
- core/ModelProfile.ts: add matryoshkaProtectedDim?: number to ModelProfileSeed and ModelProfile
- core/ModelDefaults.ts: validate + pass through matryoshkaProtectedDim in buildModelProfileFromSeed
- core/ModelProfileResolver.ts: carry matryoshkaProtectedDim through register()/resolve()
- core/BuiltInModelProfiles.ts: add matryoshkaProtectedDim: 128 to EMBEDDING_GEMMA_300M_PROFILE
- scripts/guard-model-derived.mjs: add matryoshkaProtectedDim to MODEL_FIELD_PATTERN
- DESIGN.md: clarify centroid c construction (protected dims copied from m1); expand CuriosityProbe with mimeType + modelUrn fields; update model-derived numerics table
- TODO.md: update P1-M1 (matryoshkaProtectedDim from ModelProfile), P1-N2/N4 (mimeType+modelUrn)
- PLAN.md: fix module statuses (Chunker/PageBuilder/Ingest/Query/QueryResult exist); update What Works/Doesn't Work; fix blockers section
- README.md: reframe Cortex description as planned vs current behavior
- ARCHITECTURE-REVIEW.md: note implemented components in zero-drift section

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
1 parent ca1cb00 commit cda985c

10 files changed

Lines changed: 184 additions & 81 deletions

ARCHITECTURE-REVIEW.md

Lines changed: 8 additions & 3 deletions
@@ -244,18 +244,23 @@ This report catalogs every divergence found and maps each to a correction task i
244244

245245
## Components with Zero Drift
246246

247-
The following components correctly implement their intended architecture and require no changes related to this review:
247+
The following components are correctly implemented (or partially implemented in the correct direction) and require no changes related to this naming review:
248248

249249
- `core/HotpathPolicy.ts` — Williams Bound policy implementation; correct
250250
- `core/SalienceEngine.ts` — Promotion/eviction lifecycle; correct
251251
- `core/crypto/` — Hash, sign, verify; correct
252252
- `storage/OPFSVectorStore.ts` — Append-only vector file; correct
253253
- `storage/MemoryVectorStore.ts` — In-memory testing backend; correct
254254
- `embeddings/` — All embedding providers; correct
255-
- `hippocampus/Chunker.ts` — Text chunking; correct
256-
- `hippocampus/PageBuilder.ts` — Page entity construction; correct
255+
- `hippocampus/Chunker.ts` — Text chunking; **implemented and correct**
256+
- `hippocampus/PageBuilder.ts` — Page entity construction; **implemented and correct**
257+
- `hippocampus/Ingest.ts` — Minimal ingest path; **partially implemented** (chunk→embed→persist→Book→hotpath); correct direction, hierarchy and neighbor insertion deferred
258+
- `cortex/Query.ts` — Minimal query path; **partially implemented** (hotpath-first flat scoring); correct direction, MetroidBuilder deferred
259+
- `cortex/QueryResult.ts` — Minimal result DTO; **partially implemented**; correct direction, provenance fields deferred
257260
- All `VectorBackend` implementations — correct
258261

262+
> **Note:** PLAN.md v1.2 has been updated to reflect the actual implementation status of all Hippocampus and Cortex modules. The initial v1.1 plan incorrectly marked `Chunker.ts`, `PageBuilder.ts`, `Ingest.ts`, `Query.ts`, and `QueryResult.ts` as missing; this has been corrected.
263+
259264
---
260265

261266
## Recommended Fix Order

DESIGN.md

Lines changed: 42 additions & 13 deletions
@@ -115,7 +115,10 @@ The Metroid is constructed at query time by the `MetroidBuilder`. It is **not**
115115
1. **Select m1** — Identify the topic medoid most relevant to the query embedding.
116116
2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis.
117117
3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1.
118-
4. **Compute centroid**: `c = (m1_vec + m2_vec) / 2` (element-wise average over the unfrozen dimensions).
118+
4. **Compute centroid** — Compute `c` as follows:
119+
- Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful.
120+
- Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`.
121+
- The result is a full-dimensional vector that can be used directly as a scoring anchor.
119122
5. **Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole.
120123
6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search.
121124
7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search.
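
The revised step 4 can be sketched in TypeScript roughly as follows. This is only an illustration of the rule described above, not the planned `cortex/MetroidBuilder.ts` code: the function name `computeCentroid`, the use of `Float32Array`, and the argument checks are all assumptions made for the example.

```typescript
// Hypothetical helper (not in the repository): builds the centroid c from
// m1 and m2, copying protected dimensions from m1 and averaging the rest.
function computeCentroid(
  m1: Float32Array,
  m2: Float32Array,
  matryoshkaProtectedDim: number,
): Float32Array {
  if (m1.length !== m2.length) {
    throw new Error("m1 and m2 must have the same dimensionality");
  }
  // Assumed constraint: the boundary is an integer within [0, dim].
  if (
    Math.floor(matryoshkaProtectedDim) !== matryoshkaProtectedDim ||
    matryoshkaProtectedDim < 0 ||
    matryoshkaProtectedDim > m1.length
  ) {
    throw new RangeError("matryoshkaProtectedDim must be an integer in [0, dim]");
  }

  const c = new Float32Array(m1.length);
  for (let i = 0; i < m1.length; i++) {
    c[i] =
      i < matryoshkaProtectedDim
        ? m1[i] // protected: keep the domain anchor from m1
        : (m1[i] + m2[i]) / 2; // unfrozen: element-wise average
  }
  return c;
}
```

In real use the boundary would be read from the resolved `ModelProfile` rather than passed as a literal, in keeping with the model-derived numerics rule.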
@@ -181,14 +184,33 @@ This means CORTEX does not possess sufficient knowledge to provide an epistemica
181184
When a knowledge gap is detected, CORTEX broadcasts the incomplete Metroid as a curiosity probe to connected peers:
182185

183186
```
184-
CuriosityProbe = { m1, partialMetroid, queryContext, knowledgeBoundary }
187+
CuriosityProbe = {
188+
m1,
189+
partialMetroid,
190+
queryContext,
191+
knowledgeBoundary,
192+
mimeType,
193+
modelUrn
194+
}
185195
```
186196

187-
Where `knowledgeBoundary` encodes the dimensional layer where antithesis discovery failed. Peers receiving this probe:
197+
Where:
198+
- **m1** — the thesis medoid (the topic for which antithesis was not found)
199+
- **partialMetroid** — the incomplete Metroid at the boundary of local knowledge
200+
- **queryContext** — the original query embedding, used for scoring by the responding peer
201+
- **knowledgeBoundary** — the Matryoshka dimensional layer at which antithesis search failed
202+
- **mimeType** — the MIME type of the embedded content (e.g. `text/plain`, `image/jpeg`). Required so receiving peers can validate commensurability of their graph sections.
203+
- **modelUrn** — a URN identifying the specific embedding model and version used to produce the vectors (e.g. `urn:model:onnx-community/embeddinggemma-300m-ONNX:v1`). Peers **must** reject probes whose `modelUrn` does not match a model they can compare against. Accepting graph fragments embedded by a different model would produce incommensurable similarity scores at the dimensional boundaries where the models' Matryoshka layers overlap.
204+
205+
> **Why `mimeType` and `modelUrn` are required:**
206+
> Embedding models project content into incompatible latent spaces. A fragment embedded with `nomic-embed-text-v1.5` (matryoshkaProtectedDim=64) cannot be meaningfully compared against a fragment embedded with `embeddinggemma-300m` (matryoshkaProtectedDim=128). Without explicit model and content-type identity on the probe, a peer could return graph sections that appear similar by cosine score but are semantically incommensurable — introducing hallucination-equivalent errors at the knowledge boundary.
188207
189-
1. Search their own memory graphs for medoids that could serve as `m2`.
190-
2. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails).
191-
3. The originating node integrates the received fragment and may retry MetroidBuilder.
208+
Peers receiving this probe:
209+
210+
1. Verify `mimeType` and `modelUrn` match a supported local model.
211+
2. Search their own memory graphs for medoids that could serve as `m2` using the same embedding space.
212+
3. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails).
213+
4. The originating node integrates the received fragment and may retry MetroidBuilder.
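
A minimal sketch of the probe shape and of the first peer-side check (step 1 above) might look like this. Only the six field names come from DESIGN.md; the field types, the `SupportedModel` interface, the `canAnswerProbe` function, and the choice of exact string equality for `modelUrn` are assumptions made for illustration.

```typescript
// Hypothetical types: field names follow DESIGN.md, field types are assumed.
interface CuriosityProbe {
  m1: Float32Array;
  partialMetroid: { m1: Float32Array; m2?: Float32Array; c?: Float32Array };
  queryContext: Float32Array;
  knowledgeBoundary: number;
  mimeType: string;
  modelUrn: string;
}

// Hypothetical description of a model the receiving peer can compare against.
interface SupportedModel {
  modelUrn: string;
  mimeTypes: readonly string[];
}

// Returns true only if the peer has a local model with the same URN that
// also handles the probe's MIME type. Exact string equality is an assumption.
function canAnswerProbe(
  probe: CuriosityProbe,
  supported: readonly SupportedModel[],
): boolean {
  return supported.some(
    (model) =>
      model.modelUrn === probe.modelUrn &&
      model.mimeTypes.indexOf(probe.mimeType) !== -1,
  );
}
```

A peer would call a check like this before searching its own graph for `m2`, and drop the probe when it returns `false`.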
192214

193215
This mechanism enables **distributed learning without hallucination**: the system discovers knowledge through structured peer exchange rather than generating plausible-sounding but ungrounded content.
194216

@@ -661,9 +683,9 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu
661683

662684
**Medoid** (mathematical term): The existing memory node selected as the statistical representative of a cluster. Selected by minimising the sum of distances to all other nodes in the cluster. Used throughout algorithmic descriptions and internal implementation comments.
663685

664-
**Centroid** (mathematical term): The arithmetic mean of a set of vectors — a computed geometric point that may not correspond to any stored page. Used in MetroidBuilder to compute the balanced search origin `c`.
686+
**Centroid** (mathematical term): In MetroidBuilder, the centroid `c` is a full-dimensional vector where protected dimensions are copied from m1 (domain invariant) and unfrozen dimensions are the element-wise average of m1 and m2. Used as the balanced search origin in dialectical scoring.
665687

666-
**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid between them. **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem.
688+
**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem.
667689

668690
**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`.
669691

@@ -687,15 +709,22 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu
687709

688710
## Model-Derived Numerics
689711

690-
**Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds) must **never** be hardcoded as magic numbers.
712+
**Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds, and Matryoshka sub-dimension boundaries) must **never** be hardcoded as magic numbers.
691713

692714
**Source of Truth:**
693-
- `core/ModelProfile.ts` — Interface definition
694-
- `core/ModelDefaults.ts` — Default fallback values
695-
- `core/BuiltInModelProfiles.ts` — Concrete model registrations
715+
- `core/ModelProfile.ts` — Interface definition (includes `matryoshkaProtectedDim`)
716+
- `core/ModelDefaults.ts` — Default derivation from seed values
717+
- `core/BuiltInModelProfiles.ts` — Concrete model registrations (includes per-model `matryoshkaProtectedDim`)
696718
- `core/ModelProfileResolver.ts` — Runtime resolution
697719

698-
**Enforcement:** `npm run guard:model-derived` scans for violations before CI merge.
720+
**Model-specific `matryoshkaProtectedDim` values (must be sourced from `BuiltInModelProfiles.ts`):**
721+
722+
| Model | `matryoshkaProtectedDim` | Notes |
723+
|-------|--------------------------|-------|
724+
| `onnx-community/embeddinggemma-300m-ONNX` | 128 | Smallest supported Matryoshka sub-dimension |
725+
| `nomic-ai/nomic-embed-text-v1.5` | 64 | To be added when nomic provider is wired |
726+
727+
**Enforcement:** `npm run guard:model-derived` scans for violations before CI merge. The guard now checks for `matryoshkaProtectedDim` in addition to the standard embedding dimension and context length fields.
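
As a rough illustration of what "validate + pass through" could mean for the optional field, the sketch below checks the value against the embedding dimension. The interface `SeedSketch`, its `embeddingDimension` field, and the function `validateProtectedDim` are hypothetical stand-ins; the actual `ModelProfileSeed` fields and the checks in `buildModelProfileFromSeed` are not shown in this diff and may differ.

```typescript
// Hypothetical, simplified seed shape; the real ModelProfileSeed may differ.
interface SeedSketch {
  modelId: string;
  embeddingDimension: number;
  matryoshkaProtectedDim?: number;
}

// Passes the optional field through unchanged when valid, and rejects
// values that cannot be a Matryoshka boundary for this embedding size.
function validateProtectedDim(seed: SeedSketch): number | undefined {
  const dim = seed.matryoshkaProtectedDim;
  if (dim === undefined) {
    return undefined; // field is optional: nothing to validate
  }
  // Assumed rule: a positive integer no larger than the embedding dimension.
  if (Math.floor(dim) !== dim || dim <= 0 || dim > seed.embeddingDimension) {
    throw new RangeError(
      "matryoshkaProtectedDim must be a positive integer <= embeddingDimension",
    );
  }
  return dim;
}
```

Keeping the check next to profile construction means callers such as MetroidBuilder can read the boundary from the profile without re-validating it.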
699728

700729
## Policy-Derived Constants
701730
