Commit 62a719e

Copilot and devlux76 committed
docs: align MetroidBuilder to original spec — cosine-opposite medoid, frozen centroid, loop structure
DESIGN.md:
- The Metroid: added conceptual framing — antithesis medoid (m2) produces the frozen centroid (c) which becomes the stable platform for deeper exploration; added philosophical foundation (centroid=gravitational pull, medoid=data point anchor; neither alone sufficient); Metroid replaces prior sparse NN-graph constructions
- m2 definition: explicit parallel structure with m1; m2 is always an existing memory node (medoid of cosine-opposite set), never a phantom computed position
- MetroidBuilder Algorithm: complete rewrite as thesis→freeze→antithesis→synthesis loop
  - Step 1 (Thesis): medoid search for m1 (not centroid, always existing node)
  - Step 2 (Freeze): lock protected Matryoshka dimensions
  - Step 3 (Antithesis): score each candidate as -cosine_similarity in free dims; find medoid of top-scoring (cosine-opposite) set — m2 is the medoid, not a raw vector negation
  - Step 4 (Synthesis): compute c once and freeze it; never recomputed
  - Step 5 (Evaluate): all subsequent candidates measured against frozen c
  - Steps 6-7: unwind and stop as before, but with frozen c invariant
- Matryoshka Dimensional Unwinding: new candidates evaluated against frozen c, not a recomputed centroid; stop on knowledge gap → broadcast curiosity
- Terminology: Metroid and MetroidBuilder entries updated with frozen c and cosine-opposite medoid algorithm

TODO.md P1-M:
- Added game-inspired framing (opposition becomes stepping stone via frozen c)
- Step-by-step algorithm: exact formula -cosine_similarity; medoid of top-scoring candidates; frozen c never recomputed
- Exit criteria now explicitly mentions frozen centroid invariant
- Updated test cases: test c is frozen; m2 is medoid not vector negation

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
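For orientation, the thesis and synthesis steps this commit describes can be sketched in TypeScript, the language of the planned `cortex/MetroidBuilder.ts`. This is a minimal sketch under stated assumptions, not the module's code: the `MemoryNode` shape, the helper names `cosineRange`, `selectThesis` and `freezeCentroid`, and the use of plain `number[]` vectors are all hypothetical. Only `matryoshkaProtectedDim` (passed here as `protectedDim`) comes from the diff.

```typescript
// Hypothetical shape of an existing memory node (medoid) with its embedding.
interface MemoryNode {
  readonly id: string;
  readonly vector: readonly number[];
}

// Cosine similarity restricted to the index range [start, end).
function cosineRange(
  a: readonly number[],
  b: readonly number[],
  start: number,
  end: number,
): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = start; i < end; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Thesis: pick the resident medoid with the smallest cosine distance (1 - cos) to q.
// m1 is chosen from the input list, so it is always an existing node.
function selectThesis(q: readonly number[], medoids: readonly MemoryNode[]): MemoryNode | null {
  let best: MemoryNode | null = null;
  let bestDist = Infinity;
  for (const node of medoids) {
    const dist = 1 - cosineRange(q, node.vector, 0, q.length);
    // Strict `<` keeps the earliest node on ties, so the result is deterministic.
    if (dist < bestDist) {
      bestDist = dist;
      best = node;
    }
  }
  return best;
}

// Synthesis: copy protected dims from m1, average the free dims, then freeze.
function freezeCentroid(
  m1: MemoryNode,
  m2: MemoryNode,
  protectedDim: number, // matryoshkaProtectedDim from the model profile
): readonly number[] {
  const c = m1.vector.map((v, i) => (i < protectedDim ? v : (v + m2.vector[i]) / 2));
  // Freezing blocks in-place mutation of c (writes throw a TypeError in strict mode).
  return Object.freeze(c);
}
```

With a 4-dimensional toy embedding and `protectedDim = 2`, `freezeCentroid` keeps the first two components of m1 and averages the last two: for m1 = [1, 0, 1, 0] and m2 = [1, 0, -1, 0] it returns [1, 0, 0, 0].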
1 parent 622570b commit 62a719e

2 files changed

Lines changed: 128 additions & 47 deletions

DESIGN.md

Lines changed: 86 additions & 30 deletions
@@ -95,52 +95,93 @@ Three separate mathematical constructs are central to CORTEX. They must never be

 ### The Metroid

-A Metroid is a structured search probe used for epistemically balanced exploration of a topic.
+A Metroid is a structured search primitive for epistemically balanced exploration of a topic.
+
+The name captures a key architectural insight: what looks like an obstacle to progress — a medoid representing conceptual opposition — is not an enemy. The centroid computed from that opposition can be **held as a stable, frozen platform**, turning semantic divergence into a navigable step toward a goal. Every Metroid construction converts the antithesis (m2) into the anchor for the frozen centroid (c), which then provides structural support for deeper exploration.
+
+A Metroid replaces all prior sparse nearest-neighbor graph constructions as the canonical mechanism for guided semantic exploration in CORTEX. Opposition, divergence, and curiosity-driven augmentation are the designed search dynamics — not similarity-chasing.

 ```
 Metroid = { m1, m2, c }
 ```

 Where:
-- **m1** — thesis medoid: the cluster representative most relevant to the query topic
-- **m2** — antithesis medoid: a cluster representative discovered through constrained Matryoshka search to represent semantic opposition to m1
-- **c** — centroid: the synthetic center of mass between m1 and m2.
+- **m1** — thesis medoid: found via medoid search from the query vector. A medoid (not a centroid) is always an existing memory node — it keeps the search on the correct conceptual road.
+- **m2** — antithesis medoid: the medoid of the cosine-opposite set — not merely the nearest semantically-opposing node, but the **most coherent existing memory node in the direction of maximal divergence** from m1. Like m1, m2 is always an actual memory node, never a computed phantom position.
+- **c** — centroid: the synthetic center of mass between m1 and m2, computed **once** and **frozen** as a stable platform.
 `c` is a "Kansas space" position — typically empty; no real node lives at the centroid.
 Its value is as a neutral vantage point: from `c`, distances to both poles and all
 candidates can be measured without anchoring bias toward either m1 or m2.

+**Philosophical foundation:** Centroids (means) provide gravitational pull toward the midpoint. Medoids (medians) keep the search on the right road by anchoring to actual existing nodes. Neither alone guarantees epistemic honesty. The Metroid loop combines them: the medoid ensures the search never drifts to a phantom position; the frozen centroid ensures all subsequent evaluation is unbiased between the poles.
+
 The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** a persistent graph structure. It is a transient epistemological instrument.

 ---

 ### MetroidBuilder Algorithm

-1. **Select m1** — Identify the topic medoid most relevant to the query embedding.
-2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis.
-3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1.
-4. **Compute centroid** — Compute `c` as a center of mass between m1 and m2:
-- Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful.
-- Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`.
-- The result is a full-dimensional vector that can be used directly as a scoring anchor.
-
-**Important:** `c` is a synthetic position — a "Kansas space". In most cases nothing actually
-exists at the centroid; it is an empty field in embedding space, equidistant from both poles.
-Its value is as a neutral vantage point. Standing at `c`, you can immediately measure whether
-any candidate is closer to m1 (thesis), closer to m2 (antithesis), or equidistant from both
-(genuinely synthetic). Scoring by proximity to `c` produces unbiased, balanced retrieval.
-Scoring from m1 or m2 would pull all results toward one pole.
-5. **Use centroid as scoring vantage point** — Weight candidates by their distance to `c`, not to m1 or m2.
+One full Metroid step is a **thesis → freeze → antithesis → synthesis** cycle:
+
+1. **Thesis — Select m1** — From the query vector `q`, perform a medoid search to find `m1`: the
+median representative of the most relevant cluster. A medoid is always an existing memory node,
+ensuring the search stays on the correct conceptual road. Centroids (means) provide
+gravitational pull; medoids (medians) provide the road.
+
+2. **Freeze** — Lock the first `n` protected Matryoshka dimensions in place. These dimensions
+encode invariant semantic context (domain, language register, topic class). Locking them
+preserves early decisions as fixed structure — preventing the search from drifting into
+vocabulary that shares surface-level patterns but belongs to a different conceptual domain.
+
+3. **Antithesis — Find m2** — On the remaining free (unfrozen) dimensions:
+- Compute the **cosine-opposite score** for every candidate medoid: score each candidate as
+`-cosine_similarity(candidate_free_dims, m1_free_dims)`. The highest-scoring candidates are
+farthest from m1 in the free dimensions — representing maximal conceptual divergence.
+- Find the **medoid of that cosine-opposite set** (the top-scoring candidates). This is `m2`.
+- `m2` is the medoid of the top-scoring candidates — not the result of a direct vector
+negation. The medoid operation selects the most coherent existing memory node in the
+direction of maximal divergence. The medoid operation ensures `m2` is always
+an actual memory node.
+
+4. **Synthesis — Freeze the centroid** — Compute `c` as the center of mass between m1 and m2
+and immediately **freeze it**. `c` is computed once per Metroid construction and never
+recalculated:
+- Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These
+dimensions are invariant; averaging them would dilute the domain anchor.
+- Free dimensions (index >= `matryoshkaProtectedDim`): element-wise average of m1 and m2 —
+`c[i] = (m1[i] + m2[i]) / 2`.
+- `c` is a "Kansas space" position — typically empty; no real node lives at the centroid.
+Its value is as a neutral vantage point: from `c`, distances to both poles and all
+candidates can be measured without anchoring bias toward either m1 or m2.
+
+5. **Evaluate subsequent candidates against the frozen centroid** — All further medoids
+(`m3`, `m4`, ...) found during Matryoshka unwinding are evaluated relative to this frozen `c`:
 - Near `c`: synthesis territory — balanced between both poles.
 - Much closer to m1 than to `c`: thesis-supporting.
 - Much closer to m2 than to `c`: antithesis-supporting.
-- Far from `c`, m1, and m2 simultaneously: a third conceptual region not captured by either pole — signal for further Matryoshka unwinding or a knowledge gap.
-Scoring from `c` avoids anchoring bias; see the Dialectical Search section for the full zone model.
-6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search.
-7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search.
+- Far from `c`, m1, and m2 simultaneously: third conceptual region — signal for further
+unwinding or a knowledge gap.
+The centroid is a platform. Opposition has been frozen into a stepping stone.
+
+6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from
+step 3. Each unwinding broadens the antithesis search space. Subsequent antithesis candidates
+are still evaluated relative to the original frozen `c` — it is never recomputed.
+
+7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. Once
+the Matryoshka unwind has reached the protected floor, no further antithesis search is possible.
+If no satisfactory `m2` was found at any layer, set `knowledge_gap = true` and broadcast a
+curiosity query (see Knowledge Gap Detection).

 **Why protect dimensions?**

-Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate the search. Specifically, upper Matryoshka dimensions encode fine-grained distinctions that may closely match surface-level word patterns regardless of topic. Protected lower dimensions encode domain context (e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings could accumulate similarity mass toward adhesive-related terms in the high dimensions — because words describing how things stick together are statistically present in both culinary and industrial glue contexts. The protected dimensions ensure the culinary domain context is never overridden by this incidental high-dimensional similarity.
+Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate
+the search. Upper Matryoshka dimensions encode fine-grained distinctions that may closely match
+surface-level word patterns regardless of topic. Protected lower dimensions encode domain context
+(e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings
+could accumulate similarity mass toward adhesive-related terms — because words describing how
+things stick together are statistically present in both culinary and industrial glue contexts.
+The protected dimensions ensure the culinary domain context is never overridden by this incidental
+high-dimensional similarity.

 ---

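The antithesis step in the hunk above is the one most easily misread as "pick the least similar node". A minimal TypeScript sketch of the cosine-opposite medoid search, as the new text describes it, follows. `findAntithesis`, the `topK` size of the cosine-opposite set, and summed cosine distance as the medoid criterion are assumptions: the diff says "the top-scoring candidates" without fixing a count or a medoid metric.

```typescript
interface MemoryNode {
  readonly id: string;
  readonly vector: readonly number[];
}

// Cosine similarity restricted to the index range [start, end).
function cosineRange(
  a: readonly number[],
  b: readonly number[],
  start: number,
  end: number,
): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = start; i < end; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Antithesis: score each candidate as -cos(candidate, m1) on the free dims
// [protectedDim, d), keep the top-k scorers (the cosine-opposite set), and
// return that set's medoid. The protected dims never enter the computation.
function findAntithesis(
  m1: MemoryNode,
  candidates: readonly MemoryNode[],
  protectedDim: number,
  topK: number, // hypothetical: how many top scorers form the cosine-opposite set
): MemoryNode | null {
  const d = m1.vector.length;
  const opposite = candidates
    .filter((n) => n.id !== m1.id)
    .map((n) => ({ node: n, score: -cosineRange(n.vector, m1.vector, protectedDim, d) }))
    // Highest score first; ties broken by id so the ranking is deterministic.
    .sort((a, b) => b.score - a.score || a.node.id.localeCompare(b.node.id))
    .slice(0, topK)
    .map((s) => s.node);
  if (opposite.length === 0) return null;

  // Medoid: the member with the smallest summed cosine distance (1 - cos) to
  // every member of the set, measured on the free dims. It is always one of
  // the existing candidates, never a computed vector such as -m1.
  let medoid = opposite[0];
  let bestCost = Infinity;
  for (const a of opposite) {
    let cost = 0;
    for (const b of opposite) cost += 1 - cosineRange(a.vector, b.vector, protectedDim, d);
    if (cost < bestCost) {
      bestCost = cost;
      medoid = a;
    }
  }
  return medoid;
}
```

If m1's free dims point along [1, 0] and the candidates' free dims are [-1, 0], [-0.6, 0.8] and [-0.5, 0.866], then `topK = 1` returns the single most-opposite node, while `topK = 3` returns the [-0.6, 0.8] node because it sits closest to the other two members of the opposite set. That is the distinction the TODO draws between the cosine-opposite medoid and "the node with the lowest cosine similarity to m1".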
@@ -154,10 +195,15 @@ CORTEX uses Matryoshka Representation Learning (MRL) models that pack semantic i
 At each unwinding step:
 1. The protected dimension boundary shifts one layer outward.
 2. The antithesis search space expands into the newly freed dimensions.
-3. A new `m2` candidate is evaluated against the expanded space.
-4. The Metroid `{ m1, m2, c }` is recomputed with the updated `m2`.
+3. A new `m2` candidate is found via cosine-opposite medoid search in the expanded space.
+4. The new candidate is evaluated relative to the **frozen** `c` (computed in the first synthesis
+step and never recalculated). If it is close enough to `c`, the step is accepted; otherwise
+the search continues unwinding or declares a knowledge gap.

-This produces progressively wider dialectical exploration while maintaining semantic coherence. The search terminates either when the protected dimension is reached or when a satisfactory `m2` is found.
+This produces progressively wider dialectical exploration while maintaining semantic coherence.
+The frozen centroid ensures that each expansion step is measured against a stable platform rather
+than a shifting target. The search terminates either when the protected dimension floor is reached
+or when a satisfactory `m2` is found.

 ---

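The revised unwinding rule, with every candidate judged against the same frozen `c`, can be sketched as a loop. The sketch is hypothetical throughout: `unwind`, the caller-supplied `boundaries` list (free-dim boundaries in unwinding order), the `findCandidate` callback (for example a cosine-opposite medoid search at that boundary), Euclidean distance, and the `acceptRadius` threshold are assumptions, since the diff says only "close enough to `c`".

```typescript
interface MemoryNode {
  readonly id: string;
  readonly vector: readonly number[];
}

type UnwindResult =
  | { readonly m2: MemoryNode; readonly boundary: number; readonly knowledgeGap: false }
  | { readonly m2: null; readonly boundary: null; readonly knowledgeGap: true };

// Euclidean distance; the metric behind "close enough to c" is an assumption.
function distance(a: readonly number[], b: readonly number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Walk the free-dim boundaries in unwinding order. Each tier proposes one m2
// candidate; it is accepted only if it lies within acceptRadius of the frozen c.
function unwind(
  frozenC: readonly number[], // computed once in the first synthesis step
  boundaries: readonly number[], // hypothetical: free dims are [boundary, d) at each tier
  protectedFloor: number, // matryoshkaProtectedDim: never unwound past
  findCandidate: (boundary: number) => MemoryNode | null,
  acceptRadius: number, // hypothetical "close enough" threshold
): UnwindResult {
  for (const boundary of boundaries) {
    if (boundary < protectedFloor) break; // the protected floor is never unwound
    const candidate = findCandidate(boundary);
    // Same frozen c for every tier; nothing in this loop recomputes it.
    if (candidate !== null && distance(candidate.vector, frozenC) <= acceptRadius) {
      return { m2: candidate, boundary, knowledgeGap: false };
    }
  }
  // No tier produced an acceptable m2: report a knowledge gap. Broadcasting the
  // curiosity query is left to the caller in this sketch.
  return { m2: null, boundary: null, knowledgeGap: true };
}
```

Passing the frozen `c` in as a read-only argument, instead of recomputing it inside the loop, makes the "never recomputed" invariant structural: the loop has no access to m1 or m2 from which to rebuild it.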
@@ -729,9 +775,19 @@ where nothing in the memory graph typically exists. Its value is as a neutral va
 scoring candidates by distance to `c` gives equal weight to both poles. A candidate closer to m1
 is thesis-supporting; closer to m2 is antithesis-supporting; near `c` is genuinely balanced.

-**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem.
-
-**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`.
+**Metroid** (CORTEX architectural term): A structured dialectical search primitive constructed at
+query time: `{ m1, m2, c }`. m1 is the thesis medoid (found via medoid search from query vector q);
+m2 is the antithesis medoid (the medoid of the cosine-opposite set in the free dimensions — not
+merely a semantically-opposing node, but the most coherent representative of maximal divergence);
+c is the centroid (protected dims from m1; free dims averaged), computed **once and frozen** as a
+stable evaluation platform. All subsequent candidates in the Matryoshka unwind are evaluated
+relative to this frozen c. **A Metroid is never stored as a persistent graph structure.** It is an
+ephemeral instrument used by the CORTEX retrieval subsystem.
+
+**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via
+Matryoshka dimensional unwinding. Runs the thesis→freeze→antithesis→synthesis loop: m1 via medoid
+search; m2 via cosine-opposite medoid; c computed once and frozen; subsequent candidates evaluated
+relative to frozen c. Planned module: `cortex/MetroidBuilder.ts`.

 **Semantic neighbor graph** (also: proximity graph, neighbor graph): The sparse radius-graph of cosine-similarity edges between pages, used for subgraph expansion during retrieval. This is **not** the same as a Metroid. The edges connect pages with high cosine similarity and are used for BFS expansion. Currently named `MetroidNeighbor` / `metroid_neighbors` in the codebase — this is a naming error that must be corrected (tracked in TODO as P0-X).

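The glossary context above and step 5 of the algorithm describe a zone model around the frozen `c`. A sketch of it follows. `classifyZone`, the thresholds `nearRadius`, `farRadius` and `ratio`, and Euclidean distance are assumptions, because the text gives no numbers for "near", "far" or "much closer". The four named zones do not cover every point, so the sketch returns `"unclassified"` for the remainder; that label is not in the original.

```typescript
type Zone = "synthesis" | "thesis" | "antithesis" | "third-region" | "unclassified";

// Hypothetical thresholds; the design text leaves all three unspecified.
interface ZoneParams {
  readonly nearRadius: number; // within this distance of c counts as "near c"
  readonly farRadius: number; // beyond this from c, m1 and m2 counts as "far"
  readonly ratio: number; // "much closer to m1 than to c": d(x, m1) < ratio * d(x, c)
}

function euclidean(a: readonly number[], b: readonly number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Classify candidate x relative to the poles m1, m2 and the frozen centroid c.
function classifyZone(
  x: readonly number[],
  m1: readonly number[],
  m2: readonly number[],
  c: readonly number[],
  p: ZoneParams,
): Zone {
  const dc = euclidean(x, c);
  const d1 = euclidean(x, m1);
  const d2 = euclidean(x, m2);
  if (dc <= p.nearRadius) return "synthesis";
  if (d1 < p.ratio * dc && d1 < d2) return "thesis";
  if (d2 < p.ratio * dc && d2 < d1) return "antithesis";
  if (dc > p.farRadius && d1 > p.farRadius && d2 > p.farRadius) return "third-region";
  return "unclassified";
}
```

In a toy 2-D space with no protected dims, c is simply the midpoint of m1 = [1, 0] and m2 = [-1, 0]. With `nearRadius = 0.2`, `farRadius = 2` and `ratio = 0.5`, the point [0.9, 0] is thesis-supporting, [0, 5] falls in the third region, and [0.5, 0] lands in none of the four named zones.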
TODO.md

Lines changed: 42 additions & 17 deletions
@@ -354,35 +354,60 @@ These items add hierarchical routing and coherent path ordering. They transform

 ### P1-M: MetroidBuilder (DELIVERS: dialectical epistemology)

-**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests.
+**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests. The Metroid loop converts conceptual opposition into navigable exploration steps.

 - [ ] **P1-M1:** Implement `cortex/MetroidBuilder.ts`
-- Accept a query embedding and a list of resident medoids (shelf/volume/book representatives)
-- Select m1: the medoid with highest cosine similarity to the query
-- Read `matryoshkaProtectedDim` from `ModelProfile` (the field added to `core/ModelProfile.ts` as the per-model protected floor — e.g. 128 for embeddinggemma-300m, 64 for nomic-embed-text-v1.5). If `undefined` on the current model, return `{ m1, m2: null, c: null, knowledgeGap: true }` immediately.
-- Freeze all dimensions with index < `matryoshkaProtectedDim`
-- In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`), search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance)
-- This medoid becomes m2 (antithesis)
-- Compute centroid: protected dims (< matryoshkaProtectedDim) copied from m1 vector; unfrozen dims averaged element-wise: `c[i] = (m1[i] + m2[i]) / 2`
-- Return `Metroid { m1, m2, c }`; if no valid m2 found, return `{ m1, m2: null, c: null, knowledgeGap: true }`
+- Accept a query embedding `q` and a list of resident medoids (shelf/volume/book representatives)
+- **Thesis (select m1):** Find `m1` via medoid search — the medoid minimizing distance to `q`. A
+medoid (not a centroid) is always an existing memory node; it ensures the search anchor is an
+actual data point rather than an averaged phantom position. This keeps the search on the
+correct conceptual road.
+- Read `matryoshkaProtectedDim` from `ModelProfile` (e.g. 128 for embeddinggemma-300m, 64 for
+nomic-embed-text-v1.5). If `undefined` on the current model (non-Matryoshka), return
+`{ m1, m2: null, c: null, knowledgeGap: true }` immediately.
+- **Freeze:** Lock all dimensions with index < `matryoshkaProtectedDim`.
+- **Antithesis (find m2):** In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`):
+1. Score every candidate medoid as `-cosine_similarity(candidate_free_dims, m1_free_dims)`.
+The highest-scoring candidates are farthest from m1 in the free dimensions — maximal
+conceptual divergence.
+2. Find the **medoid of that cosine-opposite set** (the top-scoring candidates). This is `m2`.
+3. `m2` must be an existing memory node (not a computed position). The medoid operation
+ensures this. This is distinct from simply finding the node with the lowest cosine
+similarity to m1.
+- **Synthesis (freeze centroid):** Compute `c` once and freeze it:
+- Protected dims (< `matryoshkaProtectedDim`): copy from m1 (domain invariant).
+- Free dims (>= `matryoshkaProtectedDim`): `c[i] = (m1[i] + m2[i]) / 2`.
+- This frozen `c` is never recalculated. All future candidates in the Matryoshka unwind are
+evaluated relative to this frozen platform.
+- Return `Metroid { m1, m2, c }`; if no valid m2 found, return
+`{ m1, m2: null, c: null, knowledgeGap: true }`

 - [ ] **P1-M2:** Implement Matryoshka dimensional unwinding in `cortex/MetroidBuilder.ts`
-- After initial Metroid construction, progressively expand the antithesis search into deeper embedding layers
-- At each step, lower the protected dimension boundary by one Matryoshka tier
-- Re-evaluate `m2` at each tier; prefer the deepest tier's Metroid as the final result
-- Stop when the protected dimension floor is reached
+- After the initial Metroid construction, progressively expand the antithesis search into deeper
+embedding layers by shifting the protected dimension boundary outward one Matryoshka tier at a
+time.
+- At each new tier, find a new `m2` candidate via cosine-opposite medoid search in the expanded
+free dimensions.
+- Evaluate each candidate against the **frozen** `c` (not a recomputed centroid). If close
+enough to `c`, accept and freeze this step; take the next conceptual leap. If not,
+continue unwinding.
+- Stop when the protected dimension floor is reached or a satisfactory `m2` is accepted.
+- If no satisfactory `m2` is found at any layer, return `knowledgeGap: true`.

 - [ ] **P1-M3:** Add MetroidBuilder test coverage
 - `tests/cortex/MetroidBuilder.test.ts`
-- Test m1 selection: highest similarity medoid is chosen
-- Test m2 selection: most semantically opposite medoid is chosen
-- Test centroid computation: midpoint between m1 and m2 vectors
+- Test m1 selection: the medoid minimising distance to q is chosen (not the centroid)
+- Test m2 selection: medoid of cosine-opposite set — not merely nearest semantically-opposing node
+- Test centroid computation: protected dims copied from m1; free dims averaged element-wise
+- Test centroid is frozen: subsequent unwinding steps do not recompute c
 - Test dimensional unwinding: search expands progressively through Matryoshka layers
 - Test knowledge gap: when no valid m2 exists in any layer, returns `knowledgeGap: true`
 - Test protected dimensions are never searched for antithesis
 - Test determinism: same inputs always produce same Metroid

-**Exit Criteria:** MetroidBuilder constructs valid Metroids and correctly detects knowledge gaps.
+**Exit Criteria:** MetroidBuilder constructs valid Metroids (m1 via medoid search, m2 via
+cosine-opposite medoid of the top-scoring candidates, c computed once and never recomputed during
+Matryoshka unwinding) and correctly detects knowledge gaps.

 ---

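Several of the P1-M3 cases can be phrased as plain assertions. The sketch below assumes a hypothetical `buildMetroid(q, medoids, protectedDim, topK)` that composes the thesis, freeze, antithesis and synthesis steps from P1-M1, and uses a throwaway `check` helper instead of any particular test runner, since the diff names the test file but not a framework. It covers m1 selection, m2 being an existing node, protected dims copied from m1, the frozen `c`, determinism, and the early `knowledgeGap` return for a non-Matryoshka model.

```typescript
interface MemoryNode {
  readonly id: string;
  readonly vector: readonly number[];
}

interface Metroid {
  readonly m1: MemoryNode | null;
  readonly m2: MemoryNode | null;
  readonly c: readonly number[] | null;
  readonly knowledgeGap: boolean;
}

// Cosine similarity over dims [from, end of vector).
function cos(a: readonly number[], b: readonly number[], from: number): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = from; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Hypothetical builder: thesis -> freeze -> antithesis -> synthesis, per P1-M1.
function buildMetroid(
  q: readonly number[],
  medoids: readonly MemoryNode[],
  protectedDim: number | undefined, // matryoshkaProtectedDim; undefined for non-Matryoshka models
  topK: number, // hypothetical size of the cosine-opposite set
): Metroid {
  // Thesis: the existing medoid most similar to q (first one wins on ties).
  let m1: MemoryNode | null = null;
  for (const n of medoids) if (m1 === null || cos(q, n.vector, 0) > cos(q, m1.vector, 0)) m1 = n;
  if (m1 === null || protectedDim === undefined) return { m1, m2: null, c: null, knowledgeGap: true };
  const thesis: MemoryNode = m1;
  const p: number = protectedDim;

  // Antithesis: rank by -cos on the free dims only, keep the top-k, take their medoid.
  const opposite = medoids
    .filter((n) => n.id !== thesis.id)
    .map((n) => ({ n, s: -cos(n.vector, thesis.vector, p) }))
    .sort((a, b) => b.s - a.s || a.n.id.localeCompare(b.n.id))
    .slice(0, topK)
    .map((x) => x.n);
  if (opposite.length === 0) return { m1: thesis, m2: null, c: null, knowledgeGap: true };
  const cost = (a: MemoryNode) => opposite.reduce((sum, b) => sum + 1 - cos(a.vector, b.vector, p), 0);
  const m2 = opposite.reduce((best, a) => (cost(a) < cost(best) ? a : best));

  // Synthesis: protected dims from m1, free dims averaged, then frozen.
  const c = Object.freeze(thesis.vector.map((v, i) => (i < p ? v : (v + m2.vector[i]) / 2)));
  return { m1: thesis, m2, c, knowledgeGap: false };
}

function check(cond: boolean, name: string): void {
  if (!cond) throw new Error(`failed: ${name}`);
}

const medoids: MemoryNode[] = [
  { id: "a", vector: [1, 0, 1, 0] },
  { id: "b", vector: [1, 0, -1, 0.1] },
  { id: "e", vector: [1, 0, -0.9, -0.2] },
];
const q = [1, 0, 0.9, 0];
const m = buildMetroid(q, medoids, 2, 2);

check(m.m1?.id === "a", "m1 is the existing medoid closest to q");
check(m.m2 !== null && medoids.includes(m.m2), "m2 is an existing node, not a computed position");
check(m.c !== null && m.c[0] === 1 && m.c[1] === 0, "protected dims are copied from m1");
check(Object.isFrozen(m.c), "c is frozen");
check(JSON.stringify(buildMetroid(q, medoids, 2, 2)) === JSON.stringify(m), "same inputs, same Metroid");
check(buildMetroid(q, medoids, undefined, 2).knowledgeGap, "non-Matryoshka model reports a knowledge gap");
```

The unwinding-specific cases (progressive expansion, "c is not recomputed across tiers") need the unwinding loop from P1-M2 and are not covered by this single-step sketch.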