docs: align MetroidBuilder to original spec — cosine-opposite medoid, frozen centroid, loop structure
DESIGN.md:
- The Metroid: added conceptual framing — antithesis medoid (m2) produces the frozen
centroid (c) which becomes the stable platform for deeper exploration; added
philosophical foundation (centroid=gravitational pull, medoid=data point anchor;
neither alone sufficient); Metroid replaces prior sparse NN-graph constructions
- m2 definition: explicit parallel structure with m1; m2 is always an existing memory
node (medoid of cosine-opposite set), never a phantom computed position
- MetroidBuilder Algorithm: complete rewrite as thesis→freeze→antithesis→synthesis loop
- Step 1 (Thesis): medoid search for m1 (not centroid, always existing node)
- Step 2 (Freeze): lock protected Matryoshka dimensions
- Step 3 (Antithesis): score each candidate as -cosine_similarity in free dims;
find medoid of top-scoring (cosine-opposite) set — m2 is the medoid, not a
raw vector negation
- Step 4 (Synthesis): compute c once and freeze it; never recomputed
- Step 5 (Evaluate): all subsequent candidates measured against frozen c
- Steps 6-7: unwind and stop as before, but with frozen c invariant
- Matryoshka Dimensional Unwinding: new candidates evaluated against frozen c,
not a recomputed centroid; stop on knowledge gap → broadcast curiosity
- Terminology: Metroid and MetroidBuilder entries updated with frozen c and
cosine-opposite medoid algorithm
TODO.md P1-M:
- Added game-inspired framing (opposition becomes stepping stone via frozen c)
- Step-by-step algorithm: exact formula -cosine_similarity; medoid of top-scoring
candidates; frozen c never recomputed
- Exit criteria now explicitly mention the frozen centroid invariant
- Updated test cases: test c is frozen; m2 is medoid not vector negation
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
DESIGN.md — 86 additions, 30 deletions
@@ -95,52 +95,93 @@ Three separate mathematical constructs are central to CORTEX. They must never be
### The Metroid
A Metroid is a structured search primitive for epistemically balanced exploration of a topic.

The name captures a key architectural insight: what looks like an obstacle to progress — a medoid representing conceptual opposition — is not an enemy. The centroid computed from that opposition can be **held as a stable, frozen platform**, turning semantic divergence into a navigable step toward a goal. Every Metroid construction converts the antithesis (m2) into the anchor for the frozen centroid (c), which then provides structural support for deeper exploration.

A Metroid replaces all prior sparse nearest-neighbor graph constructions as the canonical mechanism for guided semantic exploration in CORTEX. Opposition, divergence, and curiosity-driven augmentation are the designed search dynamics — not similarity-chasing.
```
Metroid = { m1, m2, c }
```

Where:

- **m1** — thesis medoid: found via medoid search from the query vector. A medoid (not a centroid) is always an existing memory node — it keeps the search on the correct conceptual road.
- **m2** — antithesis medoid: the medoid of the cosine-opposite set — not merely the nearest semantically-opposing node, but the **most coherent existing memory node in the direction of maximal divergence** from m1. Like m1, m2 is always an actual memory node, never a computed phantom position.
- **c** — centroid: the synthetic center of mass between m1 and m2, computed **once** and **frozen** as a stable platform.

`c` is a "Kansas space" position — typically empty; no real node lives at the centroid. Its value is as a neutral vantage point: from `c`, distances to both poles and all candidates can be measured without anchoring bias toward either m1 or m2.
**Philosophical foundation:** Centroids (means) provide gravitational pull toward the midpoint. Medoids (medians) keep the search on the right road by anchoring to actual existing nodes. Neither alone guarantees epistemic honesty. The Metroid loop combines them: the medoid ensures the search never drifts to a phantom position; the frozen centroid ensures all subsequent evaluation is unbiased between the poles.

The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** a persistent graph structure. It is a transient epistemological instrument.
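As a rough sketch, the `{ m1, m2, c }` structure and the frozen-centroid rule might look like this in TypeScript (the language of the planned `cortex/MetroidBuilder.ts` module). `MemoryNode`, `freezeCentroid`, and every field name beyond `m1`, `m2`, `c` are illustrative assumptions, not part of the spec.

```typescript
// Illustrative sketch only: the spec fixes the shape { m1, m2, c };
// the node type and helper names here are assumptions.
interface MemoryNode {
  id: string;
  embedding: number[]; // full Matryoshka embedding
}

interface Metroid {
  m1: MemoryNode;        // thesis medoid: always an existing node
  m2: MemoryNode | null; // antithesis medoid: existing node, or null on a knowledge gap
  c: number[] | null;    // frozen centroid: a synthetic position, never a node
}

// Protected dims (index < protectedDim) are copied from m1; free dims are
// the element-wise average of m1 and m2. Computed once, then never mutated.
function freezeCentroid(
  m1: number[],
  m2: number[],
  protectedDim: number,
): readonly number[] {
  const c = m1.map((v, i) => (i < protectedDim ? v : (v + m2[i]) / 2));
  return Object.freeze(c); // literal freezing: the spec forbids recomputation
}
```

`Object.freeze` here simply mirrors the design invariant that `c` is computed once per construction; the real module could enforce it however it likes.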
---
### MetroidBuilder Algorithm
One full Metroid step is a **thesis → freeze → antithesis → synthesis** cycle:

1. **Thesis — Select m1** — From the query vector `q`, perform a medoid search to find `m1`: the median representative of the most relevant cluster. A medoid is always an existing memory node, ensuring the search stays on the correct conceptual road. Centroids (means) provide gravitational pull; medoids (medians) provide the road.
2. **Freeze** — Lock the first `n` protected Matryoshka dimensions in place. These dimensions encode invariant semantic context (domain, language register, topic class). Locking them preserves early decisions as fixed structure — preventing the search from drifting into vocabulary that shares surface-level patterns but belongs to a different conceptual domain.
3. **Antithesis — Find m2** — On the remaining free (unfrozen) dimensions:
   - Compute the **cosine-opposite score** for every candidate medoid: `-cosine_similarity(candidate_free_dims, m1_free_dims)`. The highest-scoring candidates are farthest from m1 in the free dimensions — representing maximal conceptual divergence.
   - Find the **medoid of that cosine-opposite set** (the top-scoring candidates). This is `m2`.
   - `m2` is the medoid of the top-scoring candidates — not the result of a direct vector negation. The medoid operation selects the most coherent existing memory node in the direction of maximal divergence, guaranteeing that `m2` is always an actual memory node.
4. **Synthesis — Freeze the centroid** — Compute `c` as the center of mass between m1 and m2 and immediately **freeze it**. `c` is computed once per Metroid construction and never recalculated:
   - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor.
   - Free dimensions (index >= `matryoshkaProtectedDim`): element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`.
   - `c` is a "Kansas space" position — typically empty; no real node lives at the centroid. Its value is as a neutral vantage point: from `c`, distances to both poles and all candidates can be measured without anchoring bias toward either m1 or m2.
5. **Evaluate subsequent candidates against the frozen centroid** — All further medoids (`m3`, `m4`, ...) found during Matryoshka unwinding are evaluated relative to this frozen `c`:
   - Near `c`: synthesis territory — balanced between both poles.
   - Much closer to m1 than to `c`: thesis-supporting.
   - Much closer to m2 than to `c`: antithesis-supporting.
   - Far from `c`, m1, and m2 simultaneously: a third conceptual region — a signal for further unwinding or a knowledge gap.

   The centroid is a platform. Opposition has been frozen into a stepping stone.
6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search space. Subsequent antithesis candidates are still evaluated relative to the original frozen `c` — it is never recomputed.

7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. Once the Matryoshka unwind has reached the protected floor, no further antithesis search is possible. If no satisfactory `m2` was found at any layer, set `knowledge_gap = true` and broadcast a curiosity query (see Knowledge Gap Detection).
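A minimal sketch of the antithesis step (step 3), assuming free-dimension vectors are available as plain arrays. `findM2`, the `topK` cutoff, and the medoid-by-minimum-total-cosine-distance rule are illustrative choices, not part of the spec.

```typescript
// Sketch of step 3 under stated assumptions: candidates are the
// free-dimension vectors of existing medoids; "medoid of the set" is taken
// here as the member minimizing total cosine distance to the other members.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// 3a: score each candidate as -cosine_similarity to m1 (free dims only).
// 3b: keep the topK scorers (the cosine-opposite set).
// 3c: m2 is the medoid of that set — an existing node, not a negated vector.
function findM2(m1Free: number[], candidatesFree: number[][], topK = 3): number {
  const opposite = candidatesFree
    .map((v, idx) => ({ idx, score: -cosine(v, m1Free) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  let best = opposite[0].idx;
  let bestTotal = Infinity;
  for (const { idx } of opposite) {
    let total = 0;
    for (const other of opposite) {
      if (other.idx !== idx) {
        total += 1 - cosine(candidatesFree[idx], candidatesFree[other.idx]);
      }
    }
    if (total < bestTotal) { bestTotal = total; best = idx; }
  }
  return best; // index of m2 among the candidates
}
```

Note that the returned index always names an actual candidate, which is the point of using a medoid rather than negating m1's vector.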
**Why protect dimensions?**
Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate the search. Upper Matryoshka dimensions encode fine-grained distinctions that may closely match surface-level word patterns regardless of topic. Protected lower dimensions encode domain context (e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings could accumulate similarity mass toward adhesive-related terms — because words describing how things stick together are statistically present in both culinary and industrial glue contexts. The protected dimensions ensure the culinary domain context is never overridden by this incidental high-dimensional similarity.
---
@@ -154,10 +195,15 @@ CORTEX uses Matryoshka Representation Learning (MRL) models that pack semantic i
At each unwinding step:

1. The protected dimension boundary shifts one layer outward.
2. The antithesis search space expands into the newly freed dimensions.
3. A new `m2` candidate is found via cosine-opposite medoid search in the expanded space.
4. The new candidate is evaluated relative to the **frozen** `c` (computed in the first synthesis step and never recalculated). If it is close enough to `c`, the step is accepted; otherwise the search continues unwinding or declares a knowledge gap.
This produces progressively wider dialectical exploration while maintaining semantic coherence. The frozen centroid ensures that each expansion step is measured against a stable platform rather than a shifting target. The search terminates either when the protected dimension floor is reached or when a satisfactory `m2` is found.
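The frozen-centroid evaluation in step 4 can be sketched as a zone classifier matching the zone model of the builder algorithm. The `nearEps`/`farEps` thresholds and the Euclidean metric are illustrative assumptions; the zone names follow the document.

```typescript
// Classify a candidate relative to the frozen centroid c and the poles
// m1/m2. Thresholds are illustrative, not spec-mandated.
type Zone = "synthesis" | "thesis" | "antithesis" | "third-region";

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
}

function classify(
  cand: number[],
  m1: number[],
  m2: number[],
  frozenC: number[],
  nearEps = 0.25, // assumed "near c" radius
  farEps = 2.0,   // assumed "far from everything" radius
): Zone {
  const dC = euclidean(cand, frozenC);
  const dM1 = euclidean(cand, m1);
  const dM2 = euclidean(cand, m2);
  if (dC <= nearEps) return "synthesis"; // balanced between both poles
  if (dC > farEps && dM1 > farEps && dM2 > farEps) {
    return "third-region"; // signal: unwind further or declare a knowledge gap
  }
  if (dM1 < dC && dM1 <= dM2) return "thesis";     // much closer to m1 than to c
  if (dM2 < dC && dM2 < dM1) return "antithesis";  // much closer to m2 than to c
  return "synthesis"; // roughly balanced, outside the tight near-c radius
}
```

Because `frozenC` is passed in unchanged across unwinding steps, every candidate is measured against the same platform — the invariant this section describes.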
---
@@ -729,9 +775,19 @@ where nothing in the memory graph typically exists. Its value is as a neutral va
scoring candidates by distance to `c` gives equal weight to both poles. A candidate closer to m1 is thesis-supporting; closer to m2 is antithesis-supporting; near `c` is genuinely balanced.
**Metroid** (CORTEX architectural term): A structured dialectical search primitive constructed at query time: `{ m1, m2, c }`. m1 is the thesis medoid (found via medoid search from query vector q); m2 is the antithesis medoid (the medoid of the cosine-opposite set in the free dimensions — not merely a semantically-opposing node, but the most coherent representative of maximal divergence); c is the centroid (protected dims from m1; free dims averaged), computed **once and frozen** as a stable evaluation platform. All subsequent candidates in the Matryoshka unwind are evaluated relative to this frozen c. **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem.

**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Runs the thesis→freeze→antithesis→synthesis loop: m1 via medoid search; m2 via cosine-opposite medoid; c computed once and frozen; subsequent candidates evaluated relative to frozen c. Planned module: `cortex/MetroidBuilder.ts`.
**Semantic neighbor graph** (also: proximity graph, neighbor graph): The sparse radius-graph of cosine-similarity edges between pages, used for subgraph expansion during retrieval. This is **not** the same as a Metroid. The edges connect pages with high cosine similarity and are used for BFS expansion. Currently named `MetroidNeighbor` / `metroid_neighbors` in the codebase — this is a naming error that must be corrected (tracked in TODO as P0-X).
TODO.md:

**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests. The Metroid loop converts conceptual opposition into navigable exploration steps.
- Accept a query embedding and a list of resident medoids (shelf/volume/book representatives)
- Select m1: the medoid with highest cosine similarity to the query
- Read `matryoshkaProtectedDim` from `ModelProfile` (the field added to `core/ModelProfile.ts` as the per-model protected floor — e.g. 128 for embeddinggemma-300m, 64 for nomic-embed-text-v1.5). If `undefined` on the current model, return `{ m1, m2: null, c: null, knowledgeGap: true }` immediately.
- Freeze all dimensions with index < `matryoshkaProtectedDim`
- In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`), search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance)