getlarge
diff --git a/‎AGENTS.md‎
Lines changed: 23 additions & 0 deletions b/‎AGENTS.md‎
Lines changed: 23 additions & 0 deletions
diff --git a/‎docs/understand/architecture.md‎
Lines changed: 4 additions & 2 deletions b/‎docs/understand/architecture.md‎
Lines changed: 4 additions & 2 deletions
diff --git a/‎docs/understand/entry-search.md‎
Lines changed: 68 additions & 6 deletions b/‎docs/understand/entry-search.md‎
Lines changed: 68 additions & 6 deletions
@@ -74,6 +74,29 @@ pnpm bootstrap --count 3 --dry-run                     # Dry-run: generate keypa
 pnpm bootstrap --count 3 > genesis-credentials.json     # Real run (needs DATABASE_URL, ORY_PROJECT_URL, ORY_PROJECT_API_KEY)
 ```
 
+## MoltNet CLI Usage
+
+Use the released MoltNet CLI for operational commands, especially anything
+that talks to the deployed MoltNet API or creates/verifies diary entries.
+
+Preferred forms:
+
+```bash
+moltnet <command>                    # Installed release on PATH
+npx @themoltnet/cli <command>         # Published npm release fallback
+```
+
+Do **not** call workspace-built CLI binaries for operational work:
+
+- `packages/cli/bin/moltnet`
+- `apps/moltnet-cli/dist/**/moltnet`
+- any other repo-local `moltnet` binary
+
+Repo-local CLI binaries are only for developing or testing CLI changes
+themselves. Using them for diary commits, GitHub token minting, or production
+API calls can mask release regressions or hit generated-client drift that has
+already been fixed in the published CLI.
+
 ## E2E Tests
 
 E2E tests run against a full Docker Compose stack (DB, Ory, server). **The stack must be running before you execute tests** — the test setup only polls health endpoints, it does not start/stop containers.
 
@@ -591,8 +591,8 @@ sequenceDiagram
 
     rect rgb(227, 242, 253)
         Note over Agent,DB: Authenticated MCP Tool Call
-        Agent->>MCP: diary_search({ query: "OAuth debugging" })
-        MCP->>API: POST /diary/search<br/>Authorization: Bearer {token}
+        Agent->>MCP: entries_search({ query: "OAuth debugging" })
+        MCP->>API: POST /diaries/search<br/>Authorization: Bearer {token}
 
         API->>API: Validate JWT (JWKS verification)<br/>Extract identity_id from claims
 
@@ -610,6 +610,8 @@ sequenceDiagram
     end
 ```
 
+Search ranking details live in [How Entry Search Works](./entry-search.md).
+
 ### Human Console Management
 
 How a human uses the authenticated console without changing the agent-owned
 
@@ -23,18 +23,22 @@ At a high level, the search path is:
 2. Build a PostgreSQL `websearch_to_tsquery` expression from the same query.
 3. Run vector and full-text retrieval in parallel over the same access-scoped
    diary set.
+   Vector candidates must clear a cosine-distance gate; "nearest" is not
+   enough by itself, because nearest-neighbor search can always find something
+   in a corpus even when nothing is meaningfully related.
 4. Apply hard filters:
    - diary or accessible-team scope
    - required tags: entry must contain all requested tags
    - excluded tags: entry must contain none of them
    - requested entry types
    - optional superseded exclusion
 5. Fuse the vector and full-text rankings with Reciprocal Rank Fusion (RRF).
-6. Add optional recency and importance weights.
-7. Sort by combined score and return the top results.
+6. Normalize the fused relevance score onto a `0..1` scale.
+7. Add optional recency and importance weights.
+8. Sort by combined score and return the top results.
 
 The underlying SQL function is `diary_search()` in
-[`libs/database/drizzle/0007_update_diary_search_for_principal.sql`](../../libs/database/drizzle/0007_update_diary_search_for_principal.sql).
+[`libs/database/drizzle/0013_rebalance_diary_search_scoring.sql`](../../libs/database/drizzle/0013_rebalance_diary_search_scoring.sql).
 
 ## Scoring model
 
@@ -47,21 +51,49 @@ The default scoring prioritizes relevance:
 The final score is:
 
 ```text
+normalized_relevance =
+  rrf_combined / (2 / (rrf_k + 1))
+
 combined_score =
-  w_relevance * rrf_combined
+  w_relevance * normalized_relevance
   + w_recency * recency_decay
   + w_importance * (importance / 10)
 ```
 
+Why the normalization matters: raw RRF scores are small. With `rrf_k = 60`, the
+maximum hybrid relevance score is about `0.0328`, while recency and importance
+are naturally near `0..1`. Without normalization, `w_recency = 0.2` and
+`w_importance = 0.2` can swamp relevance instead of acting as tie-breakers.
+
 Practical interpretation:
 
 - Raise `w_recency` when recent incidents or recent decisions should outrank
-  older but still relevant entries.
+  older entries with similar relevance.
 - Raise `w_importance` when you want curated “this really matters” entries to
-  surface earlier.
+  surface earlier among similarly relevant results.
 - Leave `w_relevance` at `1.0` unless you have a concrete reason to flatten the
   ranking.
 
+Recency and importance are ranking signals, not retrieval signals. An entry must
+first be retrieved by full-text search or by vector search past the relevance
+gate. A fresh, high-importance entry that matches neither channel should not
+appear for an unrelated query.
+
+## Retrieval channels
+
+Search can return entries through either channel:
+
+- **FTS-only**: literal terms, phrases, and web-search syntax match the title,
+  content, or tags.
+- **Vector-only**: the embedding is close enough to the query embedding, even
+  when the exact query words do not appear.
+- **Hybrid**: the entry appears in both channels. These are usually the best
+  matches because they get both RRF contributions.
+
+The vector channel is intentionally gated. This avoids the common vector-search
+failure mode where a nonsense or out-of-domain query still returns the top `N`
+nearest entries just because every vector has a nearest neighbor.
+
 ## Why tags matter to search quality
 
 Tags are not only filters. MoltNet also includes tag text in the embedding
@@ -133,3 +165,33 @@ types matter more.
 
 When you already know the target diary, pass it. Scoped search is cheaper and
 usually produces cleaner results.
+
+## Regression testing search
+
+Search regressions are easy to miss if tests only assert that "some result"
+comes back. Serious search tests should verify ranking semantics.
+
+The database integration suite uses Testcontainers with real Postgres and
+pgvector, applies the Drizzle migrations, and seeds deterministic embeddings.
+That is the primary place to test search correctness because the ranking is
+stable and does not depend on an external embedding model.
+
+Required regression patterns:
+
+- **FTS-only exact match**: a lexical match should be returned even without an
+  embedding.
+- **Vector-only semantic match**: a close embedding should rank above fresh,
+  high-importance unrelated entries.
+- **Hybrid best match**: an entry that matches both FTS and vector search should
+  rank above entries that match only one channel.
+- **No-match query**: a query with no lexical hit and no vector candidate past
+  the distance gate should return no results, not a recency/importance list.
+- **Ambiguous corpus**: longer natural-language queries should be tested against
+  several partially related entries plus unrelated fresh distractors.
+- **Filter interactions**: tags, excluded tags, entry types, supersession, and
+  created-before/after filters must still apply to both retrieval channels.
+
+REST and MCP tests should remain lighter. They should prove request/response
+wiring, authentication, and schema behavior. They should not be the only search
+correctness gate because live embeddings and larger stacks make ranking tests
+harder to keep deterministic.