feat(search): semantic search MCP tools + tiered context mode (B-005 Phase 2) #124
Merged
George-iam merged 3 commits into main on Apr 29, 2026
Adds three new MCP tools and an opt-in tiered context-loading mode:
axme_get_memory(slug) - full body of one memory
axme_get_decision(id_or_slug) - full body of one decision
axme_search_kb(query, type?, k?) - semantic search across both
These let agents working with KBs of more than 100 entries navigate without
loading every body at session start. They are available in both modes (full
and search), so they also serve as fuzzy-lookup tools mid-session.
## Two context modes (config.context.mode)
- full - default. Every memory and decision body is loaded into agent context at session start. Existing behaviour, zero breakage.
- search - only a catalog (title + 1-line description, prefixed with [type/enforce]) is loaded. Bodies are fetched on demand via the three new tools.
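To make the catalog idea concrete, here is a minimal sketch of how one catalog line could be rendered. The exact output format is not specified in this PR description, so everything below is an assumption for illustration: the `CatalogEntry` shape, the `formatCatalogLine` name, the ` - ` separator, and the sample label values are all hypothetical.

```typescript
// Hypothetical entry shape; the real KB types in src/types.ts may differ.
interface CatalogEntry {
  type: string; // e.g. "memory" or "decision" (assumed values)
  enforce?: string; // optional enforcement label (assumed to be optional)
  title: string;
  description: string;
}

// Render one catalog line: "[type/enforce] title - one-line description".
// Whitespace in the description is collapsed so each entry stays on one line.
function formatCatalogLine(entry: CatalogEntry): string {
  const label = entry.enforce
    ? `[${entry.type}/${entry.enforce}]`
    : `[${entry.type}]`;
  const oneLine = entry.description.replace(/\s+/g, " ").trim();
  return `${label} ${entry.title} - ${oneLine}`;
}
```

The point of the sketch is only that a catalog line carries enough to decide whether a body is worth fetching, while the body itself stays out of the session-start context.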
Switching is user-driven only - the agent never decides:
- env var: AXME_CONTEXT_MODE=search claude (one-off)
- CLI: axme-code config set context.mode search
- manual: edit .axme-code/config.yaml
Default config.yaml now ships a commented hint explaining when to switch.
When the KB exceeds 100 entries, the axme_context output also adds a soft
suggestion to switch, without changing behaviour.
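The three switching paths above imply some resolution order between the environment variable and the config file. A minimal sketch follows, assuming the env var wins over config.yaml and that unrecognised values fall through to the next source. Both of those rules are assumptions, not confirmed behaviour, and `resolveContextMode` is a hypothetical helper name; only the `ContextMode` type name and the `AXME_CONTEXT_MODE` variable come from this PR.

```typescript
type ContextMode = "full" | "search";

// Type guard: true only for the two supported mode strings.
function isContextMode(value: unknown): value is ContextMode {
  return value === "full" || value === "search";
}

// Assumed precedence: valid env override, then config value, then "full".
function resolveContextMode(
  env: Record<string, string | undefined>,
  configMode?: string,
): ContextMode {
  const override = env["AXME_CONTEXT_MODE"];
  if (isContextMode(override)) return override;
  if (isContextMode(configMode)) return configMode;
  return "full";
}
```

Passing the environment in as a plain record (rather than reading `process.env` inside the function) keeps the helper easy to unit-test.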
## Lazy install of @huggingface/transformers
The semantic-search runtime (~100MB node_modules + ~30MB MiniLM ONNX
model cached at ~/.cache/huggingface/) is NOT bundled. It's installed on
demand into ~/.local/share/axme-code/runtime/ by:
axme-code config set context.mode search
That command atomically:
1. npm install --prefix runtime @huggingface/transformers@^4.0.1
2. Reindex every memory + decision into .axme-code/_index/embeddings.json
3. Write context.mode = search
If any step fails, config rolls back to full and an error is printed.
The user is never left in a half-broken state.
`axme-code reindex` is also exposed for manual rebuilds (e.g. after
hand-editing .md files).
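The install flow above (run the steps in order, roll back on the first failure) can be sketched as follows. This is an illustration of the pattern, not the contents of search-install.ts: `runWithRollback`, `Step`, and `FlowResult` are hypothetical names, and the steps are injected callbacks rather than real npm or reindex calls.

```typescript
// One named unit of work in the flow (hypothetical shape).
interface Step {
  name: string;
  run: () => Promise<void>;
}

interface FlowResult {
  ok: boolean;
  failedStep?: string;
  error?: string;
}

// Run steps sequentially. On the first failure, call rollback, skip the
// remaining steps, and report which step failed.
async function runWithRollback(
  steps: Step[],
  rollback: () => Promise<void>,
): Promise<FlowResult> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      await rollback();
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStep: step.name, error: message };
    }
  }
  return { ok: true };
}
```

In this sketch the rollback callback would restore `context.mode` to `full`; placing the config write last means a failed install or reindex never leaves `search` written without a usable index.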
## Embedding strategy
- Brute-force cosine similarity over Float32 vectors (no HNSW). At <=1000 vectors a full scan takes
<10ms; HNSW only matters at 10K+. Avoids native bindings for the index itself.
- Synchronous embed on every axme_save_memory / axme_save_decision when
search mode is active. ~50-200ms per save once the embedder is warm;
acceptable cost for "search results are immediately consistent".
- Skips silently when mode != search OR runtime missing - never blocks a
save on something the user opted out of.
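For readers unfamiliar with the brute-force approach, the following is a minimal sketch of cosine scoring plus a linear-scan top-k. The function names `cosine` and `topK` echo the helpers listed for embeddings.ts, but the signatures and the `IndexedEntry` / `Hit` shapes here are assumptions for illustration, not the actual implementation.

```typescript
// Hypothetical shapes; the real embeddings.ts may store entries differently.
interface IndexedEntry {
  id: string;
  vector: Float32Array;
}

interface Hit {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every entry against the query, sort descending, keep the k best.
function topK(query: Float32Array, entries: IndexedEntry[], k: number): Hit[] {
  return entries
    .map((e) => ({ id: e.id, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}
```

A full sort is O(n log n), which is negligible at the KB sizes discussed here; a partial selection would only be worth the complexity at much larger scales.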
## New files
- src/storage/embeddings.ts - core: cosine, topK, JSON load/save, loadEmbedder lazy loader, mtime staleness, embedKbEntry helper.
- src/tools/kb-search.ts - 3 MCP handlers (get_memory, get_decision, search_kb) with friendly fallbacks.
- src/tools/search-install.ts - installTransformers + reindexAll + runConfigSetSearch atomic flow.
- test/embeddings.test.ts - cosine math + topK ranking + JSON round-trip + staleness + runtime detect.
- test/kb-search.test.ts - 3 handlers' fallback paths + format round-trip.
## Modified files
- src/types.ts - ContextMode type + ProjectConfig.contextMode ("full" default).
- src/storage/config.ts - nested context.mode parse/format.
- src/server.ts - 3 new tool registrations; saveMemory/saveDecision handlers now await embedKbEntry post-save.
- src/tools/context.ts - branch on contextMode: full mode unchanged; search mode emits catalog (title+desc+labels) plus MUST instructions and soft KB-size hint.
- src/cli.ts - `config get/set <key> [<value>]` and `reindex` subcommands.
## Verification
- npm test 536/536 pass (was 511; +25 new tests for embeddings + kb-search)
- npx tsc --noEmit clean
- npm run build clean
Phase 1 (multi-client docs) is open separately as PR #123; both PRs will be
merged together after the combined E2E pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the descriptive "consider search mode" hint with an explicit MUST-style instruction: the agent surfaces the option to the user in its first response, asks whether to run the install command itself or let the user run it, and waits for an explicit decision before continuing the original task. Matches the existing instruction style used elsewhere (TRUNCATED OUTPUT RULE in server.ts, Pending Audits Check in CLAUDE.md). The agent never switches the mode without explicit user confirmation, and it does not nag again in the same session if the user declines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes for `axme-code config set context.mode search` on Windows:

1. **pathToFileURL on dynamic import.** `await import(absolutePath)` throws on Windows because Node's ESM loader parses the drive letter of a raw `C:\...` path as a URL scheme and rejects it. Wrapping the resolved path with `pathToFileURL(...).href` produces a `file:///C:/...` URL that imports cleanly on every platform.
2. **Actionable error when onnxruntime-node native binding fails to load.** Most common Windows failure: `onnxruntime_binding.node` throws "specified module could not be found" because Microsoft Visual C++ Redistributable is missing. We now detect that signature and print a clear hint pointing to the redist installer plus the retry command, instead of returning null silently and leaving the user to chase a generic "module not loaded" message from search-install.ts.

Verified on Azure Win11 native:
- Before VC++ Redist install: error message correctly directs the user to https://aka.ms/vs/17/release/vc_redist.x64.exe.
- After VC++ Redist install: `axme-code config set context.mode search` succeeds end-to-end (31/31 entries indexed in 5s, embeddings.json written, config.yaml updated).
- `claude --print` session calls axme_context (renders search-mode catalog) and axme_search_kb (top hits semantically correct: D-004 + D-002 for "git push force safety" query).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
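The dynamic-import fix described in this commit can be sketched as below. `pathToFileURL` and `resolve` are standard Node APIs; the wrapper names `toImportSpecifier` and `importFromPath` are hypothetical and are not taken from the codebase.

```typescript
import { pathToFileURL } from "node:url";
import { resolve } from "node:path";

// Convert a filesystem path into a file:// URL string that import() accepts
// on every platform, including Windows drive-letter paths.
function toImportSpecifier(modulePath: string): string {
  return pathToFileURL(resolve(modulePath)).href;
}

// Dynamically import a module by filesystem path instead of by URL.
async function importFromPath(modulePath: string): Promise<unknown> {
  return import(toImportSpecifier(modulePath));
}
```

On POSIX systems the conversion is nearly a no-op (`/opt/runtime/index.mjs` becomes `file:///opt/runtime/index.mjs`), so applying it unconditionally is simpler than branching on the platform.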
## Summary
Closes B-005 Phase 2. Adds three new MCP tools plus an opt-in `search` context mode for KBs with more than 100 entries.

These work in both modes. The `search` mode also changes what `axme_context` emits at session start: a compact catalog (title + 1-line desc + `[type/enforce]` labels) instead of every body. Token cost drops ~10x at session start.

## Two context modes
- `full` (default)
- `search` (opt-in)

User-driven switching (agent never decides):
- env var: `AXME_CONTEXT_MODE=search claude` (one-off)
- CLI: `axme-code config set context.mode search`
- manual: edit `.axme-code/config.yaml`

When the user is on `full` and the KB exceeds 100 entries, `axme_context` adds a soft hint suggesting the switch; it is non-blocking and purely informational.

## Lazy install of @huggingface/transformers
Semantic-search runtime (~100MB node_modules + ~30MB MiniLM ONNX model cached at `~/.cache/huggingface/`) is not bundled. It installs into `~/.local/share/axme-code/runtime/` on opt-in via `axme-code config set context.mode search`.

That CLI command runs atomically:
1. `npm install --prefix runtime @huggingface/transformers@^4.0.1`
2. Reindex every memory + decision into `.axme-code/_index/embeddings.json`
3. Write `context.mode = search`

If any step fails, config rolls back to `full`, an error is printed, and the user is never left in a half-broken state. `axme-code reindex` is also exposed for manual rebuilds.

## Embedding strategy
- Synchronous embed on every `axme_save_memory` / `axme_save_decision` when search mode is active. ~50-200ms per save once embedder is warm.

## What's verified locally
- `npm test` - 536/536 pass (was 511; +25 new tests across `embeddings.test.ts` + `kb-search.test.ts`)
- `npx tsc --noEmit` - clean
- `npm run build` - clean

## What's NOT yet verified (needs combined E2E with PR #123)
- `axme-code config set context.mode search` end-to-end (npm install + reindex + config write atomicity, including rollback on failure).
- `axme_search_kb` returning real semantic hits over a populated KB.

These are scheduled for the combined E2E pass (Linux locally + Windows VM) before merge.

## Files
New (5 files, ~750 LOC):
- `src/storage/embeddings.ts` - cosine, topK, JSON load/save, lazy embedder loader, mtime staleness, embedKbEntry helper
- `src/tools/kb-search.ts` - 3 MCP handlers with friendly fallbacks
- `src/tools/search-install.ts` - installTransformers + reindexAll + atomic runConfigSetSearch
- `test/embeddings.test.ts`, `test/kb-search.test.ts`

Modified (5 files):
- `src/types.ts` - `ContextMode` type + `ProjectConfig.contextMode`
- `src/storage/config.ts` - nested `context.mode` parse/format
- `src/server.ts` - 3 tool registrations + post-save embed hooks
- `src/tools/context.ts` - search-mode branch (catalog + MUST instructions) + >100-entry soft hint
- `src/cli.ts` - `config get/set` + `reindex` subcommands

## Test plan
- `npx tsc --noEmit` clean
- `npm run build` clean
- `axme-code config set context.mode search` install + reindex + `axme_search_kb` returns hits
- `axme_context` output, `axme_get_memory` fetches body
- `embedKbEntry` skip logic is correct (mode != search → no-op)

🤖 Generated with Claude Code