Commit 33bd9c5

MCP: wire v0.3 predict tools (omc_predict + omc_corpus_size)

v0.3 shipped the substrate-indexed completion engine as an OMC builtin, but exposing it over MCP is what makes it useful to LLM clients (Claude Desktop, Cursor, the MCP server in this very session).

## New MCP tools

- `omc_predict(paths, prefix, top_k?)`: wraps predict_continuations end-to-end. Returns a JSON payload with a prefix echo, corpus_size, top_k, and a `suggestions` array. Each suggestion carries fn_name, source (full body), file, canonical_hash, attractor, prefix_match_len, substrate_distance, and query_attractor. top_k is clamped to [1, 50] so a misconfigured client can't grab the whole corpus.
- `omc_corpus_size(paths)`: diagnostic. Echoes the paths and returns fn_count for them. Used to verify file resolution before a larger predict call.

## Implementation

- Both handlers share a `parse_paths_arg` helper for the array-of-strings validation pattern and a `build_corpus` helper for the read-and-ingest-each-file pattern. I/O errors surface as MCP-style `isError: true` strings, not panics.
- predict_continuations is called directly from main.rs via `omnimcode_core::predict::{CodeCorpus, predict_continuations}`: no eval_program detour, no display_value formatting. The structured JSON output matches what `omc_help` and others do.

## Tests (first MCP tests in the crate)

8 integration tests in tests/integration.rs spawn the binary and exercise JSON-RPC over stdio:

- initialize returns server info
- tools/list includes both new tools
- omc_corpus_size ingests Prometheus (>30 fns)
- omc_predict on `fn prom_linear_` returns forward/new/params (the test requires at least 3 hits) with provenance fields populated
- top_k caps results
- missing 'paths' arg → friendly error string with the tool name
- unreadable path → friendly error naming the path
- unknown tool name → isError: true with name in the message

Final: 231 Rust pass (was 223 + 8 new integration), 1087/1087 OMC.

## End-to-end verification

    $ {echo init; echo tools/list; echo predict-call} | omnimcode-mcp
    initialize: server=omnimcode-mcp
    tools/list: 9 tools (was 7), predict present: True
    omc_corpus_size: 70 fns
    omc_predict (prefix='fn prom_linear_'):
      prom_linear_forward prefix_len=24 dist=1.37e+18
      prom_linear_new prefix_len=24 dist=2.44e+18
      prom_linear_params prefix_len=24 dist=5.51e+18

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

1 parent fe87baf commit 33bd9c5
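
The commit message says handler failures reach the client as `isError: true` text rather than panics. The code that wraps a handler's `Result<String, String>` into the MCP response is not part of this diff, so the following is only a minimal sketch of that idea: `ToolResult`, `to_tool_result`, and `read_source` are hypothetical names, and the struct is a simplified stand-in for the real JSON result.

```rust
/// Hypothetical, simplified stand-in for an MCP tool-call result.
/// The real server emits JSON; only the two fields the tests inspect are modelled.
#[derive(Debug, PartialEq)]
struct ToolResult {
    is_error: bool,
    text: String,
}

/// Map a handler outcome onto the result shape: Ok text passes through,
/// Err text is kept verbatim and flagged as an error.
fn to_tool_result(outcome: Result<String, String>) -> ToolResult {
    match outcome {
        Ok(text) => ToolResult { is_error: false, text },
        Err(text) => ToolResult { is_error: true, text },
    }
}

/// Stand-in for a handler step that reads a file and turns an I/O failure
/// into a message naming the path, instead of unwrapping and panicking.
fn read_source(path: &str) -> Result<String, String> {
    std::fs::read_to_string(path).map_err(|e| format!("omc_predict: read {}: {}", path, e))
}
```

Under these assumptions, `to_tool_result(read_source("/nonexistent/path/does/not/exist.omc"))` yields a result with `is_error` set and a message that names the unreadable path, which is the behaviour the integration tests check for.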

3 files changed

Lines changed: 344 additions & 1 deletion

omnimcode-mcp/README.md

Lines changed: 11 additions & 1 deletion
@@ -16,8 +16,18 @@ needs to discover the language at runtime.
 - `omc_categories()` — list builtin categories
 - `omc_unique_builtins()` — OMC-only primitives (no NumPy equivalent)
 - `omc_explain_error(message)` — pattern-match an error against the
-  259-entry knowledge base; returns explanation + cause + fix
+  curated knowledge base; returns explanation + cause + fix
 - `omc_did_you_mean(name)` — typo suggestions over the known surface
+- **`omc_predict(paths, prefix, top_k?)`** — substrate-indexed code
+  completion ([v0.3 chapter](https://github.com/RandomCoder-lab/OMC/releases/tag/v0.3-symbolic-prediction)).
+  Given a partial OMC prefix (e.g. `fn prom_linear_`), returns the
+  top-k ranked continuations from a content-addressed corpus. Each
+  suggestion carries the full source, file path, canonical hash,
+  prefix-match depth, and substrate distance — branching is
+  first-class.
+- **`omc_corpus_size(paths)`** — diagnostic: how many top-level fns
+  resolve across a list of OMC files. Use to verify paths before a
+  predict call.
 
 ## Build
 
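
The README entry describes "top-k ranked continuations" that carry a prefix-match depth and a substrate distance, but the ranking itself lives in `omnimcode_core::predict` and is not shown in this commit. The sketch below is therefore an assumption, not the engine's actual rule: it orders candidates by deeper prefix match first and smaller substrate distance second, which happens to agree with the order printed in the commit's verification output. `Suggestion` here is a trimmed, hypothetical struct, and the sample values (including the `prom_relu` entry) are illustrative.

```rust
/// Illustrative subset of the suggestion fields named in the README.
#[derive(Debug, Clone, PartialEq)]
struct Suggestion {
    fn_name: String,
    prefix_match_len: usize,
    substrate_distance: i64,
}

/// ASSUMED ordering (not confirmed by the source): longer prefix match wins,
/// ties are broken by the smaller substrate distance, then keep the first top_k.
fn rank_top_k(mut candidates: Vec<Suggestion>, top_k: usize) -> Vec<Suggestion> {
    candidates.sort_by(|a, b| {
        b.prefix_match_len
            .cmp(&a.prefix_match_len)
            .then(a.substrate_distance.cmp(&b.substrate_distance))
    });
    candidates.truncate(top_k);
    candidates
}

/// Sample data: the three prom_linear_ rows round the figures from the
/// commit's verification output; prom_relu is a made-up weaker match.
fn sample_candidates() -> Vec<Suggestion> {
    let make = |name: &str, len: usize, dist: i64| Suggestion {
        fn_name: name.to_string(),
        prefix_match_len: len,
        substrate_distance: dist,
    };
    vec![
        make("prom_linear_params", 24, 5_510_000_000_000_000_000),
        make("prom_relu", 2, 10),
        make("prom_linear_new", 24, 2_440_000_000_000_000_000),
        make("prom_linear_forward", 24, 1_370_000_000_000_000_000),
    ]
}
```

With this sample, `rank_top_k(sample_candidates(), 2)` keeps `prom_linear_forward` and `prom_linear_new`; the weaker `prom_relu` match sorts last even though its distance is the smallest, because prefix depth is compared first in this sketch.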

omnimcode-mcp/src/main.rs

Lines changed: 114 additions & 0 deletions
@@ -28,6 +28,7 @@ use omnimcode_core::docs;
 use omnimcode_core::errors;
 use omnimcode_core::interpreter::Interpreter;
 use omnimcode_core::parser::Parser;
+use omnimcode_core::predict::{CodeCorpus, predict_continuations};
 use omnimcode_core::value::Value;
 
 #[derive(Debug, Deserialize)]
@@ -212,6 +213,55 @@ fn list_tools() -> Vec<Json> {
                 "required": ["name"]
             }
         }),
+        json!({
+            "name": "omc_predict",
+            "description": "Substrate-indexed code completion. Given a partial OMC code prefix \
+                            (e.g. `fn prom_linear_`), return the top-k ranked continuations from \
+                            a content-addressed corpus of OMC files. Each result is a viable \
+                            branch: it carries the full source of the matching fn, its file \
+                            path, canonical hash, prefix-match depth, and substrate distance. \
+                            Use to find similar fns when authoring code, to navigate a corpus \
+                            without grepping, or to surface stable callable shapes that an LLM \
+                            can adapt rather than invent from scratch.",
+            "inputSchema": {
+                "type": "object",
+                "properties": {
+                    "paths": {
+                        "type": "array",
+                        "items": { "type": "string" },
+                        "description": "Source file paths to ingest. Top-level fns from each file are added to the corpus."
+                    },
+                    "prefix": {
+                        "type": "string",
+                        "description": "Partial OMC source (e.g. `fn prom_linear_`). May be incomplete."
+                    },
+                    "top_k": {
+                        "type": "integer",
+                        "minimum": 1,
+                        "default": 5,
+                        "description": "Number of ranked continuations to return."
+                    }
+                },
+                "required": ["paths", "prefix"]
+            }
+        }),
+        json!({
+            "name": "omc_corpus_size",
+            "description": "Diagnostic: report how many top-level fns are ingested across a list \
+                            of OMC source paths. Useful for verifying paths resolve before \
+                            building a larger predict query.",
+            "inputSchema": {
+                "type": "object",
+                "properties": {
+                    "paths": {
+                        "type": "array",
+                        "items": { "type": "string" },
+                        "description": "Source file paths to ingest."
+                    }
+                },
+                "required": ["paths"]
+            }
+        }),
     ]
 }
 
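
The `top_k` schema above declares a minimum of 1 and a default of 5, while the commit message adds that the handler also caps the value at 50. A minimal sketch of that resolution step, using only the standard library, could look like this; `resolve_top_k` and the two constants are illustrative names, not identifiers from the crate.

```rust
/// Illustrative constants mirroring the values stated in the commit:
/// default 5 when the argument is absent, hard cap of 50.
const DEFAULT_TOP_K: i64 = 5;
const MAX_TOP_K: i64 = 50;

/// Turn an optional client-supplied `top_k` into a safe result count.
/// Missing values fall back to the default; anything outside [1, 50],
/// including zero and negatives, is pulled back into the range.
fn resolve_top_k(raw: Option<i64>) -> usize {
    raw.unwrap_or(DEFAULT_TOP_K).clamp(1, MAX_TOP_K) as usize
}
```

Clamping before the `as usize` cast matters: a negative `i64` cast directly to `usize` would wrap to a huge number, whereas clamping first guarantees the cast only ever sees a value between 1 and 50. Note that the schema as written advertises the lower bound but not the upper one, so a client learns about the cap only from the `top_k` echoed in the response.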

@@ -284,10 +334,74 @@ fn dispatch_tool(interp: &mut Interpreter, name: &str, args: &Json) -> Result<St
             let suggestions = docs::did_you_mean(name, 5);
             Ok(serde_json::to_string_pretty(&json!(suggestions)).unwrap())
         }
+        "omc_predict" => {
+            let paths = parse_paths_arg(args, "omc_predict")?;
+            let prefix = args.get("prefix").and_then(Json::as_str)
+                .ok_or_else(|| "omc_predict: missing 'prefix' arg".to_string())?;
+            // top_k optional, defaults to 5. Clamp to [1, 50] so a
+            // misconfigured client can't ask for the entire corpus.
+            let top_k = args.get("top_k").and_then(Json::as_i64)
+                .unwrap_or(5)
+                .clamp(1, 50) as usize;
+            let corpus = build_corpus(&paths)?;
+            let suggestions = predict_continuations(&corpus, prefix, top_k);
+            let payload = json!({
+                "prefix": prefix,
+                "corpus_size": corpus.len(),
+                "top_k": top_k,
+                "suggestions": suggestions.iter().map(|s| json!({
+                    "fn_name": s.fn_name,
+                    "source": s.source,
+                    "file": s.file,
+                    "canonical_hash": s.canonical_hash,
+                    "attractor": s.attractor,
+                    "prefix_match_len": s.prefix_match_len,
+                    "substrate_distance": s.substrate_distance,
+                    "query_attractor": s.query_attractor,
+                })).collect::<Vec<_>>(),
+            });
+            Ok(serde_json::to_string_pretty(&payload).unwrap())
+        }
+        "omc_corpus_size" => {
+            let paths = parse_paths_arg(args, "omc_corpus_size")?;
+            let corpus = build_corpus(&paths)?;
+            let payload = json!({
+                "paths": paths,
+                "fn_count": corpus.len(),
+            });
+            Ok(serde_json::to_string_pretty(&payload).unwrap())
+        }
         _ => Err(format!("Unknown tool: {}", name)),
     }
 }
 
+/// Extract a `paths` array argument from a tool's JSON args. Used by
+/// both omc_predict and omc_corpus_size — same shape, same validation.
+fn parse_paths_arg(args: &Json, tool: &str) -> Result<Vec<String>, String> {
+    let paths_val = args.get("paths")
+        .ok_or_else(|| format!("{}: missing 'paths' arg", tool))?;
+    let arr = paths_val.as_array()
+        .ok_or_else(|| format!("{}: 'paths' must be an array of strings", tool))?;
+    arr.iter()
+        .map(|v| v.as_str()
+            .ok_or_else(|| format!("{}: every 'paths' entry must be a string", tool))
+            .map(|s| s.to_string()))
+        .collect()
+}
+
+/// Build a CodeCorpus by reading + ingesting every file in `paths`.
+/// Surface I/O errors as MCP-style strings so the client sees a clean
+/// `isError: true` text instead of a panic.
+fn build_corpus(paths: &[String]) -> Result<CodeCorpus, String> {
+    let mut corpus = CodeCorpus::new();
+    for path in paths {
+        let src = std::fs::read_to_string(path)
+            .map_err(|e| format!("omc_predict: read {}: {}", path, e))?;
+        corpus.ingest_file(path, &src);
+    }
+    Ok(corpus)
+}
+
 /// Evaluate an OMC program. Errors come back as structured strings
 /// (the MCP client sees isError=true alongside the text). Each
 /// tools/call uses a fresh interpreter to avoid state bleed.
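
`parse_paths_arg` relies on a standard-library idiom that is easy to miss: collecting an iterator of `Result<String, String>` into a `Result<Vec<String>, String>` stops at the first `Err` and returns it. The sketch below isolates that pattern without `serde_json`; the `Arg` enum and `paths_from` are hypothetical stand-ins for `serde_json::Value` and the real helper, used only so the example compiles on its own.

```rust
/// Hypothetical stand-in for a JSON value, just rich enough for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Str(String),
    Num(i64),
    List(Vec<Arg>),
}

/// Validate an optional `paths` argument as a list of strings.
/// Each failure mode gets its own message, prefixed with the tool name.
fn paths_from(arg: Option<&Arg>, tool: &str) -> Result<Vec<String>, String> {
    let value = arg.ok_or_else(|| format!("{}: missing 'paths' arg", tool))?;
    let items = match value {
        Arg::List(items) => items,
        _ => return Err(format!("{}: 'paths' must be an array of strings", tool)),
    };
    items
        .iter()
        .map(|item| match item {
            Arg::Str(s) => Ok(s.clone()),
            _ => Err(format!("{}: every 'paths' entry must be a string", tool)),
        })
        // Collecting into Result<Vec<_>, _> short-circuits on the first Err.
        .collect()
}
```

Passing the tool name in keeps the messages attributable when two handlers share one validator. By contrast, `build_corpus` in the diff above hard-codes the `omc_predict:` prefix, so a read failure triggered through `omc_corpus_size` is reported under the other tool's name.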

omnimcode-mcp/tests/integration.rs

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
+//! End-to-end MCP protocol tests.
+//!
+//! Spawns the binary, talks JSON-RPC over stdio, asserts on the
+//! responses. Covers the full request → handler → response path
+//! including JSON parsing and protocol-level errors.
+//!
+//! Why integration rather than unit tests: the crate is bin-only, so
+//! handler functions aren't reachable from a unit-test module. This
+//! also exercises the actual protocol path a real LLM client would use.
+
+use std::io::{BufRead, BufReader, Write};
+use std::path::PathBuf;
+use std::process::{Command, Stdio};
+
+use serde_json::{json, Value};
+
+/// Find the built `omnimcode-mcp` binary relative to the test
+/// executable's path (target/release/deps/integration-XXX or
+/// target/debug/deps/integration-XXX → target/{profile}/omnimcode-mcp).
+fn find_binary() -> PathBuf {
+    let exe = std::env::current_exe().expect("current_exe");
+    // exe is in target/<profile>/deps/integration-<hash>
+    // walk up to target/<profile>/
+    let target_profile_dir = exe.parent().unwrap().parent().unwrap();
+    let bin = target_profile_dir.join("omnimcode-mcp");
+    assert!(
+        bin.exists(),
+        "binary not found at {} — rebuild with `cargo build -p omnimcode-mcp`",
+        bin.display()
+    );
+    bin
+}
+
+/// Find the OMC repo root so test fixtures (`examples/lib/prometheus.omc`)
+/// can be referenced by relative path. CARGO_MANIFEST_DIR points at the
+/// crate dir; the repo root is one up.
+fn repo_root() -> PathBuf {
+    PathBuf::from(env!("CARGO_MANIFEST_DIR")).parent().unwrap().to_path_buf()
+}
+
+/// Send a sequence of JSON-RPC request strings to the binary, return
+/// the parsed response Values in order. Runs the binary fresh, sets cwd
+/// to the OMC repo root so file-path arguments resolve.
+fn rpc_exchange(requests: &[Value]) -> Vec<Value> {
+    let bin = find_binary();
+    let mut child = Command::new(bin)
+        .current_dir(repo_root())
+        .stdin(Stdio::piped())
+        .stdout(Stdio::piped())
+        .stderr(Stdio::null())
+        .spawn()
+        .expect("spawn mcp server");
+    let mut stdin = child.stdin.take().expect("stdin");
+    let stdout = child.stdout.take().expect("stdout");
+    for r in requests {
+        writeln!(stdin, "{}", r).expect("write");
+    }
+    drop(stdin); // closes the server's stdin → it'll exit after replying
+    let reader = BufReader::new(stdout);
+    let mut responses = Vec::new();
+    for line in reader.lines() {
+        let line = line.expect("read");
+        if line.trim().is_empty() { continue; }
+        let v: Value = serde_json::from_str(&line)
+            .unwrap_or_else(|e| panic!("parse {}: {}", line, e));
+        responses.push(v);
+    }
+    let _ = child.wait();
+    responses
+}
+
+#[test]
+fn initialize_returns_server_info() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+    ]);
+    assert_eq!(responses.len(), 1);
+    let r = &responses[0];
+    assert_eq!(r["id"], 1);
+    assert_eq!(r["result"]["serverInfo"]["name"], "omnimcode-mcp");
+}
+
+#[test]
+fn tools_list_includes_predict_tools() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}),
+    ]);
+    let tools = &responses[1]["result"]["tools"];
+    let names: Vec<&str> = tools.as_array().unwrap()
+        .iter()
+        .map(|t| t["name"].as_str().unwrap())
+        .collect();
+    assert!(names.contains(&"omc_predict"), "predict tool present: {:?}", names);
+    assert!(names.contains(&"omc_corpus_size"), "corpus_size present: {:?}", names);
+    // Pre-existing tools still there too.
+    assert!(names.contains(&"omc_eval"));
+    assert!(names.contains(&"omc_help"));
+}
+
+#[test]
+fn omc_corpus_size_ingests_prometheus() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_corpus_size",
+            "arguments":{"paths":["examples/lib/prometheus.omc"]}
+        }}),
+    ]);
+    let r = &responses[1];
+    assert_eq!(r["result"]["isError"], false, "should not be an error: {}", r);
+    let text = r["result"]["content"][0]["text"].as_str().unwrap();
+    let payload: Value = serde_json::from_str(text).unwrap();
+    // Prometheus has ~70 fns currently; lower bound is the only stable assertion.
+    let n = payload["fn_count"].as_i64().unwrap();
+    assert!(n > 30, "expected >30 fns, got {}", n);
+}
+
+#[test]
+fn omc_predict_ranks_prom_linear_prefix() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_predict",
+            "arguments":{
+                "paths":["examples/lib/prometheus.omc"],
+                "prefix":"fn prom_linear_",
+                "top_k":5
+            }
+        }}),
+    ]);
+    let r = &responses[1];
+    assert_eq!(r["result"]["isError"], false, "should not be an error: {}", r);
+    let text = r["result"]["content"][0]["text"].as_str().unwrap();
+    let payload: Value = serde_json::from_str(text).unwrap();
+    assert_eq!(payload["prefix"], "fn prom_linear_");
+    let suggestions = payload["suggestions"].as_array().unwrap();
+    assert!(suggestions.len() >= 3, "should have at least 3 hits for fn prom_linear_, got {}", suggestions.len());
+    let names: Vec<&str> = suggestions.iter()
+        .map(|s| s["fn_name"].as_str().unwrap())
+        .collect();
+    assert!(names.contains(&"prom_linear_new"), "missing prom_linear_new in {:?}", names);
+    assert!(names.contains(&"prom_linear_forward"), "missing prom_linear_forward in {:?}", names);
+    assert!(names.contains(&"prom_linear_params"), "missing prom_linear_params in {:?}", names);
+    // Each suggestion carries provenance fields.
+    let first = &suggestions[0];
+    assert!(first["source"].is_string(), "source field");
+    assert_eq!(first["file"], "examples/lib/prometheus.omc");
+    assert!(first["canonical_hash"].is_i64(), "canonical_hash field");
+    assert!(first["prefix_match_len"].as_i64().unwrap() > 0, "prefix matched some tokens");
+    assert!(first["substrate_distance"].as_i64().unwrap() >= 0);
+}
+
+#[test]
+fn omc_predict_top_k_caps_results() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_predict",
+            "arguments":{
+                "paths":["examples/lib/prometheus.omc"],
+                "prefix":"fn prom_",
+                "top_k":2
+            }
+        }}),
+    ]);
+    let text = responses[1]["result"]["content"][0]["text"].as_str().unwrap();
+    let payload: Value = serde_json::from_str(text).unwrap();
+    let suggestions = payload["suggestions"].as_array().unwrap();
+    assert!(suggestions.len() <= 2, "top_k=2 capped at 2, got {}", suggestions.len());
+}
+
+#[test]
+fn omc_predict_missing_paths_is_a_friendly_error() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_predict",
+            "arguments":{"prefix":"fn anything","top_k":3}
+        }}),
+    ]);
+    let r = &responses[1];
+    assert_eq!(r["result"]["isError"], true);
+    let text = r["result"]["content"][0]["text"].as_str().unwrap();
+    assert!(text.contains("missing 'paths'"), "error mentions missing paths: {}", text);
+}
+
+#[test]
+fn omc_predict_unreadable_path_is_friendly() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_predict",
+            "arguments":{
+                "paths":["/nonexistent/path/does/not/exist.omc"],
+                "prefix":"fn foo"
+            }
+        }}),
+    ]);
+    let r = &responses[1];
+    assert_eq!(r["result"]["isError"], true);
+    let text = r["result"]["content"][0]["text"].as_str().unwrap();
+    assert!(text.contains("read") && text.contains("nonexistent"),
+        "names the bad path: {}", text);
+}
+
+#[test]
+fn unknown_tool_returns_error_text() {
+    let responses = rpc_exchange(&[
+        json!({"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}),
+        json!({"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
+            "name":"omc_does_not_exist","arguments":{}
+        }}),
+    ]);
+    let r = &responses[1];
+    assert_eq!(r["result"]["isError"], true);
+    let text = r["result"]["content"][0]["text"].as_str().unwrap();
+    assert!(text.contains("Unknown tool"), "error mentions unknown tool: {}", text);
+}
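
`find_binary` in the test file locates the server by walking two directories up from the test executable (`target/<profile>/deps/integration-<hash>`) and joining the binary name. The path arithmetic can be checked in isolation with a small sketch; `sibling_binary` is a hypothetical helper that returns `None` instead of unwrapping, and it does not touch the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Given the path of a test executable under `<profile>/deps/`, compute
/// where a sibling binary named `bin_name` would live in `<profile>/`.
/// Returns None when the path is too shallow to have two parents.
fn sibling_binary(test_exe: &Path, bin_name: &str) -> Option<PathBuf> {
    // First parent strips the file name (-> .../deps),
    // second parent strips `deps` (-> .../<profile>).
    let profile_dir = test_exe.parent()?.parent()?;
    Some(profile_dir.join(bin_name))
}
```

For example, `sibling_binary(Path::new("target/debug/deps/integration-abc123"), "omnimcode-mcp")` resolves to `target/debug/omnimcode-mcp`, while a bare file name such as `integration` has no second parent and yields `None`. This layout is a Cargo convention rather than a documented guarantee, which is presumably why the test asserts that the binary exists before using it.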
