Commit ee385df

Merge pull request #18 from reactome/graph-schema-live
Fetch graph schema live via APOC, cached for the session (1.4.0)
2 parents 7111c9b + 3b05d27 commit ee385df

11 files changed

Lines changed: 441 additions & 60 deletions

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -2,6 +2,22 @@
 
 All notable changes to this project are documented here. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.4.0] — 2026-04-24
+
+### Changed
+- **`reactome_cypher_schema` and `reactome://graph/schema` now return rich APOC-level data.** The previous implementation used the sparse built-in `db.schema.*` (labels / rel types / property names only). This release pulls `apoc.meta.schema()`, `apoc.meta.stats()`, `apoc.meta.{node,rel}TypeProperties()`, `db.indexes()`, `db.constraints()`, and `dbms.components()` — so clients see **per-label node counts**, **relationship cardinalities**, **property types with mandatory flags**, indexes, and constraints. The markdown digest jumps from ~40 KB (sparse) to ~80 KB (rich).
+- Fetch is lazy + cached in-memory for the session. Concurrent first-callers share one round-trip via promise deduplication.
+
+### Added
+- **Startup schema prefetch.** `main()` fires `fetchGraphSchema()` in the background once the MCP is listening, so the first `reactome_cypher_schema` call doesn't wait 15–30 s on `apoc.meta.schema()` (that procedure samples 3M nodes on Reactome). Failures are logged; the cache stays empty and the next tool call retries on demand.
+- 7 new tests: markdown format coverage (4) + cache behavior (caching, concurrent dedup, optional-call fallback).
+
+### Removed
+- The sparse `db.schema.*`-based schema path. No fallback — APOC is required for the Cypher schema tool. This is fine for the `reactome_neo4j_env` Docker image (APOC is always present); other deployments must load APOC for schema tooling to work.
+
+### Notes
+- **No vendored schema artifact.** The MCP fetches live on connect. No coordination with `reactome_neo4j_env` release cadence is required.
+
 ## [1.3.1] — 2026-04-21
 
 ### Added

README.md

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ Only registered when `NEO4J_URI` is set. Designed for curators running the [`rea
 | Tool | Description |
 |------|-------------|
 | `reactome_cypher_query` | Run a Cypher query with optional parameters; row count, per-row size, and total response size are all capped; a server-side timeout terminates runaway queries |
-| `reactome_cypher_schema` | Introspect labels, relationship types, and per-label property keys |
+| `reactome_cypher_schema` | Live APOC introspection: labels with node counts, relationship cardinalities, per-label and per-rel property types (with mandatory flags), indexes, constraints. Cached for the session after first call; pre-warmed at MCP startup. |
 | `reactome_cypher_sample` | Return a small sample of nodes for a given label |
 
 **Read-only posture — what it is and isn't.** Sessions run in Neo4j READ mode, which rejects native write clauses (`CREATE`, `MERGE`, `DELETE`, `SET`, `REMOVE`). On top of that, `reactome_cypher_query` rejects APOC procedures that can write or reach outside the graph through back-channels (`apoc.cypher.runWrite` / `apoc.cypher.doIt`, `apoc.periodic.*`, `apoc.create/merge/refactor.*`, `apoc.load/import/export.*`, `apoc.trigger.*`, `apoc.nodes.delete`). Treat this as a guardrail against accidental mutation, not a security boundary — a real trust boundary should live at the Neo4j RBAC / plugin configuration layer, or by pointing at a read-only replica.
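
The APOC rejection described in the read-only paragraph could be approximated with a prefix deny-list. The following is only a sketch under assumptions: the commit does not show how `reactome_cypher_query` performs this check, and `BLOCKED_APOC_PREFIXES` and `findBlockedApocCalls` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical deny-list built from the procedure families named in the README.
// The real implementation is not shown in this commit; this is an assumption.
const BLOCKED_APOC_PREFIXES: readonly string[] = [
  "apoc.cypher.runwrite",
  "apoc.cypher.doit",
  "apoc.periodic.",
  "apoc.create.",
  "apoc.merge.",
  "apoc.refactor.",
  "apoc.load.",
  "apoc.import.",
  "apoc.export.",
  "apoc.trigger.",
  "apoc.nodes.delete",
];

/** Return every blocked APOC procedure name (lowercased) found in a Cypher string. */
function findBlockedApocCalls(cypher: string): string[] {
  // Match dotted names such as "apoc.periodic.iterate", case-insensitively.
  const names = cypher.toLowerCase().match(/\bapoc(?:\.[a-z0-9_]+)+/g) ?? [];
  return names.filter((name) =>
    BLOCKED_APOC_PREFIXES.some((prefix) => name.startsWith(prefix))
  );
}

// Usage: an empty result means no deny-listed procedure was referenced.
const blocked = findBlockedApocCalls("CALL apoc.meta.stats() YIELD nodeCount RETURN nodeCount");
console.log(blocked.length === 0 ? "allowed" : `rejected: ${blocked.join(", ")}`);
```

A text scan like this also flags names that appear inside string literals or comments, and it misses backtick-quoted procedure names, which fits the README's framing of the check as a guardrail against accidents rather than a security boundary.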

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "reactome-mcp",
-  "version": "1.3.1",
+  "version": "1.4.0",
   "description": "MCP server for Reactome pathway database - analysis, search, and exploration tools",
   "type": "module",
   "main": "dist/index.js",

src/clients/neo4j.ts

Lines changed: 2 additions & 30 deletions
@@ -107,33 +107,5 @@ export async function runRead<T = Record<string, unknown>>(
   }
 }
 
-export interface GraphSchema {
-  labels: string[];
-  relationshipTypes: string[];
-  propertiesByLabel: Record<string, { name: string; types: string[] }[]>;
-}
-
-export async function fetchGraphSchema(): Promise<GraphSchema> {
-  interface LabelRow { label: string }
-  interface RelRow { relationshipType: string }
-  interface PropRow { nodeType: string; propertyName: string; propertyTypes: string[] | null }
-
-  const [labelRows, relRows, propRows] = await Promise.all([
-    runRead<LabelRow>("CALL db.labels() YIELD label RETURN label ORDER BY label"),
-    runRead<RelRow>("CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType ORDER BY relationshipType"),
-    runRead<PropRow>("CALL db.schema.nodeTypeProperties() YIELD nodeType, propertyName, propertyTypes RETURN nodeType, propertyName, propertyTypes"),
-  ]);
-
-  const propertiesByLabel: Record<string, { name: string; types: string[] }[]> = {};
-  for (const p of propRows) {
-    const entry = propertiesByLabel[p.nodeType] ?? [];
-    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [] });
-    propertiesByLabel[p.nodeType] = entry;
-  }
-
-  return {
-    labels: labelRows.map((l) => l.label),
-    relationshipTypes: relRows.map((r) => r.relationshipType),
-    propertiesByLabel,
-  };
-}
+// Graph-schema access lives in src/graph/schema.ts — split out so tests
+// can mock runRead across the module boundary.

src/graph/format-schema.ts

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+import type { GraphSchema } from "./schema.js";
+
+/**
+ * Render a GraphSchema (as produced by fetchGraphSchema) as a compact
+ * markdown summary suitable for direct LLM consumption. The raw APOC
+ * payload is ~500 KB — much too large to return whole. This digest keeps
+ * the signal (labels with counts, relationship cardinalities, property
+ * types with mandatory flags, indexes, constraints) and drops the
+ * verbose apoc.meta.schema() object. Clients that need the full
+ * structure can read the `reactome://graph/schema` resource.
+ */
+export function formatGraphSchemaMarkdown(schema: GraphSchema): string {
+  const { stats, nodeTypeProperties, relTypeProperties, indexes, constraints } = schema;
+
+  const labelEntries = Object.entries(stats.labels ?? {}).sort(([, a], [, b]) => b - a);
+  const relEntries = Object.entries(stats.relTypesCount ?? {}).sort(([, a], [, b]) => b - a);
+
+  const propsByLabel = new Map<string, Array<{ name: string; types: string[]; mandatory: boolean }>>();
+  for (const p of nodeTypeProperties) {
+    const key = (p.nodeLabels?.join(":") || p.nodeType) ?? p.nodeType;
+    const entry = propsByLabel.get(key) ?? [];
+    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [], mandatory: p.mandatory });
+    propsByLabel.set(key, entry);
+  }
+
+  const propsByRel = new Map<string, Array<{ name: string; types: string[]; mandatory: boolean }>>();
+  for (const p of relTypeProperties) {
+    const entry = propsByRel.get(p.relType) ?? [];
+    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [], mandatory: p.mandatory });
+    propsByRel.set(p.relType, entry);
+  }
+
+  const lines: string[] = [];
+  lines.push(`## Reactome Graph Schema`);
+  const dbComp = schema.dbComponents[0];
+  lines.push(
+    `**Neo4j:** ${dbComp?.versions?.[0] ?? "?"} ${dbComp?.edition ?? ""} · **Fetched:** ${schema.fetchedAt}`
+  );
+  lines.push(
+    `**Totals:** ${stats.nodeCount.toLocaleString()} nodes · ${stats.relCount.toLocaleString()} relationships · ${labelEntries.length} labels · ${relEntries.length} relationship types`
+  );
+  lines.push("");
+
+  lines.push(`### Labels (${labelEntries.length}, by node count)`);
+  for (const [label, count] of labelEntries) {
+    lines.push(`- \`${label}\` — ${count.toLocaleString()}`);
+  }
+  lines.push("");
+
+  lines.push(`### Relationship types (${relEntries.length}, by relationship count)`);
+  for (const [relType, count] of relEntries) {
+    lines.push(`- \`${relType}\` — ${count.toLocaleString()}`);
+  }
+  lines.push("");
+
+  lines.push(`### Node properties (by label)`);
+  const sortedLabels = Array.from(propsByLabel.keys()).sort();
+  for (const label of sortedLabels) {
+    lines.push(`- **${label}**`);
+    for (const p of propsByLabel.get(label)!) {
+      const t = p.types.length ? ` _(${p.types.join("|")})_` : "";
+      const m = p.mandatory ? " **required**" : "";
+      lines.push(`  - \`${p.name}\`${t}${m}`);
+    }
+  }
+  lines.push("");
+
+  if (propsByRel.size > 0) {
+    lines.push(`### Relationship properties (by type)`);
+    const sortedRels = Array.from(propsByRel.keys()).sort();
+    for (const rel of sortedRels) {
+      const props = propsByRel.get(rel)!;
+      if (props.length === 0) continue;
+      lines.push(`- **${rel}**`);
+      for (const p of props) {
+        const t = p.types.length ? ` _(${p.types.join("|")})_` : "";
+        const m = p.mandatory ? " **required**" : "";
+        lines.push(`  - \`${p.name}\`${t}${m}`);
+      }
+    }
+    lines.push("");
+  }
+
+  if (indexes.length > 0) {
+    lines.push(`### Indexes (${indexes.length})`);
+    for (const ix of indexes) {
+      const row = ix as { name?: string; labelsOrTypes?: string[]; properties?: string[]; type?: string; state?: string };
+      const labels = row.labelsOrTypes?.join(",") ?? "?";
+      const props = row.properties?.join(",") ?? "?";
+      lines.push(`- \`${row.name ?? "?"}\` — ${labels}(${props}) [${row.type ?? "?"}, ${row.state ?? "?"}]`);
+    }
+    lines.push("");
+  }
+
+  if (constraints.length > 0) {
+    lines.push(`### Constraints (${constraints.length})`);
+    for (const c of constraints) {
+      const row = c as { name?: string; description?: string };
+      lines.push(`- \`${row.name ?? "?"}\` — ${row.description ?? ""}`);
+    }
+    lines.push("");
+  }
+
+  lines.push(
+    "_For programmatic access to the full schema (including the raw `apoc.meta.schema()` output with per-relationship cardinalities and full property type inventories), read the `reactome://graph/schema` resource._"
+  );
+
+  return lines.join("\n");
+}
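
To make the "Labels" and "Relationship types" sections of the digest concrete, here is a minimal standalone sketch of the sort-by-count rendering step. It is not the file's code: `renderCountBullets` is a hypothetical helper, the sample labels and counts are invented, the separator is simplified to a colon, and the locale is pinned to `en-US` (the file calls `toLocaleString()` with the host default) so the output is reproducible.

```typescript
// Illustrative input only: these counts are made up, not real Reactome statistics.
const sampleLabelCounts: Record<string, number> = {
  Pathway: 1200,
  DatabaseObject: 3000000,
  Reaction: 45000,
};

/** Render a name-to-count map as markdown bullets, largest count first. */
function renderCountBullets(counts: Record<string, number>): string[] {
  return Object.entries(counts)
    .sort(([, a], [, b]) => b - a)
    // Assumption: "en-US" is pinned so the thousands separator does not vary by host.
    .map(([name, count]) => `- \`${name}\`: ${count.toLocaleString("en-US")}`);
}

console.log(renderCountBullets(sampleLabelCounts).join("\n"));
// - `DatabaseObject`: 3,000,000
// - `Reaction`: 45,000
// - `Pathway`: 1,200
```

Sorting by descending count puts the dominant labels first, which is presumably why the digest orders these two sections by count and the property sections alphabetically.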

src/graph/schema.ts

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+import { runRead } from "../clients/neo4j.js";
+import { logger } from "../logger.js";
+
+export interface GraphSchema {
+  fetchedAt: string;
+  dbComponents: Array<{ name: string; versions: string[]; edition: string }>;
+  stats: {
+    nodeCount: number;
+    relCount: number;
+    labels: Record<string, number>;
+    relTypes: Record<string, number>;
+    relTypesCount: Record<string, number>;
+  };
+  schema: Record<string, unknown>;
+  nodeTypeProperties: Array<{
+    nodeType: string;
+    nodeLabels: string[];
+    propertyName: string;
+    propertyTypes: string[];
+    mandatory: boolean;
+  }>;
+  relTypeProperties: Array<{
+    relType: string;
+    sourceNodeLabels: string[];
+    targetNodeLabels: string[];
+    propertyName: string;
+    propertyTypes: string[];
+    mandatory: boolean;
+  }>;
+  indexes: unknown[];
+  constraints: unknown[];
+}
+
+// apoc.meta.schema() can scan many nodes; give the schema queries a longer
+// budget than the default Cypher-query timeout.
+const SCHEMA_FETCH_TIMEOUT_MS = 60_000;
+
+let schemaCache: GraphSchema | null = null;
+let schemaPending: Promise<GraphSchema> | null = null;
+
+/**
+ * Fetch the live graph schema via APOC (+ fallbacks for indexes and
+ * constraints). Cached in-memory after the first successful call so
+ * subsequent tool invocations are free. Concurrent first-callers share
+ * one round-trip via the `schemaPending` promise.
+ */
+export async function fetchGraphSchema(): Promise<GraphSchema> {
+  if (schemaCache) return schemaCache;
+  if (schemaPending) return schemaPending;
+
+  const opts = { timeoutMs: SCHEMA_FETCH_TIMEOUT_MS };
+  const start = Date.now();
+
+  schemaPending = (async () => {
+    try {
+      type Comp = { name: string; versions: string[]; edition: string };
+      type Stats = GraphSchema["stats"];
+      type NodeProp = GraphSchema["nodeTypeProperties"][number];
+      type RelProp = GraphSchema["relTypeProperties"][number];
+
+      const [components, stats, schemaRow, nodeProps, relProps, indexes, constraints] = await Promise.all([
+        runRead<Comp>(
+          "CALL dbms.components() YIELD name, versions, edition RETURN name, versions, edition",
+          {},
+          opts
+        ),
+        runRead<Stats>(
+          "CALL apoc.meta.stats() YIELD labels, relTypes, relTypesCount, nodeCount, relCount RETURN labels, relTypes, relTypesCount, nodeCount, relCount",
+          {},
+          opts
+        ),
+        runRead<{ value: Record<string, unknown> }>(
+          "CALL apoc.meta.schema() YIELD value RETURN value",
+          {},
+          opts
+        ),
+        runRead<NodeProp>(
+          "CALL apoc.meta.nodeTypeProperties() YIELD nodeType, nodeLabels, propertyName, propertyTypes, mandatory RETURN nodeType, nodeLabels, propertyName, propertyTypes, mandatory",
+          {},
+          opts
+        ),
+        runRead<RelProp>(
+          "CALL apoc.meta.relTypeProperties() YIELD relType, sourceNodeLabels, targetNodeLabels, propertyName, propertyTypes, mandatory RETURN relType, sourceNodeLabels, targetNodeLabels, propertyName, propertyTypes, mandatory",
+          {},
+          opts
+        ).catch(() => [] as RelProp[]),
+        runRead<unknown>(
+          "CALL db.indexes() YIELD name, state, type, entityType, labelsOrTypes, properties RETURN name, state, type, entityType, labelsOrTypes, properties",
+          {},
+          opts
+        ).catch(() => [] as unknown[]),
+        runRead<unknown>(
+          "CALL db.constraints() YIELD name, description RETURN name, description",
+          {},
+          opts
+        ).catch(() => [] as unknown[]),
+      ]);
+
+      const result: GraphSchema = {
+        fetchedAt: new Date().toISOString(),
+        dbComponents: components,
+        stats: stats[0] ?? {
+          nodeCount: 0,
+          relCount: 0,
+          labels: {},
+          relTypes: {},
+          relTypesCount: {},
+        },
+        schema: schemaRow[0]?.value ?? {},
+        nodeTypeProperties: nodeProps,
+        relTypeProperties: relProps,
+        indexes,
+        constraints,
+      };
+
+      logger.info("graph schema fetched", {
+        durationMs: Date.now() - start,
+        nodeCount: result.stats.nodeCount,
+        relCount: result.stats.relCount,
+        labels: Object.keys(result.stats.labels ?? {}).length,
+      });
+
+      schemaCache = result;
+      return result;
+    } finally {
+      schemaPending = null;
+    }
+  })();
+
+  return schemaPending;
+}
+
+/** For tests — clears both the cached value and any in-flight fetch. */
+export function _resetGraphSchemaCache(): void {
+  schemaCache = null;
+  schemaPending = null;
+}
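
The cache-plus-pending-promise pattern used by `fetchGraphSchema` can be seen in isolation in the sketch below. It is a generic illustration, not code from the commit: `memoizeAsync` is a hypothetical helper, and the loader in the usage lines stands in for the real APOC round-trip.

```typescript
/**
 * Wrap an async loader so that (a) a successful result is cached, and
 * (b) callers arriving while a load is in flight share that one promise.
 * Hypothetical helper for illustration; not part of the commit.
 */
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: { value: T } | null = null;
  let pending: Promise<T> | null = null;

  return () => {
    if (cached) return Promise.resolve(cached.value);
    if (pending) return pending;

    const attempt: Promise<T> = load()
      .then((value) => {
        cached = { value };
        return value;
      })
      .finally(() => {
        // Clear the in-flight slot on success or failure, so a failed
        // load leaves the cache empty and the next call retries.
        if (pending === attempt) pending = null;
      });
    pending = attempt;
    return attempt;
  };
}

// Usage: two concurrent callers trigger a single load.
let loadCalls = 0;
const getSchema = memoizeAsync(async () => {
  loadCalls += 1;
  return { fetchedAt: new Date().toISOString() };
});
void Promise.all([getSchema(), getSchema()]).then(() => {
  console.log(`loader ran ${loadCalls} time(s)`);
});
```

Only successful results are cached, which matches the behavior the changelog describes for the startup prefetch: a failure is logged, the cache stays empty, and the next call tries again.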

src/index.ts

Lines changed: 14 additions & 1 deletion
@@ -7,9 +7,10 @@ import { registerAllResources } from "./resources/index.js";
 import { logger } from "./logger.js";
 import { CONTENT_SERVICE_URL, ANALYSIS_SERVICE_URL, NEO4J_URI } from "./config.js";
 import { buildServerInstructions } from "./instructions.js";
+import { fetchGraphSchema } from "./graph/schema.js";
 
 const server = new McpServer(
-  { name: "reactome", version: "1.3.1" },
+  { name: "reactome", version: "1.4.0" },
   { instructions: buildServerInstructions() }
 );
 
@@ -24,6 +25,18 @@ async function main() {
     analysisService: ANALYSIS_SERVICE_URL,
     neo4jEnabled: Boolean(NEO4J_URI),
   });
+
+  // Warm the schema cache in the background so the first
+  // reactome_cypher_schema call (or reactome://graph/schema read) doesn't
+  // wait 15–30s on apoc.meta.schema(). Failures are logged; the cache
+  // stays empty and the tool call will retry on demand.
+  if (NEO4J_URI) {
+    fetchGraphSchema().catch((err) => {
+      logger.warn("graph schema prefetch failed; will retry on first use", {
+        error: err instanceof Error ? err.message : String(err),
+      });
+    });
+  }
 }
 
 main().catch((error) => {

src/instructions.ts

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ A local Neo4j Reactome graph is available. Use it when the user wants a query th
 
 **Workflow for Cypher:**
 
-1. Call \`reactome_cypher_schema\` (or read the \`reactome://graph/schema\` resource) **before writing any query** to learn the live labels, relationship types, and properties. Never guess the schema.
+1. Call \`reactome_cypher_schema\` (or read the \`reactome://graph/schema\` resource) **before writing any query**. The schema tool returns labels with node counts, relationship cardinalities, per-label and per-rel property types (with mandatory flags), indexes, and constraints. Pulled live via APOC on first use and cached in-memory for the session (warm after the MCP's startup prefetch). Never guess the schema.
 2. Use \`reactome_cypher_sample\` on a label to see a representative node's shape.
 3. Write a Cypher query with \`reactome_cypher_query\`. Rules:
    - Sessions run in READ mode; write clauses will be rejected.

src/resources/static.ts

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
 import { contentClient } from "../clients/content.js";
 import type { Species, Disease } from "../types/index.js";
-import { isNeo4jConfigured, fetchGraphSchema } from "../clients/neo4j.js";
+import { isNeo4jConfigured } from "../clients/neo4j.js";
+import { fetchGraphSchema } from "../graph/schema.js";
 
 export function registerStaticResources(server: McpServer) {
   // All species
