Commit 435e72b

Neftedollar and Roman Melnikov authored
feat(mcp-server): persist async job registry across restarts (#237) (#251)
Co-authored-by: Roman Melnikov <roman@neftedollar.com>
1 parent 52417fb commit 435e72b

38 files changed

Lines changed: 1872 additions & 153 deletions

bun.lock

Lines changed: 25 additions & 6 deletions
Some generated files are not rendered by default.

docs/superpowers/specs/2026-04-16-mcp-async-design.md

Lines changed: 9 additions & 7 deletions
@@ -47,15 +47,16 @@ decides how to reuse them and what the MCP tool surface looks like.
   progress/elicitation as today. In job mode, elicitation surfaces via
   `get_workflow_status.currentTask` — no out-of-band prompt because the
   originating client may be gone.
-- **Job registry is in-process.** Matches `@ageflow/server` today. An
-  interface hook (`RunStore`) is sketched for future persistent backends.
+- **Job registry defaults to in-process, with optional persistence.**
+  By default jobs use an in-memory store. With `--job-db <path>` the
+  registry persists snapshots to SQLite and hydrates known jobs on startup.
 
 ## Non-goals
 
 - **No distributed jobs.** `jobId` is valid only on the server that created
   it. Horizontal scale requires sticky routing — out of scope.
-- **No persistence across restarts.** Jobs live in memory; restart drops
-  them. Durable jobs → future work (see "Future: RunStore").
+- **No distributed persistence / replication.** Durable snapshots are local
+  to a single server instance (for example SQLite on local disk).
 - **No job prioritization / queueing.** Single-run `BUSY` lock preserved
   (§5). First caller wins; second caller gets `BUSY`.
 - **No new HITL mechanism.** Existing `hitl-bridge`; only **surfacing**
@@ -405,7 +406,8 @@ input type is structurally identical to the sync tool's input type
 Restated for emphasis, since issue #18 is deliberately narrow:
 
 - **No distributed jobs.** Jobs are single-instance only.
-- **No persistence across server restart.** In-memory `RunRegistry` only.
+- **No distributed persistence across server restart.** Restart recovery is
+  supported only when a durable local `RunStore` backend is configured.
 - **No job prioritization / queueing.** Single `BUSY` lock — same policy
   as sync mode.
 - **No `list_jobs` / `wait_for_job` bulk APIs.** v2 if ever requested.
@@ -430,8 +432,8 @@ Restated for emphasis, since issue #18 is deliberately narrow:
 
 ## Open follow-ups / future work
 
-- **`RunStore` persistence.** File-backed or SQLite-backed
-  `RunRegistry` for jobs that survive server restart.
+- **Additional durable backends.** Add Redis/Postgres-grade `RunStore`
+  adapters where restart recovery must survive host replacement.
 - **Webhook / push notifications.** Optional "call me at URL X when job
   finishes" so polling isn't required for clients that can accept
   callbacks.
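
The spec changes in this file describe a registry that stays in memory by default and, when a durable store is configured, mirrors snapshots to it and hydrates known jobs on startup. A minimal sketch of that shape, for orientation only: the `JobSnapshot` fields, the `RunStore` method names (`save`, `loadAll`), and the `JobRegistry` class are assumptions, since the commit does not show the real `@ageflow/server` interfaces, and a `Map`-backed store stands in for SQLite.

```typescript
// Hypothetical shapes: the real RunStore / RunRegistry signatures are not
// shown in this commit, so every name below is an illustrative assumption.
type JobState = "running" | "done" | "failed";

interface JobSnapshot {
  readonly jobId: string;
  readonly state: JobState;
  readonly updatedAt: number;
}

interface RunStore {
  save(snapshot: JobSnapshot): void;
  loadAll(): JobSnapshot[];
}

// Stand-in for a durable backend. A SQLite adapter would write rows to disk;
// here a Map that outlives the registry instance simulates that.
class MapRunStore implements RunStore {
  private readonly rows = new Map<string, JobSnapshot>();

  save(snapshot: JobSnapshot): void {
    this.rows.set(snapshot.jobId, snapshot);
  }

  loadAll(): JobSnapshot[] {
    return Array.from(this.rows.values());
  }
}

class JobRegistry {
  private readonly jobs = new Map<string, JobSnapshot>();
  private readonly store: RunStore | undefined;

  // With no store the registry is purely in-memory (the default behaviour).
  constructor(store?: RunStore) {
    this.store = store;
    // Hydrate known jobs on startup when a store is configured.
    const known = store ? store.loadAll() : [];
    for (const snap of known) {
      this.jobs.set(snap.jobId, snap);
    }
  }

  update(snapshot: JobSnapshot): void {
    this.jobs.set(snapshot.jobId, snapshot);
    // Mirror the snapshot to the durable store, if any.
    if (this.store !== undefined) {
      this.store.save(snapshot);
    }
  }

  get(jobId: string): JobSnapshot | undefined {
    return this.jobs.get(jobId);
  }
}

// Simulate a restart: the first registry goes away, the store survives.
const store = new MapRunStore();
const beforeRestart = new JobRegistry(store);
beforeRestart.update({ jobId: "job-1", state: "done", updatedAt: 1 });

const afterRestart = new JobRegistry(store); // hydrated from the store
const withoutStore = new JobRegistry(); // default: starts empty
```

In this sketch `afterRestart.get("job-1")` returns the saved snapshot while `withoutStore.get("job-1")` returns `undefined`, which is the difference between running with and without `--job-db`. What a hydrated job in the `running` state should become after a restart is not covered by the hunks shown here, so the sketch leaves it alone.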

docs/superpowers/specs/2026-04-16-server-execution-design.md

Lines changed: 5 additions & 3 deletions
@@ -44,8 +44,9 @@ transports are all re-expressed on top of it.
 
 - **No bundled HTTP server.** We do not ship Express / Fastify middleware.
   Framework integrations live in userland or future packages.
-- **No persistence.** Runs live in process memory. Restarting the server
-  drops in-flight runs. Durable runs are a v0.2+ feature.
+- **In-memory by default; persistence is pluggable.** Without a `RunStore`,
+  runs live in process memory. With a durable `RunStore` backend, run
+  snapshots can survive restart.
 - **No distributed execution.** A `runId` is only valid on the instance that
   created it. Horizontal scale requires sticky sessions or external state,
   out of scope here.
@@ -146,7 +147,8 @@ behavior unchanged). So CLI keeps prompting on TTY exactly like today.
 
 ### Run registry
 
-`@ageflow/server` owns a `RunRegistry` — an in-memory `Map<string, RunHandle>`.
+`@ageflow/server` owns a `RunRegistry` (active handles) and can mirror
+run snapshots through a pluggable `RunStore` backend.
 
 ```ts
 interface RunHandle {

examples/mcp-server/package.json

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
   },
   "devDependencies": {
     "@ageflow/mcp-server": "workspace:*",
+    "@ageflow/server-sqlite": "workspace:*",
     "@modelcontextprotocol/sdk": "^1.0.0",
     "@types/node": "^22.0.0",
     "vitest": "^2.1.0",

packages/cli/package.json

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "@ageflow/cli",
-  "version": "0.5.5",
+  "version": "0.6.0",
   "description": "CLI for ageflow \u2014 agentwf run / validate / dry-run / init",
   "homepage": "https://github.com/Neftedollar/ageflow/tree/master/packages/cli",
   "type": "module",
@@ -26,7 +26,9 @@
     "@ageflow/executor": "^0.7.0",
     "@ageflow/learning": "^0.5.0",
     "@ageflow/learning-sqlite": "^0.4.1",
-    "@ageflow/mcp-server": "^0.5.0",
+    "@ageflow/mcp-server": "^0.7.0",
+    "@ageflow/server": "^0.6.0",
+    "@ageflow/server-sqlite": "^0.2.0",
     "chalk": "^5.3.0",
     "ora": "^8.0.0",
     "boxen": "^8.0.0",

packages/cli/src/__tests__/mcp-serve.test.ts

Lines changed: 16 additions & 0 deletions
@@ -216,6 +216,16 @@ describe("parseMcpServeArgs: async mode flags (#18)", () => {
     expect(parsed.jobCheckpointTtlMs).toBe(900_000);
   });
 
+  it("parses --job-db <path>", () => {
+    const parsed = parseMcpServeArgs([
+      "wf.ts",
+      "--async",
+      "--job-db",
+      "/tmp/jobs.sqlite",
+    ]);
+    expect(parsed.jobDb).toBe("/tmp/jobs.sqlite");
+  });
+
   it("rejects --job-ttl with no value", () => {
     expect(() => parseMcpServeArgs(["wf.ts", "--async", "--job-ttl"])).toThrow(
       /requires/,
@@ -239,4 +249,10 @@ describe("parseMcpServeArgs: async mode flags (#18)", () => {
       parseMcpServeArgs(["wf.ts", "--checkpoint-ttl", "1000"]),
     ).toThrow(/requires --async/);
   });
+
+  it("rejects --job-db without --async", () => {
+    expect(() =>
+      parseMcpServeArgs(["wf.ts", "--job-db", "/tmp/jobs.sqlite"]),
+    ).toThrow(/requires --async/);
+  });
 });

packages/cli/src/commands/mcp-serve.ts

Lines changed: 19 additions & 2 deletions
@@ -48,6 +48,8 @@ export interface McpServeArgs {
   readonly jobTtlMs?: number;
   /** Override default 1-hour checkpoint TTL in ms (--checkpoint-ttl <ms>). */
   readonly jobCheckpointTtlMs?: number;
+  /** Persist async job registry to a SQLite database (--job-db <path>). */
+  readonly jobDb?: string;
   /** Use Streamable HTTP transport instead of stdio (--http). */
   readonly http?: boolean;
   /** HTTP port (--port <n>, required with --http). */
@@ -90,6 +92,7 @@ export function parseMcpServeArgs(argv: readonly string[]): McpServeArgs {
   let asyncMode: boolean | undefined = undefined;
   let jobTtlMs: number | undefined = undefined;
   let jobCheckpointTtlMs: number | undefined = undefined;
+  let jobDb: string | undefined = undefined;
   let httpMode: boolean | undefined = undefined;
   let httpPort: number | undefined = undefined;
   let httpHost: string | undefined = undefined;
@@ -215,6 +218,15 @@ export function parseMcpServeArgs(argv: readonly string[]): McpServeArgs {
         break;
       }
 
+      case "--job-db": {
+        const val = args[++i];
+        if (val === undefined || val.startsWith("-")) {
+          throw new Error("--job-db requires a path argument");
+        }
+        jobDb = val;
+        break;
+      }
+
       case "--http":
         httpMode = true;
         break;
@@ -259,9 +271,11 @@ export function parseMcpServeArgs(argv: readonly string[]): McpServeArgs {
 
   if (
     asyncMode !== true &&
-    (jobTtlMs !== undefined || jobCheckpointTtlMs !== undefined)
+    (jobTtlMs !== undefined ||
+      jobCheckpointTtlMs !== undefined ||
+      jobDb !== undefined)
   ) {
-    throw new Error("--job-ttl / --checkpoint-ttl requires --async");
+    throw new Error("--job-ttl / --checkpoint-ttl / --job-db requires --async");
   }
 
   if (httpPort !== undefined && httpMode !== true) {
@@ -302,6 +316,7 @@ export function parseMcpServeArgs(argv: readonly string[]): McpServeArgs {
     ...(asyncMode !== undefined ? { async: asyncMode } : {}),
     ...(jobTtlMs !== undefined ? { jobTtlMs } : {}),
     ...(jobCheckpointTtlMs !== undefined ? { jobCheckpointTtlMs } : {}),
+    ...(jobDb !== undefined ? { jobDb } : {}),
     ...(httpMode !== undefined ? { http: httpMode } : {}),
     ...(httpPort !== undefined ? { port: httpPort } : {}),
     ...(httpHost !== undefined ? { httpHost } : {}),
@@ -381,6 +396,7 @@ async function runMcpServe(rawArgv: string[]): Promise<void> {
     ...(parsed.jobCheckpointTtlMs !== undefined
       ? { jobCheckpointTtlMs: parsed.jobCheckpointTtlMs }
       : {}),
+    ...(parsed.jobDb !== undefined ? { jobDbPath: parsed.jobDb } : {}),
   });
 
   if (parsed.http === true) {
@@ -464,6 +480,7 @@ export function registerMcpCommand(program: Command): void {
       " --async enable async job mode (5 extra tools)\n" +
       " --job-ttl <ms> job TTL in ms (default: 1800000, requires --async)\n" +
       " --checkpoint-ttl <ms> checkpoint TTL in ms (default: 3600000, requires --async)\n" +
+      " --job-db <path> persist async job registry to SQLite (requires --async)\n" +
       " --http use Streamable HTTP transport instead of stdio\n" +
       " --port <n> HTTP port (required with --http)\n" +
       " --host <addr> HTTP bind address (default: 127.0.0.1, requires --http)\n" +

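The hunks in `mcp-serve.ts` add `--job-db` in three steps: consume the next token as the path, reject a missing or flag-like value, and reject the flag when `--async` is absent. A condensed, self-contained sketch of just that logic follows. `parseJobFlags` and `JobFlags` are hypothetical names for illustration; the real `parseMcpServeArgs` handles many more flags and is only partially visible in this diff.

```typescript
// Hypothetical, trimmed-down stand-in for the --async / --job-db handling in
// parseMcpServeArgs; only the two flags relevant to this commit are kept.
interface JobFlags {
  readonly async?: boolean;
  readonly jobDb?: string;
}

function parseJobFlags(args: readonly string[]): JobFlags {
  let asyncMode: boolean | undefined = undefined;
  let jobDb: string | undefined = undefined;

  for (let i = 0; i < args.length; i++) {
    switch (args[i]) {
      case "--async":
        asyncMode = true;
        break;
      case "--job-db": {
        // Consume the next token as the path. A missing value or another
        // flag (anything starting with "-") is rejected.
        const val = args[++i];
        if (val === undefined || val.startsWith("-")) {
          throw new Error("--job-db requires a path argument");
        }
        jobDb = val;
        break;
      }
      default:
        break; // positional arguments and other flags are ignored here
    }
  }

  // Cross-flag validation runs after the loop, so in this sketch the order
  // of --async and --job-db on the command line does not matter.
  if (asyncMode !== true && jobDb !== undefined) {
    throw new Error("--job-db requires --async");
  }

  // Omit unset keys rather than emitting `undefined` values.
  return {
    ...(asyncMode !== undefined ? { async: asyncMode } : {}),
    ...(jobDb !== undefined ? { jobDb } : {}),
  };
}
```

On the command line this corresponds to something like `agentwf mcp serve ./workflow.ts --async --job-db ./jobs.sqlite` (the path is illustrative). One consequence of the `startsWith("-")` guard is that a relative path beginning with a dash has to be written with a prefix such as `./`.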
packages/dev-workflow/pipelines/release.ts

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ const PUBLISH_ORDER = [
   "@ageflow/runner-anthropic",
   "@ageflow/testing",
   "@ageflow/server",
+  "@ageflow/server-sqlite",
   "@ageflow/mcp-server",
   "@ageflow/learning",
   "@ageflow/learning-sqlite",

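The new `PUBLISH_ORDER` entry sits after `@ageflow/server` and before `@ageflow/mcp-server`, which matches the `@ageflow/server-sqlite` dependency that this commit adds to `packages/mcp-server/package.json`: a package needs to be published before anything that depends on it. A small sketch of how that invariant could be checked. `findOrderViolations` is a hypothetical helper, not part of `release.ts`; the order array contains only the entries visible in the hunk, and the dependency map lists only the two `@ageflow/mcp-server` dependencies whose position is visible there.

```typescript
// Only the slice of PUBLISH_ORDER visible in the hunk; the real array is longer.
const visibleOrder: readonly string[] = [
  "@ageflow/runner-anthropic",
  "@ageflow/testing",
  "@ageflow/server",
  "@ageflow/server-sqlite",
  "@ageflow/mcp-server",
  "@ageflow/learning",
  "@ageflow/learning-sqlite",
];

// Subset of the dependencies declared in packages/mcp-server/package.json.
const internalDeps: Readonly<Record<string, readonly string[]>> = {
  "@ageflow/mcp-server": ["@ageflow/server", "@ageflow/server-sqlite"],
};

// Hypothetical helper: report every dependency that is missing from the
// list or appears later in it than the package that needs it.
function findOrderViolations(
  order: readonly string[],
  deps: Readonly<Record<string, readonly string[]>>,
): string[] {
  const position = new Map<string, number>();
  order.forEach((name, index) => position.set(name, index));

  const violations: string[] = [];
  for (const pkg of Object.keys(deps)) {
    const pkgPos = position.get(pkg);
    if (pkgPos === undefined) continue; // package is not published by this list
    for (const dep of deps[pkg] ?? []) {
      const depPos = position.get(dep);
      if (depPos === undefined || depPos > pkgPos) {
        violations.push(`${dep} must be published before ${pkg}`);
      }
    }
  }
  return violations;
}

const violations = findOrderViolations(visibleOrder, internalDeps);
```

With the order from the hunk, `violations` is empty. Moving `@ageflow/server-sqlite` below `@ageflow/mcp-server` would produce one entry.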
packages/mcp-server/README.md

Lines changed: 3 additions & 2 deletions
@@ -89,8 +89,9 @@ agentwf mcp serve ./workflow.ts --async --checkpoint-ttl 7200000
 
 ### Known limitations
 
-- **No persistence** — the job registry is in-memory only. Restarting the
-  server loses all job state.
+- **Durability is opt-in** — by default the job registry is in-memory.
+  Use `--job-db <path>` to persist async job snapshots to SQLite and
+  recover known jobs after restart.
 - **Single-instance** — the registry is not shared across processes. Running
   multiple server processes will have independent job stores.
 - **Single BUSY lock** — only one `start_*` call can be in-flight at a time.

packages/mcp-server/package.json

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "@ageflow/mcp-server",
-  "version": "0.5.1",
+  "version": "0.7.0",
   "description": "Expose ageflow workflows as MCP tools (stdio transport, progress streaming, HITL via elicitation).",
   "homepage": "https://github.com/Neftedollar/ageflow/tree/master/packages/mcp-server",
   "type": "module",
@@ -18,12 +18,14 @@
   "dependencies": {
     "@ageflow/core": "^0.6.0",
     "@ageflow/executor": "^0.7.0",
-    "@ageflow/server": "^0.4.4",
+    "@ageflow/server": "^0.6.0",
+    "@ageflow/server-sqlite": "^0.2.0",
     "@modelcontextprotocol/sdk": "^1.0.0",
     "zod-to-json-schema": "^3.23.0"
   },
   "devDependencies": {
     "@ageflow/testing": "workspace:*",
+    "@types/bun": "^1.3.12",
     "@types/node": "^22.0.0",
     "vitest": "^2.1.0",
     "zod": "^3.23.0"

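The dependency ranges in the `package.json` hunks are bumped rather than left alone because, under npm's semver rules, a caret range on a `0.x` version admits only the same minor line: `^0.5.0` means `>=0.5.0 <0.6.0`, so the CLI's old range could not resolve `@ageflow/mcp-server` 0.7.0. The sketch below illustrates that rule. `satisfiesCaret` is a simplified, hypothetical helper (no prerelease tags or build metadata), not the `semver` package.

```typescript
type Version = readonly [number, number, number];

function parseVersion(text: string): Version {
  const parts = text.split(".").map((p) => Number.parseInt(p, 10));
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`not a plain x.y.z version: ${text}`);
  }
  const [major = 0, minor = 0, patch = 0] = parts;
  return [major, minor, patch];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
// ^1.2.3 -> <2.0.0, ^0.5.0 -> <0.6.0, ^0.0.3 -> <0.0.4
function caretUpperBound(base: Version): Version {
  const [major, minor, patch] = base;
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) {
    throw new Error(`only caret ranges are handled in this sketch: ${range}`);
  }
  const base = parseVersion(range.slice(1));
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, base) >= 0 &&
    compareVersions(candidate, caretUpperBound(base)) < 0
  );
}

// The CLI's old and new ranges for @ageflow/mcp-server, checked against 0.7.0.
const oldRangeMatches = satisfiesCaret("0.7.0", "^0.5.0");
const newRangeMatches = satisfiesCaret("0.7.0", "^0.7.0");
```

Here `oldRangeMatches` is `false` and `newRangeMatches` is `true`, which is why the CLI's range moves from `^0.5.0` to `^0.7.0` in the same commit that bumps `@ageflow/mcp-server` to 0.7.0.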