Use this checklist when you want to verify the real gateway path in a network-enabled environment.
This covers five layers:
- Free smoke: startup, simple response, a basic tool call, and session cost reporting.
- Free model matrix: exact-response and basic tool-use probes across every current free NVIDIA model.
- Core happy-path: write/read/glob/grep/multi-tool and multi-turn session accounting.
- Weak-model polish: output cleanup on a live low-cost model.
- Paid tools: ExaSearch, ExaAnswer, ExaReadUrls, and VideoGen.
Before running live e2e, make sure all of these are true:
- Node.js is 20 or newer.
- Dependencies are installed with `npm install`.
- The project builds locally with `npm run build`.
- Deterministic local tests pass with `npm test`.
- The machine has outbound network access to the BlockRun gateway.
- For paid-tool coverage, the wallet is funded and usable.
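The Node.js version prerequisite can be checked mechanically. The sketch below is a hypothetical helper (`node_version_ok` is an illustrative name, not part of the project). It takes a version string in the form printed by `node --version` and succeeds only when the major version is 20 or newer.

```shell
# Hypothetical preflight helper; the function name is an assumption, not part of the project.
# Succeeds (exit status 0) when a version string such as "v20.11.1" has major version >= 20.
node_version_ok() {
  version="${1#v}"        # drop the leading "v" that `node --version` prints
  major="${version%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$major" -ge 20 ]
}
```

A typical call would be `node_version_ok "$(node --version)" || echo "Node.js 20 or newer is required" >&2`.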
Useful checks:
npm run build
npm test
node dist/index.js --help
node dist/index.js balance
Environment variables:
- `E2E_MODEL=<provider/model>` overrides the default live model. If unset, e2e defaults to `zai/glm-5.1`.
- `FREE_MODEL_MATRIX=<provider/model>,<shortcut>` limits `npm run test:free-models` to a subset. If unset, it runs every picker-listed free model.
- `FREE_MODEL_MATRIX_PROBES=echo,bash` controls the live free-model probes. Use `echo` for a lighter rate-limit-friendly pass.
- `FREE_MODEL_MATRIX_TIMEOUT_MS=180000` controls the per-model live matrix timeout.
- `RUN_PAID_E2E=1` enables the paid-tool tests. Without it, paid tests are skipped on purpose.
- `FRANKLIN_MODEL_REQUEST_TIMEOUT_MS` controls how long Franklin waits for the initial model response headers.
- `FRANKLIN_MODEL_STREAM_IDLE_TIMEOUT_MS` controls how long Franklin waits for the next streamed chunk after the response has started.
For normal validation, leave the timeout env vars alone. They are mainly for debugging slow or flaky networks.
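To make the fallback behavior concrete, here is a small sketch that mirrors the documented defaults. It is illustrative only and is not the harness's actual code. Both function names are hypothetical, and two details are assumptions: an empty `E2E_MODEL` is treated like an unset one, and only the literal value `1` enables paid tests.

```shell
# Illustrative only: mirrors the documented defaults, not the harness's real logic.

# Prints the model a live run would use given the current environment.
# Assumption: an empty E2E_MODEL is treated the same as an unset one.
effective_e2e_model() {
  printf '%s\n' "${E2E_MODEL:-zai/glm-5.1}"
}

# Succeeds when paid-tool tests are switched on.
# Assumption: only the literal value "1" counts as enabled.
paid_e2e_enabled() {
  [ "${RUN_PAID_E2E:-}" = "1" ]
}
```

Calling `effective_e2e_model` before a run is a quick way to confirm which model an inherited environment will select.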
Run the live suite in this order so failures are easy to localize.
The free smoke run is the fastest signal that the CLI starts, the gateway is reachable, the default live model answers, and the basic session summary still works.
node --test --test-reporter=spec \
--test-name-pattern='startup|simple response|bash tool: executes shell command and returns output|session cost: token usage reported at session end' \
test/e2e.mjs
Expected result in a healthy network-enabled environment:
- `startup` passes immediately.
- `simple response` passes.
- `bash tool` passes.
- `session cost` passes.
If these skip with `Live gateway/network unavailable in this environment`, treat that as an environment problem, not a product pass.
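Because a skipped run can look green at a glance, it may help to fail loudly when the skip reason shows up in the output. The guard below is a hypothetical sketch (`no_network_skips` is an illustrative name). It assumes the skip reason appears verbatim in the captured reporter output, which has not been verified against the harness.

```shell
# Hypothetical guard; assumes the skip reason appears verbatim in the captured test output.
# Reads test output on stdin and fails if any test skipped for lack of network.
no_network_skips() {
  ! grep -qF 'Live gateway/network unavailable in this environment'
}
```

For example, capture the smoke run with `2>&1 | tee smoke.log`, then run `no_network_skips < smoke.log || echo "environment problem, not a product pass" >&2`.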
The free model matrix checks that all current free models in the picker behave consistently on the two smallest live behaviors Franklin relies on: plain text output and one basic local tool call. It spends no USDC, but it can consume free-tier request quota.
npm run test:free-models
For a lighter smoke when rate limits are tight:
FREE_MODEL_MATRIX_PROBES=echo npm run test:free-models
To isolate one or two models:
FREE_MODEL_MATRIX=nvidia/qwen3-coder-480b,maverick npm run test:free-models
Expected result:
- The catalog sanity test passes locally.
- Each selected free model echoes its marker.
- When the `bash` probe is enabled, each selected free model uses the Bash tool and returns the marker.
- No response leaks raw `<think>` tags or role-played `[TOOLCALL]` text.
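When several models fail together, running them one at a time can show which model is responsible. The helper below is a hypothetical sketch (`split_models` is not part of the project) that splits a comma-separated list in the same shape `FREE_MODEL_MATRIX` accepts. The loop in the comments uses only the commands and variables documented above.

```shell
# Hypothetical helper: prints each entry of a comma-separated model list on its own line.
split_models() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Possible usage (not executed here): one matrix run per model, echo probe only.
#   split_models "nvidia/qwen3-coder-480b,maverick" | while read -r model; do
#     FREE_MODEL_MATRIX="$model" FREE_MODEL_MATRIX_PROBES=echo npm run test:free-models \
#       || echo "failed: $model" >&2
#   done
```

Running models individually costs more free-tier requests overall, so it suits diagnosis rather than routine validation.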
Once smoke passes, verify the main tool and session paths.
node --test --test-reporter=spec \
--test-name-pattern='write tool: creates a file with specified content|read tool: reads a pre-existing file|glob tool: finds files by pattern|grep tool: finds content in files|bash tool: error exit code is captured|multi-tool: write then read a file in same session|session cost: accumulates across multiple turns|session cost: /cost command shows cost info|polish: weak model respects instruction without leaking <think> or \[TOOLCALL\]' \
test/e2e.mjs
Expected result:
- File tools pass on a real temp directory.
- `bash tool: error exit code is captured` still exits the CLI cleanly.
- `/cost` and multi-turn accounting both pass.
- The weak-model polish probe returns `POLISH_PROBE_OK` without leaking `<think>` or `[TOOLCALL]`.
Run the paid-tool tests only after the free and core runs are clean and the wallet has funds.
RUN_PAID_E2E=1 node --test --test-reporter=spec \
--test-name-pattern='ExaSearch tool|ExaAnswer tool|ExaReadUrls tool|VideoGen tool' \
test/e2e.mjs
Expected result:
- ExaSearch shows a visible `ExaSearch` call and at least one URL.
- ExaAnswer shows a visible `ExaAnswer` call and a grounded answer mentioning `x402`, `payment`, or `HTTP 402`.
- ExaReadUrls shows a visible `ExaReadUrls` call and mentions `HTTP 402` or payment.
- VideoGen creates a non-trivial MP4 at the requested output path.
After the focused runs are green, run the full live suite as the final check.
RUN_PAID_E2E=1 npm run test:e2e
If you only want the unfunded/free live suite, omit `RUN_PAID_E2E=1`.
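The ordering described above (cheap checks first, stop at the first failure) can be sketched as a small wrapper. `run_in_order` is a hypothetical name and not part of the project. It takes each stage as a command string and stops as soon as one fails, so the failing stage is the last one printed.

```shell
# Hypothetical wrapper; runs each command string in order and stops at the first failure.
run_in_order() {
  for stage in "$@"; do
    printf '== %s\n' "$stage"
    if ! sh -c "$stage"; then
      printf 'stopped at: %s\n' "$stage" >&2
      return 1
    fi
  done
}
```

A possible invocation using the documented commands is `run_in_order 'npm run test:free-models' 'RUN_PAID_E2E=1 npm run test:e2e'`.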
Use the first recognizable failure signature to decide where to look next.
- `Live gateway/network unavailable in this environment`
  - The machine could not reach the live gateway, or the request timed out before headers/stream data arrived.
  - Check outbound network access first.
- `Model unavailable due to payment/balance constraints`
  - The selected model or tool path needs funds, or payment verification failed.
  - Check wallet balance and try again with a funded wallet or a cheaper/free `E2E_MODEL`.
- `Free tier rate limited (60 req/hr)`
  - The free model path is exhausted for now.
  - Retry later or switch `E2E_MODEL` to another model you intend to validate.
- A harness-level timeout with no skip
  - This is more suspicious.
  - It can mean a regression in request timeout handling, stream idle handling, or a CLI code path that no longer exits cleanly.
- Free smoke passes but write/read/glob/grep fails
  - The gateway is likely fine.
  - Focus on local tool execution, file-path handling, or prompt/tool orchestration.
- The free model matrix passes `echo` but fails `bash`
  - The selected model can answer but is not reliably following Franklin's tool protocol.
  - Focus on weak-model prompting, tool inventory guardrails, or model-specific routing.
- Tool tests pass but session cost tests fail
  - Focus on stderr summaries, token accounting, or `/cost` command rendering.
- Paid Exa tests fail but free/core passes
  - Focus on x402 payment flow, wallet funding, or the paid-tool integration layer rather than the base CLI loop.
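The message-based signatures above lend themselves to a first-pass triage of a captured log. The sketch below is hypothetical (`triage_hint` is an illustrative name). It matches the three literal messages from the list and paraphrases the advice given for each. It assumes those messages appear verbatim in the captured output, and it does not cover the behavior-based signatures, which still need a human look.

```shell
# Hypothetical triage helper; assumes the listed messages appear verbatim in the output.
# Reads captured test output on stdin and prints where to look first.
triage_hint() {
  log="$(cat)"
  case "$log" in
    *'Live gateway/network unavailable in this environment'*)
      echo 'network: check outbound access to the gateway' ;;
    *'Model unavailable due to payment/balance constraints'*)
      echo 'payment: check wallet balance or pick a cheaper/free E2E_MODEL' ;;
    *'Free tier rate limited'*)
      echo 'rate-limit: retry later or switch E2E_MODEL' ;;
    *)
      echo 'unknown: inspect the first failing test by hand' ;;
  esac
}
```

The branches are checked in the same order as the list, so a log containing several signatures reports the earliest one in that order.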
Treat the run as truly green only when:
- Free smoke passes without skipping.
- Free model matrix passes for the current picker-listed free models, or any skips are explicitly attributable to rate limit/network.
- Core happy-path passes without skipping.
- Paid-tool tests pass when `RUN_PAID_E2E=1` is enabled on a funded wallet.
- No test spends a long time hanging before failing or skipping.
Fast skip is acceptable in a network-restricted environment. It is not evidence that the live happy-path works.