A browser game that teaches DigitalOcean Serverless Inference by pulling a one-night heist.
Seven levels. Every modality — chat, streaming, vision, image generation, TTS, embeddings, knowledge-base retrieval, inference routing, reasoning — exercised inside the gameplay loop. Every API call shows up in a real-time transparency drawer with model, tokens, latency, cost, and a runnable code snippet. By the time you crack the vault you know how to use the platform.
The DigitalOcean Serverless Inference platform exposes a lot of capabilities — chat, vision, image generation, TTS, embeddings, retrieval, reasoning, routing. Docs and quickstarts cover them individually. This game wires them together in a single playthrough so a developer ends a 10-minute session having actually exercised all of them: with their own prompts, their own model choices, their own cost telemetry, in their own browser.
The fiction is a noir heist; the substance is the API surface.
| # | Codename | What it teaches | Endpoint(s) |
|---|---|---|---|
| 1 | The Briefing | Streaming chat completions, system prompts, structured token extraction (FACT::key=value) | chat/completions (streaming) |
| 2 | The Photo Wall | Multimodal vision — describing real generated portraits | chat/completions with image input · async-invoke (image gen, one-time seed) |
| 3 | The Forger | Image generation + LLM-as-judge grading on the result | images/generations + chat/completions (vision) |
| 4 | The Voice Lock | Text-to-speech synthesis + LLM-as-judge on voice quality | async-invoke (TTS via ElevenLabs) + chat/completions (judge) |
| 5 | The Archive | Embeddings-driven retrieval + chat synthesis (RAG) | embeddings + chat/completions (with retrieved-chunk context) |
| 6 | The Switchboard | Inference routing — matching each job to the right model tier | chat/completions across cheap / mid / premium models |
| 7 | The Vault | Multi-constraint reasoning puzzle | chat/completions with a reasoning model (e.g. arcee-trinity-large-thinking) |
After the seven levels, the run-complete screen surfaces the player's final score against a top-50 leaderboard.
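The `FACT::key=value` convention that Level 1 teaches can be illustrated with a small parser. This is a minimal sketch, not the game's actual validator: the function name `extractFacts` and the exact marker grammar (alphanumeric key, value running to end of line) are assumptions made for illustration.

```typescript
// Hypothetical helper: pulls FACT::key=value markers out of model output.
// Assumed grammar: key is [A-Za-z0-9_]+, value runs to the end of the line.
function extractFacts(text: string): Record<string, string> {
  const pattern = /FACT::([A-Za-z0-9_]+)=([^\r\n]*)/g;
  const facts: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // A later occurrence of the same key overwrites the earlier one.
    facts[match[1]] = match[2].trim();
  }
  return facts;
}

const sample =
  "The guard rotates at midnight.\nFACT::guard_shift=00:00\nFACT::vault_floor=B2";
console.log(extractFacts(sample));
```

With streaming responses a marker can be split across two deltas, so a parser like this is best run over the accumulated text rather than over each chunk on its own.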
- HUD: live score, total tokens, total cost in USD, elapsed time, current level. The cost counter updates within 500 ms of every API call completing — cost-awareness is built into the gameplay loop.
- Transparency Drawer (right-edge pill labeled "Show your work"): expands into a side panel listing every inference call made in the current run. For each call you see the model, latency, tokens in/out, exact USD cost, the full request and response JSON, and a copy-pasteable code snippet in Python, curl, or Node.
- Hint Pill (right-edge, above the drawer): three curated hints per level, progressive from "gentle orientation" to "answer-level." Each hint reveal costs +1 attempt in the scoring formula.
- Model Picker: every level exposes a searchable picker filtered to the modalities the level needs (vision, chat, image-gen, TTS, embed, router, reasoning). Switching models mid-attempt is fine — costs and behavior change visibly in real time.
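The per-call USD figure shown in the HUD and drawer can be derived from token counts and a per-model price table. The sketch below shows one way to do that arithmetic; the model names and prices are invented for illustration and are not the platform's real pricing, and `computeCostUsd` is a hypothetical name.

```typescript
// Per-million-token prices in USD. Model names and numbers are made up;
// real values would come from the platform's published pricing.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const EXAMPLE_PRICES: Record<string, TokenPrice> = {
  "example-small": { inputPerMillion: 0.2, outputPerMillion: 0.6 },
  "example-large": { inputPerMillion: 2.0, outputPerMillion: 8.0 },
};

// Cost = tokens in * input price + tokens out * output price, scaled from
// per-million pricing down to a single call.
function computeCostUsd(model: string, tokensIn: number, tokensOut: number): number {
  const price = EXAMPLE_PRICES[model];
  if (!price) throw new Error(`No price entry for model: ${model}`);
  return (tokensIn * price.inputPerMillion + tokensOut * price.outputPerMillion) / 1_000_000;
}

console.log(computeCostUsd("example-small", 1200, 350));
```

Throwing on an unknown model, rather than returning zero, keeps a missing price entry from silently showing a call as free.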
git clone https://github.com/DO-Solutions/inference-heist
cd inference-heist
cp .env.example .env.local
# MOCK_INFERENCE defaults to 1 in .env.example so this works with no credentials.
npm install
npm run dev
# → http://localhost:3000

Open /play, name yourself, and walk through all seven levels. The transparency drawer shows mocked telemetry; the design language, scoring, and gameplay are otherwise identical to live mode.
- Generate an inference key at https://cloud.digitalocean.com/gen-ai/access-keys. Drop it in `.env.local` as `DO_INFERENCE_KEY` and set `MOCK_INFERENCE=0`.
- (Optional) Provision Managed Postgres. Without it the game still plays, but the leaderboard and admin insights stay empty.

      # in .env.local
      DATABASE_URL=postgres://...

  Then create the tables:

      npx drizzle-kit push

- Generate the Level 2 suspect portraits (one-time, ~$0.24 in image-gen costs):

      npm run seed-level2

  This writes six 1024×1024 noir portraits to `public/levels/2/`. Re-run with `--force` to regenerate.
- Seed the Level 5 knowledge base. The seed script writes 10 fictional intel markdown files to `scripts/kb-seed/`:

      npm run seed-kb

  For the production knowledge-base path, upload those files to a DigitalOcean KB named `heist-archive` via the control panel, then set `DO_KB_ID=<id>` in `.env.local`. Without `DO_KB_ID` the route falls back to local BM25 over the same files, which is still fully playable.
- (Optional) Admin gate. Set `ADMIN_USER` / `ADMIN_PASS` to unlock `/admin/insights` (PMM dashboard: level drop-off, model picks, average cost per finished run).
- Deploy to DigitalOcean App Platform.

      doctl apps create --spec app.yaml

  Or click the Deploy to DO badge at the top of this README after forking.
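The local BM25 fallback mentioned in the knowledge-base step can be pictured with a compact ranking function. This is a sketch under stated assumptions, not the contents of `lib/inference/local-kb.ts`: the tokenizer is naive, `k1` and `b` use commonly cited defaults, the IDF uses the always-positive `ln(1 + (N - n + 0.5) / (n + 0.5))` variant, and the names `Doc` and `bm25Rank` are hypothetical.

```typescript
// Minimal Okapi BM25 ranking over in-memory documents (illustrative only).
interface Doc {
  id: string;
  text: string;
}

function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(docs.length, 1);

  // Document frequency: number of documents containing each term at least once.
  const df = new Map<string, number>();
  tokenized.forEach((tokens) => {
    Array.from(new Set(tokens)).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  });

  const queryTerms = Array.from(new Set(tokenize(query)));
  return docs
    .map((doc, i) => {
      const tokens = tokenized[i];
      let score = 0;
      for (const term of queryTerms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue; // term absent from this document
        const n = df.get(term) ?? 0;
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        // Term-frequency saturation with document-length normalization.
        score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgLen));
      }
      return { id: doc.id, score };
    })
    .sort((x, y) => y.score - x.score);
}
```

In a retrieval route, the top few ranked documents would be passed to the chat model as context, mirroring what the hosted knowledge base returns.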
Browser (Zustand store)
│
│ fetch /api/inference/*
▼
Next.js Route Handlers ◄────► https://inference.do-ai.run/v1
│ (DigitalOcean Serverless Inference)
│ optional persistence
▼
DigitalOcean Managed Postgres
↳ runs, level_results, telemetry_log
Every inference call goes through a server-side Next.js route handler. The DO inference key is never shipped to the browser. Route handlers also enrich responses with the transparency payload — model, latency, computed cost — before streaming back to the client.
Two patterns deserve a closer look:
- Streaming chat (Level 1) uses Server-Sent Events. The route handler emits `event: delta` for each model token and `event: telemetry` once at the end, so the client can paint streamed tokens into the UI and only resolve cost/usage when the model is done.
- TTS via async-invoke (Level 4) uses DigitalOcean's `POST /v1/async-invoke` endpoint (the fal-ai bridge to ElevenLabs `multilingual-v2`). The route submits the job, polls `/v1/async-invoke/{request_id}` until `status === "COMPLETED"`, fetches the rendered MP3, and returns base64 to the browser.
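The two-event streaming pattern described for Level 1 can be sketched as an SSE frame encoder and a matching parser. The event names `delta` and `telemetry` come from the text above; the payload shapes and the function names `encodeSse` and `parseSse` are assumptions for illustration.

```typescript
// Assumed payload shapes for the two event types.
type HeistEvent =
  | { event: "delta"; data: { text: string } }
  | { event: "telemetry"; data: { model: string; latencyMs: number; costUsd: number } };

// Serialize one event as an SSE frame: an "event:" line, a "data:" line,
// then a blank line. JSON.stringify escapes newlines, so data stays on one line.
function encodeSse(e: HeistEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Parse a buffer of complete frames. Assumes every frame carries JSON data.
function parseSse(buffer: string): { event: string; data: unknown }[] {
  return buffer
    .split("\n\n")
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      let event = "message"; // SSE default when no event line is present
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
      }
      return { event, data: JSON.parse(dataLines.join("\n")) };
    });
}
```

A client would append each `delta` payload to the visible text and wait for the single `telemetry` frame before updating cost and usage. A real reader also has to buffer partial frames that arrive split across network chunks, which this sketch leaves out.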
- Next.js 16 (App Router) + TypeScript (strict)
- Tailwind CSS v4 (CSS-first config) with hand-rolled shadcn-style primitives over Radix UI
- Zustand for client state (capped, sanitized, partially persisted to localStorage)
- Drizzle ORM over DigitalOcean Managed Postgres for leaderboard + telemetry persistence
- OpenAI SDK pointed at `https://inference.do-ai.run/v1` for chat / vision / image / embeddings
- Direct fetch for the async-invoke TTS path (the SDK doesn't yet expose it)
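The direct-fetch async-invoke path boils down to a submit-then-poll loop. The sketch below covers the polling half only. The `"COMPLETED"` status and the `/async-invoke/{request_id}` path come from this README; the `"FAILED"` status, the Bearer auth header, the response shape, and the names `pollUntilCompleted` and `statusFetcher` are assumptions.

```typescript
interface JobStatus {
  status: string;
}

// Calls fetchStatus repeatedly until the job reports "COMPLETED".
// Treating "FAILED" as terminal is an assumption about the API.
async function pollUntilCompleted<T extends JobStatus>(
  fetchStatus: () => Promise<T>,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<T> {
  const { intervalMs = 1000, maxAttempts = 60 } = opts;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === "COMPLETED") return job;
    if (job.status === "FAILED") throw new Error("async-invoke job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job not completed after ${maxAttempts} attempts`);
}

// Hypothetical wiring: builds a status fetcher for one request id.
function statusFetcher(baseUrl: string, requestId: string, apiKey: string) {
  return async (): Promise<JobStatus> => {
    const res = await fetch(`${baseUrl}/async-invoke/${requestId}`, {
      headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
    });
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
    return (await res.json()) as JobStatus;
  };
}
```

Capping attempts matters here because the loop runs inside a route handler: without a bound, a stuck job would hold the request open until the platform times it out.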
npm run dev # Next dev server
npm run smoke # Smoke-test the inference client end-to-end (works in mock mode)
npm run seed-kb # Generate Level 5 markdown intel files (free, local-only)
npm run seed-level2 # Generate the six Level 2 suspect portraits (~$0.24 one-time)
npm run build # Production build
npm run lint # ESLint
npx tsc --noEmit    # Typecheck

app/                                  # Next.js App Router
(game)/{play,leaderboard}/ # game shell + leaderboard
admin/insights/ # PMM insights (basic auth)
api/
inference/{chat,vision,image,tts,stt,embed,kb,route}/route.ts
models/route.ts # 24h-cached model catalog
score/route.ts # POST: persist a finished run
leaderboard/route.ts # GET: top 50
admin/insights/route.ts # aggregate stats
components/
shared/{HUD,TransparencyDrawer,ModelPicker,Bootstrap,LevelComplete,HintBox}.tsx
levels/Level{1..8}.tsx
ui/{button,tabs,sheet,input}.tsx # shadcn-style primitives
lib/
inference/{client,pricing,models,snippet,sse,types,local-kb}.ts
game/{store,scoring,levels,mock}.ts
game/levels/{level1..level7,hints}.ts # canonical prompts + validators + hints
db/{schema,client}.ts
utils/cn.ts
public/
brand/ # DigitalOcean logo + icon
levels/2/ # Six suspect portraits (generated by seed-level2)
scripts/
smoke-inference.ts
seed-kb.ts
seed-level2-photos.ts
kb-seed/ # 10 markdown intel files
app.yaml # App Platform deploy spec
middleware.ts # IP rate-limit + admin basic auth
drizzle.config.ts
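The IP rate limit that `middleware.ts` is listed as providing can be approximated with a fixed-window counter. This is a sketch of the general technique, not the repository's middleware: the class name `FixedWindowLimiter` is hypothetical, and the state is in-memory only, so counts reset on restart and are not shared across instances.

```typescript
// Fixed-window limiter: at most `limit` calls per `windowMs` per key
// (for example, a client IP address).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 30, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the call is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // First call for this key, or the previous window has expired.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

const limiter = new FixedWindowLimiter(); // defaults: 30 calls per minute
console.log(limiter.allow("203.0.113.7"));
```

A fixed window is the simplest option but allows short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.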
- Inference keys never leave the server. Every model call goes through a Next.js route handler running on the server. The browser bundle does not contain the key.
- No prompts or responses are persisted by default. The `telemetry_log` table records only metadata (model, latency, token counts, cost). Set `LOG_PAYLOADS=1` in dev if you need raw payloads locally; never in production.
- No email or signup gate. The leaderboard uses a self-chosen display name and a cookie-scoped run id. No PII is collected.
- IP rate limit on every `/api/inference/*` route (30 calls/minute/IP) to keep accidental cost runaways bounded.
- Image data URIs and base64 audio are sanitized out of the client store. A 2 MB inbound vision request shows up in the transparency drawer as `"<2153 KB image/png data URI>"` (28 chars) so the localStorage 5 MB quota can never blow up on a long playthrough.
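The data-URI sanitization described in the last bullet can be sketched as a recursive walk that swaps each data URI for a short placeholder. The placeholder format follows the example above; the size calculation (string length divided by 1024, rounded) and the name `sanitizeForStore` are assumptions, and raw base64 without a `data:` prefix is not handled here.

```typescript
// Matches the header of a base64 data URI and captures its MIME type.
const DATA_URI = /^data:([\w.+-]+\/[\w.+-]+);base64,/;

// Returns a copy of `value` with every data-URI string replaced by a
// placeholder such as "<2 KB image/png data URI>".
function sanitizeForStore(value: unknown): unknown {
  if (typeof value === "string") {
    const m = DATA_URI.exec(value);
    if (!m) return value;
    const kb = Math.round(value.length / 1024); // assumed size rounding
    return `<${kb} KB ${m[1]} data URI>`;
  }
  if (Array.isArray(value)) return value.map(sanitizeForStore);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = sanitizeForStore(v);
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Running this over a request before it enters the persisted store keeps the drawer entry readable while the original payload is still sent to the server untouched.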
Issues and PRs welcome at https://github.com/DO-Solutions/inference-heist.
For new levels: each level lives in two files — lib/game/levels/levelN.ts (canonical prompts, validators, constants) and components/levels/LevelN.tsx (UI). Wire it into app/(game)/play/PlayShell.tsx and add hints in lib/game/levels/hints.ts. Levels 1 (chat/streaming) and 3 (image+vision) are the cleanest reference implementations.
For new modalities: the inference client wrapper lives in lib/inference/client.ts — every call returns { result, telemetry }. Add a new exported function there, a route handler under app/api/inference/{name}/route.ts, and a pricing entry in lib/inference/pricing.ts.
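The `{ result, telemetry }` return shape can be pictured as a thin wrapper around any model call. This is a sketch of the idea rather than the code in `lib/inference/client.ts`: the name `withTelemetry` and the `Telemetry` fields are assumptions, while `usage.prompt_tokens` and `usage.completion_tokens` follow the OpenAI-compatible response format.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Assumed telemetry fields; the real payload also carries computed cost.
interface Telemetry {
  model: string;
  latencyMs: number;
  tokensIn: number;
  tokensOut: number;
}

// Runs `call`, measures wall-clock latency, and pairs the raw result with
// telemetry derived from its usage block (zeros when usage is absent).
async function withTelemetry<T extends { usage?: Usage }>(
  model: string,
  call: () => Promise<T>,
): Promise<{ result: T; telemetry: Telemetry }> {
  const started = Date.now();
  const result = await call();
  return {
    result,
    telemetry: {
      model,
      latencyMs: Date.now() - started,
      tokensIn: result.usage?.prompt_tokens ?? 0,
      tokensOut: result.usage?.completion_tokens ?? 0,
    },
  };
}
```

A new modality function could then wrap its SDK or fetch call in this helper so the route handler gets the same shape as every existing endpoint.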
Apache 2.0 — see LICENSE.