The Inference Heist

A browser game that teaches DigitalOcean Serverless Inference by pulling a one-night heist.

Seven levels. Every modality — chat, streaming, vision, image generation, TTS, embeddings, knowledge-base retrieval, inference routing, reasoning — exercised inside the gameplay loop. Every API call shows up in a real-time transparency drawer with model, tokens, latency, cost, and a runnable code snippet. By the time you crack the vault you know how to use the platform.

Quick start · The levels · Architecture · Going live

Why this exists

The DigitalOcean Serverless Inference platform exposes a lot of capabilities — chat, vision, image generation, TTS, embeddings, retrieval, reasoning, routing. Docs and quickstarts cover them individually. This game wires them together in a single playthrough so a developer ends a 10-minute session having actually exercised all of them: with their own prompts, their own model choices, their own cost telemetry, in their own browser.

The fiction is a noir heist; the substance is the API surface.

The levels

| # | Codename | What it teaches | Endpoint(s) |
|---|----------|-----------------|-------------|
| 1 | The Briefing | Streaming chat completions, system prompts, structured token extraction (`FACT::key=value`) | `chat/completions` (streaming) |
| 2 | The Photo Wall | Multimodal vision: describing real generated portraits | `chat/completions` with image input · `async-invoke` (image gen, one-time seed) |
| 3 | The Forger | Image generation + LLM-as-judge grading on the result | `images/generations` + `chat/completions` (vision) |
| 4 | The Voice Lock | Text-to-speech synthesis + LLM-as-judge on voice quality | `async-invoke` (TTS via ElevenLabs) + `chat/completions` (judge) |
| 5 | The Archive | Embeddings-driven retrieval + chat synthesis (RAG) | `embeddings` + `chat/completions` (with retrieved-chunk context) |
| 6 | The Switchboard | Inference routing: matching each job to the right model tier | `chat/completions` across cheap / mid / premium models |
| 7 | The Vault | Multi-constraint reasoning puzzle | `chat/completions` with a reasoning model (e.g. `arcee-trinity-large-thinking`) |
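
Level 1 relies on the model emitting structured `FACT::key=value` tokens. A minimal sketch of how such tokens could be pulled out of the accumulated stream text follows. The function name `extractFacts` and the exact grammar (keys limited to letters, digits, and underscores; values running to end of line) are assumptions for illustration, not the repo's actual validator.

```typescript
// Hypothetical helper: collect FACT::key=value tokens from model output.
// Assumed grammar: key is [A-Za-z0-9_]+, value runs to the end of the line.
function extractFacts(text: string): Map<string, string> {
  const facts = new Map<string, string>();
  const pattern = /FACT::([A-Za-z0-9_]+)=([^\r\n]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // A later occurrence of the same key overwrites the earlier one.
    facts.set(match[1], match[2].trim());
  }
  return facts;
}
```

Running it over the full text received so far (rather than per token) avoids missing a fact that was split across two stream chunks.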

After the seven levels, the run-complete screen surfaces the player's final score against a top-50 leaderboard.

What's on screen during gameplay

HUD: live score, total tokens, total cost in USD, elapsed time, current level. The cost counter updates within 500 ms of every API call completing — cost-awareness is built into the gameplay loop.
Transparency Drawer (right-edge pill labeled "Show your work"): expands into a side panel listing every inference call made in the current run. For each call you see the model, latency, tokens in and out, exact USD cost, the full request and response JSON, and a copy-pasteable code snippet in Python, curl, or Node.
Hint Pill (right-edge, above the drawer): three curated hints per level, progressing from "gentle orientation" to "answer-level." Revealing a hint adds one attempt in the scoring formula.
Model Picker: every level exposes a searchable picker filtered to the modalities the level needs (vision, chat, image-gen, TTS, embed, router, reasoning). Switching models mid-attempt is fine — costs and behavior change visibly in real time.

Quick start (mock mode, no DigitalOcean key required)

git clone https://github.com/DO-Solutions/inference-heist
cd inference-heist
cp .env.example .env.local
# MOCK_INFERENCE defaults to 1 in .env.example so this works with no credentials.

npm install
npm run dev
# → http://localhost:3000

Open /play, name yourself, and walk through all seven levels. The transparency drawer shows mocked telemetry; the design language, scoring, and gameplay are otherwise identical to live mode.

Going live (real DigitalOcean Serverless Inference)

Generate an inference key at https://cloud.digitalocean.com/gen-ai/access-keys. Drop it in .env.local as DO_INFERENCE_KEY and set MOCK_INFERENCE=0.
(Optional) provision Managed Postgres — without it the game still plays, but the leaderboard and admin insights stay empty.
```
# in .env.local
DATABASE_URL=postgres://...
```
Then create the tables: npx drizzle-kit push.
Generate the Level 2 suspect portraits (one-time, ~$0.24 in image-gen costs):
```
npm run seed-level2
```
This writes six 1024×1024 noir portraits to public/levels/2/. Re-run with --force to regenerate.
Seed the Level 5 knowledge base. The seed script writes 10 fictional intel markdown files to scripts/kb-seed/:
```
npm run seed-kb
```
For the production knowledge-base path, upload those files to a DigitalOcean KB named heist-archive via the control panel, then set DO_KB_ID=<id> in .env.local. Without DO_KB_ID the route falls back to local BM25 over the same files — still fully playable.
(Optional) admin gate. Set ADMIN_USER / ADMIN_PASS to unlock /admin/insights (PMM dashboard: level drop-off, model picks, average cost per finished run).
Deploy to DigitalOcean App Platform.
```
doctl apps create --spec app.yaml
```
Or click the Deploy to DO badge at the top of this README after forking.
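
The knowledge-base step mentions a local BM25 fallback when DO_KB_ID is unset. As a rough sketch of what BM25 ranking over a handful of markdown files involves, here is a minimal Okapi BM25 scorer. The tokenizer, the `bm25Rank` name, and the parameters `k1 = 1.2` and `b = 0.75` are conventional defaults chosen for illustration; the repo's `local-kb` module may differ.

```typescript
// Minimal Okapi BM25 ranking over in-memory documents.
// k1 and b are conventional defaults, not values taken from the repo.
interface KbDoc { id: string; text: string; }

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function bm25Rank(query: string, docs: KbDoc[], k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(docs.length, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of Array.from(new Set(tokens))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  const scored = docs.map((doc, i) => {
    const tokens = tokenized[i];
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const nq = df.get(term) ?? 0;
      // Smoothed IDF; the "1 +" inside the log keeps it non-negative.
      const idf = Math.log(1 + (docs.length - nq + 0.5) / (nq + 0.5));
      // Length normalization: longer-than-average documents are penalized.
      const norm = f + k1 * (1 - b + (b * tokens.length) / avgLen);
      score += idf * ((f * (k1 + 1)) / norm);
    }
    return { id: doc.id, score };
  });

  // Highest score first.
  return scored.sort((x, y) => y.score - x.score);
}
```

The top-ranked chunks would then be passed to `chat/completions` as retrieved context, the same way the hosted knowledge-base results are.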

Architecture

   Browser (Zustand store)
        │
        │  fetch /api/inference/*
        ▼
   Next.js Route Handlers ◄────►  https://inference.do-ai.run/v1
        │                          (DigitalOcean Serverless Inference)
        │  optional persistence
        ▼
   DigitalOcean Managed Postgres
        ↳ runs, level_results, telemetry_log

Every inference call goes through a server-side Next.js route handler. The DO inference key is never shipped to the browser. Route handlers also enrich responses with the transparency payload — model, latency, computed cost — before streaming back to the client.
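
The "computed cost" in the transparency payload can be derived from token usage and a per-model price table. The sketch below assumes prices are quoted in USD per million tokens; the model name and numbers are placeholders, not real DigitalOcean prices, and `computeCostUsd` is a hypothetical name rather than the function in lib/inference/pricing.ts.

```typescript
// Hypothetical pricing table: USD per one million tokens.
// The model name and numbers are placeholders, NOT real DigitalOcean prices.
interface ModelPrice { inputPerMTok: number; outputPerMTok: number; }

const PRICING: Record<string, ModelPrice> = {
  "example-chat-model": { inputPerMTok: 0.5, outputPerMTok: 1.5 },
};

interface TokenUsage { promptTokens: number; completionTokens: number; }

// Returns the cost in USD, or null when the model has no pricing entry.
function computeCostUsd(model: string, usage: TokenUsage): number | null {
  if (!Object.prototype.hasOwnProperty.call(PRICING, model)) return null;
  const price = PRICING[model];
  const inputCost = usage.promptTokens * price.inputPerMTok;
  const outputCost = usage.completionTokens * price.outputPerMTok;
  return (inputCost + outputCost) / 1e6;
}
```

Returning `null` for an unknown model lets the drawer show "no price available" instead of a misleading zero.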

Two patterns deserve a closer look:

Streaming chat (Level 1) uses Server-Sent Events. The route handler emits event: delta for each model token and event: telemetry once at the end, so the client can paint streamed tokens into the UI and only resolve cost/usage when the model is done.
TTS via async-invoke (Level 4) uses DigitalOcean's POST /v1/async-invoke endpoint (the fal-ai bridge to ElevenLabs multilingual-v2). The route submits the job, polls /v1/async-invoke/{request_id} until status === "COMPLETED", fetches the rendered MP3, and returns base64 to the browser.
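
To make the Level 1 stream concrete, here is a small sketch of the SSE wire format with the `delta` and `telemetry` event names described above: a formatter for the server side and an incremental parser for the client side. The payload shapes and both function names are assumptions for illustration.

```typescript
// Format one Server-Sent Events frame. The payload is JSON-encoded so that
// newlines inside model tokens cannot break the line-based framing.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

interface SseEvent { event: string; data: unknown; }

// Parse every complete frame in `buffer` and return the unparsed remainder,
// which the caller prepends to the next network chunk.
function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // last piece may be an incomplete frame
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return { events, rest };
}
```

A client would append each `delta` payload to the visible text and wait for the single `telemetry` event before updating cost and token counters.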

Tech stack

Next.js 16 (App Router) + TypeScript (strict)
Tailwind CSS v4 (CSS-first config) with hand-rolled shadcn-style primitives over Radix UI
Zustand for client state (capped, sanitized, partially persisted to localStorage)
Drizzle ORM over DigitalOcean Managed Postgres for leaderboard + telemetry persistence
OpenAI SDK pointed at https://inference.do-ai.run/v1 for chat / vision / image / embeddings
Direct fetch for the async-invoke TTS path (the SDK doesn't yet expose it)
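
The direct-fetch path boils down to a submit-then-poll loop. The sketch below covers only the polling half. The status path and the "COMPLETED" value come from the description above; the Bearer auth header, the "FAILED" status, the polling interval, and the attempt cap are assumptions. The fetch function is injected so the loop can be exercised without a network.

```typescript
// Minimal fetch-like signature so a fake can be injected for testing.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface PollOptions {
  baseUrl: string;      // e.g. "https://inference.do-ai.run"
  requestId: string;
  apiKey: string;
  fetchImpl: FetchLike; // pass the global fetch in a real route handler
  intervalMs?: number;  // assumed default: 1000
  maxAttempts?: number; // assumed default: 60
}

// Poll the job status until it reports COMPLETED, fails, or times out.
async function pollAsyncInvoke(opts: PollOptions): Promise<Record<string, unknown>> {
  const { baseUrl, requestId, apiKey, fetchImpl, intervalMs = 1000, maxAttempts = 60 } = opts;
  const url = `${baseUrl}/v1/async-invoke/${encodeURIComponent(requestId)}`;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Assumption: the endpoint accepts a Bearer token.
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);

    const body = (await res.json()) as Record<string, unknown>;
    if (body.status === "COMPLETED") return body;
    // Assumption: a terminal failure is reported as "FAILED".
    if (body.status === "FAILED") throw new Error("async-invoke job failed");

    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${requestId} did not complete after ${maxAttempts} polls`);
}
```

Capping the number of attempts keeps a stuck job from holding a route handler open indefinitely.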

Local development

npm run dev          # Next dev server
npm run smoke        # Smoke-test the inference client end-to-end (works in mock mode)
npm run seed-kb      # Generate Level 5 markdown intel files (free, local-only)
npm run seed-level2  # Generate the six Level 2 suspect portraits (~$0.24 one-time)
npm run build        # Production build
npm run lint         # ESLint
npx tsc --noEmit     # Typecheck

Project layout

app/                                  # Next.js App Router
  (game)/{play,leaderboard}/          # game shell + leaderboard
  admin/insights/                     # PMM insights (basic auth)
  api/
    inference/{chat,vision,image,tts,stt,embed,kb,route}/route.ts
    models/route.ts                   # 24h-cached model catalog
    score/route.ts                    # POST: persist a finished run
    leaderboard/route.ts              # GET: top 50
    admin/insights/route.ts           # aggregate stats
components/
  shared/{HUD,TransparencyDrawer,ModelPicker,Bootstrap,LevelComplete,HintBox}.tsx
  levels/Level{1..7}.tsx
  ui/{button,tabs,sheet,input}.tsx    # shadcn-style primitives
lib/
  inference/{client,pricing,models,snippet,sse,types,local-kb}.ts
  game/{store,scoring,levels,mock}.ts
  game/levels/{level1..level7,hints}.ts   # canonical prompts + validators + hints
  db/{schema,client}.ts
  utils/cn.ts
public/
  brand/                              # DigitalOcean logo + icon
  levels/2/                           # Six suspect portraits (generated by seed-level2)
scripts/
  smoke-inference.ts
  seed-kb.ts
  seed-level2-photos.ts
  kb-seed/                            # 10 markdown intel files
app.yaml                              # App Platform deploy spec
middleware.ts                         # IP rate-limit + admin basic auth
drizzle.config.ts
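
The IP rate limit that middleware.ts applies can be pictured as a sliding-window counter keyed by client IP. The sketch below keeps state in process memory, which is an assumption: it resets on restart and is not shared across instances, and the real middleware may work differently. `createRateLimiter` is a hypothetical name.

```typescript
// In-memory sliding-window limiter: at most `limit` calls per `windowMs` per key.
// Returns a function that records the call and reports whether it is allowed.
function createRateLimiter(limit = 30, windowMs = 60000) {
  const hits = new Map<string, number[]>();

  return function allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // over the limit; rejected calls are not recorded
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}
```

With the defaults this allows 30 calls per minute per key; a middleware would respond with HTTP 429 when `allow(ip)` returns false.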

Privacy + safety

Inference keys never leave the server. Every model call goes through a Next.js route handler running on the server. The browser bundle does not contain the key.
No prompts or responses are persisted by default. The telemetry_log table records only metadata (model, latency, token counts, cost). Set LOG_PAYLOADS=1 in dev if you need raw payloads locally; never in production.
No email or signup gate. The leaderboard uses a self-chosen display name and a cookie-scoped run id. No PII is collected.
IP rate limit on every /api/inference/* route — 30 calls/minute/IP — to keep accidental cost runaways bounded.
Image data URIs and base64 audio are sanitized out of the client store. A 2 MB image in an inbound vision request shows up in the transparency drawer as "<2153 KB image/png data URI>" (28 chars), so a long playthrough cannot exhaust the 5 MB localStorage quota.
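
As an illustration of the data-URI sanitization, the sketch below walks a JSON-like value and collapses each data URI string into a placeholder of the form quoted above. It assumes the reported size is the URI's string length divided by 1024 and rounded; the repo may measure it differently, and `sanitizeDataUris` is a hypothetical name.

```typescript
// Replace every data URI string inside a JSON-like value with a short
// placeholder such as "<2153 KB image/png data URI>".
// Assumption: size = URI string length / 1024, rounded to the nearest KB.
function sanitizeDataUris(value: unknown): unknown {
  if (typeof value === "string") {
    // Matches "data:<mime>;..." or "data:<mime>,..."; URIs with no MIME type pass through.
    const match = /^data:([^;,]+)[;,]/.exec(value);
    if (!match) return value;
    const kb = Math.round(value.length / 1024);
    return `<${kb} KB ${match[1]} data URI>`;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeDataUris(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = sanitizeDataUris(source[key]);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined
}
```

Applying this before a call record enters the store means the original payload is never written to localStorage in the first place.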

Contributing

Issues and PRs welcome at https://github.com/DO-Solutions/inference-heist.

For new levels: each level lives in two files — lib/game/levels/levelN.ts (canonical prompts, validators, constants) and components/levels/LevelN.tsx (UI). Wire it into app/(game)/play/PlayShell.tsx and add hints in lib/game/levels/hints.ts. Levels 1 (chat/streaming) and 3 (image+vision) are the cleanest reference implementations.

For new modalities: the inference client wrapper lives in lib/inference/client.ts — every call returns { result, telemetry }. Add a new exported function there, a route handler under app/api/inference/{name}/route.ts, and a pricing entry in lib/inference/pricing.ts.
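
A new client function has to honor the `{ result, telemetry }` contract. One way to picture that contract is a generic wrapper that times the underlying call and attaches the metadata. The `withTelemetry` name and the telemetry field names are assumptions modeled on what the transparency drawer displays, not the actual types in lib/inference/types.ts.

```typescript
// Assumed telemetry shape, modeled on the transparency drawer's fields.
interface CallTelemetry {
  model: string;
  latencyMs: number;
  tokensIn: number;
  tokensOut: number;
  costUsd: number | null;
}

interface Wrapped<T> { result: T; telemetry: CallTelemetry; }

// Hypothetical wrapper: run a model call, time it, and return { result, telemetry }.
async function withTelemetry<T>(
  model: string,
  call: () => Promise<{ result: T; tokensIn: number; tokensOut: number }>,
  priceUsd: (tokensIn: number, tokensOut: number) => number | null,
): Promise<Wrapped<T>> {
  const started = Date.now();
  const { result, tokensIn, tokensOut } = await call();
  return {
    result,
    telemetry: {
      model,
      latencyMs: Date.now() - started, // wall-clock time of the upstream call
      tokensIn,
      tokensOut,
      costUsd: priceUsd(tokensIn, tokensOut),
    },
  };
}
```

Keeping the pricing lookup as a parameter mirrors the advice above: a new modality needs its own pricing entry before its cost can appear in the drawer.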

License

Apache 2.0 — see LICENSE.
