The Inference Heist

A browser game that teaches DigitalOcean Serverless Inference by pulling a one-night heist.

Seven levels. Every modality — chat, streaming, vision, image generation, TTS, embeddings, knowledge-base retrieval, inference routing, reasoning — exercised inside the gameplay loop. Every API call shows up in a real-time transparency drawer with model, tokens, latency, cost, and a runnable code snippet. By the time you crack the vault you know how to use the platform.

Deploy to DO

Quick start · The levels · Architecture · Going live


Why this exists

The DigitalOcean Serverless Inference platform exposes a lot of capabilities — chat, vision, image generation, TTS, embeddings, retrieval, reasoning, routing. Docs and quickstarts cover them individually. This game wires them together in a single playthrough so a developer ends a 10-minute session having actually exercised all of them: with their own prompts, their own model choices, their own cost telemetry, in their own browser.

The fiction is a noir heist; the substance is the API surface.

The levels

| # | Codename | What it teaches | Endpoint(s) |
|---|----------|-----------------|-------------|
| 1 | The Briefing | Streaming chat completions, system prompts, structured token extraction (FACT::key=value) | chat/completions (streaming) |
| 2 | The Photo Wall | Multimodal vision: describing real generated portraits | chat/completions with image input · async-invoke (image gen, one-time seed) |
| 3 | The Forger | Image generation + LLM-as-judge grading on the result | images/generations + chat/completions (vision) |
| 4 | The Voice Lock | Text-to-speech synthesis + LLM-as-judge on voice quality | async-invoke (TTS via ElevenLabs) + chat/completions (judge) |
| 5 | The Archive | Embeddings-driven retrieval + chat synthesis (RAG) | embeddings + chat/completions (with retrieved-chunk context) |
| 6 | The Switchboard | Inference routing: matching each job to the right model tier | chat/completions across cheap / mid / premium models |
| 7 | The Vault | Multi-constraint reasoning puzzle | chat/completions with a reasoning model (e.g. arcee-trinity-large-thinking) |
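
Level 1 mentions structured token extraction with FACT::key=value markers. As a rough illustration of how such markers could be pulled out of a model reply, here is a minimal sketch. The function name `extractFacts` and the exact grammar (a word-character key, with the value running to the end of the line) are assumptions, not the game's actual validator.

```typescript
// Hypothetical helper: only the "FACT::key=value" shape comes from the README;
// the key and value grammar used here is an assumption.
function extractFacts(reply: string): Record<string, string> {
  const facts: Record<string, string> = {};
  // "FACT::" + word-character key + "=" + everything up to the end of the line.
  const pattern = /FACT::(\w+)=([^\r\n]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(reply)) !== null) {
    facts[match[1]] = match[2].trim();
  }
  return facts;
}
```

Running this over the full streamed reply once the stream ends avoids having to handle a marker that is split across two deltas.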

After the seven levels, the run-complete screen surfaces the player's final score against a top-50 leaderboard.

What's on screen during gameplay

  • HUD: live score, total tokens, total cost in USD, elapsed time, current level. The cost counter updates within 500 ms of every API call completing — cost-awareness is built into the gameplay loop.
  • Transparency Drawer (right-edge pill labeled "Show your work"): expands into a side panel listing every inference call made in the current run. For each call you see the model, latency, tokens in/out, exact USD cost, the full request and response JSON, and a copy-pasteable code snippet in Python, curl, or Node.
  • Hint Pill (right-edge, above the drawer): three curated hints per level, progressing from "gentle orientation" to "answer-level." Each revealed hint counts as one extra attempt in the scoring formula.
  • Model Picker: every level exposes a searchable picker filtered to the modalities the level needs (vision, chat, image-gen, TTS, embed, router, reasoning). Switching models mid-attempt is fine — costs and behavior change visibly in real time.
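
The HUD and the drawer both show a per-call USD cost derived from token counts. A minimal sketch of that arithmetic, assuming per-million-token pricing, might look like the following. The model name and the prices are placeholders for illustration, not real DigitalOcean rates, and `computeCostUsd` is a hypothetical name.

```typescript
interface ModelPrice {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Placeholder prices for illustration only; not real DigitalOcean rates.
const EXAMPLE_PRICING: Record<string, ModelPrice> = {
  "example-small-model": { inputPerMillionUsd: 0.2, outputPerMillionUsd: 0.8 },
};

// Cost = input tokens * input rate + output tokens * output rate, per million tokens.
function computeCostUsd(
  model: string,
  tokensIn: number,
  tokensOut: number,
  pricing: Record<string, ModelPrice> = EXAMPLE_PRICING,
): number {
  const price = pricing[model];
  if (price === undefined) {
    throw new Error(`No pricing entry for model: ${model}`);
  }
  return (tokensIn * price.inputPerMillionUsd + tokensOut * price.outputPerMillionUsd) / 1_000_000;
}
```

Throwing on a missing pricing entry is a deliberate choice in this sketch: a silent zero would make a model look free in the HUD.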

Quick start (mock mode, no DigitalOcean key required)

git clone https://github.com/DO-Solutions/inference-heist
cd inference-heist
cp .env.example .env.local
# MOCK_INFERENCE defaults to 1 in .env.example so this works with no credentials.

npm install
npm run dev
# → http://localhost:3000

Open /play, name yourself, and walk through all seven levels. The transparency drawer shows mocked telemetry; the design language, scoring, and gameplay are otherwise identical to live mode.

Going live (real DigitalOcean Serverless Inference)

  1. Generate an inference key at https://cloud.digitalocean.com/gen-ai/access-keys. Drop it in .env.local as DO_INFERENCE_KEY and set MOCK_INFERENCE=0.
  2. (Optional) provision Managed Postgres — without it the game still plays, but the leaderboard and admin insights stay empty.
    # in .env.local
    DATABASE_URL=postgres://...
    Then create the tables: npx drizzle-kit push.
  3. Generate the Level 2 suspect portraits (one-time, ~$0.24 in image-gen costs):
    npm run seed-level2
    This writes six 1024×1024 noir portraits to public/levels/2/. Re-run with --force to regenerate.
  4. Seed the Level 5 knowledge base. The seed script writes 10 fictional intel markdown files to scripts/kb-seed/:
    npm run seed-kb
    For the production knowledge-base path, upload those files to a DigitalOcean KB named heist-archive via the control panel, then set DO_KB_ID=<id> in .env.local. Without DO_KB_ID the route falls back to local BM25 over the same files — still fully playable.
  5. (Optional) admin gate. Set ADMIN_USER / ADMIN_PASS to unlock /admin/insights (PMM dashboard: level drop-off, model picks, average cost per finished run).
  6. Deploy to DigitalOcean App Platform.
    doctl apps create --spec app.yaml
    Or click the Deploy to DO badge at the top of this README after forking.
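
Step 4 notes that without DO_KB_ID the Level 5 route falls back to local BM25 over the seeded files. For readers unfamiliar with BM25, here is a compact sketch of Okapi BM25 scoring. The tokenizer, the k1 = 1.5 and b = 0.75 defaults, and the function names are assumptions chosen for illustration; the repository's own lib/inference/local-kb.ts may differ.

```typescript
// Assumed tokenizer: lowercase alphanumeric runs.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function unique(terms: string[]): string[] {
  return terms.filter((term, index) => terms.indexOf(term) === index);
}

// Returns one Okapi BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.5, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLength = tokenized.reduce((sum, doc) => sum + doc.length, 0);
  const avgLength = totalLength / Math.max(docs.length, 1);

  // Document frequency: in how many documents each term appears.
  const docFreq = new Map<string, number>();
  tokenized.forEach((doc) => {
    unique(doc).forEach((term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
  });

  const queryTerms = unique(tokenize(query));
  return tokenized.map((doc) => {
    let score = 0;
    queryTerms.forEach((term) => {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) return; // term absent from this document
      const n = docFreq.get(term) ?? 0;
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      const lengthNorm = k1 * (1 - b + (b * doc.length) / avgLength);
      score += (idf * tf * (k1 + 1)) / (tf + lengthNorm);
    });
    return score;
  });
}
```

Sorting the ten seeded files by score and passing the top few chunks into the chat prompt would approximate the retrieval step of the RAG loop.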

Architecture

   Browser (Zustand store)
        │
        │  fetch /api/inference/*
        ▼
   Next.js Route Handlers ◄────►  https://inference.do-ai.run/v1
        │                          (DigitalOcean Serverless Inference)
        │  optional persistence
        ▼
   DigitalOcean Managed Postgres
        ↳ runs, level_results, telemetry_log

Every inference call goes through a server-side Next.js route handler. The DO inference key is never shipped to the browser. Route handlers also enrich responses with the transparency payload — model, latency, computed cost — before streaming back to the client.
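
To make the server-side boundary concrete, the sketch below builds the upstream chat request the way a route handler might, with the key read on the server and attached as a bearer token. It assumes the endpoint accepts OpenAI-style bearer auth and request bodies (the stack does point the OpenAI SDK at this base URL); `buildChatRequest` and the type names are hypothetical.

```typescript
// Base URL as shown in the architecture diagram.
const DO_INFERENCE_BASE_URL = "https://inference.do-ai.run/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface UpstreamRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper. `apiKey` would be read from DO_INFERENCE_KEY on the
// server, so it never appears in the browser bundle.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  stream = false,
): UpstreamRequest {
  if (apiKey.length === 0) {
    throw new Error("DO_INFERENCE_KEY is not set");
  }
  return {
    url: `${DO_INFERENCE_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      // Assumption: OpenAI-compatible bearer authentication.
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream }),
    },
  };
}
```

Inside a route handler the result could be passed straight to `fetch(request.url, request.init)`, keeping the request construction easy to test without network access.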

Two patterns deserve a closer look:

  • Streaming chat (Level 1) uses Server-Sent Events. The route handler emits event: delta for each model token and event: telemetry once at the end, so the client can paint streamed tokens into the UI and only resolve cost/usage when the model is done.
  • TTS via async-invoke (Level 4) uses DigitalOcean's POST /v1/async-invoke endpoint (the fal-ai bridge to ElevenLabs multilingual-v2). The route submits the job, polls /v1/async-invoke/{request_id} until status === "COMPLETED", fetches the rendered MP3, and returns base64 to the browser.
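
The streaming pattern above can be sketched with two small helpers: one that serializes an event in the text/event-stream wire format, and one that parses a buffer back into events. The event names delta and telemetry come from the description above; the payload fields and helper names are assumptions.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Serializes one event: "event:" line, "data:" line, then a blank line.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Parses a buffer of complete events; events are separated by a blank line.
function parseSseEvents(buffer: string): SseEvent[] {
  return buffer
    .split("\n\n")
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = "message"; // default event type when no "event:" line is present
      const dataLines: string[] = [];
      block.split("\n").forEach((line) => {
        if (line.startsWith("event:")) {
          event = line.slice("event:".length).trim();
        } else if (line.startsWith("data:")) {
          dataLines.push(line.slice("data:".length).trim());
        }
      });
      return { event, data: dataLines.join("\n") };
    });
}
```

On the client, delta events would be appended to the visible text as they arrive, while the single telemetry event would update cost and usage once at the end. A real client also needs to buffer partial chunks, since a network read can end in the middle of an event.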

Tech stack

  • Next.js 16 (App Router) + TypeScript (strict)
  • Tailwind CSS v4 (CSS-first config) with hand-rolled shadcn-style primitives over Radix UI
  • Zustand for client state (capped, sanitized, partially persisted to localStorage)
  • Drizzle ORM over DigitalOcean Managed Postgres for leaderboard + telemetry persistence
  • OpenAI SDK pointed at https://inference.do-ai.run/v1 for chat / vision / image / embeddings
  • Direct fetch for the async-invoke TTS path (the SDK doesn't yet expose it)
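
The submit-then-poll flow used for the async-invoke TTS path can be sketched as a small polling loop. The status lookup and the sleep function are injected so the loop stays testable; their names, the polling interval, and the attempt cap are assumptions. Only the "COMPLETED" status comes from this README.

```typescript
interface AsyncInvokeStatus {
  status: string;
  [key: string]: unknown;
}

// Polls until the job reports "COMPLETED" or the attempt budget runs out.
// `getStatus` would wrap a fetch to /v1/async-invoke/{request_id};
// `sleep` would wrap setTimeout. Both are hypothetical injection points.
async function pollUntilCompleted(
  requestId: string,
  getStatus: (requestId: string) => Promise<AsyncInvokeStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<AsyncInvokeStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus(requestId);
    if (job.status === "COMPLETED") {
      return job;
    }
    await sleep(intervalMs);
  }
  throw new Error(`async-invoke job ${requestId} did not complete after ${maxAttempts} polls`);
}
```

A production handler would likely also stop early on whatever failure status the API reports, rather than waiting out the full attempt budget.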

Local development

npm run dev          # Next dev server
npm run smoke        # Smoke-test the inference client end-to-end (works in mock mode)
npm run seed-kb      # Generate Level 5 markdown intel files (free, local-only)
npm run seed-level2  # Generate the six Level 2 suspect portraits (~$0.24 one-time)
npm run build        # Production build
npm run lint         # ESLint
npx tsc --noEmit     # Typecheck

Project layout

app/                                  # Next.js App Router
  (game)/{play,leaderboard}/          # game shell + leaderboard
  admin/insights/                     # PMM insights (basic auth)
  api/
    inference/{chat,vision,image,tts,stt,embed,kb,route}/route.ts
    models/route.ts                   # 24h-cached model catalog
    score/route.ts                    # POST: persist a finished run
    leaderboard/route.ts              # GET: top 50
    admin/insights/route.ts           # aggregate stats
components/
  shared/{HUD,TransparencyDrawer,ModelPicker,Bootstrap,LevelComplete,HintBox}.tsx
  levels/Level{1..7}.tsx
  ui/{button,tabs,sheet,input}.tsx    # shadcn-style primitives
lib/
  inference/{client,pricing,models,snippet,sse,types,local-kb}.ts
  game/{store,scoring,levels,mock}.ts
  game/levels/{level1..level7,hints}.ts   # canonical prompts + validators + hints
  db/{schema,client}.ts
  utils/cn.ts
public/
  brand/                              # DigitalOcean logo + icon
  levels/2/                           # Six suspect portraits (generated by seed-level2)
scripts/
  smoke-inference.ts
  seed-kb.ts
  seed-level2-photos.ts
  kb-seed/                            # 10 markdown intel files
app.yaml                              # App Platform deploy spec
middleware.ts                         # IP rate-limit + admin basic auth
drizzle.config.ts
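
The layout lists middleware.ts as the home of the IP rate limit, which this README sets at 30 calls per minute per IP. One simple way to enforce such a limit is a fixed-window counter, sketched below. The class name, the in-memory Map, and the fixed-window strategy are assumptions; the actual middleware may use a different algorithm or store.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

// Hypothetical fixed-window limiter: at most `limit` calls per `windowMs` per IP.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 30, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed, false if the IP has used up its window.
  allow(ip: string, now: number = Date.now()): boolean {
    const state = this.windows.get(ip);
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      // First call from this IP, or the previous window has expired.
      this.windows.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) {
      return false;
    }
    state.count += 1;
    return true;
  }
}
```

An in-memory counter like this only holds per process, so a deployment with several instances would need a shared store to enforce one global limit.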

Privacy + safety

  • Inference keys never leave the server. Every model call goes through a Next.js route handler running on the server. The browser bundle does not contain the key.
  • No prompts or responses are persisted by default. The telemetry_log table records only metadata (model, latency, token counts, cost). Set LOG_PAYLOADS=1 in dev if you need raw payloads locally; never in production.
  • No email or signup gate. The leaderboard uses a self-chosen display name and a cookie-scoped run id. No PII is collected.
  • IP rate limit on every /api/inference/* route — 30 calls/minute/IP — to keep accidental cost runaways bounded.
  • Image data URIs and base64 audio are sanitized out of the client store. A 2 MB inbound vision request shows up in the transparency drawer as "<2153 KB image/png data URI>" (28 chars) so the localStorage 5 MB quota can never blow up on a long playthrough.
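
The data-URI sanitizing described in the last bullet can be approximated with a single regular-expression replace. This sketch assumes the size is measured as the length of the URI string in KB and that only base64 data URIs need replacing; `sanitizeDataUris` is a hypothetical name, and the placeholder format follows the example quoted above.

```typescript
// Matches base64 data URIs such as "data:image/png;base64,AAAA...".
const DATA_URI_PATTERN = /data:([\w.+-]+\/[\w.+-]+);base64,[A-Za-z0-9+/=]+/g;

// Replaces each data URI with a short placeholder like "<2153 KB image/png data URI>".
function sanitizeDataUris(text: string): string {
  return text.replace(DATA_URI_PATTERN, (uri: string, mime: string) => {
    // Assumption: size is the length of the URI string, rounded to whole KB.
    const sizeKb = Math.round(uri.length / 1024);
    return `<${sizeKb} KB ${mime} data URI>`;
  });
}
```

Applying this to serialized request and response JSON before it enters the client store keeps each drawer entry to a few dozen characters, however large the original image or audio payload was.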

Contributing

Issues and PRs welcome at https://github.com/DO-Solutions/inference-heist.

For new levels: each level lives in two files — lib/game/levels/levelN.ts (canonical prompts, validators, constants) and components/levels/LevelN.tsx (UI). Wire it into app/(game)/play/PlayShell.tsx and add hints in lib/game/levels/hints.ts. Levels 1 (chat/streaming) and 3 (image+vision) are the cleanest reference implementations.

For new modalities: the inference client wrapper lives in lib/inference/client.ts — every call returns { result, telemetry }. Add a new exported function there, a route handler under app/api/inference/{name}/route.ts, and a pricing entry in lib/inference/pricing.ts.
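
As an illustration of the { result, telemetry } shape, a new client function could wrap its upstream call in a small timing helper like the one below. The helper name `withTelemetry`, the telemetry fields shown, and the injectable clock are assumptions; the real wrapper in lib/inference/client.ts presumably records more (tokens, cost).

```typescript
interface Telemetry {
  model: string;
  latencyMs: number;
}

interface Instrumented<T> {
  result: T;
  telemetry: Telemetry;
}

// Hypothetical helper: runs `call`, measures wall-clock latency, and returns
// the result alongside its telemetry. `now` is injectable for testing.
async function withTelemetry<T>(
  model: string,
  call: () => Promise<T>,
  now: () => number = Date.now,
): Promise<Instrumented<T>> {
  const startedAt = now();
  const result = await call();
  return { result, telemetry: { model, latencyMs: now() - startedAt } };
}
```

Keeping the measurement in one place means every modality reports latency the same way, which is what lets the drawer list heterogeneous calls side by side.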

License

Apache 2.0 — see LICENSE.
