
Commit cf7ee7a

docs(design): key rumble compatibility and epochs on behaviorVersion
Release versions stay lockstep across all artifacts; a server-owned integer behaviorVersion, bumped only on game-observable changes, becomes the compatibility contract for clients, result validation, and epochs. Supersedes the earlier patch-vs-minor rule.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent de94cb6 commit cf7ee7a

3 files changed

Lines changed: 61 additions & 37 deletions


docs/design/rumble/README.md

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -112,7 +112,7 @@ tracks them so sub-documents stay consistent:
112112
|-----------|----------------|
113113
| **Ruleset and scoring = RoboRumble/LiteRumble, unchanged** (APS primary; Win%, Survival, Vote, NPP/ANPP, KNNPBI, Glicko-2). Battle tested for two decades; do not reinvent. | Aggregation doc |
114114
| **Own-bot priority: yes**, with the self-reported-only marker plus independent confirmation for trust. | Client doc |
115-
| **Engine pinning at minor granularity, patch range allowed.** Requires an engine release-policy commitment: patch releases (1.1.x) must never change game-observable behavior; anything that could alter outcomes is at least a minor bump. A minor bump obsoletes all clients at once and opens a new result **epoch**. | Client + aggregation docs |
115+
| **Engine pinning by `behaviorVersion`.** Release versions stay lockstep across all Tank Royale artifacts (the right model for the product); a separate integer `behaviorVersion`, owned by the server and bumped only on game-observable changes (server physics/scoring/turn processing/RNG plus Bot API behavior), is the compatibility contract. Compatibility, client rollout, and result **epochs** all key on it; releases that do not bump it (e.g. GUI-only) cause no rollout and no epoch reset. Supersedes the earlier patch-vs-minor rule. | Client + aggregation docs |
116116
| **One active version per bot** (version bump supersedes the old one, RoboRumble style), plus a per-owner **bot slot** budget. | Submission doc |
117117
| **Owner (forge account) is distinct from authors (display names)**; ownership drives permissions, slots, bans, and self-report detection. | Submission doc |
118118
| **Banning** of owners/bots via an auditable list; enforced automatically in submission CI and result validation. | Submission + aggregation docs |
@@ -148,9 +148,12 @@ Questions that span more than one sub-document. Per-document questions live in t
148148
2. **Game types.** Start with 1v1 only, or 1v1 and melee from day one? Melee multiplies the pairing
149149
space and the matchmaking math.
150150
3. **Working name.** "Tank Royale Rumble" is used throughout as a placeholder.
151-
4. **Enforcing patch behavior-invariance.** The epoch model leans on "patches never change game
152-
behavior". How is that guarded in the engine's release process (deterministic replay regression
153-
tests against recorded battles would be the strong version)?
151+
4. **Guarding `behaviorVersion` bumps.** The epoch model leans on the behavior version being
152+
bumped exactly when game-observable behavior changes. The strong guard is a deterministic
153+
replay regression test in the engine's CI: replay recorded battles and compare outcomes; an
154+
unintended difference fails the build, an intended one requires bumping `behaviorVersion` in
155+
the same change. The concrete test design (which battles, how many, where recordings live) is
156+
open.
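
The bump rule in open question 4 can be sketched as a small CI check. This is only an illustration under stated assumptions: `ReplayCheck`, `guard_behavior_version`, and the idea of comparing per-participant score tuples are hypothetical, since the design explicitly leaves the concrete test design open.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ReplayCheck:
    """Outcome comparison for one recorded battle (hypothetical structure)."""
    battle_id: str
    recorded_scores: Tuple[int, ...]   # scores stored with the recording
    replayed_scores: Tuple[int, ...]   # scores produced by the build under test


def guard_behavior_version(
    checks: Sequence[ReplayCheck], old_version: int, new_version: int
) -> Tuple[bool, str]:
    """Return (build_ok, message) for one change to the engine."""
    # A battle "changed" when replaying it no longer reproduces the recording.
    changed = [c.battle_id for c in checks if c.recorded_scores != c.replayed_scores]
    bumped = new_version > old_version
    if changed and not bumped:
        # Unintended difference: fail the build.
        return False, "outcomes changed without a behaviorVersion bump: " + ", ".join(changed)
    if changed and bumped:
        # Intended difference: the bump is part of the same change.
        return True, "intended change: behaviorVersion bumped alongside new outcomes"
    if bumped:
        # Allowed here, because the recordings may simply not cover the change.
        return True, "behaviorVersion bumped, but no recorded battle changed outcome"
    return True, "no outcome change, no bump needed"
```

A bump with no observed difference passes in this sketch; whether CI should instead demand a new recording that demonstrates the change is one of the decisions the open question leaves to the test design.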
154157

155158
## Glossary
156159

@@ -163,6 +166,7 @@ Questions that span more than one sub-document. Per-document questions live in t
163166
| Issue-ops | Using forge issues as a write-free submission inbox, drained by CI. |
164167
| Single writer | The rule that only the aggregation CI commits to `rumble-data`. |
165168
| Owner | The forge account that submitted a bot; drives permissions, slots, bans. Distinct from the display `authors` in the bot config. |
166-
| Epoch | The result partition for one engine minor version. Patches stay in the epoch; a minor bump opens a new one and the ranked table restarts sampling. |
169+
| Behavior version | Server-owned integer bumped only on game-observable changes (physics, scoring, turn processing, RNG, Bot API behavior). The compatibility contract between engine, clients, and results; independent of the lockstep release version. |
170+
| Epoch | The result partition for one `behaviorVersion`. Releases that keep the behavior version stay in the epoch; a bump opens a new one and the ranked table restarts sampling. |
167171
| Practice mode | Client mode for private tuning battles; results are never submitted. Ranked mode auto-submits everything. |
168172
| Journal | The client's local append-only file of ranked results, submitted in batches. |

docs/design/rumble/aggregation-and-dashboard.md

Lines changed: 12 additions & 10 deletions
@@ -20,7 +20,7 @@ rumble-data/
2020
├── matchmaking/matches_needed.json (projection: advice for clients)
2121
├── matchmaking/pairings.json (projection: per-pairing stats)
2222
├── clients.json (projection: per-client stats + flags)
23-
├── engine.json (pinned server/runner versions)
23+
├── engine.json (pinned behaviorVersion + release/image)
2424
├── wellknown/rumble.json (canonical-location pointer)
2525
├── scripts/validate.py (payload validation)
2626
├── scripts/aggregate.py (facts → all projections, pure function)
@@ -45,7 +45,7 @@ sequenceDiagram
4545
Cl->>Ib: result payloads (all day)
4646
CI->>Ib: drain all open items
4747
loop each payload (a batch of 1..N results)
48-
CI->>CI: validate.py per result<br/>(schema, engine pin, plausibility,<br/>known active bots, ban list,<br/>duplicate battleId / payload hash)
48+
CI->>CI: validate.py per result<br/>(schema, behaviorVersion pin, plausibility,<br/>known active bots, ban list,<br/>duplicate battleId / payload hash)
4949
alt valid
5050
CI->>Raw: stage file (content-addressed name)
5151
else invalid
@@ -188,13 +188,15 @@ batch for the rest) if full recompute proves slow.
188188
`bots/index.json`, see the submission document). Superseded, retired, and disqualified versions
189189
keep their facts and per-version detail shards but leave the ranked table, exactly like a
190190
RoboRumble version bump.
191-
- Results are partitioned into **epochs by engine minor version** (`1.1`, `1.2`, ...). Patch
192-
releases do not open a new epoch (the engine release policy guarantees patches are
193-
behavior-neutral; see the client document). A minor/major bump opens a new epoch: the ranked
194-
leaderboard is computed from the current epoch only, while old epochs remain browsable archives.
195-
This is the honest consequence of "mixed engine versions corrupt comparability": rather than
196-
pretending results across minors are comparable, the rumble restarts sampling and lets
197-
matchmaking (everything is suddenly under-sampled) rebuild the table quickly.
191+
- Results are partitioned into **epochs by `behaviorVersion`** (the server-owned integer that
192+
bumps only on game-observable changes; see the client document's Engine Pinning section). The
193+
release version is irrelevant here: a GUI-only release, whatever its semver bump, keeps the
194+
behavior version and therefore the epoch. A `behaviorVersion` bump opens a new epoch: the
195+
ranked leaderboard is computed from the current epoch only, while old epochs remain browsable
196+
archives. This is the honest consequence of "mixed game behavior corrupts comparability":
197+
rather than pretending results across behavior versions are comparable, the rumble restarts
198+
sampling and lets matchmaking (everything is suddenly under-sampled) rebuild the table
199+
quickly.
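
A minimal sketch of the epoch partition described in this bullet, assuming result records carry `engine.behaviorVersion` as in the client document's result schema. The function names are hypothetical and are not taken from `aggregate.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_by_epoch(results: Iterable[dict]) -> Dict[int, List[dict]]:
    """Group result records by the behaviorVersion they were produced on."""
    epochs: Dict[int, List[dict]] = defaultdict(list)
    for result in results:
        epochs[result["engine"]["behaviorVersion"]].append(result)
    return dict(epochs)


def ranked_results(results: Iterable[dict], pinned_behavior_version: int) -> List[dict]:
    """Only the current epoch feeds the ranked table; older epochs stay archived."""
    return partition_by_epoch(results).get(pinned_behavior_version, [])


results = [
    {"battleId": "a", "engine": {"behaviorVersion": 6, "serverVersion": "1.1.3"}},
    {"battleId": "b", "engine": {"behaviorVersion": 7, "serverVersion": "1.1.4"}},
    # A GUI-only release: new release version, same behaviorVersion, same epoch.
    {"battleId": "c", "engine": {"behaviorVersion": 7, "serverVersion": "1.2.0"}},
]
current = ranked_results(results, pinned_behavior_version=7)
```

Here `current` holds battles `b` and `c`: the release version plays no part in the grouping, which is the point of keying epochs on the behavior version.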
198200

199201
### Matchmaking output
200202

@@ -240,7 +242,7 @@ as the classic rumble; per-pairing averaging makes extra samples harmless.
240242
"aps": 78.42, "winPct": 91.4, "survival": 84.1, "vote": 3.2,
241243
"anpp": 81.7, "knnpbi": -0.4, "glicko2": 1834,
242244
"battles": 412, "pairings": 148, "pairingsTotal": 152, "unconfirmedPairings": 2,
243-
"epoch": "1.1", "firstSeen": "2026-05-01" }
245+
"epoch": 7, "firstSeen": "2026-05-01" }
244246
]
245247
}
246248
```

docs/design/rumble/client-battles-and-results.md

Lines changed: 40 additions & 22 deletions
@@ -152,7 +152,7 @@ or corrupted results, which is a bigger practical risk than malicious bot code.
152152
| Layer | Mechanism | Cost |
153153
|-------|-----------|------|
154154
| Identity | The forge account that opens the result issue / fork-PR **is** the identity. No extra key management in v1. | Free |
155-
| Plausibility | Server-side validation: schema, engine version matches the pin, scores consistent with rounds and ranks, known bot versions, duplicate hash detection. | Script |
155+
| Plausibility | Server-side validation: schema, `behaviorVersion` matches the pin, scores consistent with rounds and ranks, known bot versions, duplicate hash detection. | Script |
156156
| Consensus | With dozens of clients, most pairings get samples from several submitters. Per-client deviation from pairing consensus is computed; outliers are flagged in a report for moderators, and a client can be quarantined (its results excluded by the pure recompute, since facts are never deleted). | Script |
157157
| Self-report marker | Pairings sampled only by a participant's owner stay "unconfirmed" until an independent client contributes (see above). | Script |
158158
| Evidence | Each battle has a `battleId` (UUID) binding the result record to a locally kept, read-only `.battle.gz` replay whose SHA-256 is in the record. Moderators can request the replay for a disputed result. Spot-check, not universal verification. See "Replay evidence store" below. | Optional |
@@ -170,7 +170,7 @@ One immutable JSON file per battle. Participant fields map 1:1 onto the existing
170170
"gameType": "classic-1v1",
171171
"timestamp": "2026-07-02T14:03:22Z",
172172
"client": { "id": "flemming-desktop-01", "version": "0.3.0" },
173-
"engine": { "serverVersion": "1.0.2", "runnerVersion": "1.0.2" },
173+
"engine": { "behaviorVersion": 7, "serverVersion": "1.1.4", "runnerVersion": "1.1.4" },
174174
"botsRepoCommit": "08940d5",
175175
"rounds": 35,
176176
"participants": [
@@ -232,32 +232,47 @@ Conflict-freedom comes from content-addressed filenames, never coordination:
232232

233233
## Engine Pinning and Version Rollout
234234

235-
`rumble-data/engine.json` pins the engine at **minor-version granularity with a patch range**:
235+
Tank Royale releases all artifacts in **lockstep** (one release version for server, booter, GUI,
236+
recorder, runner, Bot APIs). That is the right model for the product, but it means a release
237+
version bump signals "something in the suite changed", not "the game changed": a GUI-only
238+
feature release must not obsolete rumble clients or reset rankings. The rumble therefore pins a
239+
second, dedicated axis:
240+
241+
**`behaviorVersion`**: a plain integer owned by the server, bumped **only** when game-observable
242+
behavior changes, i.e. anything that could alter the outcome of a battle: server physics,
243+
scoring, turn processing, RNG behavior, and Bot API changes that affect what bots do (event
244+
dispatch fixes are the canonical example: patch-sized code changes that absolutely change battle
245+
outcomes). It is reported in the server handshake, stamped into every result record, and pinned
246+
in `rumble-data/engine.json`:
236247

237248
```json
238249
{
239250
"schemaVersion": 1,
240-
"engine": "1.1",
241-
"minPatch": 2,
242-
"clientImage": "ghcr.io/<org>/rumble-client:1.1",
243-
"behaviorNote": "Patch releases (1.1.x) must not change game behavior; see release policy."
251+
"behaviorVersion": 7,
252+
"release": "1.1.4",
253+
"clientImage": "ghcr.io/<org>/rumble-client:1.1.4"
244254
}
245255
```
246256

247257
Semantics:
248258

249-
- **Patch releases (`1.1.x`) are accepted interchangeably.** This relies on a release-policy
250-
commitment in Tank Royale itself: a patch must never change physics, scoring, or any
251-
game-observable behavior. Anything that could alter battle outcomes is at least a minor bump.
252-
This is a policy the rumble imposes back onto the engine's release process, and it should be
253-
stated in the engine's release documentation and guarded by regression tests where possible.
254-
- **A minor/major bump (`1.x.0`) is a rollout event.** All clients become obsolete at that
255-
moment, by design: mixed engine minors would silently corrupt result comparability. On its next
259+
- **Compatibility is decided by `behaviorVersion`, never by the release version.** Any release
260+
that carries the pinned behavior version is acceptable; the client and the validator both
261+
compare the behavior version reported by the running server against the pin. A GUI-only
262+
release 1.2.0 with unchanged `behaviorVersion 7` causes no rollout, no client obsolescence,
263+
and no epoch reset. `release`/`clientImage` in the pin are convenience ("which build to
264+
install"), not the compatibility contract.
265+
- **A `behaviorVersion` bump is the rollout event.** All clients become obsolete at that moment,
266+
by design: mixed behavior versions would silently corrupt result comparability. On its next
256267
sync the client sees the new pin, refuses ranked mode, and prints exactly how to upgrade.
257-
Results produced on the old version are rejected by the validator (and the client will not
258-
submit them). Whether a minor bump also resets or partitions the rankings is handled in the
259-
aggregation document (result epochs).
260-
- Upgrading must be **one step**: `docker pull ghcr.io/<org>/rumble-client:1.2` for container
268+
Results produced on the old behavior version are rejected by the validator (and the client
269+
will not submit them). Each behavior version is its own result **epoch** (aggregation
270+
document).
271+
- **Bump discipline is guarded, not trusted.** The engine's CI replays recorded battles
272+
deterministically and compares outcomes: an unintended outcome difference fails the build, and
273+
an intended one requires bumping `behaviorVersion` in the same change (umbrella open question
274+
on the concrete test design).
275+
- Upgrading must be **one step**: `docker pull` the image named in the new pin for container
261276
users, or re-running the platform install script for bare-metal users.
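
The compatibility rule in the semantics above can be illustrated with a short check. This is a sketch, not the actual `validate.py`: `check_behavior_pin` is a hypothetical helper, and the pin is inlined as a string so the example is self-contained.

```python
import json
from typing import Tuple

# Inlined stand-in for rumble-data/engine.json (values taken from the example pin).
PIN_TEXT = '{"schemaVersion": 1, "behaviorVersion": 7, "release": "1.1.4"}'


def check_behavior_pin(result: dict, pin: dict) -> Tuple[bool, str]:
    """Accept a result only if its behaviorVersion equals the pinned one.

    The release version is deliberately ignored: it is a convenience,
    not the compatibility contract.
    """
    pinned = pin["behaviorVersion"]
    reported = result.get("engine", {}).get("behaviorVersion")
    if reported is None:
        return False, "result carries no behaviorVersion"
    if reported != pinned:
        return False, f"behaviorVersion {reported} does not match pin {pinned}"
    return True, "ok"


pin = json.loads(PIN_TEXT)
gui_only_release = {"engine": {"behaviorVersion": 7, "serverVersion": "1.2.0"}}
old_behavior = {"engine": {"behaviorVersion": 6, "serverVersion": "1.1.4"}}
```

With these inputs, `gui_only_release` is accepted even though its release version differs from the pin, while `old_behavior` is rejected despite matching the pinned release.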
262277

263278
## Runtimes: the Client Container and Install Scripts
@@ -268,7 +283,9 @@ participation killer, so the primary distribution is a container image:
268283

269284
- **`rumble-client` image**: bundles the pinned server, booter, runner, the rumble client itself,
270285
plus the exact runtime versions (JRE, .NET SDK, Python, Node.js/npm) matching the engine pin.
271-
Tagged by engine version (`rumble-client:1.1`), so upgrading engine and runtimes is one pull.
286+
Tagged by release version (`rumble-client:1.1.4`, the image named in `engine.json`), so
287+
upgrading engine and runtimes is one pull; the pinned image implies the pinned
288+
`behaviorVersion`.
272289
- The image doubles as the **sandbox**: run with no outbound network (localhost WebSocket only)
273290
except the submission endpoint, and CPU/memory/time limits via container flags. Review reduces
274291
malice (submission document), the container contains it; no one pretends there is a central
@@ -294,9 +311,10 @@ participation killer, so the primary distribution is a container image:
294311
2. **`myBots` is purely a local scheduling hint.** No verification in the client. Ownership is
295312
verified where it matters: at bot upload time, where the bot name is bound to the owner
296313
account (see the submission document). The server-side self-report marker handles trust.
297-
3. **Journal staleness is bounded by the engine pin.** Queued results produced on an engine
298-
version older than the current pin are incompatible and are dropped (with a message telling
299-
the user what was discarded and why). No separate staleness clock is needed.
314+
3. **Journal staleness is bounded by the engine pin.** Queued results produced on a
315+
`behaviorVersion` other than the currently pinned one are incompatible and are dropped (with
316+
a message telling the user what was discarded and why). No separate staleness clock is
317+
needed.
300318
4. **Submission happens at battle boundaries.** A result exists only when a battle has completed
301319
its game type's full round count (e.g. 35 rounds); nothing is ever submitted mid-battle. The
302320
default is to submit after each completed battle; the journal batches multiple battles into
