docs(design): key rumble compatibility and epochs on behaviorVersion

flemming-n-larsen · claude · flemming-n-larsen · commit cf7ee7a698d3 · 2026-07-02T22:39:28.000+02:00
Release versions stay lockstep across all artifacts; a server-owned
integer behaviorVersion, bumped only on game-observable changes, becomes
the compatibility contract for clients, result validation, and epochs.
Supersedes the earlier patch-vs-minor rule.
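
A minimal sketch of the rule this change describes, not the actual
validate.py: compatibility and epoch both derive from the integer
behaviorVersion, and the release version is ignored. The names EnginePin,
is_compatible and epoch_of are hypothetical; the field names follow the
result record and engine.json examples in the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnginePin:
    """Hypothetical in-memory view of rumble-data/engine.json."""
    behavior_version: int  # the compatibility contract
    release: str           # convenience only: which build to install


def is_compatible(pin: EnginePin, result: dict) -> bool:
    """Accept a result only if its behaviorVersion equals the pinned one."""
    engine = result.get("engine", {})
    return engine.get("behaviorVersion") == pin.behavior_version


def epoch_of(result: dict) -> int:
    """The epoch of a result is simply its behaviorVersion."""
    return result["engine"]["behaviorVersion"]


pin = EnginePin(behavior_version=7, release="1.1.4")

# A GUI-only release: newer release version, same behavior version.
gui_only = {"engine": {"behaviorVersion": 7, "serverVersion": "1.2.0"}}
# A result produced before the last behavior bump.
stale = {"engine": {"behaviorVersion": 6, "serverVersion": "1.1.4"}}

print(is_compatible(pin, gui_only), is_compatible(pin, stale), epoch_of(gui_only))
```

Under these assumptions the GUI-only result is accepted and stays in the
current epoch, while the stale result is rejected regardless of its release
version.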

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/docs/design/rumble/README.md b/docs/design/rumble/README.md
@@ -112,7 +112,7 @@ tracks them so sub-documents stay consistent:
 |-----------|----------------|
 | **Ruleset and scoring = RoboRumble/LiteRumble, unchanged** (APS primary; Win%, Survival, Vote, NPP/ANPP, KNNPBI, Glicko-2). Battle tested for two decades; do not reinvent. | Aggregation doc |
 | **Own-bot priority: yes**, with the self-reported-only marker plus independent confirmation for trust. | Client doc |
-| **Engine pinning at minor granularity, patch range allowed.** Requires an engine release-policy commitment: patch releases (1.1.x) must never change game-observable behavior; anything that could alter outcomes is at least a minor bump. A minor bump obsoletes all clients at once and opens a new result **epoch**. | Client + aggregation docs |
+| **Engine pinning by `behaviorVersion`.** Release versions stay lockstep across all Tank Royale artifacts (the right model for the product); a separate integer `behaviorVersion`, owned by the server and bumped only on game-observable changes (server physics/scoring/turn processing/RNG plus Bot API behavior), is the compatibility contract. Compatibility, client rollout, and result **epochs** all key on it; releases that do not bump it (e.g. GUI-only) cause no rollout and no epoch reset. Supersedes the earlier patch-vs-minor rule. | Client + aggregation docs |
 | **One active version per bot** (version bump supersedes the old one, RoboRumble style), plus a per-owner **bot slot** budget. | Submission doc |
 | **Owner (forge account) is distinct from authors (display names)**; ownership drives permissions, slots, bans, and self-report detection. | Submission doc |
 | **Banning** of owners/bots via an auditable list; enforced automatically in submission CI and result validation. | Submission + aggregation docs |
@@ -148,9 +148,12 @@ Questions that span more than one sub-document. Per-document questions live in t
 2. **Game types.** Start with 1v1 only, or 1v1 and melee from day one? Melee multiplies the pairing
    space and the matchmaking math.
 3. **Working name.** "Tank Royale Rumble" is used throughout as a placeholder.
-4. **Enforcing patch behavior-invariance.** The epoch model leans on "patches never change game
-   behavior". How is that guarded in the engine's release process (deterministic replay regression
-   tests against recorded battles would be the strong version)?
+4. **Guarding `behaviorVersion` bumps.** The epoch model leans on the behavior version being
+   bumped exactly when game-observable behavior changes. The strong guard is a deterministic
+   replay regression test in the engine's CI: replay recorded battles and compare outcomes; an
+   unintended difference fails the build, an intended one requires bumping `behaviorVersion` in
+   the same change. The concrete test design (which battles, how many, where recordings live) is
+   open.
 
 ## Glossary
 
@@ -163,6 +166,7 @@ Questions that span more than one sub-document. Per-document questions live in t
 | Issue-ops | Using forge issues as a write-free submission inbox, drained by CI. |
 | Single writer | The rule that only the aggregation CI commits to `rumble-data`. |
 | Owner | The forge account that submitted a bot; drives permissions, slots, bans. Distinct from the display `authors` in the bot config. |
-| Epoch | The result partition for one engine minor version. Patches stay in the epoch; a minor bump opens a new one and the ranked table restarts sampling. |
+| Behavior version | Server-owned integer bumped only on game-observable changes (physics, scoring, turn processing, RNG, Bot API behavior). The compatibility contract between engine, clients, and results; independent of the lockstep release version. |
+| Epoch | The result partition for one `behaviorVersion`. Releases that keep the behavior version stay in the epoch; a bump opens a new one and the ranked table restarts sampling. |
 | Practice mode | Client mode for private tuning battles; results are never submitted. Ranked mode auto-submits everything. |
 | Journal | The client's local append-only file of ranked results, submitted in batches. |
diff --git a/docs/design/rumble/aggregation-and-dashboard.md b/docs/design/rumble/aggregation-and-dashboard.md
@@ -20,7 +20,7 @@ rumble-data/
 ├── matchmaking/matches_needed.json      (projection: advice for clients)
 ├── matchmaking/pairings.json            (projection: per-pairing stats)
 ├── clients.json                         (projection: per-client stats + flags)
-├── engine.json                          (pinned server/runner versions)
+├── engine.json                          (pinned behaviorVersion + release/image)
 ├── wellknown/rumble.json                (canonical-location pointer)
 ├── scripts/validate.py                  (payload validation)
 ├── scripts/aggregate.py                 (facts → all projections, pure function)
@@ -45,7 +45,7 @@ sequenceDiagram
     Cl->>Ib: result payloads (all day)
     CI->>Ib: drain all open items
     loop each payload (a batch of 1..N results)
-        CI->>CI: validate.py per result<br/>(schema, engine pin, plausibility,<br/>known active bots, ban list,<br/>duplicate battleId / payload hash)
+        CI->>CI: validate.py per result<br/>(schema, behaviorVersion pin, plausibility,<br/>known active bots, ban list,<br/>duplicate battleId / payload hash)
         alt valid
             CI->>Raw: stage file (content-addressed name)
         else invalid
@@ -188,13 +188,15 @@ batch for the rest) if full recompute proves slow.
   `bots/index.json`, see the submission document). Superseded, retired, and disqualified versions
   keep their facts and per-version detail shards but leave the ranked table, exactly like a
   RoboRumble version bump.
-- Results are partitioned into **epochs by engine minor version** (`1.1`, `1.2`, ...). Patch
-  releases do not open a new epoch (the engine release policy guarantees patches are
-  behavior-neutral; see the client document). A minor/major bump opens a new epoch: the ranked
-  leaderboard is computed from the current epoch only, while old epochs remain browsable archives.
-  This is the honest consequence of "mixed engine versions corrupt comparability": rather than
-  pretending results across minors are comparable, the rumble restarts sampling and lets
-  matchmaking (everything is suddenly under-sampled) rebuild the table quickly.
+- Results are partitioned into **epochs by `behaviorVersion`** (the server-owned integer that
+  bumps only on game-observable changes; see the client document's Engine Pinning section). The
+  release version is irrelevant here: a GUI-only release, whatever its semver bump, keeps the
+  behavior version and therefore the epoch. A `behaviorVersion` bump opens a new epoch: the
+  ranked leaderboard is computed from the current epoch only, while old epochs remain browsable
+  archives. This is the honest consequence of "mixed game behavior corrupts comparability":
+  rather than pretending results across behavior versions are comparable, the rumble restarts
+  sampling and lets matchmaking (everything is suddenly under-sampled) rebuild the table
+  quickly.
 
 ### Matchmaking output
 
@@ -240,7 +242,7 @@ as the classic rumble; per-pairing averaging makes extra samples harmless.
       "aps": 78.42, "winPct": 91.4, "survival": 84.1, "vote": 3.2,
       "anpp": 81.7, "knnpbi": -0.4, "glicko2": 1834,
       "battles": 412, "pairings": 148, "pairingsTotal": 152, "unconfirmedPairings": 2,
-      "epoch": "1.1", "firstSeen": "2026-05-01" }
+      "epoch": 7, "firstSeen": "2026-05-01" }
   ]
 }
 ```
diff --git a/docs/design/rumble/client-battles-and-results.md b/docs/design/rumble/client-battles-and-results.md
@@ -152,7 +152,7 @@ or corrupted results, which is a bigger practical risk than malicious bot code.
 | Layer | Mechanism | Cost |
 |-------|-----------|------|
 | Identity | The forge account that opens the result issue / fork-PR **is** the identity. No extra key management in v1. | Free |
-| Plausibility | Server-side validation: schema, engine version matches the pin, scores consistent with rounds and ranks, known bot versions, duplicate hash detection. | Script |
+| Plausibility | Server-side validation: schema, `behaviorVersion` matches the pin, scores consistent with rounds and ranks, known bot versions, duplicate hash detection. | Script |
 | Consensus | With dozens of clients, most pairings get samples from several submitters. Per-client deviation from pairing consensus is computed; outliers are flagged in a report for moderators, and a client can be quarantined (its results excluded by the pure recompute, since facts are never deleted). | Script |
 | Self-report marker | Pairings sampled only by a participant's owner stay "unconfirmed" until an independent client contributes (see above). | Script |
 | Evidence | Each battle has a `battleId` (UUID) binding the result record to a locally kept, read-only `.battle.gz` replay whose SHA-256 is in the record. Moderators can request the replay for a disputed result. Spot-check, not universal verification. See "Replay evidence store" below. | Optional |
@@ -170,7 +170,7 @@ One immutable JSON file per battle. Participant fields map 1:1 onto the existing
   "gameType": "classic-1v1",
   "timestamp": "2026-07-02T14:03:22Z",
   "client": { "id": "flemming-desktop-01", "version": "0.3.0" },
-  "engine": { "serverVersion": "1.0.2", "runnerVersion": "1.0.2" },
+  "engine": { "behaviorVersion": 7, "serverVersion": "1.1.4", "runnerVersion": "1.1.4" },
   "botsRepoCommit": "08940d5",
   "rounds": 35,
   "participants": [
@@ -232,32 +232,47 @@ Conflict-freedom comes from content-addressed filenames, never coordination:
 
 ## Engine Pinning and Version Rollout
 
-`rumble-data/engine.json` pins the engine at **minor-version granularity with a patch range**:
+Tank Royale releases all artifacts in **lockstep** (one release version for server, booter, GUI,
+recorder, runner, Bot APIs). That is the right model for the product, but it means a release
+version bump signals "something in the suite changed", not "the game changed": a GUI-only
+feature release must not obsolete rumble clients or reset rankings. The rumble therefore pins a
+second, dedicated axis:
+
+**`behaviorVersion`**: a plain integer owned by the server, bumped **only** when game-observable
+behavior changes, i.e. anything that could alter the outcome of a battle: server physics,
+scoring, turn processing, RNG behavior, and Bot API changes that affect what bots do (event
+dispatch fixes are the canonical example: patch-sized code changes that absolutely change battle
+outcomes). It is reported in the server handshake, stamped into every result record, and pinned
+in `rumble-data/engine.json`:
 
 ```json
 {
   "schemaVersion": 1,
-  "engine": "1.1",
-  "minPatch": 2,
-  "clientImage": "ghcr.io/<org>/rumble-client:1.1",
-  "behaviorNote": "Patch releases (1.1.x) must not change game behavior; see release policy."
+  "behaviorVersion": 7,
+  "release": "1.1.4",
+  "clientImage": "ghcr.io/<org>/rumble-client:1.1.4"
 }
 ```
 
 Semantics:
 
-- **Patch releases (`1.1.x`) are accepted interchangeably.** This relies on a release-policy
-  commitment in Tank Royale itself: a patch must never change physics, scoring, or any
-  game-observable behavior. Anything that could alter battle outcomes is at least a minor bump.
-  This is a policy the rumble imposes back onto the engine's release process, and it should be
-  stated in the engine's release documentation and guarded by regression tests where possible.
-- **A minor/major bump (`1.x.0`) is a rollout event.** All clients become obsolete at that
-  moment, by design: mixed engine minors would silently corrupt result comparability. On its next
+- **Compatibility is decided by `behaviorVersion`, never by the release version.** Any release
+  that carries the pinned behavior version is acceptable; the client and the validator both
+  compare the behavior version reported by the running server against the pin. A GUI-only
+  release 1.2.0 with `behaviorVersion` unchanged at 7 causes no rollout, no client obsolescence,
+  and no epoch reset. `release`/`clientImage` in the pin are convenience ("which build to
+  install"), not the compatibility contract.
+- **A `behaviorVersion` bump is the rollout event.** All clients become obsolete at that moment,
+  by design: mixed behavior versions would silently corrupt result comparability. On its next
   sync the client sees the new pin, refuses ranked mode, and prints exactly how to upgrade.
-  Results produced on the old version are rejected by the validator (and the client will not
-  submit them). Whether a minor bump also resets or partitions the rankings is handled in the
-  aggregation document (result epochs).
-- Upgrading must be **one step**: `docker pull ghcr.io/<org>/rumble-client:1.2` for container
+  Results produced on the old behavior version are rejected by the validator (and the client
+  will not submit them). Each behavior version is its own result **epoch** (aggregation
+  document).
+- **Bump discipline is guarded, not trusted.** The engine's CI replays recorded battles
+  deterministically and compares outcomes: an unintended outcome difference fails the build, and
+  an intended one requires bumping `behaviorVersion` in the same change (the concrete test
+  design is an open question in the umbrella README).
+- Upgrading must be **one step**: `docker pull` the image named in the new pin for container
   users, or re-running the platform install script for bare-metal users.
 
 ## Runtimes: the Client Container and Install Scripts
@@ -268,7 +283,9 @@ participation killer, so the primary distribution is a container image:
 
 - **`rumble-client` image**: bundles the pinned server, booter, runner, the rumble client itself,
   plus the exact runtime versions (JRE, .NET SDK, Python, Node.js/npm) matching the engine pin.
-  Tagged by engine version (`rumble-client:1.1`), so upgrading engine and runtimes is one pull.
+  Tagged by release version (`rumble-client:1.1.4`, the image named in `engine.json`), so
+  upgrading engine and runtimes is one pull; the pinned image implies the pinned
+  `behaviorVersion`.
 - The image doubles as the **sandbox**: run with no outbound network (localhost WebSocket only)
   except the submission endpoint, and CPU/memory/time limits via container flags. Review reduces
   malice (submission document), the container contains it; no one pretends there is a central
@@ -294,9 +311,10 @@ participation killer, so the primary distribution is a container image:
 2. **`myBots` is purely a local scheduling hint.** No verification in the client. Ownership is
    verified where it matters: at bot upload time, where the bot name is bound to the owner
    account (see the submission document). The server-side self-report marker handles trust.
-3. **Journal staleness is bounded by the engine pin.** Queued results produced on an engine
-   version older than the current pin are incompatible and are dropped (with a message telling
-   the user what was discarded and why). No separate staleness clock is needed.
+3. **Journal staleness is bounded by the engine pin.** Queued results produced on a
+   `behaviorVersion` other than the currently pinned one are incompatible and are dropped (with
+   a message telling the user what was discarded and why). No separate staleness clock is
+   needed.
 4. **Submission happens at battle boundaries.** A result exists only when a battle has completed
    its game type's full round count (e.g. 35 rounds); nothing is ever submitted mid-battle. The
    default is to submit after each completed battle; the journal batches multiple battles into