docs(design): Composed-1 M5 — Jepsen route-shuffle workload proposal by bootjp · Pull Request #905 · bootjp/elastickv

bootjp · 2026-06-02T04:21:07Z

Summary

Design doc for Composed-1 M5 — Jepsen route-shuffle workload, the integration-level proof from the parent doc's milestone table.
Doc only — no implementation in this PR. Per CLAUDE.md design-doc-first workflow.
M5 itself is two phases: M5a (CLI + workload setup, mergeable on its own), M5b (route-shuffle nemesis + cadence tuning).

Companion to

Parent: docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md
M1+M2+M3+M4 implementation: kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd #900 (currently awaiting human merge after 9 codex review rounds)

Forward-looking posture (matches parent)

Today's SplitRange is same-group only (per CLAUDE.md and adapter/distribution_server.go), so the Composed-1 hazard the M3/M4 guard catches cannot yet be triggered in production. M5 ships the scaffolding so:

The current SplitRange is exercised under realistic concurrent multi-shard write load and proved non-regressing — workload finds zero G1c, which is the baseline M4 contract.
When a future PR introduces a route-mutating RPC that DOES shift ownership across groups (cross-group SplitRange, MoveRange, online rebalancer), the M5 workload — with a one-line nemesis swap — becomes the integration-level proof that M3+M4 hold under cross-group churn.

Open questions for review

The doc tracks four open questions in §5. The two that benefit most from reviewer input:

OQ-2 — Ship M5a + M5b in a single PR or two? (Two if M5a's CLI work runs ≥150 lines; one if M5a fits in a single screen.)
OQ-4 — Lifecycle of the parent doc post-kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd #900 merge: rename *_proposed_*.md → *_partial_*.md (since only M5 remains)? Tentative yes.

Test plan

gofmt / lint clean (doc-only, no Go changes)
Reviewer: confirm M5a + M5b phasing matches existing repo conventions
Reviewer: confirm cmd/elastickv-split as a standalone binary is preferable to a cmd/elastickv-admin subcommand (the doc tentatively chooses standalone; see §3.1 "Alternative considered")

Summary by CodeRabbit

Documentation
- Added design doc for the Composed-1 M5 milestone introducing a DynamoDB route-shuffle testing workload.
- Describes multi-shard/multi-group transaction coverage, setup verification, pass/fail validation criteria (including rejection of specific anomalies), default shuffle cadence, milestone breakdown, and remaining open questions.

Companion to 2026_05_29_proposed_composed1_cross_group_commit_guard.md (parent — proposes the cross-group commit-time ownership guard M1+M2+M3+M4 implemented in PR #900). M5 is the integration-level proof from the parent doc's milestone table. This proposal elaborates the row into a full design: * cmd/elastickv-split — tiny CLI that invokes SplitRange once by route ID + split key + expected version. Lets the Jepsen nemesis shell out instead of re-implementing the gRPC client in Clojure. * jepsen/src/elastickv/composed1_nemesis.clj — route-shuffle nemesis that periodically splits an existing route. Composes with the existing partition+kill nemesis package. * dynamodb-append-workload setup hook — issues one initial SplitRange so the workload spans >=2 shards from t=0, exercising the 2PC path (prewrite -> primary commit -> secondary commit + the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel). Forward-looking posture (same as parent): today's SplitRange is same-group only, so the Composed-1 hazard the M3/M4 guard catches cannot yet be *triggered* in production. M5 ships the scaffolding so: 1. The current SplitRange is exercised under realistic concurrent multi-shard write load and proved non-regressing (workload finds zero G1c — baseline M4 contract). 2. When a future PR introduces a route-mutating RPC that DOES shift ownership across groups, the M5 workload — with a one-line nemesis swap — becomes the integration-level proof that M3+M4 hold under cross-group churn. Two-phase milestone breakdown: * M5a: CLI + workload setup (mergeable on its own; trivially finds zero G1c without the nemesis). * M5b: Route-shuffle nemesis itself + cadence-tuning analysis. Open questions tracked in section 5; lifecycle questions about renaming the parent doc are in OQ-4.

coderabbitai · 2026-06-02T04:21:14Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: afa5f861-5d94-4d55-8652-e03b96c682d9

📥 Commits

Reviewing files that changed from the base of the PR and between f92a029 and 58d6ac5.

📒 Files selected for processing (1)

docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md

📝 Walkthrough

Walkthrough

Design doc proposing Composed-1 M5: a Jepsen route-shuffle nemesis with a single-call SplitRange CLI, split-key encoding and catalog-version drift handling, a multi-table workload and multi-group local launch, Elle pass criteria rejecting G1c, cadence defaults, milestones, open questions, and a self-review checklist.

Changes

Route-shuffle workload design

Layer / File(s)	Summary
Proposal framing and goals `docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md` (lines 1–95)	Document header, M5 goals and non-goals, and rationale framing M5 as an integration-level non-regression check after M1–M4.
CLI, nemesis, and split-key handling `docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md` (lines 100–266)	Specifies `cmd/elastickv-split` CLI (single SplitRange RPC, non-zero exit on errors), `route-shuffle-nemesis` lifecycle and periodic ListRoutes→choose route→invoke split flow, split-key encoding helper (base64 URL-safe without padding), and expected_catalog_version drift detection per tick.
Multi-table workload and multi-group launch `docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md` (lines 267–333)	Adds `dynamodb-append-multi-table-workload` variant and local Jepsen runner updates (`--raftGroups`, `--shardRanges`) to start multi-Raft-group routing with static table-route key boundaries; explains nemesis usefulness even if SplitRange stays same-group-only.
Verification and cadence `docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md` (lines 334–376)	Elle pass condition requires `:valid?` and explicitly rejects `:G1c` and `:G-single`; corrects prior anomaly-map filtering mistake. Default shuffle cadence is 30s and composes with partition/kill nemesis via interval scheduling.
Milestones, open questions, checklist `docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md` (lines 379–476)	Breaks work into M5a (CLI, workload, multi-group launch, setup verification) and M5b (nemesis wiring, expected zero G1c after shuffles). Lists open questions and a five-point self-review checklist covering data loss, concurrency, perf, consistency, and tests.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A shuffle through routes, a jepsen quest,
CLI hops, split keys put to the test.
Multi-table hops and groups in a dance,
Catalogs steady, give consistency a chance.
🥕✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: a design document proposing the Composed-1 M5 Jepsen route-shuffle workload. It matches the file added and the PR's primary objective.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/composed1-m5-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request proposes a design document for the Composed-1 M5 Jepsen route-shuffle workload, detailing the CLI helper, route-shuffle nemesis, and multi-shard testing strategy. The review feedback highlights three critical design issues: a lexicographical split key issue that would prevent workload keys from being split across shards, a route ID resolution problem where the nemesis needs to dynamically query active routes before splits, and a concurrency issue where the initial split in the Jepsen setup hook must be gated to run only on the bootstrap node.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T04:22:26Z

+**Split key picking strategy.** A simple monotonically-increasing
+counter: every `:start` invocation appends a fresh integer
+suffix to a fixed key prefix the workload reserves. This avoids
+collisions with the workload's keyspace and guarantees the
+split always picks a key that's between existing keys (so the
+operation succeeds against a real catalog).


Lexicographical Shard Split Issue

In the proposed split key picking strategy, using a prefix like /split/ (e.g., /split/1, /split/2) will cause all split keys to be lexicographically smaller than the workload's keyspace (which consists of stringified integers like "0", "1", "2", etc., because / [ASCII 47] is less than 0 [ASCII 48]).

As a result, all workload keys will always fall into the rightmost shard (e.g., [/split/N, +inf)), meaning the workload keys themselves are never split across shards. This completely defeats Goal G2 of exercising the 2PC multi-shard transaction path.

Recommendation:
Choose split keys from within the active workload keyspace (e.g., by selecting values that interleave with the workload's keys) to ensure the workload's keyspace is successfully divided across multiple shards.

gemini-code-assist · 2026-06-02T04:22:26Z

+**Expected version.** The nemesis calls `ListRoutes` once at
+setup to learn the current catalog version, then increments its
+local copy by 1 after each successful split. Catalog drift
+(another split landing concurrently) is rare in practice — if it
+happens, the nemesis logs and refreshes from `ListRoutes`.


Route ID Resolution for SplitRange

The design states that the nemesis will call ListRoutes once at setup to learn the current catalog version and then increment its local copy. However, because splits continuously modify the routing table (deleting parent routes and creating new child routes with new IDs), the nemesis cannot rely on a static route_id or just the initial ListRoutes call to perform subsequent splits.

To invoke SplitRange successfully, the nemesis must provide the correct route_id that currently covers the chosen split_key.

Recommendation:
The nemesis must query ListRoutes (or dynamically track the active routes and their boundaries) before each split to resolve the correct route_id for the target split_key.

gemini-code-assist · 2026-06-02T04:22:26Z

+The setup hook (`db/setup!` in Jepsen parlance, or the test's
+`:setup` map) runs `elastickv-split` once with a split key in
+the middle of the workload's keyspace.


Gating of Initial Split in Setup Hook

If the initial split is executed within the Jepsen db/setup! hook, keep in mind that db/setup! is executed on every node in the cluster.

Recommendation:
To prevent concurrent split attempts and catalog version conflicts during cluster bootstrap, the invocation of elastickv-split must be gated to run only on the primary/bootstrap node (e.g., using (when (= node (first (:nodes test))) ...)), or run as a single test-level setup step.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffb9c73fc7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T04:23:18Z

+**Choose A.** Less invasive to the workload, and the
+route-shuffle nemesis itself increases the shard count over
+time, giving organic multi-shard coverage.
+
+The setup hook (`db/setup!` in Jepsen parlance, or the test's
+`:setup` map) runs `elastickv-split` once with a split key in
+the middle of the workload's keyspace.


Don’t rely on item-key splits to shard DynamoDB txns

This setup will not make the current DynamoDB append workload span shards: the router normalizes all DynamoDB table metadata, item, and GSI keys to a single per-table route key (!ddb|route|table|<tableSegment> in kv/shard_key.go:94-124), so every jepsen_append item resolves to the same catalog point regardless of its partition key. With Option A, splitting “the middle of the workload's keyspace” can increase the route count but TransactWriteItems over multiple keys in that one table will still be grouped to one shard, so the run can pass without exercising dispatchMultiShardTxn/secondary commits as G2 requires; the workload needs distinct table route keys or a routing change instead.

Useful? React with 👍 / 👎.

) Codex P1 + 3 gemini medium findings on the original PR #905 revision (ffb9c73). All addressed by revising sections 3.2, 3.3, and 4 (milestone breakdown) and adding OQ-5 / OQ-6: * codex P1 — "Don't rely on item-key splits to shard DynamoDB txns." Verified against kv/shard_key.go:94-124: every DynamoDB table-metadata, item, and GSI key normalises to a SINGLE per- table route key (!ddb|route|table|<tableSegment>). Splitting inside a single-table workload's item keyspace cannot put two items on different shards, so the 2PC path (dispatchMultiShardTxn, secondary commits, ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) would never fire — invalidating G2. Fix: replace single-key-split (Option A) with a NEW workload variant dynamodb-append-multi-table-workload that creates N=4 tables (jepsen_append_t1 … jepsen_append_t4) and writes to >=2 distinct tables per TransactWriteItems. The router maps each table to its own route key, so cross-table txns naturally fan out across shards. The setup hook splits the table-route keyspace at !ddb|route|table|jepsen_append_t2. * gemini medium R1 — "Lexicographical Shard Split Issue." The prior /split/<int> split-key prefix was lexicographically smaller than the workload's keyspace ("/" < "0" in ASCII), so every workload key ended up on the rightmost shard and G2 was never exercised. Fix: anchor split keys to the table-route prefix !ddb|route|table|... so the split lands INSIDE the active workload route range. * gemini medium R2 — "Route ID Resolution for SplitRange." Successful SplitRange deletes the parent route ID and creates two child IDs, so a cached ID from a one-time setup-time ListRoutes call is stale on the next shuffle. Fix: nemesis re-queries ListRoutes on every :start invocation, walks the snapshot to find the route covering the chosen split key, and uses that route's ID + snapshot.version as expected_catalog_version. Catalog drift surfaces as ErrCatalogVersionMismatch from the server and the nemesis refreshes on the next tick. * gemini medium R3 — "Gating of Initial Split in Setup Hook." Jepsen db/setup! runs on EVERY node; an ungated initial split would be attempted concurrently by all nodes. Fix: gate the setup-time split on (when (= node (first (:nodes test))) ...) so only the first node attempts it. Also: * Updated §4 milestone table: M5a now ships the new workload variant (not just a setup hook), so it is meaningfully bigger than the original §4 row suggested. * Added OQ-5 (is N=4 the right default?) and OQ-6 (first-node gate semantics) as follow-ups for implementation time. * Resolved OQ-4: PR #900 has merged, so the parent doc rename *_proposed_*.md → *_partial_*.md should now land as a separate small doc-only PR.

bootjp · 2026-06-02T07:15:33Z

Pushed as f5d2ad7 — all four findings on ffb9c73 addressed.

Codex P1 — correctly identified the core flaw. Verified against kv/shard_key.go:94-124: every DynamoDB table-metadata, item, and GSI key normalises to a SINGLE per-table route key (!ddb|route|table|<tableSegment>). Splitting inside a single-table workload's item keyspace cannot put two items on different shards, so the 2PC path (dispatchMultiShardTxn, secondary commits, the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel) would never fire — invalidating G2.

Fix: §3.3 rewritten. New workload variant dynamodb-append-multi-table-workload creates N=4 tables (jepsen_append_t1 … jepsen_append_t4) and writes to ≥2 distinct tables per TransactWriteItems. The router maps each table to its own route key, so cross-table txns naturally fan out across shards. The setup hook splits at !ddb|route|table|jepsen_append_t2 so tables 1 lives on one shard, tables 2-4 on another from t=0.

Gemini medium R1 (Lexicographical split) — fixed by anchoring split keys to the table-route prefix !ddb|route|table|... so they land INSIDE the active workload route range. The prior /split/<int> was lexicographically smaller than the workload's keyspace (/ < 0), so every workload key ended up on the rightmost shard.

Gemini medium R2 (Route ID resolution) — fixed by having the nemesis re-query ListRoutes on every :start invocation, walk the snapshot to find the route covering the chosen split key, and use that route's ID + snapshot.version as expected_catalog_version. Catalog drift surfaces as ErrCatalogVersionMismatch and the nemesis refreshes on the next tick. A cached route ID was indeed always going to be stale — every successful split deletes the parent and creates two fresh children.

Gemini medium R3 (Setup hook gating) — fixed by gating on (when (= node (first (:nodes test))) ...) so only the first node attempts the initial split. Jepsen's db/setup! runs on every node by default.

Also:

§4 milestone table updated — M5a now ships the new workload variant (not just a setup hook), so it's meaningfully bigger than the original row suggested.
Added OQ-5 (is N=4 the right default?) and OQ-6 (first-node gate semantics) as implementation-time follow-ups.
Resolved OQ-4: PR kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd #900 has merged, so the parent doc rename *_proposed_*.md → *_partial_*.md should now land as a separate small doc-only PR.

@claude review

claude · 2026-06-02T07:15:49Z

Claude finished @bootjp's task in 3m 47s —— View job

Review of PR #905 — `f5d2ad7`

Read the updated design document
Verify Codex P1 fix (multi-table workload for 2PC path)
Verify Gemini R1 fix (lexicographical split key)
Verify Gemini R2 fix (route ID resolution on each nemesis invocation)
Verify Gemini R3 fix (setup hook gating to single node)
General review pass for correctness, consistency, and open issues

Verification of the four prior findings

Codex P1 — confirmed fixed. Verified against kv/shard_key.go:94-124: dynamoRouteKey() normalises every DynamoDB table-meta, item, and GSI key to !ddb|route|table|<tableSegment>, so a single-table workload's entire keyspace maps to one route key regardless of partition-key value. The revised §3.3 strategy — N=4 tables, each TransactWriteItems writes to ≥2 distinct tables — is technically sound. Each table gets its own route key, so after the initial split at !ddb|route|table|jepsen_append_t2 the cross-table txn will fan out through dispatchMultiShardTxn (kv/sharded_coordinator.go:851) and commitSecondaryTxns (:1049), which is exactly what G2 requires. The ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel at :1101 will also be exercisable under concurrent shuffles.

Gemini R2 — confirmed fixed. The nemesis description in §3.2 correctly re-queries ListRoutes on every :start invocation, walks the snapshot, uses the live route_id + snapshot.version as expected_catalog_version. Confirmed ErrCatalogVersionMismatch is the right sentinel (distribution/catalog.go:32, distribution/catalog.go:567,632). The stale-ID retry-on-next-tick pattern matches what SplitRange in adapter/distribution_server.go:152-158 actually returns.

Gemini R3 — confirmed fixed. The (when (= node (first (:nodes test))) ...) gate is standard Jepsen idiom and correctly prevents concurrent initial splits. OQ-6 acknowledging (first (:nodes test)) semantics is the right move.

New findings

P2 — Success criterion in §3.4 will always pass (incorrect Elle result structure)

The proposed check:

(zero? (count (filter #(= (:type %) :G1c) (:anomalies results))))

assumes (:anomalies results) is a sequence of maps with a :type key. But elle/jepsen.tests.cycle.append returns :anomalies as a map from keyword → seq (e.g., {:G1c [...], :G-single [...]}). Iterating a Clojure map with filter yields [key val] vectors; none of them have a :type key, so (= (:type %) :G1c) is always false, and the count is always zero. This check will pass even on a G1c run.

The existing workload (dynamodb_workload.clj:316) uses :consistency-models [:strict-serializable] inside append/test, which is where the real check lives. The explicit Clojure check in §3.4 should either be removed (trust (:valid? results)) or corrected:

;; correct form
(and (:valid? results)
     (nil? (get (:anomalies results) :G1c))
     (nil? (get (:anomalies results) :G-single)))

Fix this →

P2 — Lexicographic direction in the Gemini R1 explanation is inverted (§3.2)

The doc explains the old /split/<int> proposal by saying "that was lexicographically smaller than the workload's keyspace (/ < 0 in ASCII)". The DynamoDB table-route keyspace starts with ! (ASCII 33); / is ASCII 47. So /split/<int> is lexicographically larger than !ddb|route|table|* — workload route keys would land on the left shard, not the rightmost. The fix (anchoring to !ddb|route|table|jepsen_append_t2) is absolutely correct, but the explanation has the direction inverted. This is only a doc-clarity issue, but a reviewer following the reasoning could get confused.

Fix this →

P3 — NG3 contradicts the new multi-table variant introduced in §3.3

Non-goal NG3 states: "No new Jepsen workload primitives … The existing dynamodb-append-workload is the right surface." §3.3 then introduces dynamodb-append-multi-table-workload as a new workload variant. The non-goal should be tightened to something like: "No new operation types or generators beyond the multi-table variant and route-shuffle nemesis." Otherwise a reviewer reading §1.2 before §3.3 will think the multi-table variant contradicts a stated non-goal.

P3 — Setup hook route ID resolution not described

§3.3 says the setup hook issues the initial split, but doesn't explain how it resolves the route ID needed for the elastickv-split call. The nemesis does this via ListRoutes; the same pattern is needed in db/setup!. Worth adding one sentence: "The hook calls ListRoutes first to obtain the initial route ID and catalog version before invoking elastickv-split." Avoids an implementer having to re-derive this.

Observation — OQ-4 wording vs. git log

The trigger comment says "PR #900 has merged" and OQ-4 is marked resolved accordingly. However the doc's §5 OQ-4 text says the rename should be "a separate small doc-only PR." That's the right call: the parent-doc rename from *_proposed_* → *_partial_* and the eventual *_implemented_* transition should stay out of this PR per CLAUDE.md's lifecycle convention. No action needed here, just confirming the resolution is correct.

Summary

The four prior reviewer findings have been addressed correctly and the core technical direction is sound. The P2 success-criterion bug (§3.4) is the most important remaining issue — as written, the explicit G1c guard will always silently pass. The lexicographic-direction explanation inversion (§3.2) is a documentation correctness issue. Both are fixable with targeted line edits to the doc before this merges.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5d2ad7ac7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T07:18:06Z

+(`!ddb|route|table|<tableSegment>` — see `kv/shard_key.go:94-124`).
+Concretely, with N tables `jepsen_append_t1` …
+`jepsen_append_tN` per §3.3, the route key for table `tK` is
+`!ddb|route|table|jepsen_append_tK`. Splits happen between


Encode table segments before choosing split keys

The proposed split key uses the raw table name, but DynamoDB route keys are built from encodeDynamoSegment(tableName) (base64 RawURLEncoding in adapter/dynamodb.go) and kv.dynamoRouteTableKey appends that encoded segment. For tables like jepsen_append_t1, the real route keys start with !ddb|route|table|am..., so splitting at !ddb|route|table|jepsen_append_tK sorts after all of the workload's table-route keys and leaves every table on the same side; the route-shuffle workload would still miss the intended 2PC path.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-02T07:18:06Z

+keys. The setup hook splits the table-route keyspace at
+`!ddb|route|table|jepsen_append_t2` so tables 1 lives on one
+shard and tables 2–4 on another from t=0.


Seed cross-group routes before claiming 2PC coverage

Fresh evidence is that the revised setup hook now relies on SplitRange at the table-route boundary, but the current SplitRange only creates child routes with the parent's GroupID, while dispatchTxn enters dispatchMultiShardTxn only when mutations/read keys group to more than one Raft group. With just this initial split, tables 1 and 2–4 become separate ranges in the same group and the workload can satisfy the documented setup while still taking the single-shard transaction path; M5a needs an initial --shardRanges/routing plan that puts the table route keys on distinct groups (or a future cross-group mover) before using it as 2PC evidence.

Useful? React with 👍 / 👎.

…#905) Six findings on f5d2ad7 — two codex P1, two claude P2, two claude P3. All addressed: * codex P1 #1 (encode table segments before choosing split keys). My split key used the raw table name; the actual table-route key uses dynamoRouteTableKey(encodeDynamoSegment( tableName)) — base64 RawURLEncoding (adapter/dynamodb.go: 8337-8365). Split at the raw name sorts after every base64-encoded segment and leaves every table on one side of the split. Fix: §3.2 now specifies the Clojure helper encode-dynamo-segment that mirrors the Go encoding, and the split key is constructed as `(str "!ddb|route|table|" (encode-dynamo-segment "jepsen_append_t2"))`. * codex P1 #2 (seed cross-group routes before claiming 2PC coverage). Today's SplitRange is same-group only — both children inherit the parent's GroupID. dispatchTxn enters dispatchMultiShardTxn only when mutations group to ≥2 Raft GROUPS, not ≥2 routes. An initial SplitRange in the setup hook would leave tables 1 and 2-4 in the same group and the workload would still take the single-shard path. Fix: §3.3 rewritten substantially. Multi-group cluster startup is now required — M5a extends scripts/run-jepsen-local.sh to pass --shardRanges so the cluster launches with ≥2 Raft groups, table-route keys for t1..t4 statically split across groups (e.g. tables 1-2 → group 1, tables 3-4 → group 2). The Jepsen db/setup! hook no longer issues SplitRange — it calls ListRoutes (gated to first node) and VERIFIES the multi-group routing is in place, failing fast if not. Also added §3.3's "Why the nemesis is still useful" paragraph explaining that nemesis-driven SplitRange (same-group) churns catalog versions without moving ownership, exercising M3 + M4 drift detection — a non-regression check. When a future cross-group MoveRange lands, swapping it into the nemesis upgrades the workload from non-regression to real Composed-1 anomaly hunting. * claude P2 #1 (success criterion in §3.4 will always pass). My Clojure snippet used `(filter #(= (:type %) :G1c) (:anomalies results))` but :anomalies is a map (keyword → seq), not a flat list of maps with :type. Iterating a map yields [key val] vectors with no :type field — the filter always returns empty, the check always passes silently. Fix: §3.4 corrected to `(nil? (get (:anomalies results) :G1c))`. Added doc note about the prior broken form so future readers understand the trap. * claude P2 #2 (lexicographic direction inverted in §3.2). My old explanation said `/split/<int>` was smaller than workload keys, but `!` is ASCII 33 and `/` is 47, so `/split/<int>` is LARGER than `!ddb|route|table|*`. Fix: §3.2 corrected — workload keys would land on the LEFT shard, not the rightmost. The fix (anchoring to the table-route prefix) is correct in either direction. * claude P3 #1 (NG3 contradicts the new multi-table variant). Fix: NG3 tightened to "no new operation types or generators BEYOND the multi-table workload variant and the route- shuffle nemesis specified in §3." * claude P3 #2 (setup hook route ID resolution not described). Resolved differently from claude's suggestion: §3.3's revision means the setup hook no longer needs route ID resolution at all (no mutation), just a ListRoutes verification. * Added OQ-7 (codex P1 #1 follow-up) for the --shardRanges boundary key encoding helper.

chatgpt-codex-connector · 2026-06-02T08:33:44Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

bootjp · 2026-06-02T08:34:19Z

Pushed as 2481286 — all six findings on f5d2ad7 addressed (2 codex P1, 2 claude P2, 2 claude P3).

codex P1 #1 — encode table segments before split keys. My split key used the raw table name; actual route key uses dynamoRouteTableKey(encodeDynamoSegment(tableName)) — base64 RawURLEncoding (adapter/dynamodb.go:8337-8365). Split at raw name sorts after all encoded segments → all tables on one side. Fix: §3.2 now specifies a Clojure encode-dynamo-segment helper mirroring the Go encoding.

codex P1 #2 — same-group SplitRange can't seed cross-group routes. Current SplitRange only creates children with the parent's GroupID. dispatchTxn enters dispatchMultiShardTxn only when mutations group to ≥2 Raft groups, not ≥2 routes. Setup-hook SplitRange would leave tables in the same group → single-shard path → 2PC never fires. Fix: §3.3 substantially rewritten. M5a now requires multi-group cluster startup — extend scripts/run-jepsen-local.sh to pass --shardRanges so cluster launches with ≥2 Raft groups, table-route keys for t1..t4 statically split across groups. Jepsen db/setup! no longer issues SplitRange — calls ListRoutes (gated to first node) and verifies the multi-group routing is in place, failing fast if not. Added a "Why the nemesis is still useful" paragraph: same-group SplitRange churns catalog versions/route IDs, exercising M3+M4 drift detection — a non-regression check, not a real anomaly hunt until cross-group routing lands.

claude[bot] P2 #1 — success criterion would always pass. My (filter #(= (:type %) :G1c) (:anomalies results)) was wrong: :anomalies is a map (keyword → seq), iterating yields [k v] vectors with no :type. Fix: §3.4 uses (nil? (get (:anomalies results) :G1c)).

claude[bot] P2 #2 — lexicographic direction inverted. ! is ASCII 33, / is 47 → /split/ is LARGER than !ddb|.... Workload keys would land on LEFT shard, not rightmost. Fix: §3.2 explanation corrected.

claude[bot] P3 #1 — NG3 contradicts §3.3. Tightened: "no new operation types or generators beyond the multi-table variant and the route-shuffle nemesis."

claude[bot] P3 #2 — setup hook route ID resolution. Resolved differently: §3.3's rewrite means the hook no longer mutates routes (only verifies), so no route-ID lookup needed for mutation. Just ListRoutes for verification.

Added OQ-7 for the --shardRanges boundary key encoding helper (tentative: small Go binary cmd/elastickv-route-key so the encoding stays in Go).

@claude review

claude · 2026-06-02T08:34:37Z

Claude finished @bootjp's task in 4m 53s —— View job

Review of PR #905 — `2481286`

Verification of the six prior findings

Codex P1 #1 — confirmed fixed. adapter/dynamodb.go:8441-8442 confirms encodeDynamoSegment is exactly base64.RawURLEncoding.EncodeToString. The doc correctly calls out the encoding at kv/shard_key.go:117-124 and proposes a Clojure encode-dynamo-segment helper.

Codex P1 #2 — confirmed fixed. sharded_coordinator.go:821-843 confirms groupMutations groups by Raft group ID via c.router.ResolveGroup(mut.Key). The fast path at line 835 fires when len(gids) == 1 — single group. dispatchMultiShardTxn fires only when len(gids) > 1. distribution_server.go:390,398 confirm SplitRange sets both children's GroupID to parent.GroupID, so same-group splits leave cross-table txns on one group and never trigger 2PC. The revised §3.3 strategy (multi-group cluster startup) is the correct fix.

Claude P2 #1 — confirmed fixed. §3.4 uses (nil? (get (:anomalies results) :G1c)) — correct map lookup. The explanation of the prior bug is accurate.

Claude P2 #2 — confirmed fixed. §3.2 now correctly states / (ASCII 47) > ! (ASCII 33), so /split/<int> is lexicographically larger than !ddb|route|table|*, meaning workload keys would land on the left shard.

Claude P3 #1 — confirmed fixed. NG3 is now tightened to "beyond the new dynamodb-append-multi-table-workload variant … and the new route-shuffle-nemesis."

Claude P3 #2 — confirmed fixed. Setup hook is now read-only (only calls ListRoutes and verifies), so no route-ID resolution is needed for mutation. Clean resolution.

New findings

P2 — "only the `--shardRanges` argument is new" understates M5a's launch script changes (§3.3)

The doc says: "The DDB leader address per-group is wired via --raftDynamoMap (already supported by the runner — only the --shardRanges argument is new)."

This is incorrect. shard_config.go:61-99 shows that parseRaftGroups returns a single default group {id:1, address:defaultAddr} when --raftGroups is empty. Without --raftGroups, all --shardRanges entries still fall into group 1 regardless of the groupID specified, and groupMutations returns gids = [1] for all keys — single-shard path, 2PC never fires.

For ≥2 Raft groups, M5a must also:

Pass --raftGroups "1=127.0.0.1:50051,2=127.0.0.1:50054" (or equivalent)
Decide the cluster topology — today's run-jepsen-local.sh has 3 nodes all in one group. Multi-group requires a topology change: either 6 nodes (3-node consensus per group) or 2 single-node groups for test speed (each with --raftBootstrap)
Update --raftBootstrapMembers per group (different members sets for group 1 vs group 2 nodes)
Expand --raftDynamoMap to cover both groups' leader addresses
Update port allocations (currently 5005{1,2,3}, 6380{1,2,3})

This is substantially more than one new flag. M5a is already noted as "meaningfully bigger" in the §4 table — correcting the claim in §3.3 will avoid the implementer discovering this gap mid-PR.

Fix this →

P3 — Clojure `encode-dynamo-segment` implementation details need one more line (§3.2)

The doc says the helper "mirrors the Go encodeDynamoSegment exactly." Go uses base64.RawURLEncoding = URL-safe charset (-/_) without = padding. In Clojure/Java, three Base64 variants exist:

Base64/getEncoder → standard (+//, with padding) — wrong
Base64/getUrlEncoder → URL-safe (-/_, with padding) — wrong (padding changes the string)
Base64/getUrlEncoder + .withoutPadding → URL-safe, no padding — correct

Since OQ-7's Go helper (cmd/elastickv-route-key) mitigates this risk for the launch-script boundary keys, the remaining exposure is M5b's Clojure nemesis. Adding one line — e.g. ; use (.withoutPadding (java.util.Base64/getUrlEncoder)) — removes a silent mis-routing footgun at implementation time. The failure mode is non-obvious: a wrong encoding produces split keys that fall outside every route's range, ListRoutes returns no matching route, the nemesis logs and skips every :start, and the run appears clean while the nemesis is a no-op.

Observation — OQ-4 resolution and parent doc rename

OQ-4 is correctly marked resolved: the parent doc rename (*_proposed_* → *_partial_*) should be a separate PR now that #900 is merged. Confirmed the parent doc (2026_05_29_proposed_composed1_cross_group_commit_guard.md) is not touched in this PR's diff — the separation is correct per CLAUDE.md lifecycle convention.

Summary

All six prior findings are correctly addressed. Two new issues: the P2 claim that only --shardRanges is new in the launch script is the more important one — it will mislead the M5a implementer into thinking the cluster topology and --raftGroups are already handled. The P3 encoding callout is a small but high-value one-liner to add. Both are doc-only fixes.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Around line 17-21: The blockquote lines around the SplitRange discussion use
multiple spaces after the '>' which violates markdownlint MD027; edit the two
affected quote lines (the ones starting with the blockquote entries for the list
items referencing SplitRange) to use a single space after '>' (e.g., change '>  
1.' and '>   2.' to '> 1.' and '> 2.') so the markdown lists are normalized.
- Around line 104-110: The fenced code blocks in the document (for example the
block showing the elastickv-split command and the other blocks around lines with
route shuffle and Jepsen snippets) need explicit language identifiers to satisfy
MD040; update each triple-backtick fence that contains shell commands to use
```bash (e.g., the block containing the elastickv-split invocation) and use
appropriate languages like ```clojure for code snippets in the Jepsen examples
so all three blocks (the one with elastickv-split, the block at the later
route/jepsen example, and the block at lines ~280) include the correct language
tag.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 46930714-7b12-429d-84a9-4ea1cdbcd711

📥 Commits

Reviewing files that changed from the base of the PR and between f74453a and 2481286.

📒 Files selected for processing (1)

docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md

coderabbitai minor on 2481286: * MD027 (multiple-space-after-blockquote) at lines 17,21 in the upfront posture block. Trimmed double-spaces after `>` to a single space. * MD040 (fenced-code-language) at the three plain ``` blocks: - the elastickv-split CLI usage (```bash) - the route-shuffle-nemesis Clojure sketch (```clojure) - the M5 success-criterion Clojure snippet (```clojure) Doc-only, no semantic changes.

chatgpt-codex-connector · 2026-06-02T08:39:40Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

bootjp · 2026-06-02T08:39:40Z

Pushed as 673bee8 — coderabbitai minor lint findings on 2481286 addressed:

MD027 (multiple-space-after-blockquote) at lines 17, 21 — trimmed double-spaces after > to a single space in the upfront posture block.
MD040 (fenced-code-language) at three plain ``` blocks — added language tags: bash for the `elastickv-split` CLI usage, `clojure` for both the route-shuffle nemesis sketch and the success-criterion snippet.

Doc-only, no semantic changes.

@claude review

claude · 2026-06-02T08:39:56Z

Claude finished @bootjp's task in 3m 4s —— View job

Review of PR #905 — `673bee8`

Read current state of design document
Verify lint fixes (MD027 + MD040) are correct
Verify all previously flagged issues from f5d2ad7 are addressed in 2481286
Check for any new issues in the full document
Post review findings

Verification of lint fixes (`673bee8`)

MD027 — confirmed correct. Lines 17 and 21 changed from > 1. / > 2. (3 spaces) to > 1. / > 2. (1 space). The continuation lines on 23–26 (> MoveRange...) retain their deeper indentation, which markdownlint correctly does not flag for MD027 — the rule targets the space immediately after > on the first token of each blockquote line, and continuation indentation for list items is exempt.

MD040 — confirmed correct. All three previously-bare fenced code blocks now carry language identifiers:

Line 104: ```bash (elastickv-split CLI invocation) ✓
Line 133: ```clojure (route-shuffle-nemesis skeleton) ✓
Line 280: ```clojure (success-criterion snippet) ✓

No other code blocks in the document are missing language identifiers (the encode-dynamo-segment block added in 2481286 already carried ```clojure).

Verification of the six findings from `f5d2ad7` (addressed in `2481286`)

All six confirmed resolved:

Finding	Status
codex P1 #1 — split key must use `encodeDynamoSegment` (base64 RawURLEncoding)	✓ §3.2 specifies `encode-dynamo-segment` Clojure helper; example uses `(str "!ddb\|route\|table\|" (encode-dynamo-segment "jepsen_append_t2"))`
codex P1 #2 — same-group `SplitRange` can't trigger `dispatchMultiShardTxn`	✓ §3.3 rewrites to multi-group cluster startup via `--shardRanges`; setup hook does verification only
claude P2 #1 — Elle `:anomalies` is a map, not a seq of `:type`-keyed maps	✓ §3.4 now uses `(nil? (get (:anomalies results) :G1c))`
claude P2 #2 — lexicographic direction inverted (`/` is ASCII 47 > `!` ASCII 33)	✓ §3.2 correctly states `/split/<int>` lands workload keys on the left shard
claude P3 #1 — NG3 contradicted the new multi-table variant in §3.3	✓ NG3 now explicitly carves out the `dynamodb-append-multi-table-workload` variant and the route-shuffle nemesis
claude P3 #2 — setup hook route ID resolution not described	✓ Resolved by design: hook is now read-only verification (`ListRoutes` + assert), not mutation — no route ID needed

Remaining observation (minor, non-blocking)

Scope description for M5a is now understated. The sentence at the bottom of §4 — "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" — doesn't reflect the updated M5a scope in the milestone table, which now includes: CLI + new multi-table workload + scripts/run-jepsen-local.sh multi-group launch extension + setup hook verification. "Small, focused PR" is accurate for the original scope; it's mildly misleading after the codex P1 #2 revision expanded M5a. Worth updating to something like "M5a is a single focused PR (Go CLI + new workload variant + multi-group script extension + setup-hook verification)" before implementation — avoids scope confusion when the PR lands.

This is the only remaining issue in the document; it's a doc-clarity note, not a correctness problem.

Summary

The lint cleanup (673bee8) is correct and complete. All six prior review findings are properly addressed in 2481286. The document is technically sound and consistent. Ready to merge.

…+ Base64 nuance) Two findings on 2481286 from claude[bot]'s post-revision review: * P2 — '"only --shardRanges is new" understates M5a's launch script changes' (§3.3 table cell). Confirmed against shard_config.go:61-99: without --raftGroups, every --shardRanges entry collapses into the default single group 1 and groupMutations returns gids = [1] for every key. Single-shard path, 2PC never fires. Fix: §3.3 table cell rewritten to list FIVE coordinated launch-script changes M5a must make: 1. --raftGroups declaring >=2 groups 2. --shardRanges with t1-t2 in group 1, t3-t4 in group 2 3. Cluster topology decision (tentative: 2 single-node groups for CI speed; scale to 6 nodes if needed) 4. Per-group --raftBootstrapMembers 5. Expanded --raftDynamoMap + matching port allocations Avoids the implementer discovering the topology gap mid-PR. * P3 — encode-dynamo-segment Clojure implementation needs Base64 nuance. Go uses base64.RawURLEncoding (URL-safe charset WITHOUT '=' padding). Java's three Base64 variants differ: - Base64/getEncoder (standard '+'/'/', with padding) — wrong - Base64/getUrlEncoder (URL-safe, WITH padding) — wrong - Base64/getUrlEncoder + .withoutPadding — correct Failure mode is silent: wrong encoding produces split keys outside every route's range; ListRoutes returns no match; nemesis logs and skips every :start; run appears clean while the nemesis is a no-op. Fix: §3.2 now calls out .withoutPadding explicitly and documents the silent-no-op failure mode.

bootjp · 2026-06-02T08:44:22Z

Pushed as 3ca2a7f — claude[bot] P2 + P3 on 2481286 addressed.

P2 — launch-script scope underestimated. Confirmed against shard_config.go:61-99: without --raftGroups, every --shardRanges entry collapses into the default single group 1 and groupMutations returns gids = [1] for every key. Single-shard path → 2PC never fires. My "only --shardRanges is new" was wrong.

Fix: §3.3 table cell rewritten to list five coordinated launch-script changes M5a must make:

--raftGroups declaring ≥2 groups
--shardRanges placing t1-t2 in group 1, t3-t4 in group 2
Cluster topology decision (tentative: 2 single-node groups for CI speed; scale to 6 nodes if needed)
Per-group --raftBootstrapMembers
Expanded --raftDynamoMap + matching port allocations

This documents the topology gap upfront so the M5a implementer doesn't discover it mid-PR.

P3 — Clojure Base64 nuance. Go uses base64.RawURLEncoding (URL-safe charset without = padding). Java's three Base64 variants differ:

Base64/getEncoder (standard +//, with padding) — wrong
Base64/getUrlEncoder (URL-safe, with padding) — wrong
Base64/getUrlEncoder + .withoutPadding — correct

Failure mode is silent: wrong encoding produces split keys outside every route's range; ListRoutes returns no match; nemesis logs and skips every :start; the run appears clean while the nemesis is a no-op.

Fix: §3.2 now calls out .withoutPadding explicitly and documents the silent-no-op failure mode.

@claude review

claude · 2026-06-02T08:44:37Z

Claude finished @bootjp's task in 4m 55s —— View job

Review of PR #905 — `3ca2a7f`

Read current design document (3ca2a7f)
Verify P2 fix (§3.3 — five coordinated launch-script changes)
Verify P3 fix (§3.2 — Clojure Base64 .withoutPadding + silent-no-op)
Cross-check --raftBootstrapMembers / multi-group compat in main.go
General review pass
Post findings

Verification of the two prior findings

Claude P2 (2481286) — confirmed fixed. §3.3 now enumerates five coordinated launch-script changes with specific flag names, a topology decision, and correct port reasoning. The previous "only --shardRanges is new" understated claim is gone.

Claude P3 (2481286) — confirmed fixed. §3.2 correctly calls out (.withoutPadding (java.util.Base64/getUrlEncoder)), names both wrong variants (Base64/getEncoder, default Base64/getUrlEncoder), and explains the silent-no-op failure mode (wrong encoding → no matching route → nemesis skips every :start → run appears clean). ✓

New finding

P2 — `--raftBootstrapMembers` is rejected by multi-group configs (§3.3 item 4)

main.go:739-740 has a hard guard:

if len(groups) != 1 {
    return nil, errors.WithStack(ErrBootstrapMembersRequireSingleGroup)
}

resolveBootstrapServers is called with the parsed --raftGroups slice. Any process launched with --raftGroups "1=...,2=..." has len(groups) == 2 and will fail at startup with "flag --raftBootstrapMembers requires exactly one raft group" — before any Raft work begins.

The current run-jepsen-local.sh works because it has a single group (3-node, len(groups) == 1) and can pass --raftBootstrapMembers legitimately. M5a must NOT carry this flag over to the multi-group topology.

For the tentative "2 single-node groups" case: each process hosts one group with one member. A single-member group doesn't need peer discovery at all — --raftBootstrap (the boolean flag at main.go:101) is both necessary and sufficient. --raftBootstrapMembers is the right tool only for a single-group, multi-node formation where you need to specify the initial peer set.

Item 4 of the five launch-script changes should be: "Each group node uses --raftBootstrap (boolean); --raftBootstrapMembers applies only to single-group, multi-node formations and must be removed from the multi-group launch." The current wording ("different members sets for each group's nodes") will lead the implementer to try --raftBootstrapMembers with --raftGroups "1=...,2=...", hit ErrBootstrapMembersRequireSingleGroup, and need to trace through main.go to understand why.

Fix this →

Remaining observation (from `673bee8` review, still unaddressed)

Line 340–341: "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)." The milestone table directly above (line 337) correctly lists the full expanded scope — CLI, new workload variant, multi-group script extension (5 changes), and setup-hook verification. The prose summary still reflects the pre-revision scope and contradicts the table. Low urgency, but it will confuse a reviewer who reads line 340 before the table. Updating to something like "M5a is a single, focused PR (Go CLI + multi-table workload variant + multi-group script extension + setup-hook verification)" would bring the two in sync.

Summary

Two prior findings (P2 launch-script scope and P3 Base64 .withoutPadding) are correctly addressed. One new P2: --raftBootstrapMembers errors out with ErrBootstrapMembersRequireSingleGroup on any multi-group process — item 4 of the launch-script changes is incorrect and should specify --raftBootstrap (boolean) instead. The stale "Go CLI + Clojure setup hook" prose at line 340 from the 673bee8 round is still present (doc-clarity only, non-blocking).

The 'M5a is a small, focused PR' line at the bottom of §4 was accurate for the original M5a scope (Go CLI + setup hook + docs) but became understated after the codex P1 #2 revision expanded M5a to include the multi-table workload variant and the five-change multi-group launch script. Updated to reflect actual scope: M5a now ships: * cmd/elastickv-split (Go CLI) * dynamodb-append-multi-table-workload (Clojure) * scripts/run-jepsen-local.sh multi-group launch extension (5 coordinated changes per §3.3 table) * setup-hook verification (read-only ListRoutes + assert) * docs Likely lands as a single PR for atomicity (workload variant only makes sense alongside multi-group launch + verification) but reviewable as four roughly independent diff sections.

bootjp · 2026-06-02T08:48:42Z

Pushed as 87dcfe3 — claude[bot]'s only remaining minor on 673bee8 addressed.

claude[bot] LGTM ("Ready to merge") on 673bee8 with one minor doc-clarity note: the bottom-of-§4 line "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" was accurate for the original M5a scope but became understated after the codex P1 #2 revision expanded M5a to include the multi-table workload variant and the five-change multi-group launch script.

Fix: §4 trailer rewritten to reflect actual M5a scope:

cmd/elastickv-split (Go CLI)
dynamodb-append-multi-table-workload (Clojure)
scripts/run-jepsen-local.sh multi-group launch extension (5 coordinated changes per §3.3 table)
Setup-hook verification (read-only ListRoutes + assert)
Docs

Likely lands as a single PR for atomicity (workload variant only makes sense alongside multi-group launch + verification) but reviewable as four roughly independent diff sections.

claude[bot]'s review of 3ca2a7f (which carried the launch-script-scope + Base64-nuance fixes themselves) is still in flight at the time of this comment — I don't expect new findings there since the fixes followed claude[bot]'s own P2/P3 recommendations.

@claude review

claude · 2026-06-02T08:48:56Z

Claude finished @bootjp's task in 1m 29s —— View job

Review of PR #905 — `87dcfe3`

Load tools
Read current design document (87dcfe3)
Verify §4 scope fix (M5a description updated to reflect expanded scope)
General review pass for any remaining issues
Post findings

Verification of the §4 scope fix (`87dcfe3`)

Confirmed correct. The §4 trailer now reads:

"M5a is a single focused PR but no longer 'small' after the codex P1 #2 expansion: cmd/elastickv-split (Go CLI), the new dynamodb-append-multi-table-workload (Clojure), the scripts/run-jepsen-local.sh multi-group launch extension (five coordinated changes — see §3.3 table), the setup-hook verification (read-only ListRoutes + assertion), plus docs. Likely to land as a single PR for atomicity (the workload variant only makes sense alongside the multi-group launch and the verification hook), but reviewable as four roughly independent diff sections."

This matches the actual M5a scope derived from the §3.3 five-change table and is consistent with OQ-2's guidance to "decide at implementation time." The milestone table row (line 337) and the trailing paragraph now agree. ✓

General review pass

Area	Status
§3.1 CLI flags (`--route-id`, `--split-key`, `--expected-version`, `--address`)	✓ Coherent. Multi-group callers pick the target address after `ListRoutes`; the CLI doesn't need to know topology.
§3.2 `encode-dynamo-segment` + `.withoutPadding` + silent-no-op failure mode	✓ Correct and fully documented.
§3.2 Lexicographic direction (ASCII `/` 47 > `!` 33 → workload keys on left)	✓ Corrected in `2481286`, still correct.
§3.2 Route ID re-query on every `:start` invocation	✓ Pattern matches `ErrCatalogVersionMismatch` sentinel in `distribution/catalog.go:567,632`.
§3.3 Multi-table workload + multi-group cluster (five launch-script changes)	✓ Grounded in `shard_config.go:61-99` and `sharded_coordinator.go:821-843`.
§3.3 Setup hook read-only (`ListRoutes` + assertion, no mutation)	✓ Correct; first-node gate (`(when (= node (first (:nodes test))) …)`) is sound.
§3.4 Elle success criterion (`nil? (get (:anomalies results) :G1c)`)	✓ Correct map lookup; prior bug documented for posterity.
§3.5 Cadence (30s, composed with partition+kill at 40s)	✓ Reasonable defaults.
§4 Milestone table + trailing scope paragraph	✓ Consistent after `87dcfe3`.
§5 OQ-7 (`cmd/elastickv-route-key` for boundary key encoding)	✓ Appropriate deferral; the Go-helper approach is the right call.
NG3 carve-out for multi-table variant and route-shuffle nemesis	✓ No contradiction with §3.3.
Five-pass self-review in §6	✓ Covers all five lenses per CLAUDE.md.

No new issues found.

Summary

All previously flagged findings have been addressed across this PR's six commits. The design document is technically accurate, internally consistent, and grounded in the current codebase. Ready to merge.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87dcfe3f31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T08:51:00Z

+| Half | Mechanism |
+|---|---|
+| Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`.  The router maps each table to its own route key so cross-table txns engage multi-route routing. |
+| Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster.  Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires.  M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI).  Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-group `--raftBootstrapMembers`** (different members sets for each group's nodes).<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). |


Don’t require bootstrap members for multi-group startup

In the inspected startup path, resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever --raftBootstrapMembers is set and --raftGroups parses to more than one group (main.go:735-741). This line makes per-group --raftBootstrapMembers one of the required M5a launch changes, so following the design for the planned ≥2-group run will abort during startup before Jepsen can run; either the design needs new flag/server support for per-group membership or should explicitly omit bootstrap members for the 2 single-node-groups topology.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-02T08:51:00Z

+(str "!ddb|route|table|"
+     (encode-dynamo-segment "jepsen_append_t2"))  ; base64 RawURLEncoding


Generate fresh interior split keys for each shuffle

When the nemesis repeatedly chooses the table-route key shown here, the first successful SplitRange makes that key the Start of the right child (ranges are [Start, End)), and validateSplitKey rejects splitKey == parent.Start/End as split key at route boundary (adapter/distribution_server.go:357-372). In a 5-minute run that expects ≥10 shuffles, subsequent ticks at the same table boundary will fail/skip instead of churning catalog versions; the nemesis needs to derive a new interior key inside the currently selected route rather than reusing the table's route key.

Useful? React with 👍 / 👎.

…laude[bot] P2 on 3ca2a7f) Two new findings on 3ca2a7f: * codex P2 + claude[bot] P2 (same defect, independently flagged): --raftBootstrapMembers is rejected on multi-group startup (main.go:735-741 — ErrBootstrapMembersRequireSingleGroup guard). My §3.3 item 4 said 'Per-group --raftBootstrapMembers', which would abort startup before Jepsen could run. Fix: §3.3 item 4 now correctly specifies --raftBootstrap (boolean) for the tentative 2-single-node-groups topology. Single-member groups need no peer discovery so --raftBootstrap is necessary and sufficient. The existing --raftBootstrapMembers in run-jepsen-local.sh must be REMOVED for multi-group processes — it's only valid for single-group multi-node formations. * codex P2 #2 — split key reuse becomes no-op. My §3.2 example showed the boundary key !ddb|route|table|<encode(t2)>, but after the setup-time split that key is the Start of the right child route. adapter/distribution_server.go:357-372's validateSplitKey rejects splitKey == parent.Start/End with 'split key at route boundary' — so every subsequent nemesis tick at the same key would fail. In a 5-minute run expecting >=10 shuffles, the nemesis would no-op silently and the run would appear clean while catalog churn never landed. Fix: §3.2 now splits into TWO distinct split-key constructions: (a) Setup-time boundary key (one-shot, in scripts/run-jepsen-local.sh) — IS a route boundary by design (the static partition between groups). (b) Per-shuffle interior key (every nemesis :start) — MUST lie strictly between the current route's Start and End. Strategy: append a base64-alphabet byte plus a fresh counter to route.start; the result sorts after start but before any non-base64 byte (e.g. ASCII '|'=124). Implementation details (counter persistence, collision recovery) deferred to M5b; the design contract is: each shuffle's split key is strictly interior to the route's current range.

chatgpt-codex-connector · 2026-06-02T08:54:21Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

bootjp · 2026-06-02T08:54:40Z

Pushed as b752a89 — codex P2 + claude[bot] P2 (same defect) and codex P2 #2 on 3ca2a7f addressed.

codex P2 + claude[bot] P2 (same defect, independently flagged) — --raftBootstrapMembers rejected on multi-group startup. Confirmed against main.go:735-741: resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever --raftBootstrapMembers is set and --raftGroups parses to more than one group. My §3.3 item 4 ("Per-group --raftBootstrapMembers") would abort startup before Jepsen could run.

Fix: §3.3 item 4 now correctly specifies --raftBootstrap (boolean) for the tentative 2-single-node-groups topology. Single-member groups need no peer discovery → --raftBootstrap is necessary and sufficient. The existing --raftBootstrapMembers in run-jepsen-local.sh must be removed for multi-group processes — only valid for single-group multi-node formations.

codex P2 #2 — split key reuse becomes no-op. My §3.2 example showed !ddb|route|table|<encode("jepsen_append_t2")> as the split key, but after the setup-time split that key is the Start of the right child route. adapter/distribution_server.go:357-372's validateSplitKey rejects splitKey == parent.Start/End with "split key at route boundary" — every subsequent nemesis tick would fail. In a 5-min run expecting ≥10 shuffles, the nemesis would no-op silently and the workload would appear clean while catalog churn never lands.

Fix: §3.2 split into TWO distinct split-key constructions:

(a) Setup-time boundary key (one-shot, in run-jepsen-local.sh) — IS a route boundary by design (the static partition between groups).
(b) Per-shuffle interior key (every nemesis :start) — MUST lie strictly between the current route's Start and End. Strategy: append a base64-alphabet byte plus a fresh counter to route.start; the result sorts after start but before any non-base64 byte (e.g. ASCII |=124).

Implementation details (counter persistence, collision recovery) deferred to M5b; the design contract is: each shuffle's split key is strictly interior to the route's current range.

claude[bot] gave LGTM on 87dcfe3 "Ready to merge" with no new findings — this commit (b752a89) addresses two findings from claude[bot]'s earlier 3ca2a7f review (which arrived between the 87dcfe3 review starting and finishing) plus codex's two P2s on 87dcfe3.

@claude review

claude · 2026-06-02T08:54:56Z

Claude finished @bootjp's task in 4m 18s —— View job

Review of PR #905 — `b752a89`

Load tools
Read current design document (b752a89)
Verify codex P2 + claude P2 fix (--raftBootstrap for multi-group startup)
Verify codex P2 Update dependency go to v1.21.4 #2 fix (interior split keys — two distinct constructions in §3.2)
Cross-check Go source: main.go:735-741, distribution_server.go:357-377
General review pass for new issues
Post findings

Verification of the two prior findings

codex P2 + claude P2 (--raftBootstrapMembers rejected on multi-group startup) — confirmed fixed. main.go:739-741 confirms resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever len(groups) != 1 and bootstrapMembers is non-empty. §3.3 item 4 now correctly specifies --raftBootstrap (boolean), references the guard at main.go:735-741, and explains that --raftBootstrapMembers must be removed for multi-group processes. For the tentative 2-single-node-groups topology, each process hosts one group with one member — --raftBootstrap is both necessary and sufficient. ✓

codex P2 #2 (split key reuse becomes no-op) — confirmed fixed. distribution_server.go:357-377 confirms validateSplitKey rejects splitKey == parent.Start (line 359) and splitKey == parent.End (line 370) with errDistributionSplitKeyAtBoundary. §3.2 now distinguishes two constructions:

(a) Setup-time boundary key — is a route boundary by design; used once in run-jepsen-local.sh via --shardRanges
(b) Per-shuffle interior key — fresh-interior-split-key appends "A" + (System/nanoTime) to route.start, which is strictly > start and provably < route.end (since start and end diverge at the table-encoded segment, any extension of start stays on the same side)

The validateSplitKey check confirms: startCmp < 0 rejects keys before start, startCmp == 0 rejects start itself. fresh-interior-split-key always produces a key > start. The end-side comparison (endCmp > 0 / endCmp == 0) is safe because route.start and route.end share a common prefix up to the table encoding, then differ — any extension of start is lexicographically before end. ✓

General review pass

Area	Status
§3.1 CLI flags	✓ Coherent; non-zero exit on error means Jepsen nemesis correctly sees failures
§3.2 `encode-dynamo-segment` + `.withoutPadding` + silent-no-op failure mode	✓ Documented. Clojure MUST use `(.withoutPadding (java.util.Base64/getUrlEncoder))`
§3.2 Two split-key constructions (boundary vs interior)	✓ Distinction is clear and grounded in `validateSplitKey`'s boundary rejection
§3.2 Route ID re-query on every `:start` invocation	✓ `ErrCatalogVersionMismatch` from `distribution/catalog.go:567,632` handles the concurrent-split race
§3.3 Five launch-script changes, `--raftBootstrap` not `--raftBootstrapMembers`	✓ Correct after this commit
§3.3 Setup hook read-only (`ListRoutes` verify, no mutation)	✓ First-node gate is correct Jepsen idiom
§3.4 Elle success criterion (`nil? (get (:anomalies results) :G1c)`)	✓ Correct map lookup; existing workload uses `:consistency-models [:strict-serializable]` as the canonical gate
§3.5 Cadence (30s, composed with 40s partition+kill)	✓ Reasonable defaults
§4 Milestone table + scope paragraph	✓ Consistent; M5a scope accurately described as "no longer small"
§5 OQ-7 (`cmd/elastickv-route-key` for boundary key encoding)	✓ Correct deferral; Go-helper approach removes the shell-encoding footgun
§6 Five-pass self-review	✓ All five CLAUDE.md lenses covered

Minor implementation note (non-blocking)

fresh-interior-split-key stubs the implementation as (str (:start route) "A" (System/nanoTime)). In Jepsen, route fields returned by ListRoutes may be byte arrays rather than strings depending on how the protobuf response is deserialized — calling str on a byte array produces the Java object reference ("[B@…") rather than the actual bytes. This is an M5b implementation concern, not a design issue (the doc correctly defers strategy details to implementation), but worth flagging so the implementer doesn't spend time debugging a silent mis-routing caused by incorrect Clojure str coercion.

Summary

Both codex P2 + claude P2 findings are correctly addressed. The document is technically sound and internally consistent. No new P1 or P2 findings. Ready to merge.

…5b implementation list (claude[bot] minor on b752a89) claude[bot] flagged a non-blocking M5b implementation concern on b752a89: ListRoutes protobuf response fields may be Java byte arrays rather than strings depending on deserialization. Calling Clojure (str byte-array) silently yields '[B@…' (Java object reference) instead of the bytes, producing a wrong split key that would land outside the route's range — a silent mis-routing footgun. The existing 'Implementation details deferred to M5b' line in §3.2 already covers this in spirit, but explicitly listing 'byte-vs-string coercion of ListRoutes response fields' next to 'counter persistence' and 'collision recovery' makes the specific footgun visible at design-doc-read time so the M5b implementer can budget for it rather than discover it through debugging silent mis-routing. Doc-only, no semantic changes.

bootjp · 2026-06-02T09:03:07Z

Pushed as 6f024aa — addresses the only remaining note from claude[bot]'s "Ready to merge" review of b752a89.

claude[bot] minor (non-blocking) on b752a89: ListRoutes protobuf response fields may be Java byte arrays rather than strings depending on deserialization. Calling Clojure (str byte-array) silently yields "[B@…" (Java object reference) instead of the bytes — a silent mis-routing footgun.

Fix: §3.2's existing "Implementation details deferred to M5b" list now explicitly includes "byte-vs-string coercion of ListRoutes response fields" alongside "counter persistence" and "collision recovery." The footgun is now visible at design-doc-read time so the M5b implementer can budget for it rather than discover it through debugging silent mis-routing.

State of PR #905:

claude[bot] LGTM "Ready to merge" on both 87dcfe3 and b752a89 (back-to-back review cycles)
Codex bot: rate-limited (will not review further until reset)
CI on b752a89 (and 6f024aa once it kicks): build/test/lint all pass
9 commits of refinement across 5 substantive review rounds (codex + claude + gemini + coderabbit)

Loop exit condition reached. Stopping the 5-minute cron.

@claude review

claude · 2026-06-02T09:03:25Z

Claude finished @bootjp's task in 2m 4s —— View job

Review of PR #905 — `6f024aa`

Verification of the `b752a89` minor note (`6f024aa`)

Confirmed fixed. The diff adds the byte-vs-string coercion note inline into the existing "Exact strategy details … are M5b implementation notes" sentence in §3.2 (lines 217–224):

Exact strategy details (counter persistence across nemesis
restarts, collision recovery, byte-vs-string coercion of
`ListRoutes` response fields — `(str (:start route) …)`
silently yields `"[B@…"` if `:start` is a Java byte array
rather than a string, a silent mis-routing footgun flagged
by claude[bot] on b752a894) are M5b implementation notes —
the design contract is: each shuffle's split key is strictly
interior to the route's current range.

The placement is correct: immediately after the fresh-interior-split-key stub that exhibits the pattern, before the Base64 encoding discussion. The failure mode description is accurate — Clojure (str java-byte-array) calls Object.toString(), which returns the Java reference string "[B@…", not the bytes. An M5b implementer who reads §3.2 from top to bottom sees the footgun before they write any code. ✓

Final general review pass

All twelve prior findings (two Gemini R, two Codex P1, three Codex P2, three Claude P2, one Claude P3, one Claude minor) have been correctly addressed across the six commits of this PR. Full verification table:

Finding	Status
Gemini R1 — lexicographic split key	✓ §3.2 uses table-route prefix; ASCII direction corrected
Gemini R2 — route ID stale after each split	✓ §3.2 re-queries `ListRoutes` on every `:start`
Gemini R3 — setup hook runs on all nodes	✓ `(when (= node (first (:nodes test))) …)` gate
Codex P1 #1 — raw table name vs. base64	✓ `encode-dynamo-segment` + `.withoutPadding` documented
Codex P1 #2 — same-group SplitRange can't trigger 2PC	✓ §3.3 rewrites to multi-group cluster startup
Codex P2 + Claude P2 — `--raftBootstrapMembers` rejected on multi-group	✓ §3.3 item 4 specifies `--raftBootstrap` (boolean)
Codex P2 #2 — split key reuse becomes no-op	✓ §3.2 distinguishes boundary vs. interior key
Claude P2 #1 — Elle `:anomalies` is a map, not a seq	✓ §3.4 uses `(nil? (get (:anomalies results) :G1c))`
Claude P2 #2 — lexicographic direction inverted	✓ `!` = ASCII 33, `/` = ASCII 47; corrected
Claude P3 #1 — NG3 contradicted §3.3	✓ NG3 carves out the multi-table variant and route-shuffle nemesis
Claude P3 — Base64 `.withoutPadding` silent no-op	✓ §3.2 calls out `.withoutPadding` and failure mode
Claude minor — §4 scope understated	✓ §4 correctly calls out M5a is "no longer small"
Claude minor (`b752a89`) — byte-vs-string coercion	✓ §3.2 names the footgun in the M5b deferred list

No new issues found. The document is technically accurate, internally consistent, and grounded in the current codebase.

Ready to merge.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f024aaa73

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T09:05:45Z

+| Half | Mechanism |
+|---|---|
+| Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`.  The router maps each table to its own route key so cross-table txns engage multi-route routing. |
+| Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster.  Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires.  M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI).  Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-node `--raftBootstrap` (boolean) — NOT `--raftBootstrapMembers`.**  `main.go:735-741` has a hard guard: `resolveBootstrapServers` returns `ErrBootstrapMembersRequireSingleGroup` whenever `--raftBootstrapMembers` is set and `--raftGroups` parses to more than one group (codex P2 + claude[bot] P2 on 3ca2a7f7).  For the tentative 2-single-node-groups topology, each process hosts ONE group with ONE member — single-member groups need no peer discovery, so `--raftBootstrap` is both necessary and sufficient.  The existing `--raftBootstrapMembers` in `run-jepsen-local.sh` must be **removed** for multi-group processes; it is only valid for single-group, multi-node formations.<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). |


Don’t split single-node groups across processes

In the proposed 2-single-node topology, the plan says each process hosts one group and uses separate 50051/50054 + 63801/63804 endpoints, but the current server config does not support a process that knows about only its own group while routing to the other one: validateShardRanges requires every --shardRanges group ID to appear in that process’s --raftGroups, and buildShardGroups then starts a local Raft runtime for every group listed. With the documented layout, either each per-group process fails validation for ranges owned by the other group, or both processes are configured with both groups and race to bind the same Raft listeners; M5a’s local launcher needs to model the supported shape, e.g. one process hosting both single-member groups with both Raft addresses mapped to its single Dynamo endpoint, or add server support for remote-only groups first.

Useful? React with 👍 / 👎.

…24aa) Codex P1 on 6f024aa correctly identifies that the proposed '2 processes, each hosting one group' topology is NOT supported by the current server: * shard_config.go:399-410 (validateShardRanges) requires every --shardRanges group ID to appear in that process's --raftGroups. Process A with --raftGroups '1=...' but a shardRanges entry pointing to group 2 fails validation. * main.go:764 (buildShardGroups) starts a local Raft runtime for every group listed in --raftGroups. Configuring both processes with --raftGroups '1=...,2=...' makes both try to host both groups and race on the Raft listeners. Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology — ONE PROCESS hosting both single-member groups (two Raft addresses, shared Dynamo endpoint). Cross-group txns go through the in-process router and 2PC fires across the two co-located groups. Trade-off explicitly documented: because both groups live in one process, partition/kill nemeses can't separate them. Only the route-shuffle nemesis (API-level) exercises the cross-group path meaningfully. True distributed multi-group (multi-process) requires server-side support for 'remote-only groups in --raftGroups' or equivalent — M6+ work, out of M5 scope. This is a strictly correct narrowing of the topology choice (removed the unsupported '2 separate processes' and '6 nodes' options; the only supported shape is now spelled out). The M5a Done-when criterion still holds: cluster starts with tables 1-2 in group 1 and tables 3-4 in group 2, workload exercises dispatchMultiShardTxn.

bootjp · 2026-06-02T12:32:17Z

Pushed as f92a029 — codex P1 on 6f024aa addressed.

codex P1 — proposed topology not supported by the current server. Confirmed against the codebase:

shard_config.go:399-410 (validateShardRanges) requires every --shardRanges group ID to appear in that process's --raftGroups.
main.go:764 (buildShardGroups) starts a local Raft runtime for every group listed in --raftGroups.

Consequence: my proposed "2 processes, each hosting one group" is rejected — Process A with --raftGroups "1=…" but a --shardRanges entry pointing to group 2 fails validation. Configuring both processes with --raftGroups "1=…,2=…" makes both try to host both groups and race on the Raft listeners.

Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology — one process hosting both single-member groups, two Raft addresses (50051 for group 1, 50054 for group 2), shared Dynamo endpoint (e.g. 63801). Each group is single-member so --raftBootstrap works for both. Cross-group txns go through the in-process router → 2PC fires across the two co-located groups.

Trade-off explicitly documented: because both groups live in one process, partition/kill nemeses can't separate them — only the route-shuffle nemesis (API-level) exercises the cross-group path meaningfully. True distributed multi-group (multi-process) requires server-side support for "remote-only groups in --raftGroups" or equivalent — M6+ work, explicitly out of M5's scope.

This is a strictly correct narrowing of the topology choice (removed the unsupported "2 separate processes" and the speculative "6 nodes" expansion; the only supported shape is now spelled out). The M5a Done-when criterion still holds: cluster starts with tables 1-2 in group 1 and tables 3-4 in group 2, workload exercises dispatchMultiShardTxn.

@claude review

claude · 2026-06-02T12:32:33Z

Claude finished @bootjp's task in 9m 13s —— View job

Review of PR #905 — `f92a029`

Load tools
Read current design document (f92a029)
Verify codex P1 fix (single-process two-group topology)
Cross-check shard_config.go / main.go / multiraft_runtime.go for topology validity
General review pass for remaining issues
Post findings

Verification of the codex P1 fix (`6f024aa` → `f92a029`)

Confirmed fixed and correctly grounded. The codex P1 finding was that "two processes, each hosting one group" is NOT supported because validateShardRanges (shard_config.go:399-410) requires every --shardRanges group ID to appear in that process's --raftGroups, and buildShardGroups (main.go:764) starts a Raft runtime for every group in --raftGroups.

The fix — one process hosting both single-member groups — is fully supported by the current server. Tracing through the code:

Claim in §3.3	Source verification
One process can host multiple groups via `--raftGroups "1=…,2=…"`	`parseRaftGroups` (`shard_config.go:61-99`) parses both entries; `buildShardGroups` loops over all groups creating an independent Raft runtime per group via `buildRuntimeForGroup` (`multiraft_runtime.go:228-266`)
Each group gets its own Raft address	`buildRuntimeForGroup:248` passes `LocalAddress: group.address` to `factory.Create()` — each group binds its own port
Data directory separation across groups in one process	`multi = len(groups) > 1` (`main.go:588`); `groupDataDir` uses `multi` to namespace directories per group
`--raftBootstrap` bootstraps all groups	`main.go:527` → `bootstrap bool` is process-wide and passed to `buildShardGroups:769`; each single-member group self-elects immediately
`--raftBootstrapMembers` must be removed	`resolveBootstrapServers` (`main.go:735-761`) returns `ErrBootstrapMembersRequireSingleGroup` when `len(groups) != 1` — confirmed at `main.go:739-740`
Each group's gRPC transport listens on its own address	`main.go:1567-1574` loops over runtimes; each `rt.registerGRPC(gs)` registers transport, then `lc.Listen(ctx, "tcp", rt.spec.address)` binds that group's port
2PC fires when txn spans both groups	`groupMutations` (`sharded_coordinator.go:821-843`) groups by Raft group ID; `len(gids) > 1` triggers `dispatchMultiShardTxn`

All seven claims in §3.3 are verified against the source.

General review pass

All thirteen findings from prior review rounds are still correctly addressed. No prior fix was reverted by f92a029.

Area	Status
§3.2 `encode-dynamo-segment` + `.withoutPadding`	✓
§3.2 Two split-key constructions (boundary vs interior)	✓
§3.2 Route ID re-query on every `:start` invocation	✓
§3.2 Byte-vs-string coercion noted in M5b deferred list	✓
§3.3 Multi-table workload + multi-group cluster (five changes)	✓
§3.3 `--raftBootstrap` not `--raftBootstrapMembers`	✓
§3.3 Setup hook is read-only verification (no mutation)	✓
§3.3 Limitation acknowledged (partition nemesis can't separate co-located groups)	✓
§3.4 Elle criterion uses `(nil? (get (:anomalies results) :G1c))`	✓
§4 M5a scope described as "no longer small"	✓
NG3 carve-out for multi-table variant and route-shuffle nemesis	✓
Lexicographic ASCII direction (`!`=33, `/`=47)	✓

No new P1 or P2 issues found.

Observation (non-blocking)

The doc implies but doesn't spell out the Jepsen :nodes configuration change. The current run-jepsen-local.sh uses a 3-node layout (5005{1,2,3} / 6380{1,2,3}); M5a drops to a 1-process layout. Jepsen's :nodes list would shrink from 3 entries to 1 (["127.0.0.1"] or equivalent), and the (when (= node (first (:nodes test)))) gate on the setup-hook verification becomes a no-op gate (sole node == first node, always passes). This is an implementation detail the doc can leave to M5a, but a one-line note in OQ-2 or §3.3 would prevent the M5a implementer from discovering mid-PR that Jepsen's node-count configuration also needs updating.

Summary

The codex P1 topology fix is correct, well-grounded in the source, and the design document is now technically accurate across all twelve areas. Ready to merge.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Line 386: The milestone description incorrectly states that passing
--shardRanges alone will launch ≥2 Raft groups; update the row so it explicitly
requires both --shardRanges and --raftGroups to be passed when launching (e.g.,
`scripts/run-jepsen-local.sh` should be documented to call with both
--shardRanges and --raftGroups) and mention that omitting --raftGroups collapses
to group 1; modify the text around the `--shardRanges` mention (the scope text
in the milestone row) to require both flags to satisfy the multi-group contract
and ensure the setup hook still calls ListRoutes to verify multi-group routing.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddb92f54-4107-4806-8a5e-5df0f5164e6f

📥 Commits

Reviewing files that changed from the base of the PR and between 2481286 and f92a029.

📒 Files selected for processing (1)

docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md

…anges (coderabbit Major on f92a029) coderabbit Major on f92a029 correctly identifies that §4's M5a milestone row was inconsistent with §3.3's five-change contract: the row said the launch extension passes '--shardRanges so the cluster launches with >=2 Raft groups' but §3.3 spells out that --shardRanges alone collapses every range to the single default group 1 (shard_config.go:61-99 — the claude[bot] P2 finding on 2481286 that drove the §3.3 rewrite in the first place). Both --raftGroups AND --shardRanges are required for the multi-group contract. Fix: §4's M5a Scope cell now explicitly says 'pass BOTH --raftGroups (declaring >=2 groups) AND --shardRanges (placing tables across those groups)', and the cell cross-references §3.3 for the 'shardRanges alone collapses to group 1' explanation. Avoids the implementer following §4 in isolation and missing the --raftGroups requirement. Doc-only, no semantic changes to the design itself — just internal consistency between §4 row and §3.3 table.

bootjp · 2026-06-02T12:43:57Z

Pushed as 58d6ac5 — coderabbit Major on f92a029 addressed.

coderabbit Major — §4 row was inconsistent with §3.3's multi-group contract. The M5a milestone row said the launch extension passes --shardRanges so the cluster launches with ≥2 Raft groups, but §3.3 explicitly states (per the claude[bot] P2 on 2481286 that drove that rewrite) that --shardRanges alone collapses every range to the single default group 1 — both --raftGroups AND --shardRanges are required.

Fix: §4's M5a Scope cell now explicitly says "pass BOTH --raftGroups (declaring ≥2 groups) AND --shardRanges (placing tables across those groups)" and cross-references §3.3 for the "shardRanges alone collapses to group 1" explanation. An implementer reading §4 in isolation will no longer miss the --raftGroups requirement.

Doc-only, no semantic changes — just internal consistency between the §4 row and the §3.3 table.

@claude review

claude · 2026-06-02T12:44:16Z

Claude finished @bootjp's task in 1m 55s —— View job

Review of PR #905 — `58d6ac5`

Read current design document (58d6ac5)
Verify coderabbit Major fix (§4 row now requires both --raftGroups AND --shardRanges)
Verify all prior findings remain correctly addressed
General review pass for new issues
Post findings

Verification of the coderabbit Major fix (`f92a029` → `58d6ac5`)

Confirmed fixed. Line 386's M5a Scope cell now reads (bold in original):

"scripts/run-jepsen-local.sh extended to pass BOTH --raftGroups (declaring ≥2 groups) AND --shardRanges (placing tables across those groups)" — per §3.3, --shardRanges alone collapses every range to the single default group 1 and 2PC never fires; both flags are required for the multi-group contract.

The fix correctly:

Bolds "BOTH" + the two flag names to make the requirement visually prominent
Cross-references §3.3's authoritative explanation ("--shardRanges alone collapses every range to the single default group 1")
Ends with a declarative statement: "both flags are required for the multi-group contract"

The §4 milestone row and the §3.3 five-change table are now fully consistent. ✓

Verification of all prior findings (full audit)

Finding	Still addressed?
Gemini R1 — split key encoding (table-route prefix, base64)	✓ §3.2 uses `!ddb\|route\|table\|<encode-dynamo-segment(...)>`
Gemini R2 — route ID re-query per `:start` invocation	✓ §3.2 "re-queries `ListRoutes`…on every `:start` invocation"
Gemini R3 — setup hook first-node gate	✓ §3.3 `(when (= node (first (:nodes test))) …)`
Codex P1 #1 — `encodeDynamoSegment` base64 RawURLEncoding	✓ §3.2 `encode-dynamo-segment` helper + `.withoutPadding`
Codex P1 #2 — same-group `SplitRange` can't trigger `dispatchMultiShardTxn`	✓ §3.3 multi-group cluster startup strategy
Codex P2 + Claude P2 — `--raftBootstrapMembers` rejected on multi-group	✓ §3.3 item 4: `--raftBootstrap` (boolean), not `--raftBootstrapMembers`
Codex P2 #2 — split key reuse becomes no-op at boundary	✓ §3.2 two-construction strategy (boundary vs. interior)
Claude P2 #1 — Elle `:anomalies` is a map, not a seq	✓ §3.4 `(nil? (get (:anomalies results) :G1c))`
Claude P2 #2 — lexicographic direction inverted	✓ §3.2 "`/` is ASCII 47…workload keys on the left shard"
Claude P3 #1 — NG3 contradicted §3.3	✓ §1.2 NG3 carves out multi-table variant and route-shuffle nemesis
Claude P3 — Base64 `.withoutPadding` silent no-op	✓ §3.2 names failure mode + `.withoutPadding` requirement
Claude minor — §4 scope understated	✓ §4 "no longer 'small'" with full scope list
Claude minor (`b752a89`) — byte-vs-string coercion footgun	✓ §3.2 M5b deferred list explicitly names the footgun
Coderabbit MD027/MD040 — blockquote spacing + code fence languages	✓ (`673bee8`)
Coderabbit Major (`f92a029`) — §4 row missing `--raftGroups`	✓ (`58d6ac5` — this commit)

All fifteen findings correctly addressed. No prior fix has been reverted.

General review pass

No new issues found. The document is technically accurate, internally consistent, and all claims are grounded in the source (shard_config.go, main.go, sharded_coordinator.go, distribution_server.go).

Ready to merge.

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated

Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated

bootjp merged commit 2a36826 into main Jun 2, 2026
9 checks passed

bootjp deleted the feat/composed1-m5-design branch June 2, 2026 12:55

		(str "!ddb\|route\|table\|"
		(encode-dynamo-segment "jepsen_append_t2")) ; base64 RawURLEncoding

Conversation

bootjp commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Companion to

Forward-looking posture (matches parent)

Open questions for review

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Lexicographical Shard Split Issue

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Route ID Resolution for SplitRange

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Gating of Initial Split in Setup Hook

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Jun 2, 2026

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review of PR #905 — f5d2ad7

Verification of the four prior findings

New findings

P2 — Success criterion in §3.4 will always pass (incorrect Elle result structure)

P2 — Lexicographic direction in the Gemini R1 explanation is inverted (§3.2)

P3 — NG3 contradicts the new multi-table variant introduced in §3.3

P3 — Setup hook route ID resolution not described

Observation — OQ-4 wording vs. git log

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented Jun 2, 2026

Uh oh!

bootjp commented Jun 2, 2026

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review of PR #905 — 2481286

Verification of the six prior findings

New findings

P2 — "only the --shardRanges argument is new" understates M5a's launch script changes (§3.3)

P3 — Clojure encode-dynamo-segment implementation details need one more line (§3.2)

Observation — OQ-4 resolution and parent doc rename

Summary

Uh oh!

coderabbitai Bot left a comment

bootjp commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `f5d2ad7`

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `2481286`

P2 — "only the `--shardRanges` argument is new" understates M5a's launch script changes (§3.3)

P3 — Clojure `encode-dynamo-segment` implementation details need one more line (§3.2)

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `673bee8`

Verification of lint fixes (`673bee8`)

Verification of the six findings from `f5d2ad7` (addressed in `2481286`)

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `3ca2a7f`

P2 — `--raftBootstrapMembers` is rejected by multi-group configs (§3.3 item 4)

Remaining observation (from `673bee8` review, still unaddressed)

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `87dcfe3`

Verification of the §4 scope fix (`87dcfe3`)

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `b752a89`

claude Bot commented Jun 2, 2026 •

edited

Loading

Review of PR #905 — `6f024aa`