docs(design): Composed-1 M5 — Jepsen route-shuffle workload proposal#905
Conversation
Companion to 2026_05_29_proposed_composed1_cross_group_commit_guard.md (parent — proposes the cross-group commit-time ownership guard M1+M2+M3+M4 implemented in PR #900). M5 is the integration-level proof from the parent doc's milestone table. This proposal elaborates the row into a full design: * cmd/elastickv-split — tiny CLI that invokes SplitRange once by route ID + split key + expected version. Lets the Jepsen nemesis shell out instead of re-implementing the gRPC client in Clojure. * jepsen/src/elastickv/composed1_nemesis.clj — route-shuffle nemesis that periodically splits an existing route. Composes with the existing partition+kill nemesis package. * dynamodb-append-workload setup hook — issues one initial SplitRange so the workload spans >=2 shards from t=0, exercising the 2PC path (prewrite -> primary commit -> secondary commit + the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel). Forward-looking posture (same as parent): today's SplitRange is same-group only, so the Composed-1 hazard the M3/M4 guard catches cannot yet be *triggered* in production. M5 ships the scaffolding so: 1. The current SplitRange is exercised under realistic concurrent multi-shard write load and proved non-regressing (workload finds zero G1c — baseline M4 contract). 2. When a future PR introduces a route-mutating RPC that DOES shift ownership across groups, the M5 workload — with a one-line nemesis swap — becomes the integration-level proof that M3+M4 hold under cross-group churn. Two-phase milestone breakdown: * M5a: CLI + workload setup (mergeable on its own; trivially finds zero G1c without the nemesis). * M5b: Route-shuffle nemesis itself + cadence-tuning analysis. Open questions tracked in section 5; lifecycle questions about renaming the parent doc are in OQ-4.
|
Warning Review limit reached
More reviews will be available in 48 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughDesign doc proposing Composed-1 M5: a Jepsen route-shuffle nemesis with a single-call SplitRange CLI, split-key encoding and catalog-version drift handling, a multi-table workload and multi-group local launch, Elle pass criteria rejecting G1c, cadence defaults, milestones, open questions, and a self-review checklist. ChangesRoute-shuffle workload design
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request proposes a design document for the Composed-1 M5 Jepsen route-shuffle workload, detailing the CLI helper, route-shuffle nemesis, and multi-shard testing strategy. The review feedback highlights three critical design issues: a lexicographical split key issue that would prevent workload keys from being split across shards, a route ID resolution problem where the nemesis needs to dynamically query active routes before splits, and a concurrency issue where the initial split in the Jepsen setup hook must be gated to run only on the bootstrap node.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| **Split key picking strategy.** A simple monotonically-increasing | ||
| counter: every `:start` invocation appends a fresh integer | ||
| suffix to a fixed key prefix the workload reserves. This avoids | ||
| collisions with the workload's keyspace and guarantees the | ||
| split always picks a key that's between existing keys (so the | ||
| operation succeeds against a real catalog). |
There was a problem hiding this comment.
Lexicographical Shard Split Issue
In the proposed split key picking strategy, using a prefix like /split/ (e.g., /split/1, /split/2) will cause all split keys to be lexicographically smaller than the workload's keyspace (which consists of stringified integers like "0", "1", "2", etc., because / [ASCII 47] is less than 0 [ASCII 48]).
As a result, all workload keys will always fall into the rightmost shard (e.g., [/split/N, +inf)), meaning the workload keys themselves are never split across shards. This completely defeats Goal G2 of exercising the 2PC multi-shard transaction path.
Recommendation:
Choose split keys from within the active workload keyspace (e.g., by selecting values that interleave with the workload's keys) to ensure the workload's keyspace is successfully divided across multiple shards.
| **Expected version.** The nemesis calls `ListRoutes` once at | ||
| setup to learn the current catalog version, then increments its | ||
| local copy by 1 after each successful split. Catalog drift | ||
| (another split landing concurrently) is rare in practice — if it | ||
| happens, the nemesis logs and refreshes from `ListRoutes`. |
There was a problem hiding this comment.
Route ID Resolution for SplitRange
The design states that the nemesis will call ListRoutes once at setup to learn the current catalog version and then increment its local copy. However, because splits continuously modify the routing table (deleting parent routes and creating new child routes with new IDs), the nemesis cannot rely on a static route_id or just the initial ListRoutes call to perform subsequent splits.
To invoke SplitRange successfully, the nemesis must provide the correct route_id that currently covers the chosen split_key.
Recommendation:
The nemesis must query ListRoutes (or dynamically track the active routes and their boundaries) before each split to resolve the correct route_id for the target split_key.
| The setup hook (`db/setup!` in Jepsen parlance, or the test's | ||
| `:setup` map) runs `elastickv-split` once with a split key in | ||
| the middle of the workload's keyspace. |
There was a problem hiding this comment.
Gating of Initial Split in Setup Hook
If the initial split is executed within the Jepsen db/setup! hook, keep in mind that db/setup! is executed on every node in the cluster.
Recommendation:
To prevent concurrent split attempts and catalog version conflicts during cluster bootstrap, the invocation of elastickv-split must be gated to run only on the primary/bootstrap node (e.g., using (when (= node (first (:nodes test))) ...)), or run as a single test-level setup step.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffb9c73fc7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| **Choose A.** Less invasive to the workload, and the | ||
| route-shuffle nemesis itself increases the shard count over | ||
| time, giving organic multi-shard coverage. | ||
|
|
||
| The setup hook (`db/setup!` in Jepsen parlance, or the test's | ||
| `:setup` map) runs `elastickv-split` once with a split key in | ||
| the middle of the workload's keyspace. |
There was a problem hiding this comment.
Don’t rely on item-key splits to shard DynamoDB txns
This setup will not make the current DynamoDB append workload span shards: the router normalizes all DynamoDB table metadata, item, and GSI keys to a single per-table route key (!ddb|route|table|<tableSegment> in kv/shard_key.go:94-124), so every jepsen_append item resolves to the same catalog point regardless of its partition key. With Option A, splitting “the middle of the workload's keyspace” can increase the route count but TransactWriteItems over multiple keys in that one table will still be grouped to one shard, so the run can pass without exercising dispatchMultiShardTxn/secondary commits as G2 requires; the workload needs distinct table route keys or a routing change instead.
Useful? React with 👍 / 👎.
) Codex P1 + 3 gemini medium findings on the original PR #905 revision (ffb9c73). All addressed by revising sections 3.2, 3.3, and 4 (milestone breakdown) and adding OQ-5 / OQ-6: * codex P1 — "Don't rely on item-key splits to shard DynamoDB txns." Verified against kv/shard_key.go:94-124: every DynamoDB table-metadata, item, and GSI key normalises to a SINGLE per- table route key (!ddb|route|table|<tableSegment>). Splitting inside a single-table workload's item keyspace cannot put two items on different shards, so the 2PC path (dispatchMultiShardTxn, secondary commits, ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) would never fire — invalidating G2. Fix: replace single-key-split (Option A) with a NEW workload variant dynamodb-append-multi-table-workload that creates N=4 tables (jepsen_append_t1 … jepsen_append_t4) and writes to >=2 distinct tables per TransactWriteItems. The router maps each table to its own route key, so cross-table txns naturally fan out across shards. The setup hook splits the table-route keyspace at !ddb|route|table|jepsen_append_t2. * gemini medium R1 — "Lexicographical Shard Split Issue." The prior /split/<int> split-key prefix was lexicographically smaller than the workload's keyspace ("/" < "0" in ASCII), so every workload key ended up on the rightmost shard and G2 was never exercised. Fix: anchor split keys to the table-route prefix !ddb|route|table|... so the split lands INSIDE the active workload route range. * gemini medium R2 — "Route ID Resolution for SplitRange." Successful SplitRange deletes the parent route ID and creates two child IDs, so a cached ID from a one-time setup-time ListRoutes call is stale on the next shuffle. Fix: nemesis re-queries ListRoutes on every :start invocation, walks the snapshot to find the route covering the chosen split key, and uses that route's ID + snapshot.version as expected_catalog_version. Catalog drift surfaces as ErrCatalogVersionMismatch from the server and the nemesis refreshes on the next tick. * gemini medium R3 — "Gating of Initial Split in Setup Hook." Jepsen db/setup! runs on EVERY node; an ungated initial split would be attempted concurrently by all nodes. Fix: gate the setup-time split on (when (= node (first (:nodes test))) ...) so only the first node attempts it. Also: * Updated §4 milestone table: M5a now ships the new workload variant (not just a setup hook), so it is meaningfully bigger than the original §4 row suggested. * Added OQ-5 (is N=4 the right default?) and OQ-6 (first-node gate semantics) as follow-ups for implementation time. * Resolved OQ-4: PR #900 has merged, so the parent doc rename *_proposed_*.md → *_partial_*.md should now land as a separate small doc-only PR.
|
Pushed as f5d2ad7 — all four findings on ffb9c73 addressed. Codex P1 — correctly identified the core flaw. Verified against Fix: §3.3 rewritten. New workload variant Gemini medium R1 (Lexicographical split) — fixed by anchoring split keys to the table-route prefix Gemini medium R2 (Route ID resolution) — fixed by having the nemesis re-query Gemini medium R3 (Setup hook gating) — fixed by gating on Also:
@claude review |
|
Claude finished @bootjp's task in 3m 47s —— View job Review of PR #905 — f5d2ad7
Verification of the four prior findingsCodex P1 — confirmed fixed. Verified against Gemini R2 — confirmed fixed. The nemesis description in §3.2 correctly re-queries Gemini R3 — confirmed fixed. The New findingsP2 — Success criterion in §3.4 will always pass (incorrect Elle result structure)The proposed check: (zero? (count (filter #(= (:type %) :G1c) (:anomalies results))))assumes The existing workload ( ;; correct form
(and (:valid? results)
(nil? (get (:anomalies results) :G1c))
(nil? (get (:anomalies results) :G-single)))P2 — Lexicographic direction in the Gemini R1 explanation is inverted (§3.2)The doc explains the old P3 — NG3 contradicts the new multi-table variant introduced in §3.3Non-goal NG3 states: "No new Jepsen workload primitives … The existing P3 — Setup hook route ID resolution not described§3.3 says the setup hook issues the initial split, but doesn't explain how it resolves the route ID needed for the Observation — OQ-4 wording vs. git logThe trigger comment says "PR #900 has merged" and OQ-4 is marked resolved accordingly. However the doc's §5 OQ-4 text says the rename should be "a separate small doc-only PR." That's the right call: the parent-doc rename from SummaryThe four prior reviewer findings have been addressed correctly and the core technical direction is sound. The P2 success-criterion bug (§3.4) is the most important remaining issue — as written, the explicit G1c guard will always silently pass. The lexicographic-direction explanation inversion (§3.2) is a documentation correctness issue. Both are fixable with targeted line edits to the doc before this merges. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5d2ad7ac7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| (`!ddb|route|table|<tableSegment>` — see `kv/shard_key.go:94-124`). | ||
| Concretely, with N tables `jepsen_append_t1` … | ||
| `jepsen_append_tN` per §3.3, the route key for table `tK` is | ||
| `!ddb|route|table|jepsen_append_tK`. Splits happen between |
There was a problem hiding this comment.
Encode table segments before choosing split keys
The proposed split key uses the raw table name, but DynamoDB route keys are built from encodeDynamoSegment(tableName) (base64 RawURLEncoding in adapter/dynamodb.go) and kv.dynamoRouteTableKey appends that encoded segment. For tables like jepsen_append_t1, the real route keys start with !ddb|route|table|am..., so splitting at !ddb|route|table|jepsen_append_tK sorts after all of the workload's table-route keys and leaves every table on the same side; the route-shuffle workload would still miss the intended 2PC path.
Useful? React with 👍 / 👎.
| keys. The setup hook splits the table-route keyspace at | ||
| `!ddb|route|table|jepsen_append_t2` so tables 1 lives on one | ||
| shard and tables 2–4 on another from t=0. |
There was a problem hiding this comment.
Seed cross-group routes before claiming 2PC coverage
Fresh evidence is that the revised setup hook now relies on SplitRange at the table-route boundary, but the current SplitRange only creates child routes with the parent's GroupID, while dispatchTxn enters dispatchMultiShardTxn only when mutations/read keys group to more than one Raft group. With just this initial split, tables 1 and 2–4 become separate ranges in the same group and the workload can satisfy the documented setup while still taking the single-shard transaction path; M5a needs an initial --shardRanges/routing plan that puts the table route keys on distinct groups (or a future cross-group mover) before using it as 2PC evidence.
Useful? React with 👍 / 👎.
…#905) Six findings on f5d2ad7 — two codex P1, two claude P2, two claude P3. All addressed: * codex P1 #1 (encode table segments before choosing split keys). My split key used the raw table name; the actual table-route key uses dynamoRouteTableKey(encodeDynamoSegment( tableName)) — base64 RawURLEncoding (adapter/dynamodb.go: 8337-8365). Split at the raw name sorts after every base64-encoded segment and leaves every table on one side of the split. Fix: §3.2 now specifies the Clojure helper encode-dynamo-segment that mirrors the Go encoding, and the split key is constructed as `(str "!ddb|route|table|" (encode-dynamo-segment "jepsen_append_t2"))`. * codex P1 #2 (seed cross-group routes before claiming 2PC coverage). Today's SplitRange is same-group only — both children inherit the parent's GroupID. dispatchTxn enters dispatchMultiShardTxn only when mutations group to ≥2 Raft GROUPS, not ≥2 routes. An initial SplitRange in the setup hook would leave tables 1 and 2-4 in the same group and the workload would still take the single-shard path. Fix: §3.3 rewritten substantially. Multi-group cluster startup is now required — M5a extends scripts/run-jepsen-local.sh to pass --shardRanges so the cluster launches with ≥2 Raft groups, table-route keys for t1..t4 statically split across groups (e.g. tables 1-2 → group 1, tables 3-4 → group 2). The Jepsen db/setup! hook no longer issues SplitRange — it calls ListRoutes (gated to first node) and VERIFIES the multi-group routing is in place, failing fast if not. Also added §3.3's "Why the nemesis is still useful" paragraph explaining that nemesis-driven SplitRange (same-group) churns catalog versions without moving ownership, exercising M3 + M4 drift detection — a non-regression check. When a future cross-group MoveRange lands, swapping it into the nemesis upgrades the workload from non-regression to real Composed-1 anomaly hunting. * claude P2 #1 (success criterion in §3.4 will always pass). My Clojure snippet used `(filter #(= (:type %) :G1c) (:anomalies results))` but :anomalies is a map (keyword → seq), not a flat list of maps with :type. Iterating a map yields [key val] vectors with no :type field — the filter always returns empty, the check always passes silently. Fix: §3.4 corrected to `(nil? (get (:anomalies results) :G1c))`. Added doc note about the prior broken form so future readers understand the trap. * claude P2 #2 (lexicographic direction inverted in §3.2). My old explanation said `/split/<int>` was smaller than workload keys, but `!` is ASCII 33 and `/` is 47, so `/split/<int>` is LARGER than `!ddb|route|table|*`. Fix: §3.2 corrected — workload keys would land on the LEFT shard, not the rightmost. The fix (anchoring to the table-route prefix) is correct in either direction. * claude P3 #1 (NG3 contradicts the new multi-table variant). Fix: NG3 tightened to "no new operation types or generators BEYOND the multi-table workload variant and the route- shuffle nemesis specified in §3." * claude P3 #2 (setup hook route ID resolution not described). Resolved differently from claude's suggestion: §3.3's revision means the setup hook no longer needs route ID resolution at all (no mutation), just a ListRoutes verification. * Added OQ-7 (codex P1 #1 follow-up) for the --shardRanges boundary key encoding helper.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Pushed as 2481286 — all six findings on f5d2ad7 addressed (2 codex P1, 2 claude P2, 2 claude P3). codex P1 #1 — encode table segments before split keys. My split key used the raw table name; actual route key uses codex P1 #2 — same-group SplitRange can't seed cross-group routes. Current claude[bot] P2 #1 — success criterion would always pass. My claude[bot] P2 #2 — lexicographic direction inverted. claude[bot] P3 #1 — NG3 contradicts §3.3. Tightened: "no new operation types or generators beyond the multi-table variant and the route-shuffle nemesis." claude[bot] P3 #2 — setup hook route ID resolution. Resolved differently: §3.3's rewrite means the hook no longer mutates routes (only verifies), so no route-ID lookup needed for mutation. Just ListRoutes for verification. Added OQ-7 for the @claude review |
|
Claude finished @bootjp's task in 4m 53s —— View job Review of PR #905 — 2481286
Verification of the six prior findingsCodex P1 #1 — confirmed fixed. Codex P1 #2 — confirmed fixed. Claude P2 #1 — confirmed fixed. §3.4 uses Claude P2 #2 — confirmed fixed. §3.2 now correctly states Claude P3 #1 — confirmed fixed. NG3 is now tightened to "beyond the new Claude P3 #2 — confirmed fixed. Setup hook is now read-only (only calls New findingsP2 — "only the
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Around line 17-21: The blockquote lines around the SplitRange discussion use
multiple spaces after the '>' which violates markdownlint MD027; edit the two
affected quote lines (the ones starting with the blockquote entries for the list
items referencing SplitRange) to use a single space after '>' (e.g., change '>
1.' and '> 2.' to '> 1.' and '> 2.') so the markdown lists are normalized.
- Around line 104-110: The fenced code blocks in the document (for example the
block showing the elastickv-split command and the other blocks around lines with
route shuffle and Jepsen snippets) need explicit language identifiers to satisfy
MD040; update each triple-backtick fence that contains shell commands to use
```bash (e.g., the block containing the elastickv-split invocation) and use
appropriate languages like ```clojure for code snippets in the Jepsen examples
so all three blocks (the one with elastickv-split, the block at the later
route/jepsen example, and the block at lines ~280) include the correct language
tag.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 46930714-7b12-429d-84a9-4ea1cdbcd711
📒 Files selected for processing (1)
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md
coderabbitai minor on 2481286: * MD027 (multiple-space-after-blockquote) at lines 17,21 in the upfront posture block. Trimmed double-spaces after `>` to a single space. * MD040 (fenced-code-language) at the three plain ``` blocks: - the elastickv-split CLI usage (```bash) - the route-shuffle-nemesis Clojure sketch (```clojure) - the M5 success-criterion Clojure snippet (```clojure) Doc-only, no semantic changes.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Pushed as 673bee8 — coderabbitai minor lint findings on 2481286 addressed:
Doc-only, no semantic changes. @claude review |
|
Claude finished @bootjp's task in 3m 4s —— View job Review of PR #905 — 673bee8
Verification of lint fixes (673bee8)MD027 — confirmed correct. Lines 17 and 21 changed from MD040 — confirmed correct. All three previously-bare fenced code blocks now carry language identifiers:
No other code blocks in the document are missing language identifiers (the Verification of the six findings from f5d2ad7 (addressed in 2481286)All six confirmed resolved:
Remaining observation (minor, non-blocking)Scope description for M5a is now understated. The sentence at the bottom of §4 — "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" — doesn't reflect the updated M5a scope in the milestone table, which now includes: CLI + new multi-table workload + This is the only remaining issue in the document; it's a doc-clarity note, not a correctness problem. SummaryThe lint cleanup (673bee8) is correct and complete. All six prior review findings are properly addressed in 2481286. The document is technically sound and consistent. Ready to merge. |
…+ Base64 nuance) Two findings on 2481286 from claude[bot]'s post-revision review: * P2 — '"only --shardRanges is new" understates M5a's launch script changes' (§3.3 table cell). Confirmed against shard_config.go:61-99: without --raftGroups, every --shardRanges entry collapses into the default single group 1 and groupMutations returns gids = [1] for every key. Single-shard path, 2PC never fires. Fix: §3.3 table cell rewritten to list FIVE coordinated launch-script changes M5a must make: 1. --raftGroups declaring >=2 groups 2. --shardRanges with t1-t2 in group 1, t3-t4 in group 2 3. Cluster topology decision (tentative: 2 single-node groups for CI speed; scale to 6 nodes if needed) 4. Per-group --raftBootstrapMembers 5. Expanded --raftDynamoMap + matching port allocations Avoids the implementer discovering the topology gap mid-PR. * P3 — encode-dynamo-segment Clojure implementation needs Base64 nuance. Go uses base64.RawURLEncoding (URL-safe charset WITHOUT '=' padding). Java's three Base64 variants differ: - Base64/getEncoder (standard '+'/'/', with padding) — wrong - Base64/getUrlEncoder (URL-safe, WITH padding) — wrong - Base64/getUrlEncoder + .withoutPadding — correct Failure mode is silent: wrong encoding produces split keys outside every route's range; ListRoutes returns no match; nemesis logs and skips every :start; run appears clean while the nemesis is a no-op. Fix: §3.2 now calls out .withoutPadding explicitly and documents the silent-no-op failure mode.
|
Pushed as 3ca2a7f — claude[bot] P2 + P3 on 2481286 addressed. P2 — launch-script scope underestimated. Confirmed against Fix: §3.3 table cell rewritten to list five coordinated launch-script changes M5a must make:
This documents the topology gap upfront so the M5a implementer doesn't discover it mid-PR. P3 — Clojure Base64 nuance. Go uses
Failure mode is silent: wrong encoding produces split keys outside every route's range; ListRoutes returns no match; nemesis logs and skips every Fix: §3.2 now calls out @claude review |
|
Claude finished @bootjp's task in 4m 55s —— View job Review of PR #905 — 3ca2a7f
Verification of the two prior findingsClaude P2 (2481286) — confirmed fixed. §3.3 now enumerates five coordinated launch-script changes with specific flag names, a topology decision, and correct port reasoning. The previous "only Claude P3 (2481286) — confirmed fixed. §3.2 correctly calls out New findingP2 —
|
The 'M5a is a small, focused PR' line at the bottom of §4 was accurate for the original M5a scope (Go CLI + setup hook + docs) but became understated after the codex P1 #2 revision expanded M5a to include the multi-table workload variant and the five-change multi-group launch script. Updated to reflect actual scope: M5a now ships: * cmd/elastickv-split (Go CLI) * dynamodb-append-multi-table-workload (Clojure) * scripts/run-jepsen-local.sh multi-group launch extension (5 coordinated changes per §3.3 table) * setup-hook verification (read-only ListRoutes + assert) * docs Likely lands as a single PR for atomicity (workload variant only makes sense alongside multi-group launch + verification) but reviewable as four roughly independent diff sections.
|
Pushed as 87dcfe3 — claude[bot]'s only remaining minor on 673bee8 addressed. claude[bot] LGTM ("Ready to merge") on 673bee8 with one minor doc-clarity note: the bottom-of-§4 line "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" was accurate for the original M5a scope but became understated after the codex P1 #2 revision expanded M5a to include the multi-table workload variant and the five-change multi-group launch script. Fix: §4 trailer rewritten to reflect actual M5a scope:
Likely lands as a single PR for atomicity (workload variant only makes sense alongside multi-group launch + verification) but reviewable as four roughly independent diff sections. claude[bot]'s review of 3ca2a7f (which carried the launch-script-scope + Base64-nuance fixes themselves) is still in flight at the time of this comment — I don't expect new findings there since the fixes followed claude[bot]'s own P2/P3 recommendations. @claude review |
|
Claude finished @bootjp's task in 1m 29s —— View job Review of PR #905 — 87dcfe3
|
| Area | Status |
|---|---|
§3.1 CLI flags (--route-id, --split-key, --expected-version, --address) |
✓ Coherent. Multi-group callers pick the target address after ListRoutes; the CLI doesn't need to know topology. |
§3.2 encode-dynamo-segment + .withoutPadding + silent-no-op failure mode |
✓ Correct and fully documented. |
§3.2 Lexicographic direction (ASCII / 47 > ! 33 → workload keys on left) |
✓ Corrected in 2481286, still correct. |
§3.2 Route ID re-query on every :start invocation |
✓ Pattern matches ErrCatalogVersionMismatch sentinel in distribution/catalog.go:567,632. |
| §3.3 Multi-table workload + multi-group cluster (five launch-script changes) | ✓ Grounded in shard_config.go:61-99 and sharded_coordinator.go:821-843. |
§3.3 Setup hook read-only (ListRoutes + assertion, no mutation) |
✓ Correct; first-node gate ((when (= node (first (:nodes test))) …)) is sound. |
§3.4 Elle success criterion (nil? (get (:anomalies results) :G1c)) |
✓ Correct map lookup; prior bug documented for posterity. |
| §3.5 Cadence (30s, composed with partition+kill at 40s) | ✓ Reasonable defaults. |
| §4 Milestone table + trailing scope paragraph | ✓ Consistent after 87dcfe3. |
§5 OQ-7 (cmd/elastickv-route-key for boundary key encoding) |
✓ Appropriate deferral; the Go-helper approach is the right call. |
| NG3 carve-out for multi-table variant and route-shuffle nemesis | ✓ No contradiction with §3.3. |
| Five-pass self-review in §6 | ✓ Covers all five lenses per CLAUDE.md. |
No new issues found.
Summary
All previously flagged findings have been addressed across this PR's six commits. The design document is technically accurate, internally consistent, and grounded in the current codebase. Ready to merge.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87dcfe3f31
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | Half | Mechanism | | ||
| |---|---| | ||
| | Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`. The router maps each table to its own route key so cross-table txns engage multi-route routing. | | ||
| | Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster. Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires. M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI). Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-group `--raftBootstrapMembers`** (different members sets for each group's nodes).<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). | |
There was a problem hiding this comment.
Don’t require bootstrap members for multi-group startup
In the inspected startup path, resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever --raftBootstrapMembers is set and --raftGroups parses to more than one group (main.go:735-741). This line makes per-group --raftBootstrapMembers one of the required M5a launch changes, so following the design for the planned ≥2-group run will abort during startup before Jepsen can run; either the design needs new flag/server support for per-group membership or should explicitly omit bootstrap members for the 2 single-node-groups topology.
Useful? React with 👍 / 👎.
| (str "!ddb|route|table|" | ||
| (encode-dynamo-segment "jepsen_append_t2")) ; base64 RawURLEncoding |
There was a problem hiding this comment.
Generate fresh interior split keys for each shuffle
When the nemesis repeatedly chooses the table-route key shown here, the first successful SplitRange makes that key the Start of the right child (ranges are [Start, End)), and validateSplitKey rejects splitKey == parent.Start/End as split key at route boundary (adapter/distribution_server.go:357-372). In a 5-minute run that expects ≥10 shuffles, subsequent ticks at the same table boundary will fail/skip instead of churning catalog versions; the nemesis needs to derive a new interior key inside the currently selected route rather than reusing the table's route key.
Useful? React with 👍 / 👎.
…laude[bot] P2 on 3ca2a7f) Two new findings on 3ca2a7f: * codex P2 + claude[bot] P2 (same defect, independently flagged): --raftBootstrapMembers is rejected on multi-group startup (main.go:735-741 — ErrBootstrapMembersRequireSingleGroup guard). My §3.3 item 4 said 'Per-group --raftBootstrapMembers', which would abort startup before Jepsen could run. Fix: §3.3 item 4 now correctly specifies --raftBootstrap (boolean) for the tentative 2-single-node-groups topology. Single-member groups need no peer discovery so --raftBootstrap is necessary and sufficient. The existing --raftBootstrapMembers in run-jepsen-local.sh must be REMOVED for multi-group processes — it's only valid for single-group multi-node formations. * codex P2 #2 — split key reuse becomes no-op. My §3.2 example showed the boundary key !ddb|route|table|<encode(t2)>, but after the setup-time split that key is the Start of the right child route. adapter/distribution_server.go:357-372's validateSplitKey rejects splitKey == parent.Start/End with 'split key at route boundary' — so every subsequent nemesis tick at the same key would fail. In a 5-minute run expecting >=10 shuffles, the nemesis would no-op silently and the run would appear clean while catalog churn never landed. Fix: §3.2 now splits into TWO distinct split-key constructions: (a) Setup-time boundary key (one-shot, in scripts/run-jepsen-local.sh) — IS a route boundary by design (the static partition between groups). (b) Per-shuffle interior key (every nemesis :start) — MUST lie strictly between the current route's Start and End. Strategy: append a base64-alphabet byte plus a fresh counter to route.start; the result sorts after start but before any non-base64 byte (e.g. ASCII '|'=124). Implementation details (counter persistence, collision recovery) deferred to M5b; the design contract is: each shuffle's split key is strictly interior to the route's current range.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Pushed as b752a89 — codex P2 + claude[bot] P2 (same defect) and codex P2 #2 on 3ca2a7f addressed. codex P2 + claude[bot] P2 (same defect, independently flagged) — Fix: §3.3 item 4 now correctly specifies codex P2 #2 — split key reuse becomes no-op. My §3.2 example showed Fix: §3.2 split into TWO distinct split-key constructions:
Implementation details (counter persistence, collision recovery) deferred to M5b; the design contract is: each shuffle's split key is strictly interior to the route's current range. claude[bot] gave LGTM on 87dcfe3 "Ready to merge" with no new findings — this commit (b752a89) addresses two findings from claude[bot]'s earlier 3ca2a7f review (which arrived between the 87dcfe3 review starting and finishing) plus codex's two P2s on 87dcfe3. @claude review |
|
Claude finished @bootjp's task in 4m 18s —— View job Review of PR #905 — b752a89
Verification of the two prior findingscodex P2 + claude P2 ( codex P2 #2 (split key reuse becomes no-op) — confirmed fixed.
The General review pass
Minor implementation note (non-blocking)
SummaryBoth codex P2 + claude P2 findings are correctly addressed. The document is technically sound and internally consistent. No new P1 or P2 findings. Ready to merge. |
…5b implementation list (claude[bot] minor on b752a89) claude[bot] flagged a non-blocking M5b implementation concern on b752a89: ListRoutes protobuf response fields may be Java byte arrays rather than strings depending on deserialization. Calling Clojure (str byte-array) silently yields '[B@…' (Java object reference) instead of the bytes, producing a wrong split key that would land outside the route's range — a silent mis-routing footgun. The existing 'Implementation details deferred to M5b' line in §3.2 already covers this in spirit, but explicitly listing 'byte-vs-string coercion of ListRoutes response fields' next to 'counter persistence' and 'collision recovery' makes the specific footgun visible at design-doc-read time so the M5b implementer can budget for it rather than discover it through debugging silent mis-routing. Doc-only, no semantic changes.
|
Pushed as 6f024aa — addresses the only remaining note from claude[bot]'s "Ready to merge" review of b752a89. claude[bot] minor (non-blocking) on b752a89: Fix: §3.2's existing "Implementation details deferred to M5b" list now explicitly includes "byte-vs-string coercion of State of PR #905:
Loop exit condition reached. Stopping the 5-minute cron. @claude review |
|
Claude finished @bootjp's task in 2m 4s —— View job Review of PR #905 — 6f024aa
Verification of the b752a89 minor note (6f024aa)Confirmed fixed. The diff adds the byte-vs-string coercion note inline into the existing "Exact strategy details … are M5b implementation notes" sentence in §3.2 (lines 217–224): The placement is correct: immediately after the Final general review passAll twelve prior findings (two Gemini R, two Codex P1, three Codex P2, three Claude P2, one Claude P3, one Claude minor) have been correctly addressed across the six commits of this PR. Full verification table:
No new issues found. The document is technically accurate, internally consistent, and grounded in the current codebase. Ready to merge. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f024aaa73
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | Half | Mechanism | | ||
| |---|---| | ||
| | Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`. The router maps each table to its own route key so cross-table txns engage multi-route routing. | | ||
| | Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster. Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires. M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI). Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-node `--raftBootstrap` (boolean) — NOT `--raftBootstrapMembers`.** `main.go:735-741` has a hard guard: `resolveBootstrapServers` returns `ErrBootstrapMembersRequireSingleGroup` whenever `--raftBootstrapMembers` is set and `--raftGroups` parses to more than one group (codex P2 + claude[bot] P2 on 3ca2a7f7). For the tentative 2-single-node-groups topology, each process hosts ONE group with ONE member — single-member groups need no peer discovery, so `--raftBootstrap` is both necessary and sufficient. The existing `--raftBootstrapMembers` in `run-jepsen-local.sh` must be **removed** for multi-group processes; it is only valid for single-group, multi-node formations.<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). | |
There was a problem hiding this comment.
Don’t split single-node groups across processes
In the proposed 2-single-node topology, the plan says each process hosts one group and uses separate 50051/50054 + 63801/63804 endpoints, but the current server config does not support a process that knows about only its own group while routing to the other one: validateShardRanges requires every --shardRanges group ID to appear in that process’s --raftGroups, and buildShardGroups then starts a local Raft runtime for every group listed. With the documented layout, either each per-group process fails validation for ranges owned by the other group, or both processes are configured with both groups and race to bind the same Raft listeners; M5a’s local launcher needs to model the supported shape, e.g. one process hosting both single-member groups with both Raft addresses mapped to its single Dynamo endpoint, or add server support for remote-only groups first.
Useful? React with 👍 / 👎.
…24aa) Codex P1 on 6f024aa correctly identifies that the proposed '2 processes, each hosting one group' topology is NOT supported by the current server: * shard_config.go:399-410 (validateShardRanges) requires every --shardRanges group ID to appear in that process's --raftGroups. Process A with --raftGroups '1=...' but a shardRanges entry pointing to group 2 fails validation. * main.go:764 (buildShardGroups) starts a local Raft runtime for every group listed in --raftGroups. Configuring both processes with --raftGroups '1=...,2=...' makes both try to host both groups and race on the Raft listeners. Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology — ONE PROCESS hosting both single-member groups (two Raft addresses, shared Dynamo endpoint). Cross-group txns go through the in-process router and 2PC fires across the two co-located groups. Trade-off explicitly documented: because both groups live in one process, partition/kill nemeses can't separate them. Only the route-shuffle nemesis (API-level) exercises the cross-group path meaningfully. True distributed multi-group (multi-process) requires server-side support for 'remote-only groups in --raftGroups' or equivalent — M6+ work, out of M5 scope. This is a strictly correct narrowing of the topology choice (removed the unsupported '2 separate processes' and '6 nodes' options; the only supported shape is now spelled out). The M5a Done-when criterion still holds: cluster starts with tables 1-2 in group 1 and tables 3-4 in group 2, workload exercises dispatchMultiShardTxn.
|
Pushed as f92a029 — codex P1 on 6f024aa addressed. codex P1 — proposed topology not supported by the current server. Confirmed against the codebase:
Consequence: my proposed "2 processes, each hosting one group" is rejected — Process A with Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology — one process hosting both single-member groups, two Raft addresses ( Trade-off explicitly documented: because both groups live in one process, partition/kill nemeses can't separate them — only the route-shuffle nemesis (API-level) exercises the cross-group path meaningfully. True distributed multi-group (multi-process) requires server-side support for "remote-only groups in This is a strictly correct narrowing of the topology choice (removed the unsupported "2 separate processes" and the speculative "6 nodes" expansion; the only supported shape is now spelled out). The M5a Done-when criterion still holds: cluster starts with tables 1-2 in group 1 and tables 3-4 in group 2, workload exercises @claude review |
|
Claude finished @bootjp's task in 9m 13s —— View job Review of PR #905 — f92a029
Verification of the codex P1 fix (6f024aa → f92a029)Confirmed fixed and correctly grounded. The codex P1 finding was that "two processes, each hosting one group" is NOT supported because The fix — one process hosting both single-member groups — is fully supported by the current server. Tracing through the code:
All seven claims in §3.3 are verified against the source. General review passAll thirteen findings from prior review rounds are still correctly addressed. No prior fix was reverted by f92a029.
No new P1 or P2 issues found. Observation (non-blocking)The doc implies but doesn't spell out the Jepsen SummaryThe codex P1 topology fix is correct, well-grounded in the source, and the design document is now technically accurate across all twelve areas. Ready to merge. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Line 386: The milestone description incorrectly states that passing
--shardRanges alone will launch ≥2 Raft groups; update the row so it explicitly
requires both --shardRanges and --raftGroups to be passed when launching (e.g.,
`scripts/run-jepsen-local.sh` should be documented to call with both
--shardRanges and --raftGroups) and mention that omitting --raftGroups collapses
to group 1; modify the text around the `--shardRanges` mention (the scope text
in the milestone row) to require both flags to satisfy the multi-group contract
and ensure the setup hook still calls ListRoutes to verify multi-group routing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ddb92f54-4107-4806-8a5e-5df0f5164e6f
📒 Files selected for processing (1)
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md
…anges (coderabbit Major on f92a029) coderabbit Major on f92a029 correctly identifies that §4's M5a milestone row was inconsistent with §3.3's five-change contract: the row said the launch extension passes '--shardRanges so the cluster launches with >=2 Raft groups' but §3.3 spells out that --shardRanges alone collapses every range to the single default group 1 (shard_config.go:61-99 — the claude[bot] P2 finding on 2481286 that drove the §3.3 rewrite in the first place). Both --raftGroups AND --shardRanges are required for the multi-group contract. Fix: §4's M5a Scope cell now explicitly says 'pass BOTH --raftGroups (declaring >=2 groups) AND --shardRanges (placing tables across those groups)', and the cell cross-references §3.3 for the 'shardRanges alone collapses to group 1' explanation. Avoids the implementer following §4 in isolation and missing the --raftGroups requirement. Doc-only, no semantic changes to the design itself — just internal consistency between §4 row and §3.3 table.
|
Pushed as 58d6ac5 — coderabbit Major on f92a029 addressed. coderabbit Major — §4 row was inconsistent with §3.3's multi-group contract. The M5a milestone row said the launch extension passes Fix: §4's M5a Scope cell now explicitly says "pass BOTH Doc-only, no semantic changes — just internal consistency between the §4 row and the §3.3 table. @claude review |
|
Claude finished @bootjp's task in 1m 55s —— View job Review of PR #905 — 58d6ac5
Verification of the coderabbit Major fix (f92a029 → 58d6ac5)Confirmed fixed. Line 386's M5a Scope cell now reads (bold in original):
The fix correctly:
The §4 milestone row and the §3.3 five-change table are now fully consistent. ✓ Verification of all prior findings (full audit)
All fifteen findings correctly addressed. No prior fix has been reverted. General review passNo new issues found. The document is technically accurate, internally consistent, and all claims are grounded in the source ( Ready to merge. |

Summary
Companion to
docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.mdForward-looking posture (matches parent)
Today's
SplitRangeis same-group only (per CLAUDE.md andadapter/distribution_server.go), so the Composed-1 hazard the M3/M4 guard catches cannot yet be triggered in production. M5 ships the scaffolding so:SplitRangeis exercised under realistic concurrent multi-shard write load and proved non-regressing — workload finds zero G1c, which is the baseline M4 contract.SplitRange,MoveRange, online rebalancer), the M5 workload — with a one-line nemesis swap — becomes the integration-level proof that M3+M4 hold under cross-group churn.Open questions for review
The doc tracks four open questions in §5. The two that benefit most from reviewer input:
*_proposed_*.md→*_partial_*.md(since only M5 remains)? Tentative yes.Test plan
gofmt/ lint clean (doc-only, no Go changes)cmd/elastickv-splitas a standalone binary is preferable to acmd/elastickv-adminsubcommand (the doc tentatively chooses standalone; see §3.1 "Alternative considered")Summary by CodeRabbit