Skip to content

docs(design): Composed-1 M5 — Jepsen route-shuffle workload proposal#905

Merged
bootjp merged 10 commits into
mainfrom
feat/composed1-m5-design
Jun 2, 2026
Merged

docs(design): Composed-1 M5 — Jepsen route-shuffle workload proposal#905
bootjp merged 10 commits into
mainfrom
feat/composed1-m5-design

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 2, 2026

Summary

  • Design doc for Composed-1 M5 — Jepsen route-shuffle workload, the integration-level proof from the parent doc's milestone table.
  • Doc only — no implementation in this PR. Per CLAUDE.md design-doc-first workflow.
  • M5 itself is two phases: M5a (CLI + workload setup, mergeable on its own), M5b (route-shuffle nemesis + cadence tuning).

Companion to

Forward-looking posture (matches parent)

Today's SplitRange is same-group only (per CLAUDE.md and adapter/distribution_server.go), so the Composed-1 hazard the M3/M4 guard catches cannot yet be triggered in production. M5 ships the scaffolding so:

  1. The current SplitRange is exercised under realistic concurrent multi-shard write load and proved non-regressing — workload finds zero G1c, which is the baseline M4 contract.
  2. When a future PR introduces a route-mutating RPC that DOES shift ownership across groups (cross-group SplitRange, MoveRange, online rebalancer), the M5 workload — with a one-line nemesis swap — becomes the integration-level proof that M3+M4 hold under cross-group churn.

Open questions for review

The doc tracks four open questions in §5. The two that benefit most from reviewer input:

Test plan

  • gofmt / lint clean (doc-only, no Go changes)
  • Reviewer: confirm M5a + M5b phasing matches existing repo conventions
  • Reviewer: confirm cmd/elastickv-split as a standalone binary is preferable to a cmd/elastickv-admin subcommand (the doc tentatively chooses standalone; see §3.1 "Alternative considered")

Summary by CodeRabbit

  • Documentation
    • Added design doc for the Composed-1 M5 milestone introducing a DynamoDB route-shuffle testing workload.
    • Describes multi-shard/multi-group transaction coverage, setup verification, pass/fail validation criteria (including rejection of specific anomalies), default shuffle cadence, milestone breakdown, and remaining open questions.

Companion to 2026_05_29_proposed_composed1_cross_group_commit_guard.md
(parent — proposes the cross-group commit-time ownership guard
M1+M2+M3+M4 implemented in PR #900).

M5 is the integration-level proof from the parent doc's
milestone table.  This proposal elaborates the row into a full
design:

  * cmd/elastickv-split — tiny CLI that invokes SplitRange once
    by route ID + split key + expected version.  Lets the
    Jepsen nemesis shell out instead of re-implementing the
    gRPC client in Clojure.
  * jepsen/src/elastickv/composed1_nemesis.clj — route-shuffle
    nemesis that periodically splits an existing route.
    Composes with the existing partition+kill nemesis package.
  * dynamodb-append-workload setup hook — issues one initial
    SplitRange so the workload spans >=2 shards from t=0,
    exercising the 2PC path (prewrite -> primary commit ->
    secondary commit + the new
    ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel).

Forward-looking posture (same as parent): today's SplitRange
is same-group only, so the Composed-1 hazard the M3/M4 guard
catches cannot yet be *triggered* in production.  M5 ships the
scaffolding so:

  1. The current SplitRange is exercised under realistic
     concurrent multi-shard write load and proved non-regressing
     (workload finds zero G1c — baseline M4 contract).
  2. When a future PR introduces a route-mutating RPC that DOES
     shift ownership across groups, the M5 workload — with a
     one-line nemesis swap — becomes the integration-level proof
     that M3+M4 hold under cross-group churn.

Two-phase milestone breakdown:

  * M5a: CLI + workload setup (mergeable on its own; trivially
    finds zero G1c without the nemesis).
  * M5b: Route-shuffle nemesis itself + cadence-tuning analysis.

Open questions tracked in section 5; lifecycle questions about
renaming the parent doc are in OQ-4.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: afa5f861-5d94-4d55-8652-e03b96c682d9

📥 Commits

Reviewing files that changed from the base of the PR and between f92a029 and 58d6ac5.

📒 Files selected for processing (1)
  • docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md
📝 Walkthrough

Walkthrough

Design doc proposing Composed-1 M5: a Jepsen route-shuffle nemesis with a single-call SplitRange CLI, split-key encoding and catalog-version drift handling, a multi-table workload and multi-group local launch, Elle pass criteria rejecting G1c, cadence defaults, milestones, open questions, and a self-review checklist.

Changes

Route-shuffle workload design

Layer / File(s) Summary
Proposal framing and goals
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md (lines 1–95)
Document header, M5 goals and non-goals, and rationale framing M5 as an integration-level non-regression check after M1–M4.
CLI, nemesis, and split-key handling
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md (lines 100–266)
Specifies cmd/elastickv-split CLI (single SplitRange RPC, non-zero exit on errors), route-shuffle-nemesis lifecycle and periodic ListRoutes→choose route→invoke split flow, split-key encoding helper (base64 URL-safe without padding), and expected_catalog_version drift detection per tick.
Multi-table workload and multi-group launch
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md (lines 267–333)
Adds dynamodb-append-multi-table-workload variant and local Jepsen runner updates (--raftGroups, --shardRanges) to start multi-Raft-group routing with static table-route key boundaries; explains nemesis usefulness even if SplitRange stays same-group-only.
Verification and cadence
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md (lines 334–376)
Elle pass condition requires :valid? and explicitly rejects :G1c and :G-single; corrects prior anomaly-map filtering mistake. Default shuffle cadence is 30s and composes with partition/kill nemesis via interval scheduling.
Milestones, open questions, checklist
docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md (lines 379–476)
Breaks work into M5a (CLI, workload, multi-group launch, setup verification) and M5b (nemesis wiring, expected zero G1c after shuffles). Lists open questions and a five-point self-review checklist covering data loss, concurrency, perf, consistency, and tests.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A shuffle through routes, a jepsen quest,
CLI hops, split keys put to the test.
Multi-table hops and groups in a dance,
Catalogs steady, give consistency a chance.
🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: a design document proposing the Composed-1 M5 Jepsen route-shuffle workload. It matches the file added and the PR's primary objective.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/composed1-m5-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request proposes a design document for the Composed-1 M5 Jepsen route-shuffle workload, detailing the CLI helper, route-shuffle nemesis, and multi-shard testing strategy. The review feedback highlights three critical design issues: a lexicographical split key issue that would prevent workload keys from being split across shards, a route ID resolution problem where the nemesis needs to dynamically query active routes before splits, and a concurrency issue where the initial split in the Jepsen setup hook must be gated to run only on the bootstrap node.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +149 to +154
**Split key picking strategy.** A simple monotonically-increasing
counter: every `:start` invocation appends a fresh integer
suffix to a fixed key prefix the workload reserves. This avoids
collisions with the workload's keyspace and guarantees the
split always picks a key that's between existing keys (so the
operation succeeds against a real catalog).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Lexicographical Shard Split Issue

In the proposed split key picking strategy, using a prefix like /split/ (e.g., /split/1, /split/2) will cause all split keys to be lexicographically smaller than the workload's keyspace (which consists of stringified integers like "0", "1", "2", etc., because / [ASCII 47] is less than 0 [ASCII 48]).

As a result, all workload keys will always fall into the rightmost shard (e.g., [/split/N, +inf)), meaning the workload keys themselves are never split across shards. This completely defeats Goal G2 of exercising the 2PC multi-shard transaction path.

Recommendation:
Choose split keys from within the active workload keyspace (e.g., by selecting values that interleave with the workload's keys) to ensure the workload's keyspace is successfully divided across multiple shards.

Comment on lines +156 to +160
**Expected version.** The nemesis calls `ListRoutes` once at
setup to learn the current catalog version, then increments its
local copy by 1 after each successful split. Catalog drift
(another split landing concurrently) is rare in practice — if it
happens, the nemesis logs and refreshes from `ListRoutes`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Route ID Resolution for SplitRange

The design states that the nemesis will call ListRoutes once at setup to learn the current catalog version and then increment its local copy. However, because splits continuously modify the routing table (deleting parent routes and creating new child routes with new IDs), the nemesis cannot rely on a static route_id or just the initial ListRoutes call to perform subsequent splits.

To invoke SplitRange successfully, the nemesis must provide the correct route_id that currently covers the chosen split_key.

Recommendation:
The nemesis must query ListRoutes (or dynamically track the active routes and their boundaries) before each split to resolve the correct route_id for the target split_key.

Comment on lines +179 to +181
The setup hook (`db/setup!` in Jepsen parlance, or the test's
`:setup` map) runs `elastickv-split` once with a split key in
the middle of the workload's keyspace.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Gating of Initial Split in Setup Hook

If the initial split is executed within the Jepsen db/setup! hook, keep in mind that db/setup! is executed on every node in the cluster.

Recommendation:
To prevent concurrent split attempts and catalog version conflicts during cluster bootstrap, the invocation of elastickv-split must be gated to run only on the primary/bootstrap node (e.g., using (when (= node (first (:nodes test))) ...)), or run as a single test-level setup step.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffb9c73fc7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +175 to +181
**Choose A.** Less invasive to the workload, and the
route-shuffle nemesis itself increases the shard count over
time, giving organic multi-shard coverage.

The setup hook (`db/setup!` in Jepsen parlance, or the test's
`:setup` map) runs `elastickv-split` once with a split key in
the middle of the workload's keyspace.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don’t rely on item-key splits to shard DynamoDB txns

This setup will not make the current DynamoDB append workload span shards: the router normalizes all DynamoDB table metadata, item, and GSI keys to a single per-table route key (!ddb|route|table|<tableSegment> in kv/shard_key.go:94-124), so every jepsen_append item resolves to the same catalog point regardless of its partition key. With Option A, splitting “the middle of the workload's keyspace” can increase the route count but TransactWriteItems over multiple keys in that one table will still be grouped to one shard, so the run can pass without exercising dispatchMultiShardTxn/secondary commits as G2 requires; the workload needs distinct table route keys or a routing change instead.

Useful? React with 👍 / 👎.

)

Codex P1 + 3 gemini medium findings on the original PR #905
revision (ffb9c73).  All addressed by revising sections 3.2,
3.3, and 4 (milestone breakdown) and adding OQ-5 / OQ-6:

* codex P1 — "Don't rely on item-key splits to shard DynamoDB
  txns."  Verified against kv/shard_key.go:94-124: every DynamoDB
  table-metadata, item, and GSI key normalises to a SINGLE per-
  table route key (!ddb|route|table|<tableSegment>).  Splitting
  inside a single-table workload's item keyspace cannot put two
  items on different shards, so the 2PC path
  (dispatchMultiShardTxn, secondary commits,
  ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) would never
  fire — invalidating G2.

  Fix: replace single-key-split (Option A) with a NEW workload
  variant dynamodb-append-multi-table-workload that creates N=4
  tables (jepsen_append_t1 … jepsen_append_t4) and writes to >=2
  distinct tables per TransactWriteItems.  The router maps each
  table to its own route key, so cross-table txns naturally fan
  out across shards.  The setup hook splits the table-route
  keyspace at !ddb|route|table|jepsen_append_t2.

* gemini medium R1 — "Lexicographical Shard Split Issue."  The
  prior /split/<int> split-key prefix was lexicographically
  smaller than the workload's keyspace ("/" < "0" in ASCII),
  so every workload key ended up on the rightmost shard and
  G2 was never exercised.

  Fix: anchor split keys to the table-route prefix
  !ddb|route|table|... so the split lands INSIDE the active
  workload route range.

* gemini medium R2 — "Route ID Resolution for SplitRange."
  Successful SplitRange deletes the parent route ID and
  creates two child IDs, so a cached ID from a one-time
  setup-time ListRoutes call is stale on the next shuffle.

  Fix: nemesis re-queries ListRoutes on every :start
  invocation, walks the snapshot to find the route covering
  the chosen split key, and uses that route's ID +
  snapshot.version as expected_catalog_version.  Catalog drift
  surfaces as ErrCatalogVersionMismatch from the server and the
  nemesis refreshes on the next tick.

* gemini medium R3 — "Gating of Initial Split in Setup Hook."
  Jepsen db/setup! runs on EVERY node; an ungated initial
  split would be attempted concurrently by all nodes.

  Fix: gate the setup-time split on
  (when (= node (first (:nodes test))) ...) so only the first
  node attempts it.

Also:

  * Updated §4 milestone table: M5a now ships the new workload
    variant (not just a setup hook), so it is meaningfully
    bigger than the original §4 row suggested.
  * Added OQ-5 (is N=4 the right default?) and OQ-6 (first-node
    gate semantics) as follow-ups for implementation time.
  * Resolved OQ-4: PR #900 has merged, so the parent doc rename
    *_proposed_*.md → *_partial_*.md should now land as a
    separate small doc-only PR.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as f5d2ad7 — all four findings on ffb9c73 addressed.

Codex P1 — correctly identified the core flaw. Verified against kv/shard_key.go:94-124: every DynamoDB table-metadata, item, and GSI key normalises to a SINGLE per-table route key (!ddb|route|table|<tableSegment>). Splitting inside a single-table workload's item keyspace cannot put two items on different shards, so the 2PC path (dispatchMultiShardTxn, secondary commits, the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel) would never fire — invalidating G2.

Fix: §3.3 rewritten. New workload variant dynamodb-append-multi-table-workload creates N=4 tables (jepsen_append_t1jepsen_append_t4) and writes to ≥2 distinct tables per TransactWriteItems. The router maps each table to its own route key, so cross-table txns naturally fan out across shards. The setup hook splits at !ddb|route|table|jepsen_append_t2 so tables 1 lives on one shard, tables 2-4 on another from t=0.

Gemini medium R1 (Lexicographical split) — fixed by anchoring split keys to the table-route prefix !ddb|route|table|... so they land INSIDE the active workload route range. The prior /split/<int> was lexicographically smaller than the workload's keyspace (/ < 0), so every workload key ended up on the rightmost shard.

Gemini medium R2 (Route ID resolution) — fixed by having the nemesis re-query ListRoutes on every :start invocation, walk the snapshot to find the route covering the chosen split key, and use that route's ID + snapshot.version as expected_catalog_version. Catalog drift surfaces as ErrCatalogVersionMismatch and the nemesis refreshes on the next tick. A cached route ID was indeed always going to be stale — every successful split deletes the parent and creates two fresh children.

Gemini medium R3 (Setup hook gating) — fixed by gating on (when (= node (first (:nodes test))) ...) so only the first node attempts the initial split. Jepsen's db/setup! runs on every node by default.

Also:

  • §4 milestone table updated — M5a now ships the new workload variant (not just a setup hook), so it's meaningfully bigger than the original row suggested.
  • Added OQ-5 (is N=4 the right default?) and OQ-6 (first-node gate semantics) as implementation-time follow-ups.
  • Resolved OQ-4: PR kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd #900 has merged, so the parent doc rename *_proposed_*.md*_partial_*.md should now land as a separate small doc-only PR.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 47s —— View job


Review of PR #905f5d2ad7

  • Read the updated design document
  • Verify Codex P1 fix (multi-table workload for 2PC path)
  • Verify Gemini R1 fix (lexicographical split key)
  • Verify Gemini R2 fix (route ID resolution on each nemesis invocation)
  • Verify Gemini R3 fix (setup hook gating to single node)
  • General review pass for correctness, consistency, and open issues

Verification of the four prior findings

Codex P1 — confirmed fixed. Verified against kv/shard_key.go:94-124: dynamoRouteKey() normalises every DynamoDB table-meta, item, and GSI key to !ddb|route|table|<tableSegment>, so a single-table workload's entire keyspace maps to one route key regardless of partition-key value. The revised §3.3 strategy — N=4 tables, each TransactWriteItems writes to ≥2 distinct tables — is technically sound. Each table gets its own route key, so after the initial split at !ddb|route|table|jepsen_append_t2 the cross-table txn will fan out through dispatchMultiShardTxn (kv/sharded_coordinator.go:851) and commitSecondaryTxns (:1049), which is exactly what G2 requires. The ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel at :1101 will also be exercisable under concurrent shuffles.

Gemini R2 — confirmed fixed. The nemesis description in §3.2 correctly re-queries ListRoutes on every :start invocation, walks the snapshot, uses the live route_id + snapshot.version as expected_catalog_version. Confirmed ErrCatalogVersionMismatch is the right sentinel (distribution/catalog.go:32, distribution/catalog.go:567,632). The stale-ID retry-on-next-tick pattern matches what SplitRange in adapter/distribution_server.go:152-158 actually returns.

Gemini R3 — confirmed fixed. The (when (= node (first (:nodes test))) ...) gate is standard Jepsen idiom and correctly prevents concurrent initial splits. OQ-6 acknowledging (first (:nodes test)) semantics is the right move.


New findings

P2 — Success criterion in §3.4 will always pass (incorrect Elle result structure)

The proposed check:

(zero? (count (filter #(= (:type %) :G1c) (:anomalies results))))

assumes (:anomalies results) is a sequence of maps with a :type key. But elle/jepsen.tests.cycle.append returns :anomalies as a map from keyword → seq (e.g., {:G1c [...], :G-single [...]}). Iterating a Clojure map with filter yields [key val] vectors; none of them have a :type key, so (= (:type %) :G1c) is always false, and the count is always zero. This check will pass even on a G1c run.

The existing workload (dynamodb_workload.clj:316) uses :consistency-models [:strict-serializable] inside append/test, which is where the real check lives. The explicit Clojure check in §3.4 should either be removed (trust (:valid? results)) or corrected:

;; correct form
(and (:valid? results)
     (nil? (get (:anomalies results) :G1c))
     (nil? (get (:anomalies results) :G-single)))

Fix this →

P2 — Lexicographic direction in the Gemini R1 explanation is inverted (§3.2)

The doc explains the old /split/<int> proposal by saying "that was lexicographically smaller than the workload's keyspace (/ < 0 in ASCII)". The DynamoDB table-route keyspace starts with ! (ASCII 33); / is ASCII 47. So /split/<int> is lexicographically larger than !ddb|route|table|* — workload route keys would land on the left shard, not the rightmost. The fix (anchoring to !ddb|route|table|jepsen_append_t2) is absolutely correct, but the explanation has the direction inverted. This is only a doc-clarity issue, but a reviewer following the reasoning could get confused.

Fix this →

P3 — NG3 contradicts the new multi-table variant introduced in §3.3

Non-goal NG3 states: "No new Jepsen workload primitives … The existing dynamodb-append-workload is the right surface." §3.3 then introduces dynamodb-append-multi-table-workload as a new workload variant. The non-goal should be tightened to something like: "No new operation types or generators beyond the multi-table variant and route-shuffle nemesis." Otherwise a reviewer reading §1.2 before §3.3 will think the multi-table variant contradicts a stated non-goal.

P3 — Setup hook route ID resolution not described

§3.3 says the setup hook issues the initial split, but doesn't explain how it resolves the route ID needed for the elastickv-split call. The nemesis does this via ListRoutes; the same pattern is needed in db/setup!. Worth adding one sentence: "The hook calls ListRoutes first to obtain the initial route ID and catalog version before invoking elastickv-split." Avoids an implementer having to re-derive this.

Observation — OQ-4 wording vs. git log

The trigger comment says "PR #900 has merged" and OQ-4 is marked resolved accordingly. However the doc's §5 OQ-4 text says the rename should be "a separate small doc-only PR." That's the right call: the parent-doc rename from *_proposed_**_partial_* and the eventual *_implemented_* transition should stay out of this PR per CLAUDE.md's lifecycle convention. No action needed here, just confirming the resolution is correct.


Summary

The four prior reviewer findings have been addressed correctly and the core technical direction is sound. The P2 success-criterion bug (§3.4) is the most important remaining issue — as written, the explicit G1c guard will always silently pass. The lexicographic-direction explanation inversion (§3.2) is a documentation correctness issue. Both are fixable with targeted line edits to the doc before this merges.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5d2ad7ac7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

(`!ddb|route|table|<tableSegment>` — see `kv/shard_key.go:94-124`).
Concretely, with N tables `jepsen_append_t1` …
`jepsen_append_tN` per §3.3, the route key for table `tK` is
`!ddb|route|table|jepsen_append_tK`. Splits happen between
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Encode table segments before choosing split keys

The proposed split key uses the raw table name, but DynamoDB route keys are built from encodeDynamoSegment(tableName) (base64 RawURLEncoding in adapter/dynamodb.go) and kv.dynamoRouteTableKey appends that encoded segment. For tables like jepsen_append_t1, the real route keys start with !ddb|route|table|am..., so splitting at !ddb|route|table|jepsen_append_tK sorts after all of the workload's table-route keys and leaves every table on the same side; the route-shuffle workload would still miss the intended 2PC path.

Useful? React with 👍 / 👎.

Comment on lines +209 to +211
keys. The setup hook splits the table-route keyspace at
`!ddb|route|table|jepsen_append_t2` so tables 1 lives on one
shard and tables 2–4 on another from t=0.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Seed cross-group routes before claiming 2PC coverage

Fresh evidence is that the revised setup hook now relies on SplitRange at the table-route boundary, but the current SplitRange only creates child routes with the parent's GroupID, while dispatchTxn enters dispatchMultiShardTxn only when mutations/read keys group to more than one Raft group. With just this initial split, tables 1 and 2–4 become separate ranges in the same group and the workload can satisfy the documented setup while still taking the single-shard transaction path; M5a needs an initial --shardRanges/routing plan that puts the table route keys on distinct groups (or a future cross-group mover) before using it as 2PC evidence.

Useful? React with 👍 / 👎.

…#905)

Six findings on f5d2ad7 — two codex P1, two claude P2, two
claude P3. All addressed:

* codex P1 #1 (encode table segments before choosing split
  keys).  My split key used the raw table name; the actual
  table-route key uses dynamoRouteTableKey(encodeDynamoSegment(
  tableName)) — base64 RawURLEncoding (adapter/dynamodb.go:
  8337-8365).  Split at the raw name sorts after every
  base64-encoded segment and leaves every table on one side
  of the split.

  Fix: §3.2 now specifies the Clojure helper
  encode-dynamo-segment that mirrors the Go encoding, and the
  split key is constructed as
  `(str "!ddb|route|table|" (encode-dynamo-segment "jepsen_append_t2"))`.

* codex P1 #2 (seed cross-group routes before claiming 2PC
  coverage).  Today's SplitRange is same-group only — both
  children inherit the parent's GroupID.  dispatchTxn enters
  dispatchMultiShardTxn only when mutations group to ≥2 Raft
  GROUPS, not ≥2 routes.  An initial SplitRange in the
  setup hook would leave tables 1 and 2-4 in the same group
  and the workload would still take the single-shard path.

  Fix: §3.3 rewritten substantially.  Multi-group cluster
  startup is now required — M5a extends
  scripts/run-jepsen-local.sh to pass --shardRanges so the
  cluster launches with ≥2 Raft groups, table-route keys for
  t1..t4 statically split across groups (e.g. tables 1-2 →
  group 1, tables 3-4 → group 2).  The Jepsen db/setup! hook
  no longer issues SplitRange — it calls ListRoutes (gated
  to first node) and VERIFIES the multi-group routing is in
  place, failing fast if not.  Also added §3.3's "Why the
  nemesis is still useful" paragraph explaining that
  nemesis-driven SplitRange (same-group) churns catalog
  versions without moving ownership, exercising M3 + M4
  drift detection — a non-regression check.  When a future
  cross-group MoveRange lands, swapping it into the nemesis
  upgrades the workload from non-regression to real
  Composed-1 anomaly hunting.

* claude P2 #1 (success criterion in §3.4 will always pass).
  My Clojure snippet used
  `(filter #(= (:type %) :G1c) (:anomalies results))` but
  :anomalies is a map (keyword → seq), not a flat list of
  maps with :type.  Iterating a map yields [key val] vectors
  with no :type field — the filter always returns empty, the
  check always passes silently.

  Fix: §3.4 corrected to
  `(nil? (get (:anomalies results) :G1c))`.  Added doc note
  about the prior broken form so future readers understand
  the trap.

* claude P2 #2 (lexicographic direction inverted in §3.2).
  My old explanation said `/split/<int>` was smaller than
  workload keys, but `!` is ASCII 33 and `/` is 47, so
  `/split/<int>` is LARGER than `!ddb|route|table|*`.

  Fix: §3.2 corrected — workload keys would land on the LEFT
  shard, not the rightmost.  The fix (anchoring to the
  table-route prefix) is correct in either direction.

* claude P3 #1 (NG3 contradicts the new multi-table variant).

  Fix: NG3 tightened to "no new operation types or generators
  BEYOND the multi-table workload variant and the route-
  shuffle nemesis specified in §3."

* claude P3 #2 (setup hook route ID resolution not described).
  Resolved differently from claude's suggestion: §3.3's
  revision means the setup hook no longer needs route ID
  resolution at all (no mutation), just a ListRoutes
  verification.

* Added OQ-7 (codex P1 #1 follow-up) for the
  --shardRanges boundary key encoding helper.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 2481286 — all six findings on f5d2ad7 addressed (2 codex P1, 2 claude P2, 2 claude P3).

codex P1 #1 — encode table segments before split keys. My split key used the raw table name; actual route key uses dynamoRouteTableKey(encodeDynamoSegment(tableName)) — base64 RawURLEncoding (adapter/dynamodb.go:8337-8365). Split at raw name sorts after all encoded segments → all tables on one side. Fix: §3.2 now specifies a Clojure encode-dynamo-segment helper mirroring the Go encoding.

codex P1 #2 — same-group SplitRange can't seed cross-group routes. Current SplitRange only creates children with the parent's GroupID. dispatchTxn enters dispatchMultiShardTxn only when mutations group to ≥2 Raft groups, not ≥2 routes. Setup-hook SplitRange would leave tables in the same group → single-shard path → 2PC never fires. Fix: §3.3 substantially rewritten. M5a now requires multi-group cluster startup — extend scripts/run-jepsen-local.sh to pass --shardRanges so cluster launches with ≥2 Raft groups, table-route keys for t1..t4 statically split across groups. Jepsen db/setup! no longer issues SplitRange — calls ListRoutes (gated to first node) and verifies the multi-group routing is in place, failing fast if not. Added a "Why the nemesis is still useful" paragraph: same-group SplitRange churns catalog versions/route IDs, exercising M3+M4 drift detection — a non-regression check, not a real anomaly hunt until cross-group routing lands.

claude[bot] P2 #1 — success criterion would always pass. My (filter #(= (:type %) :G1c) (:anomalies results)) was wrong: :anomalies is a map (keyword → seq), iterating yields [k v] vectors with no :type. Fix: §3.4 uses (nil? (get (:anomalies results) :G1c)).

claude[bot] P2 #2 — lexicographic direction inverted. ! is ASCII 33, / is 47 → /split/ is LARGER than !ddb|.... Workload keys would land on LEFT shard, not rightmost. Fix: §3.2 explanation corrected.

claude[bot] P3 #1 — NG3 contradicts §3.3. Tightened: "no new operation types or generators beyond the multi-table variant and the route-shuffle nemesis."

claude[bot] P3 #2 — setup hook route ID resolution. Resolved differently: §3.3's rewrite means the hook no longer mutates routes (only verifies), so no route-ID lookup needed for mutation. Just ListRoutes for verification.

Added OQ-7 for the --shardRanges boundary key encoding helper (tentative: small Go binary cmd/elastickv-route-key so the encoding stays in Go).

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 4m 53s —— View job


Review of PR #9052481286


Verification of the six prior findings

Codex P1 #1 — confirmed fixed. adapter/dynamodb.go:8441-8442 confirms encodeDynamoSegment is exactly base64.RawURLEncoding.EncodeToString. The doc correctly calls out the encoding at kv/shard_key.go:117-124 and proposes a Clojure encode-dynamo-segment helper.

Codex P1 #2 — confirmed fixed. sharded_coordinator.go:821-843 confirms groupMutations groups by Raft group ID via c.router.ResolveGroup(mut.Key). The fast path at line 835 fires when len(gids) == 1 — single group. dispatchMultiShardTxn fires only when len(gids) > 1. distribution_server.go:390,398 confirm SplitRange sets both children's GroupID to parent.GroupID, so same-group splits leave cross-table txns on one group and never trigger 2PC. The revised §3.3 strategy (multi-group cluster startup) is the correct fix.

Claude P2 #1 — confirmed fixed. §3.4 uses (nil? (get (:anomalies results) :G1c)) — correct map lookup. The explanation of the prior bug is accurate.

Claude P2 #2 — confirmed fixed. §3.2 now correctly states / (ASCII 47) > ! (ASCII 33), so /split/<int> is lexicographically larger than !ddb|route|table|*, meaning workload keys would land on the left shard.

Claude P3 #1 — confirmed fixed. NG3 is now tightened to "beyond the new dynamodb-append-multi-table-workload variant … and the new route-shuffle-nemesis."

Claude P3 #2 — confirmed fixed. Setup hook is now read-only (only calls ListRoutes and verifies), so no route-ID resolution is needed for mutation. Clean resolution.


New findings

P2 — "only the --shardRanges argument is new" understates M5a's launch script changes (§3.3)

The doc says: "The DDB leader address per-group is wired via --raftDynamoMap (already supported by the runner — only the --shardRanges argument is new)."

This is incorrect. shard_config.go:61-99 shows that parseRaftGroups returns a single default group {id:1, address:defaultAddr} when --raftGroups is empty. Without --raftGroups, all --shardRanges entries still fall into group 1 regardless of the groupID specified, and groupMutations returns gids = [1] for all keys — single-shard path, 2PC never fires.

For ≥2 Raft groups, M5a must also:

  1. Pass --raftGroups "1=127.0.0.1:50051,2=127.0.0.1:50054" (or equivalent)
  2. Decide the cluster topology — today's run-jepsen-local.sh has 3 nodes all in one group. Multi-group requires a topology change: either 6 nodes (3-node consensus per group) or 2 single-node groups for test speed (each with --raftBootstrap)
  3. Update --raftBootstrapMembers per group (different members sets for group 1 vs group 2 nodes)
  4. Expand --raftDynamoMap to cover both groups' leader addresses
  5. Update port allocations (currently 5005{1,2,3}, 6380{1,2,3})

This is substantially more than one new flag. M5a is already noted as "meaningfully bigger" in the §4 table — correcting the claim in §3.3 will avoid the implementer discovering this gap mid-PR.

Fix this →

P3 — Clojure encode-dynamo-segment implementation details need one more line (§3.2)

The doc says the helper "mirrors the Go encodeDynamoSegment exactly." Go uses base64.RawURLEncoding = URL-safe charset (-/_) without = padding. In Clojure/Java, three Base64 variants exist:

  • Base64/getEncoder → standard (+//, with padding) — wrong
  • Base64/getUrlEncoder → URL-safe (-/_, with padding) — wrong (padding changes the string)
  • Base64/getUrlEncoder + .withoutPadding → URL-safe, no padding — correct

Since OQ-7's Go helper (cmd/elastickv-route-key) mitigates this risk for the launch-script boundary keys, the remaining exposure is M5b's Clojure nemesis. Adding one line — e.g. ; use (.withoutPadding (java.util.Base64/getUrlEncoder)) — removes a silent mis-routing footgun at implementation time. The failure mode is non-obvious: a wrong encoding produces split keys that fall outside every route's range, ListRoutes returns no matching route, the nemesis logs and skips every :start, and the run appears clean while the nemesis is a no-op.

Observation — OQ-4 resolution and parent doc rename

OQ-4 is correctly marked resolved: the parent doc rename (*_proposed_**_partial_*) should be a separate PR now that #900 is merged. Confirmed the parent doc (2026_05_29_proposed_composed1_cross_group_commit_guard.md) is not touched in this PR's diff — the separation is correct per CLAUDE.md lifecycle convention.


Summary

All six prior findings are correctly addressed. Two new issues: the P2 claim that only --shardRanges is new in the launch script is the more important one — it will mislead the M5a implementer into thinking the cluster topology and --raftGroups are already handled. The P3 encoding callout is a small but high-value one-liner to add. Both are doc-only fixes.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Around line 17-21: The blockquote lines around the SplitRange discussion use
multiple spaces after the '>' which violates markdownlint MD027; edit the two
affected quote lines (the ones starting with the blockquote entries for the list
items referencing SplitRange) to use a single space after '>' (e.g., change '>  
1.' and '>   2.' to '> 1.' and '> 2.') so the markdown lists are normalized.
- Around line 104-110: The fenced code blocks in the document (for example the
block showing the elastickv-split command and the other blocks around lines with
route shuffle and Jepsen snippets) need explicit language identifiers to satisfy
MD040; update each triple-backtick fence that contains shell commands to use
```bash (e.g., the block containing the elastickv-split invocation) and use
appropriate languages like ```clojure for code snippets in the Jepsen examples
so all three blocks (the one with elastickv-split, the block at the later
route/jepsen example, and the block at lines ~280) include the correct language
tag.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 46930714-7b12-429d-84a9-4ea1cdbcd711

📥 Commits

Reviewing files that changed from the base of the PR and between f74453a and 2481286.

📒 Files selected for processing (1)
  • docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md

Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated
Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated
coderabbitai minor on 2481286:

* MD027 (multiple-space-after-blockquote) at lines 17,21 in the
  upfront posture block.  Trimmed double-spaces after `>` to a
  single space.

* MD040 (fenced-code-language) at the three plain ``` blocks:
  - the elastickv-split CLI usage (```bash)
  - the route-shuffle-nemesis Clojure sketch (```clojure)
  - the M5 success-criterion Clojure snippet (```clojure)

Doc-only, no semantic changes.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 673bee8 — coderabbitai minor lint findings on 2481286 addressed:

  • MD027 (multiple-space-after-blockquote) at lines 17, 21 — trimmed double-spaces after > to a single space in the upfront posture block.
  • MD040 (fenced-code-language) at three plain ``` blocks — added language tags: bash for the `elastickv-split` CLI usage, `clojure` for both the route-shuffle nemesis sketch and the success-criterion snippet.

Doc-only, no semantic changes.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 4s —— View job


Review of PR #905673bee8

  • Read current state of design document
  • Verify lint fixes (MD027 + MD040) are correct
  • Verify all previously flagged issues from f5d2ad7 are addressed in 2481286
  • Check for any new issues in the full document
  • Post review findings

Verification of lint fixes (673bee8)

MD027 — confirmed correct. Lines 17 and 21 changed from > 1. / > 2. (3 spaces) to > 1. / > 2. (1 space). The continuation lines on 23–26 (> MoveRange...) retain their deeper indentation, which markdownlint correctly does not flag for MD027 — the rule targets the space immediately after > on the first token of each blockquote line, and continuation indentation for list items is exempt.

MD040 — confirmed correct. All three previously-bare fenced code blocks now carry language identifiers:

  • Line 104: ```bash (elastickv-split CLI invocation) ✓
  • Line 133: ```clojure (route-shuffle-nemesis skeleton) ✓
  • Line 280: ```clojure (success-criterion snippet) ✓

No other code blocks in the document are missing language identifiers (the encode-dynamo-segment block added in 2481286 already carried ```clojure).


Verification of the six findings from f5d2ad7 (addressed in 2481286)

All six confirmed resolved:

Finding Status
codex P1 #1 — split key must use encodeDynamoSegment (base64 RawURLEncoding) ✓ §3.2 specifies encode-dynamo-segment Clojure helper; example uses (str "!ddb|route|table|" (encode-dynamo-segment "jepsen_append_t2"))
codex P1 #2 — same-group SplitRange can't trigger dispatchMultiShardTxn ✓ §3.3 rewrites to multi-group cluster startup via --shardRanges; setup hook does verification only
claude P2 #1 — Elle :anomalies is a map, not a seq of :type-keyed maps ✓ §3.4 now uses (nil? (get (:anomalies results) :G1c))
claude P2 #2 — lexicographic direction inverted (/ is ASCII 47 > ! ASCII 33) ✓ §3.2 correctly states /split/<int> lands workload keys on the left shard
claude P3 #1 — NG3 contradicted the new multi-table variant in §3.3 ✓ NG3 now explicitly carves out the dynamodb-append-multi-table-workload variant and the route-shuffle nemesis
claude P3 #2 — setup hook route ID resolution not described ✓ Resolved by design: hook is now read-only verification (ListRoutes + assert), not mutation — no route ID needed

Remaining observation (minor, non-blocking)

Scope description for M5a is now understated. The sentence at the bottom of §4 — "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" — doesn't reflect the updated M5a scope in the milestone table, which now includes: CLI + new multi-table workload + scripts/run-jepsen-local.sh multi-group launch extension + setup hook verification. "Small, focused PR" is accurate for the original scope; it's mildly misleading after the codex P1 #2 revision expanded M5a. Worth updating to something like "M5a is a single focused PR (Go CLI + new workload variant + multi-group script extension + setup-hook verification)" before implementation — avoids scope confusion when the PR lands.

This is the only remaining issue in the document; it's a doc-clarity note, not a correctness problem.


Summary

The lint cleanup (673bee8) is correct and complete. All six prior review findings are properly addressed in 2481286. The document is technically sound and consistent. Ready to merge.

…+ Base64 nuance)

Two findings on 2481286 from claude[bot]'s post-revision review:

* P2 — '"only --shardRanges is new" understates M5a's launch
  script changes' (§3.3 table cell).  Confirmed against
  shard_config.go:61-99: without --raftGroups, every
  --shardRanges entry collapses into the default single group 1
  and groupMutations returns gids = [1] for every key.
  Single-shard path, 2PC never fires.

  Fix: §3.3 table cell rewritten to list FIVE coordinated
  launch-script changes M5a must make:
    1. --raftGroups declaring >=2 groups
    2. --shardRanges with t1-t2 in group 1, t3-t4 in group 2
    3. Cluster topology decision (tentative: 2 single-node
       groups for CI speed; scale to 6 nodes if needed)
    4. Per-group --raftBootstrapMembers
    5. Expanded --raftDynamoMap + matching port allocations

  Avoids the implementer discovering the topology gap mid-PR.

* P3 — encode-dynamo-segment Clojure implementation needs
  Base64 nuance.  Go uses base64.RawURLEncoding (URL-safe charset
  WITHOUT '=' padding).  Java's three Base64 variants differ:
    - Base64/getEncoder (standard '+'/'/', with padding) — wrong
    - Base64/getUrlEncoder (URL-safe, WITH padding) — wrong
    - Base64/getUrlEncoder + .withoutPadding — correct

  Failure mode is silent: wrong encoding produces split keys
  outside every route's range; ListRoutes returns no match;
  nemesis logs and skips every :start; run appears clean while
  the nemesis is a no-op.

  Fix: §3.2 now calls out .withoutPadding explicitly and
  documents the silent-no-op failure mode.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 3ca2a7f — claude[bot] P2 + P3 on 2481286 addressed.

P2 — launch-script scope underestimated. Confirmed against shard_config.go:61-99: without --raftGroups, every --shardRanges entry collapses into the default single group 1 and groupMutations returns gids = [1] for every key. Single-shard path → 2PC never fires. My "only --shardRanges is new" was wrong.

Fix: §3.3 table cell rewritten to list five coordinated launch-script changes M5a must make:

  1. --raftGroups declaring ≥2 groups
  2. --shardRanges placing t1-t2 in group 1, t3-t4 in group 2
  3. Cluster topology decision (tentative: 2 single-node groups for CI speed; scale to 6 nodes if needed)
  4. Per-group --raftBootstrapMembers
  5. Expanded --raftDynamoMap + matching port allocations

This documents the topology gap upfront so the M5a implementer doesn't discover it mid-PR.

P3 — Clojure Base64 nuance. Go uses base64.RawURLEncoding (URL-safe charset without = padding). Java's three Base64 variants differ:

  • Base64/getEncoder (standard +//, with padding) — wrong
  • Base64/getUrlEncoder (URL-safe, with padding) — wrong
  • Base64/getUrlEncoder + .withoutPaddingcorrect

Failure mode is silent: wrong encoding produces split keys outside every route's range; ListRoutes returns no match; nemesis logs and skips every :start; the run appears clean while the nemesis is a no-op.

Fix: §3.2 now calls out .withoutPadding explicitly and documents the silent-no-op failure mode.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 4m 55s —— View job


Review of PR #9053ca2a7f

  • Read current design document (3ca2a7f)
  • Verify P2 fix (§3.3 — five coordinated launch-script changes)
  • Verify P3 fix (§3.2 — Clojure Base64 .withoutPadding + silent-no-op)
  • Cross-check --raftBootstrapMembers / multi-group compat in main.go
  • General review pass
  • Post findings

Verification of the two prior findings

Claude P2 (2481286) — confirmed fixed. §3.3 now enumerates five coordinated launch-script changes with specific flag names, a topology decision, and correct port reasoning. The previous "only --shardRanges is new" understated claim is gone.

Claude P3 (2481286) — confirmed fixed. §3.2 correctly calls out (.withoutPadding (java.util.Base64/getUrlEncoder)), names both wrong variants (Base64/getEncoder, default Base64/getUrlEncoder), and explains the silent-no-op failure mode (wrong encoding → no matching route → nemesis skips every :start → run appears clean). ✓


New finding

P2 — --raftBootstrapMembers is rejected by multi-group configs (§3.3 item 4)

main.go:739-740 has a hard guard:

if len(groups) != 1 {
    return nil, errors.WithStack(ErrBootstrapMembersRequireSingleGroup)
}

resolveBootstrapServers is called with the parsed --raftGroups slice. Any process launched with --raftGroups "1=...,2=..." has len(groups) == 2 and will fail at startup with "flag --raftBootstrapMembers requires exactly one raft group" — before any Raft work begins.

The current run-jepsen-local.sh works because it has a single group (3-node, len(groups) == 1) and can pass --raftBootstrapMembers legitimately. M5a must NOT carry this flag over to the multi-group topology.

For the tentative "2 single-node groups" case: each process hosts one group with one member. A single-member group doesn't need peer discovery at all — --raftBootstrap (the boolean flag at main.go:101) is both necessary and sufficient. --raftBootstrapMembers is the right tool only for a single-group, multi-node formation where you need to specify the initial peer set.

Item 4 of the five launch-script changes should be: "Each group node uses --raftBootstrap (boolean); --raftBootstrapMembers applies only to single-group, multi-node formations and must be removed from the multi-group launch." The current wording ("different members sets for each group's nodes") will lead the implementer to try --raftBootstrapMembers with --raftGroups "1=...,2=...", hit ErrBootstrapMembersRequireSingleGroup, and need to trace through main.go to understand why.

Fix this →


Remaining observation (from 673bee8 review, still unaddressed)

Line 340–341: "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)." The milestone table directly above (line 337) correctly lists the full expanded scope — CLI, new workload variant, multi-group script extension (5 changes), and setup-hook verification. The prose summary still reflects the pre-revision scope and contradicts the table. Low urgency, but it will confuse a reviewer who reads line 340 before the table. Updating to something like "M5a is a single, focused PR (Go CLI + multi-table workload variant + multi-group script extension + setup-hook verification)" would bring the two in sync.


Summary

Two prior findings (P2 launch-script scope and P3 Base64 .withoutPadding) are correctly addressed. One new P2: --raftBootstrapMembers errors out with ErrBootstrapMembersRequireSingleGroup on any multi-group process — item 4 of the launch-script changes is incorrect and should specify --raftBootstrap (boolean) instead. The stale "Go CLI + Clojure setup hook" prose at line 340 from the 673bee8 round is still present (doc-clarity only, non-blocking).

The 'M5a is a small, focused PR' line at the bottom of §4
was accurate for the original M5a scope (Go CLI + setup hook
+ docs) but became understated after the codex P1 #2 revision
expanded M5a to include the multi-table workload variant and
the five-change multi-group launch script.

Updated to reflect actual scope: M5a now ships:
  * cmd/elastickv-split (Go CLI)
  * dynamodb-append-multi-table-workload (Clojure)
  * scripts/run-jepsen-local.sh multi-group launch extension
    (5 coordinated changes per §3.3 table)
  * setup-hook verification (read-only ListRoutes + assert)
  * docs

Likely lands as a single PR for atomicity (workload variant
only makes sense alongside multi-group launch + verification)
but reviewable as four roughly independent diff sections.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 87dcfe3 — claude[bot]'s only remaining minor on 673bee8 addressed.

claude[bot] LGTM ("Ready to merge") on 673bee8 with one minor doc-clarity note: the bottom-of-§4 line "M5a is a small, focused PR (Go CLI + Clojure setup hook + docs)" was accurate for the original M5a scope but became understated after the codex P1 #2 revision expanded M5a to include the multi-table workload variant and the five-change multi-group launch script.

Fix: §4 trailer rewritten to reflect actual M5a scope:

  • cmd/elastickv-split (Go CLI)
  • dynamodb-append-multi-table-workload (Clojure)
  • scripts/run-jepsen-local.sh multi-group launch extension (5 coordinated changes per §3.3 table)
  • Setup-hook verification (read-only ListRoutes + assert)
  • Docs

Likely lands as a single PR for atomicity (workload variant only makes sense alongside multi-group launch + verification) but reviewable as four roughly independent diff sections.

claude[bot]'s review of 3ca2a7f (which carried the launch-script-scope + Base64-nuance fixes themselves) is still in flight at the time of this comment — I don't expect new findings there since the fixes followed claude[bot]'s own P2/P3 recommendations.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 29s —— View job


Review of PR #90587dcfe3

  • Load tools
  • Read current design document (87dcfe3)
  • Verify §4 scope fix (M5a description updated to reflect expanded scope)
  • General review pass for any remaining issues
  • Post findings

Verification of the §4 scope fix (87dcfe3)

Confirmed correct. The §4 trailer now reads:

"M5a is a single focused PR but no longer 'small' after the codex P1 #2 expansion: cmd/elastickv-split (Go CLI), the new dynamodb-append-multi-table-workload (Clojure), the scripts/run-jepsen-local.sh multi-group launch extension (five coordinated changes — see §3.3 table), the setup-hook verification (read-only ListRoutes + assertion), plus docs. Likely to land as a single PR for atomicity (the workload variant only makes sense alongside the multi-group launch and the verification hook), but reviewable as four roughly independent diff sections."

This matches the actual M5a scope derived from the §3.3 five-change table and is consistent with OQ-2's guidance to "decide at implementation time." The milestone table row (line 337) and the trailing paragraph now agree. ✓


General review pass

Area Status
§3.1 CLI flags (--route-id, --split-key, --expected-version, --address) ✓ Coherent. Multi-group callers pick the target address after ListRoutes; the CLI doesn't need to know topology.
§3.2 encode-dynamo-segment + .withoutPadding + silent-no-op failure mode ✓ Correct and fully documented.
§3.2 Lexicographic direction (ASCII / 47 > ! 33 → workload keys on left) ✓ Corrected in 2481286, still correct.
§3.2 Route ID re-query on every :start invocation ✓ Pattern matches ErrCatalogVersionMismatch sentinel in distribution/catalog.go:567,632.
§3.3 Multi-table workload + multi-group cluster (five launch-script changes) ✓ Grounded in shard_config.go:61-99 and sharded_coordinator.go:821-843.
§3.3 Setup hook read-only (ListRoutes + assertion, no mutation) ✓ Correct; first-node gate ((when (= node (first (:nodes test))) …)) is sound.
§3.4 Elle success criterion (nil? (get (:anomalies results) :G1c)) ✓ Correct map lookup; prior bug documented for posterity.
§3.5 Cadence (30s, composed with partition+kill at 40s) ✓ Reasonable defaults.
§4 Milestone table + trailing scope paragraph ✓ Consistent after 87dcfe3.
§5 OQ-7 (cmd/elastickv-route-key for boundary key encoding) ✓ Appropriate deferral; the Go-helper approach is the right call.
NG3 carve-out for multi-table variant and route-shuffle nemesis ✓ No contradiction with §3.3.
Five-pass self-review in §6 ✓ Covers all five lenses per CLAUDE.md.

No new issues found.


Summary

All previously flagged findings have been addressed across this PR's six commits. The design document is technically accurate, internally consistent, and grounded in the current codebase. Ready to merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87dcfe3f31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

| Half | Mechanism |
|---|---|
| Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`. The router maps each table to its own route key so cross-table txns engage multi-route routing. |
| Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster. Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires. M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI). Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-group `--raftBootstrapMembers`** (different members sets for each group's nodes).<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don’t require bootstrap members for multi-group startup

In the inspected startup path, resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever --raftBootstrapMembers is set and --raftGroups parses to more than one group (main.go:735-741). This line makes per-group --raftBootstrapMembers one of the required M5a launch changes, so following the design for the planned ≥2-group run will abort during startup before Jepsen can run; either the design needs new flag/server support for per-group membership or should explicitly omit bootstrap members for the 2 single-node-groups topology.

Useful? React with 👍 / 👎.

Comment on lines +173 to +174
(str "!ddb|route|table|"
(encode-dynamo-segment "jepsen_append_t2")) ; base64 RawURLEncoding
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Generate fresh interior split keys for each shuffle

When the nemesis repeatedly chooses the table-route key shown here, the first successful SplitRange makes that key the Start of the right child (ranges are [Start, End)), and validateSplitKey rejects splitKey == parent.Start/End as split key at route boundary (adapter/distribution_server.go:357-372). In a 5-minute run that expects ≥10 shuffles, subsequent ticks at the same table boundary will fail/skip instead of churning catalog versions; the nemesis needs to derive a new interior key inside the currently selected route rather than reusing the table's route key.

Useful? React with 👍 / 👎.

…laude[bot] P2 on 3ca2a7f)

Two new findings on 3ca2a7f:

* codex P2 + claude[bot] P2 (same defect, independently flagged):
  --raftBootstrapMembers is rejected on multi-group startup
  (main.go:735-741 — ErrBootstrapMembersRequireSingleGroup
  guard).  My §3.3 item 4 said 'Per-group --raftBootstrapMembers',
  which would abort startup before Jepsen could run.

  Fix: §3.3 item 4 now correctly specifies --raftBootstrap
  (boolean) for the tentative 2-single-node-groups topology.
  Single-member groups need no peer discovery so --raftBootstrap
  is necessary and sufficient.  The existing --raftBootstrapMembers
  in run-jepsen-local.sh must be REMOVED for multi-group
  processes — it's only valid for single-group multi-node
  formations.

* codex P2 #2 — split key reuse becomes no-op.  My §3.2 example
  showed the boundary key !ddb|route|table|<encode(t2)>, but
  after the setup-time split that key is the Start of the right
  child route.  adapter/distribution_server.go:357-372's
  validateSplitKey rejects splitKey == parent.Start/End with
  'split key at route boundary' — so every subsequent nemesis
  tick at the same key would fail.  In a 5-minute run expecting
  >=10 shuffles, the nemesis would no-op silently and the run
  would appear clean while catalog churn never landed.

  Fix: §3.2 now splits into TWO distinct split-key constructions:
    (a) Setup-time boundary key (one-shot, in
        scripts/run-jepsen-local.sh) — IS a route boundary by
        design (the static partition between groups).
    (b) Per-shuffle interior key (every nemesis :start) — MUST
        lie strictly between the current route's Start and End.
        Strategy: append a base64-alphabet byte plus a fresh
        counter to route.start; the result sorts after start
        but before any non-base64 byte (e.g. ASCII '|'=124).

  Implementation details (counter persistence, collision
  recovery) deferred to M5b; the design contract is: each
  shuffle's split key is strictly interior to the route's
  current range.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as b752a89 — codex P2 + claude[bot] P2 (same defect) and codex P2 #2 on 3ca2a7f addressed.

codex P2 + claude[bot] P2 (same defect, independently flagged) — --raftBootstrapMembers rejected on multi-group startup. Confirmed against main.go:735-741: resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever --raftBootstrapMembers is set and --raftGroups parses to more than one group. My §3.3 item 4 ("Per-group --raftBootstrapMembers") would abort startup before Jepsen could run.

Fix: §3.3 item 4 now correctly specifies --raftBootstrap (boolean) for the tentative 2-single-node-groups topology. Single-member groups need no peer discovery → --raftBootstrap is necessary and sufficient. The existing --raftBootstrapMembers in run-jepsen-local.sh must be removed for multi-group processes — only valid for single-group multi-node formations.

codex P2 #2 — split key reuse becomes no-op. My §3.2 example showed !ddb|route|table|<encode("jepsen_append_t2")> as the split key, but after the setup-time split that key is the Start of the right child route. adapter/distribution_server.go:357-372's validateSplitKey rejects splitKey == parent.Start/End with "split key at route boundary" — every subsequent nemesis tick would fail. In a 5-min run expecting ≥10 shuffles, the nemesis would no-op silently and the workload would appear clean while catalog churn never lands.

Fix: §3.2 split into TWO distinct split-key constructions:

  • (a) Setup-time boundary key (one-shot, in run-jepsen-local.sh) — IS a route boundary by design (the static partition between groups).
  • (b) Per-shuffle interior key (every nemesis :start) — MUST lie strictly between the current route's Start and End. Strategy: append a base64-alphabet byte plus a fresh counter to route.start; the result sorts after start but before any non-base64 byte (e.g. ASCII |=124).

Implementation details (counter persistence, collision recovery) deferred to M5b; the design contract is: each shuffle's split key is strictly interior to the route's current range.

claude[bot] gave LGTM on 87dcfe3 "Ready to merge" with no new findings — this commit (b752a89) addresses two findings from claude[bot]'s earlier 3ca2a7f review (which arrived between the 87dcfe3 review starting and finishing) plus codex's two P2s on 87dcfe3.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 4m 18s —— View job


Review of PR #905b752a89

  • Load tools
  • Read current design document (b752a89)
  • Verify codex P2 + claude P2 fix (--raftBootstrap for multi-group startup)
  • Verify codex P2 Update dependency go to v1.21.4 #2 fix (interior split keys — two distinct constructions in §3.2)
  • Cross-check Go source: main.go:735-741, distribution_server.go:357-377
  • General review pass for new issues
  • Post findings

Verification of the two prior findings

codex P2 + claude P2 (--raftBootstrapMembers rejected on multi-group startup) — confirmed fixed. main.go:739-741 confirms resolveBootstrapServers returns ErrBootstrapMembersRequireSingleGroup whenever len(groups) != 1 and bootstrapMembers is non-empty. §3.3 item 4 now correctly specifies --raftBootstrap (boolean), references the guard at main.go:735-741, and explains that --raftBootstrapMembers must be removed for multi-group processes. For the tentative 2-single-node-groups topology, each process hosts one group with one member — --raftBootstrap is both necessary and sufficient. ✓

codex P2 #2 (split key reuse becomes no-op) — confirmed fixed. distribution_server.go:357-377 confirms validateSplitKey rejects splitKey == parent.Start (line 359) and splitKey == parent.End (line 370) with errDistributionSplitKeyAtBoundary. §3.2 now distinguishes two constructions:

  • (a) Setup-time boundary key — is a route boundary by design; used once in run-jepsen-local.sh via --shardRanges
  • (b) Per-shuffle interior keyfresh-interior-split-key appends "A" + (System/nanoTime) to route.start, which is strictly > start and provably < route.end (since start and end diverge at the table-encoded segment, any extension of start stays on the same side)

The validateSplitKey check confirms: startCmp < 0 rejects keys before start, startCmp == 0 rejects start itself. fresh-interior-split-key always produces a key > start. The end-side comparison (endCmp > 0 / endCmp == 0) is safe because route.start and route.end share a common prefix up to the table encoding, then differ — any extension of start is lexicographically before end. ✓


General review pass

Area Status
§3.1 CLI flags ✓ Coherent; non-zero exit on error means Jepsen nemesis correctly sees failures
§3.2 encode-dynamo-segment + .withoutPadding + silent-no-op failure mode ✓ Documented. Clojure MUST use (.withoutPadding (java.util.Base64/getUrlEncoder))
§3.2 Two split-key constructions (boundary vs interior) ✓ Distinction is clear and grounded in validateSplitKey's boundary rejection
§3.2 Route ID re-query on every :start invocation ErrCatalogVersionMismatch from distribution/catalog.go:567,632 handles the concurrent-split race
§3.3 Five launch-script changes, --raftBootstrap not --raftBootstrapMembers ✓ Correct after this commit
§3.3 Setup hook read-only (ListRoutes verify, no mutation) ✓ First-node gate is correct Jepsen idiom
§3.4 Elle success criterion (nil? (get (:anomalies results) :G1c)) ✓ Correct map lookup; existing workload uses :consistency-models [:strict-serializable] as the canonical gate
§3.5 Cadence (30s, composed with 40s partition+kill) ✓ Reasonable defaults
§4 Milestone table + scope paragraph ✓ Consistent; M5a scope accurately described as "no longer small"
§5 OQ-7 (cmd/elastickv-route-key for boundary key encoding) ✓ Correct deferral; Go-helper approach removes the shell-encoding footgun
§6 Five-pass self-review ✓ All five CLAUDE.md lenses covered

Minor implementation note (non-blocking)

fresh-interior-split-key stubs the implementation as (str (:start route) "A" (System/nanoTime)). In Jepsen, route fields returned by ListRoutes may be byte arrays rather than strings depending on how the protobuf response is deserialized — calling str on a byte array produces the Java object reference ("[B@…") rather than the actual bytes. This is an M5b implementation concern, not a design issue (the doc correctly defers strategy details to implementation), but worth flagging so the implementer doesn't spend time debugging a silent mis-routing caused by incorrect Clojure str coercion.


Summary

Both codex P2 + claude P2 findings are correctly addressed. The document is technically sound and internally consistent. No new P1 or P2 findings. Ready to merge.

…5b implementation list (claude[bot] minor on b752a89)

claude[bot] flagged a non-blocking M5b implementation concern
on b752a89: ListRoutes protobuf response fields may be Java
byte arrays rather than strings depending on deserialization.
Calling Clojure (str byte-array) silently yields '[B@…' (Java
object reference) instead of the bytes, producing a wrong
split key that would land outside the route's range — a silent
mis-routing footgun.

The existing 'Implementation details deferred to M5b' line in
§3.2 already covers this in spirit, but explicitly listing
'byte-vs-string coercion of ListRoutes response fields' next
to 'counter persistence' and 'collision recovery' makes the
specific footgun visible at design-doc-read time so the M5b
implementer can budget for it rather than discover it through
debugging silent mis-routing.

Doc-only, no semantic changes.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 6f024aa — addresses the only remaining note from claude[bot]'s "Ready to merge" review of b752a89.

claude[bot] minor (non-blocking) on b752a89: ListRoutes protobuf response fields may be Java byte arrays rather than strings depending on deserialization. Calling Clojure (str byte-array) silently yields "[B@…" (Java object reference) instead of the bytes — a silent mis-routing footgun.

Fix: §3.2's existing "Implementation details deferred to M5b" list now explicitly includes "byte-vs-string coercion of ListRoutes response fields" alongside "counter persistence" and "collision recovery." The footgun is now visible at design-doc-read time so the M5b implementer can budget for it rather than discover it through debugging silent mis-routing.

State of PR #905:

  • claude[bot] LGTM "Ready to merge" on both 87dcfe3 and b752a89 (back-to-back review cycles)
  • Codex bot: rate-limited (will not review further until reset)
  • CI on b752a89 (and 6f024aa once it kicks): build/test/lint all pass
  • 9 commits of refinement across 5 substantive review rounds (codex + claude + gemini + coderabbit)

Loop exit condition reached. Stopping the 5-minute cron.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 2m 4s —— View job


Review of PR #9056f024aa

  • Load tools
  • Read current design document (6f024aa)
  • Verify byte-vs-string coercion fix in §3.2
  • General review pass
  • Post findings

Verification of the b752a89 minor note (6f024aa)

Confirmed fixed. The diff adds the byte-vs-string coercion note inline into the existing "Exact strategy details … are M5b implementation notes" sentence in §3.2 (lines 217–224):

Exact strategy details (counter persistence across nemesis
restarts, collision recovery, byte-vs-string coercion of
`ListRoutes` response fields — `(str (:start route) …)`
silently yields `"[B@…"` if `:start` is a Java byte array
rather than a string, a silent mis-routing footgun flagged
by claude[bot] on b752a894) are M5b implementation notes —
the design contract is: each shuffle's split key is strictly
interior to the route's current range.

The placement is correct: immediately after the fresh-interior-split-key stub that exhibits the pattern, before the Base64 encoding discussion. The failure mode description is accurate — Clojure (str java-byte-array) calls Object.toString(), which returns the Java reference string "[B@…", not the bytes. An M5b implementer who reads §3.2 from top to bottom sees the footgun before they write any code. ✓


Final general review pass

All twelve prior findings (two Gemini R, two Codex P1, three Codex P2, three Claude P2, one Claude P3, one Claude minor) have been correctly addressed across the six commits of this PR. Full verification table:

Finding Status
Gemini R1 — lexicographic split key ✓ §3.2 uses table-route prefix; ASCII direction corrected
Gemini R2 — route ID stale after each split ✓ §3.2 re-queries ListRoutes on every :start
Gemini R3 — setup hook runs on all nodes (when (= node (first (:nodes test))) …) gate
Codex P1 #1 — raw table name vs. base64 encode-dynamo-segment + .withoutPadding documented
Codex P1 #2 — same-group SplitRange can't trigger 2PC ✓ §3.3 rewrites to multi-group cluster startup
Codex P2 + Claude P2 — --raftBootstrapMembers rejected on multi-group ✓ §3.3 item 4 specifies --raftBootstrap (boolean)
Codex P2 #2 — split key reuse becomes no-op ✓ §3.2 distinguishes boundary vs. interior key
Claude P2 #1 — Elle :anomalies is a map, not a seq ✓ §3.4 uses (nil? (get (:anomalies results) :G1c))
Claude P2 #2 — lexicographic direction inverted ! = ASCII 33, / = ASCII 47; corrected
Claude P3 #1 — NG3 contradicted §3.3 ✓ NG3 carves out the multi-table variant and route-shuffle nemesis
Claude P3 — Base64 .withoutPadding silent no-op ✓ §3.2 calls out .withoutPadding and failure mode
Claude minor — §4 scope understated ✓ §4 correctly calls out M5a is "no longer small"
Claude minor (b752a89) — byte-vs-string coercion ✓ §3.2 names the footgun in the M5b deferred list

No new issues found. The document is technically accurate, internally consistent, and grounded in the current codebase.

Ready to merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f024aaa73

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

| Half | Mechanism |
|---|---|
| Multi-table workload | A NEW workload variant `dynamodb-append-multi-table-workload` creates N=4 tables (`jepsen_append_t1` … `jepsen_append_t4`) and writes to ≥2 distinct tables per `TransactWriteItems`. The router maps each table to its own route key so cross-table txns engage multi-route routing. |
| Multi-group cluster | M5a extends `scripts/run-jepsen-local.sh` to launch a multi-group cluster. Per `shard_config.go:61-99` (claude[bot] P2 on 24812867), `--shardRanges` alone is not enough: without `--raftGroups`, every shard range's `groupID` collapses into the single default group 1 and `groupMutations` returns `gids = [1]` for every key — single-shard path, 2PC never fires. M5a must therefore add **five** coordinated launch-script changes:<br><br>1. **`--raftGroups`** declaring ≥2 groups (e.g. `1=127.0.0.1:50051,2=127.0.0.1:50054`).<br>2. **`--shardRanges`** boundary keys placing `t1-t2` in group 1 and `t3-t4` in group 2.<br>3. **Cluster topology decision** — either 6 nodes (3-per-group, production-like consensus) or 2 single-node groups (`--raftBootstrap` per group, faster for CI). Tentative pick: 2 single-node groups for M5a, scale to 6 nodes if/when the workload outgrows it (low cost — flip a flag).<br>4. **Per-node `--raftBootstrap` (boolean) — NOT `--raftBootstrapMembers`.** `main.go:735-741` has a hard guard: `resolveBootstrapServers` returns `ErrBootstrapMembersRequireSingleGroup` whenever `--raftBootstrapMembers` is set and `--raftGroups` parses to more than one group (codex P2 + claude[bot] P2 on 3ca2a7f7). For the tentative 2-single-node-groups topology, each process hosts ONE group with ONE member — single-member groups need no peer discovery, so `--raftBootstrap` is both necessary and sufficient. The existing `--raftBootstrapMembers` in `run-jepsen-local.sh` must be **removed** for multi-group processes; it is only valid for single-group, multi-node formations.<br>5. **Expanded `--raftDynamoMap`** covering both groups' leader addresses, plus the matching port allocations (currently `5005{1,2,3}`/`6380{1,2,3}` for the single-group 3-node layout — need `5005{1,4}`/`6380{1,4}` for the 2-single-node layout, or `5005{1..6}`/`6380{1..6}` for 6-node). |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don’t split single-node groups across processes

In the proposed 2-single-node topology, the plan says each process hosts one group and uses separate 50051/50054 + 63801/63804 endpoints, but the current server config does not support a process that knows about only its own group while routing to the other one: validateShardRanges requires every --shardRanges group ID to appear in that process’s --raftGroups, and buildShardGroups then starts a local Raft runtime for every group listed. With the documented layout, either each per-group process fails validation for ranges owned by the other group, or both processes are configured with both groups and race to bind the same Raft listeners; M5a’s local launcher needs to model the supported shape, e.g. one process hosting both single-member groups with both Raft addresses mapped to its single Dynamo endpoint, or add server support for remote-only groups first.

Useful? React with 👍 / 👎.

…24aa)

Codex P1 on 6f024aa correctly identifies that the proposed
'2 processes, each hosting one group' topology is NOT
supported by the current server:

  * shard_config.go:399-410 (validateShardRanges) requires
    every --shardRanges group ID to appear in that process's
    --raftGroups.  Process A with --raftGroups '1=...' but a
    shardRanges entry pointing to group 2 fails validation.

  * main.go:764 (buildShardGroups) starts a local Raft runtime
    for every group listed in --raftGroups.  Configuring both
    processes with --raftGroups '1=...,2=...' makes both try
    to host both groups and race on the Raft listeners.

Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology —
ONE PROCESS hosting both single-member groups (two Raft
addresses, shared Dynamo endpoint).  Cross-group txns go
through the in-process router and 2PC fires across the two
co-located groups.

Trade-off explicitly documented: because both groups live in
one process, partition/kill nemeses can't separate them.
Only the route-shuffle nemesis (API-level) exercises the
cross-group path meaningfully.  True distributed multi-group
(multi-process) requires server-side support for 'remote-only
groups in --raftGroups' or equivalent — M6+ work, out of M5
scope.

This is a strictly correct narrowing of the topology choice
(removed the unsupported '2 separate processes' and '6 nodes'
options; the only supported shape is now spelled out).  The
M5a Done-when criterion still holds: cluster starts with
tables 1-2 in group 1 and tables 3-4 in group 2, workload
exercises dispatchMultiShardTxn.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as f92a029 — codex P1 on 6f024aa addressed.

codex P1 — proposed topology not supported by the current server. Confirmed against the codebase:

  • shard_config.go:399-410 (validateShardRanges) requires every --shardRanges group ID to appear in that process's --raftGroups.
  • main.go:764 (buildShardGroups) starts a local Raft runtime for every group listed in --raftGroups.

Consequence: my proposed "2 processes, each hosting one group" is rejected — Process A with --raftGroups "1=…" but a --shardRanges entry pointing to group 2 fails validation. Configuring both processes with --raftGroups "1=…,2=…" makes both try to host both groups and race on the Raft listeners.

Fix: §3.3 items 3-5 rewritten for the SUPPORTED topology — one process hosting both single-member groups, two Raft addresses (50051 for group 1, 50054 for group 2), shared Dynamo endpoint (e.g. 63801). Each group is single-member so --raftBootstrap works for both. Cross-group txns go through the in-process router → 2PC fires across the two co-located groups.

Trade-off explicitly documented: because both groups live in one process, partition/kill nemeses can't separate them — only the route-shuffle nemesis (API-level) exercises the cross-group path meaningfully. True distributed multi-group (multi-process) requires server-side support for "remote-only groups in --raftGroups" or equivalent — M6+ work, explicitly out of M5's scope.

This is a strictly correct narrowing of the topology choice (removed the unsupported "2 separate processes" and the speculative "6 nodes" expansion; the only supported shape is now spelled out). The M5a Done-when criterion still holds: cluster starts with tables 1-2 in group 1 and tables 3-4 in group 2, workload exercises dispatchMultiShardTxn.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 9m 13s —— View job


Review of PR #905f92a029

  • Load tools
  • Read current design document (f92a029)
  • Verify codex P1 fix (single-process two-group topology)
  • Cross-check shard_config.go / main.go / multiraft_runtime.go for topology validity
  • General review pass for remaining issues
  • Post findings

Verification of the codex P1 fix (6f024aaf92a029)

Confirmed fixed and correctly grounded. The codex P1 finding was that "two processes, each hosting one group" is NOT supported because validateShardRanges (shard_config.go:399-410) requires every --shardRanges group ID to appear in that process's --raftGroups, and buildShardGroups (main.go:764) starts a Raft runtime for every group in --raftGroups.

The fix — one process hosting both single-member groups — is fully supported by the current server. Tracing through the code:

Claim in §3.3 Source verification
One process can host multiple groups via --raftGroups "1=…,2=…" parseRaftGroups (shard_config.go:61-99) parses both entries; buildShardGroups loops over all groups creating an independent Raft runtime per group via buildRuntimeForGroup (multiraft_runtime.go:228-266)
Each group gets its own Raft address buildRuntimeForGroup:248 passes LocalAddress: group.address to factory.Create() — each group binds its own port
Data directory separation across groups in one process multi = len(groups) > 1 (main.go:588); groupDataDir uses multi to namespace directories per group
--raftBootstrap bootstraps all groups main.go:527bootstrap bool is process-wide and passed to buildShardGroups:769; each single-member group self-elects immediately
--raftBootstrapMembers must be removed resolveBootstrapServers (main.go:735-761) returns ErrBootstrapMembersRequireSingleGroup when len(groups) != 1 — confirmed at main.go:739-740
Each group's gRPC transport listens on its own address main.go:1567-1574 loops over runtimes; each rt.registerGRPC(gs) registers transport, then lc.Listen(ctx, "tcp", rt.spec.address) binds that group's port
2PC fires when txn spans both groups groupMutations (sharded_coordinator.go:821-843) groups by Raft group ID; len(gids) > 1 triggers dispatchMultiShardTxn

All seven claims in §3.3 are verified against the source.


General review pass

All thirteen findings from prior review rounds are still correctly addressed. No prior fix was reverted by f92a029.

Area Status
§3.2 encode-dynamo-segment + .withoutPadding
§3.2 Two split-key constructions (boundary vs interior)
§3.2 Route ID re-query on every :start invocation
§3.2 Byte-vs-string coercion noted in M5b deferred list
§3.3 Multi-table workload + multi-group cluster (five changes)
§3.3 --raftBootstrap not --raftBootstrapMembers
§3.3 Setup hook is read-only verification (no mutation)
§3.3 Limitation acknowledged (partition nemesis can't separate co-located groups)
§3.4 Elle criterion uses (nil? (get (:anomalies results) :G1c))
§4 M5a scope described as "no longer small"
NG3 carve-out for multi-table variant and route-shuffle nemesis
Lexicographic ASCII direction (!=33, /=47)

No new P1 or P2 issues found.


Observation (non-blocking)

The doc implies but doesn't spell out the Jepsen :nodes configuration change. The current run-jepsen-local.sh uses a 3-node layout (5005{1,2,3} / 6380{1,2,3}); M5a drops to a 1-process layout. Jepsen's :nodes list would shrink from 3 entries to 1 (["127.0.0.1"] or equivalent), and the (when (= node (first (:nodes test)))) gate on the setup-hook verification becomes a no-op gate (sole node == first node, always passes). This is an implementation detail the doc can leave to M5a, but a one-line note in OQ-2 or §3.3 would prevent the M5a implementer from discovering mid-PR that Jepsen's node-count configuration also needs updating.


Summary

The codex P1 topology fix is correct, well-grounded in the source, and the design document is now technically accurate across all twelve areas. Ready to merge.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md`:
- Line 386: The milestone description incorrectly states that passing
--shardRanges alone will launch ≥2 Raft groups; update the row so it explicitly
requires both --shardRanges and --raftGroups to be passed when launching (e.g.,
`scripts/run-jepsen-local.sh` should be documented to call with both
--shardRanges and --raftGroups) and mention that omitting --raftGroups collapses
to group 1; modify the text around the `--shardRanges` mention (the scope text
in the milestone row) to require both flags to satisfy the multi-group contract
and ensure the setup hook still calls ListRoutes to verify multi-group routing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddb92f54-4107-4806-8a5e-5df0f5164e6f

📥 Commits

Reviewing files that changed from the base of the PR and between 2481286 and f92a029.

📒 Files selected for processing (1)
  • docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md

Comment thread docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md Outdated
…anges (coderabbit Major on f92a029)

coderabbit Major on f92a029 correctly identifies that §4's
M5a milestone row was inconsistent with §3.3's five-change
contract: the row said the launch extension passes
'--shardRanges so the cluster launches with >=2 Raft groups'
but §3.3 spells out that --shardRanges alone collapses every
range to the single default group 1 (shard_config.go:61-99 —
the claude[bot] P2 finding on 2481286 that drove the §3.3
rewrite in the first place).  Both --raftGroups AND
--shardRanges are required for the multi-group contract.

Fix: §4's M5a Scope cell now explicitly says 'pass BOTH
--raftGroups (declaring >=2 groups) AND --shardRanges (placing
tables across those groups)', and the cell cross-references
§3.3 for the 'shardRanges alone collapses to group 1'
explanation.  Avoids the implementer following §4 in
isolation and missing the --raftGroups requirement.

Doc-only, no semantic changes to the design itself — just
internal consistency between §4 row and §3.3 table.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

Pushed as 58d6ac5 — coderabbit Major on f92a029 addressed.

coderabbit Major — §4 row was inconsistent with §3.3's multi-group contract. The M5a milestone row said the launch extension passes --shardRanges so the cluster launches with ≥2 Raft groups, but §3.3 explicitly states (per the claude[bot] P2 on 2481286 that drove that rewrite) that --shardRanges alone collapses every range to the single default group 1 — both --raftGroups AND --shardRanges are required.

Fix: §4's M5a Scope cell now explicitly says "pass BOTH --raftGroups (declaring ≥2 groups) AND --shardRanges (placing tables across those groups)" and cross-references §3.3 for the "shardRanges alone collapses to group 1" explanation. An implementer reading §4 in isolation will no longer miss the --raftGroups requirement.

Doc-only, no semantic changes — just internal consistency between the §4 row and the §3.3 table.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 55s —— View job


Review of PR #90558d6ac5

  • Read current design document (58d6ac5)
  • Verify coderabbit Major fix (§4 row now requires both --raftGroups AND --shardRanges)
  • Verify all prior findings remain correctly addressed
  • General review pass for new issues
  • Post findings

Verification of the coderabbit Major fix (f92a02958d6ac5)

Confirmed fixed. Line 386's M5a Scope cell now reads (bold in original):

"scripts/run-jepsen-local.sh extended to pass BOTH --raftGroups (declaring ≥2 groups) AND --shardRanges (placing tables across those groups)" — per §3.3, --shardRanges alone collapses every range to the single default group 1 and 2PC never fires; both flags are required for the multi-group contract.

The fix correctly:

  1. Bolds "BOTH" + the two flag names to make the requirement visually prominent
  2. Cross-references §3.3's authoritative explanation ("--shardRanges alone collapses every range to the single default group 1")
  3. Ends with a declarative statement: "both flags are required for the multi-group contract"

The §4 milestone row and the §3.3 five-change table are now fully consistent. ✓


Verification of all prior findings (full audit)

Finding Still addressed?
Gemini R1 — split key encoding (table-route prefix, base64) ✓ §3.2 uses !ddb|route|table|<encode-dynamo-segment(...)>
Gemini R2 — route ID re-query per :start invocation ✓ §3.2 "re-queries ListRoutes…on every :start invocation"
Gemini R3 — setup hook first-node gate ✓ §3.3 (when (= node (first (:nodes test))) …)
Codex P1 #1encodeDynamoSegment base64 RawURLEncoding ✓ §3.2 encode-dynamo-segment helper + .withoutPadding
Codex P1 #2 — same-group SplitRange can't trigger dispatchMultiShardTxn ✓ §3.3 multi-group cluster startup strategy
Codex P2 + Claude P2 — --raftBootstrapMembers rejected on multi-group ✓ §3.3 item 4: --raftBootstrap (boolean), not --raftBootstrapMembers
Codex P2 #2 — split key reuse becomes no-op at boundary ✓ §3.2 two-construction strategy (boundary vs. interior)
Claude P2 #1 — Elle :anomalies is a map, not a seq ✓ §3.4 (nil? (get (:anomalies results) :G1c))
Claude P2 #2 — lexicographic direction inverted ✓ §3.2 "/ is ASCII 47…workload keys on the left shard"
Claude P3 #1 — NG3 contradicted §3.3 ✓ §1.2 NG3 carves out multi-table variant and route-shuffle nemesis
Claude P3 — Base64 .withoutPadding silent no-op ✓ §3.2 names failure mode + .withoutPadding requirement
Claude minor — §4 scope understated ✓ §4 "no longer 'small'" with full scope list
Claude minor (b752a89) — byte-vs-string coercion footgun ✓ §3.2 M5b deferred list explicitly names the footgun
Coderabbit MD027/MD040 — blockquote spacing + code fence languages ✓ (673bee8)
Coderabbit Major (f92a029) — §4 row missing --raftGroups (58d6ac5 — this commit)

All fifteen findings correctly addressed. No prior fix has been reverted.


General review pass

No new issues found. The document is technically accurate, internally consistent, and all claims are grounded in the source (shard_config.go, main.go, sharded_coordinator.go, distribution_server.go).

Ready to merge.

@bootjp bootjp merged commit 2a36826 into main Jun 2, 2026
9 checks passed
@bootjp bootjp deleted the feat/composed1-m5-design branch June 2, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant