Skip to content

fix(redis): finish option-2 dedup HLC-4 / ctx-parent parity#901

Merged
bootjp merged 2 commits into
mainfrom
feat/dedup-cleanup-ctx-nextfenced
Jun 2, 2026
Merged

fix(redis): finish option-2 dedup HLC-4 / ctx-parent parity#901
bootjp merged 2 commits into
mainfrom
feat/dedup-cleanup-ctx-nextfenced

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 1, 2026

Summary

Picks up the two follow-up items noted in claude[bot]'s PR #887 verdict but deferred from that PR:

(a) prepareDispatch() ctx-parent symmetric fix

prepareDispatch() took its bounded ctx from t.server.handlerContext() — the server-lifetime ctx, not the caller's dispatchCtx. The reuseCtx fix on PR #887 round 1 corrected runTransactionWithDedup to derive from dispatchCtx; this PR applies the symmetric fix to prepareDispatch so a disconnected client / retryRedisWrite timeout interrupts the prepare+dispatch promptly.

Nil-guard falls back to handlerContext for test fixtures that construct a txnContext without setting ctx.

(b) Last two Clock().Next() callers in adapter/redis.go → NextFenced()

dispatchListPushReuse and the listPushCoreWithDedup first-attempt site both shipped before the HLC-4 physical-ceiling migration and were missed in that pass. dispatchExecReuse already uses NextFenced (PR #887 round 1) and so do all first-attempt commit_ts allocations through prepareDispatch.

These two list-push dedup sites were the last persistence-grade Clock().Next() callers in adapter/redis.go — verified by grep '\.Clock()\.Next\b' adapter/redis.go now returning only NextFenced variants.

Without HLC-4 parity here, a stale-leader window could let the list-push dedup path mint a commit_ts that collides with the successor's freshly-fenced range — defeating the same anomaly class option-2 was introduced to prevent.

Caller audit (per /loop semantic-change rule)

  • prepareDispatch: sole callers are commit() and firstExecAttempt (both in adapter/redis.go). Ctx-parent change is additive; same defer-cancel discipline, same error mapping.
  • dispatchListPushReuse: sole caller listPushCoreWithDedup. NextFenced error wired through existing (newLen, drop, err) tuple; non-retryable so retryRedisWrite surfaces it directly — matches PR feat(txn-dedup): re-land M3 EXEC reuse on main + close M2 open + multi-mop #887 round-1's dispatchExecReuse NextFenced wiring.
  • listPushCoreWithDedup first-attempt: surfaces NextFenced error through existing retryRedisWrite closure return.

Validation

  • go test ./adapter/ -run 'ListPushDedup|ExecDedup|StandaloneSetDedup' -race -count=2 → ok 1.2s
  • gofmt + go vet + golangci-lint run → 0 issues

Scope

Closes the two follow-up items explicitly noted in PR #887's round-2 verdict ("Note on pre-existing Clock().Next() callers" + prepareDispatch ctx deviation). No new functionality.

Picks up the two follow-up items noted in claude[bot]'s PR #887
verdict but deferred from that PR:

  (a) prepareDispatch() takes its bounded ctx from t.server.
      handlerContext() — the server-lifetime ctx, NOT the caller's
      dispatchCtx. The reuseCtx fix on PR #887 round 1 corrected
      runTransactionWithDedup to derive from dispatchCtx; this
      commit applies the symmetric fix to prepareDispatch so a
      disconnected client / retryRedisWrite timeout interrupts the
      prepare+dispatch promptly. Nil-guard falls back to
      handlerContext for test fixtures that construct a txnContext
      without setting ctx.

  (b) dispatchListPushReuse and the listPushCoreWithDedup first-
      attempt site both shipped before the HLC-4 physical-ceiling
      migration and were missed in that pass. dispatchExecReuse
      already uses NextFenced (PR #887 round 1) and so do all
      first-attempt commit_ts allocations through prepareDispatch
      and the post-raise SetQueueAttributes etc. These two
      list-push dedup sites were the last persistence-grade
      Clock().Next() callers in adapter/redis.go — verified by
      grep '\.Clock()\.Next\b' adapter/redis.go now returning
      only the NextFenced variants.

Without HLC-4 parity here, a stale-leader window could let the
list-push dedup path mint a commit_ts that collides with the
successor's freshly-fenced range — defeating the same anomaly
class option-2 was introduced to prevent.

Caller audit per /loop semantic-change rule
============================================

  - prepareDispatch: sole caller is commit() (in this file) and
    firstExecAttempt (in this file). Both honor defer
    prepared.cancel() and check len(prepared.elems) == 0; ctx-
    parent change is additive — same defer-cancel discipline,
    same error mapping. No external callers.

  - dispatchListPushReuse: sole caller is listPushCoreWithDedup
    (in this file). NextFenced error wired through the existing
    (newLen, drop, err) tuple position; the new error is non-
    retryable so retryRedisWrite surfaces it directly to the
    caller — same shape as PR #887 round-1's dispatchExecReuse
    NextFenced wiring.

  - listPushCoreWithDedup first-attempt site: surfaces NextFenced
    error through the existing retryRedisWrite closure return,
    matching the pattern in prepareDispatch.

Validation
==========

  go test ./adapter/ -run 'ListPushDedup|ExecDedup|StandaloneSetDedup' -race -count=2 -> ok 1.2s
  gofmt + go vet + golangci-lint run -> 0 issues

This PR closes the explicitly-noted follow-up from PR #887's
"Note on pre-existing Clock().Next() callers" + the
prepareDispatch ctx-parent deviation the round-1 review marked
for symmetric treatment. No new functionality.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 26 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e6920407-496f-4124-8a55-f347a6be24a0

📥 Commits

Reviewing files that changed from the base of the PR and between fe9e941 and 216ca94.

📒 Files selected for processing (1)
  • adapter/redis.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dedup-cleanup-ctx-nextfenced

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates adapter/redis.go to improve context cancellation and ensure HLC-4 parity. Specifically, prepareDispatch now derives its timeout-bounded context from the transaction context (t.ctx) instead of the server-lifetime handler context, allowing prompt interruption on client disconnects. Additionally, both dispatchListPushReuse and listPushCoreWithDedup now utilize NextFenced() instead of Next() for commit timestamp allocation to honor the physical-ceiling fence and prevent timestamp collisions. There are no review comments, and I have no additional feedback to provide.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

PR-α: option-2 dedup HLC-4 / ctx-parent parity. Follow-up to PR #887 review's noted items.

gemini-code-assist already reviewed: "There are no review comments, and I have no additional feedback to provide." — no findings.

Verify: (a) prepareDispatch() ctx parent now derives from t.ctx (symmetric with reuseCtx fix on PR #887 round 1); (b) the last two persistence-grade Clock().Next() callers in adapter/redis.go now use NextFenced. grep '\.Clock()\.Next\b' adapter/redis.go returns only NextFenced variants now.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 22s —— View job


Code Review — PR #901

  • Read changed file (adapter/redis.go)
  • Verify (a): prepareDispatch() ctx parent derives from t.ctx
  • Verify (b): no persistence-grade Clock().Next() callers remain (all NextFenced)
  • Full review pass (data loss, concurrency, performance, consistency, test coverage)

Verification Results

Claim (a) — prepareDispatch() ctx parent:

Confirmed. The diff replaces:

ctx, cancel := context.WithTimeout(t.server.handlerContext(), redisDispatchTimeout)

with:

parentCtx := t.ctx
if parentCtx == nil {
    parentCtx = t.server.handlerContext()
}
ctx, cancel := context.WithTimeout(parentCtx, redisDispatchTimeout)

Both production callers always set txn.ctxrunTransaction at line 3129 and firstExecAttempt at line 3330 both assign ctx: dispatchCtx. The nil-guard exists for the test fixtures in redis_txn_test.go (lines 37–44, 73–79, 108–…, 146–…) which omit ctx — confirmed none of those call prepareDispatch() or commit() so the guard is purely defensive. The cancel() call on the buildStreamDeletionElems error path at line 2799 is present and correct.

Claim (b) — Clock().Next() audit:

grep '\.Clock()\.Next\b' adapter/redis.go  →  (empty)
grep '\.Clock()\.Next' adapter/redis.go

All 6 occurrences:

Line Site
2768 prepareDispatch (pre-existing)
3210 dispatchExecReuse (pre-existing, PR #887 R1)
3549 another pre-existing site
3625 dispatchListPushReusenewly migrated
3801 listPushCoreWithDedup first-attempt ← newly migrated
4025 pre-existing site

Zero unmitigated Next() calls remain. ✅


5-Lens Review

1. Data Loss

No issues. context.WithTimeout(parentCtx, redisDispatchTimeout) still applies the same deadline cap — what changes is that a cancelled parentCtx truncates the budget further (intended). The cancel() is called on the error path at line 2799 before returning, and deferred via prepared.cancel in commit() on the success path. No resource leak. The NextFenced() allocation in dispatchListPushReuse and listPushCoreWithDedup returns (0, ErrCeilingExpired) on fence violation — correctly propagated, no silent discard.

2. Concurrency / Distributed Failures

ErrCeilingExpired is non-retryable: isRetryableRedisTxnErr (redis_retry.go:43) only retries on ErrWriteConflict and ErrTxnLocked. So a fenced-out allocation exits retryRedisWrite immediately — no spin. The drop=false return from dispatchListPushReuse on allocErr != nil (line 3627) is harmless: pending is not cleared but the error is non-retryable so the loop exits without a subsequent reuse iteration. Consistent with the dispatchExecReuse pattern.

3. Performance

No hot-path changes. NextFenced() is the same in-memory CAS loop as Next() plus a single atomic ceiling read — the added cost is negligible.

4. Data Consistency

Core correctness: previously, dispatchListPushReuse and the listPushCoreWithDedup first-attempt site called Next() (unfenced), which could issue a commit_ts inside a stale leader's window after ceiling expiry. A successor leader with a freshly-fenced ceiling could then re-issue the same physical-ms range, defeating the option-2 dedup guarantee (PrevCommitTS probe). NextFenced() fails closed instead, maintaining the invariant that no two leaders can simultaneously produce commitTs in the same physical window. ✅

5. Test Coverage

The existing ListPushDedup and ExecDedup test suites cover the success and WriteConflict retry paths. Two minor gaps worth noting (non-blocking):

  • No test exercises the ErrCeilingExpired error branch at the adapter level for the two new NextFenced() sites. The HLC fence itself is tested in kv/, but the wiring in dispatchListPushReuse and the first-attempt closure is untested at the adapter boundary. Low risk since the error simply escapes the retry loop, but a targeted table-driven case (mock HLC returning ErrCeilingExpired → verify error is surfaced, pending state is consistent) would lock down the wiring.
  • Inline comments reference PR numbers (PR #887 round 1, PR #887 verdict) — per project conventions, comments should avoid issue/PR references and state the underlying reason instead. Cosmetic only.

Summary

Both stated claims are verified. The ctx-parent fix is symmetric and correct; the two NextFenced migrations complete the HLC-4 sweep in adapter/redis.go. No data loss, correctness, or concurrency issues found. The two observations above are non-blocking. LGTM.

claude[bot] PR #901 minor #2 (CLAUDE.md convention): inline comments
should state the underlying reason, not reference issue/PR numbers
that rot as the codebase evolves. Dropped "PR #887" / "PR #887
round 1" / "PR #887 verdict" / "PR #887 review" attributions from
the three doc blocks. The WHY content is kept (ctx-parent symmetry,
HLC-4 fence rationale, ErrCeilingExpired non-retryable shape).

The PR description is the durable home for the flake-fix lineage
and the related PR refs.

No behavior change. Comment-only.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

Round 1 cleanup (commit 216ca94):

  • claude[bot] minor Update dependency go to v1.21.4 #2 (CLAUDE.md convention): stripped PR #887 round 1 / PR #887 verdict / PR #887 review PR-process attribution refs from the three doc blocks (prepareDispatch ctx parent, runTransactionWithDedup reuseCtx, dispatchListPushReuse HLC-4 fence). The WHY rationale is kept; the PR lineage moves to the PR description.

claude[bot] minor #1 (ErrCeilingExpired adapter-level test gap): explicitly out of scope per the verdict ("Low risk", "would lock down the wiring"). Left for follow-up.

Comment-only change. gofmt + golangci-lint run 0 issues.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 55s —— View job


Code Review — PR #901 Round 1 Cleanup (commit 216ca94)

  • Read adapter/redis.go — verify PR-process refs stripped, WHY rationale preserved
  • Review the three affected doc blocks
  • Confirm no bare Clock().Next() persistence-grade calls remain

Verification

No PR #887 refs remain in adapter/redis.gogrep 'PR #887' adapter/redis.go returns empty. ✅


Three Doc Block Review

Block 1 — prepareDispatch() ctx parent (line ~2784)

Removed: "same rationale as runTransactionWithDedup's reuseCtx fix on PR #887"

Replaced with: "Symmetric with the reuseCtx threading in runTransactionWithDedup"

Correct. The WHY (symmetry with reuseCtx) is fully preserved; the PR lineage is dropped. The replacement is an improvement — runTransactionWithDedup is a stable function reference, not a PR number. ✅

Block 2 — runTransactionWithDedup reuseCtx (line ~3282)

Removed: "— see PR #887 review." (trailing phrase)

The surrounding comment already explains the rationale (disconnected client, wasted resources). Dropping the dangling see PR #887 review. is a clean excision with no content loss. ✅

Block 3 — dispatchListPushReuse HLC-4 fence (line ~3618)

Removed: "dispatchExecReuse already uses NextFenced (PR #887 round 1); the two listPush dedup sites shipped before the HLC-4 migration and were missed in that pass — see PR #887 verdict for the noted gap."

Replaced with: "The error path returns ErrCeilingExpired which isRetryableRedisTxnErr classifies as non-retryable, so it exits retryRedisWrite directly to the client — same shape as the other persistence-grade Next call sites in this file."

This is the most substantive change, and it's an improvement. The replacement adds concrete error-path behavior (the isRetryableRedisTxnErr non-retryable classification and retryRedisWrite exit path) which is more durable and useful than the PR history it replaces. ✅


Ancillary Check

Pre-existing (PR #867 Phase 2b) reference at line 4021 was not touched — out of scope for this PR's stated items (only PR #887 refs were on the cleanup list). Minor cosmetic note for a future pass, non-blocking.

grep '\.Clock()\.Next()' adapter/redis.go → empty (all persistence-grade sites use NextFenced). ✅


Summary

Comment-only change is correct and well-scoped. All three doc blocks have PR-process refs removed; the WHY rationale is fully preserved in each case, and Block 3's replacement adds technically precise detail. No behavior change. LGTM.

@bootjp bootjp merged commit c40e0c9 into main Jun 2, 2026
9 checks passed
@bootjp bootjp deleted the feat/dedup-cleanup-ctx-nextfenced branch June 2, 2026 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant