Skip to content

feat(server): Journal omits#7060

Merged
dranikpg merged 4 commits into
dragonflydb:mainfrom
dranikpg:new-journal-omits
May 20, 2026
Merged

feat(server): Journal omits#7060
dranikpg merged 4 commits into
dragonflydb:mainfrom
dranikpg:new-journal-omits

Conversation

@dranikpg
Copy link
Copy Markdown
Contributor

@dranikpg dranikpg commented Apr 3, 2026

Experimental journal omits

Implements DbSlice::IsOmittableWrite that can skip the CallOnChangeCallbacks + version update and has to notify the command to omit the journal write.

This is safe if we have one eventually consistent snapshot (replica, migration) and the bucket was not visited yet. It will be fully serialized once the iteration loop reaches it

image

@dranikpg dranikpg force-pushed the new-journal-omits branch 2 times, most recently from 751383e to cda32b8 Compare April 8, 2026 16:15
@dranikpg dranikpg force-pushed the new-journal-omits branch from cda32b8 to ffbdbf5 Compare April 16, 2026 07:50
Comment thread src/server/db_slice.h Outdated
@@ -631,6 +654,9 @@ class DbSlice {
// Record whenever a key expired to DbTable::expired_keys_events_ for keyspace notifications
bool expired_keys_events_recording_ = true;

// Provided for the next FindMutableInternal operation
MutationHints* mutation_hints_ = nullptr;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the size of MutationHints ? why do you need to pass by pointer and not by value?
looks fragile

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is also used to get output variables

Comment thread src/server/journal/journal.cc Outdated
@@ -47,8 +47,8 @@ error_code Close() {
return {};
}

bool HasRegisteredCallbacks() {
return journal_slice.HasRegisteredCallbacks();
unsigned CallbackNumber() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CallbackNumber is not a great name.

Comment thread src/server/string_family.cc Outdated
@@ -836,10 +837,15 @@ OpStatus SetCmd::Set(const SetParams& params, string_view key, string_view value
}
}

DbSlice::MutationHints hints{.hint{.single_key = true, .support_omit = explicit_journal_}};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do it more elegantly?

  1. at least in this example you pass hints with .single_key = true and set support_omit then why passing them at all? why not pass a final boolean true if the caller supports omit
  2. the boolean could be added to op_args_.db_cntx, maybe so you won't need to inject this before the find call.
  3. skip_journal_ can be returned by AddOrFind inside op_res.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's move it to existing structures

@dranikpg
Copy link
Copy Markdown
Contributor Author

dranikpg commented May 3, 2026

Will rebase to include #7253

@dranikpg dranikpg force-pushed the new-journal-omits branch from 056e6e1 to 18b5818 Compare May 18, 2026 14:02
more small fixes

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg force-pushed the new-journal-omits branch from 18b5818 to 47434d4 Compare May 19, 2026 09:34
@dranikpg
Copy link
Copy Markdown
Contributor Author

augment review

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented May 19, 2026

🤖 Augment PR Summary

Summary: Adds an experimental “journal omit” optimization to avoid emitting redundant journal entries during replica full-sync.

Changes:

  • Introduce DbContext::is_omittable_operation and propagate an omitted_journal result via DbSlice::ItAndUpdater.
  • Implement DbSlice::IsOmittableWrite to skip change-callback invocation and bucket version bumps when the snapshot hasn’t reached the affected bucket(s).
  • Plumb an “eventually consistent” flag through SerializerBase::RegisterChangeListener(bool) to gate the optimization to replica-style snapshots.
  • Expose journal consumer count via journal::GetCallbackCount() and add metrics (total_journal_omits) plus replication tests to validate omits occur.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/server/serializer_base.cc Outdated
@@ -152,7 +153,7 @@ void SerializerBase::UnregisterChangeListener() {
bool SerializerBase::ProcessBucket(DbIndex db_index, PrimeTable::bucket_iterator it,
bool on_update) {
// Check if this bucket is stale
if (it.is_done() || it.GetVersion() >= snapshot_version_) {
if (it.GetVersion() >= snapshot_version_) {
Copy link
Copy Markdown

@augmentcode augmentcode Bot May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SerializerBase::ProcessBucket, the logic inside the stale-bucket branch still contains an it.GetVersion() < snapshot_version_ check that can never be true once the outer condition is it.GetVersion() >= snapshot_version_. This looks like leftover logic from the earlier it.is_done() condition and may hide an intended version-stamping behavior for empty buckets.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@@ -836,9 +837,13 @@ OpStatus SetCmd::Set(const SetParams& params, string_view key, string_view value
}
}

// Enable journal omits for this operation
op_args_.db_cntx.is_omittable_operation = true;
Copy link
Copy Markdown

@augmentcode augmentcode Bot May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip_journal_ is only set via the AddOrFind path; the SET XX early-return path goes through FindMutable/SetExisting and never consults omitted_journal, so the new omit optimization won’t trigger for XX updates during full sync. If SET ... XX is used during replica full sync, this can still produce redundant/double-applied updates compared to the main AddOrFind path.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@dranikpg dranikpg force-pushed the new-journal-omits branch from 9e8c13b to 6a5510a Compare May 19, 2026 10:12
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg force-pushed the new-journal-omits branch from 6a5510a to fec262c Compare May 19, 2026 10:19
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg force-pushed the new-journal-omits branch from ba7f5b9 to 63148bc Compare May 19, 2026 17:38
@dranikpg dranikpg requested a review from romange May 19, 2026 18:23
@dranikpg dranikpg marked this pull request as ready for review May 19, 2026 18:23
@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Implement journal omit optimization for eventually-consistent replicas

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Implements journal omit optimization for eventually-consistent replicas
• Skips journal writes and version updates when safe during replication
• Adds IsOmittableWrite() to detect omittable operations based on snapshot state
• Tracks journal omit events in metrics and updates replication test coverage
Diagram
flowchart LR
  A["Write Operation"] --> B{"is_omittable_operation<br/>set?"}
  B -->|Yes| C["Check IsOmittableWrite"]
  C --> D{"Single replica<br/>+ not visited yet?"}
  D -->|Yes| E["Skip journal write<br/>Skip version update"]
  D -->|No| F["Normal write path"]
  E --> G["Mark omitted_journal"]
  F --> H["Record journal entry"]
  G --> I["Increment journal_omit counter"]
  H --> I
Loading

Grey Divider

File Changes

1. src/server/db_slice.h ✨ Enhancement +11/-0

Add journal omit tracking and omittable operation flag

src/server/db_slice.h


2. src/server/db_slice.cc ✨ Enhancement +49/-8

Implement IsOmittableWrite logic and omit journal writes

src/server/db_slice.cc


3. src/server/string_family.cc ✨ Enhancement +11/-2

Enable journal omits for SET command operations

src/server/string_family.cc


View more (13)
4. src/server/generic_family.cc ✨ Enhancement +1/-1

Update destructuring to handle omitted_journal field

src/server/generic_family.cc


5. src/server/serializer_base.h ✨ Enhancement +1/-1

Add replica parameter to RegisterChangeListener

src/server/serializer_base.h


6. src/server/serializer_base.cc ✨ Enhancement +4/-2

Track eventually_consistent flag for snapshot callbacks

src/server/serializer_base.cc


7. src/server/serializer_base_test.cc 🧪 Tests +1/-1

Update test to pass replica parameter

src/server/serializer_base_test.cc


8. src/server/snapshot.cc ✨ Enhancement +1/-1

Pass stream_journal flag to RegisterChangeListener

src/server/snapshot.cc


9. src/server/journal/journal.h ✨ Enhancement +4/-1

Replace HasRegisteredCallbacks with GetCallbackCount

src/server/journal/journal.h


10. src/server/journal/journal.cc ✨ Enhancement +2/-2

Implement GetCallbackCount to return consumer count

src/server/journal/journal.cc


11. src/server/journal/journal_slice.h ✨ Enhancement +2/-2

Rename method to OnChangeCbCount for clarity

src/server/journal/journal_slice.h


12. src/server/journal/streamer.cc ✨ Enhancement +1/-1

Pass true flag to RegisterChangeListener for replicas

src/server/journal/streamer.cc


13. src/server/tx_base.h ✨ Enhancement +5/-0

Add is_omittable_operation flag to DbContext

src/server/tx_base.h


14. src/server/server_family.cc ✨ Enhancement +2/-0

Add total_journal_omits metric to info output

src/server/server_family.cc


15. tests/dragonfly/replication_test.py 🧪 Tests +10/-2

Add stress test and verify journal omit metrics

tests/dragonfly/replication_test.py


16. tests/dragonfly/seeder/script-genlib.lua 🧪 Tests +7/-3

Increase SET operation frequency in string modifications

tests/dragonfly/seeder/script-genlib.lua


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 19, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Missing <algorithm> include 🐞 Bug ⚙ Maintainability
Description
DbSlice::IsOmittableWrite uses std::ranges::max, but db_slice.cc does not include <algorithm> (the
standard header that declares ranges algorithms), relying on transitive includes and risking build
failures on some toolchains.
Code

src/server/db_slice.cc[R1967-1970]

+  auto cb2 = [cb1](const PrimeTable::BucketSet& bs) -> uint64_t {
+    DCHECK(!std::ranges::empty(bs.buckets()));
+    return std::ranges::max(bs.buckets(), {}, cb1).GetVersion();
+  };
Evidence
The implementation explicitly calls std::ranges::max, while the file’s includes (as modified) add
` but not `, which is the standard header for ranges algorithms.

src/server/db_slice.cc[5-12]
src/server/db_slice.cc[1965-1970]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/server/db_slice.cc` calls `std::ranges::max(...)` inside `DbSlice::IsOmittableWrite`, but the file only adds `#include <ranges>` and does not include `<algorithm>`. `std::ranges::max` is declared in `<algorithm>`; relying on transitive includes can break builds depending on the standard library / compiler configuration.
## Issue Context
This PR introduces `DbSlice::IsOmittableWrite` and uses `std::ranges::max` over a buckets view.
## Fix Focus Areas
- src/server/db_slice.cc[4-12]
- src/server/db_slice.cc[1965-1970]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Omit requires backlog invalidation ✓ Resolved 🐞 Bug ☼ Reliability
Description
The PR introduces a generic ItAndUpdater::omitted_journal signal, but the critical safety step for
partial-sync correctness (clearing the journal ring buffer) is implemented only in SET; future uses
of omitted_journal can easily omit journal writes without invalidating the partial-sync backlog,
making partial sync incorrect.
Code

src/server/string_family.cc[R943-947]

+    // Skipping a journal write breaks the LSN buffer, so clear it
+    if (skip_journal_)
+      journal::ClearBuffer();
+    else
+      RecordJournal(params, key, value);
Evidence
journal::ClearBuffer documents it must advance LSN to prevent the partial-sync fast path from
succeeding with an empty/invalid buffer. The PR’s omit signal is generic (omitted_journal on
ItAndUpdater), but the backlog invalidation is only performed in SET’s explicit journaling path,
so the invariant is easy to violate when other commands start consuming omitted_journal.

src/server/journal/journal.cc[37-44]
src/server/db_slice.h[269-277]
src/server/db_slice.cc[676-699]
src/server/string_family.cc[942-948]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Omitting a journal write requires invalidating the partial-sync backlog. Today this is enforced only in `SetCmd::PostEdit` via `journal::ClearBuffer()`, but the PR exposes omission via the generic `DbSlice::ItAndUpdater::omitted_journal` flag. This makes it easy for future commands to adopt the omit optimization and skip journaling without also clearing the backlog, which the codebase explicitly documents as necessary to keep partial sync safe.
## Issue Context
- `journal::ClearBuffer()` exists specifically to prevent `DflyCmd::IsLSNInPartialSyncBuffer` from allowing a reconnecting replica to partial-sync from an LSN that no longer corresponds to a complete backlog.
- The omission decision is produced in `DbSlice::AddOrFindInternal` and surfaced via `ItAndUpdater::omitted_journal`.
## Fix Focus Areas
- src/server/db_slice.h[269-277]
- src/server/db_slice.cc[676-699]
- src/server/string_family.cc[942-948]
- src/server/journal/journal.cc[37-44]
## Expected fix direction
Introduce a single shared mechanism to pair "omit journal entry" with "invalidate partial-sync backlog" (e.g., a helper/token returned alongside `omitted_journal`, or a dedicated API like `journal::InvalidatePartialSyncBacklog()` that all omit-capable commands must call).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Comment thread src/server/db_slice.cc
Comment on lines +1967 to +1970
auto cb2 = [cb1](const PrimeTable::BucketSet& bs) -> uint64_t {
DCHECK(!std::ranges::empty(bs.buckets()));
return std::ranges::max(bs.buckets(), {}, cb1).GetVersion();
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Missing include 🐞 Bug ⚙ Maintainability

DbSlice::IsOmittableWrite uses std::ranges::max, but db_slice.cc does not include <algorithm> (the
standard header that declares ranges algorithms), relying on transitive includes and risking build
failures on some toolchains.
Agent Prompt
## Issue description
`src/server/db_slice.cc` calls `std::ranges::max(...)` inside `DbSlice::IsOmittableWrite`, but the file only adds `#include <ranges>` and does not include `<algorithm>`. `std::ranges::max` is declared in `<algorithm>`; relying on transitive includes can break builds depending on the standard library / compiler configuration.

## Issue Context
This PR introduces `DbSlice::IsOmittableWrite` and uses `std::ranges::max` over a buckets view.

## Fix Focus Areas
- src/server/db_slice.cc[4-12]
- src/server/db_slice.cc[1965-1970]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment thread src/server/serializer_base.cc Outdated
@@ -138,9 +138,10 @@ SerializerBase::~SerializerBase() {
// same fiber, so the baseline is serialized first. big_value_mu_ prevents this callback path
// from interleaving with the traversal fiber's bucket serialization, which may preempt while
// emitting large values.
void SerializerBase::RegisterChangeListener() {
void SerializerBase::RegisterChangeListener(bool replica) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain replica argument?

Comment thread src/server/string_family.cc Outdated
@@ -935,7 +940,11 @@ void SetCmd::PostEdit(const SetParams& params, std::string_view key, std::string
}

if (explicit_journal_ && op_args_.shard->journal()) {
RecordJournal(params, key, value);
// Skipping a journal write breaks the LSN buffer, so clear it
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please expand? what do you mean it breaks the LSN buffer?

@@ -935,7 +940,13 @@ void SetCmd::PostEdit(const SetParams& params, std::string_view key, std::string
}

if (explicit_journal_ && op_args_.shard->journal()) {
RecordJournal(params, key, value);
// A skipped journal write is not added to the journal ring buffer, so it goes unnoticed for
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's still hard to understand the exact scenario of when it happens as the main scenario of snapshotting is unrelated to psync.

up to you.

@dranikpg dranikpg merged commit 05b1660 into dragonflydb:main May 20, 2026
13 checks passed
@dranikpg dranikpg deleted the new-journal-omits branch May 20, 2026 09:35
@romange
Copy link
Copy Markdown
Collaborator

romange commented May 20, 2026

@dranikpg one thing I forgot to add - can we add a runtime flag to be able to revert the feature if needed?

@dranikpg
Copy link
Copy Markdown
Contributor Author

will add it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants