Coordinate BackupAgent and DataDistributor to compute partitions for range-partitioned backup (V3) #13304

Open
akankshamahajan15 wants to merge 3 commits into apple:main from akankshamahajan15:bk_callpartitions

akankshamahajan15 commented May 30, 2026

This PR includes the following changes:

  • Wires BackupAgent → DataDistributor coordination for range-partitioned backup partition computation.
  • Adds two system keys:
    • backupPartitionRequiredKey: a to-do signal (0/1/2) that BackupAgent writes and DD watches.
    • backupPartitionListKey: the partition list (vector) that DD writes after computing partitions.
  • BackupAgent writes backupPartitionRequired = 1 when the first range-partitioned backup starts and backupPartitionRequired = 2 when the last one finishes, piggybacking on the existing backupStartedKey transaction.
  • DataDistributor watches the key. On value 1 it calls calculateBackupPartitionKeyRanges and writes the result to backupPartitionListKey; on value 2 it clears the partition list. In both cases the request key is cleared in the same commit.
  • Adds the knob BACKUP_NUM_OF_PARTITIONS (default 100), which controls how many partitions calculateBackupPartitionKeyRanges produces.
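
The DataDistributor side of this protocol can be sketched as a small state machine. The following is an illustrative in-memory model, not the actor added by this PR: `SystemKeys`, `PartitionRequest`, and `handlePartitionRequest` are hypothetical names, `compute` stands in for calculateBackupPartitionKeyRanges (whose real signature is not shown here), and it assumes that "cleared" corresponds to request value 0.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real KeyRange type: a [begin, end) pair.
using KeyRange = std::pair<std::string, std::string>;

// Assumed meaning of the 0/1/2 request values.
enum class PartitionRequest : int { None = 0, Compute = 1, Clear = 2 };

// In-memory model of the two system keys described above.
struct SystemKeys {
    PartitionRequest partitionRequired = PartitionRequest::None; // backupPartitionRequiredKey
    std::optional<std::vector<KeyRange>> partitionList;          // backupPartitionListKey
};

// Models one wake-up of the DD watcher. The partition list update and the
// reset of the request value happen in the same step, mirroring
// "the request key is cleared in the same commit".
inline void handlePartitionRequest(SystemKeys& keys,
                                   const std::function<std::vector<KeyRange>(int)>& compute,
                                   int numPartitions) {
    switch (keys.partitionRequired) {
    case PartitionRequest::Compute:
        keys.partitionList = compute(numPartitions); // value 1: compute and store
        break;
    case PartitionRequest::Clear:
        keys.partitionList.reset(); // value 2: drop the stored list
        break;
    case PartitionRequest::None:
        return; // nothing pending
    }
    keys.partitionRequired = PartitionRequest::None;
}
```

In this model a caller would pass the value of BACKUP_NUM_OF_PARTITIONS as `numPartitions`; the real actor presumably does the equivalent inside a single transaction.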

Coordination between BackupAgent and DataDistributor

  • backupPartitionRequired is a small to-do key (0/1/2) that DD watches via tr.watch(). It is safe with multiple concurrent backups: BackupAgent writes 1 only when adding a UID to an empty backupStartedKey, and writes 2 only when removing the last UID; the read-then-write happens in the same transaction as the backupStartedKey update.
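
The first/last transition rule on the BackupAgent side can be illustrated with a minimal in-memory sketch. `BackupStartedState`, `startBackup`, and `stopBackup` are hypothetical names, and the model assumes every UID tracked under backupStartedKey belongs to a range-partitioned backup; the real code performs the read and the write inside one FDB transaction.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical model of the state touched by the backupStartedKey transaction.
struct BackupStartedState {
    std::set<std::string> activeUids;     // UIDs recorded under backupStartedKey
    std::optional<int> partitionRequired; // pending backupPartitionRequired value, if any
};

// Register a backup. Signals 1 only on the empty -> non-empty transition.
inline void startBackup(BackupStartedState& s, const std::string& uid) {
    const bool wasEmpty = s.activeUids.empty();
    s.activeUids.insert(uid);
    if (wasEmpty) {
        s.partitionRequired = 1;
    }
}

// Remove a backup. Signals 2 only on the non-empty -> empty transition.
inline void stopBackup(BackupStartedState& s, const std::string& uid) {
    if (s.activeUids.erase(uid) == 0) {
        return; // unknown UID: nothing changes
    }
    if (s.activeUids.empty()) {
        s.partitionRequired = 2;
    }
}
```

Because the emptiness check and the update are modeled as one step, a second concurrent backup never re-triggers partition computation, and stopping one of several backups never clears the list.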

Testing

100K Completed
20260530-022602-ak_bk_dd_2-43582fe4cfa1abb9 compressed=True data_size=37179731 duration=2238936 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:26:37 sanity=False started=100000 stopped=20260530-025239 submitted=20260530-022602 timeout=5400 username=ak_bk_dd_2

akankshamahajan15 requested a review from Copilot May 30, 2026 01:07
akankshamahajan15 added the Backup_v3 Range Partitioned Backup label May 30, 2026
akankshamahajan15 changed the title from "Implementation" to "Coordinate BackupAgent and DataDistributor to compute partitions for range-partitioned backup (V3)" May 30, 2026

Copilot AI left a comment

Pull request overview

Wires BackupAgent ↔ DataDistributor coordination for computing the key-range partitions used by the experimental range-partitioned backup. BackupAgent writes a small request value (1 on first range-partitioned backup start, 2 on last stop) to a new system key, and DataDistributor watches it, calls calculateBackupPartitionKeyRanges, and stores the resulting vector<KeyRange> to a second system key (or clears it on stop).

Changes:

  • Add two \xff\x02/ system keys (backupPartitionRequiredKey, backupPartitionListKey) with encode/decode helpers.
  • Add monitorBackupPartitionRequired actor in DataDistribution.cpp, started alongside the shard tracker so it shares the same KeyRangeMap<ShardTrackedData>.
  • Have StartFullBackupTaskFunc and clearBackupStartID set values 1/2 on first/last range-partitioned backup transitions, piggybacking on the backupStartedKey transaction.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file (file: description):

  • fdbclient/SystemData.cpp: Defines the two new system keys and four encode/decode helpers using Unversioned()/IncludeVersion() writers.
  • fdbclient/include/fdbclient/SystemData.h: Declares the new keys/helpers and documents the 0/1/2 request-value scheme.
  • fdbclient/FileBackupAgent.cpp: Writes request value 1 when starting the first range-partitioned backup and 2 when the last one is cleared.
  • fdbserver/core/include/fdbserver/core/BackupPartitionMap.h: Includes KeyRangeMap.h, switches typedefs to using, and declares calculateBackupPartitionKeyRanges.
  • fdbserver/datadistributor/DataDistribution.cpp: Adds monitorBackupPartitionRequired actor that watches the request key, computes/clears partitions, and writes the partition list.

Comment thread fdbserver/datadistributor/DataDistribution.cpp Outdated
Comment thread fdbclient/FileBackupAgent.cpp
Comment thread fdbclient/FileBackupAgent.cpp
Comment thread fdbclient/SystemData.cpp
@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: b8508c9
  • Duration 0:22:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: b8508c9
  • Duration 0:46:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: b8508c9
  • Duration 0:47:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: b8508c9
  • Duration 0:56:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: b8508c9
  • Duration 1:02:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: b8508c9
  • Duration 1:04:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: b8508c9
  • Duration 1:15:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: c0fa69d
  • Duration 0:05:28
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: c0fa69d
  • Duration 0:05:38
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: c0fa69d
  • Duration 0:21:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: c0fa69d
  • Duration 0:46:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: c0fa69d
  • Duration 0:48:17
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: c0fa69d
  • Duration 0:52:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: c0fa69d
  • Duration 1:03:58
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: c0fa69d
  • Duration 0:04:32
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Labels

Backup_v3 Range Partitioned Backup
