Skip to content

HDDS-13906. Reduce Bootstrap Write lock time on OM during bootstrapping execution.#9585

Merged
jojochuang merged 15 commits into
apache:masterfrom
sadanand48:HDDS-13906
Feb 12, 2026
Merged

HDDS-13906. Reduce Bootstrap Write lock time on OM during bootstrapping execution.#9585
jojochuang merged 15 commits into
apache:masterfrom
sadanand48:HDDS-13906

Conversation

@sadanand48

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Because of fixes in HDDS-13905 an entire BG service task would be blocked because of bootstrap operation running on a leader OM. One possible improvement here would be to just to create a hardlink(which we already do to ensure this file doesn't get deleted by rocksdb operations but we also write the file into the Tarball stream synchronously

public static long linkAndIncludeFile(File file, String entryName,
ArchiveOutputStream<TarArchiveEntry> archiveOutput, Path tmpDir) throws IOException {
File link = tmpDir.resolve(entryName).toFile();
long bytes = 0;
try {
Files.createLink(link.toPath(), file.toPath());
TarArchiveEntry entry = archiveOutput.createArchiveEntry(link, entryName);
archiveOutput.putArchiveEntry(entry);
try (InputStream input = Files.newInputStream(link.toPath())) {
bytes = IOUtils.copyLarge(input, archiveOutput);
}
archiveOutput.closeArchiveEntry();
} catch (IOException ioe) {
LOG.error("Couldn't create hardlink for file {} while including it in tarball.",
file.getAbsolutePath(), ioe);
throw ioe;
} finally {
Files.deleteIfExists(link.toPath());
}
return bytes;
}
) to the file to be copied into tar outputStream into a tmp directory under the bootstrap lock and write all the entries corresponding to the link created outside the lock(The entry should also include the hardLinkFile created in the last batch).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13906

How was this patch tested?

Will add tests

@swamirishi swamirishi self-requested a review January 5, 2026 14:11

@swamirishi swamirishi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @sadanand48 . Have a few doubts with the implementation

@sadanand48 sadanand48 added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jan 5, 2026
@swamirishi

Copy link
Copy Markdown
Contributor

@rnblough Do you wanna take a look at this patch?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces the Bootstrap write lock time during OM bootstrapping by refactoring the checkpoint streaming process. The key change is separating the file collection phase (which creates hardlinks while holding the lock) from the actual tar streaming phase (which now happens outside the lock).

Changes:

  • Introduced OMDBArchiver class to manage hardlink creation and deferred tar writing
  • Refactored OMDBCheckpointServletInodeBasedXfer to collect files under lock and write outside lock
  • Updated tests to work with the new two-phase approach

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 15 comments.

File Description
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBArchiver.java New class that manages hardlink creation during lock phase and tar writing outside lock
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java Refactored to use two-phase approach: collectDbDataToTransfer (under lock) and writeToArchive (outside lock)
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOMDBArchiver.java New test class covering OMDBArchiver functionality
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMDbCheckpointServletInodeBasedXfer.java Updated integration tests to work with new method signatures and two-phase approach
Comments suppressed due to low confidence (1)

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java:192

  • Potential resource leak: If an IOException occurs in the second try block (lines 173-182) after successfully creating the OMDBArchiver and collecting data, the hardlinks created in the tmpDir may not be cleaned up properly. The cleanup in the finally block only deletes the tmpDir, but if writeToArchive fails, the hardlink files stored in filesToWriteIntoTarball map won't be explicitly deleted since the cleanup happens in writeToArchive's finally block. Consider adding explicit cleanup of the archiver's resources in the outer finally block.
    try {
      Instant start = Instant.now();
      OutputStream outputStream = response.getOutputStream();
      omdbArchiver.writeToArchive(getConf(), outputStream);
      Instant end = Instant.now();
      long duration = Duration.between(start, end).toMillis();
      LOG.info("Time taken to write the checkpoint to response output " +
          "stream: {} milliseconds", duration);
    } catch (IOException e) {
      LOG.error("unable to write to archive stream", e);
    } finally {
      try {
        if (tmpdir != null) {
          FileUtils.deleteDirectory(tmpdir.toFile());
        }
      } catch (IOException e) {
        LOG.error("unable to delete: " + tmpdir, e.toString());
      }
    }
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jojochuang jojochuang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a github glitch and my comments were not sent out.

IMO we can merge as is. We could refactor the code later but right now I want to unblock this one.

// we finished transferring files from snapshot DB's by now and
// this is the last step where we transfer the active om.db contents
Map<String, String> hardLinkFileMap = new HashMap<>();
omdbArchiver.setHardLinkFileMap(hardLinkFileMap);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line doesn't make much sense to me.... setting an internal reference to an empty hash map.
In fact we can rewrite to get rid of setHardLinkFileMap() method.

it's fine but it's going to need some cleanup to make the logic more clear.

* handling deduplication and managing resource locking.
* Collects the snapshots to be transferred from the specified snapshot directories
* into the archive output stream.
*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*
* @param omdbArchiver helper to archive the OM DB.
* @throws IOException if an I/O error occurs during processing.

*
* @param sstFilesToExclude Set of SST file identifiers to exclude from the archive.
* @param tmpdir Temporary directory for intermediate processing.
* @param snapshotPaths Set of paths to snapshot directories to be processed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param snapshotPaths Set of paths to snapshot directories to be processed.
@VisibleForTesting
void collectSnapshotData(Set<String> sstFilesToExclude, Collection<Path> snapshotPaths,

@jojochuang jojochuang marked this pull request as ready for review February 11, 2026 06:00

@smengcl smengcl left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sadanand48 for the patch.

Comment on lines +185 to 187
} catch (IOException e) {
LOG.error("unable to write to archive stream", e);
} finally {

@smengcl smengcl Feb 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we should return 500 here as well? Other than that we can merge it and address remaining issues later

Suggested change
} catch (IOException e) {
LOG.error("unable to write to archive stream", e);
} finally {
} catch (IOException e) {
LOG.error("unable to write to archive stream", e);
response.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
} finally {

@jojochuang jojochuang merged commit 67c105b into apache:master Feb 12, 2026
56 checks passed
@jojochuang

Copy link
Copy Markdown
Contributor

Merged. Opened https://issues.apache.org/jira/browse/HDDS-14629 to track the follow up tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants