Skip to content

Implement Directory and IndexInput layers for WritableWarm tiered storage#21177

Closed
MayankHarsh03 wants to merge 1 commit intoopensearch-project:mainfrom
MayankHarsh03:feature/writable-warm-directories-indexinputs
Closed

Implement Directory and IndexInput layers for WritableWarm tiered storage#21177
MayankHarsh03 wants to merge 1 commit intoopensearch-project:mainfrom
MayankHarsh03:feature/writable-warm-directories-indexinputs

Conversation

@MayankHarsh03
Copy link
Copy Markdown
Contributor

@MayankHarsh03 MayankHarsh03 commented Apr 8, 2026

Description

Directory Layer:

  • TieredDirectory — Composite directory that transparently reads/writes across local and remote storage. Handles file listing (merging local + remote), file caching via FileCache, opening inputs through SwitchableIndexInput, and local-to-remote switching after sync.
  • TieredDirectoryFactory — Factory that creates TieredDirectory instances for each shard.
  • DirectoryUtils — Utility for resolving file paths and switchable file paths from FSDirectory.

IndexInput Layer:

  • SwitchableIndexInput — IndexInput that starts reading from local full file and can switch to remote block-based reading. Manages clones/slices with shared read-write lock to prevent race conditions during switching.
  • CachedSwitchableIndexInput — Wraps SwitchableIndexInput for use with FileCache.
  • SwitchableIndexInputWrapper — Thin wrapper with Cleaner for GC-based cleanup of unclosed index inputs.
  • OnDemandPrefetchBlockSnapshotIndexInput — Block-level fetching from remote with read-ahead support (default 4 blocks).
  • BlockIndexInput — Block-based index input backed by downloaded block files.
  • BlockFetchRequest — Data class representing a block fetch request.
  • BlockTransferManager — Manages parallel block downloads from remote storage with deduplication.

Note: TieredStoragePrefetchSettings and per-query metric recording (TieredStorageQueryMetricService) are not yet integrated as those classes are not available upstream. Read-ahead uses a hardcoded default of 4 blocks. These will be plugged in via follow-up PRs.

Related Issues

Part of the tiered storage implementation — resolves #21078

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…rage

Implements the core Directory Layer (TieredDirectory, TieredDirectoryFactory,
DirectoryUtils) and IndexInput Layer (SwitchableIndexInput, CachedSwitchableIndexInput,
SwitchableIndexInputWrapper, OnDemandPrefetchBlockSnapshotIndexInput, BlockIndexInput,
BlockFetchRequest, BlockTransferManager) with unit tests.

Signed-off-by: Mayank Harsh <mayankmh@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 00d8b60.

PathLineSeverityDescription
server/src/main/java/org/opensearch/storage/indexinput/BlockFetchRequest.java1lowApache-2.0 license header was removed from the file. While not a direct security threat, deliberate removal of license headers can indicate tampering or an attempt to obscure file provenance.
server/src/main/java/org/opensearch/storage/indexinput/OnDemandPrefetchBlockSnapshotIndexInput.java74lowIn fetchBlock(), variables 'blockFileName' and 'cacheHit' are computed but never used — the method delegates to super.fetchBlock(blockId) unconditionally. Dead code that appears to have been stripped of its intended logic, which could indicate incomplete or intentionally hollowed-out metric/cache logic.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions github-actions Bot added enhancement Enhancement or improvement to existing feature or request Storage Issues and PRs relating to data and metadata storage labels Apr 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: BlockTransferManager and BlockFetchRequest implementation

Relevant files:

  • server/src/main/java/org/opensearch/storage/common/BlockTransferManager.java
  • server/src/main/java/org/opensearch/storage/indexinput/BlockFetchRequest.java
  • server/src/test/java/org/opensearch/storage/common/BlockTransferManagerTests.java
  • server/src/test/java/org/opensearch/storage/indexinput/BlockFetchRequestTests.java

Sub-PR theme: IndexInput layer implementation

Relevant files:

  • server/src/main/java/org/opensearch/storage/indexinput/BlockIndexInput.java
  • server/src/main/java/org/opensearch/storage/indexinput/OnDemandPrefetchBlockSnapshotIndexInput.java
  • server/src/main/java/org/opensearch/storage/indexinput/SwitchableIndexInput.java
  • server/src/main/java/org/opensearch/storage/indexinput/SwitchableIndexInputWrapper.java
  • server/src/main/java/org/opensearch/storage/indexinput/CachedSwitchableIndexInput.java
  • server/src/test/java/org/opensearch/storage/indexinput/BlockIndexInputTests.java

Sub-PR theme: Directory layer implementation

Relevant files:

  • server/src/main/java/org/opensearch/storage/directory/TieredDirectory.java
  • server/src/main/java/org/opensearch/storage/directory/TieredDirectoryFactory.java
  • server/src/main/java/org/opensearch/storage/utils/DirectoryUtils.java
  • server/src/test/java/org/opensearch/storage/utils/DirectoryUtilsTests.java
  • server/src/test/java/org/opensearch/storage/directory/CleanerDaemonThreadLeakFilter.java

⚡ Recommended focus areas for review

Missing License Header

The file is missing the Apache-2.0 license header that was present in the old version and is present in all other files in this PR. This is a compliance issue.

package org.opensearch.storage.indexinput;
Duplicate Field in toString

The toString() method includes filePath twice - once near the top and once at the end of the string. This is a bug that produces redundant output and may confuse debugging.

@Override
public String toString() {
    return "BlockFetchRequest{"
            + "filePath=" + filePath.toString()
            + ", directory=" + directory.toString()
            + ", fileName='" + fileName
            + ", blockFileName='" + blockFileName
            + ", blockStart=" + blockStart
            + ", blockSize=" + blockSize
            + ", filePath=" + filePath
            + '}';
}
Race Condition in switchToRemote

In switchToRemote(), after acquiring the write lock and object lock, the code iterates over clones.keySet() and calls c.switchToRemote() on each clone. Each clone's switchToRemote() also tries to acquire the shared write lock, which would cause a deadlock since ReentrantReadWriteLock is not reentrant across write locks from different threads. This recursive locking pattern needs careful review.

public void switchToRemote() throws IOException, IllegalStateException {
    sharedLock.writeLock().lock();
    try {
        objectLock.lock();
        try {
            if (isClosed || hasSwitchedToRemote)
                return;
            validateFilePresentInRemote();
            remoteIndexInput.set(getRemoteIndexInput());
            IndexInput localIndexInput = underlyingIndexInput.get();
            if (isClone) remoteIndexInput.get().seek(localIndexInput.getFilePointer());
            underlyingIndexInput.set(remoteIndexInput.get());
            if (!isClone) {
                clones.keySet().forEach(c -> {
                    try {
                        c.switchToRemote();
                    } catch (IOException e) {
                        logger.error("Failed to switch IndexInput to remote - {}", c, e);
                    }
                });
            }
            localIndexInput.close();
            hasSwitchedToRemote = true;
            if (!isClone) fileCache.remove(fullFilePath);
        } finally {
            objectLock.unlock();
        }
    } finally {
        sharedLock.writeLock().unlock();
    }
}
Swallowed Exception

In downloadBlocksAsync, exceptions from transferManager.fetchBlobAsync are caught and only logged as errors without rethrowing or propagating. This silently swallows prefetch failures, which could lead to missing blocks without any caller awareness.

try {
    transferManager.fetchBlobAsync(blobFetchRequest);
} catch (Exception e) {
    logger.error(
        "Exception while fetching block asynchronously from remote - " +
            "File: {} , Block File: {} , BlockStart: {} , BlockEnd: {} , OriginalFileSize: {}",
        fileName,
        blockFileName,
        blockStart,
        blockEnd,
        originalFileSize
    );
}
NullPointerException Suppression

In listAll(), a NullPointerException from remoteDirectory.listAll() is silently caught and treated as an empty result. This hides potential bugs or misconfigurations. A more specific exception type or a null check should be used instead.

} catch (NullPointerException e) {
    remoteFiles = new String[]{};
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect hardcoded length return value

The length() method always returns 0, which is incorrect. It should delegate to the
underlying switchableIndexInput.length() to return the actual file length, otherwise
callers relying on this value will get wrong results.

server/src/main/java/org/opensearch/storage/indexinput/CachedSwitchableIndexInput.java [52-54]

 @Override
 public long length() {
-    return 0;
+    return switchableIndexInput.length();
 }
Suggestion importance[1-10]: 8

__

Why: The length() method hardcodes return 0 instead of delegating to switchableIndexInput.length(), which would cause incorrect behavior for any caller relying on the file length from this CachedIndexInput implementation.

Medium
Fix state inconsistency on exception during switch

In switchToRemote(), localIndexInput.close() is called before hasSwitchedToRemote =
true is set. If localIndexInput.close() throws an exception, hasSwitchedToRemote
will remain false while the underlying index input has already been switched to
remote, leaving the object in an inconsistent state. Set hasSwitchedToRemote = true
before closing the local input.

server/src/main/java/org/opensearch/storage/indexinput/SwitchableIndexInput.java [186-216]

-public void switchToRemote() throws IOException, IllegalStateException {
-    sharedLock.writeLock().lock();
-    try {
-        objectLock.lock();
-        try {
-            if (isClosed || hasSwitchedToRemote)
-                return;
-            validateFilePresentInRemote();
-            remoteIndexInput.set(getRemoteIndexInput());
-            IndexInput localIndexInput = underlyingIndexInput.get();
-            if (isClone) remoteIndexInput.get().seek(localIndexInput.getFilePointer());
-            underlyingIndexInput.set(remoteIndexInput.get());
-            if (!isClone) {
-                clones.keySet().forEach(c -> {
-                    try {
-                        c.switchToRemote();
-                    } catch (IOException e) {
-                        logger.error("Failed to switch IndexInput to remote - {}", c, e);
-                    }
-                });
-            }
-            localIndexInput.close();
-            hasSwitchedToRemote = true;
-            if (!isClone) fileCache.remove(fullFilePath);
+localIndexInput.close();
+hasSwitchedToRemote = true;
+if (!isClone) fileCache.remove(fullFilePath);
Suggestion importance[1-10]: 5

__

Why: The suggestion points out a potential state inconsistency if localIndexInput.close() throws, but in practice IndexInput.close() rarely throws and the existing code already has underlyingIndexInput pointing to remote before the close. The improved code shown is identical to the existing code (same ordering), so the suggestion doesn't actually change anything meaningful.

Low
General
Remove duplicate field in toString output

The toString() method includes filePath twice — once at the beginning and once at
the end. Remove the duplicate entry to avoid redundant output and potential
confusion.

server/src/main/java/org/opensearch/storage/indexinput/BlockFetchRequest.java [65-75]

 return "BlockFetchRequest{"
         + "filePath=" + filePath.toString()
         + ", directory=" + directory.toString()
         + ", fileName='" + fileName
         + ", blockFileName='" + blockFileName
         + ", blockStart=" + blockStart
         + ", blockSize=" + blockSize
-        + ", filePath=" + filePath
         + '}';
Suggestion importance[1-10]: 6

__

Why: The toString() method clearly includes filePath twice (lines 67 and 73), which is a real bug producing redundant/confusing output. The fix is accurate and straightforward.

Low
Replace NullPointerException catch with proper exception handling

Catching NullPointerException to handle the case where remoteDirectory is
unavailable is a bad practice and can mask real bugs. Use a proper null check or
catch a more appropriate exception (e.g., IOException) that listAll() is declared to
throw.

server/src/main/java/org/opensearch/storage/directory/TieredDirectory.java [71-73]

 try {
-    remoteFiles = remoteDirectory.listAll();
-}
-} catch (NullPointerException e) {
+    if (hasLocalSegments) {
+        remoteFiles = Arrays.stream(remoteDirectory.listAll())
+            .filter(fileName -> !fileName.startsWith(IndexFileNames.SEGMENTS))
+            .toArray(String[]::new);
+    } else {
+        remoteFiles = remoteDirectory.listAll();
+    }
+} catch (IOException e) {
+    logger.warn("Failed to list remote files, treating as empty", e);
     remoteFiles = new String[]{};
 }
Suggestion importance[1-10]: 6

__

Why: Catching NullPointerException to handle remote directory unavailability is indeed bad practice and can mask real bugs. Replacing it with an IOException catch (which listAll() is declared to throw) is a valid improvement, though the improved code also restructures the entire try block correctly.

Low

@MayankHarsh03 MayankHarsh03 marked this pull request as ready for review April 8, 2026 18:33
@MayankHarsh03 MayankHarsh03 requested a review from a team as a code owner April 8, 2026 18:33
@MayankHarsh03 MayankHarsh03 deleted the feature/writable-warm-directories-indexinputs branch April 8, 2026 18:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

❌ Gradle check result for 00d8b60: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Storage Issues and PRs relating to data and metadata storage

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[Feature Request] API Models for hot and warm tiering actions

1 participant