Implement Directory and IndexInput layers for WritableWarm tiered storage #21177
…rage Implements the core Directory Layer (TieredDirectory, TieredDirectoryFactory, DirectoryUtils) and IndexInput Layer (SwitchableIndexInput, CachedSwitchableIndexInput, SwitchableIndexInputWrapper, OnDemandPrefetchBlockSnapshotIndexInput, BlockIndexInput, BlockFetchRequest, BlockTransferManager) with unit tests. Signed-off-by: Mayank Harsh <mayankmh@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 00d8b60.
❌ Gradle check result for 00d8b60: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
Directory Layer:
- TieredDirectory: Composite directory that transparently reads and writes across local and remote storage. Handles file listing (merging local and remote), file caching via FileCache, opening inputs through SwitchableIndexInput, and local-to-remote switching after sync.
- TieredDirectoryFactory: Factory that creates TieredDirectory instances for each shard.
- DirectoryUtils: Utility for resolving file paths and switchable file paths from FSDirectory.

IndexInput Layer:
- SwitchableIndexInput: IndexInput that starts reading from the full local file and can switch to remote block-based reading. Manages clones and slices with a shared read-write lock to prevent race conditions during switching.
- CachedSwitchableIndexInput: Wraps SwitchableIndexInput for use with FileCache.
- SwitchableIndexInputWrapper: Thin wrapper with a Cleaner for GC-based cleanup of unclosed index inputs.
- OnDemandPrefetchBlockSnapshotIndexInput: Block-level fetching from remote with read-ahead support (default 4 blocks).
- BlockIndexInput: Block-based index input backed by downloaded block files.
- BlockFetchRequest: Data class representing a block fetch request.
- BlockTransferManager: Manages parallel block downloads from remote storage with deduplication.

Note: TieredStoragePrefetchSettings and per-query metric recording (TieredStorageQueryMetricService) are not yet integrated, as those classes are not available upstream. Read-ahead uses a hardcoded default of 4 blocks. These will be plugged in via follow-up PRs.
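For reviewers who want a concrete picture of the read-ahead and deduplication behaviour described above, here is a minimal sketch. It is not the code in this PR. The class `BlockFetchSketch`, its method `blocksToFetch`, the nested `DedupingFetcher`, and the `downloader` callback are all hypothetical names chosen for illustration. The only value taken from the description is the default read-ahead of 4 blocks. The sketch assumes fixed-size blocks and assumes that a map of in-flight futures is an acceptable way to share one download between concurrent readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical illustration only; these are not the classes added by the PR.
final class BlockFetchSketch {
    // The description states a hardcoded default of 4 read-ahead blocks.
    static final int DEFAULT_READ_AHEAD_BLOCKS = 4;

    /**
     * Returns the indices of the blocks that cover [offset, offset + length),
     * plus up to readAhead following blocks, clamped to the end of the file.
     */
    static List<Integer> blocksToFetch(long offset, long length, long blockSize,
                                       int readAhead, long fileLength) {
        if (blockSize <= 0 || readAhead < 0 || offset < 0 || length <= 0
                || offset + length > fileLength) {
            throw new IllegalArgumentException("invalid read range or block size");
        }
        long first = offset / blockSize;
        long lastNeeded = (offset + length - 1) / blockSize;
        long lastInFile = (fileLength - 1) / blockSize;
        long last = Math.min(lastNeeded + readAhead, lastInFile);
        List<Integer> blocks = new ArrayList<>();
        for (long b = first; b <= last; b++) {
            blocks.add((int) b);
        }
        return blocks;
    }

    /** Shares a single download between all callers that request the same block. */
    static final class DedupingFetcher {
        private final Map<Integer, CompletableFuture<byte[]>> requests = new ConcurrentHashMap<>();
        private final IntFunction<byte[]> downloader; // assumed stand-in for the remote read

        DedupingFetcher(IntFunction<byte[]> downloader) {
            this.downloader = downloader;
        }

        CompletableFuture<byte[]> fetch(int blockId) {
            // computeIfAbsent is atomic per key, so only the first caller starts a download.
            // Completed futures are never evicted in this sketch, so the map also acts
            // as an unbounded cache; a real implementation would need eviction.
            return requests.computeIfAbsent(
                blockId, id -> CompletableFuture.supplyAsync(() -> downloader.apply(id)));
        }
    }
}
```

With an 8-byte block size, a 10-byte read at offset 0 of a 100-byte file needs blocks 0 and 1, and read-ahead of 4 extends the request to block 5. Near the end of the file the range is clamped so no block past the last one is requested.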
Related Issues
Part of the tiered storage implementation — resolves #21078
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.