Fix DFA recovery failures: file-handle leak and reset-path crash#21759
Fix DFA recovery failures: file-handle leak and reset-path crash#21759ask-kamal-nayan wants to merge 4 commits into
Conversation
PR Reviewer Guide 🔍(Review updated until commit 84b9458)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 84b9458 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 8fab2b3
Suggestions up to commit 5053382
Suggestions up to commit 10e3972
Suggestions up to commit be5c46b
Suggestions up to commit be5c46b
|
|
❌ Gradle check result for c85674d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
c85674d to
60ef45c
Compare
|
Persistent review updated to latest commit 60ef45c |
60ef45c to
be5c46b
Compare
|
Persistent review updated to latest commit be5c46b |
|
❌ Gradle check result for be5c46b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit be5c46b |
|
❌ Gradle check result for be5c46b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Kamal Nayan <askkamal@amazon.com>
be5c46b to
10e3972
Compare
|
Persistent review updated to latest commit 10e3972 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #21759 +/- ##
============================================
- Coverage 73.44% 73.41% -0.04%
+ Complexity 75084 75072 -12
============================================
Files 6016 6016
Lines 341072 341074 +2
Branches 49091 49091
============================================
- Hits 250513 250409 -104
- Misses 70641 70701 +60
- Partials 19918 19964 +46 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Persistent review updated to latest commit 5053382 |
|
Persistent review updated to latest commit 8fab2b3 |
|
❌ Gradle check result for 8fab2b3: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit 84b9458 |
Description
Fixes two recovery-path bugs in the DataFormat-aware engine, both reproducible with seed
4DB4295327DC8DC0.1. File-handle leak in
LuceneReaderManager(CompositeLocalRecoveryIT)The initial
DirectoryReaderopened viaDirectoryReader.open(writer)was overwritten byafterRefresh()withoutdecRef, leaking itsFileChannelon_0.cfs.LeakFSflagged it at suite teardown. Hold a separateinitialReaderreference and release itidempotently from
close()andonDeleted(...)to balance the originalopen().2. Reset-path crash on DFA shards (
DataFormatAwareReplicationPromotionWithLuceneIT.testUncleanFailoverWithContinuousIndexing)During
resetEngineToGlobalCheckpoint, the wrapper'sgetSegmentInfosSnapshot()override calledapplyOnEngine(newEngineReference, ...)— a BWC shim that throws forDataFormatAwareEngine. Hit byReplicationCheckpointUpdaterduring translog-recovery flush;failed the shard, RED cluster. Add
Engine.acquireSnapshot()returning the format-agnosticCatalogSnapshot(mirrors existingacquireSafeCatalogSnapshotpattern, default impl bridges viagetSegmentInfosSnapshotfor legacy engines). RouteEngineBackedIndexer.acquireSnapshot()through it. Replace the broken override on the reset wrapper with one that delegates via theIndexerinterface — which both legacy and DFA implement.Files
sandbox/plugins/analytics-backend-lucene/.../LuceneReaderManager.javaserver/src/main/java/org/opensearch/index/engine/Engine.javaserver/src/main/java/org/opensearch/index/engine/EngineBackedIndexer.javaserver/src/main/java/org/opensearch/index/shard/IndexShard.javaVerification
Both failing tests now pass deterministically (3/3) with seed
4DB4295327DC8DC0. Existing UTs unaffected:ReadOnlyEngineTests(7),IndexShardTestsreset suite (5) — all pass. No public API signatures changed;Engine.acquireSnapshotdefault impl preserves legacybehavior.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.