Skip to content

Commit 33f8fac

Browse files
committed
[fix](editlog) Fix BDBEnvironment.close() ConcurrentModificationException with two-phase close
BDBEnvironment.close() iterates openedDatabases without acquiring lock.writeLock(), while openDatabase() concurrently modifies the same ArrayList under the write lock. This causes ConcurrentModificationException when a RollbackException triggers close() on the replayer thread while HTTP/heartbeat/RPC handler threads call openDatabase() via getMaxJournalId(). Fix with two-phase close pattern: - Phase 1 (under writeLock): detach all resources — copy openedDatabases to a local list, clear the original, and null out epochDB and replicatedEnvironment fields. This atomically makes them invisible to concurrent openDatabase() callers. - Phase 2 (outside lock): perform the actual db.close() and env.close() calls without holding the lock, avoiding blocking unrelated threads during potentially slow close operations. Also fix misleading "will exit" in log messages — close() does not exit.
1 parent 9ecfd40 commit 33f8fac

1 file changed

Lines changed: 33 additions & 12 deletions

File tree

fe/fe-core/src/main/java/org/apache/doris/journal/bdbje/BDBEnvironment.java

Lines changed: 33 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -479,31 +479,52 @@ public List<Long> getDatabaseNames() {
479479
return ret;
480480
}
481481

482-
// Close the store and environment
482+
// Close the store and environment.
483+
// Two-phase close: detach resources under the write lock, then close them outside the lock
484+
// to avoid blocking openDatabase() callers during potentially slow close() calls.
483485
public void close() {
484-
for (Database db : openedDatabases) {
486+
List<Database> databasesToClose;
487+
Database epochDBToClose = null;
488+
ReplicatedEnvironment envToClose = null;
489+
490+
lock.writeLock().lock();
491+
try {
492+
databasesToClose = new ArrayList<>(openedDatabases);
493+
openedDatabases.clear();
494+
495+
if (epochDB != null) {
496+
epochDBToClose = epochDB;
497+
epochDB = null;
498+
}
499+
500+
if (replicatedEnvironment != null) {
501+
envToClose = replicatedEnvironment;
502+
replicatedEnvironment = null;
503+
}
504+
} finally {
505+
lock.writeLock().unlock();
506+
}
507+
508+
// Close resources outside the lock
509+
for (Database db : databasesToClose) {
485510
try {
486511
db.close();
487512
} catch (DatabaseException exception) {
488-
LOG.error("Error closing db {} will exit", db.getDatabaseName(), exception);
513+
LOG.error("Error closing db {}", db.getDatabaseName(), exception);
489514
}
490515
}
491-
openedDatabases.clear();
492516

493-
if (epochDB != null) {
517+
if (epochDBToClose != null) {
494518
try {
495-
epochDB.close();
496-
epochDB = null;
519+
epochDBToClose.close();
497520
} catch (DatabaseException exception) {
498-
LOG.error("Error closing db {} will exit", epochDB.getDatabaseName(), exception);
521+
LOG.error("Error closing epoch db {}", epochDBToClose.getDatabaseName(), exception);
499522
}
500523
}
501524

502-
if (replicatedEnvironment != null) {
525+
if (envToClose != null) {
503526
try {
504-
// Finally, close the store and environment.
505-
replicatedEnvironment.close();
506-
replicatedEnvironment = null;
527+
envToClose.close();
507528
} catch (DatabaseException exception) {
508529
LOG.error("Error closing replicatedEnvironment", exception);
509530
}

0 commit comments

Comments
 (0)