fix: Task250604UpdateFolderInodes hangs indefinitely on large databases (N+1 queries + ALTER TABLE deadlock) #35392

@dsilvam

Description

What is the problem?

When upgrading from dotCMS 24.12.27 to main (or e.g. Docker image 26.04.02-01_f09f5ba), the startup upgrade task Task250604UpdateFolderInodes hangs indefinitely and never completes. A cloud customer reported 20+ hours without completion, blocking the upgrade entirely.

Root Cause Analysis

Two bugs compound to cause this:

Bug 1 — N×M query pattern in FixTask00090RecreateMissingFoldersInParentPath

executeFix() issues one SELECT COUNT(1) DB query per path segment per distinct parent_path in the identifier table. For a large database with many identifiers and deep folder paths, this results in hundreds of thousands of individual DB round-trips under startup load.

Example: 100K distinct paths × avg depth 3 = 300K individual SELECT queries.

Bug 2 — ALTER TABLE deadlock in fixFolderIds()

fixFolderIds() runs:

ALTER TABLE folder DROP CONSTRAINT IF EXISTS folder_identifier_fk;
ALTER TABLE folder ADD CONSTRAINT folder_identifier_fk ... DEFERRABLE;

This DDL requires an exclusive lock on the folder table. However, executeFix() leaves a Hibernate thread-local transaction open (idle in transaction) after calling HibernateUtil.save() in createFixAudit(). The locks held by this idle transaction prevent the ALTER TABLE from ever acquiring its exclusive lock. As a result, the statement waits forever, the connection pool is exhausted, and the upgrade task hangs.

Confirmed via pg_stat_activity:

  • PID A: idle in transaction — last query insert into fixes_audit
  • PID B: ALTER TABLE folder DROP CONSTRAINT (waiting on the relation lock held by PID A)

Fix

1. FixTask00090RecreateMissingFoldersInParentPath

Pre-load all existing folder identifier keys into a HashSet<String> with a single query upfront. Replace the per-segment SELECT COUNT(1) DB calls with O(1) Set.contains() lookups. Update the Set when new folders are created to prevent duplicate creation in the same run.

Before: N×M DB queries (N = distinct paths, M = avg path depth)
After: 1 query upfront + in-memory lookups
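
The lookup change can be illustrated with a minimal, self-contained sketch. It is not the actual dotCMS code: the class name FolderPathPreloadSketch, the method findMissingFolders, and the "/a/b/" path key format are all hypothetical, and the Set passed in stands in for the result of the single upfront query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the pre-load approach; names and key format are assumptions.
class FolderPathPreloadSketch {

    // Walks every parent path segment by segment and returns the folder paths
    // that are missing, parents before children. existingFolders stands in for
    // the keys loaded by the single upfront query; it is updated in place so a
    // folder is never reported twice in the same run.
    static List<String> findMissingFolders(Set<String> existingFolders, List<String> parentPaths) {
        List<String> missing = new ArrayList<>();
        for (String parentPath : parentPaths) {
            StringBuilder current = new StringBuilder("/");
            for (String segment : parentPath.split("/")) {
                if (segment.isEmpty()) {
                    continue; // skip empty pieces from leading or doubled slashes
                }
                current.append(segment).append('/');
                String folderPath = current.toString();
                // In-memory lookup replaces the per-segment SELECT COUNT(1).
                if (!existingFolders.contains(folderPath)) {
                    missing.add(folderPath);          // the real task would create the folder here
                    existingFolders.add(folderPath);  // remember it for later paths in this run
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(List.of("/a/"));
        List<String> missing = findMissingFolders(existing, List.of("/a/b/", "/a/b/c/", "/a/"));
        System.out.println(missing);
    }
}
```

The trade-off is memory for round-trips: the whole key set is held in memory for the duration of the task, which is usually acceptable for a one-off startup task.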

2. Task250604UpdateFolderInodes

After executeFix() returns, explicitly call HibernateUtil.commitTransaction() + DbConnectionFactory.closeSilently() to commit the open Hibernate transaction and release its locks before fixFolderIds() runs its DDL.
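
The ordering this relies on can be sketched in isolation. The TxSession and Step interfaces below are hypothetical stand-ins for the dotCMS helpers named above (HibernateUtil and DbConnectionFactory), not their real signatures, and closing the connection in a finally block is an assumption of this sketch rather than something the fix is known to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "commit and release before DDL"; not the dotCMS code.
class CommitBeforeDdlSketch {

    // Stand-in for the transaction and connection helpers named in the issue.
    interface TxSession {
        void commitTransaction() throws Exception;
        void closeSilently();
    }

    // Stand-in for a unit of upgrade work such as executeFix() or fixFolderIds().
    interface Step {
        void run() throws Exception;
    }

    static void runUpgrade(Step executeFix, TxSession session, Step fixFolderIds) throws Exception {
        try {
            executeFix.run();
            // Commit the transaction left open by the fix task so its locks are released.
            session.commitTransaction();
        } finally {
            // Assumption: always release the thread-local connection, even on failure.
            session.closeSilently();
        }
        // Only now run the DDL that needs the exclusive table lock.
        fixFolderIds.run();
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        TxSession session = new TxSession() {
            @Override public void commitTransaction() { log.add("commit"); }
            @Override public void closeSilently() { log.add("close"); }
        };
        runUpgrade(() -> log.add("executeFix"), session, () -> log.add("fixFolderIds"));
        System.out.println(log);
    }
}
```

The point of the sketch is only the sequence: no transaction from the fix task is still open when the ALTER TABLE statements start.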

Steps to Reproduce

  1. Have a dotCMS database with a large number of identifiers (100K+) and folders where inode ≠ identifier
  2. Upgrade from 24.12.27 to current main
  3. Observe Task250604UpdateFolderInodes in startup logs — it never completes

Impact

  • Upgrade from 24.12.27 is completely blocked for large customers
  • Customer's database: 20+ hours (never completed before the fix)
  • After fix: completes in seconds

Files Changed

  • dotCMS/src/main/java/com/dotmarketing/fixtask/tasks/FixTask00090RecreateMissingFoldersInParentPath.java
  • dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task250604UpdateFolderInodes.java
