fix: prevent unbounded memory leak during concurrent validation under… by KyleOps · Pull Request #239 · hapifhir/org.hl7.fhir.validator-wrapper

KyleOps · 2026-02-26T15:01:49Z

This PR fixes a memory and thread exhaustion vulnerability in ValidationServiceFactoryImpl.kt.

Under high concurrency, if the JVM's free memory dips below the engineReloadThreshold, the getValidationService() method re-initializes ValidationService. Because this check was not synchronized, a burst of incoming validation requests during a low-memory state would all evaluate the condition as true simultaneously.

This resulted in an unbounded loop where the application spawned dozens of massive ValidationEngine instances and hundreds of detached background downloader threads to reload presets. The JVM quickly spiraled into maxed CPU usage, gigabytes of leaked memory, and eventual crashes.

Impact & Context

We discovered this in while utilizing the validator API in conjunction with au-fhir-inferno. Under heavy testing load, the validator-wrapper would hit the memory threshold and trigger this un-synchronized cache reset. The true impact was three-fold:

Severe API Latency: The heavy I/O of re-building the cache blocks threads for several seconds. When 5 requests initialize 5 engines simultaneously, the CPU maxes out and all incoming requests hang, causing massive latency spikes (5-20s+) and upstream application timeouts.
Caching "Thrashing": Under load, if the threshold is reached frequently, the server spins up a death spiral of actively deleting and downloading large FHIR packages, completely negating the benefit of an in-memory cache.
Thread, File & Network Exhaustion: Even if the environment's memory can survive the spike, the background threads consume vast network bandwidth pulling .tgz files from packages.fhir.org and leak file descriptors on the host container.

Changes

Applied @Volatile to validationService.
Added a ReentrantLock (reloadLock) to ensure only one thread can lock and initialize the cache.
Added a 5-minute cooldown (RELOAD_COOLDOWN_MS) so the engine isn't perpetually thrashed under sustained heavy load.
Concurrent threads that hit the low memory state while the lock is acquired will now safely log and skip initialization, returning the existing engine and preventing the death spiral.

… load Introduced a ReentrantLock and rate-limiting to `getValidationService()` in ValidationServiceFactoryImpl.kt. When the JVM hits the low memory threshold during high concurrency, the application previously allowed all concurrent requests to spin up parallel initialization threads, spiraling memory usage and crashing the application. This fix ensures only a single thread can trigger the engine reload within a 5-minute cooldown period, allowing concurrent requests to queue or utilize the existing engine rather than flooding the JVM with unbounded heavy cache initialization threads.

dotasek · 2026-04-06T15:55:35Z

/AzurePipelines run

azure-pipelines · 2026-04-06T15:55:45Z

Azure Pipelines successfully started running 1 pipeline(s).

dotasek · 2026-04-06T21:18:19Z

@KyleOps thank you for this contribution.

I tested this locally and discovered that even with this in place, I get out of memory errors. I suspect that what's happening is that a new ValidationService is being instantiated before the previous one can be collected in garbage collection.

I made a change while reviewing that addresses this:

// New convenience function
 private fun createEmptyValidationServiceInstance() : ValidationService {
        val service = object : ValidationService() {};
        return service;
    }

// Logic change
     if (java.lang.Runtime.getRuntime().freeMemory() < validationServiceConfig.engineReloadThreshold &&
                        (now - lastReloadTime) > RELOAD_COOLDOWN_MS) {
                        
                        println(
                            "Free memory ${
                                java.lang.Runtime.getRuntime().freeMemory()
                            } is less than ${validationServiceConfig.engineReloadThreshold} and cooldown passed. Re-initializing validationService"
                        );
                        //First create an empty validationService so the older one can be garbage collected before the
                        //new one is constructed.
                        validationService = createEmptyValidationServiceInstance()
                        //Then 'suggest' this might be a good time to do garbage collection
                        System.gc()
                        //Then build our new service instance.
                        validationService = createValidationServiceInstance()
                        lastReloadTime = System.currentTimeMillis()
                    }

Does that make sense? Does it work with your use case as well?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent unbounded memory leak during concurrent validation under…#239

fix: prevent unbounded memory leak during concurrent validation under…#239
KyleOps wants to merge 1 commit intohapifhir:masterfrom
KyleOps:fix/validation-service-memory-leak

KyleOps commented Feb 26, 2026 •

edited

Loading

Uh oh!

dotasek commented Apr 6, 2026

Uh oh!

azure-pipelines Bot commented Apr 6, 2026

Uh oh!

dotasek commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

KyleOps commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Impact & Context

Changes

Uh oh!

dotasek commented Apr 6, 2026

Uh oh!

azure-pipelines Bot commented Apr 6, 2026

Uh oh!

dotasek commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

KyleOps commented Feb 26, 2026 •

edited

Loading