Add OOM pause mechanism to pause query threads before killing on critical heap#18163
yashmayya merged 6 commits into apache:master
Conversation
Codecov Report

❌ Patch coverage check. Additional details and impacted files:

```diff
@@             Coverage Diff              @@
##             master   #18163      +/-   ##
============================================
+ Coverage     63.32%   63.49%    +0.16%
  Complexity     1627     1627
============================================
  Files          3238     3243        +5
  Lines        197011   197248      +237
  Branches      30466    30510       +44
============================================
+ Hits         124762   125247      +485
+ Misses        62244    61958      -286
- Partials      10005    10043       +38
```

Flags with carried-forward coverage won't be shown.
xiangfu0 left a comment:
High-signal issue inline.
Force-pushed from ceb1cb6 to 677e123.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```java
// NOTE: In production code, threadContext should never be null. It might be null in tests when QueryThreadContext
// is not set up.
if (threadContext != null) {
  threadContext.waitIfPausedInternal();
```
Discussed offline. Let's still check termination first, then, when entering the pause, check termination again before exiting the pause. The pause should happen after the sampling.
Just realized that the second termination check should only be done if we're actually pausing and exiting the pause, and not on the regular fast path. Making this change now.
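The ordering agreed on above can be sketched as follows. This is a hypothetical simplification: the real `QueryThreadContext` carries much more state, and the termination check is illustrated here as a plain `RuntimeException`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class CheckpointSketch {
  private final ReentrantLock _pauseLock = new ReentrantLock();
  private final Condition _pauseCondition = _pauseLock.newCondition();
  private volatile boolean _pauseActive = false;
  private volatile boolean _terminated = false;

  public void terminate() {
    _terminated = true;
  }

  public void pause() {
    _pauseActive = true;
  }

  // Called at every sampling checkpoint.
  public void checkpoint() {
    // 1. Check termination first.
    if (_terminated) {
      throw new RuntimeException("Query terminated");
    }
    // 2. Fast path: a single volatile read when no pause is active.
    if (_pauseActive) {
      _pauseLock.lock();
      try {
        while (_pauseActive) {
          // Timed wait so the sketch cannot hang forever; the real watcher signals.
          _pauseCondition.await(100, TimeUnit.MILLISECONDS);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop waiting if interrupted
      } finally {
        _pauseLock.unlock();
      }
      // 3. Re-check termination only after an actual pause, never on the fast path.
      if (_terminated) {
        throw new RuntimeException("Query terminated");
      }
    }
  }
}
```

The key point is that step 3 sits inside the `if (_pauseActive)` branch, so the regular fast path pays only the single volatile read.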
```java
private final ReentrantLock _pauseLock = new ReentrantLock();
private final Condition _pauseCondition = _pauseLock.newCondition();
private volatile boolean _pauseActive = false;
private volatile long _pauseDeadlineMs = 0;
```
(nit) Doesn't need to be volatile as it is maintained as a local state for the watcher.
Consider adding some comments describing the contract for them:
- `_pauseActive` is shared by both watcher and query threads
- `_pauseDeadlineMs` is local state for the watcher (similar to `_triggeringLevel`)
Yes, good point. This is left over from the previous implementation, where the other threads were also checking the deadline to unpause themselves.
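That contract can be sketched as below (hypothetical simplification; method names mirror the snippet above, but the bodies are assumed): only the watcher writes both fields, query threads read only `_pauseActive`, and `_pauseDeadlineMs` stays watcher-local, so it no longer needs to be `volatile`.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class PauseStateSketch {
  private final ReentrantLock _pauseLock = new ReentrantLock();
  private final Condition _pauseCondition = _pauseLock.newCondition();
  // Shared: watcher writes, query threads read at every sampling checkpoint.
  private volatile boolean _pauseActive = false;
  // Watcher-local bookkeeping only; no other thread reads it.
  private long _pauseDeadlineMs = 0;

  // Watcher thread: arm the pause without taking the lock.
  public void activatePause(long timeoutMs) {
    _pauseDeadlineMs = System.currentTimeMillis() + timeoutMs;
    _pauseActive = true; // volatile write publishes the pause to query threads
  }

  // Watcher thread: lift the pause and wake any threads blocked on the condition.
  public void clearPause() {
    _pauseActive = false;
    _pauseLock.lock();
    try {
      _pauseCondition.signalAll();
    } finally {
      _pauseLock.unlock();
    }
  }

  public boolean isPauseActive() {
    return _pauseActive;
  }
}
```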
```java
    clearPause();
  }
  // else: still in grace period, skip kill this cycle
} else if (config.isOomPreQueryKillPauseEnabled()
```
This only pauses when the previous cycle was below critical. If the config is enabled while the heap is already critical/panic, the watcher still takes the immediate-kill path until usage drops below critical and rises again, so the advertised dynamic toggle does not take effect during an active incident. Please make the pause arm based on the current config/state, not only on the transition edge.
IMO this extreme edge case isn't worth introducing additional complexity for; the current behavior is easier to reason about.
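The edge-triggered arming under discussion can be sketched like this (hypothetical simplification; the real watcher tracks more state and levels). Note how enabling the config while the heap is already critical lands on the kill path, exactly as the reviewer describes:

```java
public class WatcherDecisionSketch {
  public enum Level { NORMAL, CRITICAL, PANIC }

  private boolean _pauseArmed = false;

  /**
   * Decision for a watcher cycle where the current heap level is CRITICAL.
   * Returns "PAUSE", "SKIP_KILL", or "KILL".
   */
  public String onCriticalHeap(Level previousLevel, boolean pauseEnabled, boolean graceExpired) {
    // Arm the pause only on the transition edge: previous cycle was below critical.
    if (pauseEnabled && previousLevel == Level.NORMAL && !_pauseArmed) {
      _pauseArmed = true;
      return "PAUSE";
    }
    if (_pauseArmed && !graceExpired) {
      return "SKIP_KILL"; // still in grace period, skip kill this cycle
    }
    _pauseArmed = false;
    // Includes the edge case above: config enabled mid-incident never arms a pause.
    return "KILL";
  }
}
```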
Docs PR opened: pinot-contrib/pinot-docs#750
Adds documentation for the OOM pause mechanism feature from apache/pinot#18163. This new feature pauses query execution threads when the heap transitions to the critical level, giving the JVM a chance to reclaim memory through garbage collection before resorting to query kills.

- Explains the pause behavior (on critical heap transition, pause for the timeout, call gc, resume if recovered, else proceed with the kill)
- Documents the two new cluster configs: `accounting.oom.critical.query.pause.enabled` and `accounting.oom.critical.query.pause.timeout.ms`
- Notes that it's off by default and dynamically configurable
Summary
On transition to critical heap, query threads are paused and `System.gc()` is called (only once, on first transition to critical), giving the JVM breathing room to reclaim memory before resorting to query kills.

Overhead
- `waitIfPaused()` is called at every sampling checkpoint and costs a single volatile read of `_pauseActive`. No lock, no contention.
- Paused threads block on a `ReentrantLock` + `Condition`. Lock contention is not a concern because:
  - a paused thread acquires the lock only to call `Condition.await()`, releasing it immediately - hold time is near-zero.
  - `activatePause()` (watcher thread) is lock-free - it writes two volatiles with correct ordering.

Configuration
Both are dynamically updatable via Helix cluster config.
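For illustration, the two cluster config entries might look like the fragment below (the timeout value here is an assumption for the example, not a documented default):

```properties
accounting.oom.critical.query.pause.enabled=true
# Illustrative value; check the PR/docs for the actual default.
accounting.oom.critical.query.pause.timeout.ms=3000
```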
State transitions

(State-transition diagram not captured; it showed `System.gc()` being invoked on the transition into the paused state.)