Skip to content

optimize(api): add warning logs for load-based request rejection#2972

Open
contrueCT wants to merge 5 commits intoapache:masterfrom
contrueCT:fix/issue-2747-load-detect-logs
Open

optimize(api): add warning logs for load-based request rejection#2972
contrueCT wants to merge 5 commits intoapache:masterfrom
contrueCT:fix/issue-2747-load-detect-logs

Conversation

@contrueCT
Copy link
Copy Markdown
Contributor

Purpose of the PR

Main Changes

Log busy-worker and low-memory rejections in LoadDetectFilter before returning ServiceUnavailableException so operators can diagnose 503 responses from server-side logs.

Add unit coverage for whitelist bypass, busy rejection,
low-memory rejection, and healthy request pass-through, and register the new test in UnitTestSuite.

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds operator-visible warning logs when LoadDetectFilter rejects requests due to high worker load or low free memory (503s), and introduces unit tests to cover the new rejection behavior plus whitelist/healthy pass-through.

Changes:

  • Add WARN logging for worker-load and low-memory request rejections in LoadDetectFilter.
  • Add LoadDetectFilterTest unit coverage for whitelist bypass, busy rejection, low-memory rejection, and healthy requests.
  • Register the new unit test in UnitTestSuite.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
hugegraph-server/hugegraph-api/src/main/java/org/apache/hugegraph/api/filter/LoadDetectFilter.java Adds rejection WARN logs and minor refactor (captures current load).
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/api/filter/LoadDetectFilterTest.java New unit tests for whitelist/busy/low-memory/healthy scenarios.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/UnitTestSuite.java Adds LoadDetectFilterTest to the unit suite.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@VGalaxies VGalaxies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I left two small review comments; CI looks green on my side, but I think these are still worth addressing.

@contrueCT
Copy link
Copy Markdown
Contributor Author

Thanks for the update. I left two small review comments; CI looks green on my side, but I think these are still worth addressing.

⚠️ The busy-load branch is rate-limited, but this low-memory WARN will still fire on every rejected request. Under sustained memory pressure that can flood the logs and add more I/O pressure to an already stressed server. Please apply the same REJECT_LOG_RATE_LIMITER here, or route both rejection logs through a shared sampling helper.
⚠️ This test drives the real System.gc() path by forcing the low-memory branch. That makes the suite slower and can make it behave differently on constrained CI nodes or JVMs with different GC ergonomics. Consider adding a test seam for the GC step so the test can stub it and only verify the rejection behavior.

Changes made:

  • Applied the same reject-log rate limiting to the low-memory branch, and routed both rejection paths through a shared warning helper so the sampling behavior is consistent.
  • Added a small test seam for gcIfNeeded() / reject-log sampling so the unit test no longer depends on a real System.gc() call.
  • Expanded LoadDetectFilterTest to verify:
    • busy rejection emits a warning log
    • low-memory rejection emits a warning log
    • whitelist / healthy paths do not emit rejection logs
    • reject-log sampling suppresses repeated warning logs as expected

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 2, 2026
@contrueCT contrueCT requested a review from imbajin April 2, 2026 13:22
LOG.warn("Rejected request due to low free memory, method={}, path={}, " +
"presumableFreeMemMB={}, recheckedFreeMemMB={}, gcTriggered={}, " +
"minFreeMemoryMB={}",
context.getMethod(), context.getUriInfo().getPath(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ 内存拒绝路径中的逻辑问题:GC 后未重新检查内存

当前逻辑在 GC 后直接抛出异常,并没有重新检查内存是否已恢复到阈值以上。既然已经在这里做了 recheckedFreeMem 计算,建议更进一步:GC 后重新检查内存,如果已恢复则放行请求,减少不必要的 503。

当前行为:检测到内存不足 → 尝试 GC → 记录 GC 后的内存 → 仍然抛异常
建议行为:检测到内存不足 → 尝试 GC → 重新检查 → 如果恢复则放行

Suggested change
context.getMethod(), context.getUriInfo().getPath(),
boolean gcTriggered = this.gcIfNeeded();
if (gcTriggered) {
long allocatedMemAfterGc = Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();
long freeMemAfterGc = (Runtime.getRuntime().maxMemory() -
allocatedMemAfterGc) / Bytes.MB;
if (freeMemAfterGc >= minFreeMemory) {
this.logRejectWarning(
"Low memory recovered after GC, method={}, path={}, " +
"beforeFreeMB={}, afterFreeMB={}",
context.getMethod(), context.getUriInfo().getPath(),
presumableFreeMem, freeMemAfterGc);
return;
}
}
this.logRejectWarning(
"Rejected request due to low free memory, method={}, path={}, " +
"presumableFreeMemMB={}, gcTriggered={}, minFreeMemoryMB={}",
context.getMethod(), context.getUriInfo().getPath(),
presumableFreeMem, gcTriggered, minFreeMemory);

List<PathSegment> segments = context.getUriInfo().getPathSegments();
E.checkArgument(!segments.isEmpty(), "Invalid request uri '%s'",
context.getUriInfo().getPath());
String rootPath = segments.get(0).getPath();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ 方法可见性变更:gcIfNeededprivate static 改为 protected 实例方法

原方法是 private static void gcIfNeeded(),现改为 protected boolean gcIfNeeded()。变更影响:

  1. 扩大可见性(private → protected),破坏了封装性
  2. 从静态方法变为实例方法
  3. 主要目的似乎是为了测试时 override

建议保持 private 可见性,如果需要测试可测试性,可以:

  • 使用 package-private 可见性 + @VisibleForTesting 注解
  • 或者通过注入策略接口实现

同样的建议也适用于 allowRejectLog() 方法。

@@ -54,11 +58,40 @@ public class LoadDetectFilter implements ContainerRequestFilter {
private static final RateLimiter GC_RATE_LIMITER =
RateLimiter.create(1.0 / 30);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ 两条拒绝路径共享同一个 REJECT_LOG_RATE_LIMITER

高负载拒绝和低内存拒绝共享同一个 RateLimiter(每秒 1 个 permit)。这意味着:

  • 如果高负载拒绝消耗了 permit,紧接着的低内存拒绝就不会被记录(反之亦然)
  • 运维人员在同时出现高负载和低内存时,可能只看到一种告警,遗漏另一种

建议为两种拒绝原因使用独立的 RateLimiter:

private static final RateLimiter BUSY_LOG_LIMITER = RateLimiter.create(1.0);
private static final RateLimiter MEMORY_LOG_LIMITER = RateLimiter.create(1.0);

return REJECT_LOG_RATE_LIMITER.tryAcquire();
}

protected void logRejectWarning(String message, Object... args) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ logRejectWarning() 与直接调用 allowRejectLog() 的不一致

高负载路径使用封装的 logRejectWarning(),但低内存路径直接调用 allowRejectLog() + LOG.warn()

// 高负载路径
this.logRejectWarning("Rejected request due to high worker load...");

// 低内存路径
boolean shouldLog = this.allowRejectLog();
if (shouldLog) { LOG.warn(...); }

两条路径的日志方式不一致,降低了可读性。建议统一为同一种模式。另外 logRejectWarning() 在生产代码中只被调用了一次,是否真的需要抽取为独立方法值得考虑。

boolean shouldLog = this.allowRejectLog();
boolean gcTriggered = this.gcIfNeeded();
if (shouldLog) {
long allocatedMemAfterCheck = Runtime.getRuntime().totalMemory() -
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 recheckedFreeMemgcTriggered=false 时是冗余信息

gcTriggered=false 时,recheckedFreeMempresumableFreeMem 几乎完全相同(两次采样之间几乎没有时间差),日志中两个近似相同的值会造成困惑。

建议只在 gcTriggered=true 时才计算并记录 recheckedFreeMem,否则省略此字段。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Add critical log statements.

4 participants