optimize(api): add warning logs for load-based request rejection#2972
optimize(api): add warning logs for load-based request rejection#2972contrueCT wants to merge 5 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds operator-visible warning logs when LoadDetectFilter rejects requests due to high worker load or low free memory (503s), and introduces unit tests to cover the new rejection behavior plus whitelist/healthy pass-through.
Changes:
- Add
WARNlogging for worker-load and low-memory request rejections inLoadDetectFilter. - Add
LoadDetectFilterTestunit coverage for whitelist bypass, busy rejection, low-memory rejection, and healthy requests. - Register the new unit test in
UnitTestSuite.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| hugegraph-server/hugegraph-api/src/main/java/org/apache/hugegraph/api/filter/LoadDetectFilter.java | Adds rejection WARN logs and minor refactor (captures current load). |
| hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/api/filter/LoadDetectFilterTest.java | New unit tests for whitelist/busy/low-memory/healthy scenarios. |
| hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/UnitTestSuite.java | Adds LoadDetectFilterTest to the unit suite. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
VGalaxies
left a comment
There was a problem hiding this comment.
Thanks for the update. I left two small review comments; CI looks green on my side, but I think these are still worth addressing.
Changes made:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| LOG.warn("Rejected request due to low free memory, method={}, path={}, " + | ||
| "presumableFreeMemMB={}, recheckedFreeMemMB={}, gcTriggered={}, " + | ||
| "minFreeMemoryMB={}", | ||
| context.getMethod(), context.getUriInfo().getPath(), |
There was a problem hiding this comment.
当前逻辑在 GC 后直接抛出异常,并没有重新检查内存是否已恢复到阈值以上。既然已经在这里做了 recheckedFreeMem 计算,建议更进一步:GC 后重新检查内存,如果已恢复则放行请求,减少不必要的 503。
当前行为:检测到内存不足 → 尝试 GC → 记录 GC 后的内存 → 仍然抛异常
建议行为:检测到内存不足 → 尝试 GC → 重新检查 → 如果恢复则放行
| context.getMethod(), context.getUriInfo().getPath(), | |
| boolean gcTriggered = this.gcIfNeeded(); | |
| if (gcTriggered) { | |
| long allocatedMemAfterGc = Runtime.getRuntime().totalMemory() - | |
| Runtime.getRuntime().freeMemory(); | |
| long freeMemAfterGc = (Runtime.getRuntime().maxMemory() - | |
| allocatedMemAfterGc) / Bytes.MB; | |
| if (freeMemAfterGc >= minFreeMemory) { | |
| this.logRejectWarning( | |
| "Low memory recovered after GC, method={}, path={}, " + | |
| "beforeFreeMB={}, afterFreeMB={}", | |
| context.getMethod(), context.getUriInfo().getPath(), | |
| presumableFreeMem, freeMemAfterGc); | |
| return; | |
| } | |
| } | |
| this.logRejectWarning( | |
| "Rejected request due to low free memory, method={}, path={}, " + | |
| "presumableFreeMemMB={}, gcTriggered={}, minFreeMemoryMB={}", | |
| context.getMethod(), context.getUriInfo().getPath(), | |
| presumableFreeMem, gcTriggered, minFreeMemory); |
| List<PathSegment> segments = context.getUriInfo().getPathSegments(); | ||
| E.checkArgument(!segments.isEmpty(), "Invalid request uri '%s'", | ||
| context.getUriInfo().getPath()); | ||
| String rootPath = segments.get(0).getPath(); |
There was a problem hiding this comment.
gcIfNeeded 从 private static 改为 protected 实例方法
原方法是 private static void gcIfNeeded(),现改为 protected boolean gcIfNeeded()。变更影响:
- 扩大可见性(private → protected),破坏了封装性
- 从静态方法变为实例方法
- 主要目的似乎是为了测试时 override
建议保持 private 可见性,如果需要测试可测试性,可以:
- 使用 package-private 可见性 +
@VisibleForTesting注解 - 或者通过注入策略接口实现
同样的建议也适用于 allowRejectLog() 方法。
| @@ -54,11 +58,40 @@ public class LoadDetectFilter implements ContainerRequestFilter { | |||
| private static final RateLimiter GC_RATE_LIMITER = | |||
| RateLimiter.create(1.0 / 30); | |||
|
|
|||
There was a problem hiding this comment.
REJECT_LOG_RATE_LIMITER
高负载拒绝和低内存拒绝共享同一个 RateLimiter(每秒 1 个 permit)。这意味着:
- 如果高负载拒绝消耗了 permit,紧接着的低内存拒绝就不会被记录(反之亦然)
- 运维人员在同时出现高负载和低内存时,可能只看到一种告警,遗漏另一种
建议为两种拒绝原因使用独立的 RateLimiter:
private static final RateLimiter BUSY_LOG_LIMITER = RateLimiter.create(1.0);
private static final RateLimiter MEMORY_LOG_LIMITER = RateLimiter.create(1.0);| return REJECT_LOG_RATE_LIMITER.tryAcquire(); | ||
| } | ||
|
|
||
| protected void logRejectWarning(String message, Object... args) { |
There was a problem hiding this comment.
logRejectWarning() 与直接调用 allowRejectLog() 的不一致
高负载路径使用封装的 logRejectWarning(),但低内存路径直接调用 allowRejectLog() + LOG.warn():
// 高负载路径
this.logRejectWarning("Rejected request due to high worker load...");
// 低内存路径
boolean shouldLog = this.allowRejectLog();
if (shouldLog) { LOG.warn(...); }两条路径的日志方式不一致,降低了可读性。建议统一为同一种模式。另外 logRejectWarning() 在生产代码中只被调用了一次,是否真的需要抽取为独立方法值得考虑。
| boolean shouldLog = this.allowRejectLog(); | ||
| boolean gcTriggered = this.gcIfNeeded(); | ||
| if (shouldLog) { | ||
| long allocatedMemAfterCheck = Runtime.getRuntime().totalMemory() - |
There was a problem hiding this comment.
🧹 recheckedFreeMem 在 gcTriggered=false 时是冗余信息
当 gcTriggered=false 时,recheckedFreeMem 和 presumableFreeMem 几乎完全相同(两次采样之间几乎没有时间差),日志中两个近似相同的值会造成困惑。
建议只在 gcTriggered=true 时才计算并记录 recheckedFreeMem,否则省略此字段。
Purpose of the PR
Main Changes
Log busy-worker and low-memory rejections in LoadDetectFilter before returning ServiceUnavailableException so operators can diagnose 503 responses from server-side logs.
Add unit coverage for whitelist bypass, busy rejection,
low-memory rejection, and healthy request pass-through, and register the new test in UnitTestSuite.
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need