[AI-8th] Support Configurable Retry Exceptions#1560
Conversation
EvenLjj
left a comment
There was a problem hiding this comment.
Review 总结
方向正确、测试也写得不错,但默认行为变更、每次调用都反射加载类、配置非法直接打断调用这三类问题必须修,否则会带来线上风险。下面按严重程度分级列出。
🔴 Blocker(必须修)
B1. resolveRetryExceptions 在每次 RPC 调用都做 Class.forName — 性能问题
FailoverCluster.doInvoke 入口 + RetryableExceptionHelper.java:56-70
// FailoverCluster.doInvoke 每次调用都执行:
List<Class<? extends Throwable>> retryableExceptions =
RetryableExceptionHelper.resolveRetryExceptions(
consumerConfig.getMethodRetryExceptions(methodName));
// helper 内部:
for (String className : classNames) {
exceptionClass = ClassUtils.forName(className); // ← 反射 + split 字符串
...
}每个 RPC 请求都会拆字符串 + 反射加载 N 个类。RPC 是高频路径,这是显著的额外开销。
建议:在 ConsumerConfig#setRetryExceptions(...) / MethodConfig#setRetryExceptions(...) 时一次性解析并缓存 List<Class<? extends Throwable>>,或在 RetryableExceptionHelper 内用 ConcurrentHashMap<String, List<Class<? extends Throwable>>> 缓存。前者更优,可以在配置阶段 fail-fast。
B2. 配置非法时抛 IllegalArgumentException 直接打断每次 RPC 调用
RetryableExceptionHelper.java:62-69
try {
exceptionClass = ClassUtils.forName(className);
} catch (RuntimeException e) {
throw new IllegalArgumentException("Failed to load retry exception class: " + className, e);
}
if (!Throwable.class.isAssignableFrom(exceptionClass)) {
throw new IllegalArgumentException("Retry exception class must extend Throwable: " + className);
}- 如果用户拼错了类名 / 类不在 classpath,每次 RPC 调用都抛异常,整个接口直接不可用。
- 这个 IAE 会进到
FailoverCluster的catch (Exception e),又被包装成SofaRpcException(CLIENT_UNDECLARED_ERROR),再次走shouldRetry—— 解析仍然会失败,日志会被刷爆。
建议:
- 与 B1 配套:把校验前移到
setRetryExceptions(...)配置阶段,运行期不会再抛。 - 即便保留运行期解析,也应该 WARN + 跳过该非法类名,而不是中断整个调用。
B3. 默认行为破坏:业务返回 Throwable 不再"原样返回"
FailoverCluster.java:73-82(diff 修改后)
原行为:只要 response != null 就 return response,业务异常直接交给上层处理。
PR 行为:先 RetryableExceptionHelper.shouldRetry(response, ...) 判断;如果 appResponse instanceof Throwable 且匹配重试列表,就重试。
副作用:
- 隐式的 SERVER_BIZ 重试:现在 RPC 服务端业务异常会通过
HEAD_RESPONSE_EXCEPTION暴露真实类名给客户端,客户端只要配置了对应类(哪怕是RuntimeException这种泛型父类),就会盲目重试非幂等的业务异常,可能造成订单/支付双写。 - 配置
RuntimeException这种宽匹配直接命中所有业务异常,文档里必须明确警告,且建议加默认黑名单(如IllegalArgumentException、参数校验类异常默认不应重试)。
建议:
- 在
ConsumerConfig#setRetryExceptions的 javadoc 和 README/文档里明确警告幂等性要求。 - 考虑增加一个独立开关
retryOnBusinessException,默认关闭,避免存量用户升级后行为静默改变。
🟠 Major(强烈建议修)
M1. time 计数器语义被悄悄改变
FailoverCluster.java:73-130
原代码 time 表示"已失败次数",只在 catch 里 time++。PR 在"response 非空但被判 retryable"分支里也 time++,于是 time 变成"已尝试次数"。整体可以收敛,但:
- 没有任何注释说明这层语义变化;
- 三个
time++散落在不同分支,未来维护者很容易漏改; do { ... } while (time <= retries)的边界并未在测试里覆盖retries=0 + response 需重试的极端场景。
建议:抽 private boolean shouldContinue(int time) 之类的方法,并补一个注释说明 time 语义。同时补一条测试:retries=0 + 第一次返回 appResponse instanceof CustomRetryException → 应直接返回(不重试),断言 invokeTimes == 1。
M2. matchesRemoteException 在 Class.forName 失败时仅做字符串全等回退,与本地匹配语义不一致
RetryableExceptionHelper.java:112-138
} catch (RuntimeException e) {
return matchesClassName((String) exceptionClassName, retryableExceptions);
}
// matchesClassName 仅比 retryableException.getName().equals(exceptionClassName)本地匹配支持父类 + cause 链,远端类加载失败时退化为"全限定名相等",用户配 RuntimeException 时永远命中不了远端的 MyBizException。
建议:要么明确文档说明远端匹配仅支持精确类名;要么至少加 WARN 日志告知"远端类名 xxx 在本地无法加载,已退化为字符串匹配"。
M3. HTTP / Bolt 写出 HEAD_RESPONSE_EXCEPTION — 信息泄漏 + 跨版本兼容
Http1ServerTask、Http2ServerTask、SofaRpcSerialization
headers.set(RemotingConstants.HEAD_RESPONSE_EXCEPTION, throwable.getClass().getName());- 把服务端内部包结构无差别暴露到客户端,HTTP 场景常见跨组织调用,存在信息泄漏。
- 老版本客户端拿到这个 prop 会忽略,没问题;但反过来新客户端 + 老服务端 → 永远拿不到这个 prop,远端匹配能力失效,文档要写清这个限制。
- 是否所有写出业务异常的协议路径都加了?建议作者在 PR 描述里补一份"已覆盖路径清单"。
建议:增加配置开关 exposeRemoteExceptionClass(服务端侧),按团队策略决定是否开启;README/文档说明此能力依赖双端版本升级。
M4. cause 链匹配未防护循环引用
RetryableExceptionHelper.java:97-107
while (current != null) {
...
current = current.getCause();
}JDK 的 Throwable.initCause 有自防护,但反序列化构造的 Throwable 可绕过,理论上存在死循环风险。
建议:用 IdentityHashMap 记录已访问,或限制最大深度(如 16)。
🟡 Minor(建议)
FailoverCluster.doInvoke圈复杂度 ~16,建议抽handleResponse / handleSofaRpcException / handleOtherException三段。- 变量命名:
exceptionResponse→lastRetryableResponse更准确。 - 菱形运算符:
new ArrayList<Class<? extends Throwable>>(...)可改new ArrayList<>(...)。 - 缺关键注释:
throwable与exceptionResponse互斥置 null 的设计意图需要文字说明。 FailoverClusterTest没有断言invokedProviderInfos累积情况,建议补assertEquals(2, invokedProviderInfos.size())验证 select 不会重复。
✅ 优点
- 职责拆分清晰:
RetryableExceptionHelper工具类单一职责,便于单测。 - 保留默认行为:
isDefaultRetryable完整保留SERVER_BUSY/CLIENT_TIMEOUT自动重试。 - 测试覆盖度比较好(约 85%):覆盖了 method override、cause 链、远端类名、配置非法、retries 用尽等关键场景。
- HTTP 场景的远端类名传递思路正确,解决了协议层 Throwable 不可序列化的痛点。
建议补充的测试
// 1. 直接的父类匹配
@Test public void testParentClassMatch() // 配 RuntimeException,抛 IAE
// 2. retries=0 但 response 需重试
@Test public void testNoRetryButReturnRetryableResponse()
// 3. provider 累积
Assert.assertEquals(2, cluster.getInvokedProviderInfos().size());
// 4. 配置非法不应在每次调用时再次抛(修完 B2 后)
@Test public void testInvalidConfigDoesNotBreakRuntime()
// 5. cause 链自引用防护(修完 M4 后)
@Test public void testCauseLoopProtection()There was a problem hiding this comment.
Pull request overview
Adds configurable retryable-exception support to SOFARPC failover retry logic, including matching configured exception classes (and causes) and enabling HTTP transports to propagate original business-exception class metadata for retry decisions.
Changes:
- Introduces
retryExceptionsat consumer and method config levels, plus a new config key constant. - Extends failover retry logic to retry on configured exception types and business-exception responses, with helper utilities and tests.
- Propagates HTTP response metadata (
HEAD_RESPONSE_EXCEPTION) so clients can apply retry rules even when the originalThrowableisn’t serialized.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| remoting/remoting-http/src/main/java/com/alipay/sofa/rpc/transport/http/AbstractHttpClientHandler.java | Reads and forwards business exception class metadata from HTTP headers into SofaResponse props. |
| remoting/remoting-http/src/main/java/com/alipay/sofa/rpc/server/http/Http2ServerTask.java | Emits business exception class name into HTTP/2 headers for app errors. |
| remoting/remoting-http/src/main/java/com/alipay/sofa/rpc/server/http/Http1ServerTask.java | Emits business exception class name into HTTP/1.1 headers for app errors. |
| remoting/remoting-http/src/main/java/com/alipay/sofa/rpc/server/http/AbstractHttpServerTask.java | Plumbs the thrown business exception into sendAppError(...) so transports can emit metadata. |
| core/api/src/test/java/com/alipay/sofa/rpc/config/MethodConfigTest.java | Adds unit test coverage for MethodConfig.retryExceptions. |
| core/api/src/test/java/com/alipay/sofa/rpc/config/ConsumerConfigTest.java | Adds unit test coverage for consumer vs method override of retryExceptions. |
| core/api/src/main/java/com/alipay/sofa/rpc/config/MethodConfig.java | Adds method-level retryExceptions configuration property. |
| core/api/src/main/java/com/alipay/sofa/rpc/config/ConsumerConfig.java | Adds consumer-level retryExceptions and method-level resolution helper. |
| core/api/src/main/java/com/alipay/sofa/rpc/common/RpcConstants.java | Adds CONFIG_KEY_RETRY_EXCEPTIONS. |
| core/api/src/main/java/com/alipay/sofa/rpc/common/RemotingConstants.java | Adds HEAD_RESPONSE_EXCEPTION header constant. |
| core-impl/client/src/test/java/com/alipay/sofa/rpc/client/RetryableExceptionHelperTest.java | Tests helper parsing and retry matching (including remote exception class name). |
| core-impl/client/src/test/java/com/alipay/sofa/rpc/client/FailoverClusterTest.java | Tests failover retry behaviors with configured retry exceptions and response-based exceptions. |
| core-impl/client/src/main/java/com/alipay/sofa/rpc/client/RetryableExceptionHelper.java | New helper for parsing configured exceptions and matching local/remote failures. |
| core-impl/client/src/main/java/com/alipay/sofa/rpc/client/FailoverCluster.java | Integrates configurable exception retry logic into failover flow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Throwable throwable = new SofaRpcException(RpcErrorType.SERVER_BIZ, errorMsg); | ||
| response.setAppResponse(throwable); | ||
| String exceptionClass = headers.get(RemotingConstants.HEAD_RESPONSE_EXCEPTION); | ||
| if (exceptionClass != null) { |
There was a problem hiding this comment.
When propagating HEAD_RESPONSE_EXCEPTION, consider ignoring blank/empty header values (not just null) to avoid storing meaningless response props (e.g., use a blank check before addResponseProp).
| if (exceptionClass != null) { | |
| if (StringUtils.isNotBlank(exceptionClass)) { |
| /** | ||
| * Sets retry exceptions. | ||
| * | ||
| * @param retryExceptions the retry exceptions | ||
| * @return the retry exceptions | ||
| */ | ||
| public ConsumerConfig<T> setRetryExceptions(String retryExceptions) { | ||
| this.retryExceptions = retryExceptions; | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Javadoc mismatch: this setter returns ConsumerConfig<T> for fluent chaining, but the @return tag says "the retry exceptions". Update the return description to reflect the actual return type/value (e.g., "this consumer config").
| Class<?> exceptionClass; | ||
| try { | ||
| exceptionClass = ClassUtils.forName(className); | ||
| } catch (RuntimeException e) { | ||
| throw new IllegalArgumentException("Failed to load retry exception class: " + className, e); | ||
| } | ||
| if (!Throwable.class.isAssignableFrom(exceptionClass)) { | ||
| throw new IllegalArgumentException("Retry exception class must extend Throwable: " + className); | ||
| } | ||
| result.add(castThrowableClass(exceptionClass)); | ||
| } |
There was a problem hiding this comment.
resolveRetryExceptions() uses ClassUtils.forName(className) which initializes classes (initialize=true). For user-provided exception class names this can trigger static initializers and adds avoidable overhead; prefer loading without initialization (e.g., ClassUtils.forName(className, false)), then validate Throwable assignability.
| Object exceptionClassName = response.getResponseProp(RemotingConstants.HEAD_RESPONSE_EXCEPTION); | ||
| if (!(exceptionClassName instanceof String) || StringUtils.isBlank((String) exceptionClassName)) { | ||
| return false; | ||
| } | ||
| Class<?> exceptionClass; | ||
| try { | ||
| exceptionClass = ClassUtils.forName((String) exceptionClassName); | ||
| } catch (RuntimeException e) { | ||
| return matchesClassName((String) exceptionClassName, retryableExceptions); | ||
| } |
There was a problem hiding this comment.
matchesRemoteException() attempts to ClassUtils.forName() the exception class name provided by the remote peer. Because ClassUtils.forName() initializes classes, this can execute static initializers based on untrusted input. Consider loading with initialization disabled (or avoiding class loading entirely and matching only against configured class names) to prevent side effects from remote-controlled headers.
| List<Class<? extends Throwable>> retryableExceptions = RetryableExceptionHelper | ||
| .resolveRetryExceptions(consumerConfig.getMethodRetryExceptions(methodName)); |
There was a problem hiding this comment.
resolveRetryExceptions(...) performs class loading and list building on every invocation of doInvoke(). Since retry exception config is method-scoped and typically static, consider resolving/caching the parsed List<Class<? extends Throwable>> in the consumer config value cache (or a per-method cache in the cluster) to avoid repeated class-loading work on hot paths.
好的,我会继续往建议的方向提交 |
This PR(Related issue #1362) adds support for configurable retryable exceptions in the SOFARPC failover retry flow.
Changes include: