Fix/issue 4954 etcd pause resume #5530
Expose RpcServer-level control to pause/resume etcd lease renewal for gRPC services, and make publisher pause/resume signaling non-blocking to avoid call-site blocking. Also add unit tests for keepalive server controls and RpcServer controller forwarding.
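A minimal sketch of what "non-blocking pause/resume signaling" can look like in Go, assuming buffered channels and a `select` with a `default` branch. The `pauser` type, its fields, and the `signal` helper are hypothetical names for illustration and are not taken from the actual `core/discov` Publisher code.

```go
package main

import "fmt"

// pauser is a hypothetical stand-in for the publisher's pause/resume state.
// Each channel has capacity 1 so that one pending signal can be queued.
type pauser struct {
	pauseCh  chan struct{}
	resumeCh chan struct{}
}

func newPauser() *pauser {
	return &pauser{
		pauseCh:  make(chan struct{}, 1),
		resumeCh: make(chan struct{}, 1),
	}
}

// signal tries to queue a signal without blocking the caller.
// It returns false when a signal is already pending on the channel.
func signal(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// Pause and Resume return immediately, whether or not the
// keepalive loop is currently ready to receive.
func (p *pauser) Pause() bool  { return signal(p.pauseCh) }
func (p *pauser) Resume() bool { return signal(p.resumeCh) }

func main() {
	p := newPauser()
	fmt.Println(p.Pause()) // true: signal queued
	fmt.Println(p.Pause()) // false: already pending, call does not block
	<-p.pauseCh            // the keepalive loop would consume the signal here
	fmt.Println(p.Pause()) // true: channel has room again
}
```

In this sketch repeated signals are coalesced rather than queued, so a caller that pauses twice before the loop reacts still sees a single pause. Whether the real implementation coalesces, drops, or reports duplicates is not stated in the description above.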
Translation / English summary: This draft PR adds a
Review
Concept: Solid and useful. The ability to temporarily remove a node from service discovery without killing the process is a common operational need (e.g., graceful drain before maintenance, canary rollout control). Key changes:
Review points:
This is a draft; please mark it ready for review once the above concerns are addressed. Also, please consider writing the PR description in English or including an English translation to help all contributors review the change.
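Background
After a service starts, it keeps renewing the lease on its registration in etcd.
When the APIs on some nodes are temporarily unavailable and you want to take those nodes out of service discovery, the only option is usually to stop the process or the container, which is operationally expensive and inflexible.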
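Main changes
1) core/discov: Publisher supports safely pausing and resuming lease renewal
2) zrpc/internal: expose an entry point for controlling etcd registration
  PauseEtcdRegister()
  ResumeEtcdRegister()
3) zrpc: RpcServer provides the API for upper-layer callers
  PauseEtcdRegister()
  ResumeEtcdRegister()
4) Additional tests
  core/discov/publisher_test.go
  zrpc/internal/rpcpubserver_test.go
  zrpc/server_test.go
Compatibility and impact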
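Usage (example)
While the service is running, call either method as needed:
  rpcServer.PauseEtcdRegister(): pauses lease renewal for the etcd registration (the node drops out of service discovery)
  rpcServer.ResumeEtcdRegister(): resumes lease renewal for the etcd registration (the node comes back online)
Test results
go test ./core/discov ./zrpc/internal ./zrpc ✅
go test ./... ✅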