Skip to content

Commit fc7f14f

Browse files
XnLemonxianingawa-wqclaude
authored
feat(metadata): Clarify metadata report selection for multi-registry and multi-instance scenarios (#3369)
* fix(metadata): make GetMetadataReport() deterministic in multi-registry setup * fix(metadata): Remove() fans out to all reports, consistent with Map() * fix(metadata): thread registryId through GetMetadataFromMetadataReport call chain * fix(metadata): scope revision calculation to per-registry services, thread registryId through createInstance * fix(metadata): GetMetadataReportByRegistry returns nil for unknown registryId instead of silently falling back to wrong registry When a caller provides a specific non-empty registryId that has no registered report, the previous implementation silently fell back to GetMetadataReport() and returned an arbitrary report from a different registry. This undermined the per-registry scoping added in the earlier commits: metadata for registry A could be fetched from registry B's report without any indication. Fix: return nil (with a Warnf log) when a specific registryId is not found. The existing nil-guard in GetMetadataFromMetadataReport (client.go) already surfaces this as an error to the caller, which is the correct behaviour. The empty-string path is unchanged: it still delegates to GetMetadataReport() because callers that pass "" have no registry context and should receive the stable default. Also expose ClearMetadataReportInstances() to allow cross-package test isolation of the package-level instances map. * fix(metadata): ServiceNameMapping.Remove collects all errors instead of returning only the last one The previous implementation used a lastErr variable that was overwritten on each iteration. If the first report failed and the second succeeded, the error was silently discarded and the caller saw no error despite a partial failure — leaving a stale mapping in the first registry. Fix: collect all errors with errors.Join so the caller receives the full failure picture. The loop continues past individual failures (best-effort fan-out) so a transient error in one registry does not prevent removal from the others. Add a comment explaining the intentional behavioural difference from Map(), which is fail-fast: Map stops at the first report failure, while Remove continues and collects all errors so every reachable registry gets the removal attempt. * test: expand coverage for per-registry metadata report selection Three gaps identified in PR review, all addressed: 1. metadata/client_test.go — rewrite TestGetMetadataFromMetadataReport - Add sub-test: specific registryId routes to its own report (not default) - Add sub-test: unknown registryId returns error, not silent wrong-registry result - Each sub-test now resets instances independently to prevent state leakage 2. registry/servicediscovery/customizer/service_revision_customizer_test.go - Add TestExportedRevisionMissingRegistryIdYieldsZero: instance with no RegistryIdKey gets revision "0" and does not borrow another registry's service list - Add TestSubscribedRevisionMissingRegistryIdYieldsZero: same for subscribed - Add t.Cleanup(ClearMetadataReportInstances) + unique registry-id prefixes to all four tests to prevent cross-test global map pollution 3. registry/servicediscovery/service_instances_changed_listener_impl_test.go - Add TestListenerUsesRegistryIdToFetchRemoteMetadata: creates a listener with registryId="remote-reg-test", registers a mock MetadataReport under that id, triggers OnEvent with a RemoteMetadataStorageType instance, and asserts via mock.AssertExpectations that the correct per-registry report was called — end-to-end proof that registryId threads from NewServiceInstancesChangedListener through to GetMetadataFromMetadataReport * chore: translate Chinese comments to English, remove UTF-8 BOM - metadata/report_instance_test.go: translate inline comments and testify assertion messages in TestGetMetadataReportIsDeterministic from Chinese to English, consistent with the rest of the file - registry/servicediscovery/customizer/service_revision_customizer.go: translate Customize doc-comments for both exported and subscribed customizers from Chinese to English - registry/servicediscovery/service_instances_changed_listener_impl_test.go: strip the UTF-8 BOM (EF BB BF) that was introduced at line 1; Go toolchain accepts BOMs but they are not idiomatic and cause spurious diffs in some editors and CI tools * style(metadata): fix import formatting * fix(registry): upgrade missing-registryId log from Warn to Error with accurate consequence The previous Warnf said 'revision will be empty', which is wrong on two counts: the actual computed revision is "0" (not empty), and revision "0" causes OnEvent to silently skip the instance entirely, making it permanently invisible to all consumers. Upgrade to Errorf and describe the actual outcome so operators can diagnose misconfigurations immediately. Applies to both exported and subscribed revision customizers. * fix(registry): scope metaCache key to (registryId, revision) to prevent cross-registry metadata poisoning The cache in GetMetadataInfo was keyed by revision string only. In a multi-registry setup, if two registries happen to produce the same revision value (e.g. identical service sets yielding the same CRC, or test fixtures using literal revision strings), the second registry's listener would get a cache hit and return MetadataInfo fetched via a different registry's report — silently serving wrong endpoints with no error surfaced. Fix: use registryId+":"+revision as the cache key for both Get and Set, so each registry's metadata is cached independently. Update all test helpers and direct metaCache.Set/Delete calls to use the same composite key (with constant.DefaultKey as the registryId for all existing single-registry test listeners). * docs(registry): restore TODO for multi-instance metadata service URL alignment The previous commit removed the TODO without fixing the underlying issue. GetMetadataService() still returns a global singleton, so all instances across registries receive the same metadata service URL regardless of which registry they belong to. Restore the TODO with a more detailed explanation so the gap is visible to future contributors. The fix requires per-registry MetadataService support and is out of scope for this PR. * Revert "docs(registry): restore TODO for multi-instance metadata service URL alignment" This reverts commit 5d7bcc6. * style(metadata): fix import formatting * fix(debug) : fix review problems * style(metadata): fix format issue && revert some changes * fix(metadata): update tests and comment to match default fallback in GetMetadataReportByRegistry GetMetadataReportByRegistry intentionally falls back to the 'default' report when a specific registryId is not found. This supports standalone metadata-report configs registered under 'default' being used by named registries (e.g. nacos, zk). Update tests and doc comment to reflect this behavior instead of the old 'return nil for unknown id' contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: address review comments on clarify-metadata-report-selection - fix(registry): unify registryId default to constant.DefaultKey before GetMetadataInfo lookup in RegisterService, eliminating empty-string fallback mismatch and redundant GetParam calls - fix(metadata): scope metaCache key to app:registryId:revision to prevent cross-registry cache collisions - fix(metadata/mapping): bind MappingListener to deterministic primary report via GetMetadataReport() instead of non-deterministic i==0 - fix(metadata): include registryId in GetMetadataFromMetadataReport error message for easier misconfiguration diagnosis - test: add t.Cleanup to remove AddService/AddSubscribeURL entries and prevent global state leakage between tests - docs: restore TODO comment explaining GetMetadataService() singleton limitation in multi-registry metadata service URL alignment * fix: fix test isolation and cache key format after review changes - fix(test): update metaCache keys in service_instances_changed_listener_impl_test to match new app:registryId:revision format - test: use t.Name()+UnixNano as regID in service_discovery_registry_test to prevent cross-test global state pollution without expanding production API --------- Co-authored-by: xnlemon <xianingawa@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent cdf6f2f commit fc7f14f

14 files changed

Lines changed: 642 additions & 71 deletions

metadata/client.go

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -42,10 +42,10 @@ import (
4242

4343
const defaultTimeout = "5s" // s
4444

45-
func GetMetadataFromMetadataReport(revision string, instance registry.ServiceInstance) (*info.MetadataInfo, error) {
46-
report := GetMetadataReport()
45+
func GetMetadataFromMetadataReport(revision string, instance registry.ServiceInstance, registryId string) (*info.MetadataInfo, error) {
46+
report := GetMetadataReportByRegistry(registryId)
4747
if report == nil {
48-
return nil, perrors.New("no metadata report instance found,please check ")
48+
return nil, perrors.Errorf("no metadata report instance found for registryId=%s, please check metadata-report configuration", registryId)
4949
}
5050
return report.GetAppMetadata(instance.GetServiceName(), revision)
5151
}

metadata/client_test.go

Lines changed: 51 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ import (
3535
"dubbo.apache.org/dubbo-go/v3/common/constant"
3636
"dubbo.apache.org/dubbo-go/v3/common/extension"
3737
"dubbo.apache.org/dubbo-go/v3/metadata/info"
38+
"dubbo.apache.org/dubbo-go/v3/metadata/report"
3839
"dubbo.apache.org/dubbo-go/v3/protocol/base"
3940
"dubbo.apache.org/dubbo-go/v3/protocol/result"
4041
_ "dubbo.apache.org/dubbo-go/v3/proxy/proxy_factory"
@@ -62,22 +63,64 @@ var (
6263
)
6364

6465
func TestGetMetadataFromMetadataReport(t *testing.T) {
66+
t.Cleanup(func() { instances = make(map[string]report.MetadataReport) })
67+
6568
t.Run("no report instance", func(t *testing.T) {
66-
_, err := GetMetadataFromMetadataReport("1", ins)
69+
instances = make(map[string]report.MetadataReport)
70+
_, err := GetMetadataFromMetadataReport("1", ins, "default")
6771
require.Error(t, err)
6872
})
69-
mockReport := new(mockMetadataReport)
70-
defer mockReport.AssertExpectations(t)
71-
instances["default"] = mockReport
72-
t.Run("normal", func(t *testing.T) {
73+
74+
t.Run("default registry routes to default report", func(t *testing.T) {
75+
instances = make(map[string]report.MetadataReport)
76+
mockReport := new(mockMetadataReport)
77+
defer mockReport.AssertExpectations(t)
78+
instances["default"] = mockReport
79+
7380
mockReport.On("GetAppMetadata").Return(metadataInfo, nil).Once()
74-
got, err := GetMetadataFromMetadataReport("1", ins)
81+
got, err := GetMetadataFromMetadataReport("1", ins, "default")
82+
require.NoError(t, err)
83+
assert.Equal(t, metadataInfo, got)
84+
})
85+
86+
t.Run("specific registryId routes to its own report", func(t *testing.T) {
87+
instances = make(map[string]report.MetadataReport)
88+
defaultReport := new(mockMetadataReport)
89+
specificReport := new(mockMetadataReport)
90+
defer defaultReport.AssertExpectations(t)
91+
defer specificReport.AssertExpectations(t)
92+
instances["default"] = defaultReport
93+
instances["reg-a"] = specificReport
94+
95+
// specificReport must be called; defaultReport must NOT be called
96+
specificReport.On("GetAppMetadata").Return(metadataInfo, nil).Once()
97+
got, err := GetMetadataFromMetadataReport("1", ins, "reg-a")
98+
require.NoError(t, err)
99+
assert.Equal(t, metadataInfo, got)
100+
})
101+
102+
t.Run("unknown registryId falls back to default report", func(t *testing.T) {
103+
instances = make(map[string]report.MetadataReport)
104+
defaultReport := new(mockMetadataReport)
105+
defer defaultReport.AssertExpectations(t)
106+
instances["default"] = defaultReport
107+
108+
// When the specific registryId is not found, it falls back to "default"
109+
// so the default report's GetAppMetadata is called
110+
defaultReport.On("GetAppMetadata").Return(metadataInfo, nil).Once()
111+
got, err := GetMetadataFromMetadataReport("1", ins, "nonexistent-registry")
75112
require.NoError(t, err)
76113
assert.Equal(t, metadataInfo, got)
77114
})
78-
t.Run("error", func(t *testing.T) {
115+
116+
t.Run("report error propagated", func(t *testing.T) {
117+
instances = make(map[string]report.MetadataReport)
118+
mockReport := new(mockMetadataReport)
119+
defer mockReport.AssertExpectations(t)
120+
instances["default"] = mockReport
121+
79122
mockReport.On("GetAppMetadata").Return(metadataInfo, errors.New("mock error")).Once()
80-
_, err := GetMetadataFromMetadataReport("1", ins)
123+
_, err := GetMetadataFromMetadataReport("1", ins, "default")
81124
require.Error(t, err)
82125
})
83126
}

metadata/mapping/metadata/service_name_mapping.go

Lines changed: 44 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -119,18 +119,56 @@ func backoff(attempt int) time.Duration {
119119
// Get will return the application-level services. If not found, the empty set will be returned.
120120
func (d *ServiceNameMapping) Get(url *common.URL, listener mapping.MappingListener) (*gxset.HashSet, error) {
121121
serviceInterface := url.GetParam(constant.InterfaceKey, "")
122-
metadataReport := metadata.GetMetadataReport()
123-
if metadataReport == nil {
122+
metadataReports := metadata.GetMetadataReports()
123+
if len(metadataReports) == 0 {
124124
return nil, perrors.New("can not get mapping in remote cause no metadata report instance found")
125125
}
126-
return metadataReport.GetServiceAppMapping(serviceInterface, DefaultGroup, listener)
126+
// Attach the listener to the stable primary report only (GetMetadataReport uses
127+
// a deterministic selection: prefer "default", otherwise lexicographic first).
128+
// GetMetadataReports() iterates a map so its order is non-deterministic; using
129+
// i==0 as the anchor would bind the listener to a random backend each run.
130+
primaryReport := metadata.GetMetadataReport()
131+
var result *gxset.HashSet
132+
var errs []error
133+
for _, metadataReport := range metadataReports {
134+
var reportListener mapping.MappingListener
135+
if metadataReport == primaryReport {
136+
reportListener = listener
137+
}
138+
set, err := metadataReport.GetServiceAppMapping(serviceInterface, DefaultGroup, reportListener)
139+
if err != nil {
140+
errs = append(errs, err)
141+
continue
142+
}
143+
if result == nil {
144+
result = set
145+
} else {
146+
result.Add(set.Values()...)
147+
}
148+
}
149+
if result == nil {
150+
return nil, errors.Join(errs...)
151+
}
152+
return result, nil
127153
}
128154

155+
// Remove removes the service-to-app mapping for the given URL from all
156+
// registered metadata reports. Unlike Map (which stops on the first failure),
157+
// Remove is best-effort: it attempts every report and returns all errors
158+
// joined together so the caller can see the full failure picture. The
159+
// intent is to avoid leaving stale entries in any registry due to a transient
160+
// error in one of the others.
129161
func (d *ServiceNameMapping) Remove(url *common.URL) error {
130162
serviceInterface := url.GetParam(constant.InterfaceKey, "")
131-
metadataReport := metadata.GetMetadataReport()
132-
if metadataReport == nil {
163+
metadataReports := metadata.GetMetadataReports()
164+
if len(metadataReports) == 0 {
133165
return perrors.New("can not remove mapping in remote cause no metadata report instance found")
134166
}
135-
return metadataReport.RemoveServiceAppMappingListener(serviceInterface, DefaultGroup)
167+
var errs []error
168+
for _, metadataReport := range metadataReports {
169+
if err := metadataReport.RemoveServiceAppMappingListener(serviceInterface, DefaultGroup); err != nil {
170+
errs = append(errs, err)
171+
}
172+
}
173+
return errors.Join(errs...)
136174
}

metadata/mapping/metadata/service_name_mapping_test.go

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ package metadata
1919

2020
import (
2121
"errors"
22+
"sync"
2223
"testing"
2324
)
2425

@@ -147,6 +148,105 @@ func initMock() (*mockMetadataReport, error) {
147148
return metadataReport, err
148149
}
149150

151+
func initMockWithId(t *testing.T, registryId string) *mockMetadataReport {
152+
t.Helper()
153+
mockReport := new(mockMetadataReport)
154+
extension.SetMetadataReportFactory(registryId, func() report.MetadataReportFactory {
155+
return mockReport
156+
})
157+
opts := metadata.NewReportOptions(
158+
metadata.WithRegistryId(registryId),
159+
metadata.WithProtocol(registryId),
160+
metadata.WithAddress("127.0.0.1"),
161+
)
162+
require.NoError(t, opts.Init())
163+
return mockReport
164+
}
165+
166+
func TestServiceNameMappingRemoveFansOutToAllReports(t *testing.T) {
167+
metadata.ClearMetadataReportInstances()
168+
t.Cleanup(metadata.ClearMetadataReportInstances)
169+
serviceNameMappingOnce = sync.Once{}
170+
serviceNameMappingInstance = nil
171+
172+
r1 := initMockWithId(t, "reg-a")
173+
r2 := initMockWithId(t, "reg-b")
174+
175+
ins := GetNameMappingInstance()
176+
serviceUrl := common.NewURLWithOptions(
177+
common.WithInterface("org.example.FooService"),
178+
common.WithParamsValue(constant.ApplicationKey, "foo-app"),
179+
)
180+
181+
r1.On("RemoveServiceAppMappingListener").Return(nil).Once()
182+
r2.On("RemoveServiceAppMappingListener").Return(nil).Once()
183+
184+
err := ins.Remove(serviceUrl)
185+
require.NoError(t, err)
186+
r1.AssertExpectations(t)
187+
r2.AssertExpectations(t)
188+
}
189+
190+
func TestServiceNameMappingRemoveCollectsAllErrors(t *testing.T) {
191+
metadata.ClearMetadataReportInstances()
192+
t.Cleanup(metadata.ClearMetadataReportInstances)
193+
serviceNameMappingOnce = sync.Once{}
194+
serviceNameMappingInstance = nil
195+
196+
r1 := initMockWithId(t, "reg-c")
197+
r2 := initMockWithId(t, "reg-d")
198+
199+
ins := GetNameMappingInstance()
200+
serviceUrl := common.NewURLWithOptions(
201+
common.WithInterface("org.example.BarService"),
202+
common.WithParamsValue(constant.ApplicationKey, "bar-app"),
203+
)
204+
205+
err1 := errors.New("r1 failure")
206+
err2 := errors.New("r2 failure")
207+
208+
// both reports fail
209+
r1.On("RemoveServiceAppMappingListener").Return(err1).Once()
210+
r2.On("RemoveServiceAppMappingListener").Return(err2).Once()
211+
212+
err := ins.Remove(serviceUrl)
213+
require.Error(t, err)
214+
// both individual errors must be present in the returned error
215+
require.ErrorIs(t, err, err1)
216+
require.ErrorIs(t, err, err2)
217+
r1.AssertExpectations(t)
218+
r2.AssertExpectations(t)
219+
}
220+
221+
func TestServiceNameMappingRemoveContinuesAfterPartialFailure(t *testing.T) {
222+
metadata.ClearMetadataReportInstances()
223+
t.Cleanup(metadata.ClearMetadataReportInstances)
224+
serviceNameMappingOnce = sync.Once{}
225+
serviceNameMappingInstance = nil
226+
227+
r1 := initMockWithId(t, "reg-e")
228+
r2 := initMockWithId(t, "reg-f")
229+
230+
ins := GetNameMappingInstance()
231+
serviceUrl := common.NewURLWithOptions(
232+
common.WithInterface("org.example.BazService"),
233+
common.WithParamsValue(constant.ApplicationKey, "baz-app"),
234+
)
235+
236+
removeErr := errors.New("r1 partial failure")
237+
238+
// r1 fails, r2 succeeds — the loop must not short-circuit
239+
r1.On("RemoveServiceAppMappingListener").Return(removeErr).Once()
240+
r2.On("RemoveServiceAppMappingListener").Return(nil).Once()
241+
242+
err := ins.Remove(serviceUrl)
243+
// the joined error only contains r1's error; the call should error
244+
require.ErrorIs(t, err, removeErr)
245+
// both reports must have been called despite r1's failure
246+
r1.AssertExpectations(t)
247+
r2.AssertExpectations(t)
248+
}
249+
150250
type listener struct {
151251
}
152252

metadata/report_instance.go

Lines changed: 37 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -18,11 +18,12 @@
1818
package metadata
1919

2020
import (
21+
"sort"
2122
"time"
2223
)
2324

2425
import (
25-
"github.com/dubbogo/gost/container/set"
26+
gxset "github.com/dubbogo/gost/container/set"
2627
"github.com/dubbogo/gost/log/logger"
2728
)
2829

@@ -41,6 +42,12 @@ var (
4142
instances = make(map[string]report.MetadataReport)
4243
)
4344

45+
// ClearMetadataReportInstances resets the package-level instances map.
46+
// Intended for test isolation only; do not call in production code.
47+
func ClearMetadataReportInstances() {
48+
instances = make(map[string]report.MetadataReport)
49+
}
50+
4451
func addMetadataReport(registryId string, url *common.URL) error {
4552
fac := extension.GetMetadataReportFactory(url.Protocol)
4653
if fac == nil {
@@ -51,21 +58,46 @@ func addMetadataReport(registryId string, url *common.URL) error {
5158
return nil
5259
}
5360

61+
// GetMetadataReport returns a single metadata report for callers that lack
62+
// registry context. It prefers the "default" registry's report; when absent
63+
// it falls back to the lexicographically first registry id so the selection
64+
// is always stable across calls.
5465
func GetMetadataReport() report.MetadataReport {
55-
for _, v := range instances {
56-
return v
66+
if r, ok := instances[constant.DefaultKey]; ok {
67+
return r
68+
}
69+
keys := make([]string, 0, len(instances))
70+
for k := range instances {
71+
keys = append(keys, k)
72+
}
73+
sort.Strings(keys)
74+
if len(keys) > 0 {
75+
return instances[keys[0]]
5776
}
5877
return nil
5978
}
6079

80+
// GetMetadataReportByRegistry returns the metadata report bound to the given
81+
// registry id. When the registry id is empty the caller has no registry context,
82+
// so the stable default returned by GetMetadataReport is used. When a specific
83+
// (non-empty) registry id is not found, it falls back to the "default" report
84+
// if one exists. This handles the common case where a standalone metadata-report
85+
// config is registered under "default" while named registries (e.g. nacos, zk)
86+
// need to use it. nil is returned only when neither the specific id nor "default"
87+
// is registered.
6188
func GetMetadataReportByRegistry(registry string) report.MetadataReport {
6289
if len(registry) == 0 {
63-
registry = constant.DefaultKey
90+
return GetMetadataReport()
6491
}
6592
if r, ok := instances[registry]; ok {
6693
return r
6794
}
68-
return GetMetadataReport()
95+
if r, ok := instances[constant.DefaultKey]; ok {
96+
logger.Infof("[Metadata] no metadata report bound to registryId=%s, falling back to default", registry)
97+
return r
98+
}
99+
logger.Warnf("[Metadata] no metadata report found for registryId=%s", registry)
100+
return nil
69101
}
70102

71103
func GetMetadataReports() []report.MetadataReport {

0 commit comments

Comments
 (0)