Refactor: Centralize Observability in MongoDB & ArangoDB Modules#3102
Conversation
There was a problem hiding this comment.
Hey, thanks for putting this together — I can see the intent behind centralizing observability and it's clear a lot of thought went into it.
However, I'd like you to reconsider the approach here. I have a few concerns:
Cross-module coupling
Previously, each datasource (mongo, arangodb, etc.) defined its own Logger and Metrics interfaces locally via duck typing — fully independent, zero shared dependencies. This PR introduces a shared observability module that all datasource modules must now import, which means:
- If
observabilityinterfaces change (e.g., adding a method toLogger,Metrics, orObservableQuery), every dependent datasource module must be updated and released in lockstep. - Third-party datasource authors now take on a GoFr-internal dependency they didn't need before.
The ObservableQuery interface is too opinionated
It forces every datasource's QueryLog to conform to SetDuration, GetOperation, GetMetricLabels, GetTraceLabels. Different datasources have different observability needs — this abstraction becomes a lowest common denominator.
The defer bug fix doesn't require this scope
The original bug (defer placed after error check, span leak, wrong timing) is valid and worth fixing, but the fix is ~2-3 lines per method — move defer to the top, capture time.Now() before the operation. A simple helper in each datasource achieves the same structural safety:
func startOperation(ctx context.Context, tracer trace.Tracer, spanName string,
attrs map[string]string) (context.Context, trace.Span, time.Time) {
startTime := time.Now()
if tracer == nil {
return ctx, nil, startTime
}
ctx, span := tracer.Start(ctx, spanName)
for k, v := range attrs {
span.SetAttributes(attribute.String(k, v))
}
return ctx, span, startTime
}
func endOperation(span trace.Span, startTime time.Time, durationKey string,
logger Logger, ql *QueryLog, metrics Metrics, histogram string, labels ...string) {
duration := time.Since(startTime).Microseconds()
ql.Duration = duration
logger.Debug(ql)
metrics.RecordHistogram(context.Background(), histogram, float64(duration)/1e6, labels...)
if span != nil {
span.SetAttributes(attribute.Int64(durationKey, duration))
span.End()
}
}Yes, this is duplicated across mongo, arangodb, etc. But:
- It's ~20 lines of straightforward code
- Each datasource can tweak it freely
- Zero cross-module dependencies
- Zero interfaces
- Zero new Go modules
- The defer bug is still fixed
What I think is worth keeping
The instrumentDatasource() consolidation in external_db.go is a genuine improvement — the old code had 15+ identical UseLogger/UseMetrics/UseTracer/Connect blocks. But that can be a private helper in the gofr package using anonymous type assertions, no separate module needed:
func (a *App) instrumentDatasource(ds any, tracerName string) {
if l, ok := ds.(interface{ UseLogger(any) }); ok {
l.UseLogger(a.Logger())
}
if m, ok := ds.(interface{ UseMetrics(any) }); ok {
m.UseMetrics(a.Metrics())
}
if t, ok := ds.(interface{ UseTracer(any) }); ok {
t.UseTracer(otel.GetTracerProvider().Tracer(tracerName))
}
if c, ok := ds.(interface{ Connect() }); ok {
c.Connect()
}
}Summary
A whole module + interfaces + noop implementations + mocks for what amounts to "start timer, start span, defer end" feels like over-engineering. I'd suggest:
- Fix the defer bug directly in each affected datasource (small, targeted PR).
- Consolidate
external_db.gowiring with a private helper (separate PR). - Keep datasource observability local — each module owns its own logging, metrics, and tracing decisions.
Happy to discuss further if you see it differently!
We could split the
Breaking change in the interface won't be done unless it's a major release.
The concern is understandable - it however should get addressed by splitting the Instrumenter interface - data services interested to use the common capability from the observability package would continue to do it.
Keeping it local would remove the standardisation - this goes against the idea of an opinionated framework. |
aryanmehrotra
left a comment
There was a problem hiding this comment.
I have one major concern which is each datasource defines its own Logger/Metrics interfaces as they are the user of it, defining an interface globally would mean that all the interfaces will change, even though that datasource doesn't need a method it has to do, which is also against the user of the interface should define it, moreover what benefit apart from standardisation do we get here?
Moreover, brining a method to use metrics if seems easier, shouldn't that be part of the framework itself, because gofr in itself has abstracted log complexities.
@aryanmehrotra Are we looking to have the datasources be used independently outside of gofr? My core idea for proposing this change: GoFr is an opinionated framework, focussed to bring in standardisation more than flexibility. |
Replace the per-op addTrace + sendOperationStats + handleCollectionOperation plumbing with a single instrumentOp(ctx, *QueryLog) helper that captures start time before the operation and returns a cleanup closure. Apply the pattern uniformly across arango.go, arango_db.go, arango_document.go, arango_graph.go, and arango_helper.go so every public op becomes a two-line setup. Collateral fixes this consolidates: - handleCollectionOperation / executeCollectionOperation wrapped only getCollection() in their timing scope, so DropCollection/TruncateCollection reported lookup latency instead of the real work. The helper is removed and the collection-op sites now inline getCollection under the same instrumentOp scope. - QueryLog.traceAttrs() in logger.go centralizes span attribute construction so individual ops no longer hand-roll attribute.KeyValue slices. Drop the arango_document_test.go assertions that covered the removed executeCollectionOperation helper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the per-op addTrace + sendOperationStats plumbing with a pair of
helpers on *Client:
- instrumentOp(ctx, *QueryLog) returns (ctx, cleanup). Start time is
captured synchronously at call site, before the op runs.
- instrumentQuery(ctx, collection, op, filter, id, update) wraps
instrumentOp for the common query shape.
Every public op becomes:
ctx, done := c.instrumentQuery(ctx, collection, "insertOne", document, nil, nil)
defer done()
return c.Database.Collection(collection).InsertOne(ctx, document)
This fixes two latent bugs the old pattern carried:
1. defer-arg time.Now() ran at defer registration with the arg captured
alongside, so sendOperationStats then computed time.Since(time.Now())
internally and every op reported ~0 us duration. The new cleanup
closure captures start synchronously and evaluates Since() only when
it actually fires.
2. Several defer sendOperationStats(...) sites sat after an
if err != nil { return err } early-return, so failing ops never
recorded metrics. The cleanup closure always fires.
sendOperationStats is removed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse the ~15 near-identical AddXxx methods in external_db.go into a single instrumentDatasource(ds any) helper that type-asserts the four optional setup interfaces (UseLogger, UseMetrics, UseTracer, Connect). The tracer name is resolved via a tracerName(ds any) switch keyed on the behavior interface rather than a per-method literal. Each AddXxx now collapses to two lines: instrumentDatasource(db); a.container.X = db. Change every AddXxx parameter from container.XxxProvider to the behavior interface (container.Mongo, container.ArangoDB, container.Clickhouse, container.OracleDB, container.CassandraWithContext, container.KVStore, container.Solr, container.Dgraph, container.OpenTSDB, container.ScyllaDB, container.SurrealDB, container.Elasticsearch, container.Couchbase, container.InfluxDB). AddPubSub takes pubsub.Client directly. The Provider shim is no longer part of any public Add* signature. Mark every *Provider interface in pkg/gofr/container/datasources.go as Deprecated, pointing users at the corresponding behavior interface. The shims are retained for backwards compatibility with downstream users who implement them today; removal is deferred to a future release once the godoc nudge has had time to land. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
85ddb9a to
9888d06
Compare
Refactor tracerName from a 14-case type switch into a local matcher slice so gocyclo drops below 10 without a nolint, and add the missing blank lines wsl_v5 flagged in the AddMongo, AddCassandra, and AddOracle tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover the remaining AddXxx test functions so golangci-lint v2.4.0's only-new-issues filter stops flagging them as the PR patch shifts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gocritic's unnamedResult rule flagged the (context.Context, func()) returns on the mongo and arangodb instrumentOp/instrumentQuery helpers. Name them (tracedCtx, done) so the submodule lint jobs pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review comments1. Fix: add a matcher arm for 2. 3. Test gap on the error path. The headline win of this refactor is "metrics now record on errors too." There is no test that asserts this. Add one negative-path test per datasource (one for mongo, one for arango is enough) that forces an error and verifies the histogram still recorded — otherwise the next refactor can silently undo this. 4. Deprecation comments mention removal but no version. The Nothing else blocking. Tests + lint are clean on touched files; observability surface (span names, metric names, attribute keys) is unchanged. |
… error-path metrics Address PR gofr-dev#3102 review: - Add container.DBResolverProvider arm to tracerName so AddDBResolver no longer silently drops tracing. - instrumentDatasource now warns when a datasource implements UseTracer but has no matcher entry, surfacing future omissions instead of degrading silently. - Add negative-path tests for mongo and arangodb asserting RecordHistogram still fires on operation errors. - Drop "will be removed in a future release" language from *Provider deprecation notes; retain plain Deprecated guidance pointing at the duck-typed API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y ref in mongo tests The mongo_test.go observability import and Test_RecordHistogramOnError landed ahead of the observability package itself; reverting those test changes until the package PR lands. Wraps long RecordHistogram EXPECT lines in arangodb tests to satisfy lll. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description:
Centralize observability wiring for the mongo and arangodb datasources without introducing a new package. Adopts the pattern proposed on
feature/observability-refactor: a localinstrumentOphelper on each datasource client and a single duck-typedinstrumentDatasourceinpkg/gofr/external_db.go.What changed:
pkg/gofr/datasource/mongo/mongo.go— newinstrumentOp(ctx, *QueryLog) (ctx, func())andinstrumentQuery(...)helpers on*Client. Every public op becomesctx, done := c.instrumentQuery(...); defer done().sendOperationStatsis removed.pkg/gofr/datasource/arangodb/*.go— same pattern acrossarango.go,arango_db.go,arango_document.go,arango_graph.go,arango_helper.go.handleCollectionOperation/executeCollectionOperationare removed (see bug fix Remove ZopSmart specific things #3 below).logger.gogains a smallQueryLog.traceAttrs()helper so span attributes live next to the log struct.pkg/gofr/external_db.go— collapses ~15 near-identicalAddXxxmethods into a single duck-typedinstrumentDatasource(ds any)helper that optionally callsUseLogger/UseMetrics/UseTracer/Connect.tracerName(ds any)switches on the concrete behavior interface to pick the OTel tracer name. EveryAddXxxnow shrinks to two lines:a.instrumentDatasource(db); a.container.X = db.AddXxxsignatures — parameters changed fromcontainer.XxxProviderto the behavior interface (container.Mongo,container.ArangoDB,container.Clickhouse,container.OracleDB,container.CassandraWithContext,container.KVStore,container.Solr,container.Dgraph,container.OpenTSDB,container.ScyllaDB,container.SurrealDB,container.Elasticsearch,container.Couchbase,container.InfluxDB).AddPubSubtakespubsub.Client.pkg/gofr/container/datasources.go— every*Providerinterface is now annotated// Deprecated:pointing at the corresponding behavior interface. The shims are retained for backwards compatibility with downstream users who implement them today; removal is a follow-up.Latent bugs this fixes as a side effect:
defer c.sendOperationStats(&QueryLog{...}, time.Now(), "insert", span).time.Now()was captured at defer registration (start time — fine), butsendOperationStatsitself computedtime.Since(time.Now())internally, so every mongo op reported roughly zero-microsecond duration. The new cleanup closure capturesstartonce, at call site, and evaluatesSince()only when it fires.defer sendOperationStats(...)calls sat after anif err != nil { return err }early-return, so failing calls never recorded. The newdefer done()closure always fires.handleCollectionOperation/executeCollectionOperationwrapped onlygetCollection(...)with timing, soDropCollectionand friends were measuring the lookup, not the real work. The helper is removed; collection ops now inlinegetCollectionunder the singleinstrumentOpscope.Breaking changes:
AddMongo,AddArangoDB,AddClickhouse,AddOracle,AddCassandra,AddKVStore,AddSolr,AddDgraph,AddOpenTSDB,AddScyllaDB,AddSurrealDB,AddElasticsearch,AddCouchbase,AddInfluxDB,AddPubSubnow take the behavior interface instead of the*Providervariant. Downstream code that constructs its own datasource and calls one of these methods will compile unchanged if the type satisfies the behavior interface (which it must, since the Provider interfaces embed it).*Providerinterfaces are now marked// Deprecated:(compile-time warning fromstaticcheck/ IDEs). No removal in this PR.Non-goals:
Observableabstraction, no newgo.mod. Compared to the prior direction of this PR, thepkg/gofr/datasource/observability/module is gone.*Providerinterfaces — deferred to a follow-up once downstream users migrate.Additional Information:
feature/observability-refactorbranch that this PR now mirrors. The Provider deprecations are an additive divergence.Checklist:
goimportsandgolangci-lint.