Skip to content

Refactor: Centralize Observability in MongoDB & ArangoDB Modules#3102

Merged
Umang01-hash merged 13 commits into
gofr-dev:developmentfrom
akshat-kumar-singhal:development
May 7, 2026
Merged

Refactor: Centralize Observability in MongoDB & ArangoDB Modules#3102
Umang01-hash merged 13 commits into
gofr-dev:developmentfrom
akshat-kumar-singhal:development

Conversation

@akshat-kumar-singhal
Copy link
Copy Markdown
Contributor

@akshat-kumar-singhal akshat-kumar-singhal commented Mar 6, 2026

Description:

Centralize observability wiring for the mongo and arangodb datasources without introducing a new package. Adopts the pattern proposed on feature/observability-refactor: a local instrumentOp helper on each datasource client and a single duck-typed instrumentDatasource in pkg/gofr/external_db.go.

What changed:

  • pkg/gofr/datasource/mongo/mongo.go — new instrumentOp(ctx, *QueryLog) (ctx, func()) and instrumentQuery(...) helpers on *Client. Every public op becomes ctx, done := c.instrumentQuery(...); defer done(). sendOperationStats is removed.
  • pkg/gofr/datasource/arangodb/*.go — same pattern across arango.go, arango_db.go, arango_document.go, arango_graph.go, arango_helper.go. handleCollectionOperation / executeCollectionOperation are removed (see bug fix Remove ZopSmart specific things #3 below). logger.go gains a small QueryLog.traceAttrs() helper so span attributes live next to the log struct.
  • pkg/gofr/external_db.go — collapses ~15 near-identical AddXxx methods into a single duck-typed instrumentDatasource(ds any) helper that optionally calls UseLogger / UseMetrics / UseTracer / Connect. tracerName(ds any) switches on the concrete behavior interface to pick the OTel tracer name. Every AddXxx now shrinks to two lines: a.instrumentDatasource(db); a.container.X = db.
  • AddXxx signatures — parameters changed from container.XxxProvider to the behavior interface (container.Mongo, container.ArangoDB, container.Clickhouse, container.OracleDB, container.CassandraWithContext, container.KVStore, container.Solr, container.Dgraph, container.OpenTSDB, container.ScyllaDB, container.SurrealDB, container.Elasticsearch, container.Couchbase, container.InfluxDB). AddPubSub takes pubsub.Client.
  • pkg/gofr/container/datasources.go — every *Provider interface is now annotated // Deprecated: pointing at the corresponding behavior interface. The shims are retained for backwards compatibility with downstream users who implement them today; removal is a follow-up.

Latent bugs this fixes as a side effect:

  1. Mongo duration ≈ 0 µs. The old pattern was defer c.sendOperationStats(&QueryLog{...}, time.Now(), "insert", span). time.Now() was captured at defer registration (start time — fine), but sendOperationStats itself computed time.Since(time.Now()) internally, so every mongo op reported roughly zero-microsecond duration. The new cleanup closure captures start once, at call site, and evaluates Since() only when it fires.
  2. Mongo metrics dropped on errors. Several defer sendOperationStats(...) calls sat after an if err != nil { return err } early-return, so failing calls never recorded. The new defer done() closure always fires.
  3. Arango wrong-span timing. handleCollectionOperation / executeCollectionOperation wrapped only getCollection(...) with timing, so DropCollection and friends were measuring the lookup, not the real work. The helper is removed; collection ops now inline getCollection under the single instrumentOp scope.

Breaking changes:

  • AddMongo, AddArangoDB, AddClickhouse, AddOracle, AddCassandra, AddKVStore, AddSolr, AddDgraph, AddOpenTSDB, AddScyllaDB, AddSurrealDB, AddElasticsearch, AddCouchbase, AddInfluxDB, AddPubSub now take the behavior interface instead of the *Provider variant. Downstream code that constructs its own datasource and calls one of these methods will compile unchanged if the type satisfies the behavior interface (which it must, since the Provider interfaces embed it).
  • *Provider interfaces are now marked // Deprecated: (compile-time warning from staticcheck / IDEs). No removal in this PR.

Non-goals:

  • No new package, no new Observable abstraction, no new go.mod. Compared to the prior direction of this PR, the pkg/gofr/datasource/observability/ module is gone.
  • No mock regeneration or wholesale test rewrites. Test-file diff is mechanical only.
  • Removal of *Provider interfaces — deferred to a follow-up once downstream users migrate.

Additional Information:

  • Credit to @aryanmehrotra for the feature/observability-refactor branch that this PR now mirrors. The Provider deprecations are an additive divergence.
  • Final diff: 12 files, ~+561 / −608 (net −47 LOC), 3 commits.

Checklist:

  • I have formatted my code using goimports and golangci-lint.
  • All new code is covered by unit tests.
  • This PR does not decrease the overall code coverage.
  • I have reviewed the code comments and documentation for clarity.

Umang01-hash
Umang01-hash previously approved these changes Mar 9, 2026
Copy link
Copy Markdown
Member

@coolwednesday coolwednesday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for putting this together — I can see the intent behind centralizing observability and it's clear a lot of thought went into it.

However, I'd like you to reconsider the approach here. I have a few concerns:

Cross-module coupling

Previously, each datasource (mongo, arangodb, etc.) defined its own Logger and Metrics interfaces locally via duck typing — fully independent, zero shared dependencies. This PR introduces a shared observability module that all datasource modules must now import, which means:

  • If observability interfaces change (e.g., adding a method to Logger, Metrics, or ObservableQuery), every dependent datasource module must be updated and released in lockstep.
  • Third-party datasource authors now take on a GoFr-internal dependency they didn't need before.

The ObservableQuery interface is too opinionated

It forces every datasource's QueryLog to conform to SetDuration, GetOperation, GetMetricLabels, GetTraceLabels. Different datasources have different observability needs — this abstraction becomes a lowest common denominator.

The defer bug fix doesn't require this scope

The original bug (defer placed after error check, span leak, wrong timing) is valid and worth fixing, but the fix is ~2-3 lines per method — move defer to the top, capture time.Now() before the operation. A simple helper in each datasource achieves the same structural safety:

func startOperation(ctx context.Context, tracer trace.Tracer, spanName string,
    attrs map[string]string) (context.Context, trace.Span, time.Time) {
    startTime := time.Now()
    if tracer == nil {
        return ctx, nil, startTime
    }
    ctx, span := tracer.Start(ctx, spanName)
    for k, v := range attrs {
        span.SetAttributes(attribute.String(k, v))
    }
    return ctx, span, startTime
}

func endOperation(span trace.Span, startTime time.Time, durationKey string,
    logger Logger, ql *QueryLog, metrics Metrics, histogram string, labels ...string) {
    duration := time.Since(startTime).Microseconds()
    ql.Duration = duration
    logger.Debug(ql)
    metrics.RecordHistogram(context.Background(), histogram, float64(duration)/1e6, labels...)
    if span != nil {
        span.SetAttributes(attribute.Int64(durationKey, duration))
        span.End()
    }
}

Yes, this is duplicated across mongo, arangodb, etc. But:

  • It's ~20 lines of straightforward code
  • Each datasource can tweak it freely
  • Zero cross-module dependencies
  • Zero interfaces
  • Zero new Go modules
  • The defer bug is still fixed

What I think is worth keeping

The instrumentDatasource() consolidation in external_db.go is a genuine improvement — the old code had 15+ identical UseLogger/UseMetrics/UseTracer/Connect blocks. But that can be a private helper in the gofr package using anonymous type assertions, no separate module needed:

func (a *App) instrumentDatasource(ds any, tracerName string) {
    if l, ok := ds.(interface{ UseLogger(any) }); ok {
        l.UseLogger(a.Logger())
    }
    if m, ok := ds.(interface{ UseMetrics(any) }); ok {
        m.UseMetrics(a.Metrics())
    }
    if t, ok := ds.(interface{ UseTracer(any) }); ok {
        t.UseTracer(otel.GetTracerProvider().Tracer(tracerName))
    }
    if c, ok := ds.(interface{ Connect() }); ok {
        c.Connect()
    }
}

Summary

A whole module + interfaces + noop implementations + mocks for what amounts to "start timer, start span, defer end" feels like over-engineering. I'd suggest:

  1. Fix the defer bug directly in each affected datasource (small, targeted PR).
  2. Consolidate external_db.go wiring with a private helper (separate PR).
  3. Keep datasource observability local — each module owns its own logging, metrics, and tracing decisions.

Happy to discuss further if you see it differently!

@akshat-kumar-singhal
Copy link
Copy Markdown
Contributor Author

akshat-kumar-singhal commented Mar 10, 2026

It forces every datasource's QueryLog to conform to SetDuration, GetOperation, GetMetricLabels, GetTraceLabels. Different datasources have different observability needs — this abstraction becomes a lowest common denominator.

We could split the Instrumenter interface into 2 parts - initialisation (SetLogger/SetMetrics/SetTracer) and tracking (Logf/ Debugf/ etc) - this way an external datasource implementing the Instrumenter part 1 interface would get the observability tools.
The existing datasources implemented within the gofr ecosystem, albeit as an independent module benefit from the standardisation - unless we have a strong use case of these external modules being actually used independently without gofr.

If observability interfaces change (e.g., adding a method to Logger, Metrics, or ObservableQuery), every dependent datasource module must be updated and released in lockstep.

Breaking change in the interface won't be done unless it's a major release.

The ObservableQuery interface is too opinionated

The concern is understandable - it however should get addressed by splitting the Instrumenter interface - data services interested to use the common capability from the observability package would continue to do it.
PS: Gofr is opinionated 😄

Keep datasource observability local — each module owns its own logging, metrics, and tracing decisions.

Keeping it local would remove the standardisation - this goes against the idea of an opinionated framework.

Copy link
Copy Markdown
Member

@aryanmehrotra aryanmehrotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one major concern which is each datasource defines its own Logger/Metrics interfaces as they are the user of it, defining an interface globally would mean that all the interfaces will change, even though that datasource doesn't need a method it has to do, which is also against the user of the interface should define it, moreover what benefit apart from standardisation do we get here?

Moreover, brining a method to use metrics if seems easier, shouldn't that be part of the framework itself, because gofr in itself has abstracted log complexities.

@akshat-kumar-singhal
Copy link
Copy Markdown
Contributor Author

I have one major concern which is each datasource defines its own Logger/Metrics interfaces as they are the user of it, defining an interface globally would mean that all the interfaces will change, even though that datasource doesn't need a method it has to do, which is also against the user of the interface should define it, moreover what benefit apart from standardisation do we get here?

Moreover, brining a method to use metrics if seems easier, shouldn't that be part of the framework itself, because gofr in itself has abstracted log complexities.

@aryanmehrotra Are we looking to have the datasources be used independently outside of gofr?
If yes, then this change doesn't make sense (@Umang01-hash however confirmed previously that we aren't looking to have them used outside the gofr realm - align with him please if needed)

My core idea for proposing this change: GoFr is an opinionated framework, focussed to bring in standardisation more than flexibility.

akshat-kumar-singhal and others added 3 commits April 14, 2026 13:05
Replace the per-op addTrace + sendOperationStats + handleCollectionOperation
plumbing with a single instrumentOp(ctx, *QueryLog) helper that captures
start time before the operation and returns a cleanup closure. Apply the
pattern uniformly across arango.go, arango_db.go, arango_document.go,
arango_graph.go, and arango_helper.go so every public op becomes a two-line
setup.

Collateral fixes this consolidates:
- handleCollectionOperation / executeCollectionOperation wrapped only
  getCollection() in their timing scope, so DropCollection/TruncateCollection
  reported lookup latency instead of the real work. The helper is removed
  and the collection-op sites now inline getCollection under the same
  instrumentOp scope.
- QueryLog.traceAttrs() in logger.go centralizes span attribute
  construction so individual ops no longer hand-roll attribute.KeyValue
  slices.

Drop the arango_document_test.go assertions that covered the removed
executeCollectionOperation helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the per-op addTrace + sendOperationStats plumbing with a pair of
helpers on *Client:
- instrumentOp(ctx, *QueryLog) returns (ctx, cleanup). Start time is
  captured synchronously at call site, before the op runs.
- instrumentQuery(ctx, collection, op, filter, id, update) wraps
  instrumentOp for the common query shape.

Every public op becomes:

    ctx, done := c.instrumentQuery(ctx, collection, "insertOne", document, nil, nil)
    defer done()
    return c.Database.Collection(collection).InsertOne(ctx, document)

This fixes two latent bugs the old pattern carried:

1. defer-arg time.Now() ran at defer registration with the arg captured
   alongside, so sendOperationStats then computed time.Since(time.Now())
   internally and every op reported ~0 us duration. The new cleanup
   closure captures start synchronously and evaluates Since() only when
   it actually fires.
2. Several defer sendOperationStats(...) sites sat after an
   if err != nil { return err } early-return, so failing ops never
   recorded metrics. The cleanup closure always fires.

sendOperationStats is removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse the ~15 near-identical AddXxx methods in external_db.go into a
single instrumentDatasource(ds any) helper that type-asserts the four
optional setup interfaces (UseLogger, UseMetrics, UseTracer, Connect).
The tracer name is resolved via a tracerName(ds any) switch keyed on the
behavior interface rather than a per-method literal. Each AddXxx now
collapses to two lines: instrumentDatasource(db); a.container.X = db.

Change every AddXxx parameter from container.XxxProvider to the behavior
interface (container.Mongo, container.ArangoDB, container.Clickhouse,
container.OracleDB, container.CassandraWithContext, container.KVStore,
container.Solr, container.Dgraph, container.OpenTSDB, container.ScyllaDB,
container.SurrealDB, container.Elasticsearch, container.Couchbase,
container.InfluxDB). AddPubSub takes pubsub.Client directly. The Provider
shim is no longer part of any public Add* signature.

Mark every *Provider interface in pkg/gofr/container/datasources.go as
Deprecated, pointing users at the corresponding behavior interface. The
shims are retained for backwards compatibility with downstream users who
implement them today; removal is deferred to a future release once the
godoc nudge has had time to land.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
akshat-kumar-singhal and others added 2 commits April 14, 2026 13:30
Refactor tracerName from a 14-case type switch into a local
matcher slice so gocyclo drops below 10 without a nolint, and
add the missing blank lines wsl_v5 flagged in the AddMongo,
AddCassandra, and AddOracle tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover the remaining AddXxx test functions so golangci-lint v2.4.0's
only-new-issues filter stops flagging them as the PR patch shifts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akshat-kumar-singhal akshat-kumar-singhal marked this pull request as draft April 14, 2026 08:29
gocritic's unnamedResult rule flagged the (context.Context, func())
returns on the mongo and arangodb instrumentOp/instrumentQuery
helpers. Name them (tracedCtx, done) so the submodule lint jobs pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akshat-kumar-singhal akshat-kumar-singhal marked this pull request as ready for review April 22, 2026 08:50
aryanmehrotra
aryanmehrotra previously approved these changes Apr 28, 2026
@Umang01-hash
Copy link
Copy Markdown
Member

Review comments

1. AddDBResolver silently drops tracing. Before this PR, AddDBResolver explicitly wired otel.GetTracerProvider().Tracer("gofr-dbresolver") onto the resolver. Now it goes through instrumentDatasource, which only calls UseTracer when tracerName(ds) != "" — and DBResolverProvider is not in the matcher slice. Result: dbresolver loses tracing entirely. grep -rn "gofr-dbresolver" returns nothing.

Fix: add a matcher arm for container.DBResolverProvider returning "gofr-dbresolver", and add a test that asserts UseTracer is invoked on the resolver mock.

2. tracerName is a silent default. When it returns "", the entire UseTracer branch is skipped without a warning. The dbresolver bug above is the first casualty; the next datasource added without a matcher entry will hit the same trap. Either log a warning when a datasource implements UseTracer but has no tracer name, or fail loud during Add*.

3. Test gap on the error path. The headline win of this refactor is "metrics now record on errors too." There is no test that asserts this. Add one negative-path test per datasource (one for mongo, one for arango is enough) that forces an error and verifies the histogram still recorded — otherwise the next refactor can silently undo this.

4. Deprecation comments mention removal but no version. The *Provider deprecation notes say "will be removed in a future release." Pick a target (next major, or a date) so this doesn't sit forever. If there's no concrete plan, drop the removal language and just say deprecated.

Nothing else blocking. Tests + lint are clean on touched files; observability surface (span names, metric names, attribute keys) is unchanged.

akshat-kumar-singhal and others added 2 commits April 30, 2026 11:17
… error-path metrics

Address PR gofr-dev#3102 review:
- Add container.DBResolverProvider arm to tracerName so AddDBResolver no longer
  silently drops tracing.
- instrumentDatasource now warns when a datasource implements UseTracer but has
  no matcher entry, surfacing future omissions instead of degrading silently.
- Add negative-path tests for mongo and arangodb asserting RecordHistogram still
  fires on operation errors.
- Drop "will be removed in a future release" language from *Provider deprecation
  notes; retain plain Deprecated guidance pointing at the duck-typed API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
akshat-kumar-singhal and others added 2 commits April 30, 2026 11:53
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y ref in mongo tests

The mongo_test.go observability import and Test_RecordHistogramOnError landed
ahead of the observability package itself; reverting those test changes until
the package PR lands. Wraps long RecordHistogram EXPECT lines in arangodb
tests to satisfy lll.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Umang01-hash Umang01-hash merged commit a972528 into gofr-dev:development May 7, 2026
17 checks passed
@Umang01-hash Umang01-hash mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove ZopSmart specific things

4 participants