Skip to content

feat: Add tracing channel support#16275

Open
logaretm wants to merge 12 commits into
Automattic:masterfrom
logaretm:feat/tracing-channel-support
Open

feat: Add tracing channel support#16275
logaretm wants to merge 12 commits into
Automattic:masterfrom
logaretm:feat/tracing-channel-support

Conversation

@logaretm
Copy link
Copy Markdown

@logaretm logaretm commented May 7, 2026

This PR explores adding tracing channels to enable agnostic observability for users and APMs alike without having to resort to monkey patching or module patchers.

Each operation type has its own dedicated TracingChannel, so subscribers only pay overhead for the operations they care about. The channels are:

Channel Operations
mongoose:query find, findOne, updateOne, deleteOne, deleteMany, count, distinct, etc.
mongoose:aggregate aggregate
mongoose:model:save save
mongoose:model:insertMany insertMany
mongoose:model:bulkWrite bulkWrite
mongoose:cursor:next Cursor iteration (both query and aggregation cursors)

Payload

All channels emit the following context:

Field Type Description
operation string The operation name (e.g. find, updateOne, aggregate)
collection string MongoDB collection name
database string Database name
serverAddress string DB host
serverPort number DB port
args Record<string, any> Operation-specific arguments

I also exported the types for use by consumers.

Example Consumer

const dc = require('node:diagnostics_channel');

// Subscribe to query operations only
const queryChannel = dc.tracingChannel('mongoose:query');

queryChannel.subscribe({
  start({ operation, collection, database, serverAddress, serverPort, args }) {
    console.log('[query:start]', operation, collection, args);
  },
  asyncEnd({ operation, collection, result }) {
    console.log('[query:end]', operation, collection);
  },
  error({ operation, collection, error }) {
    console.error('[query:error]', operation, collection, error);
  }
});

this is what it looks like after consuming the events in an APM, which is inline with current instrumentations today.

CleanShot 2026-05-07 at 13 08 23@2x

closes #16105

logaretm added 2 commits May 5, 2026 14:19
Add first-class Node.js TracingChannel (diagnostics_channel) support
so APM tools can instrument Mongoose without monkey-patching.

Four channels covering all operation types:
- mongoose:query (Query.exec - find, update, delete, count, distinct)
- mongoose:aggregate (Aggregate.exec)
- mongoose:save (Model.prototype.save)
- mongoose:model (insertMany, bulkWrite)

Zero-cost when no subscribers are registered.

Ref: Automattic#16105
Collapse 4 channels into one. All operations (find, save, aggregate,
insertMany, bulkWrite) emit on mongoose:query with a common payload:
operation, collection, database, serverAddress, serverPort, args.

The args field carries operation-specific data (filter, pipeline, docs,
ops) without imposing a typed shape — APM tools that need it can dig in.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds first-class Node.js diagnostics_channel TracingChannel instrumentation to Mongoose using a single channel (mongoose:query) so APMs/observability tooling can trace Mongoose operations without monkey-patching.

Changes:

  • Introduces a new internal tracing helper (lib/tracing.js) that wraps promise-returning operations with TracingChannel#tracePromise().
  • Wraps core execution paths (Query#exec(), Aggregate#exec(), Model#save(), Model.insertMany(), Model.bulkWrite()) to emit tracing lifecycle events with operation + connection + args context.
  • Adds exported TypeScript types for the tracing payload and a comprehensive mocha test suite validating emitted events.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
types/tracing.d.ts Adds the exported TracingContext type for tracing consumers.
types/index.d.ts Wires the new tracing type file into the published type entrypoint.
lib/tracing.js Adds the internal tracing channel + trace() wrapper helper.
lib/query.js Wraps Query.prototype.exec() in trace() and emits query context.
lib/aggregate.js Wraps Aggregate.prototype.exec() in trace() and emits aggregate context.
lib/model.js Wraps save, insertMany, and bulkWrite in trace() and emits model-operation context.
test/tracing.test.js Adds tests validating tracing lifecycle events across operation types.

@logaretm
Copy link
Copy Markdown
Author

Copilot deciding not to wipe the floor with this PR with endless comments is a rare one.

Copy link
Copy Markdown
Collaborator

@vkarpov15 vkarpov15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very neat! With a little bit of work I was able to get some basic spans working with HyperDX:

Image

and desktop-otel-viewer:

Image

I just had some high level questions/concerns. Sorry if any of them seem nitpicky, I'm just trying to understand as much of the potential consequences as I can because I'm not an expert on APM.

Comment thread lib/query.js
Comment thread lib/query.js Outdated
Comment thread lib/aggregate.js Outdated
Comment thread lib/aggregate.js
*/

Aggregate.prototype[Symbol.asyncIterator] = function() {
return this.cursor({ useMongooseAggCursor: true }).transformNull()._transformForAsyncIterator();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you recommend we handle cursors for the purposes of tracing channels? Cursors are effectively query iterators: they let you call next() to get the next batch of results to avoid loading the full result set into memory at once. Cursors can also be used as streams or with for/await/of. Should we wrap cursor next() in trace() as well?

Copy link
Copy Markdown
Author

@logaretm logaretm May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @Qard will have some insights here, tracing channels lifecycle model doesn't work with async iterators because they don't exactly have a known start/finish signals and both the start and finish may happen in different async contexts.

so we have two options, either:

  • Trace next calls, allowing APMs to create a span for each call or async tick in this case.
  • Wrap cursors with a trace to cover the whole thing, but there is no guarantee the iterator will be fully consumed, so if an API creates a span for it, it may never end and will block any child spans from being reported depending on the implementation.

This is very similar to streams, tracing them is tricky because u either create a span right away and patch its end time, but u risk extending your buffer duration for an unknown time, blocking child spans. Or you create a span later when it is done and override the start time, but u risk creating a span without any trace relationship. Both aren't great options.

I would say we can set them aside for now, and once we figure out a way for it we can create a new channel for them. @Qard, thoughts on this?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this, I think the best approach for now would be to wrap next(). So given const cursor = await User.find().cursor(), you could see how long each next() call took. Is there any downside to tracing next()?

Ideally, it would be great if the cursor itself was a span, and next() calls as sub-spans of the cursor itself. But you are right that there's no guarantee that the cursor will be fully consumed. For example, change streams are effectively cursors, and those can last for the entire application lifecycle.

Copy link
Copy Markdown
Author

@logaretm logaretm May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are any downsides, and it really drives the point of all of this effort here: libraries should own their observability and expose what they believe is useful, APMs can then pick and choose.

I think we could create a specific channel for the next calls to allow APMs to be register their interest or skip it. Since I think cursors can be noisy, right?

Like a mongoose:cursor:next which leaves the door open to another channel in the future that overarches the entire cursor lifecycle if we figure out a way to do it. I will add the next tracing right now.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, iterators are not yet a solved problem for diagnostics_channel. There is work in nodejs/node#62641 to address that need now that async iterators are becoming a common interface for AI SDKs, and that can apply similarly to tracing libraries. If you do want to do something with that now, I would suggest looking at that Node.js core PR and try to align with that.

Comment thread lib/model.js
Model.bulkWrite = function bulkWrite(ops, options) {
_checkContext(this, 'bulkWrite');

if (typeof options === 'function' ||
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the trade-offs of including these sort of error checks within the trace vs outside of the trace? For example, if user passes in a callback, as written that error would bypass tracing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a design choice, I saw those as API validation so no actual work is being done but maybe you see it differently and want APMs to report those.

Up to you, I don't have strong opinions on this one.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want these either inside the traced function or restore the async function marker as otherwise these will throw rather than reject.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! good catch!

Comment thread lib/tracing.js
Comment thread types/tracing.d.ts
@vkarpov15
Copy link
Copy Markdown
Collaborator

Thanks! I have a couple of more questions: #16275 (comment) , #16275 (comment) . Otherwise I think this should be ready to go for 9.7 release.

@vkarpov15 vkarpov15 added this to the 9.7 milestone May 14, 2026
@logaretm
Copy link
Copy Markdown
Author

Thanks for the review and comments, I replied to the last two!

logaretm added 2 commits May 16, 2026 16:40
Adds a dedicated `mongoose:cursor:next` channel that traces each
`next()` call on both QueryCursor and AggregationCursor. Kept separate
from `mongoose:query` so APMs can subscribe independently and a future
`mongoose:cursor` parent span channel won't conflict.

Also refactors tracing.js: bulletproof dc import with process guard and
try/catch, and a createTracedChannel factory to eliminate duplication.
Consumers import { queryChannel } or { cursorNextChannel } and use
.trace() or .channel directly. Adding a new channel is a single line
in tracing.js with no export changes needed.
@logaretm logaretm force-pushed the feat/tracing-channel-support branch from a730066 to 2f74ac0 Compare May 16, 2026 20:41
Exposes cursor-specific metadata in the mongoose:cursor:next channel
so APMs can profile batch performance and identify tailable cursors.
Copy link
Copy Markdown

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seems fine, but I would suggest undoing the removal of the async keyword in a few spots, and you may want to consider splitting up to more channels by source so subscribers can be a bit more granular.

Comment thread lib/model.js Outdated
*/

Model.insertMany = async function insertMany(arr, options) {
Model.insertMany = function insertMany(arr, options) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would result in the input validation error being thrown rather than rejected. This should probably be left as it was.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I think I dropped it while refactoring from anon functions...

Comment thread lib/model.js
Model.bulkWrite = function bulkWrite(ops, options) {
_checkContext(this, 'bulkWrite');

if (typeof options === 'function' ||
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want these either inside the traced function or restore the async function marker as otherwise these will throw rather than reject.

Comment thread lib/tracing.js
}

const traced = ch.tracePromise(fn, contextFactory());
traced.catch(noop);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you catching and ignoring the errors here? Are you trying to silence the unhandled rejection on that for some reason?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that tracePromise returns a new promise, so there is a chance that unhandledRejection is fired randomly if the promise rejects. Especially if there is more async work to be done after the traced call microtask, it's a defensive mechanism.

It doesn't silence the error as it is still returned to the user, if they handle it then it works, if they don't then it produces the warning as it would in their untraced path.

What do you think?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd leave it out, but I'm not against it. In my opinion the unhandledRejection event handling is a bit fragile, so I would not be against trying to silence it.

@logaretm
Copy link
Copy Markdown
Author

you may want to consider splitting up to more channels by source so subscribers can be a bit more granular.

Do you have channels in mind? the original proposal had:

  • mongoose:query
  • mongoose:aggregate
  • mongoose:save
  • mongoose:model (insertMany and bulkWrite)

Which felt like too much and vague sometimes on what the grouping needed to be for which operations. Do you have ideas on mind how to split them up? Right now there is a operation field which APMs can use to filter out what they don't want to handle but will still incur the overhead.

@Qard
Copy link
Copy Markdown

Qard commented May 20, 2026

I'd basically just raise the operation (whatever it may be) to be the channel name and allow listening to whichever operations one cares about. It's slightly more effort to listen to more channels, but it's a lot cheaper than branching/filtering on every message. The diagnostics_channel design with channel objects is intended to encourage making lots of granular channels and letting you subscribe to many different channels with the same subscriber if you want to. The subscriber receives the channel name as the second parameter, so you can filter with that when you use the same subscriber function on many channels. You only pay for as many channels as you subscribe to though, and you don't need to pay any cost if you only subscriber to one channel, which is presumed to be the default.

logaretm added 2 commits May 20, 2026 10:25
Aggregate operations now use a dedicated `mongoose:aggregate` channel
instead of sharing `mongoose:query`. This lets subscribers opt into
aggregate tracing independently, avoiding overhead for unrelated ops.
Each operation now has a dedicated channel so subscribers only pay
overhead for the operations they care about:

- mongoose:query (Query.exec)
- mongoose:aggregate (Aggregate.exec)
- mongoose:model:save
- mongoose:model:insertMany
- mongoose:model:bulkWrite
- mongoose:cursor:next

tracing.js exports createTracedChannel so each module manages its own
channel. Tests subscribe by channel name via diagnostics_channel directly.
@logaretm
Copy link
Copy Markdown
Author

I just split operations into dedicated TracingChannels so subscribers only pay overhead for what they
listen to:

  • mongoose:query Query.exec (find, findOne, updateOne, deleteOne, etc.)
  • mongoose:aggregate Aggregate.exec
  • mongoose:model:save document save
  • mongoose:model:insertMany insertMany
  • mongoose:model:bulkWrite bulkWrite
  • mongoose:cursor:next cursor iteration (query + aggregation)

@vkarpov15 Do you prefer this current split or did you think the previous state was better? The argument here is with fine grained channels, overhead is only incurred for the channels that the APM is interested in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TracingChannel Proposal for observability

4 participants