feat: Add tracing channel support#16275
Conversation
Add first-class Node.js TracingChannel (diagnostics_channel) support so APM tools can instrument Mongoose without monkey-patching. Four channels covering all operation types: - mongoose:query (Query.exec - find, update, delete, count, distinct) - mongoose:aggregate (Aggregate.exec) - mongoose:save (Model.prototype.save) - mongoose:model (insertMany, bulkWrite) Zero-cost when no subscribers are registered. Ref: Automattic#16105
Collapse 4 channels into one. All operations (find, save, aggregate, insertMany, bulkWrite) emit on mongoose:query with a common payload: operation, collection, database, serverAddress, serverPort, args. The args field carries operation-specific data (filter, pipeline, docs, ops) without imposing a typed shape — APM tools that need it can dig in.
There was a problem hiding this comment.
Pull request overview
This PR adds first-class Node.js diagnostics_channel TracingChannel instrumentation to Mongoose using a single channel (mongoose:query) so APMs/observability tooling can trace Mongoose operations without monkey-patching.
Changes:
- Introduces a new internal tracing helper (
lib/tracing.js) that wraps promise-returning operations withTracingChannel#tracePromise(). - Wraps core execution paths (
Query#exec(),Aggregate#exec(),Model#save(),Model.insertMany(),Model.bulkWrite()) to emit tracing lifecycle events with operation + connection + args context. - Adds exported TypeScript types for the tracing payload and a comprehensive mocha test suite validating emitted events.
Reviewed changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| types/tracing.d.ts | Adds the exported TracingContext type for tracing consumers. |
| types/index.d.ts | Wires the new tracing type file into the published type entrypoint. |
| lib/tracing.js | Adds the internal tracing channel + trace() wrapper helper. |
| lib/query.js | Wraps Query.prototype.exec() in trace() and emits query context. |
| lib/aggregate.js | Wraps Aggregate.prototype.exec() in trace() and emits aggregate context. |
| lib/model.js | Wraps save, insertMany, and bulkWrite in trace() and emits model-operation context. |
| test/tracing.test.js | Adds tests validating tracing lifecycle events across operation types. |
|
Copilot deciding not to wipe the floor with this PR with endless comments is a rare one. |
vkarpov15
left a comment
There was a problem hiding this comment.
This is very neat! With a little bit of work I was able to get some basic spans working with HyperDX:
and desktop-otel-viewer:
I just had some high level questions/concerns. Sorry if any of them seem nitpicky, I'm just trying to understand as much of the potential consequences as I can because I'm not an expert on APM.
| */ | ||
|
|
||
| Aggregate.prototype[Symbol.asyncIterator] = function() { | ||
| return this.cursor({ useMongooseAggCursor: true }).transformNull()._transformForAsyncIterator(); |
There was a problem hiding this comment.
How do you recommend we handle cursors for the purposes of tracing channels? Cursors are effectively query iterators: they let you call next() to get the next batch of results to avoid loading the full result set into memory at once. Cursors can also be used as streams or with for/await/of. Should we wrap cursor next() in trace() as well?
There was a problem hiding this comment.
Maybe @Qard will have some insights here, tracing channels lifecycle model doesn't work with async iterators because they don't exactly have a known start/finish signals and both the start and finish may happen in different async contexts.
so we have two options, either:
- Trace
nextcalls, allowing APMs to create a span for each call or async tick in this case. - Wrap cursors with a trace to cover the whole thing, but there is no guarantee the iterator will be fully consumed, so if an API creates a span for it, it may never end and will block any child spans from being reported depending on the implementation.
This is very similar to streams, tracing them is tricky because u either create a span right away and patch its end time, but u risk extending your buffer duration for an unknown time, blocking child spans. Or you create a span later when it is done and override the start time, but u risk creating a span without any trace relationship. Both aren't great options.
I would say we can set them aside for now, and once we figure out a way for it we can create a new channel for them. @Qard, thoughts on this?
There was a problem hiding this comment.
Given this, I think the best approach for now would be to wrap next(). So given const cursor = await User.find().cursor(), you could see how long each next() call took. Is there any downside to tracing next()?
Ideally, it would be great if the cursor itself was a span, and next() calls as sub-spans of the cursor itself. But you are right that there's no guarantee that the cursor will be fully consumed. For example, change streams are effectively cursors, and those can last for the entire application lifecycle.
There was a problem hiding this comment.
I don't think there are any downsides, and it really drives the point of all of this effort here: libraries should own their observability and expose what they believe is useful, APMs can then pick and choose.
I think we could create a specific channel for the next calls to allow APMs to be register their interest or skip it. Since I think cursors can be noisy, right?
Like a mongoose:cursor:next which leaves the door open to another channel in the future that overarches the entire cursor lifecycle if we figure out a way to do it. I will add the next tracing right now.
There was a problem hiding this comment.
Yeah, iterators are not yet a solved problem for diagnostics_channel. There is work in nodejs/node#62641 to address that need now that async iterators are becoming a common interface for AI SDKs, and that can apply similarly to tracing libraries. If you do want to do something with that now, I would suggest looking at that Node.js core PR and try to align with that.
| Model.bulkWrite = function bulkWrite(ops, options) { | ||
| _checkContext(this, 'bulkWrite'); | ||
|
|
||
| if (typeof options === 'function' || |
There was a problem hiding this comment.
What are the trade-offs of including these sort of error checks within the trace vs outside of the trace? For example, if user passes in a callback, as written that error would bypass tracing.
There was a problem hiding this comment.
I think this is a design choice, I saw those as API validation so no actual work is being done but maybe you see it differently and want APMs to report those.
Up to you, I don't have strong opinions on this one.
There was a problem hiding this comment.
You'll want these either inside the traced function or restore the async function marker as otherwise these will throw rather than reject.
|
Thanks! I have a couple of more questions: #16275 (comment) , #16275 (comment) . Otherwise I think this should be ready to go for 9.7 release. |
|
Thanks for the review and comments, I replied to the last two! |
Adds a dedicated `mongoose:cursor:next` channel that traces each `next()` call on both QueryCursor and AggregationCursor. Kept separate from `mongoose:query` so APMs can subscribe independently and a future `mongoose:cursor` parent span channel won't conflict. Also refactors tracing.js: bulletproof dc import with process guard and try/catch, and a createTracedChannel factory to eliminate duplication.
Consumers import { queryChannel } or { cursorNextChannel } and use
.trace() or .channel directly. Adding a new channel is a single line
in tracing.js with no export changes needed.
a730066 to
2f74ac0
Compare
Exposes cursor-specific metadata in the mongoose:cursor:next channel so APMs can profile batch performance and identify tailable cursors.
Qard
left a comment
There was a problem hiding this comment.
Mostly seems fine, but I would suggest undoing the removal of the async keyword in a few spots, and you may want to consider splitting up to more channels by source so subscribers can be a bit more granular.
| */ | ||
|
|
||
| Model.insertMany = async function insertMany(arr, options) { | ||
| Model.insertMany = function insertMany(arr, options) { |
There was a problem hiding this comment.
This would result in the input validation error being thrown rather than rejected. This should probably be left as it was.
There was a problem hiding this comment.
Good catch, I think I dropped it while refactoring from anon functions...
| Model.bulkWrite = function bulkWrite(ops, options) { | ||
| _checkContext(this, 'bulkWrite'); | ||
|
|
||
| if (typeof options === 'function' || |
There was a problem hiding this comment.
You'll want these either inside the traced function or restore the async function marker as otherwise these will throw rather than reject.
| } | ||
|
|
||
| const traced = ch.tracePromise(fn, contextFactory()); | ||
| traced.catch(noop); |
There was a problem hiding this comment.
Why are you catching and ignoring the errors here? Are you trying to silence the unhandled rejection on that for some reason?
There was a problem hiding this comment.
I noticed that tracePromise returns a new promise, so there is a chance that unhandledRejection is fired randomly if the promise rejects. Especially if there is more async work to be done after the traced call microtask, it's a defensive mechanism.
It doesn't silence the error as it is still returned to the user, if they handle it then it works, if they don't then it produces the warning as it would in their untraced path.
What do you think?
There was a problem hiding this comment.
Personally, I'd leave it out, but I'm not against it. In my opinion the unhandledRejection event handling is a bit fragile, so I would not be against trying to silence it.
Do you have channels in mind? the original proposal had:
Which felt like too much and vague sometimes on what the grouping needed to be for which operations. Do you have ideas on mind how to split them up? Right now there is a |
|
I'd basically just raise the operation (whatever it may be) to be the channel name and allow listening to whichever operations one cares about. It's slightly more effort to listen to more channels, but it's a lot cheaper than branching/filtering on every message. The diagnostics_channel design with channel objects is intended to encourage making lots of granular channels and letting you subscribe to many different channels with the same subscriber if you want to. The subscriber receives the channel name as the second parameter, so you can filter with that when you use the same subscriber function on many channels. You only pay for as many channels as you subscribe to though, and you don't need to pay any cost if you only subscriber to one channel, which is presumed to be the default. |
Aggregate operations now use a dedicated `mongoose:aggregate` channel instead of sharing `mongoose:query`. This lets subscribers opt into aggregate tracing independently, avoiding overhead for unrelated ops.
Each operation now has a dedicated channel so subscribers only pay overhead for the operations they care about: - mongoose:query (Query.exec) - mongoose:aggregate (Aggregate.exec) - mongoose:model:save - mongoose:model:insertMany - mongoose:model:bulkWrite - mongoose:cursor:next tracing.js exports createTracedChannel so each module manages its own channel. Tests subscribe by channel name via diagnostics_channel directly.
|
I just split operations into dedicated
@vkarpov15 Do you prefer this current split or did you think the previous state was better? The argument here is with fine grained channels, overhead is only incurred for the channels that the APM is interested in. |
This PR explores adding tracing channels to enable agnostic observability for users and APMs alike without having to resort to monkey patching or module patchers.
Each operation type has its own dedicated
TracingChannel, so subscribers only pay overhead for the operations they care about. The channels are:mongoose:queryfind,findOne,updateOne,deleteOne,deleteMany,count,distinct, etc.mongoose:aggregateaggregatemongoose:model:savesavemongoose:model:insertManyinsertManymongoose:model:bulkWritebulkWritemongoose:cursor:nextPayload
All channels emit the following context:
operationstringfind,updateOne,aggregate)collectionstringdatabasestringserverAddressstringserverPortnumberargsRecord<string, any>I also exported the types for use by consumers.
Example Consumer
this is what it looks like after consuming the events in an APM, which is inline with current instrumentations today.
closes #16105