fix: emit telemetry eagerly in dev --logs to avoid SIGINT race#1372
fix: emit telemetry eagerly in dev --logs to avoid SIGINT race#1372Hweinstock wants to merge 1 commit into
Conversation
The global SIGINT handler in cli.ts calls process.exit(0) synchronously, which terminates the process before withCommandRunTelemetry can flush. Emit telemetry before the blocking server loop, matching the existing browser-mode pattern.
|
Claude Security Review: no high-confidence findings. (run) |
Package TarballHow to installgh release download pr-1372-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.14.2.tgz |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for the fix — diagnosis of the SIGINT race is correct, and the eager-emit pattern matches the existing browser-mode code path so it's reasonable to follow that precedent.
Two concerns before merging:
-
Failure telemetry regression. The previous
withCommandRunTelemetrywrapper recordedexit_reason: 'failure'(with classifiederror_name/error_source) whenserver.start()rejected oronExitfired with a non-zero code. After this change, those failures are emitted assuccessbecause the eager emit happens before startup, and the outercatch (error)at the bottom of the handler doesn't emit any failure telemetry. We not only lose failure signal, we actively mislabel crashes as successes. -
No test for the fix. The existing integ test for
dev --logs(integ-tests/dev-server.test.ts) spawns the dev server withouttelemetry.envand only asserts telemetry ondev <prompt>invocations, so nothing prevents this from regressing again. Since the whole point of the PR is to make telemetry actually fire on this code path, it would be good to extend that test (spawn withtelemetry.env, send SIGTERM/SIGINT, assert thedev_action: 'server',ui_mode: 'terminal'metric was emitted).
Details inline.
| client.emit('cli.command_run', 0, { | ||
| command_group: 'dev', | ||
| command: 'dev', | ||
| exit_reason: 'success', |
There was a problem hiding this comment.
Hardcoding exit_reason: 'success' here loses the failure-telemetry behavior the previous withCommandRunTelemetry wrapper provided. If server.start() rejects (e.g. binary not found, port grabbed between findAvailablePort and server.start), or onExit fires with a non-zero code, the await new Promise(...) below throws, the outer catch (error) at line 513 prints the error and exits 1 — but telemetry has already recorded this as a successful run. Previously trackCommandRun/classifyError would have emitted exit_reason: 'failure' with error_name and error_source.
A few ways to fix:
- (a) Move the eager success emit to just before the
await new Promiseand wrapserver.start()/ startup in a try-block such that failures emit a separate failure event (and skip the success emit). - (b) Don't emit eagerly — instead, refactor so the dev command's SIGINT handler can perform async cleanup (e.g. teach
setupGlobalCleanupto await registered async cleanups beforeprocess.exit), then keepwithCommandRunTelemetryand get correct success/failure telemetry for free. Bigger change but fixes the root cause and would also help browser-mode. - (c) Accept the regression and document it in a comment, mirroring the note above
runBrowserMode. Least work but means we lose error classification on dev-server startup failures.
My preference would be (a) since it's localized and preserves the existing failure-telemetry contract.
| } | ||
| ); | ||
| if (!devResult.success) throw devResult.error; | ||
| } |
There was a problem hiding this comment.
No test coverage for this fix. integ-tests/dev-server.test.ts already spawns agentcore dev --logs but doesn't pass telemetry.env and only asserts telemetry from the separate dev <prompt> invoke calls, so nothing currently verifies that the SIGINT fix works or guards against it regressing. Could you extend that test to spawn with telemetry.env, send SIGINT, wait for exit, and telemetry.assertMetricEmitted({ command: 'dev', dev_action: 'server', ui_mode: 'terminal', exit_reason: 'success' })?
Description
dev --logsnever emits telemetry because the global SIGINT handler incli.ts(setupGlobalCleanup) callsprocess.exit(0)synchronously, terminating the process beforewithCommandRunTelemetrycan flush. The dev command's own SIGINT handler (registered later) never runs.This change emits telemetry eagerly before the blocking server loop, matching the existing browser-mode pattern which has the same constraint.
Related Issue
Closes #
Documentation PR
N/A
Type of Change
Testing
How have you tested the change?
npm run test:unitandnpm run test:integnpm run typechecknpm run lintsrc/assets/, I rannpm run test:update-snapshotsand committed the updated snapshotsChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.