feat(observability): add @sourceloop/observability with OTEL-first bootstrap, profiles, and pluggable instrumentation#2502
feat(observability): add @sourceloop/observability with OTEL-first bootstrap, profiles, and pluggable instrumentation#2502samarpan-b wants to merge 3 commits intomasterfrom
Conversation
…observability and tracing gh-0
Remediation Agent Summary 📊
Suggested fixes (4)
Issues requiring manual fix (2)
🤖 Agent created PR #2503
|
There was a problem hiding this comment.
Pull request overview
Adds a new @sourceloop/observability package that bootstraps OpenTelemetry tracing early in service startup, with optional LoopBack integration (bindings + extension point) and a small helper API for spans/correlation/errors.
Changes:
- Introduces a new
packages/observabilityworkspace package (bootstrap/runtime, LoopBack component, profiles, config resolution/validation, tracing helpers). - Adds built-in OTLP-based profiles (
default,newrelic,signoz,datadog) and optional auto-instrumentation loading via optional peer deps. - Adds unit/integration tests and wires the new package into the monorepo workspace + lockfile.
Reviewed changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/observability/tsconfig.json | TypeScript build configuration for the new package |
| packages/observability/src/types.ts | Public types for config, profiles, runtime, and instrumentations |
| packages/observability/src/tracing.ts | withSpan / addSpanAttributes helpers |
| packages/observability/src/runtime.ts | Singleton runtime state management helpers |
| packages/observability/src/profiles/signoz.profile.ts | SigNoz OTLP defaults profile |
| packages/observability/src/profiles/registry.service.ts | LoopBack extension-point registry for profiles |
| packages/observability/src/profiles/otlp.profile.ts | Base OTLP profile: resource/sampler/exporter + auto-instrumentations |
| packages/observability/src/profiles/newrelic.profile.ts | New Relic OTLP defaults profile |
| packages/observability/src/profiles/keys.ts | LoopBack extension-point keys + binding template helper |
| packages/observability/src/profiles/instrumentations.ts | Optional peer-dep instrumentation loading + toggles |
| packages/observability/src/profiles/index.ts | Public exports for profiles |
| packages/observability/src/profiles/datadog.profile.ts | Datadog OTLP defaults profile |
| packages/observability/src/keys.ts | LoopBack binding keys for config/runtime/resolved config |
| packages/observability/src/index.ts | Package public API surface |
| packages/observability/src/errors.ts | recordException span helper |
| packages/observability/src/correlation.ts | Trace context propagation/correlation helpers |
| packages/observability/src/config/validate-config.ts | Startup validation (dependencies + coherent toggles) |
| packages/observability/src/config/resolve-config.ts | Env/bootstrap/DI config resolution logic |
| packages/observability/src/config/defaults.ts | Default resolved config + default instrumentation toggles |
| packages/observability/src/component.ts | Optional LoopBack component wiring + built-in profile bindings |
| packages/observability/src/bootstrap.ts | Core bootstrap/shutdown/runtime creation |
| packages/observability/src/tests/unit/validate-config.unit.ts | Unit test for config validation |
| packages/observability/src/tests/unit/resolve-config.unit.ts | Unit test for config resolution precedence |
| packages/observability/src/tests/unit/profile-defaults.unit.ts | Unit tests for built-in profile defaults |
| packages/observability/src/tests/unit/instrumentations.unit.ts | Unit tests for instrumentation loading/enabling |
| packages/observability/src/tests/unit/README.md | Unit test folder marker doc |
| packages/observability/src/tests/integration/runtime.integration.ts | Integration test for runtime initialization + helpers |
| packages/observability/src/tests/integration/README.md | Integration test folder marker doc |
| packages/observability/src/tests/acceptance/README.md | Acceptance test folder marker doc |
| packages/observability/package.json | New package manifest (deps/peers/scripts) |
| packages/observability/README.md | Package documentation (usage, resolution model, profiles) |
| packages/observability/.prettierrc | Prettier config for the new package |
| packages/observability/.prettierignore | Prettier ignore rules |
| packages/observability/.gitignore | Package-level ignore rules |
| packages/observability/.eslintrc.js | ESLint config (typed linting) |
| packages/observability/.eslintignore | ESLint ignore rules |
| package.json | Adds packages/observability/ to workspaces |
| package-lock.json | Lockfile updates for the new workspace package and dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return { | ||
| exporterName: loadOtlpExporter.name, | ||
| tracerProvider, | ||
| }; |
There was a problem hiding this comment.
ProfileInitResult.exporterName is currently hard-coded to loadOtlpExporter.name, even though the exporter instance is created via context.createExporter(config). This can report a misleading exporter name whenever a caller supplies a custom createExporter (the integration test does this with InMemorySpanExporter). Consider deriving exporterName from context.createExporter or from the created exporter instance (e.g., constructor name) so it reflects what was actually used.
| function assertDependencyInstalled(moduleName: string, message: string): void { | ||
| try { | ||
| require.resolve(moduleName); | ||
| } catch { | ||
| throw new Error(message); | ||
| } | ||
| } | ||
|
|
||
| function isDependencyInstalled(moduleName: string): boolean { | ||
| try { | ||
| require.resolve(moduleName); | ||
| return true; | ||
| } catch { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
validateObservabilityConfig re-implements dependency resolution helpers (assertDependencyInstalled / isDependencyInstalled), while profiles/instrumentations.ts already contains very similar isModuleInstalled. Consider consolidating these helpers in one place to avoid divergence (e.g., inconsistent module resolution behavior) as new instrumentations are added.
| export function getRuntimeState(): RuntimeState { | ||
| return state; | ||
| } |
There was a problem hiding this comment.
getRuntimeState() returns the mutable singleton state object by reference. Because getRuntimeState is exported from the package public API, external callers can mutate internal runtime state without going through updateRuntimeState() / clearRuntimeState(), which can lead to hard-to-debug behavior (especially in tests). Consider returning a read-only view (e.g., Readonly<RuntimeState>) and/or a shallow copy to prevent accidental mutation.
| const exporterProtocol = (bootstrapOverrides?.exporterProtocol ?? | ||
| (env.OTEL_EXPORTER_OTLP_PROTOCOL as ExporterProtocol | undefined) ?? | ||
| DEFAULT_OBSERVABILITY_CONFIG.exporterProtocol) as ExporterProtocol; | ||
|
|
||
| const sampler = (bootstrapOverrides?.sampler ?? | ||
| (env.OTEL_TRACES_SAMPLER as SamplerName | undefined) ?? | ||
| DEFAULT_OBSERVABILITY_CONFIG.sampler) as SamplerName; | ||
|
|
There was a problem hiding this comment.
ResolvedObservabilityConfig.exporterProtocol is treated as a discriminant in loadOtlpExporter, but the value is pulled from env and then force-cast to ExporterProtocol in resolveBootstrapConfig(). If the env var contains an unexpected value, the code silently falls back to the HTTP exporter branch instead of failing fast. Consider validating exporterProtocol (and sampler) against the allowed set during config resolution or in validateObservabilityConfig, and throw a clear startup error for invalid values.
| function loadOtlpExporter(config: ResolvedObservabilityConfig): SpanExporter { | ||
| if (config.exporterProtocol === 'grpc') { | ||
| const otlpGrpc = require('@opentelemetry/exporter-trace-otlp-grpc') as { | ||
| OTLPTraceExporter: new (options?: object) => SpanExporter; | ||
| }; | ||
|
|
||
| return new otlpGrpc.OTLPTraceExporter({ | ||
| url: config.otlpEndpoint, | ||
| metadata: Object.keys(config.otlpHeaders).length | ||
| ? config.otlpHeaders | ||
| : undefined, | ||
| }); | ||
| } | ||
|
|
||
| const otlpHttp = require('@opentelemetry/exporter-trace-otlp-http') as { | ||
| OTLPTraceExporter: new (options?: object) => SpanExporter; | ||
| }; |
There was a problem hiding this comment.
loadOtlpExporter() uses require() and types the exporter constructor as new (options?: object) => SpanExporter, which removes compile-time checking of exporter options. This makes it easy to pass an invalid config shape (e.g., gRPC vs HTTP header/metadata options) without TypeScript catching it. Prefer using typed imports (or typeof import(...)) for the exporter modules so the options object is type-checked.
…plexity from #2502 (#2503) * fix: Commit 1 - Fully fix typescript:S1541 Commit 1 of SonarQube suggestions Fully fixed issues: - [typescript:S1541] AZ2I0JCED4ljEi2GS_-X: Function has a complexity of 13 which is greater than 10 authorized. - [typescript:S1541] AZ2I0JCED4ljEi2GS_-Y: Function has a complexity of 21 which is greater than 10 authorized. Generated by SonarQube Agent * fix: Commit 2 - Fully fix typescript:S1541, typescript:S134 Commit 2 of SonarQube suggestions Fully fixed issues: - [typescript:S1541] AZ2I0JE5D4ljEi2GS_-c: Function has a complexity of 14 which is greater than 10 authorized. - [typescript:S134] AZ2I0JE5D4ljEi2GS_-d: Refactor this code to not nest more than 3 if/for/while/switch/try statements. Generated by SonarQube Agent --------- Co-authored-by: sonarqube-agent[bot] <210722872+sonarqube-agent[bot]@users.noreply.github.com>
SonarQube reviewer guideSummary: Introduces a new Review Focus:
Start review at:
|
| } | ||
|
|
||
| initialize( | ||
| config: ResolvedObservabilityConfig, |
There was a problem hiding this comment.
I noticed that bootstrapObservability() in bootstrap.ts already calls profile.applyDefaults(resolvedConfig) before invoking profile.initialize(). Since this override calls this.applyDefaults(config) again before delegating to super, applyDefaults ends up running twice — headers get merged twice and vendor.apm gets set twice. The same pattern exists in DatadogObservabilityProfile and SignozObservabilityProfile.
Removing the initialize() override here (and in Datadog/Signoz) should be enough — BaseOtlpObservabilityProfile.initialize handles initialization fully, and applyDefaults is already called by the bootstrap flow.
// applyDefaults() is already called by bootstrapObservability() before initialize() is invoked.
// The base class implementation is sufficient — no need to override initialize() here.
export class NewRelicObservabilityProfile extends BaseOtlpObservabilityProfile {
name: ObservabilityProfileName = 'newrelic';
applyDefaults(config: ResolvedObservabilityConfig): ResolvedObservabilityConfig {
const licenseKey = process.env.NEW_RELIC_LICENSE_KEY?.trim();
return {
...config,
otlpEndpoint:
config.otlpEndpoint ??
(config.exporterProtocol === 'grpc'
? 'https://otlp.nr-data.net:4317'
: 'https://otlp.nr-data.net:4318/v1/traces'),
otlpHeaders: {
...(licenseKey ? {'api-key': licenseKey} : {}),
...config.otlpHeaders,
},
resourceAttributes: {
'vendor.apm': 'newrelic',
...config.resourceAttributes,
},
};
}
}With the double call, the config.otlpHeaders spread runs twice, which can put headers in a state that's tricky to trace back to the source.
| return await fn(); | ||
| } catch (error) { | ||
| span.recordException(error as Error); | ||
| throw error; |
There was a problem hiding this comment.
The catch block records the exception but doesn't follow up with span.setStatus({ code: SpanStatusCode.ERROR }). Per the OpenTelemetry spec, recording an exception and setting the span status are two separate operations — the exception is attached as an event on the span, but the span status stays UNSET unless explicitly updated. Most trace backends (Jaeger, Grafana Tempo, Datadog, New Relic) use the span status to colour and surface error traces, so without this, spans that throw would appear successful in dashboards.
Worth noting that recordException in errors.ts already handles this correctly — withSpan just needs to match it:
import {context, trace, SpanStatusCode} from '@opentelemetry/api';
// inside the catch block:
} catch (error) {
span.recordException(error as Error);
span.setStatus({
code: SpanStatusCode.ERROR,
message: error instanceof Error ? error.message : String(error),
});
throw error;
} finally {
span.end();
}This is the one thing I'd consider a hard blocker — without it the package would make errors harder to find, which is the opposite of what observability is for.
| validateObservabilityConfig(profileConfig); | ||
|
|
||
| const initResult = profile.initialize(profileConfig, { | ||
| createExporter: createOtlpExporter, |
There was a problem hiding this comment.
Two related reliability gaps I wanted to flag together here.
First — if profile.initialize() throws (an instrumentation ABI mismatch, a tracerProvider.register() failure, a missing peer dep slipping past validation), updateRuntimeState({profile, config}) has already run but updateRuntimeState({runtime}) hasn't. A subsequent call to bootstrapObservability() then sees runtime === undefined, re-enters against corrupted partial state, and the original error is never logged. This can be surprisingly hard to diagnose in production.
Second — await runtime.shutdown() in shutdownObservability() isn't wrapped in a try-catch. The OTel SDK's tracerProvider.shutdown() can reject on flush timeout, which becomes an unhandled rejection in a SIGTERM handler and crashes the process in Node 15+.
// bootstrapObservability — wrap initialization in try-catch
try {
const profile = resolveProfile(resolvedConfig.profile);
const profileConfig = profile.applyDefaults(resolvedConfig);
validateObservabilityConfig(profileConfig);
const initResult = profile.initialize(profileConfig, {
createExporter: createOtlpExporter,
});
const runtime: ObservabilityRuntime = {
enabled: true,
profile: profile.name,
config: profileConfig,
tracerProvider: initResult.tracerProvider,
async shutdown() {
await profile.shutdown?.();
clearRuntimeState();
},
};
updateRuntimeState({profile, config: profileConfig, runtime});
return runtime;
} catch (err) {
clearRuntimeState(); // prevent corrupted partial state on retry
console.error('[observability] Initialization failed. Tracing will be disabled.', err);
throw err;
}
// shutdownObservability — guard against rejection
export async function shutdownObservability(): Promise<void> {
const runtime = getRuntimeState().runtime;
if (!runtime) return;
try {
await runtime.shutdown();
} catch (err) {
// Log but don't rethrow — a shutdown error must not crash
// the process during SIGTERM handling.
console.error(
'[observability] Error during shutdown. Some spans may not have been flushed.',
err,
);
}
}The clearRuntimeState() on failure is the key part — it ensures a retry starts clean rather than against partial state.
There was a problem hiding this comment.
Really impressive work here — the 3-tier config pipeline, the BaseOtlpObservabilityProfile abstraction shared cleanly across all four vendor profiles, and the lazy require() approach for optional instrumentation are all the right design choices for a package like this. The ProfileBootstrapContext injection seam is a particularly nice touch that makes testing straightforward without pulling in real OTel packages.
A few things I've flagged inline that I'd like to see addressed before this merges:
Worth resolving before merge:
- Double
applyDefaultsinNewRelicObservabilityProfile,DatadogObservabilityProfile, andSignozObservabilityProfile—bootstrap.tsalready callsapplyDefaultsbeforeinitialize, so the overrides run it twice SpanStatusCode.ERRORmissing inwithSpancatch block — without this, error spans appear successful in every trace backend (Jaeger, Datadog, Grafana Tempo, etc.)- No error handling around
profile.initialize()inbootstrapObservability()— a failure there leaves the module singleton in a corrupted partial state with no logging; andshutdownObservability()needs a try-catch guard sincetracerProvider.shutdown()can reject in a SIGTERM handler
Happy to defer to a follow-up:
ObservabilityProfileRegistryis wired into the component butbootstrapObservability()never consults it — a custom profile registered via LoopBack DI is silently ignored today. Worth either connecting it or leaving a comment noting it's planned for a later iteration- Exporter peer deps (
@opentelemetry/exporter-trace-otlp-http,-grpc) don't have the same pre-flight check that instrumentation packages get invalidateObservabilityConfig— a missing exporter gives a rawMODULE_NOT_FOUNDinstead of a helpful install prompt - Test coverage for the idempotency guard,
shutdownObservability(), and thewithSpanerror rethrow path
Also worth confirming — are the prettier failures and Node 24 EBADENGINE warnings in CI pre-existing on the base branch, or introduced here?



Description
This PR introduces a new reusable package, @sourceloop/observability, to provide a generic OpenTelemetry-first observability layer for Sourceloop services.
The package is designed around:
Configuration model
Startup-critical observability config is resolved during bootstrap from:
LoopBack DI config is still supported, but only for component-level enrichment rather than early startup control.
Instrumentation behavior
By default, only:
are enabled.
All other built-in instrumentations are disabled by default and must be explicitly enabled through bootstrap config or env vars.
Instrumentation packages are optional peer dependencies. If an instrumentation is enabled but its package is not installed, bootstrap fails fast with a clear validation error.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Verified locally with:
Checklist: