Skip to content

handle test failure#2443

Merged
laurit merged 1 commit intoopen-telemetry:mainfrom
jackshirazi:fix-inferred-spans-npe
Nov 20, 2025
Merged

handle test failure#2443
laurit merged 1 commit intoopen-telemetry:mainfrom
jackshirazi:fix-inferred-spans-npe

Conversation

@jackshirazi
Copy link
Copy Markdown
Contributor

fixes #2429

@jackshirazi jackshirazi requested a review from a team as a code owner November 18, 2025 12:14
@laurit
Copy link
Copy Markdown
Contributor

laurit commented Nov 18, 2025

I don't see why it is a GC issue. WeakConcurrentMap is an identity map. To get an element back from it you need to use the same key as was used to add the element. The element from map can't be collected as long as you have a reference to the key. So when you get null back from the map it would mean that the key you are using was never added. Could you help me understand what I'm missing and how GC comes to play.

@jackshirazi
Copy link
Copy Markdown
Contributor Author

jackshirazi commented Nov 18, 2025

I believe this comes down to inferred spans being created when the profiles are resolved, which could well be after the parent span has gone out of scope. It could use a more comprehensive fix since this could occur even not in tests, but we've not seen any issues and it seems rare enough that I think this is adequate

Actually, I see that this can't be the case because we're using the span as the key, as you said. I'll think some more

@jackshirazi
Copy link
Copy Markdown
Contributor Author

You're right, I've tried to work through all the possibilities I can think of

  • GC causing the element to be removed
  • Span identity changed from serialization or other reconstruction
  • Race condition for the test, causing getAnchor() to be called before onSpanStart()

I can't find any mechanism that's causing the null. Obviously I've missed something because this is stack trace below. In any case, I think the change here is a valid workaround

CallTreeSpanifyTest > testSpanification() FAILED
    java.lang.NullPointerException: Cannot invoke "java.lang.Long.longValue()" because the return value of "com.blogspot.mydailyjava.weaklockfree.WeakConcurrentMap.get(Object)" is null
        at io.opentelemetry.contrib.inferredspans.internal.SpanAnchoredClock.getAnchor(SpanAnchoredClock.java:43)
        at io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent.set(SamplingProfiler.java:877)
        at io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent.activation(SamplingProfiler.java:858)
        at io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler.lambda$new$1(SamplingProfiler.java:191)
        at com.lmax.disruptor.RingBuffer.translateAndPublish(RingBuffer.java:986)
        at com.lmax.disruptor.RingBuffer.tryPublishEvent(RingBuffer.java:538)
        at io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler.onActivation(SamplingProfiler.java:321)
        at io.opentelemetry.contrib.inferredspans.internal.ProfilingActivationListener.beforeActivate(ProfilingActivationListener.java:121)
        at io.opentelemetry.contrib.inferredspans.internal.ProfilingActivationListener$ContextStorageWrapper.attach(ProfilingActivationListener.java:61)
        at io.opentelemetry.context.Context.makeCurrent(Context.java:231)
        at io.opentelemetry.context.ImplicitContextKeyed.makeCurrent(ImplicitContextKeyed.java:33)
        at io.opentelemetry.contrib.inferredspans.internal.CallTreeTest.getCallTree(CallTreeTest.java:1005)
        at io.opentelemetry.contrib.inferredspans.internal.CallTreeSpanifyTest.testSpanification(CallTreeSpanifyTest.java:50)

Copy link
Copy Markdown
Contributor

@SylvainJuge SylvainJuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is he effect of returning 0 here? Is this case already properly handled by callers of getAnchor method ?

@jackshirazi
Copy link
Copy Markdown
Contributor Author

What is he effect of returning 0 here? Is this case already properly handled by callers of getAnchor method ?

The FixedClock always return 0 for this, so it's consistent with that. In production, this would cause the inferred child span to have the wrong timestamp (though correct duration). Given that this failure only occurs very rarely, and I suspect is a test setup race case rather than a real one (eg the test creates an inactive profiler but artificially sets it's profilingSessionOngoing to true), I think it's fine to run with.

@jackshirazi jackshirazi changed the title handle GCed value handle test failure Nov 19, 2025
@laurit laurit merged commit 89a7086 into open-telemetry:main Nov 20, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inferred spans: sporadic test failure

3 participants