Java agent insturmentation added for Failsafe 3.0 by onurkybsi · Pull Request #15759 · open-telemetry/opentelemetry-java-instrumentation

onurkybsi · 2026-01-02T06:47:29Z

Only RetryPolicyInstrumentation added, CircuitBreakerInstrumentation will be added with another PR.

jaydeluca · 2026-01-28T11:59:17Z

+import net.bytebuddy.description.type.TypeDescription;
+import net.bytebuddy.matcher.ElementMatcher;
+
+final class RetryPolicyInstrumentation implements TypeInstrumentation {


Suggested change

final class RetryPolicyInstrumentation implements TypeInstrumentation {

public class RetryPolicyInstrumentation implements TypeInstrumentation {

Why is public needed?

that is how all other TypeInstrumentations in this project are, as well as how it is in the contributing docs

jaydeluca · 2026-01-28T12:01:07Z

+  public static final class BuildAdvice {
+    @Advice.OnMethodExit
+    public static void onExit(@Advice.Return Object retryPolicyImpl)
+        throws NoSuchFieldException, IllegalAccessException {


instead of allowing advice to throw exceptions which could break an application, we typically suppress them like:

Suggested change

public static final class BuildAdvice {

@Advice.OnMethodExit

public static void onExit(@Advice.Return Object retryPolicyImpl)

throws NoSuchFieldException, IllegalAccessException {

public static final class BuildAdvice {

@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)

public static void onExit(@Advice.Return Object retryPolicyImpl) {

jaydeluca · 2026-01-28T12:05:30Z

+      RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
+      FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());
+
+      Field failureListenerField = PolicyConfig.class.getDeclaredField("failureListener");


could we cache these lookups for better efficiency?

public static final class BuildAdvice { private static final Field failureListenerField; private static final Field successListenerField; static { try { failureListenerField = PolicyConfig.class.getDeclaredField("failureListener"); failureListenerField.setAccessible(true); successListenerField = PolicyConfig.class.getDeclaredField("successListener"); successListenerField.setAccessible(true); } catch (NoSuchFieldException e) { // not sure if we should ignore or throw here throw new IllegalStateException("Failed to initialize Reflective fields", e); } } ... etc

Wouldn't throwing the exception break the application?

jaydeluca · 2026-01-28T12:09:38Z

+  protected abstract RetryPolicy<Object> configure(RetryPolicy<Object> userRetryPolicy);
+
+  @Test
+  public void captureCircuitBreakerMetrics() {


Suggested change

public void captureCircuitBreakerMetrics() {

void captureCircuitBreakerMetrics() {

For now, I can't do this since there is no way to disable the test in FailsafeInstrumentationTest otherwise.

jaydeluca

looks like theres a merge conflict in case you haven't seen it (sorry it took so long for me to get back to this)

onurkybsi · 2026-03-10T18:55:12Z

looks like theres a merge conflict in case you haven't seen it (sorry it took so long for me to get back to this)

Hey @jaydeluca, could you re-approve please? I applied your suggestion but I had to take it back as mentioned above.

trask · 2026-03-11T00:11:28Z

+      }
+
+      RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
+      FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());


probably worth extracting FailsafeSingletons (see *Singletons pattern in other javaagent modules) so you can reuse the same FailsafeTelemetry instance all the time

trask · 2026-03-11T00:11:39Z

+    }
+
+    @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
+    public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException {


Suggested change

public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException {

public static void onExit(@Advice.Return Object retryPolicyImpl) {

trask · 2026-03-11T00:23:12Z

+   * @param <R> {@link RetryPolicyConfig}'s result type
+   * @return instrumented failure listener
+   */
+  public <R> EventListener<ExecutionCompletedEvent<R>> createInstrumentedFailureListener(


do library users need createInstrumentedFailureListener/createInstrumentedSuccessListener? if they're only needed for javaagent instrumentation, let's move them out of public API surface

laurit · 2026-03-11T08:44:20Z

+import java.util.List;
+
+@AutoService(InstrumentationModule.class)
+public class FailsafeInstrumentationModule extends InstrumentationModule


we have a convention that instrumentations that don't follow any semantic conventions are disabled by default

laurit · 2026-03-11T08:48:02Z

+      RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
+      FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());
+
+      FAILURE_LISTENER_FIELD.set(


why didn't you use the same strategy as in the library instrumentation where you build a new RetryPolicy from the configuration of the existing RetryPolicy?

Because FailsafeTelemetry.createRetryPolicy is using RetryPolicy.builder(...).build() and RetryPolicyInstrumentation.BuildAdvice.onExit runs on RetryPolicy.builder(...).build() exit. So, if I would do it, RetryPolicyInstrumentation.BuildAdvice.onExit would be called recursively again and again.

laurit · 2026-03-11T08:54:25Z

+
+      FAILURE_LISTENER_FIELD.set(
+          impl.getConfig(),
+          failsafeTelemetry.createInstrumentedFailureListener(impl.getConfig(), impl.toString()));


Do I understand that the value from impl.toString() will be added as an attribute to the metrics? I'd imagine that it won't be too useful as it won't let user figure out which policy this metric is about. @trask would such attribute be acceptable at all?

Hey, impl.toString would be used for failsafe.retry_policy.name attributes. In library instrumentation, user gives it through retryPolicyName parameter but I didn't know how to name the instrumented policies without user input. Any other suggestion for that?

what are the values of impl.toString()? is it something like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d? thanks

Yeah, that's true.

that's going to be high-cardinality (across restarts)

Could building a name using some config values be an option? As an example: RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S}.

I think you have to consider what is the use of these metrics. If you use something like RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S} then I'd guess this won't be unique and you could end up merging the values from multiple policies. Is that fine? If that is fine perhaps you don't need to add the retry policy name attribute at all. Would the metric still be useful this way?
impl.toString() isn't good because it has a high cardinality and also you won't be able to figure out which instrument it represents. Even policy-{counter} could be better in the sense that there at least is a chance to figure out which policy the metric belongs to if the policies are created in a predictable order. Though if they aren't then could also be completely misleading. Another possible alternative could be to capture where the policy was created from (perhaps some combination of class name, method name and line number), but this won't work if all the policies are created in a common util method. In short, there probably isn't a good option for producing a name.

Yeah, I get your points. What about adding policyName in the instrumented library(https://github.com/failsafe-lib/failsafe) and support only the versions that include this field for the Java agent, would that make sense?

hey, sure, that would work (we would still support existing versions, just wouldn't capture policy name unless available from the upstream library)

# Conflicts: # instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java # Conflicts: # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java

# Conflicts: # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java

trask

Purely AI generated review below

Review of the Failsafe 3.0 javaagent instrumentation + testing module extraction. A few correctness/style issues worth addressing:

javaagent advice: @Advice.OnMethodExit has onThrowable = Throwable.class but the body is return-only (it casts @Advice.Return and derives values from it). On the exceptional path the return is null and the cast triggers a (suppressed) exception for no benefit — drop onThrowable.
Default policy name in javaagent mode: impl.toString() is used as the retry-policy name, which will produce uninformative values like dev.failsafe.internal.RetryPolicyImpl@2f4d3e5a as the failsafe.retry_policy.name attribute. Worth considering a cleaner fallback (e.g., the implementation class simple name, or just leaving the attribute out when no user-supplied name exists).
Reflection-based mutation of PolicyConfig: writing to private failureListener/successListener fields is brittle and muzzle cannot protect against these fields being renamed/removed in a future Failsafe version. Worth noting in a comment and/or considering a fail muzzle block or explicit guard.
Style nits: catch-parameter naming (ignored over e for unused), @Nullable in the testing module, redundant FQN dev.failsafe.RetryPolicy.builder(), and settings.gradle.kts alphabetical ordering.

See inline comments.

trask · 2026-04-20T00:37:50Z

+include(":instrumentation:failsafe-3.0:testing")
 include(":instrumentation:failsafe-3.0:library")
 include(":instrumentation:finagle-http-23.11:javaagent")


[Build] settings.gradle.kts entries must be in alphabetical order within the group. library sorts before testing, so the three failsafe-3.0 lines should be ordered javaagent, library, testing.

Suggested change

include(":instrumentation:failsafe-3.0:testing")

include(":instrumentation:failsafe-3.0:library")

include(":instrumentation:finagle-http-23.11:javaagent")

include(":instrumentation:failsafe-3.0:javaagent")

include(":instrumentation:failsafe-3.0:library")

include(":instrumentation:failsafe-3.0:testing")

trask · 2026-04-20T00:37:50Z

+      SUCCESS_LISTENER_FIELD = successListenerField;
+    }
+
+    @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)


[Javaagent] Return-only exit advice should omit onThrowable. This advice only processes @Advice.Return (it has no @Advice.Enter state to clean up). When build() throws, retryPolicyImpl is null and the cast on L62 triggers a (suppressed) ClassCastException/NullPointerException for no benefit. Keep suppress = Throwable.class but drop onThrowable.

Suggested change

@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)

@Advice.OnMethodExit(suppress = Throwable.class)

trask · 2026-04-20T00:37:50Z

+        failureListenerField.setAccessible(true);
+
+        successListenerField = PolicyConfig.class.getDeclaredField("successListener");
+        successListenerField.setAccessible(true);


[Style] The caught exception is intentionally unused, so the preferred name is ignored (the // Ignored comment can be removed — the name itself conveys intent and IntelliJ's CatchMayIgnoreException inspection recognizes it).

Suggested change

successListenerField.setAccessible(true);

} catch (Exception ignored) {

// field layout differs from the expected Failsafe 3.0 layout

}

trask · 2026-04-20T00:37:50Z

+                GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString()));
+        SUCCESS_LISTENER_FIELD.set(
+            impl.getConfig(),
+            buildInstrumentedSuccessListener(
+                GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString()));


[General] Using impl.toString() as the retry-policy name will produce values like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d for the failsafe.retry_policy.name metric attribute — high-cardinality and not human-meaningful. Since RetryPolicyBuilder does not expose a user-supplied name, consider either omitting the name attribute when none was provided, or using a stable placeholder such as the simple class name. Also, GlobalOpenTelemetry.get() is called twice — a single local would be clearer.

Hey @trask, I proposed a solution regarding this above, let me know if it makes sense.

trask · 2026-04-20T00:37:51Z

+      }
+
+      try {
+        RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;


[Javaagent] This instrumentation mutates private fields of dev.failsafe.PolicyConfig (failureListener, successListener) via reflection. Muzzle cannot detect field renames/removals in future Failsafe versions, so a breaking change upstream will silently disable only the field-lookup static block (best case) or silently swap in listeners that clobber something unexpected (worst case). Consider:

Adding a comment documenting the tight coupling to the Failsafe 3.0 field layout.

Tightening the muzzle range (failsafe:[3.0,4) with a fail block for 4.0+) to force an explicit revisit when Failsafe 4 lands.

trask · 2026-04-20T00:37:51Z

+import java.time.Duration;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import javax.annotation.Nullable;


[Testing] Per the style guide, @Nullable should not be used in test code (this module is a testing/ shared test base). Drop the import and the @Nullable on the method parameter below.

trask · 2026-04-20T00:37:51Z

+                                    buildCircuitBreakerAssertion(
+                                        1, "failsafe.circuit_breaker.state", "closed"))));
+  }
+


[Testing] Remove @Nullable — test code should not use nullability annotations.

Suggested change

protected void captureRetryPolicyMetrics(String expectedPolicyName) {

trask · 2026-04-20T00:37:51Z

+
+  protected void captureRetryPolicyMetrics(@Nullable String expectedPolicyName) {
+    RetryPolicy<Object> userRetryPolicy =
+        dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();


[Style] RetryPolicy is already imported on line 15 — the fully qualified name here is redundant.

Suggested change

dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();

RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();

trask · 2026-04-20T00:37:51Z

+
+  private void captureRetryPolicyMetrics(
+      RetryPolicy<Object> instrumentedRetryPolicy, String expectedPolicyName) {
+    // given


[Style] Leftover // given comment with nothing under it — please remove.

trask · 2026-04-20T00:37:51Z

+  api("org.mockito:mockito-core")
+  api("org.mockito:mockito-junit-jupiter")


[Build] The abstract test base in this module does not reference Mockito — the mock(...)/when(...) usage lives only in the library's FailsafeTelemetryTest. Exposing mockito-core and mockito-junit-jupiter as api here leaks them into every downstream test module that consumes :instrumentation:failsafe-3.0:testing. Consider dropping these two dependencies from the shared testing module and letting individual test modules declare them as testImplementation themselves (the javaagent test doesn't need Mockito at all).

# Conflicts: # instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/RetryPolicyEventListenerBuilders.java # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java

onurkybsi requested a review from a team as a code owner January 2, 2026 06:47

onurkybsi force-pushed the failsafe-3.0-javaagent branch from c63d7f0 to 7d1d41b Compare January 27, 2026 06:36

jaydeluca reviewed Jan 28, 2026

View reviewed changes

onurkybsi requested a review from jaydeluca February 2, 2026 06:30

onurkybsi force-pushed the failsafe-3.0-javaagent branch from e07523a to f58ed1f Compare March 2, 2026 08:11

jaydeluca approved these changes Mar 9, 2026

View reviewed changes

Comment thread ...main/java/io/opentelemetry/instrumentation/failsafe/AbstractFailsafeInstrumentationTest.java

onurkybsi force-pushed the failsafe-3.0-javaagent branch from c7af16f to f5d866f Compare March 9, 2026 20:21

jaydeluca approved these changes Mar 10, 2026

View reviewed changes

trask reviewed Mar 11, 2026

View reviewed changes

laurit reviewed Mar 11, 2026

View reviewed changes

onurkybsi added 15 commits March 18, 2026 21:45

Failsafe testing module added

1fbe548

# Conflicts: # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java

Failsafe javaagent module added

347fdbc

Fossa configuration task executed

02c3c61

BuildAdvice was made public

dafe3f5

Attempt to fix failing pipeline task

5acf273

PR comments addressed

8f3e6fd

Compile issue fixed

a36f1d5

PR comments addressed

9332768

Minor changes after rebase main

c20d5be

Review comments addressed

b0a29f3

captureCircuitBreakerMetrics made public again

7095e91

Review comments addressed

c477ecd

Review comments addressed

e10b973

Javaagent tests fixed

c1b4af6

onurkybsi force-pushed the failsafe-3.0-javaagent branch from 6c7db3d to c1b4af6 Compare March 18, 2026 21:04

Unused buildRetryPolicyAssertion removed

9e7dd8e

Muzzle fix

47486f7

trask reviewed Apr 20, 2026

View reviewed changes

trask mentioned this pull request Apr 30, 2026

Pull Request Dashboard #18435

Open

github-actions Bot mentioned this pull request May 5, 2026

Pull Request Dashboard #18604

Closed

	final class RetryPolicyInstrumentation implements TypeInstrumentation {
	public class RetryPolicyInstrumentation implements TypeInstrumentation {

	public void captureCircuitBreakerMetrics() {
	void captureCircuitBreakerMetrics() {

	public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException {
	public static void onExit(@Advice.Return Object retryPolicyImpl) {

	@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
	@Advice.OnMethodExit(suppress = Throwable.class)

-        successListenerField.setAccessible(true);
+      } catch (Exception ignored) {
+        // field layout differs from the expected Failsafe 3.0 layout
+      }


	protected void captureRetryPolicyMetrics(String expectedPolicyName) {

	dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();
	RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();

		api("org.mockito:mockito-core")
		api("org.mockito:mockito-junit-jupiter")

Conversation

onurkybsi commented Jan 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

onurkybsi Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jaydeluca left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

onurkybsi commented Mar 10, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

onurkybsi Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trask left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

onurkybsi Feb 2, 2026 •

edited

Loading

onurkybsi Mar 19, 2026 •

edited

Loading