Skip to content

Java agent insturmentation added for Failsafe 3.0#15759

Open
onurkybsi wants to merge 18 commits into
open-telemetry:mainfrom
onurkybsi:failsafe-3.0-javaagent
Open

Java agent insturmentation added for Failsafe 3.0#15759
onurkybsi wants to merge 18 commits into
open-telemetry:mainfrom
onurkybsi:failsafe-3.0-javaagent

Conversation

@onurkybsi

Copy link
Copy Markdown
Contributor

Only RetryPolicyInstrumentation added, CircuitBreakerInstrumentation will be added with another PR.

@onurkybsi onurkybsi requested a review from a team as a code owner January 2, 2026 06:47
@onurkybsi onurkybsi force-pushed the failsafe-3.0-javaagent branch from c63d7f0 to 7d1d41b Compare January 27, 2026 06:36
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

final class RetryPolicyInstrumentation implements TypeInstrumentation {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final class RetryPolicyInstrumentation implements TypeInstrumentation {
public class RetryPolicyInstrumentation implements TypeInstrumentation {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is public needed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is how all other TypeInstrumentations in this project are, as well as how it is in the contributing docs

Comment on lines +34 to +37
public static final class BuildAdvice {
@Advice.OnMethodExit
public static void onExit(@Advice.Return Object retryPolicyImpl)
throws NoSuchFieldException, IllegalAccessException {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of allowing advice to throw exceptions which could break an application, we typically suppress them like:

Suggested change
public static final class BuildAdvice {
@Advice.OnMethodExit
public static void onExit(@Advice.Return Object retryPolicyImpl)
throws NoSuchFieldException, IllegalAccessException {
public static final class BuildAdvice {
@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
public static void onExit(@Advice.Return Object retryPolicyImpl) {

RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());

Field failureListenerField = PolicyConfig.class.getDeclaredField("failureListener");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we cache these lookups for better efficiency?

public static final class BuildAdvice {
  private static final Field failureListenerField;
  private static final Field successListenerField;

  static {
    try {
      failureListenerField = PolicyConfig.class.getDeclaredField("failureListener");
      failureListenerField.setAccessible(true);

      successListenerField = PolicyConfig.class.getDeclaredField("successListener");
      successListenerField.setAccessible(true);
    } catch (NoSuchFieldException e) {
      // not sure if we should ignore or throw here
      throw new IllegalStateException("Failed to initialize Reflective fields", e);
    }
  }
... etc

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't throwing the exception break the application?

protected abstract RetryPolicy<Object> configure(RetryPolicy<Object> userRetryPolicy);

@Test
public void captureCircuitBreakerMetrics() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void captureCircuitBreakerMetrics() {
void captureCircuitBreakerMetrics() {

@onurkybsi onurkybsi Feb 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I can't do this since there is no way to disable the test in FailsafeInstrumentationTest otherwise.

Comment thread instrumentation/failsafe-3.0/javaagent/build.gradle.kts Outdated
@onurkybsi onurkybsi requested a review from jaydeluca February 2, 2026 06:30
@onurkybsi onurkybsi force-pushed the failsafe-3.0-javaagent branch from e07523a to f58ed1f Compare March 2, 2026 08:11

@jaydeluca jaydeluca left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like theres a merge conflict in case you haven't seen it (sorry it took so long for me to get back to this)

@onurkybsi onurkybsi force-pushed the failsafe-3.0-javaagent branch from c7af16f to f5d866f Compare March 9, 2026 20:21
@onurkybsi

Copy link
Copy Markdown
Contributor Author

looks like theres a merge conflict in case you haven't seen it (sorry it took so long for me to get back to this)

Hey @jaydeluca, could you re-approve please? I applied your suggestion but I had to take it back as mentioned above.

}

RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth extracting FailsafeSingletons (see *Singletons pattern in other javaagent modules) so you can reuse the same FailsafeTelemetry instance all the time

}

@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException {
public static void onExit(@Advice.Return Object retryPolicyImpl) {

* @param <R> {@link RetryPolicyConfig}'s result type
* @return instrumented failure listener
*/
public <R> EventListener<ExecutionCompletedEvent<R>> createInstrumentedFailureListener(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do library users need createInstrumentedFailureListener/createInstrumentedSuccessListener? if they're only needed for javaagent instrumentation, let's move them out of public API surface

import java.util.List;

@AutoService(InstrumentationModule.class)
public class FailsafeInstrumentationModule extends InstrumentationModule

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a convention that instrumentations that don't follow any semantic conventions are disabled by default

RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;
FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get());

FAILURE_LISTENER_FIELD.set(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why didn't you use the same strategy as in the library instrumentation where you build a new RetryPolicy from the configuration of the existing RetryPolicy?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because FailsafeTelemetry.createRetryPolicy is using RetryPolicy.builder(...).build() and RetryPolicyInstrumentation.BuildAdvice.onExit runs on RetryPolicy.builder(...).build() exit. So, if I would do it, RetryPolicyInstrumentation.BuildAdvice.onExit would be called recursively again and again.


FAILURE_LISTENER_FIELD.set(
impl.getConfig(),
failsafeTelemetry.createInstrumentedFailureListener(impl.getConfig(), impl.toString()));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand that the value from impl.toString() will be added as an attribute to the metrics? I'd imagine that it won't be too useful as it won't let user figure out which policy this metric is about. @trask would such attribute be acceptable at all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, impl.toString would be used for failsafe.retry_policy.name attributes. In library instrumentation, user gives it through retryPolicyName parameter but I didn't know how to name the instrumented policies without user input. Any other suggestion for that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the values of impl.toString()? is it something like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d? thanks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's true.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's going to be high-cardinality (across restarts)

@onurkybsi onurkybsi Mar 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could building a name using some config values be an option? As an example: RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S}.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to consider what is the use of these metrics. If you use something like RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S} then I'd guess this won't be unique and you could end up merging the values from multiple policies. Is that fine? If that is fine perhaps you don't need to add the retry policy name attribute at all. Would the metric still be useful this way?
impl.toString() isn't good because it has a high cardinality and also you won't be able to figure out which instrument it represents. Even policy-{counter} could be better in the sense that there at least is a chance to figure out which policy the metric belongs to if the policies are created in a predictable order. Though if they aren't then could also be completely misleading. Another possible alternative could be to capture where the policy was created from (perhaps some combination of class name, method name and line number), but this won't work if all the policies are created in a common util method. In short, there probably isn't a good option for producing a name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I get your points. What about adding policyName in the instrumented library(https://github.com/failsafe-lib/failsafe) and support only the versions that include this field for the Java agent, would that make sense?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, sure, that would work (we would still support existing versions, just wouldn't capture policy name unless available from the upstream library)

# Conflicts:
#	instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java

# Conflicts:
#	instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
# Conflicts:
#	instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
@onurkybsi onurkybsi force-pushed the failsafe-3.0-javaagent branch from 6c7db3d to c1b4af6 Compare March 18, 2026 21:04

@trask trask left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purely AI generated review below


Review of the Failsafe 3.0 javaagent instrumentation + testing module extraction. A few correctness/style issues worth addressing:

  • javaagent advice: @Advice.OnMethodExit has onThrowable = Throwable.class but the body is return-only (it casts @Advice.Return and derives values from it). On the exceptional path the return is null and the cast triggers a (suppressed) exception for no benefit — drop onThrowable.
  • Default policy name in javaagent mode: impl.toString() is used as the retry-policy name, which will produce uninformative values like dev.failsafe.internal.RetryPolicyImpl@2f4d3e5a as the failsafe.retry_policy.name attribute. Worth considering a cleaner fallback (e.g., the implementation class simple name, or just leaving the attribute out when no user-supplied name exists).
  • Reflection-based mutation of PolicyConfig: writing to private failureListener/successListener fields is brittle and muzzle cannot protect against these fields being renamed/removed in a future Failsafe version. Worth noting in a comment and/or considering a fail muzzle block or explicit guard.
  • Style nits: catch-parameter naming (ignored over e for unused), @Nullable in the testing module, redundant FQN dev.failsafe.RetryPolicy.builder(), and settings.gradle.kts alphabetical ordering.

See inline comments.

Comment thread settings.gradle.kts
Comment on lines +274 to 276
include(":instrumentation:failsafe-3.0:testing")
include(":instrumentation:failsafe-3.0:library")
include(":instrumentation:finagle-http-23.11:javaagent")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Build] settings.gradle.kts entries must be in alphabetical order within the group. library sorts before testing, so the three failsafe-3.0 lines should be ordered javaagent, library, testing.

Suggested change
include(":instrumentation:failsafe-3.0:testing")
include(":instrumentation:failsafe-3.0:library")
include(":instrumentation:finagle-http-23.11:javaagent")
include(":instrumentation:failsafe-3.0:javaagent")
include(":instrumentation:failsafe-3.0:library")
include(":instrumentation:failsafe-3.0:testing")

SUCCESS_LISTENER_FIELD = successListenerField;
}

@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Javaagent] Return-only exit advice should omit onThrowable. This advice only processes @Advice.Return (it has no @Advice.Enter state to clean up). When build() throws, retryPolicyImpl is null and the cast on L62 triggers a (suppressed) ClassCastException/NullPointerException for no benefit. Keep suppress = Throwable.class but drop onThrowable.

Suggested change
@Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
@Advice.OnMethodExit(suppress = Throwable.class)

failureListenerField.setAccessible(true);

successListenerField = PolicyConfig.class.getDeclaredField("successListener");
successListenerField.setAccessible(true);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] The caught exception is intentionally unused, so the preferred name is ignored (the // Ignored comment can be removed — the name itself conveys intent and IntelliJ's CatchMayIgnoreException inspection recognizes it).

Suggested change
successListenerField.setAccessible(true);
} catch (Exception ignored) {
// field layout differs from the expected Failsafe 3.0 layout
}

Comment on lines +67 to +71
GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString()));
SUCCESS_LISTENER_FIELD.set(
impl.getConfig(),
buildInstrumentedSuccessListener(
GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString()));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[General] Using impl.toString() as the retry-policy name will produce values like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d for the failsafe.retry_policy.name metric attribute — high-cardinality and not human-meaningful. Since RetryPolicyBuilder does not expose a user-supplied name, consider either omitting the name attribute when none was provided, or using a stable placeholder such as the simple class name. Also, GlobalOpenTelemetry.get() is called twice — a single local would be clearer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @trask, I proposed a solution regarding this above, let me know if it makes sense.

}

try {
RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Javaagent] This instrumentation mutates private fields of dev.failsafe.PolicyConfig (failureListener, successListener) via reflection. Muzzle cannot detect field renames/removals in future Failsafe versions, so a breaking change upstream will silently disable only the field-lookup static block (best case) or silently swap in listeners that clobber something unexpected (worst case). Consider:

  • Adding a comment documenting the tight coupling to the Failsafe 3.0 field layout.
  • Tightening the muzzle range (failsafe:[3.0,4) with a fail block for 4.0+) to force an explicit revisit when Failsafe 4 lands.

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import javax.annotation.Nullable;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Testing] Per the style guide, @Nullable should not be used in test code (this module is a testing/ shared test base). Drop the import and the @Nullable on the method parameter below.

buildCircuitBreakerAssertion(
1, "failsafe.circuit_breaker.state", "closed"))));
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Testing] Remove @Nullable — test code should not use nullability annotations.

Suggested change
protected void captureRetryPolicyMetrics(String expectedPolicyName) {


protected void captureRetryPolicyMetrics(@Nullable String expectedPolicyName) {
RetryPolicy<Object> userRetryPolicy =
dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] RetryPolicy is already imported on line 15 — the fully qualified name here is redundant.

Suggested change
dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();
RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build();


private void captureRetryPolicyMetrics(
RetryPolicy<Object> instrumentedRetryPolicy, String expectedPolicyName) {
// given

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Leftover // given comment with nothing under it — please remove.

Comment on lines +7 to +8
api("org.mockito:mockito-core")
api("org.mockito:mockito-junit-jupiter")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Build] The abstract test base in this module does not reference Mockito — the mock(...)/when(...) usage lives only in the library's FailsafeTelemetryTest. Exposing mockito-core and mockito-junit-jupiter as api here leaks them into every downstream test module that consumes :instrumentation:failsafe-3.0:testing. Consider dropping these two dependencies from the shared testing module and letting individual test modules declare them as testImplementation themselves (the javaagent test doesn't need Mockito at all).

# Conflicts:
#	instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/RetryPolicyEventListenerBuilders.java
#	instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
@github-actions github-actions Bot mentioned this pull request May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants