Java agent insturmentation added for Failsafe 3.0#15759
Conversation
c63d7f0 to
7d1d41b
Compare
| import net.bytebuddy.description.type.TypeDescription; | ||
| import net.bytebuddy.matcher.ElementMatcher; | ||
|
|
||
| final class RetryPolicyInstrumentation implements TypeInstrumentation { |
There was a problem hiding this comment.
| final class RetryPolicyInstrumentation implements TypeInstrumentation { | |
| public class RetryPolicyInstrumentation implements TypeInstrumentation { |
There was a problem hiding this comment.
Why is public needed?
There was a problem hiding this comment.
that is how all other TypeInstrumentations in this project are, as well as how it is in the contributing docs
| public static final class BuildAdvice { | ||
| @Advice.OnMethodExit | ||
| public static void onExit(@Advice.Return Object retryPolicyImpl) | ||
| throws NoSuchFieldException, IllegalAccessException { |
There was a problem hiding this comment.
instead of allowing advice to throw exceptions which could break an application, we typically suppress them like:
| public static final class BuildAdvice { | |
| @Advice.OnMethodExit | |
| public static void onExit(@Advice.Return Object retryPolicyImpl) | |
| throws NoSuchFieldException, IllegalAccessException { | |
| public static final class BuildAdvice { | |
| @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class) | |
| public static void onExit(@Advice.Return Object retryPolicyImpl) { |
| RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl; | ||
| FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get()); | ||
|
|
||
| Field failureListenerField = PolicyConfig.class.getDeclaredField("failureListener"); |
There was a problem hiding this comment.
could we cache these lookups for better efficiency?
public static final class BuildAdvice {
private static final Field failureListenerField;
private static final Field successListenerField;
static {
try {
failureListenerField = PolicyConfig.class.getDeclaredField("failureListener");
failureListenerField.setAccessible(true);
successListenerField = PolicyConfig.class.getDeclaredField("successListener");
successListenerField.setAccessible(true);
} catch (NoSuchFieldException e) {
// not sure if we should ignore or throw here
throw new IllegalStateException("Failed to initialize Reflective fields", e);
}
}
... etcThere was a problem hiding this comment.
Wouldn't throwing the exception break the application?
| protected abstract RetryPolicy<Object> configure(RetryPolicy<Object> userRetryPolicy); | ||
|
|
||
| @Test | ||
| public void captureCircuitBreakerMetrics() { |
There was a problem hiding this comment.
| public void captureCircuitBreakerMetrics() { | |
| void captureCircuitBreakerMetrics() { |
There was a problem hiding this comment.
For now, I can't do this since there is no way to disable the test in FailsafeInstrumentationTest otherwise.
e07523a to
f58ed1f
Compare
jaydeluca
left a comment
There was a problem hiding this comment.
looks like theres a merge conflict in case you haven't seen it (sorry it took so long for me to get back to this)
c7af16f to
f5d866f
Compare
Hey @jaydeluca, could you re-approve please? I applied your suggestion but I had to take it back as mentioned above. |
| } | ||
|
|
||
| RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl; | ||
| FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get()); |
There was a problem hiding this comment.
probably worth extracting FailsafeSingletons (see *Singletons pattern in other javaagent modules) so you can reuse the same FailsafeTelemetry instance all the time
| } | ||
|
|
||
| @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class) | ||
| public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException { |
There was a problem hiding this comment.
| public static void onExit(@Advice.Return Object retryPolicyImpl) throws IllegalAccessException { | |
| public static void onExit(@Advice.Return Object retryPolicyImpl) { |
| * @param <R> {@link RetryPolicyConfig}'s result type | ||
| * @return instrumented failure listener | ||
| */ | ||
| public <R> EventListener<ExecutionCompletedEvent<R>> createInstrumentedFailureListener( |
There was a problem hiding this comment.
do library users need createInstrumentedFailureListener/createInstrumentedSuccessListener? if they're only needed for javaagent instrumentation, let's move them out of public API surface
| import java.util.List; | ||
|
|
||
| @AutoService(InstrumentationModule.class) | ||
| public class FailsafeInstrumentationModule extends InstrumentationModule |
There was a problem hiding this comment.
we have a convention that instrumentations that don't follow any semantic conventions are disabled by default
| RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl; | ||
| FailsafeTelemetry failsafeTelemetry = FailsafeTelemetry.create(GlobalOpenTelemetry.get()); | ||
|
|
||
| FAILURE_LISTENER_FIELD.set( |
There was a problem hiding this comment.
why didn't you use the same strategy as in the library instrumentation where you build a new RetryPolicy from the configuration of the existing RetryPolicy?
There was a problem hiding this comment.
Because FailsafeTelemetry.createRetryPolicy is using RetryPolicy.builder(...).build() and RetryPolicyInstrumentation.BuildAdvice.onExit runs on RetryPolicy.builder(...).build() exit. So, if I would do it, RetryPolicyInstrumentation.BuildAdvice.onExit would be called recursively again and again.
|
|
||
| FAILURE_LISTENER_FIELD.set( | ||
| impl.getConfig(), | ||
| failsafeTelemetry.createInstrumentedFailureListener(impl.getConfig(), impl.toString())); |
There was a problem hiding this comment.
Do I understand that the value from impl.toString() will be added as an attribute to the metrics? I'd imagine that it won't be too useful as it won't let user figure out which policy this metric is about. @trask would such attribute be acceptable at all?
There was a problem hiding this comment.
Hey, impl.toString would be used for failsafe.retry_policy.name attributes. In library instrumentation, user gives it through retryPolicyName parameter but I didn't know how to name the instrumented policies without user input. Any other suggestion for that?
There was a problem hiding this comment.
what are the values of impl.toString()? is it something like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d? thanks
There was a problem hiding this comment.
Yeah, that's true.
There was a problem hiding this comment.
that's going to be high-cardinality (across restarts)
There was a problem hiding this comment.
Could building a name using some config values be an option? As an example: RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S}.
There was a problem hiding this comment.
I think you have to consider what is the use of these metrics. If you use something like RetryPolicy{maxRetries=3, delay=PT1S, maxDuration=PT30S} then I'd guess this won't be unique and you could end up merging the values from multiple policies. Is that fine? If that is fine perhaps you don't need to add the retry policy name attribute at all. Would the metric still be useful this way?
impl.toString() isn't good because it has a high cardinality and also you won't be able to figure out which instrument it represents. Even policy-{counter} could be better in the sense that there at least is a chance to figure out which policy the metric belongs to if the policies are created in a predictable order. Though if they aren't then could also be completely misleading. Another possible alternative could be to capture where the policy was created from (perhaps some combination of class name, method name and line number), but this won't work if all the policies are created in a common util method. In short, there probably isn't a good option for producing a name.
There was a problem hiding this comment.
Yeah, I get your points. What about adding policyName in the instrumented library(https://github.com/failsafe-lib/failsafe) and support only the versions that include this field for the Java agent, would that make sense?
There was a problem hiding this comment.
hey, sure, that would work (we would still support existing versions, just wouldn't capture policy name unless available from the upstream library)
# Conflicts: # instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetry.java # Conflicts: # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
# Conflicts: # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
6c7db3d to
c1b4af6
Compare
trask
left a comment
There was a problem hiding this comment.
Purely AI generated review below
Review of the Failsafe 3.0 javaagent instrumentation + testing module extraction. A few correctness/style issues worth addressing:
- javaagent advice:
@Advice.OnMethodExithasonThrowable = Throwable.classbut the body is return-only (it casts@Advice.Returnand derives values from it). On the exceptional path the return isnulland the cast triggers a (suppressed) exception for no benefit — droponThrowable. - Default policy name in javaagent mode:
impl.toString()is used as the retry-policy name, which will produce uninformative values likedev.failsafe.internal.RetryPolicyImpl@2f4d3e5aas thefailsafe.retry_policy.nameattribute. Worth considering a cleaner fallback (e.g., the implementation class simple name, or just leaving the attribute out when no user-supplied name exists). - Reflection-based mutation of
PolicyConfig: writing to privatefailureListener/successListenerfields is brittle and muzzle cannot protect against these fields being renamed/removed in a future Failsafe version. Worth noting in a comment and/or considering afailmuzzle block or explicit guard. - Style nits: catch-parameter naming (
ignoredoverefor unused),@Nullablein the testing module, redundant FQNdev.failsafe.RetryPolicy.builder(), andsettings.gradle.ktsalphabetical ordering.
See inline comments.
| include(":instrumentation:failsafe-3.0:testing") | ||
| include(":instrumentation:failsafe-3.0:library") | ||
| include(":instrumentation:finagle-http-23.11:javaagent") |
There was a problem hiding this comment.
[Build] settings.gradle.kts entries must be in alphabetical order within the group. library sorts before testing, so the three failsafe-3.0 lines should be ordered javaagent, library, testing.
| include(":instrumentation:failsafe-3.0:testing") | |
| include(":instrumentation:failsafe-3.0:library") | |
| include(":instrumentation:finagle-http-23.11:javaagent") | |
| include(":instrumentation:failsafe-3.0:javaagent") | |
| include(":instrumentation:failsafe-3.0:library") | |
| include(":instrumentation:failsafe-3.0:testing") |
| SUCCESS_LISTENER_FIELD = successListenerField; | ||
| } | ||
|
|
||
| @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class) |
There was a problem hiding this comment.
[Javaagent] Return-only exit advice should omit onThrowable. This advice only processes @Advice.Return (it has no @Advice.Enter state to clean up). When build() throws, retryPolicyImpl is null and the cast on L62 triggers a (suppressed) ClassCastException/NullPointerException for no benefit. Keep suppress = Throwable.class but drop onThrowable.
| @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class) | |
| @Advice.OnMethodExit(suppress = Throwable.class) |
| failureListenerField.setAccessible(true); | ||
|
|
||
| successListenerField = PolicyConfig.class.getDeclaredField("successListener"); | ||
| successListenerField.setAccessible(true); |
There was a problem hiding this comment.
[Style] The caught exception is intentionally unused, so the preferred name is ignored (the // Ignored comment can be removed — the name itself conveys intent and IntelliJ's CatchMayIgnoreException inspection recognizes it).
| successListenerField.setAccessible(true); | |
| } catch (Exception ignored) { | |
| // field layout differs from the expected Failsafe 3.0 layout | |
| } |
| GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString())); | ||
| SUCCESS_LISTENER_FIELD.set( | ||
| impl.getConfig(), | ||
| buildInstrumentedSuccessListener( | ||
| GlobalOpenTelemetry.get(), impl.getConfig(), impl.toString())); |
There was a problem hiding this comment.
[General] Using impl.toString() as the retry-policy name will produce values like dev.failsafe.internal.RetryPolicyImpl@1a2b3c4d for the failsafe.retry_policy.name metric attribute — high-cardinality and not human-meaningful. Since RetryPolicyBuilder does not expose a user-supplied name, consider either omitting the name attribute when none was provided, or using a stable placeholder such as the simple class name. Also, GlobalOpenTelemetry.get() is called twice — a single local would be clearer.
| } | ||
|
|
||
| try { | ||
| RetryPolicyImpl<?> impl = (RetryPolicyImpl<?>) retryPolicyImpl; |
There was a problem hiding this comment.
[Javaagent] This instrumentation mutates private fields of dev.failsafe.PolicyConfig (failureListener, successListener) via reflection. Muzzle cannot detect field renames/removals in future Failsafe versions, so a breaking change upstream will silently disable only the field-lookup static block (best case) or silently swap in listeners that clobber something unexpected (worst case). Consider:
- Adding a comment documenting the tight coupling to the Failsafe 3.0 field layout.
- Tightening the muzzle range (
failsafe:[3.0,4)with afailblock for4.0+) to force an explicit revisit when Failsafe 4 lands.
| import java.time.Duration; | ||
| import java.util.concurrent.atomic.AtomicInteger; | ||
| import java.util.function.Consumer; | ||
| import javax.annotation.Nullable; |
There was a problem hiding this comment.
[Testing] Per the style guide, @Nullable should not be used in test code (this module is a testing/ shared test base). Drop the import and the @Nullable on the method parameter below.
| buildCircuitBreakerAssertion( | ||
| 1, "failsafe.circuit_breaker.state", "closed")))); | ||
| } | ||
|
|
There was a problem hiding this comment.
[Testing] Remove @Nullable — test code should not use nullability annotations.
| protected void captureRetryPolicyMetrics(String expectedPolicyName) { |
|
|
||
| protected void captureRetryPolicyMetrics(@Nullable String expectedPolicyName) { | ||
| RetryPolicy<Object> userRetryPolicy = | ||
| dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build(); |
There was a problem hiding this comment.
[Style] RetryPolicy is already imported on line 15 — the fully qualified name here is redundant.
| dev.failsafe.RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build(); | |
| RetryPolicy.builder().handleResult(null).withMaxAttempts(3).build(); |
|
|
||
| private void captureRetryPolicyMetrics( | ||
| RetryPolicy<Object> instrumentedRetryPolicy, String expectedPolicyName) { | ||
| // given |
There was a problem hiding this comment.
[Style] Leftover // given comment with nothing under it — please remove.
| api("org.mockito:mockito-core") | ||
| api("org.mockito:mockito-junit-jupiter") |
There was a problem hiding this comment.
[Build] The abstract test base in this module does not reference Mockito — the mock(...)/when(...) usage lives only in the library's FailsafeTelemetryTest. Exposing mockito-core and mockito-junit-jupiter as api here leaks them into every downstream test module that consumes :instrumentation:failsafe-3.0:testing. Consider dropping these two dependencies from the shared testing module and letting individual test modules declare them as testImplementation themselves (the javaagent test doesn't need Mockito at all).
# Conflicts: # instrumentation/failsafe-3.0/library/src/main/java/io/opentelemetry/instrumentation/failsafe/v3_0/RetryPolicyEventListenerBuilders.java # instrumentation/failsafe-3.0/library/src/test/java/io/opentelemetry/instrumentation/failsafe/v3_0/FailsafeTelemetryTest.java
Only
RetryPolicyInstrumentationadded,CircuitBreakerInstrumentationwill be added with another PR.