Skip to content

Commit e8b6d50

Browse files
impl(o11y): introduce resend_count logic (#4156)
Introduces `gcp.grpc.resend_count` and `http.request.resend_count` by relying on the parameter passed to `ApiTracer`. --- ### Reliability of `attemptNumber` in `ApiTracer` The `attemptNumber` is passed from the retrying logic in `gax-java/gax`. For unary calls, here is an example of how the overall attempt count behaves: **How it is updated:** The attempt count is managed by the retry algorithms. For example, `ExponentialRetryAlgorithm` is thread-safe and creates immutable `TimedAttemptSettings`. It initializes `overallAttemptCount` to `0` and increments it for each subsequent attempt. ```java // gax-java/gax/src/main/java/com/google/api/gax/retrying/ExponentialRetryAlgorithm.java @OverRide public TimedAttemptSettings createFirstAttempt() { return TimedAttemptSettings.newBuilder() // ... .setOverallAttemptCount(0) // ... .build(); } @OverRide public TimedAttemptSettings createNextAttempt(TimedAttemptSettings previousSettings) { // ... return TimedAttemptSettings.newBuilder() // ... .setOverallAttemptCount(previousSettings.getOverallAttemptCount() + 1) // ... .build(); } ``` **How it is consumed:** `AttemptCallable` executes unary calls. `externalFuture` is marked as `volatile` to handle potential visibility between the thread that schedules the retry and the thread executing `call()`. Before starting the RPC call, `AttemptCallable` invokes `attemptStarted` using the `overallAttemptCount` from `externalFuture`. ```java // gax-java/gax/src/main/java/com/google/api/gax/rpc/AttemptCallable.java class AttemptCallable<RequestT, ResponseT> implements Callable<ResponseT> { // ... private volatile RetryingFuture<ResponseT> externalFuture; // ... @OverRide public ResponseT call() { // ... callContext .getTracer() .attemptStarted(request, externalFuture.getAttemptSettings().getOverallAttemptCount()); // ... } } ``` **Thread Safety:** Attempts for a single RPC are strictly sequential; a new attempt is only constructed and scheduled after the previous attempt fails and the retry delay elapses. `ExponentialRetryAlgorithm` is thread-safe, `TimedAttemptSettings` is immutable, and the `externalFuture` reference inside `AttemptCallable` is `volatile`. This combination ensures thread-safe observation of the linearly increasing `overallAttemptCount`. #### Conclusion Since the retrying infrastructure sequentially updates and consumes `getOverallAttemptCount()` in a thread-safe manner, we can confidently and safely use `attemptNumber` in `SpanTracer` to populate the `rpc.grpc.resend_count` or `rpc.http.resend_count` attributes. This approach directly leverages the existing reliable retry logic rather than implementing a redundant internal counter.
1 parent e0a33f2 commit e8b6d50

File tree

5 files changed

+295
-1
lines changed

5 files changed

+295
-1
lines changed

gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityAttributes.java

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,4 +84,10 @@ public class ObservabilityAttributes {
8484

8585
/** The url template of the request (e.g. /v1/{name}:access). */
8686
public static final String URL_TEMPLATE_ATTRIBUTE = "url.template";
87+
88+
/** The resend count of the request. Only used in HTTP transport. */
89+
public static final String HTTP_RESEND_COUNT_ATTRIBUTE = "http.request.resend_count";
90+
91+
/** The resend count of the request. Only used in gRPC transport. */
92+
public static final String GRPC_RESEND_COUNT_ATTRIBUTE = "gcp.grpc.resend_count";
8793
}

gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityUtils.java

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,8 @@ static Attributes toOtelAttributes(Map<String, Object> attributes) {
6565
(k, v) -> {
6666
if (v instanceof String) {
6767
attributesBuilder.put(k, (String) v);
68+
} else if (v instanceof Long) {
69+
attributesBuilder.put(k, (Long) v);
6870
} else if (v instanceof Integer) {
6971
attributesBuilder.put(k, (long) (Integer) v);
7072
}

gax-java/gax/src/main/java/com/google/api/gax/tracing/SpanTracer.java

Lines changed: 35 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -105,12 +105,26 @@ private void buildAttributes() {
105105

106106
@Override
107107
public void attemptStarted(Object request, int attemptNumber) {
108+
Map<String, Object> currentAttemptAttributes = new HashMap<>(this.attemptAttributes);
109+
110+
if (attemptNumber > 0) {
111+
ApiTracerContext.Transport transport = apiTracerContext.transport();
112+
if (transport == ApiTracerContext.Transport.GRPC) {
113+
currentAttemptAttributes.put(
114+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE, (long) attemptNumber);
115+
} else if (transport == ApiTracerContext.Transport.HTTP) {
116+
currentAttemptAttributes.put(
117+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE, (long) attemptNumber);
118+
}
119+
}
120+
108121
SpanBuilder spanBuilder = tracer.spanBuilder(attemptSpanName);
109122

110123
// Attempt spans are of the CLIENT kind
111124
spanBuilder.setSpanKind(SpanKind.CLIENT);
112125

113-
spanBuilder.setAllAttributes(ObservabilityUtils.toOtelAttributes(this.attemptAttributes));
126+
// Pass the combined attributes to the new SpanBuilder method
127+
spanBuilder.setAllAttributes(ObservabilityUtils.toOtelAttributes(currentAttemptAttributes));
114128

115129
this.attemptSpan = spanBuilder.startSpan();
116130
}
@@ -120,6 +134,26 @@ public void attemptSucceeded() {
120134
endAttempt();
121135
}
122136

137+
@Override
138+
public void attemptCancelled() {
139+
endAttempt();
140+
}
141+
142+
@Override
143+
public void attemptFailedDuration(Throwable error, java.time.Duration delay) {
144+
endAttempt();
145+
}
146+
147+
@Override
148+
public void attemptFailedRetriesExhausted(Throwable error) {
149+
endAttempt();
150+
}
151+
152+
@Override
153+
public void attemptPermanentFailure(Throwable error) {
154+
endAttempt();
155+
}
156+
123157
private void endAttempt() {
124158
if (attemptSpan != null) {
125159
attemptSpan.end();

gax-java/gax/src/test/java/com/google/api/gax/tracing/SpanTracerTest.java

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,4 +86,104 @@ void testAttemptStarted_includesLanguageAttribute() {
8686
io.opentelemetry.api.common.AttributeKey.stringKey(SpanTracer.LANGUAGE_ATTRIBUTE),
8787
SpanTracer.DEFAULT_LANGUAGE);
8888
}
89+
90+
@Test
91+
void testAttemptStarted_noRetryAttributes_grpc() {
92+
ApiTracerContext grpcContext =
93+
ApiTracerContext.newBuilder()
94+
.setLibraryMetadata(com.google.api.gax.rpc.LibraryMetadata.empty())
95+
.setTransport(ApiTracerContext.Transport.GRPC)
96+
.build();
97+
SpanTracer grpcTracer = new SpanTracer(tracer, grpcContext, ATTEMPT_SPAN_NAME);
98+
99+
// Initial attempt, attemptNumber is 0
100+
grpcTracer.attemptStarted(new Object(), 0);
101+
ArgumentCaptor<Attributes> attributesCaptor = ArgumentCaptor.forClass(Attributes.class);
102+
verify(spanBuilder).setAllAttributes(attributesCaptor.capture());
103+
assertThat(attributesCaptor.getValue().asMap())
104+
.doesNotContainKey(
105+
io.opentelemetry.api.common.AttributeKey.longKey(
106+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE));
107+
assertThat(attributesCaptor.getValue().asMap())
108+
.doesNotContainKey(
109+
io.opentelemetry.api.common.AttributeKey.longKey(
110+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE));
111+
}
112+
113+
@Test
114+
void testAttemptStarted_retryAttributes_grpc() {
115+
ApiTracerContext grpcContext =
116+
ApiTracerContext.newBuilder()
117+
.setLibraryMetadata(com.google.api.gax.rpc.LibraryMetadata.empty())
118+
.setTransport(ApiTracerContext.Transport.GRPC)
119+
.build();
120+
SpanTracer grpcTracer = new SpanTracer(tracer, grpcContext, ATTEMPT_SPAN_NAME);
121+
122+
// N-th retry, attemptNumber is 5
123+
grpcTracer.attemptStarted(new Object(), 5);
124+
ArgumentCaptor<Attributes> attributesCaptor = ArgumentCaptor.forClass(Attributes.class);
125+
verify(spanBuilder).setAllAttributes(attributesCaptor.capture());
126+
java.util.Map<io.opentelemetry.api.common.AttributeKey<?>, Object> capturedAttributes =
127+
attributesCaptor.getValue().asMap();
128+
assertThat(capturedAttributes)
129+
.containsEntry(
130+
io.opentelemetry.api.common.AttributeKey.longKey(
131+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE),
132+
5L);
133+
assertThat(capturedAttributes)
134+
.doesNotContainKey(
135+
io.opentelemetry.api.common.AttributeKey.longKey(
136+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE));
137+
}
138+
139+
@Test
140+
void testAttemptStarted_noRetryAttributes_http() {
141+
ApiTracerContext httpContext =
142+
ApiTracerContext.newBuilder()
143+
.setLibraryMetadata(com.google.api.gax.rpc.LibraryMetadata.empty())
144+
.setTransport(ApiTracerContext.Transport.HTTP)
145+
.build();
146+
SpanTracer httpTracer = new SpanTracer(tracer, httpContext, ATTEMPT_SPAN_NAME);
147+
148+
// Initial attempt, attemptNumber is 0
149+
httpTracer.attemptStarted(new Object(), 0);
150+
ArgumentCaptor<Attributes> attributesCaptor = ArgumentCaptor.forClass(Attributes.class);
151+
verify(spanBuilder).setAllAttributes(attributesCaptor.capture());
152+
java.util.Map<io.opentelemetry.api.common.AttributeKey<?>, Object> capturedAttributes =
153+
attributesCaptor.getValue().asMap();
154+
assertThat(capturedAttributes)
155+
.doesNotContainKey(
156+
io.opentelemetry.api.common.AttributeKey.longKey(
157+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE));
158+
assertThat(capturedAttributes)
159+
.doesNotContainKey(
160+
io.opentelemetry.api.common.AttributeKey.longKey(
161+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE));
162+
}
163+
164+
@Test
165+
void testAttemptStarted_retryAttributes_http() {
166+
ApiTracerContext httpContext =
167+
ApiTracerContext.newBuilder()
168+
.setLibraryMetadata(com.google.api.gax.rpc.LibraryMetadata.empty())
169+
.setTransport(ApiTracerContext.Transport.HTTP)
170+
.build();
171+
SpanTracer httpTracer = new SpanTracer(tracer, httpContext, ATTEMPT_SPAN_NAME);
172+
173+
// N-th retry, attemptNumber is 5
174+
httpTracer.attemptStarted(new Object(), 5);
175+
ArgumentCaptor<Attributes> attributesCaptor = ArgumentCaptor.forClass(Attributes.class);
176+
verify(spanBuilder).setAllAttributes(attributesCaptor.capture());
177+
java.util.Map<io.opentelemetry.api.common.AttributeKey<?>, Object> capturedAttributes =
178+
attributesCaptor.getValue().asMap();
179+
assertThat(capturedAttributes)
180+
.doesNotContainKey(
181+
io.opentelemetry.api.common.AttributeKey.longKey(
182+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE));
183+
assertThat(capturedAttributes)
184+
.containsEntry(
185+
io.opentelemetry.api.common.AttributeKey.longKey(
186+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE),
187+
5L);
188+
}
89189
}

java-showcase/gapic-showcase/src/test/java/com/google/showcase/v1beta1/it/ITOtelTracing.java

Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,13 +31,23 @@
3131
package com.google.showcase.v1beta1.it;
3232

3333
import static com.google.common.truth.Truth.assertThat;
34+
import static org.junit.Assert.assertThrows;
3435

36+
import com.google.api.client.http.javanet.NetHttpTransport;
37+
import com.google.api.gax.core.NoCredentialsProvider;
38+
import com.google.api.gax.retrying.RetrySettings;
39+
import com.google.api.gax.rpc.StatusCode;
40+
import com.google.api.gax.rpc.UnavailableException;
3541
import com.google.api.gax.tracing.ObservabilityAttributes;
3642
import com.google.api.gax.tracing.SpanTracer;
3743
import com.google.api.gax.tracing.SpanTracerFactory;
44+
import com.google.rpc.Status;
3845
import com.google.showcase.v1beta1.EchoClient;
3946
import com.google.showcase.v1beta1.EchoRequest;
47+
import com.google.showcase.v1beta1.EchoSettings;
4048
import com.google.showcase.v1beta1.it.util.TestClientInitializer;
49+
import com.google.showcase.v1beta1.stub.EchoStub;
50+
import com.google.showcase.v1beta1.stub.EchoStubSettings;
4151
import io.opentelemetry.api.GlobalOpenTelemetry;
4252
import io.opentelemetry.api.common.AttributeKey;
4353
import io.opentelemetry.api.trace.SpanKind;
@@ -200,4 +210,146 @@ void testTracing_successfulEcho_httpjson() throws Exception {
200210
.isEqualTo("v1beta1/echo:echo");
201211
}
202212
}
213+
214+
@Test
215+
void testTracing_retry_grpc() throws Exception {
216+
final int attempts = 5;
217+
final StatusCode.Code statusCode = StatusCode.Code.UNAVAILABLE;
218+
// A custom EchoClient is used in this test because retries have jitter, and we cannot
219+
// predict the number of attempts that are scheduled for an RPC invocation otherwise.
220+
// The custom retrySettings limit to a set number of attempts before the call gives up.
221+
RetrySettings retrySettings =
222+
RetrySettings.newBuilder()
223+
.setTotalTimeout(org.threeten.bp.Duration.ofMillis(5000L))
224+
.setMaxAttempts(attempts)
225+
.build();
226+
227+
EchoStubSettings.Builder grpcEchoSettingsBuilder = EchoStubSettings.newBuilder();
228+
grpcEchoSettingsBuilder
229+
.echoSettings()
230+
.setRetrySettings(retrySettings)
231+
.setRetryableCodes(statusCode);
232+
EchoSettings grpcEchoSettings = EchoSettings.create(grpcEchoSettingsBuilder.build());
233+
grpcEchoSettings =
234+
grpcEchoSettings.toBuilder()
235+
.setCredentialsProvider(NoCredentialsProvider.create())
236+
.setTransportChannelProvider(EchoSettings.defaultGrpcTransportProviderBuilder().build())
237+
.setEndpoint("localhost:7469")
238+
.build();
239+
240+
SpanTracerFactory tracingFactory = new SpanTracerFactory(openTelemetrySdk);
241+
242+
EchoStubSettings echoStubSettings =
243+
(EchoStubSettings)
244+
grpcEchoSettings.getStubSettings().toBuilder().setTracerFactory(tracingFactory).build();
245+
EchoStub stub = echoStubSettings.createStub();
246+
EchoClient grpcClient = EchoClient.create(stub);
247+
248+
EchoRequest echoRequest =
249+
EchoRequest.newBuilder()
250+
.setError(Status.newBuilder().setCode(statusCode.ordinal()).build())
251+
.build();
252+
253+
assertThrows(UnavailableException.class, () -> grpcClient.echo(echoRequest));
254+
255+
List<SpanData> spans = spanExporter.getFinishedSpanItems();
256+
assertThat(spans).hasSize(attempts); // Expect exactly one span for the successful retry
257+
258+
// This single span represents the successful retry, which has resend_count=1
259+
// The first attempt has no resend_count. The subsequent retries will have a resend_count,
260+
// starting from 1.
261+
List<Long> resendCounts =
262+
spans.stream()
263+
.map(
264+
span ->
265+
(Long)
266+
span.getAttributes()
267+
.asMap()
268+
.get(
269+
AttributeKey.longKey(
270+
ObservabilityAttributes.GRPC_RESEND_COUNT_ATTRIBUTE)))
271+
.filter(java.util.Objects::nonNull)
272+
.sorted()
273+
.collect(java.util.stream.Collectors.toList());
274+
275+
List<Long> expectedCounts =
276+
java.util.stream.LongStream.range(1, attempts)
277+
.boxed()
278+
.collect(java.util.stream.Collectors.toList());
279+
assertThat(resendCounts).containsExactlyElementsIn(expectedCounts).inOrder();
280+
}
281+
282+
@Test
283+
void testTracing_retry_httpjson() throws Exception {
284+
final int attempts = 5;
285+
final StatusCode.Code statusCode = StatusCode.Code.UNAVAILABLE;
286+
// A custom EchoClient is used in this test because retries have jitter, and we cannot
287+
// predict the number of attempts that are scheduled for an RPC invocation otherwise.
288+
// The custom retrySettings limit to a set number of attempts before the call gives up.
289+
RetrySettings retrySettings =
290+
RetrySettings.newBuilder()
291+
.setTotalTimeout(org.threeten.bp.Duration.ofMillis(5000L))
292+
.setMaxAttempts(attempts)
293+
.build();
294+
295+
EchoStubSettings.Builder httpJsonEchoSettingsBuilder = EchoStubSettings.newHttpJsonBuilder();
296+
httpJsonEchoSettingsBuilder
297+
.echoSettings()
298+
.setRetrySettings(retrySettings)
299+
.setRetryableCodes(statusCode);
300+
EchoSettings httpJsonEchoSettings = EchoSettings.create(httpJsonEchoSettingsBuilder.build());
301+
httpJsonEchoSettings =
302+
httpJsonEchoSettings.toBuilder()
303+
.setCredentialsProvider(NoCredentialsProvider.create())
304+
.setTransportChannelProvider(
305+
EchoSettings.defaultHttpJsonTransportProviderBuilder()
306+
.setHttpTransport(
307+
new NetHttpTransport.Builder().doNotValidateCertificate().build())
308+
.setEndpoint("http://localhost:7469")
309+
.build())
310+
.build();
311+
312+
SpanTracerFactory tracingFactory = new SpanTracerFactory(openTelemetrySdk);
313+
314+
EchoStubSettings echoStubSettings =
315+
(EchoStubSettings)
316+
httpJsonEchoSettings.getStubSettings().toBuilder()
317+
.setTracerFactory(tracingFactory)
318+
.build();
319+
EchoStub stub = echoStubSettings.createStub();
320+
EchoClient httpClient = EchoClient.create(stub);
321+
322+
EchoRequest echoRequest =
323+
EchoRequest.newBuilder()
324+
.setError(Status.newBuilder().setCode(statusCode.ordinal()).build())
325+
.build();
326+
327+
assertThrows(UnavailableException.class, () -> httpClient.echo(echoRequest));
328+
329+
List<SpanData> spans = spanExporter.getFinishedSpanItems();
330+
assertThat(spans).hasSize(attempts); // Expect exactly one span for the successful retry
331+
332+
// This single span represents the successful retry, which has resend_count=1
333+
// The first attempt has no resend_count. The subsequent retries will have a resend_count,
334+
// starting from 1.
335+
List<Long> resendCounts =
336+
spans.stream()
337+
.map(
338+
span ->
339+
(Long)
340+
span.getAttributes()
341+
.asMap()
342+
.get(
343+
AttributeKey.longKey(
344+
ObservabilityAttributes.HTTP_RESEND_COUNT_ATTRIBUTE)))
345+
.filter(java.util.Objects::nonNull)
346+
.sorted()
347+
.collect(java.util.stream.Collectors.toList());
348+
349+
List<Long> expectedCounts =
350+
java.util.stream.LongStream.range(1, attempts)
351+
.boxed()
352+
.collect(java.util.stream.Collectors.toList());
353+
assertThat(resendCounts).containsExactlyElementsIn(expectedCounts).inOrder();
354+
}
203355
}

0 commit comments

Comments
 (0)