Skip to content

Commit 14bcc29

Browse files
allanrogerrclaude
andcommitted
fix(retry): address bala review batch on PR #1701 review 4250022733
Five reviewer threads addressed plus a WHY-only comment trim across the touched files. Each thread is mapped to the reviewer's discussion ID and verified locally against :api:check / :api:javadoc / a production-cluster repro on the local mint-erasure compose. r3205524097 / r3206746125 ("we should not wrap") - BaseS3Client.java: drop wrapWithRetry() and the matching call sites in both ctors. Caller-supplied OkHttpClient is now used verbatim — no newBuilder(), no interceptor strip-and-add, no retryOnConnectionFailure override — per bala's principle that the user's client is the user's client. Field/Javadoc updates to setMaxRetries, executeAsync, the copy-ctor, and the maxRetries field reflect the new contract. - Http.java: add newDefaultClient(IntSupplier) overload so the SDK-default client can have its retry budget bound to BaseS3Client.maxRetries via a supplier. The no-arg form delegates with () -> Retry.MAX_RETRY for backward compatibility (MinioAdminClient, functional/TestArgs). - MinioAsyncClient.java: Builder.build() constructs the client first, then assigns httpClient via the supplier-bound newDefaultClient when the caller didn't supply one. Builder.httpClient(...) and Builder.maxRetries Javadocs spell out the new caller-supplied-client contract. - RetryTest.java: replace testUserSuppliedClientStillRetries (which exercised the now-gone wrap behaviour) with two contract tests: testUserSuppliedClientWithoutInterceptorDoesNotRetry and testUserSuppliedClientWithInterceptorRetries. r3205557447 (thread-safety on this.httpClient reassignment) - BaseS3Client.java:114: protected OkHttpClient httpClient -> protected volatile OkHttpClient httpClient. Matches the precedent of volatile int maxRetries on the same class. Provides JMM happens-before between writers (setTimeout / ignoreCertCheck reassigning the field) and readers (executeAsync). The local-snapshot pattern bala originally proposed does not solve the visibility problem; the field qualifier does. The "no two timeouts mid-call" property bala raised separately is already guaranteed by OkHttp Call binding (a Call captures its OkHttpClient at newCall time and the chain reuses it for retries). r3205580978 ("createBody save/restore not needed") - Checksum.java: rewrite the RandomAccessFile branch of update() to use positional FileChannel.read(ByteBuffer, long) so the method no longer mutates the file pointer. Side-effect-free helper now; callers don't need to bracket with getFilePointer / seek. - BaseS3Client.java: drop the createBody fileStartPos capture and the matching seek-back. Empirically verified on a live MinIO erasure cluster: client.putObject(PutObjectAPIArgs) with a RandomAccessFile body and no pre-set checksum headers now completes successfully (the prior incorrect "remove these lines" world fails with EOFException because Http.RequestBody captures the post-checksum EOF position). - MinioAsyncClient.java: drop the two cousin save/restore patterns around Checksum.update at lines ~2010 and ~3560 — both are now redundant for the same reason. r3205604391 ("don't probe S3 error codes") - Retry.java: drop RETRYABLE_S3_CODES (12-element set) and isS3CodeRetryable(String). Class is now narrower: HTTP status set, IOException classifier, and full-jitter backoff formula. minio-go- parity comment removed since the SDK no longer mirrors that list. - Http.java: drop MAX_PEEK_BYTES, S3_ERROR_CODE_PATTERN, the peekS3ErrorCode helper, and the body-probing branch in RetryInterceptor.intercept. RetryInterceptor Javadoc updated to drop the S3-code bullet from "Retries on:". Unused java.util.regex.Pattern import removed (Matcher is still used elsewhere in the file). - RetryTest.java: drop testIsS3CodeRetryable_retryable, testIsS3CodeRetryable_notRetryable, and the integration test testRetryOnRetryableS3CodeIn400Body. 28 RetryTest cases remain, all PASS. r3206792852 ("we already have RANDOM") - MinioAsyncClient.java:96, 3382: drop the java.util.concurrent.ThreadLocalRandom import and swap ThreadLocalRandom.current() for the inherited BaseS3Client.RANDOM at the fan-out object-name suffix site. The ThreadLocalRandom usage in Retry.exponentialBackoffMs is a separate, hot-path concern (added in 60aeaed to address Copilot review 3191258329 about RANDOM contention); bala did not flag it and it remains. Comment trim - WHY-only comments across the six files: removed restatements of what the code does, kept only non-obvious rationale (order constraints, cross-class contracts, gotchas). Javadocs trimmed to the user-facing contract; mechanics moved out. Validated on Java 17: - :api:check (compile + spotless + spotbugs + 28 RetryTest cases) PASS - :api:javadoc PASS - Live mint-erasure compose cluster + ReproUploadObject driving the no-pre-set-checksum-headers PUT path: PASS Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9a5c8e4 commit 14bcc29

6 files changed

Lines changed: 124 additions & 227 deletions

File tree

api/src/main/java/io/minio/BaseS3Client.java

Lines changed: 15 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -111,14 +111,12 @@ public abstract class BaseS3Client implements AutoCloseable {
111111

112112
protected Http.BaseUrl baseUrl;
113113
protected Provider provider;
114-
protected OkHttpClient httpClient;
114+
protected volatile OkHttpClient httpClient;
115115
protected boolean closeHttpClient;
116116

117117
/**
118-
* Maximum attempts per S3 request. Default {@link Retry#MAX_RETRY}. Read on every request via the
119-
* retry interceptor's supplier so runtime tuning via {@link #setMaxRetries(int)} takes immediate
120-
* effect. Package-private so the {@code Builder} classes in this package can seed it via the
121-
* setter; subclasses must go through {@link #setMaxRetries(int)}.
118+
* Maximum attempts per S3 request. Effective only on the SDK-default {@link OkHttpClient};
119+
* caller-supplied clients are used verbatim and the SDK does not modify their retry policy.
122120
*/
123121
volatile int maxRetries = Retry.MAX_RETRY;
124122

@@ -127,40 +125,27 @@ protected BaseS3Client(
127125
this.baseUrl = baseUrl;
128126
this.provider = provider;
129127
this.closeHttpClient = closeHttpClient;
130-
this.httpClient = wrapWithRetry(httpClient);
128+
this.httpClient = httpClient;
131129
}
132130

131+
/**
132+
* Copies share the source's {@link OkHttpClient} (and its retry interceptor binding), so {@link
133+
* #setMaxRetries(int)} on a copy does not affect the shared retry budget.
134+
*/
133135
protected BaseS3Client(BaseS3Client client) {
134136
this.baseUrl = client.baseUrl;
135137
this.provider = client.provider;
136138
this.closeHttpClient = client.closeHttpClient;
137139
this.maxRetries = client.maxRetries;
138-
this.httpClient = wrapWithRetry(client.httpClient);
139-
}
140-
141-
/**
142-
* Re-wires the retry interceptor on {@code client} so it reads {@link #maxRetries} from this
143-
* instance. Strips any prior {@link Http.RetryInterceptor} from BOTH the application-interceptor
144-
* and network-interceptor chains (e.g. one bound to a different instance via the copy
145-
* constructor, or accidentally registered on the network chain) before installing this one as an
146-
* application interceptor.
147-
*
148-
* <p>Also forces {@code retryOnConnectionFailure(false)} on the underlying client. The SDK's own
149-
* {@link Http.RetryInterceptor} is the single source of retry policy; layering OkHttp's
150-
* connection-level retry on top would double-count attempts and obscure the {@link #maxRetries}
151-
* budget. This silently overrides any {@code retryOnConnectionFailure(true)} that a
152-
* caller-supplied client was configured with — by design.
153-
*/
154-
private OkHttpClient wrapWithRetry(OkHttpClient client) {
155-
OkHttpClient.Builder builder = client.newBuilder().retryOnConnectionFailure(false);
156-
builder.interceptors().removeIf(i -> i instanceof Http.RetryInterceptor);
157-
builder.networkInterceptors().removeIf(i -> i instanceof Http.RetryInterceptor);
158-
return builder.addInterceptor(new Http.RetryInterceptor(() -> this.maxRetries)).build();
140+
this.httpClient = client.httpClient;
159141
}
160142

161143
/**
162144
* Sets the maximum number of attempts for transient HTTP failures. Pass {@code 1} to disable
163-
* automatic retries. Defaults to {@code 10}.
145+
* automatic retries. Default {@code 10}.
146+
*
147+
* <p>Effective only on the SDK-default {@link OkHttpClient}; caller-supplied clients are used
148+
* verbatim.
164149
*
165150
* @param maxRetries maximum attempts (must be {@code >= 1}).
166151
*/
@@ -315,11 +300,8 @@ private String[] handleRedirectResponse(
315300
}
316301

317302
/**
318-
* Execute HTTP request asynchronously for given parameters. Retries on retryable IOException,
319-
* HTTP status, and S3 error code are handled by {@link Http.RetryInterceptor}, which {@link
320-
* #wrapWithRetry} installs on the underlying {@link OkHttpClient} regardless of whether the
321-
* client is the default or caller-supplied. The attempt budget is read from {@link #maxRetries}
322-
* on every request.
303+
* Execute HTTP request asynchronously. Retry handling is delegated to {@link
304+
* Http.RetryInterceptor} on the {@link OkHttpClient}, so this method itself never loops.
323305
*/
324306
protected CompletableFuture<Response> executeAsync(Http.S3Request s3request, String region) {
325307
Credentials credentials = (provider == null) ? null : provider.fetch();
@@ -1269,15 +1251,6 @@ private Object[] createBody(PutObjectAPIBaseArgs args, MediaType contentType)
12691251
boolean checksumHeader = headers.namePrefixAny("x-amz-checksum-");
12701252
String md5Hash = headers.getFirst(Http.Headers.CONTENT_MD5);
12711253

1272-
long fileStartPos = 0;
1273-
if (args.file() != null) {
1274-
try {
1275-
fileStartPos = args.file().getFilePointer();
1276-
} catch (IOException e) {
1277-
throw new MinioException(e);
1278-
}
1279-
}
1280-
12811254
if (sha256HexString == null && sha256Base64String == null) {
12821255
if (!baseUrl.isHttps()) {
12831256
Checksum.Hasher hasher = Checksum.Algorithm.SHA256.hasher();
@@ -1333,14 +1306,6 @@ private Object[] createBody(PutObjectAPIBaseArgs args, MediaType contentType)
13331306
}
13341307
}
13351308

1336-
if (args.file() != null) {
1337-
try {
1338-
args.file().seek(fileStartPos);
1339-
} catch (IOException e) {
1340-
throw new MinioException(e);
1341-
}
1342-
}
1343-
13441309
Http.Body body = null;
13451310
if (args.file() != null) {
13461311
body = new Http.Body(args.file(), args.length(), contentType, sha256HexString, md5Hash);

api/src/main/java/io/minio/Checksum.java

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -104,12 +104,31 @@ private static void update(
104104
size = buffer.length();
105105
}
106106

107+
// Positional reads keep this method side-effect-free on the file pointer; callers don't need
108+
// to bracket with seek/restore.
109+
long filePos = 0L;
110+
java.nio.channels.FileChannel channel = null;
111+
if (file != null) {
112+
try {
113+
filePos = file.getFilePointer();
114+
channel = file.getChannel();
115+
} catch (IOException e) {
116+
throw new MinioException(e);
117+
}
118+
}
119+
107120
byte[] buf16k = new byte[16384];
108121
long bytesRead = 0;
109122
while (bytesRead != size) {
110123
try {
111124
int length = (int) Math.min(size - bytesRead, buf16k.length);
112-
int n = file != null ? file.read(buf16k, 0, length) : stream.read(buf16k, 0, length);
125+
int n;
126+
if (file != null) {
127+
java.nio.ByteBuffer dst = java.nio.ByteBuffer.wrap(buf16k, 0, length);
128+
n = channel.read(dst, filePos + bytesRead);
129+
} else {
130+
n = stream.read(buf16k, 0, length);
131+
}
113132
if (n < 0) throw new MinioException("unexpected EOF");
114133
if (n != 0) {
115134
bytesRead += n;

api/src/main/java/io/minio/Http.java

Lines changed: 36 additions & 67 deletions
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,6 @@
5555
import java.util.concurrent.TimeUnit;
5656
import java.util.function.IntSupplier;
5757
import java.util.regex.Matcher;
58-
import java.util.regex.Pattern;
5958
import java.util.stream.Collectors;
6059
import java.util.stream.Stream;
6160
import javax.annotation.Nonnull;
@@ -612,10 +611,18 @@ public static OkHttpClient enableExternalCertificatesFromEnv(OkHttpClient client
612611
}
613612

614613
/**
615-
* Creates new HTTP client with default timeout with additional TLS certificates from
616-
* SSL_CERT_FILE and SSL_CERT_DIR environment variables if present.
614+
* Creates the SDK's default HTTP client with a fixed retry budget of {@link Retry#MAX_RETRY}. For
615+
* a runtime-tunable budget, use {@link #newDefaultClient(IntSupplier)}.
617616
*/
618617
public static OkHttpClient newDefaultClient() {
618+
return newDefaultClient(() -> Retry.MAX_RETRY);
619+
}
620+
621+
/**
622+
* Creates the SDK's default HTTP client; the {@link RetryInterceptor}'s attempt budget is read
623+
* from {@code maxAttemptsSupplier} on each call so it can track a runtime-tunable value.
624+
*/
625+
public static OkHttpClient newDefaultClient(IntSupplier maxAttemptsSupplier) {
619626
OkHttpClient client =
620627
new OkHttpClient()
621628
.newBuilder()
@@ -624,7 +631,7 @@ public static OkHttpClient newDefaultClient() {
624631
.readTimeout(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS)
625632
.protocols(Arrays.asList(Protocol.HTTP_1_1))
626633
.retryOnConnectionFailure(false)
627-
.addInterceptor(new RetryInterceptor())
634+
.addInterceptor(new RetryInterceptor(maxAttemptsSupplier))
628635
.build();
629636
try {
630637
return enableExternalCertificatesFromEnv(client);
@@ -691,72 +698,54 @@ public static OkHttpClient setTimeout(
691698
/**
692699
* OkHttp interceptor that retries transient HTTP failures with full-jitter exponential backoff.
693700
*
694-
* <p>This is part of the SDK's supported public API: it is installed automatically by {@link
695-
* BaseS3Client} on every supplied {@link OkHttpClient} (default or caller-provided), and may also
696-
* be registered explicitly on a stand-alone {@link OkHttpClient} via {@code .addInterceptor(new
697-
* Http.RetryInterceptor())}.
701+
* <p>Installed automatically by {@link Http#newDefaultClient(IntSupplier)} on the SDK-default
702+
* client. <em>Not</em> installed on caller-supplied clients passed via {@code
703+
* MinioClient.Builder.httpClient(...)} — those are used verbatim. To opt into the SDK retry
704+
* policy on a custom client, register this interceptor and pair it with {@code
705+
* retryOnConnectionFailure(false)}.
698706
*
699707
* <p>Retries on:
700708
*
701709
* <ul>
702710
* <li>retryable IOException — connection reset, EOF, socket timeout, idle-connection close.
703711
* Excludes TLS handshake / unknown-CA / HTTPS protocol mismatch.
704712
* <li>retryable HTTP status code — 408, 429, 499, 500, 502, 503, 504, 520.
705-
* <li>retryable S3 error code in a non-2xx response body — {@code SlowDown}, {@code
706-
* InternalError}, {@code ExpiredToken}, etc.
707713
* </ul>
708714
*
709715
* <p>Backoff is full-jitter exponential, with a 200&nbsp;ms unit and a 1&nbsp;s per-attempt cap.
710-
* The maximum number of attempts is supplied per intercept call via {@link IntSupplier} so that
711-
* an SDK client can expose runtime tuning while keeping the interceptor itself stateless. The
712-
* no-arg constructor uses the package default of 10 attempts.
716+
* Attempt budget is read from an {@link IntSupplier} on every call so SDK clients can expose
717+
* runtime tuning while the interceptor itself stays stateless.
713718
*
714-
* <p><b>Threading.</b> Backoff sleeps on the OkHttp dispatcher thread that owns the call. Under
715-
* sustained 5xx/429 storms this can hold dispatcher slots idle while waiting. Callers that need
716-
* higher concurrency under widespread retry should size the dispatcher worker pool accordingly
717-
* (see {@link okhttp3.Dispatcher#setMaxRequests} / {@code setMaxRequestsPerHost}).
719+
* <p><b>Threading.</b> Backoff sleeps on the OkHttp dispatcher thread that owns the call; under
720+
* sustained 5xx/429 storms this can hold dispatcher slots idle. Size {@link
721+
* okhttp3.Dispatcher#setMaxRequests} / {@code setMaxRequestsPerHost} accordingly.
718722
*
719-
* <p><b>Cancellation.</b> The interceptor checks {@code chain.call().isCanceled()} before each
720-
* attempt and after any thrown {@link IOException}, so a {@code Call.cancel()} from the caller
721-
* (or an upstream interceptor) terminates the retry loop instead of treating the cancellation as
722-
* a retryable transport error.
723+
* <p><b>Cancellation.</b> {@code Call.cancel()} short-circuits the retry loop instead of being
724+
* mistaken for a retryable transport error.
723725
*
724-
* <p><b>Replayability.</b> Retries call {@code chain.proceed(request)} again, which re-invokes
725-
* {@code request.body().writeTo(sink)}. Callers using the SDK's own body types ({@link Body} for
726-
* {@code byte[]}, {@link ByteBuffer}, or {@link RandomAccessFile}) are safe — those bodies are
727-
* replayable. A caller-supplied {@link okhttp3.RequestBody} that overrides {@code isOneShot()} to
728-
* return {@code true} (or otherwise consumes its source on the first {@code writeTo}) MUST NOT be
729-
* retried; either disable retries via {@link BaseS3Client#setMaxRetries(int)} {@code (1)} for
730-
* those calls, or wrap the body in a replayable form before submission.
726+
* <p><b>Replayability.</b> SDK-owned body types ({@link Body} over {@code byte[]}, {@link
727+
* ByteBuffer}, {@link RandomAccessFile}) are retry-safe. A caller-supplied {@link
728+
* okhttp3.RequestBody} that overrides {@code isOneShot()} to {@code true} MUST NOT be retried;
729+
* either disable retries for those calls via {@link BaseS3Client#setMaxRetries(int)} {@code (1)}
730+
* or wrap the body in a replayable form.
731731
*
732-
* <p><b>Request reuse.</b> The interceptor passes the same signed {@link okhttp3.Request} on
733-
* every attempt, so the {@code X-Amz-Date} / {@code Authorization} headers are not refreshed
734-
* between attempts. This is harmless for the default attempt budget (worst-case wall-clock
735-
* &lt;&nbsp;10&nbsp;s, well inside the 15-minute S3 signing window) but extreme {@code
736-
* maxRetries} values combined with high backoff caps could outlast either the signing window or a
737-
* short-lived STS credential. minio-go's request-level retry rebuilds and re-signs each attempt;
738-
* that semantic is intentionally not replicated here in exchange for the simpler interceptor
739-
* model.
740-
*
741-
* <p>To opt out of retries on a stand-alone {@link OkHttpClient}, simply do not register this
742-
* interceptor.
732+
* <p><b>Request reuse.</b> Each attempt sends the same signed {@link okhttp3.Request}, so {@code
733+
* X-Amz-Date}/{@code Authorization} are not refreshed. Harmless at the default budget; an extreme
734+
* {@code maxRetries} combined with high backoff could outlast the 15-minute signing window or a
735+
* short-lived STS credential. (minio-go re-signs per attempt; this Java implementation
736+
* deliberately does not, in exchange for a simpler interceptor model.)
743737
*/
744738
public static class RetryInterceptor implements Interceptor {
745-
/** Maximum body bytes inspected when probing for an S3 {@code <Code>} value. */
746-
private static final long MAX_PEEK_BYTES = 5L * 1024L * 1024L;
747-
748-
private static final Pattern S3_ERROR_CODE_PATTERN = Pattern.compile("<Code>([^<]+)</Code>");
749-
750739
private final IntSupplier maxAttemptsSupplier;
751740

752-
/** Creates a retry interceptor that uses {@link Retry#MAX_RETRY} attempts. */
741+
/** Uses a fixed budget of {@link Retry#MAX_RETRY} attempts. */
753742
public RetryInterceptor() {
754743
this(() -> Retry.MAX_RETRY);
755744
}
756745

757746
/**
758-
* Creates a retry interceptor that reads its attempt budget from the supplier on every
759-
* intercept call. Supplier values less than 1 are clamped to 1 (single attempt = retry off).
747+
* Reads the attempt budget from {@code maxAttemptsSupplier} on every call so it can track a
748+
* runtime-tunable value. Values below 1 are clamped to 1 (= retry off).
760749
*/
761750
public RetryInterceptor(IntSupplier maxAttemptsSupplier) {
762751
this.maxAttemptsSupplier = Objects.requireNonNull(maxAttemptsSupplier, "maxAttemptsSupplier");
@@ -812,32 +801,12 @@ public okhttp3.Response intercept(Chain chain) throws IOException {
812801
continue;
813802
}
814803

815-
if (!response.isSuccessful()) {
816-
String s3Code = peekS3ErrorCode(response);
817-
if (Retry.isS3CodeRetryable(s3Code)) {
818-
lastException = null;
819-
continue;
820-
}
821-
}
822-
823804
return response;
824805
}
825806

826807
if (lastException != null) throw lastException;
827808
return response;
828809
}
829-
830-
/** Returns the S3 {@code <Code>} value if the body is XML containing one, else null. */
831-
private static String peekS3ErrorCode(okhttp3.Response response) {
832-
try {
833-
String body = response.peekBody(MAX_PEEK_BYTES).string();
834-
if (body.isEmpty() || !body.contains("<Error")) return null;
835-
Matcher m = S3_ERROR_CODE_PATTERN.matcher(body);
836-
return m.find() ? m.group(1) : null;
837-
} catch (IOException e) {
838-
return null;
839-
}
840-
}
841810
}
842811

843812
/** HTTP body of {@link RandomAccessFile}, {@link ByteBuffer} or {@link byte} array. */

0 commit comments

Comments
 (0)