S3 downloads with CRT client cause silent errors for concurrently modified files #4007
Description
Describe the bug
When downloading a file using S3 Transfer Manager, the returned data may contain an XML error document with no further indication that the download failed:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>PreconditionFailed</Code><Message>At least one of the pre-conditions you specified did not hold</Message><Condition>If-Match</Condition><RequestId>...</RequestId><HostId>...</HostId></Error>
It can be triggered by using the CRT client with default settings while the downloaded file is modified concurrently:
- Create an S3TransferManager with the CRT client and otherwise default settings
- Upload a file
- Download the file in a loop and check the returned contents each time
- While the loop runs, upload a new version of the file several times, e.g. using the AWS CLI
Expected Behavior
The inline error should be detected and raised as an exception, or the handling of concurrently modified files should itself be improved. Alternatively, the CRT client could handle such concurrency issues internally.
Current Behavior
When downloading a file via S3TransferManager#downloadFile, it is possible to get back a "successful" response whose payload contains an XML error document.
Reproduction Steps
The example is not fully standalone. To keep the code to a minimum, the upload of the file is not done in Java; from a shell it is as simple as
date >> key && aws s3 cp key s3://bucket/key
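import java.nio.charset.StandardCharsets;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.core.async.AsyncResponseTransformer;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedDownload;
import software.amazon.awssdk.transfer.s3.model.DownloadRequest;

public class Repro {
    public static void main(String[] args) throws Exception {
        StaticCredentialsProvider credentialsProvider =
                StaticCredentialsProvider.create(AwsBasicCredentials.create("key", "secret"));
        S3TransferManager s3TransferManager = S3TransferManager.builder()
                .s3Client(S3AsyncClient.crtBuilder()
                        // Left at the default to reproduce the bug. Enabling the next line is the
                        // workaround described under "Possible Solution".
                        // .checksumValidationEnabled(false)
                        .credentialsProvider(credentialsProvider)
                        .region(Region.EU_CENTRAL_1)
                        .build())
                .build();
        while (true) {
            CompletedDownload<ResponseBytes<GetObjectResponse>> download = s3TransferManager
                    .download(DownloadRequest.builder()
                            .getObjectRequest(GetObjectRequest.builder()
                                    .bucket("bucket")
                                    .key("key")
                                    .build())
                            .responseTransformer(AsyncResponseTransformer.toBytes())
                            .build())
                    .completionFuture()
                    .join();
            // While this loop runs, use the AWS CLI to upload a new random version of the same
            // file multiple times. After a couple of tries you should see the error XML.
            String content = new String(download.result().asByteArrayUnsafe(), StandardCharsets.UTF_8);
            if (content.startsWith("<?xml")) {
                System.out.println("BUG");
            }
            Thread.sleep(10);
        }
    }
}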
Possible Solution
We only rely on the consistency guarantees provided by S3 and were able to "fix" the problem by disabling checksum validation in the CRT client. This leads to a single GET request, which according to the S3 documentation should always return either the old or the new file content. Alternatively, using the non-CRT client also worked as expected.
A workaround that manually parses the received file content to detect an error was dismissed as too invasive: there would have been no way for us to tell whether the downloaded file really has this content or whether it is an error message.
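For illustration, the dismissed parsing workaround could look roughly like the sketch below. `InlineS3ErrorCheck` and `inlineErrorCode` are hypothetical names and not part of the AWS SDK; the sketch uses only the JDK's XML parser and assumes the error body has the `<Error><Code>...` shape shown in the bug description. It cannot distinguish an injected error document from an object whose real content happens to be the same XML, which is the ambiguity described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not an AWS SDK class.
public class InlineS3ErrorCheck {

    /**
     * Returns the text of the <Code> element if the payload parses as an XML
     * document whose root element is <Error>, otherwise null.
     */
    public static String inlineErrorCode(byte[] payload) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The payload is untrusted data, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder.parse(new ByteArrayInputStream(payload)).getDocumentElement();
            if (!"Error".equals(root.getTagName())) {
                return null;
            }
            NodeList codes = root.getElementsByTagName("Code");
            return codes.getLength() > 0 ? codes.item(0).getTextContent() : null;
        } catch (Exception e) {
            // Not well-formed XML (or the parser rejected it): treat as ordinary content.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] errorBody = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<Error><Code>PreconditionFailed</Code>"
                + "<Message>At least one of the pre-conditions you specified did not hold</Message>"
                + "<Condition>If-Match</Condition></Error>").getBytes(StandardCharsets.UTF_8);
        byte[] ordinaryBody = "Mon Jan  1 00:00:00 UTC 2024\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(inlineErrorCode(errorBody));    // prints "PreconditionFailed"
        System.out.println(inlineErrorCode(ordinaryBody)); // prints "null"
        // Caveat: an object that legitimately stores this XML is flagged as an error too.
    }
}
```

In the reproduction loop, such a check would replace the `content.startsWith("<?xml")` test, but it remains a heuristic rather than a reliable failure signal.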
Additional Information/Context
No response
AWS Java SDK version used
2.20.65
JDK version used
openjdk version "11.0.3" 2019-04-16 LTS
OpenJDK Runtime Environment Corretto-11.0.3.7.1 (build 11.0.3+7-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.3.7.1 (build 11.0.3+7-LTS, mixed mode)
Operating System and version
macOS 13.0.1