HDDS-15179. Support Chunked Transfer without Content Length #10196
peterxcli wants to merge 4 commits into apache:master
Conversation
…quests Signed-off-by: peterxcli <peterxcli@gmail.com>
I notice the Jira ticket should be a subtask under https://issues.apache.org/jira/browse/HDDS-8423. Is there any way to fix it? I couldn't find the move option.
Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
adoroszlai
left a comment
Thanks @peterxcli for the patch.
Based on my understanding of S3, requests fall into one of two cases: they either include a Content-Length header or an x-amz-decoded-content-length header. That being said, I totally understand that this proposed change adds flexibility to the upload process, and for me it is ok.
OzoneBucket bucket = context.getBucket();
final String lengthHeader = getHeaders().getHeaderString(HttpHeaders.CONTENT_LENGTH);
long length = lengthHeader != null ? Long.parseLong(lengthHeader) : 0;
if (lengthHeader == null && body != null) {
Regarding S3, we should also consider the scenario where the Content-Length header is absent, but the x-amz-decoded-content-length header is provided. In this case, since we already know the exact size of the payload, we wouldn't need to manually calculate the upload length. See https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
For all requests, you must include the x-amz-decoded-content-length header, specifying the size of the object in bytes.
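For illustration, a minimal sketch (not the actual S3G code) of resolving the effective payload length with that fallback could look like the block below; the header-name string literal and the -1 "unknown" sentinel are assumptions for the example:

```java
import javax.ws.rs.core.HttpHeaders;

public final class PayloadLengthResolver {
  // Hypothetical helper: prefer Content-Length, otherwise fall back to the
  // x-amz-decoded-content-length sent with aws-chunked (SigV4 streaming)
  // uploads, and only report "unknown" (-1) when neither header is present.
  public static long resolve(HttpHeaders headers) {
    String contentLength = headers.getHeaderString(HttpHeaders.CONTENT_LENGTH);
    if (contentLength != null) {
      return Long.parseLong(contentLength);
    }
    String decodedLength =
        headers.getHeaderString("x-amz-decoded-content-length");
    if (decodedLength != null) {
      return Long.parseLong(decodedLength);
    }
    return -1; // length genuinely unknown; only then spool or peek the body
  }
}
```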
Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
chungen0126
left a comment
+1 LGTM. Let's wait for the CI to pass.
if (lengthHeader == null && body != null && !hasMultiChunksUpload) {
  spooledBody = new FileBackedOutputStream(32);
  length = IOUtils.copyLarge(body, spooledBody, new byte[getIOBufferSize(0)]);
  body = spooledBody.asByteSource().openStream();
  hasCalculatedLength = true;
}
Is this going to copy the whole object into a temporary file on the S3G machine? In other words, is S3G required to allocate some disk space for this length calculation?
If so, this might be problematic: many clusters (including mine) currently assume that S3G has no variable disk requirement, so this would require some kind of persistent volume for S3G when deployed on K8s. Additionally, writing the whole object to a file and then re-reading it incurs extra disk IO.
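For context on that concern, here is a rough, self-contained illustration (not the PR's code) of what the spooling step does: Guava's FileBackedOutputStream keeps at most the given threshold (32 bytes here) in memory and spills everything beyond that to a temp file on the S3G host, so essentially every non-trivial object hits local disk.

```java
import com.google.common.io.FileBackedOutputStream;
import org.apache.commons.io.IOUtils;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class SpoolDemo {
  public static void main(String[] args) throws Exception {
    InputStream body = new ByteArrayInputStream(
        "payload larger than the 32-byte threshold goes to a temp file"
            .getBytes(StandardCharsets.UTF_8));

    FileBackedOutputStream spooled = new FileBackedOutputStream(32);
    long length = IOUtils.copyLarge(body, spooled);        // counts bytes while copying
    InputStream replay = spooled.asByteSource().openStream(); // re-read, now from the temp file

    System.out.println("calculated length = " + length);
    replay.close();
    spooled.reset();  // releases the buffer and deletes the backing temp file
  }
}
```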
if (canCreateDirectory &&
    (length == 0 || hasAmzDecodedLengthZero) &&
    hasKnownZeroLength &&
    StringUtils.endsWith(keyPath, "/")
From what I see, the length is only needed to check whether it is 0 or not. In that case, is it possible to simply check whether the first byte exists?
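A hypothetical sketch of that idea, using PushbackInputStream to peek a single byte and then restore it so the body can still be streamed to the upload unchanged:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public final class EmptyBodyCheck {
  // Peek one byte to decide whether the payload is empty, then push it back
  // so the returned stream still yields the full body for the real upload.
  public static InputStream peek(InputStream body, boolean[] isEmpty)
      throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(body, 1);
    int first = pushback.read();
    isEmpty[0] = (first == -1);
    if (first != -1) {
      pushback.unread(first);
    }
    return pushback;
  }
}
```

One caveat with this approach: for aws-chunked requests the raw body carries chunk framing, so it is non-empty even for a zero-byte object, and the peek would have to happen after decoding.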
Thanks for the review, let me think about this.
What changes were proposed in this pull request?
Currently, a chunked transfer fails if we don't specify the content length, with the following error:
https://github.com/ceph/s3-tests/blob/fb8b73092bb1dd8db829f1205a9e52e73bf9a232/s3tests/functional/test_s3.py#L1589-L1599
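For illustration, a minimal Java reproduction of such a request could look like the sketch below; the endpoint, bucket, and key are placeholders, and a real request would also need SigV4 authentication headers:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public final class ChunkedPutRepro {
  public static void main(String[] args) throws Exception {
    // Placeholder S3G endpoint, bucket, and key (no auth headers for brevity).
    URL url = new URL("http://localhost:9878/bucket1/key1");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("PUT");
    conn.setChunkedStreamingMode(0);  // Transfer-Encoding: chunked, no Content-Length
    try (OutputStream out = conn.getOutputStream()) {
      out.write("hello chunked world".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP status: " + conn.getResponseCode());
  }
}
```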
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15179
How was this patch tested?
(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)