CLDSRV-898: handle checksums in CompleteMultipartUpload#6170
CLDSRV-898: handle checksums in CompleteMultipartUpload#6170leif-scality wants to merge 9 commits into
Conversation
|
LGTM |
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 3 files with indirect coverage changes @@ Coverage Diff @@
## development/9.4 #6170 +/- ##
===================================================
+ Coverage 85.25% 85.39% +0.13%
===================================================
Files 208 208
Lines 13919 14065 +146
===================================================
+ Hits 11867 12011 +144
- Misses 2052 2054 +2
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
2493909 to
538c16c
Compare
Hello leif-scality,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
27b4a43 to
a90bd94
Compare
|
LGTM |
| const parts = jsonList.Part || []; | ||
| for (let i = 0; i < parts.length; i++) { | ||
| const part = parts[i]; |
There was a problem hiding this comment.
| const parts = jsonList.Part || []; | |
| for (let i = 0; i < parts.length; i++) { | |
| const part = parts[i]; | |
| for (const part of (jsonList.Part || [])) { |
| if (tag !== expectedTag) { | ||
| const algoLabel = tag.replace(/^Checksum/, '').toLowerCase(); | ||
| return errorInstances.BadDigest.customizeDescription( | ||
| `The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.', |
There was a problem hiding this comment.
| `The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.', | |
| `The ${algoLabel} you specified for part ${partNumber} did not match what we received.`, |
There was a problem hiding this comment.
There's already a test file multipartUpload.js, wouldn't it be better to add the tests there? Or if a refactor into multiple files is beneficial, maybe port some related tests from there to this new file to avoid scattering the complete MPU tests into multiple files.
There was a problem hiding this comment.
moved the tests to multipartUpload.js. This also triggered the new prettier linter in the file, I added it as a separate commit
|
|
||
| // XML element name AWS uses for each algorithm in CompleteMultipartUpload's | ||
| // per-part body. | ||
| const TAG_BY_ALGO = { |
There was a problem hiding this comment.
Shouldn't this be in constants too?
There was a problem hiding this comment.
The algorithms object contains the TAG of each algo already, this object is just for testing that the tag was not changed in the algorithms object
There was a problem hiding this comment.
Could you construct it from algorithms? If algorithm is supposed to be a constant and we are concerned that it may be accidentally changed, what about using Object.freeze on it?
| assert.strictEqual( | ||
| err.description, | ||
| 'One or more of the specified parts could not be ' + | ||
| 'found. The part may not have been uploaded, or ' + |
There was a problem hiding this comment.
Is the double space intentional? Also not very fond of matching against such a long error message that may vary, but that's okay I think.
There was a problem hiding this comment.
AWS returns two spaces. For all the final API/XML errors I return exactly what AWS returns
<Error><Code>InvalidPart</Code>
<Message>One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity
tag may not match the part's entity tag.</Message>
<UploadId>CTFNKyL2hI6n6irH0zbVWDzdPZ4n2ueJceRh1juCeuL2X5HOjrvCmXQMEqaoAatEWTEa3pWWxC7t9lOStMzjo0nJb4pv8Ct6oT
Hv2n8mggVXRQ8RxiXyVyt3.3zpY98HVsZd.ozihhJ1HdUjLCkJtwJMQdNBd4fSdG9drS80vdg-</UploadId>
<PartNumber>1</PartNumber>
<ETag>0ebf9257a12e808d107b2ed1a826c122</ETag>
<RequestId>DAQAPVMCMSY4PDPV</RequestId>
<HostId>XS/2re3ieUQRTKdANLtZv14qyB2h3LjVHnmVrvjP0cj1PazPO16KkArQMtBLBy8S4mmLzQkuXZc=</HostId></Error>
There was a problem hiding this comment.
okay fair enough (my little voice tells me that AWS didn't care that much about the beauty of their error messages 🙂 )
There was a problem hiding this comment.
I believe no sane application would match an error by their exact error message to the dot, but I'm good with sticking to AWS to avoid ourselves asking the question. Personally I wouldn't care though 😛
There was a problem hiding this comment.
Also I believe the error messages are not enforced by the API contract, hence may change at any time on AWS side, which would make our effort useless in the end. Just my two cents and my last comment on this 😉
| // `x-amz-checksum-type` and `x-amz-checksum-algorithm` are configuration | ||
| // headers (MPU completeness mode / SDK algorithm hint), not value | ||
| // headers. They must not count toward the "value header" tally. | ||
| const valueHeaders = Object.keys(headers).filter( | ||
| h => h.startsWith('x-amz-checksum-') && h !== 'x-amz-checksum-type' && h !== 'x-amz-checksum-algorithm', | ||
| ); |
There was a problem hiding this comment.
If the list of supported headers is known and a short list of supported algorithms, it may be cleaner to directly extract each of the possible ones, or otherwise filter like [list-of-supported-headers].includes(h).
It would change the behavior (probably in a good way) if the client sends one or more unsupported checksums along with multiple valid ones, where we probably want to return AlgoNotSupported rather than MultipleChecksumTypes in priority in this case, but that's more a nitpick, either way should work.
There was a problem hiding this comment.
AWS checks MultipleChecksumTypes before AlgoNotSupported, the current behavior is the same as AWS.
AWS also ignores only x-amz-checksum-type and x-amz-checksum-algorithm, they don't count them to the checksum count, and they also never trigger AlgoNotSupported. if we send x-amz-checksum-BAD on the other hand we get an AlgoNotSupported. If we send x-amz-checksum-BAD + x-amz-checksum-ZZZ we get MultipleChecksumTypes.
So the order is
- check no MultipleChecksumTypes
- check no AlgoNotSupported
- check actual checksum value, BadDigest if mismatch
I added a commit to ignore x-amz-checksum-type and x-amz-checksum-algorithm in the other functions, I didn't know this behavior existed.
There was a problem hiding this comment.
Sounds good to me.
IMHO we don't necessarily have to mimic every AWS behavior when it comes to error handling in particular, as it takes time to reverse engineer those aspects with no clear benefit, and AWS implementation is not perfect and their choices can be debated too. Sure it became the de facto standard and we need to stick to its way of doing things to avoid breaking existing applications, but not necessarily in all its nitty gritty details and quirks.
a90bd94 to
e468700
Compare
|
LGTM |
46b6d78 to
6e735e4
Compare
|
|
LGTM — solid implementation with thorough test coverage. |
6e735e4 to
1c92b9b
Compare
|
1c92b9b to
3df48c2
Compare
|
LGTM |
3df48c2 to
c190267
Compare
| uploadId, | ||
| error, | ||
| }); | ||
| return callNext(error, destBucket); |
There was a problem hiding this comment.
callNext is not defined anywhere in this file — this will throw a ReferenceError at runtime when computeFinalChecksum returns an error for an explicit (non-default) MPU. Should be next.
— Claude Code
|
c190267 to
7b3785c
Compare
|
LGTM |
There was a problem hiding this comment.
This diff is too big, it could be in another PR by itself
| const checksumHeaders = Object.keys(headers).filter( | ||
| header => | ||
| header.startsWith('x-amz-checksum-') && | ||
| header !== 'x-amz-checksum-type' && | ||
| header !== 'x-amz-checksum-algorithm', | ||
| ); |
There was a problem hiding this comment.
This pattern reoccurs multiple times in this file, you can make a function for that
| }); | ||
|
|
||
| it( | ||
| 'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers', |
There was a problem hiding this comment.
| 'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers', | |
| 'should return CRC64NVME/FULL_OBJECT on CompleteMPU response when CreateMPU sent no checksum headers', |
7b3785c to
a90bd94
Compare
| let caught; | ||
| try { | ||
| await promise; | ||
| } catch (err) { | ||
| caught = err; | ||
| } | ||
| assert(caught, 'expected CompleteMPU to reject'); | ||
| assert.strictEqual( | ||
| caught.name, | ||
| 'InvalidPart', | ||
| `expected InvalidPart, got ${caught.name}: ${caught.message}`, | ||
| ); |
There was a problem hiding this comment.
You can use assert.rejects
await assert.rejects(promise, err => {
assert.strictEqual(
err.name,
'InvalidPart',
`expected InvalidPart, got ${err.name}: ${err.message}`,
);
});a90bd94 to
7b3785c
Compare
|
I forced push from the wrong branch, reverted the changes |
| const log = { debug: sandbox.stub() }; | ||
| const result = await validateMethodChecksumNoChunking(request, body, log); | ||
| assert.deepStrictEqual(result, ArsenalErrors.BadDigest); | ||
| assert(log.debug.calledOnce); |
There was a problem hiding this comment.
debug logs should probably not be tested
There was a problem hiding this comment.
I will remove the log tests
|
LGTM — well-structured feature with thorough test coverage across all algorithm/type combinations, error paths, and edge cases. Two minor nits posted inline: |
| }; | ||
| const log = { debug: sandbox.stub() }; | ||
| const result = await validateMethodChecksumNoChunking(request, body, log); | ||
| assert.strictEqual(result, null); |
There was a problem hiding this comment.
In some tests above you use assert.ifError(result) and here assert.strictEqual(result, null), you should stick to only 1 pattern and be consistent everywhere
| apiMethod: 'completeMultipartUpload', | ||
| headers: { 'content-md5': '1B2M2Y8AsgTpgAmY7PhCfg==' }, | ||
| }; | ||
| const log = { debug: sandbox.stub() }; |
There was a problem hiding this comment.
It would be better to have a full mocked logger like it's done in some other test files, to avoid breakin if we change from debug to another level
There was a problem hiding this comment.
I will remove the log tests
| const algorithm = storedMetadata.checksumAlgorithm; | ||
| const type = storedMetadata.checksumType; |
There was a problem hiding this comment.
Deconstruction could be slightly more readable
| const algorithm = storedMetadata.checksumAlgorithm; | |
| const type = storedMetadata.checksumType; | |
| const { checksumAlgorithm: algorithm, checksumType: type } = storedMetadata; |
| } | ||
| const { keysToDelete, extraPartLocations } = filteredPartsObj; | ||
| const { aggregateETag, dataLocations, calculatedSize } = partsInfo; | ||
| function continueProcessParts() { |
There was a problem hiding this comment.
Shouldn't this function be another step in the waterfall?
| error: resolveErr, | ||
| }); | ||
| return next(resolveErr, destBucket); | ||
| } |
There was a problem hiding this comment.
Is there a benefit in catching those errors since we already have a catch-all below
| // CompleteMPU is not in `checksumedMethods` (x-amz-checksum-* is the | ||
| // final-object checksum, not a body digest) but it still must validate | ||
| // Content-MD5 against the XML body when present. | ||
| describe('completeMultipartUpload (md5-only path)', () => { |
There was a problem hiding this comment.
I think a lot of the unit tests could be turned into a declarative matrix for readability.
|
|
||
| // Resolves with { xml, headers } so callers can inspect both the | ||
| // response body and the response headers. | ||
| function completeMpuP(uploadId, partChecksumXml = '') { |
There was a problem hiding this comment.
It seems to me the test code is riddled with this type of local helpers, maybe we should consider a common layer of helpers for MPU construction?
| it( | ||
| 'should return CRC64NVME/FULL_OBJECT on CompleteMPU response ' + 'when CreateMPU sent no checksum headers', |
There was a problem hiding this comment.
Extra newline and concatenation doesn't look necessary.
tests/unit/api/multipartUpload.js