feat: support v4 signed url endpoint#10490
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements support for signed post policy uploads in the GCS emulator by adding a new POST endpoint and a custom multipart/form-data parser. The feedback identifies a bug where the parser could truncate data if a trailing line separator is missing and suggests adding a verification check. Additionally, the reviewer recommends against trimming custom metadata values to maintain parity with GCS behavior and suggests making the boundary extraction logic more robust to handle extra Content-Type parameters.
| const headerBuffer = sections[0]; | ||
| const bodyBuffer = sections[1]; | ||
|
|
||
| const dataRaw = bodyBuffer.slice(0, bodyBuffer.byteLength - LINE_SEPARATOR.length); |
There was a problem hiding this comment.
The parser assumes that every multipart part body ends with a line separator (\r\n), but it doesn't verify this. If the separator is missing, it will silently truncate the last two bytes of the actual data. Furthermore, the included tests (line 231 in multipart.spec.ts) expect an error to be thrown in this case, which currently won't happen, causing the tests to fail.
| const dataRaw = bodyBuffer.slice(0, bodyBuffer.byteLength - LINE_SEPARATOR.length); | |
| if (!bodyBuffer.slice(-LINE_SEPARATOR.length).equals(Buffer.from(LINE_SEPARATOR))) { | |
| throw new Error("Failed to parse multipart part: Missing trailing line separator."); | |
| } | |
| const dataRaw = bodyBuffer.slice(0, bodyBuffer.byteLength - LINE_SEPARATOR.length); |
| if (part.type === "file" || part.name === "key") continue; | ||
|
|
||
| const key = part.name.toLowerCase(); | ||
| const value = part.value.trim(); |
There was a problem hiding this comment.
Trimming the value of form fields can lead to data corruption for custom metadata (x-goog-meta-*). Google Cloud Storage preserves leading and trailing spaces in custom metadata values. Since the multipart parser already handles the trailing line separator of the part, additional trimming here is likely unnecessary and potentially incorrect.
| const value = part.value.trim(); | |
| const value = part.value; |
| if (!contentTypeHeader.startsWith("multipart/form-data")) { | ||
| throw new Error(`Bad content type. ${contentTypeHeader}`); | ||
| } | ||
| const boundaryId = contentTypeHeader.split("boundary=")[1]; |
There was a problem hiding this comment.
Extracting the boundary by simply splitting on boundary= and taking the rest of the string is fragile. If the Content-Type header contains additional parameters after the boundary (e.g., ; charset=utf-8), they will be incorrectly included in the boundaryId, causing the parser to fail to find the boundary markers in the body.
| const boundaryId = contentTypeHeader.split("boundary=")[1]; | |
| const boundaryId = contentTypeHeader.split("boundary=")[1]?.split(";")[0].trim(); |
There was a problem hiding this comment.
adjustment to gemini solution:
contentTypeHeader.split("boundary=")[1]?.split(";")[0]?.trim();
The gemini recommended solution assumes "boundary=" is present.
|
Thanks for submitting this PR @7hokerz! I don't have a good understanding of the Storage emulator, but let me try checking with our engineering team to see if anyone more familiar with the Storage emulator would be able to take a look. Just to set expectations, we're going on a code freeze, so it might take some time for this PR to get merged. |
@aalej , Thank you for your response. I will gladly wait until a storage emulator expert is available. I appreciate your assistance! |
annajowang
left a comment
There was a problem hiding this comment.
(i am also not a "storage emulator expert". I don't think there currently is one. but if bugs come up after this is merged they can always be fixed)
Left some comments, please address them and the gemini ones as well.
Thanks for your contribution!
| if (!contentTypeHeader.startsWith("multipart/form-data")) { | ||
| throw new Error(`Bad content type. ${contentTypeHeader}`); | ||
| } | ||
| const boundaryId = contentTypeHeader.split("boundary=")[1]; |
There was a problem hiding this comment.
adjustment to gemini solution:
contentTypeHeader.split("boundary=")[1]?.split(";")[0]?.trim();
The gemini recommended solution assumes "boundary=" is present.
| for (const part of formData) { | ||
| if (part.type === "file" || part.name === "key") continue; | ||
|
|
||
| const key = part.name.toLowerCase(); | ||
| const value = part.value.trim(); | ||
|
|
||
| if (key.startsWith("x-goog-meta-")) { | ||
| const metaKey = key.replace("x-goog-meta-", ""); | ||
| metadata.metadata![metaKey] = value; | ||
| } else { | ||
| switch (key) { | ||
| case "content-type": | ||
| metadata.contentType = value; | ||
| break; | ||
| case "cache-control": | ||
| metadata.cacheControl = value; | ||
| break; | ||
| case "content-disposition": | ||
| metadata.contentDisposition = value; | ||
| break; | ||
| case "content-encoding": | ||
| metadata.contentEncoding = value; | ||
| break; | ||
| case "content-language": | ||
| metadata.contentLanguage = value; | ||
| break; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
opinion nit: this seems a little easier to read to me
const HEADER_MAP: Record<string, keyof IncomingMetadata> = {
"content-type": "contentType",
"cache-control": "cacheControl",
"content-disposition": "contentDisposition",
"content-encoding": "contentEncoding",
"content-language": "contentLanguage",
};
for (const part of formData) {
if (part.type === "file" || part.name === "key") continue;
const key = part.name.toLowerCase();
if (key.startsWith("x-goog-meta-")) {
metadata.metadata![key.substring(12)] = part.value;
} else if (HEADER_MAP[key]) {
metadata[HEADER_MAP[key]] = part.value.trim();
}
}
| const contentDispositionParams = h.split(";").slice(1); | ||
| for (const param of contentDispositionParams) { | ||
| const [paramName, ...valueParts] = param.trim().split("="); | ||
| const value = valueParts.join("=").replace(/^["'](.+(?=["']$))["']$/, "$1"); | ||
| if (paramName === "name") { | ||
| name = value; | ||
| } else if (paramName === "filename") { | ||
| filename = value; | ||
| } | ||
| } |
There was a problem hiding this comment.
I believe this is all just to extract the name and filename values?
could this just be simplified to:
name = extractParam(h, "name") ?? name;
filename = extractParam(h, "filename") ?? filename;
with some descriptively-named helper function like:
function extractParam(header: string, param: string): string | undefined {
const match = header.match(new RegExp(`\\b${param}=["']?([^"';]+)["']?`, "i"));
return match ? match[1] : undefined;
}
@annajowang , Thank you for your feedback! I will take the time to review it and make improvements soon. |
Description
I added the endpoint and also added test code for the endpoint.
Additional Context
Actually, the latest version of the package has been released, but the existing method,
npm i @google-cloud/storage@latest, failed to download the package.@google-cloud/storage v7.20.0 release