Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 40 additions & 1 deletion src/libstore/filetransfer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@

#include <curl/curl.h>

#include <array>
#include <cmath>
#include <cstring>
#include <queue>
Expand Down Expand Up @@ -579,6 +580,14 @@ struct curlFileTransfer : public FileTransfer
#endif
curl_easy_setopt(req, CURLOPT_CONNECTTIMEOUT, fileTransfer.settings.connectTimeout.get());

/* Enable TCP keepalive to detect dead connections and server closures.
Probes every 30s to catch network failures and idle timeouts early. */
curl_easy_setopt(req, CURLOPT_TCP_KEEPALIVE, 1L);
curl_easy_setopt(req, CURLOPT_TCP_KEEPIDLE, 30L);
curl_easy_setopt(req, CURLOPT_TCP_KEEPINTVL, 30L);
/* Don't reuse idle connections older than 90s. */
curl_easy_setopt(req, CURLOPT_MAXAGE_CONN, 90L);

curl_easy_setopt(req, CURLOPT_LOW_SPEED_LIMIT, 1L);
curl_easy_setopt(req, CURLOPT_LOW_SPEED_TIME, fileTransfer.settings.stalledDownloadTimeout.get());

Expand Down Expand Up @@ -682,7 +691,37 @@ struct curlFileTransfer : public FileTransfer
// We treat most errors as transient, but won't retry when hopeless
Error err = Transient;

if (httpStatus == 404 || httpStatus == 410 || code == CURLE_FILE_COULDNT_READ_FILE) {
// S3 returns certain retryable errors as HTTP 400/500/503 with XML error codes.
// These take precedence over the generic HTTP status handling below.
// Only parse the response body on status codes where S3 XML errors can appear.
static constexpr std::array<std::string_view, 12> s3RetryableErrors{{
"IncompleteBody", // HTTP 400 - network issue
"InternalError", // HTTP 500 - S3 internal failure
"InternalFailure", // HTTP 500 - alias for InternalError
"InternalServerError", // HTTP 500 - alias for InternalError
"RequestExpired", // HTTP 400 - clock skew / slow upload
"RequestTimeout", // HTTP 400 - stale connection reuse
"RequestTimeTooSkewed", // HTTP 403 - clock drift
"RequestThrottled", // HTTP 400 - throttling variant
"SlowDown", // HTTP 503 - throttling
"ServiceUnavailable", // HTTP 503 - temporary unavailability
"Throttling", // HTTP 400 - throttling variant
"ThrottledException", // HTTP 400 - throttling variant
}};
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, I've never seen [^]* used like that........ that's actually blowing my mind.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a tiny bit worried about:

  1. applying this regex on ALL 400/403/500/503 requests (that seems like it could be bad-ish for performance?) -- is there any way we could sniff headers or something to see if it's S3-like before regex'ing the request body?
  2. the lack of tests around this behavior -- even just unit tests validating the transition from some sample XML to the retryable error code would be swell...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upstream PR has a test but it didn't apply cleanly so I left it out.

std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
Comment on lines +711 to +718

@coderabbitai coderabbitai Bot May 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In C++ std::regex with ECMAScript grammar, is the pattern [^]*valid and portable across libstdc++ and libc++? If not, what portable pattern should be used to match any characters (including newlines) between<Error...>and?

💡 Result:

In C++ std::regex using the ECMAScript grammar, the pattern [^]* is not portable and is generally considered invalid or unreliable across different standard library implementations [1]. While some regex engines may interpret it as a character class containing nothing (which is logically invalid) or handle it as a non-greedy match, it is not standard-compliant behavior to rely on this syntax for matching any character [2][3]. To match any character, including newlines, the most portable and idiomatic pattern in C++ std::regex is [\s\S]. Explanation: 1. The dot (.) character in ECMAScript regex does not match line terminators (such as \n or \r) [4][5]. 2. The character class [\s\S] explicitly matches any whitespace character (\s) or any non-whitespace character (\S), which together cover the entire set of possible characters, including all newlines [5][6]. 3. Using [\s\S] (or [\s\S]+ for one or more) is the standard workaround for the lack of a "dot-all" flag in C++ std::regex [5][6]. For your specific use case of matching content between <Error...> and , you can use a non-greedy pattern: <Error[\s\S]*? This will match the literal <Error, followed by the shortest possible sequence of any characters (including newlines), ending at the first occurrence of .

Citations:


🏁 Script executed:

cd src/libstore && grep -n "s3ErrorCodeRegex" filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 276


Fix portable ECMAScript regex for S3 XML error parsing.

The pattern [^]* at line 713 is not portable across C++ standard library implementations (libstdc++ and libc++) and can trigger std::regex_error. The portable approach is to use [\s\S]* to match any character including newlines in ECMAScript regex.

Proposed fix
-                static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
+                static const std::regex s3ErrorCodeRegex(
+                    R"(<Error\b[^>]*>[\s\S]*?<Code>([^<]+)</Code>)",
+                    std::regex::ECMAScript);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static const std::regex s3ErrorCodeRegex(
R"(<Error\b[^>]*>[\s\S]*?<Code>([^<]+)</Code>)",
std::regex::ECMAScript);
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libstore/filetransfer.cc` around lines 711 - 718, Replace the
non-portable ECMAScript regex in s3ErrorCodeRegex used to parse S3 XML error
bodies: the current pattern uses [^]* which is unsupported and can throw
std::regex_error; change it to use a portable equivalent that matches any char
including newlines (e.g., use [\s\S]* in the ECMAScript pattern) so the regex in
s3ErrorCodeRegex (and its use with std::regex_search on errorSink->s) works
across libstdc++ and libc++.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomberek Might be worth looking into this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@cole-h cole-h May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compiled a small test program that constructed std::regex foo("[^]"); and it compiled and ran just fine. I think the robit is wrong (or at least not right in a way we care about here).

EDIT: on both linux and darwin

: "";

if (std::find(s3RetryableErrors.begin(), s3RetryableErrors.end(), s3ErrorCode)
!= s3RetryableErrors.end()) {
debug("S3 error '%s', will retry", s3ErrorCode);
} else if (httpStatus == 404 || httpStatus == 410 || code == CURLE_FILE_COULDNT_READ_FILE) {
// The file is definitely not there
err = NotFound;
} else if (httpStatus == 401 || httpStatus == 407) {
Expand Down
Loading