Skip to content

Re-enable TCP keep-alive and handle S3's XML errors#463

Merged
edolstra merged 3 commits into
mainfrom
eelcodolstra/nix-403
May 20, 2026
Merged

Re-enable TCP keep-alive and handle S3's XML errors#463
edolstra merged 3 commits into
mainfrom
eelcodolstra/nix-403

Conversation

@edolstra
Copy link
Copy Markdown
Collaborator

@edolstra edolstra commented May 18, 2026

Motivation

Upstream NixOS#15855.

Context

Summary by CodeRabbit

  • Bug Fixes
    • Improved TCP connection stability for long-running transfers with enhanced keepalive configuration.
    • Enhanced S3 error handling with smarter retry logic that accurately identifies retryable S3 errors, reducing unnecessary failures.

Review Change Stack

tomberek and others added 3 commits May 18, 2026 20:49
This allows curl to detect drops from both a server (eg: S3) as well
as intermediate networking. 30s is to be shorter than commonly used 60s
limits on the server side.
S3 returns transient errors as HTTP 400/503 inside XML. Without parsing these errors are treated as non-retryable.
The original set only covered 5 error codes. The AWS C++ SDK treats
additional error codes as retryable, including throttling variants
(Throttling, ThrottledException, RequestThrottled), internal error
aliases (InternalFailure, InternalServerError), and clock-related
errors (RequestExpired, RequestTimeTooSkewed).

Also use constexpr std::array<std::string_view> instead of
std::set<std::string> to avoid heap allocation.

Reference: https://github.com/aws/aws-sdk-cpp/blob/3d8614fbd6d2/src/aws-cpp-sdk-core/source/client/CoreErrors.cpp#L27-L55
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

This PR improves file transfer robustness by configuring TCP keepalive for better connection monitoring and implementing S3-specific error code parsing to enable smarter retry classification based on the error response body.

Changes

File Transfer Improvements

Layer / File(s) Summary
TCP Keepalive Configuration
src/libstore/filetransfer.cc
Enables CURLOPT_TCP_KEEPALIVE with probe/idle timing (CURLOPT_TCP_KEEPIDLE, CURLOPT_TCP_KEEPINTVL) and connection reuse aging (CURLOPT_MAXAGE_CONN).
S3 Error Code Parsing and Retry Logic
src/libstore/filetransfer.cc
Adds <array> header, introduces an allowlist of retryable S3 error codes (throttling, internal errors, expired requests), and extracts <Code> from S3 XML error responses via regex to determine retry eligibility before generic HTTP status handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • DeterminateSystems/nix-src#461: Also modifies HTTP error handling for S3-related status codes in curlFileTransfer::finish, mapping authentication and permission errors into distinct categories.

Suggested reviewers

  • cole-h

Poem

🐰 A transfer so swift with a keepalive beat,
S3 codes now parsed, retry's more sweet!
XML whispers through curl's steady hand,
Connection reuse—the most resilient plan. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: re-enabling TCP keep-alive and handling S3 XML errors, which directly match the primary objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch eelcodolstra/nix-403

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/libstore/filetransfer.cc`:
- Around line 711-718: Replace the non-portable ECMAScript regex in
s3ErrorCodeRegex used to parse S3 XML error bodies: the current pattern uses
[^]* which is unsupported and can throw std::regex_error; change it to use a
portable equivalent that matches any char including newlines (e.g., use [\s\S]*
in the ECMAScript pattern) so the regex in s3ErrorCodeRegex (and its use with
std::regex_search on errorSink->s) works across libstdc++ and libc++.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3af6cd2f-7ee5-47d4-80bb-423227110dfd

📥 Commits

Reviewing files that changed from the base of the PR and between dcda71e and 9979d0b.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc

Comment on lines +711 to +718
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In C++ std::regex with ECMAScript grammar, is the pattern [^]*valid and portable across libstdc++ and libc++? If not, what portable pattern should be used to match any characters (including newlines) between<Error...>and?

💡 Result:

In C++ std::regex using the ECMAScript grammar, the pattern [^]* is not portable and is generally considered invalid or unreliable across different standard library implementations [1]. While some regex engines may interpret it as a character class containing nothing (which is logically invalid) or handle it as a non-greedy match, it is not standard-compliant behavior to rely on this syntax for matching any character [2][3]. To match any character, including newlines, the most portable and idiomatic pattern in C++ std::regex is [\s\S]. Explanation: 1. The dot (.) character in ECMAScript regex does not match line terminators (such as \n or \r) [4][5]. 2. The character class [\s\S] explicitly matches any whitespace character (\s) or any non-whitespace character (\S), which together cover the entire set of possible characters, including all newlines [5][6]. 3. Using [\s\S] (or [\s\S]+ for one or more) is the standard workaround for the lack of a "dot-all" flag in C++ std::regex [5][6]. For your specific use case of matching content between <Error...> and , you can use a non-greedy pattern: <Error[\s\S]*? This will match the literal <Error, followed by the shortest possible sequence of any characters (including newlines), ending at the first occurrence of .

Citations:


🏁 Script executed:

cd src/libstore && grep -n "s3ErrorCodeRegex" filetransfer.cc

Repository: DeterminateSystems/nix-src

Length of output: 276


Fix portable ECMAScript regex for S3 XML error parsing.

The pattern [^]* at line 713 is not portable across C++ standard library implementations (libstdc++ and libc++) and can trigger std::regex_error. The portable approach is to use [\s\S]* to match any character including newlines in ECMAScript regex.

Proposed fix
-                static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
+                static const std::regex s3ErrorCodeRegex(
+                    R"(<Error\b[^>]*>[\s\S]*?<Code>([^<]+)</Code>)",
+                    std::regex::ECMAScript);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static const std::regex s3ErrorCodeRegex(
R"(<Error\b[^>]*>[\s\S]*?<Code>([^<]+)</Code>)",
std::regex::ECMAScript);
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libstore/filetransfer.cc` around lines 711 - 718, Replace the
non-portable ECMAScript regex in s3ErrorCodeRegex used to parse S3 XML error
bodies: the current pattern uses [^]* which is unsupported and can throw
std::regex_error; change it to use a portable equivalent that matches any char
including newlines (e.g., use [\s\S]* in the ECMAScript pattern) so the regex in
s3ErrorCodeRegex (and its use with std::regex_search on errorSink->s) works
across libstdc++ and libc++.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomberek Might be worth looking into this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Copy Markdown
Member

@cole-h cole-h May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compiled a small test program that constructed std::regex foo("[^]"); and it compiled and ran just fine. I think the robit is wrong (or at least not right in a way we care about here).

EDIT: on both linux and darwin

@github-actions
Copy link
Copy Markdown

@github-actions github-actions Bot temporarily deployed to pull request May 18, 2026 19:05 Inactive
}};
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, I've never seen [^]* used like that........ that's actually blowing my mind.

}};
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a tiny bit worried about:

  1. applying this regex on ALL 400/403/500/503 requests (that seems like it could be bad-ish for performance?) -- is there any way we could sniff headers or something to see if it's S3-like before regex'ing the request body?
  2. the lack of tests around this behavior -- even just unit tests validating the transition from some sample XML to the retryable error code would be swell...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upstream PR has a test but it didn't apply cleanly so I left it out.

Comment on lines +711 to +718
// S3 error responses have the form <Error><Code>...</Code>...</Error>.
// Require the <Error> root to avoid matching unrelated XML with a <Code> element.
static std::regex s3ErrorCodeRegex("<Error>[^]*<Code>([^<]+)</Code>");
std::smatch s3Match;
bool isS3XmlStatus = httpStatus == 400 || httpStatus == 403 || httpStatus == 500 || httpStatus == 503;
auto s3ErrorCode =
(isS3XmlStatus && errorSink && std::regex_search(errorSink->s, s3Match, s3ErrorCodeRegex))
? s3Match[1].str()
Copy link
Copy Markdown
Member

@cole-h cole-h May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compiled a small test program that constructed std::regex foo("[^]"); and it compiled and ran just fine. I think the robit is wrong (or at least not right in a way we care about here).

EDIT: on both linux and darwin

@tomberek
Copy link
Copy Markdown

Another thing to check is Darwin... there are regex differences that have crept in in the past.

@edolstra edolstra added this pull request to the merge queue May 20, 2026
Merged via the queue into main with commit 8670c07 May 20, 2026
29 checks passed
@edolstra edolstra deleted the eelcodolstra/nix-403 branch May 20, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants