Possible k8s OOM Kill prevention pill 2 - rlimit #1370
Draft
taylordowns2000 wants to merge 4 commits into main
Collaborator:
Gosh there's a lot of stuff here, and I have no idea what any of it does. I'll take a close look at it (probably tomorrow).
Background: It took me about 20 seconds to crash a staging worker in Kubernetes:
The above run will show up as "lost" in the next 30 minutes.
This PR uses prlimit to set RLIMIT_AS on each forked child process, capping virtual address space so a runaway run crashes itself instead of OOM-killing the pod.
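For reviewers unfamiliar with RLIMIT_AS, here is a minimal sketch of the idea in Python. It is not the code in this PR: it swaps the prlimit CLI for Python's resource.setrlimit, applied in the child just before exec, and the 1 GiB cap, the function names, and the "runaway" snippet are all hypothetical. It assumes Linux, where RLIMIT_AS is enforced.

```python
import resource
import subprocess
import sys

# Hypothetical cap, for illustration only; not the value used in this PR.
LIMIT_BYTES = 1024**3  # 1 GiB of virtual address space


def cap_address_space(limit_bytes: int) -> None:
    """Lower RLIMIT_AS for the current process (called inside the child)."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process may lower its limits but not raise the hard limit.
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))


def run_capped(code: str, limit_bytes: int = LIMIT_BYTES) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process with RLIMIT_AS applied."""
    return subprocess.run(
        [sys.executable, "-c", code],
        # preexec_fn runs in the forked child, before the new program starts.
        preexec_fn=lambda: cap_address_space(limit_bytes),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a runaway run: ask for 4 GiB under a 1 GiB cap.
    runaway = run_capped("x = bytearray(4 * 1024**3)")
    print("exit code:", runaway.returncode)
    print("failed with MemoryError:", "MemoryError" in runaway.stderr)
```

The point of the sketch is that the allocation fails inside the child, which exits on its own, while the parent process stays up and can report the failure.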
The limit is opt-in by detection: it is active when prlimit (from util-linux) is available on Linux, and a no-op on macOS / local dev. The PR also adds util-linux to the worker Docker image so that prlimit is available there.
Testing on staging
AI Usage
Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):
You can read more details in our Responsible AI Policy.
Release branch checklist
Delete this section if this is not a release PR.
If this IS a release branch:
- pnpm changeset version from root to bump versions
- pnpm install
- pnpm changeset tag to generate tags
- git push --tags
Tags may need updating if commits come in after the tags are first generated.