Reduce peak memory usage of full index loading and bundle install#9618
Merged
Conversation
specs.4.8 from rubygems.org contains 4.7M object links pointing at only 156k unique offsets, so allocating an element per occurrence makes links the largest part of the AST. Caching them by offset reduces peak RSS of loading the full index from 573MB to 448MB and subsumes the hard-coded OBJECT_LINKS table. #9368 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Resolution retains an EndpointSpecification for every version of every gem in the dependency closure, plus parsed checksums for all ~190k gems in the compact index, none of which are needed once the resolution result has been materialized. Dropping them (and forcing a major GC, since they are mostly old generation by that point) reduces peak RSS of a cold bundle install of a 103-gem app from 427MB to 342MB. #9368 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was the only forced GC in the codebase and would inject a full mark pause into any future caller of release_resolution_memory!. Allocation pressure during installation triggers major GCs soon enough anyway: peak RSS of the same cold install measures 360MB without the explicit GC versus 342MB with it, still well below the 427MB baseline. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR targets peak memory reductions in two hot paths: RubyGems SafeMarshal full index loading (e.g., specs.4.8) and Bundler bundle install after dependency resolution, to avoid memory exhaustion on constrained platforms like 32-bit AIX.
Changes:
- Intern
SafeMarshalobject/symbol link elements by offset to avoid allocating a new link element per occurrence during full index reads. - Add
Definition#release_resolution_memory!and supporting plumbing to drop resolver state, remote spec indexes, and compact index client caches once resolution results have been materialized. - Invoke the new Bundler memory release hook during install, after capturing the resolved
specsset used for installation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/rubygems/safe_marshal/reader.rb | Interns object/symbol link elements to cut allocations when reading large Marshal payloads (full index). |
| bundler/lib/bundler/definition.rb | Adds release_resolution_memory! to clear resolution-only state after materialization. |
| bundler/lib/bundler/installer.rb | Calls release_resolution_memory! after grabbing resolved specs, before installation proceeds. |
| bundler/lib/bundler/source_list.rb | Adds a source-list level hook to release resolution memory for Rubygems sources. |
| bundler/lib/bundler/source/rubygems.rb | Clears Rubygems source spec caches and forwards memory release to fetchers. |
| bundler/lib/bundler/fetcher.rb | Adds a fetcher-level hook to release resolution memory in underlying fetcher implementations. |
| bundler/lib/bundler/fetcher/base.rb | Defines the no-op base hook for release_resolution_memory!. |
| bundler/lib/bundler/fetcher/compact_index.rb | Drops the compact index client (parsed checksums cache) to free memory post-resolution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
29
to
33
| def initialize(io) | ||
| @io = io | ||
| @object_links = {} | ||
| @symbol_links = {} | ||
| end |
Comment on lines
+243
to
+247
| def release_resolution_memory! | ||
| @resolver = nil | ||
| @resolution_base = nil | ||
| sources.release_resolution_memory! | ||
| end |
Comment on lines
+198
to
+200
| specs = @definition.specs | ||
| @definition.release_resolution_memory! | ||
| spec_installations = ParallelInstaller.call(self, specs, jobs, standalone, force, local: local) |
Downloading a default gem always goes through cached_built_in_gem, which searches the remote index, so releasing it up front would force a full rebuild in the middle of parallel installation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cached_built_in_gem only needs the remote versions of one gem, so searching the full remote index is wasteful when it is not already materialized, for example after resolution memory has been released. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The remote index can now be released and lazily rebuilt while parallel install workers are running, so guard the memoization with a mutex to avoid concurrent duplicate builds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The bundler spec comes from the metadata source and is a default gem on any modern Ruby, so checking all specs disabled the memory release for every install. Only default gems from RubyGems sources go through cached_built_in_gem during installation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When the compact index client has been released after resolution and is lazily rebuilt, for example by bundle cache --all-platforms, its first use happens inside the parallel fetch workers. They then race to create the client and update the versions file, whose temp file name is based on the pid, so concurrent renames fail with ENOENT, the fetcher falls back to an empty index, and caching crashes on an unmaterialized LazySpecification. Warm the client up on the calling thread instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was the end-user or developer problem that led to this PR?
On 32-bit AIX,
gem source --addandgem update --systemcrash with[FATAL] failed to allocate memory(#9368).Two independent causes:
Gem::SafeMarshalallocates an ObjectLink element for every link occurrence when loading the full index (specs.4.8 contains 4.7M links pointing at only 156k unique offsets), andbundle installkeeps every resolution artifact alive through installation, an EndpointSpecification for every version of every gem in the dependency closure plus the parsed checksums of all ~190k gems in the compact index.What is your fix for the problem, implemented in this PR?
The SafeMarshal reader now interns link elements by offset, which also subsumes the hard-coded
OBJECT_LINKStable and reduces peak RSS of loading specs.4.8 from 573MB to 448MB.On the Bundler side, the new
Definition#release_resolution_memory!drops resolver state, remote spec indexes and compact index client caches once the resolution result has been materialized, all lazily memoized and therefore safe to drop, reducing peak RSS of a coldbundle installof a 103-gem app from 427MB to 360MB.This does not fully fix the AIX problem because SafeMarshal's remaining peak still exceeds the 256MB default data segment of 32-bit AIX processes. Converting it to single-pass loading is the planned follow-up.
Make sure the following tasks are checked