Skip to content

Reduce peak memory usage of full index loading and bundle install#9618

Merged
hsbt merged 8 commits into
masterfrom
claude/compassionate-haslett-99d30a
Jun 12, 2026
Merged

Reduce peak memory usage of full index loading and bundle install#9618
hsbt merged 8 commits into
masterfrom
claude/compassionate-haslett-99d30a

Conversation

@hsbt

@hsbt hsbt commented Jun 12, 2026

Copy link
Copy Markdown
Member

What was the end-user or developer problem that led to this PR?

On 32-bit AIX, gem source --add and gem update --system crash with [FATAL] failed to allocate memory (#9368).

Two independent causes: Gem::SafeMarshal allocates an ObjectLink element for every link occurrence when loading the full index (specs.4.8 contains 4.7M links pointing at only 156k unique offsets), and bundle install keeps every resolution artifact alive through installation, an EndpointSpecification for every version of every gem in the dependency closure plus the parsed checksums of all ~190k gems in the compact index.

What is your fix for the problem, implemented in this PR?

The SafeMarshal reader now interns link elements by offset, which also subsumes the hard-coded OBJECT_LINKS table and reduces peak RSS of loading specs.4.8 from 573MB to 448MB.

On the Bundler side, the new Definition#release_resolution_memory! drops resolver state, remote spec indexes and compact index client caches once the resolution result has been materialized, all lazily memoized and therefore safe to drop, reducing peak RSS of a cold bundle install of a 103-gem app from 427MB to 360MB.

This does not fully fix the AIX problem because SafeMarshal's remaining peak still exceeds the 256MB default data segment of 32-bit AIX processes. Converting it to single-pass loading is the planned follow-up.

Make sure the following tasks are checked

hsbt and others added 3 commits June 12, 2026 09:18
specs.4.8 from rubygems.org contains 4.7M object links pointing at
only 156k unique offsets, so allocating an element per occurrence
makes links the largest part of the AST. Caching them by offset
reduces peak RSS of loading the full index from 573MB to 448MB and
subsumes the hard-coded OBJECT_LINKS table.

#9368

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Resolution retains an EndpointSpecification for every version of every
gem in the dependency closure, plus parsed checksums for all ~190k gems
in the compact index, none of which are needed once the resolution
result has been materialized. Dropping them (and forcing a major GC,
since they are mostly old generation by that point) reduces peak RSS of
a cold bundle install of a 103-gem app from 427MB to 342MB.

#9368

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was the only forced GC in the codebase and would inject a full
mark pause into any future caller of release_resolution_memory!.
Allocation pressure during installation triggers major GCs soon enough
anyway: peak RSS of the same cold install measures 360MB without the
explicit GC versus 342MB with it, still well below the 427MB baseline.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets peak memory reductions in two hot paths: RubyGems SafeMarshal full index loading (e.g., specs.4.8) and Bundler bundle install after dependency resolution, to avoid memory exhaustion on constrained platforms like 32-bit AIX.

Changes:

  • Intern SafeMarshal object/symbol link elements by offset to avoid allocating a new link element per occurrence during full index reads.
  • Add Definition#release_resolution_memory! and supporting plumbing to drop resolver state, remote spec indexes, and compact index client caches once resolution results have been materialized.
  • Invoke the new Bundler memory release hook during install, after capturing the resolved specs set used for installation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/rubygems/safe_marshal/reader.rb Interns object/symbol link elements to cut allocations when reading large Marshal payloads (full index).
bundler/lib/bundler/definition.rb Adds release_resolution_memory! to clear resolution-only state after materialization.
bundler/lib/bundler/installer.rb Calls release_resolution_memory! after grabbing resolved specs, before installation proceeds.
bundler/lib/bundler/source_list.rb Adds a source-list level hook to release resolution memory for Rubygems sources.
bundler/lib/bundler/source/rubygems.rb Clears Rubygems source spec caches and forwards memory release to fetchers.
bundler/lib/bundler/fetcher.rb Adds a fetcher-level hook to release resolution memory in underlying fetcher implementations.
bundler/lib/bundler/fetcher/base.rb Defines the no-op base hook for release_resolution_memory!.
bundler/lib/bundler/fetcher/compact_index.rb Drops the compact index client (parsed checksums cache) to free memory post-resolution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 29 to 33
def initialize(io)
@io = io
@object_links = {}
@symbol_links = {}
end
Comment on lines +243 to +247
def release_resolution_memory!
@resolver = nil
@resolution_base = nil
sources.release_resolution_memory!
end
Comment on lines +198 to +200
specs = @definition.specs
@definition.release_resolution_memory!
spec_installations = ParallelInstaller.call(self, specs, jobs, standalone, force, local: local)
hsbt and others added 5 commits June 12, 2026 10:03
Downloading a default gem always goes through cached_built_in_gem,
which searches the remote index, so releasing it up front would force
a full rebuild in the middle of parallel installation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cached_built_in_gem only needs the remote versions of one gem, so
searching the full remote index is wasteful when it is not already
materialized, for example after resolution memory has been released.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The remote index can now be released and lazily rebuilt while parallel
install workers are running, so guard the memoization with a mutex to
avoid concurrent duplicate builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The bundler spec comes from the metadata source and is a default gem on
any modern Ruby, so checking all specs disabled the memory release for
every install. Only default gems from RubyGems sources go through
cached_built_in_gem during installation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When the compact index client has been released after resolution and is
lazily rebuilt, for example by bundle cache --all-platforms, its first
use happens inside the parallel fetch workers. They then race to create
the client and update the versions file, whose temp file name is based
on the pid, so concurrent renames fail with ENOENT, the fetcher falls
back to an empty index, and caching crashes on an unmaterialized
LazySpecification. Warm the client up on the calling thread instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@hsbt hsbt merged commit d0255d4 into master Jun 12, 2026
110 checks passed
@hsbt hsbt deleted the claude/compassionate-haslett-99d30a branch June 12, 2026 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants