fix: use SeparateBodyFileCache for file caching #10816

Open
JPK64 wants to merge 1 commit into python-poetry:main from JPK64:main

JPK64 commented Apr 1, 2026

Use SeparateBodyFileCache instead of FileCache for caching. FileCache attempts to serialize the HTTP response body along with the rest of the response, which can raise "ValueError: memoryview is too large" for files larger than 2 GiB. cachecontrol provides SeparateBodyFileCache, which is better suited for caching large responses and does not cause this issue.
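
The difference between the two layouts can be illustrated with a toy sketch in plain Python. This is not cachecontrol's implementation or on-disk format; `store_combined`, `store_separate`, and the `.entry`/`.meta`/`.body` suffixes are hypothetical names chosen only for illustration. The point is that the combined layout has to build one blob containing the entire body, while the separate layout can write the body chunk by chunk.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable


def store_combined(directory: Path, key: str, headers: dict, body: bytes) -> Path:
    """Toy single-file layout: headers and body are serialized as one blob."""
    path = directory / f"{key}.entry"
    # The whole body must be available at once to build the blob.
    blob = json.dumps(headers).encode("utf-8") + b"\n" + body
    path.write_bytes(blob)
    return path


def store_separate(
    directory: Path, key: str, headers: dict, body_chunks: Iterable[bytes]
) -> tuple[Path, Path]:
    """Toy two-file layout: small metadata file plus a streamed body file."""
    meta_path = directory / f"{key}.meta"
    body_path = directory / f"{key}.body"
    # Only the (small) headers go through the serializer.
    meta_path.write_text(json.dumps(headers), encoding="utf-8")
    # The body is copied chunk by chunk and never serialized as a whole.
    with body_path.open("wb") as fh:
        for chunk in body_chunks:
            fh.write(chunk)
    return meta_path, body_path
```

In this toy model, a reader that expects the `.entry` layout cannot read entries written in the `.meta`/`.body` layout, which is the kind of format mismatch discussed later in this thread.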

Pull Request Checklist

Resolves: #10814

  • Added tests for changed code.
  • Updated documentation for changed code.

Summary by Sourcery

Bug Fixes:

  • Prevent failures when caching HTTP responses for files larger than 2 GiB by avoiding serialization of large response bodies.

sourcery-ai Bot commented Apr 1, 2026

Reviewer's Guide

Switches HTTP response caching in the authenticator from FileCache to SeparateBodyFileCache to avoid serialization issues with large (>2 GiB) response bodies while preserving existing cache location and configuration.

Class diagram for authenticator cache change from FileCache to SeparateBodyFileCache

classDiagram
    class Authenticator {
        - Config _config
        - PasswordManager _password_manager
        - CacheControlAdapter _cache_control
        - Path _repository_cache_directory
        - str cache_id
        + Authenticator(config, cache_id)
    }

    class FileCache {
        - Path directory
        + FileCache(directory)
        + get(key)
        + set(key, value)
    }

    class SeparateBodyFileCache {
        - Path directory
        + SeparateBodyFileCache(directory)
        + get(key)
        + set(key, value)
    }

    class PasswordManager {
        - Config config
        + PasswordManager(config)
        + get_credentials(repository)
    }

    class Config {
        + Path repository_cache_directory
    }

    class CacheControlAdapter {
        - Any cache
        + CacheControlAdapter(cache)
        + send(request)
    }

    Authenticator --> Config : uses
    Authenticator --> PasswordManager : owns
    Authenticator --> CacheControlAdapter : owns
    Authenticator ..> SeparateBodyFileCache : constructs
    CacheControlAdapter --> SeparateBodyFileCache : uses as cache

    %% Previous relationship (now replaced)
    CacheControlAdapter ..> FileCache : previously_used_as_cache

File-Level Changes

Change: Replace the HTTP cache backend with SeparateBodyFileCache to better support large response bodies without memoryview size errors.
Details:
  • Instantiate SeparateBodyFileCache instead of FileCache when configuring the authenticator HTTP cache.
  • Keep the existing cache directory structure, cache_id handling, and _http subdirectory intact so cache paths remain consistent.
Files: src/poetry/utils/authenticator.py

Assessment against linked issues

Issue Objective Addressed Explanation
#10814 Prevent ValueError: memoryview is too large when downloading and hashing large (>2 GiB) wheels from HTTP sources with caching enabled by changing the HTTP cache implementation so it no longer serializes the full response body.

JPK64 commented Apr 1, 2026

Switching the Authenticator's _cache_control from FileCache to SeparateBodyFileCache fixes #10814. However, it also breaks any already existing cache files that contain the body in the response instead of a separate file. What's more, it raises a PoetryRuntimeError stating that the package source is not reachable due to the "incomplete reads" that result from there not being a separate body file to read from.

Although all of these issues can be fixed by clearing the cache, I am not keen on breaking the caches of an entire user base. I could probably implement some kind of cache converter that converts from the old cache format to the new cache format, or implement a proxy that uses the correct cache implementation for reading from the cache and just the SeparateBodyFileCache for writing to the cache. I am not sure which approach is best, which is why I first created this pull request as a draft to get some input on how to handle this gracefully.
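
The proxy idea mentioned above (read from whichever format holds the entry, write only the new format) could look roughly like the following sketch. It is a minimal illustration under stated assumptions, not code from this pull request: `DictCache` and `MigratingCache` are hypothetical names, the backends are in-memory stand-ins, and the real cachecontrol cache interface has more methods than the `get`/`set` pair shown here.

```python
from __future__ import annotations

from typing import Optional


class DictCache:
    """In-memory stand-in for a cache backend (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class MigratingCache:
    """Read from the new cache first, fall back to the legacy one; write only new."""

    def __init__(self, new: DictCache, legacy: DictCache) -> None:
        self._new = new
        self._legacy = legacy

    def get(self, key: str) -> Optional[bytes]:
        value = self._new.get(key)
        if value is None:
            # Entry may predate the format switch, so try the legacy backend.
            value = self._legacy.get(key)
        return value

    def set(self, key: str, value: bytes) -> None:
        # New entries are only ever written in the new format.
        self._new.set(key, value)
```

With this shape, old entries stay readable after an upgrade, but an older Poetry version would still not see entries written in the new format.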

radoering (Member) commented

However, it also breaks any already existing cache files that contain the body in the response instead of a separate file. What's more, it raises a PoetryRuntimeError stating that the package source is not reachable due to the "incomplete reads" that result from there not being a separate body file to read from.

What are the steps to reproduce this issue? In a quick test, Poetry seems to lock fine after switching from FileCache to SeparateBodyFileCache with an existing cache.

JPK64 commented Apr 7, 2026

What are the steps to reproduce this issue?

Unfortunately, I am no longer able to reproduce the issue myself. As you described, it seems to work fine. But I do get the behaviour I described when doing the reverse, that is, creating cache entries with SeparateBodyFileCache and then trying to read them with FileCache. So maybe I got my Poetry versions mixed up during initial testing?

I'll mark this pull request as ready for review.

JPK64 marked this pull request as ready for review April 7, 2026 09:37

sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • Consider whether existing cache directories created with FileCache remain compatible with SeparateBodyFileCache or if a migration/cleanup strategy is needed to avoid stale or unreadable cache entries.
  • It may be useful to make the choice of SeparateBodyFileCache vs FileCache configurable (even if defaulting to SeparateBodyFileCache), in case some environments rely on the previous behavior or cache layout.

radoering (Member) commented

Thus, the upgrade path seems fine, but there is an issue if a user downgrades again or switches back and forth between different versions of Poetry.

I think we should change the cache suffix, for example from _http to _http2. That way, both Poetry versions will continue to work, and the only "issue" is that there are unused files in the cache.
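
A minimal sketch of the suffix idea, assuming the cache path is built from the repository cache directory, the cache_id, and a suffix as described in the reviewer's guide. The helper `http_cache_dir`, the `_default_cache` fallback name, and the example path are assumptions for illustration, not taken from the Poetry source.

```python
from __future__ import annotations

from pathlib import Path

LEGACY_HTTP_SUFFIX = "_http"  # directory used by versions that write FileCache entries
NEW_HTTP_SUFFIX = "_http2"  # suggested directory for SeparateBodyFileCache entries


def http_cache_dir(
    repository_cache_directory: Path, cache_id: str | None, suffix: str
) -> Path:
    """Build the HTTP cache path for one repository (hypothetical helper)."""
    # "_default_cache" is an assumed fallback name for a missing cache_id.
    return repository_cache_directory / (cache_id or "_default_cache") / suffix


# Assumed example location; the real directory depends on the user's configuration.
base = Path("/tmp/pypoetry/cache/repositories")
old_dir = http_cache_dir(base, "pypi", LEGACY_HTTP_SUFFIX)
new_dir = http_cache_dir(base, "pypi", NEW_HTTP_SUFFIX)
```

Because the two directories are siblings, an older Poetry version keeps reading and writing `_http` while a newer one uses `_http2`, so neither ever sees entries in the other's format.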

Development

Successfully merging this pull request may close these issues.

Calculating SHA256 of large wheels from HTTP sources causes "memoryview is too large"

2 participants