[Performance] Release GIL during remote storage operations to prevent blocking #21

@beinan

Context

With the recent introduction of remote storage support (PR #20), the context store now performs network I/O (e.g., S3 PutObject, GetObject) during critical operations like Context.create, save, and checkout.

Problem

Unlike local disk operations, remote object store calls introduce significant latency (often 50 ms to 500 ms or more per request). The current Rust bindings likely bridge async Rust code to synchronous Python APIs using a block_on mechanism.

If the Global Interpreter Lock (GIL) is held while waiting for these network futures to complete, no other Python thread can run for the duration of each S3 operation, so the interpreter appears frozen. This starves background threads (heartbeats, UI loops, web servers) and prevents any concurrency during remote I/O.

Proposal

We need to ensure that the GIL is released before entering the blocking async runtime and re-acquired afterwards.

In PyO3, this generally looks like wrapping the async execution in py.allow_threads:

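// python/src/lib.rs (conceptual)

// ... existing setup ...
// Convert Python arguments into owned Rust values (e.g. Vec<u8>, String) first:
// the closure passed to allow_threads cannot capture GIL-bound references.
let result = py.allow_threads(|| {
    // The GIL is released here, so other Python threads can run.
    runtime.block_on(async {
        // Expensive remote I/O happens here
        store.save(data).await
    })
});
// The GIL is re-acquired once allow_threads returns.
// ... result handling ...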

Action Items

  • Audit python/src/lib.rs for block_on usage.
  • Wrap remote-touching calls (initialization, commit, checkout) in py.allow_threads.
  • Verify that Python background threads remain responsive during large context uploads/downloads.
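For the last action item, one possible check is a small heartbeat harness on the Python side. The sketch below is only an illustration: `max_heartbeat_gap` and `fake_remote_save` are hypothetical names, and `fake_remote_save` uses `time.sleep` (which releases the GIL) as a stand-in for a real save or checkout call. With the real bindings substituted in, a maximum gap close to the call's duration would suggest the GIL is still held during remote I/O, while a gap close to the heartbeat interval would suggest it is released.

```python
import threading
import time


def max_heartbeat_gap(blocking_call, interval=0.01):
    """Run blocking_call on the calling thread while a background thread ticks.

    Returns the largest gap, in seconds, between consecutive heartbeat
    timestamps, including the gap up to the moment blocking_call returned.
    """
    ticks = []
    stop = threading.Event()

    def heartbeat():
        while not stop.is_set():
            ticks.append(time.monotonic())
            time.sleep(interval)

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    time.sleep(interval * 3)  # let the heartbeat record a few ticks first

    try:
        blocking_call()
    finally:
        # Record the end time ourselves: if the GIL was held, the heartbeat
        # thread may never get to tick again before it sees the stop flag.
        end = time.monotonic()
        stop.set()
        worker.join()

    stamps = sorted(ticks + [end])
    return max(b - a for a, b in zip(stamps, stamps[1:]))


def fake_remote_save():
    # Hypothetical stand-in for a remote save. time.sleep releases the GIL,
    # which is the behaviour the bindings should show after the fix.
    time.sleep(0.3)


if __name__ == "__main__":
    gap = max_heartbeat_gap(fake_remote_save)
    print(f"max heartbeat gap: {gap * 1000:.1f} ms")
```

To use it against the bindings, pass a callable that performs a large upload or download in place of `fake_remote_save`, and compare the reported gap with the wall-clock duration of that call.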
