Skip to content

Improvement on RequestsTransport handling #46577

@turnbullerin

Description

@turnbullerin

Is your feature request related to a problem? Please describe.
I'm using the Storage API in Python to manage data workflows in a service that is constantly running.

In doing some profiling, I noticed that the main source of overhead is connecting to the Microsoft servers (specifically the SSL handshake part is very long). In digging into why, I discovered that when I make a new client from scratch (e.g. by calling BlobClient.from_connection_string()) it was making a new pipeline every time, which forced the SSL handshake to happen again and again.

To address this, I started caching the ContainerClient and using its get_blob_client() method which seems to pass the pipeline used for ContainerClient to the BlobClient(), see this line in _container_client.py in get_blob_client()

        _pipeline = Pipeline(
            transport=TransportWrapper(self._pipeline._transport), # pylint: disable = protected-access
            policies=self._pipeline._impl_policies # pylint: disable = protected-access
        )

My code then looked something like this:

container = ContainerClient.from_connection_string(secret, "container")
blob = container.get_blob_client("myblob")
# do stuff
# do other stuff, not caring about the blob for a bit
# come back to the blob
blob = container.get_blob_client("myblob")

Then I noticed that the requests session is never closed if you do it this way. Without calling it as a context manager, RequestsTransport.send() opens the session, but never closes it.

If you instead do this

container = ContainerClient.from_connection_string(secret, "container"
with container:
    blob = container.get_blob_client("myblob")
    # do stuff

# do other stuff, not caring about the blob for a bit
with container:
    # come back to the blob
    blob = container.get_blob_client("myblob")

Then you get an error that the HTTP transport has already been closed, making these basically a one-time use object if you use it as a context manager (but not if you don't - but then the pipeline is never closed).

Describe the solution you'd like

Rather than rebuilding the container every time I need it, it would be nice if the ContainerClient could reopen a new session as needed.

Looking at RequestsTransport (where the error originates), you can see the problem is here:

    # in __init__()
        # See https://github.com/Azure/azure-sdk-for-python/issues/25640 to understand why we track this
        self._has_been_opened = False

...

    def open(self):
        """Opens the connection."""
        if self._has_been_opened and not self.session:
            raise ValueError(
                "HTTP transport has already been closed. "
                "You may check if you're calling a function outside of the `with` of your client creation, "
                "or if you called `close()` on your client already."
            )
        if not self.session:
            if self._session_owner:
                self.session = requests.Session()
                self._init_session(self.session)
            else:
                raise ValueError("session_owner cannot be False and no session is available")
        self._has_been_opened = True

I am not clear on why the session cannot be reopened again later. This only seems to be an issue if self._session_owner is False (i.e. a specific session was provided and the new one may not be able to be built), which is already handled. It looks like it was added in response to #25640 but this is a poor fix for that issue blocking reuse of an already built connection issue to handle cases where people close something before they finish with it. The sample code giving in that issue is clearly bad use of something with a context manager and an iterator and should not be fixed this way in my opinion.

Describe alternatives you've considered
Alternatively, I could rebuild the ContainerClient (or ShareClient) every time, but the Azure clients are costly to build. Also I can monkey patch this to work for my use case, but I imagine others might benefit from it as well. I could also provide my own RequestsTransport sub-class that doesn't work like this, but that seems like overkill.

Additional context
None

Edits: fixing formatting

Metadata

Metadata

Assignees

No one assigned

    Labels

    customer-reportedIssues that are reported by GitHub users external to the Azure organization.needs-triageWorkflow: This is a new issue that needs to be triaged to the appropriate team.questionThe issue doesn't require a change to the product in order to be resolved. Most issues start as that

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions