Skip to content

Add custom H5 resolver#125

Open
TLCFEM wants to merge 8 commits into
silx-kit:mainfrom
TLCFEM:feature-s3-driver
Open

Add custom H5 resolver#125
TLCFEM wants to merge 8 commits into
silx-kit:mainfrom
TLCFEM:feature-s3-driver

Conversation

@TLCFEM
Copy link
Copy Markdown
Contributor

@TLCFEM TLCFEM commented May 8, 2026

After reconsidering, I do not think relying on the native S3 support in h5py/HDF5 is a practical approach for several reasons:

  1. Enabling the ros3 driver requires the underlying HDF5 library to be compiled with that feature enabled. Since h5py does not provide this by default, a custom build would almost always be necessary, which adds significant setup complexity.
  2. Hard-coding credentials in advance reduces flexibility, especially when files may reside in different regions or deployments that require different authentication credentials.
  3. It is incompatible with the current ecosystem (e.g. h5web) to send credentials in the request body, because doing so would require all downstream consumers to switch from GET-based access patterns to POST endpoints.

A more practical solution may be to introduce an additional abstraction layer responsible for converting file paths into file objects.

This approach:

  1. minimizes the impact on existing code,
  2. avoids introducing extra dependencies,
  3. gives users full control over how file paths are resolved and interpreted, particularly when those paths are user-defined.

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 8, 2026

#124

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 12, 2026

#80

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 18, 2026

@axelboc @loichuder Mind to have a look?

@axelboc
Copy link
Copy Markdown
Contributor

axelboc commented May 18, 2026

Hey @TLCFEM, sorry, busy times, we'll take a look within a couple weeks hopefully. 🤞 Pinging @t20100 as well in case.

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 18, 2026

Hey @TLCFEM, sorry, busy times, we'll take a look within a couple weeks hopefully. 🤞 Pinging @t20100 as well in case.

Thanks, just wanted to check that it didn’t slip through unnoticed.

@loichuder loichuder self-requested a review May 26, 2026 06:08
@loichuder
Copy link
Copy Markdown
Member

I think adding an abstraction layer is a good idea if we want to replace h5py for the file access. Can we though?

It is not only the file access that relies on h5py: all content utilities are expected to work with h5py entities. How is this supposed to work when s3fs is used for the file access? Is it returning h5py or h5py-like entities?

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 28, 2026

You shall see that it converts a nominal path to an IO object, which is fed to h5py, the h5py.File is still returned and read by other utilities, not replacing anything.

@loichuder
Copy link
Copy Markdown
Member

You shall see that it converts a nominal path to an IO object, which is fed to h5py, the h5py.File is still returned and read by other utilities, not replacing anything.

Ah yes, got it. Sorry, missed this.

Copy link
Copy Markdown
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, I think the resolver approach is sensible. I think we can improve the current implementation if you agree.

I'll suggest to instead pass the resolver as an arg of open_file_with_error_fallback (and of get_content_from_file upstream), with a default fallback to the H5pyResolver. Then, it will be up to the consumer app to handle the lifecycle of the resolver.

resolver could replace h5py_options since the options of h5py can then be passed via the resolver.extra_options instead. That would also make #80 easier since apps will have full control over the resolver and its options.

What do you think?

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 28, 2026

@contextmanager
def open_file_with_error_fallback(
    filepath: str | Path,
    create_error: Callable[[int, str], Exception],
    resolver: H5FileResolver = H5FileResolver(),
) -> Iterator[h5py.File]: ...

If I understand correctly, you are suggesting the above interface.
The question is, the downstream users have no control over how the functions are called.
For example, users have no way to change the following endpoint with a custom resolver.

@router.get("/data")
def get_data(
    file: str = Depends(add_base_path),
    path: str = "/",
    dtype: str = "origin",
    format: str = "json",
    flatten: bool = False,
    selection=None,
):
    """`/data` endpoint handler"""
    with get_content_from_file(file, path, create_error) as content:
        if not isinstance(content, DatasetContent):
            raise TypeError(f"{content.path} is not a dataset")
        data = content.data(selection, flatten, dtype)
        h5grove_response = encode(data, format)
        return Response(
            content=h5grove_response.content, headers=h5grove_response.headers
        )

I think the interface can be changed to the one you proposed, but the injection mechanism should be retained such that users can inject custom resolver.
So it may be something like this:

_resolver: H5FileResolver

def assign_resolver(custom_resolver: H5FileResolver): ...

@contextmanager
def open_file_with_error_fallback(
    filepath: str | Path,
    create_error: Callable[[int, str], Exception],
    resolver: H5FileResolver = _resolver,
) -> Iterator[h5py.File]: ...

So If downstream users have two options to provide custom behavior if they are using the utilities directly.
Or by injection if they are using fastapi and alike.

@loichuder
Copy link
Copy Markdown
Member

I see, you want to inject it at the app level, not at the function level.

I'd prefer the injection to be more explicit. The assign_resolver overrides a import at runtime, which could be confusing in some cases.

I'll think about it and get back to you.

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 28, 2026

Yes, we want to inject custom behaviour at app level.
We have the cases that we need to serve files from s3 buckets and directly from zip archives.

@TLCFEM
Copy link
Copy Markdown
Contributor Author

TLCFEM commented May 28, 2026

Extra options from #80 may likely be global as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants