Support glob patterns in open_datatree(group=...) for selective group loading #11196

@aladinor

Description

When working with large hierarchical datasets, users often need only a subset of groups. Currently open_datatree(group=...) accepts a single literal path to re-root the tree. This proposal extends group to accept glob patterns (e.g., */sweep_0), filtering which groups are opened without loading the entire tree first.

Use cases

Radar data (NEXRAD): Volume scan files contain dozens of sweep groups per VCP. To analyze only the lowest elevation scan across all volumes:

dt = xr.open_datatree("radar.nc", group="*/sweep_0")

Climate model output (CMIP): Multi-model archives store data in deeply nested hierarchies like /{model}/{experiment}/{variable}. To load only temperature from all models under a specific experiment:

dt = xr.open_datatree("cmip.zarr", group="*/historical/tas")

Or to compare two specific variables across all models (the character class [su] matches both tas and tau):

dt = xr.open_datatree("cmip.zarr", group="*/historical/ta[su]")

Proposed API

When group contains a glob metacharacter (*, ?, or [), open_datatree switches from root-selection mode to filter mode. Matching uses the same engine as DataTree.match() (PurePosixPath.match). The root (/) and all ancestors of matched nodes are always included so that the result forms a valid tree.
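To make the filter-mode semantics concrete, here is a rough sketch of the path selection using only pathlib. It is not the proposed implementation: the helper name filter_group_paths and the group layout are hypothetical, and it assumes the backend can list group paths without loading any data.

```python
from pathlib import PurePosixPath


def filter_group_paths(all_paths, pattern):
    """Return matched group paths plus the root and every ancestor, sorted.

    Hypothetical helper for illustration; not part of xarray.
    """
    keep = {"/"}  # the root is always part of the result
    for path in all_paths:
        node = PurePosixPath(path)
        if node.match(pattern):
            keep.add(str(node))
            # Add each ancestor so the matched node stays attached to the root.
            keep.update(str(parent) for parent in node.parents)
    return sorted(keep)


# Hypothetical group layout, for illustration only.
paths = [
    "/VCP-34",
    "/VCP-34/sweep_0",
    "/VCP-34/sweep_1",
    "/VCP-212",
    "/VCP-212/sweep_0",
]
selected = filter_group_paths(paths, "*/sweep_0")
print(selected)
```

With this layout, both sweep_0 groups are kept along with their VCP parents and the root, and both sweep_1 groups are dropped. One thing worth settling in the design: PurePosixPath.match matches relative patterns from the right, so "*/sweep_0" would also match a deeper path such as /a/b/sweep_0.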

Behavior summary

| group value | Behavior |
| --- | --- |
| None | Load all groups (unchanged) |
| "VCP-34" (no glob chars) | Root selection (unchanged) |
| "*/sweep_0" (glob chars) | Filter mode: only matched groups + ancestors |
| Pattern matches nothing | Root-only tree |
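The dispatch on the group value could look roughly like the following. This is only a sketch: group_mode and the mode labels are hypothetical names, and the metacharacter set is the one listed under Proposed API.

```python
# Metacharacters that switch `group` into filter mode, per the proposal.
GLOB_CHARS = frozenset("*?[")


def group_mode(group):
    """Classify a `group` argument as "all", "root", or "filter".

    Hypothetical helper for illustration; not part of xarray.
    """
    if group is None:
        return "all"  # load every group (unchanged behavior)
    if any(ch in GLOB_CHARS for ch in group):
        return "filter"  # matched groups plus their ancestors
    return "root"  # literal path: re-root the tree (unchanged behavior)


for value in (None, "VCP-34", "*/sweep_0", "*/historical/ta[su]"):
    print(repr(value), "->", group_mode(value))
```

A side effect of this rule is that a literal group name containing *, ?, or [ could no longer be selected as a root, which may need an escape mechanism.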

Reference

PR #10742 (async DataTree open) provides the base for this work.
