Support DataArray objects and nested dicts in DataTree.from_dict #10658

Merged
shoyer merged 12 commits into pydata:main from shoyer:datatree-from-dict
Sep 23, 2025

Conversation

@shoyer

@shoyer shoyer commented Aug 20, 2025

Member

This PR adds three features to the DataTree.from_dict constructor:

  1. It supports DataArray objects and anything that can be coerced into a DataArray via the Dataset constructor.
  2. It adds a coords argument for explicitly specifying coordinates.
  3. It adds support for nested dictionary values, which are automatically unflattened if nested=True is passed to from_dict().

Explicitly requiring nested=True could potentially be relaxed, now or in the future. The main advantage is that it keeps the core signature simpler (allowing for more type safety), and avoids potential overlap in type signatures with "native" dict format (#9074).
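To make the third feature concrete, here is a minimal pure-Python sketch of how nested dictionary keys could map onto tree paths, and how an explicit flag can guard that behaviour. The helper name `flatten_paths` and its error message are hypothetical and are not taken from the xarray implementation; only the `nested=True` opt-in mirrors what the PR describes.

```python
from collections.abc import Mapping
from typing import Any


def flatten_paths(
    data: Mapping[str, Any], nested: bool = False, prefix: str = ""
) -> dict[str, Any]:
    """Hypothetical helper: turn nested dict keys into '/'-joined paths.

    This is an illustration only, not xarray's actual code.
    """
    flat: dict[str, Any] = {}
    for key, value in data.items():
        # Normalise keys so "a", "/a" and "a/" all become "<prefix>/a".
        path = f"{prefix}/{key.strip('/')}"
        if isinstance(value, Mapping):
            if not nested:
                # Without the opt-in flag, a dict value is rejected
                # rather than silently interpreted as a sub-tree.
                raise TypeError(f"nested dict at {path!r}; pass nested=True")
            flat.update(flatten_paths(value, nested=True, prefix=path))
        else:
            flat[path] = value
    return flat


print(flatten_paths({"a": 1, "b": {"c": 2, "d": 3}}, nested=True))
# {'/a': 1, '/b/c': 2, '/b/d': 3}
```

Requiring the flag means a call without it can treat every value as a leaf, which is the simpler signature the description argues for.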

Fixes #9539, #9486

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@github-actions bot added the topic-DataTree label Aug 20, 2025
@eni-awowale

Collaborator

Thanks for adding this, @shoyer! Do you think it would be worth adding another round-trip test for DataArray.from_dict and DataTree.from_dict?

Here is what I did based on DataArray.from_dict

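import xarray as xr

# "Pure" Python dict in the format accepted by DataArray.from_dict
d = {
    "coords": {
        "t": {"dims": "t", "data": [0, 1, 2], "attrs": {"units": "s"}}
    },
    "attrs": {"title": "air temperature"},
    "dims": "t",
    "data": [10, 20, 30],
    "name": "a",
}
da = xr.DataArray.from_dict(d)

# Place the DataArray at path "/a" and compare it with the original
dt = xr.DataTree.from_dict({'/a': da})
xr.testing.assert_identical(dt.a, da)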

Comment thread xarray/core/datatree.py Outdated

Or equivalently from a dict of values coercible to DataArray objects:

>>> dt2 = DataTree.from_dict({"/a": 1, "/b/c": 2, "/b/d": 3}, coords={"/x": 0})

Collaborator

Is this supposed to create a group with the name "/a"? On my end I am just seeing the groups ('/', '/b') and a data variable with the name "a".

Member Author

This example was supposed to show creating a variable "a". I've replaced this with something a bit more concrete to illustrate the intended usage.

@shoyer

shoyer commented Aug 20, 2025

Member Author

Thanks for adding this, @shoyer! Do you think it would be worth adding another round-trip test for DataArray.from_dict and DataTree.from_dict?

So unfortunately this doesn't work. The issue is that DataTree.from_dict works fundamentally differently from Dataset.from_dict and DataArray.from_dict (see #9074):

  • Dataset.from_dict and DataArray.from_dict parse "pure" Python dictionaries in the form you show above (e.g., {"coords": ..., "data": ..., "dims": ...})
  • In contrast, DataTree.from_dict expects xarray data structures as values.

I'm not sure it makes sense to combine both in a single function. In particular, there is some ambiguity about whether a dict value should be flattened (which I've added here), e.g., does {"foo": {"data": 1}} mean a DataArray at /foo/data or at /foo?
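The ambiguity can be shown with a small pure-Python sketch. Both reader functions below are hypothetical stand-ins for the two conventions discussed here and do not call xarray: one treats the inner dict as the fields of a single array (the "pure" dict reading), the other treats it as a nested group.

```python
from typing import Any

spec = {"foo": {"data": 1}}


def read_as_pure(spec: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical 'pure dict' reading: inner keys are array fields."""
    return {f"/{name}": fields["data"] for name, fields in spec.items()}


def read_as_nested(spec: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Hypothetical 'nested path' reading: inner keys are path components."""
    out: dict[str, Any] = {}
    for key, value in spec.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            out.update(read_as_nested(value, path))
        else:
            out[path] = value
    return out


print(read_as_pure(spec))    # {'/foo': 1}
print(read_as_nested(spec))  # {'/foo/data': 1}
```

The same input yields two different trees, so a single function that accepted both formats would need some extra signal to choose between them.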

Instead, I think we should have dedicated methods to_pure_dict()/from_pure_dict(), or add a pure keyword argument for controlling the output format. Ideally we would make this consistent across DataTree/Dataset/DataArray, too. Given that pure dictionaries are relatively niche compared to this alternate constructor, I would lean towards renaming the Dataset/DataArray methods.

@shoyer shoyer changed the title Support DataArray objects in DataTree.from_dict Support DataArray objects and nested dicts in DataTree.from_dict Aug 20, 2025
@shoyer

shoyer commented Sep 4, 2025

Member Author

I've changed DataTree.from_dict() to require explicitly passing nested=True to unflatten nested items. This is easily discoverable and type-checkable, and it avoids precluding the possibility of passing "pure" dictionary arguments like {"coords": ..., "data": ..., "dims": ...}.

@shoyer

shoyer commented Sep 7, 2025

Member Author

@TomNicholas any thoughts?

@etienneschalk etienneschalk left a comment

Contributor

Some general comments and questions out of curiosity

Comment thread xarray/core/datatree.py Outdated
Comment thread xarray/core/datatree.py Outdated
Comment thread xarray/core/datatree.py
Comment thread xarray/core/datatree.py Outdated
Comment thread xarray/core/types.py
@shoyer

shoyer commented Sep 18, 2025

Member Author

@TomNicholas I would really appreciate your feedback on the API here, even just the docstring for DataTree.from_dict(). I am pretty happy with the implementation and that's had a few looks now, but I don't want to merge this until you're comfortable with the end-user API -- that's harder to fix later!

@shoyer added the "plan to merge" label (Final call for comments) Sep 18, 2025
@shoyer

shoyer commented Sep 22, 2025

Member Author

Last call for review here!

@eni-awowale eni-awowale left a comment

Collaborator

I like the addition of the nested flag and the dt2 example!

@shoyer shoyer merged commit a3bd20d into pydata:main Sep 23, 2025
45 of 47 checks passed

Labels

plan to merge (Final call for comments), topic-DataTree (Related to the implementation of a DataTree class), topic-typing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Creating DataTree from DataArrays

3 participants