
fix: as_dataarray treating multi-index levels as extra dims #659

Merged
FabianHofmann merged 3 commits into master from fix/as-dataarray-multiindex-coords
Apr 20, 2026

@FabianHofmann
Collaborator

Changes proposed in this Pull Request

When as_dataarray is given a scalar (np.number, int, float, bool, str, or list) together with an xarray.Coordinates object whose underlying index is a pandas MultiIndex, it inferred dims from coords.keys(). That includes the level names, so a scalar with station coords containing levels letter/num was broadcast to shape (2, 2, 2) over ('station', 'letter', 'num') instead of (2,) over ('station',).

  • Use coords.dims when coords is an xarray.Coordinates instance so only real dimensions are used.
  • Merge the np.number and int | float | str | bool | list branches of as_dataarray into one, now that both need the same dim resolution.
  • Add a regression test in test_common.py covering the scalar + multi-index Coordinates case.
  • Release note added.
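
The difference between `coords.keys()` and `coords.dims` can be seen on a small two-station example. This is a minimal sketch, not the code from this PR: `infer_dims` is a hypothetical helper that only illustrates the dim resolution described above, and the coordinate values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two stations, each identified by a (letter, num) pair.
midx = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]], names=["letter", "num"])
coords = xr.Coordinates.from_pandas_multiindex(midx, "station")

# keys() lists every coordinate variable, including the index levels.
coord_names = set(coords.keys())  # {'station', 'letter', 'num'}
# dims lists only the real dimensions.
dim_names = tuple(coords.dims)  # ('station',)


def infer_dims(coords):
    """Hypothetical helper: resolve dims without leaking level names."""
    if isinstance(coords, xr.Coordinates):
        return tuple(coords.dims)
    # Plain mappings have no notion of levels, so their keys are the dims.
    return tuple(coords.keys())


dims = infer_dims(coords)
shape = tuple(coords[d].size for d in dims)
# Broadcast the scalar 1.0 over the real dimensions only.
scalar_as_array = xr.DataArray(np.full(shape, 1.0), coords=coords, dims=dims)
```

Here `scalar_as_array` has shape `(2,)` over `('station',)`, and the `letter` and `num` level coordinates stay attached to `station`.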

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

When a scalar is broadcast against an xarray.Coordinates object with a
pandas MultiIndex, dims were derived from coords.keys() which includes
level names. Use coords.dims for Coordinates objects so the scalar is
shaped to the actual dimensions only.
@FabianHofmann FabianHofmann requested a review from FBumann April 20, 2026 13:13
Assert coord names, values, and that level coords remain attached to the
parent dim, so silent level-name leakage can't pass. Parametrize across
scalar/array arg types and across Coordinates / dict / DataArray.coords
inputs, plus an explicit-dims-wins case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FBumann
Collaborator

FBumann commented Apr 20, 2026

Nice catch — coords.keys() was leaking MultiIndex level names as dims; coords.dims is the right read.

I added 2c29b87 to harden the test and lock in the behaviour, as I'm always a bit confused by MultiIndex:

  • Parametrized over arg types (np.float64, int, float, np.ndarray) and coord shapes (xr.Coordinates.from_pandas_multiindex, plain dict, DataArray.coords) — the first coord case exercises the branch where coords.dims is a mapping, not a tuple.
  • Separate case pinning that explicit dims= still wins over inference.
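
The mapping-versus-tuple distinction mentioned above can be checked directly in xarray. The snippet below is an illustrative sketch with made-up coordinate values, not the test added in 2c29b87; it only shows that `dims` has a different type depending on where the coords come from, and that `tuple(...)` normalizes all three inputs to the same dim names.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]], names=["letter", "num"])

# Three ways a caller might hand over coords.
from_multiindex = xr.Coordinates.from_pandas_multiindex(midx, "station")
source = xr.DataArray(np.zeros(2), coords=from_multiindex, dims=("station",))
from_dataarray = source.coords
plain_dict = {"station": [0, 1]}

# A standalone Coordinates object exposes dims as a mapping (name -> size),
# while DataArray.coords exposes dims as a tuple of names.
is_tuple = {
    "Coordinates": isinstance(from_multiindex.dims, tuple),
    "DataArray.coords": isinstance(from_dataarray.dims, tuple),
}

# Iterating a mapping yields its keys, so tuple() gives dim names in all cases.
normalized = [
    tuple(from_multiindex.dims),
    tuple(from_dataarray.dims),
    tuple(plain_dict),
]
```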

I found the following bug, but declared it out of scope for now: as_dataarray(pd.Series(...), coords=source.coords) drops the dim name and returns dim_0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FabianHofmann
Collaborator Author

Thanks @FBumann!

@FabianHofmann FabianHofmann merged commit 837e5ed into master Apr 20, 2026
21 checks passed
@FabianHofmann FabianHofmann deleted the fix/as-dataarray-multiindex-coords branch April 20, 2026 19:35