`Dataset.to_dataframe()` and `DataArray.to_dataframe()` convert multi-index levels both to pandas `MultiIndex` and to columns

## Overview 

When `to_dataframe()` is called on an xarray `Dataset` that has a multi-index along one of its dimensions, the multi-index level coordinates are translated both:
- into levels of a pandas `MultiIndex` for the dataframe, and
- into individual columns of the dataframe.

Is this expected and intended behavior? 

## Main reprex

```python
import numpy as np
import pandas as pd
import xarray as xr

data_dict = dict(x=[1, 2, 1, 2, 1], y=["a", "a", "b", "b", "b"], z=[5, 10, 15, 20, 25])
data_dict_w_dims = {k: ("my_dim", v) for k, v in data_dict.items()}

# create a dataset multi-indexed along "my_dim" by "x" and "y" 
xr_dat = xr.Dataset(data_dict_w_dims).set_coords(["x", "y"]).set_xindex(["x", "y"])

print(xr_dat)
# <xarray.Dataset> Size: 140B
# Dimensions:  (my_dim: 5)
# Coordinates:
#   * my_dim   (my_dim) object 40B MultiIndex
#   * x        (my_dim) int64 40B 1 2 1 2 1
#   * y        (my_dim) <U1 20B 'a' 'a' 'b' 'b' 'b'
# Data variables:
#     z        (my_dim) int64 40B 5 10 15 20 25

print(xr_dat.to_dataframe()) # x and y present both as columns and as multi-index
#       z  x  y
# x y
# 1 a   5  1  a
# 2 a  10  2  a
# 1 b  15  1  b
# 2 b  20  2  b
# 1 b  25  1  b
```
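The duplication in the output above can also be checked programmatically. The sketch below uses a hypothetical helper, `duplicated_index_columns` (not part of pandas or xarray), that lists the names appearing both as index levels and as columns. To keep it independent of the installed xarray version, it runs on a frame built by hand to mirror the printed output rather than on `xr_dat.to_dataframe()` itself.

```python
import pandas as pd


def duplicated_index_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names that are both an index level and a column."""
    # `df.index.names` has one entry per index level (None for unnamed levels)
    level_names = {name for name in df.index.names if name is not None}
    return [col for col in df.columns if col in level_names]


# A frame mirroring the `to_dataframe()` output printed above
index = pd.MultiIndex.from_arrays(
    [[1, 2, 1, 2, 1], ["a", "a", "b", "b", "b"]], names=["x", "y"]
)
df = pd.DataFrame(
    {"z": [5, 10, 15, 20, 25], "x": [1, 2, 1, 2, 1], "y": ["a", "a", "b", "b", "b"]},
    index=index,
)

print(duplicated_index_columns(df))  # ['x', 'y']
```

Calling `duplicated_index_columns(xr_dat.to_dataframe())` on the dataset from the reprex should report `['x', 'y']` whenever the behavior described in this issue is present.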

## Cause 

I believe the key line is here in the `_to_dataframe()` internal method:

https://github.com/pydata/xarray/blob/699d8957ec174f118108005aeb6ba99c1920167a/xarray/core/dataset.py#L7092-L7095

The level coordinates of the multi-index (`IndexVariable` objects) are present in `self.variables` but not in `self.dims`, so they become columns:

```python
"x" in xr_dat.dims
# False
"x" in xr_dat.variables
# True
xr_dat.variables["x"]
# <xarray.IndexVariable 'my_dim' (my_dim: 5)> Size: 40B
# [5 values with dtype=int64]
```
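To see the filter in isolation, the sketch below re-applies the comprehension from the permalinked line, `[k for k in self.variables if k not in self.dims]`, to the `xr_dat` dataset from the main reprex. It only touches the public `Dataset.variables` and `Dataset.dims` attributes; it does not call the private `_to_dataframe()`.

```python
import xarray as xr

data_dict = dict(x=[1, 2, 1, 2, 1], y=["a", "a", "b", "b", "b"], z=[5, 10, 15, 20, 25])
data_dict_w_dims = {k: ("my_dim", v) for k, v in data_dict.items()}
xr_dat = xr.Dataset(data_dict_w_dims).set_coords(["x", "y"]).set_xindex(["x", "y"])

# Same filter as in `_to_dataframe()`: keep every variable whose name is not a dimension
columns_in_order = [k for k in xr_dat.variables if k not in xr_dat.dims]

# The multi-index levels "x" and "y" are variables but not dimensions,
# so they survive the filter alongside the data variable "z"
print(sorted(columns_in_order))  # ['x', 'y', 'z']
```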

## This has consequences for pandas -> xarray -> pandas conversion

Because of this, converting a `MultiIndex`-ed pandas dataframe to an xarray `Dataset` via the `xr.Dataset()` constructor and then back to pandas via `.to_dataframe()` does not recover the original dataframe.

### Reprex

```python
# create a multi-indexed pandas dataframe
pd_df = pd.DataFrame(data_dict).set_index(["x", "y"])

print(pd_df)  # multi-indexed dataframe with one column
#       z
# x y
# 1 a   5
# 2 a  10
# 1 b  15
# 2 b  20
# 1 b  25

# Conversion to xarray is as expected:
xr_from_pd = xr.Dataset(pd_df)
print(xr_from_pd)
# <xarray.Dataset> Size: 160B
# Dimensions:  (dim_0: 5)
# Coordinates:
#   * dim_0    (dim_0) object 40B MultiIndex
#   * x        (dim_0) int64 40B 1 2 1 2 1
#   * y        (dim_0) object 40B 'a' 'a' 'b' 'b' 'b'
# Data variables:
#     z        (dim_0) int64 40B 5 10 15 20 25

# Converting back to pandas df via `to_dataframe()` yields a df multi-indexed by 
# x and y that also contains `x` and `y` as columns:

print(xr_from_pd.to_dataframe()) # x and y as multi-index and as columns
#      x  y   z
# x y
# 1 a  1  a   5
# 2 a  2  a  10
# 1 b  1  b  15
# 2 b  2  b  20
# 1 b  1  b  25
```
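The round-trip mismatch can be stated as a check. The sketch below (rebuilding `data_dict` and `pd_df` so it runs on its own) converts `pd_df` to xarray and back, lists the index levels that reappear as columns, and then compares only the original columns against `pd_df` with `DataFrame.equals`. Selecting `pd_df.columns` is a workaround assumed here for illustration, not an option of `to_dataframe()`.

```python
import pandas as pd
import xarray as xr

data_dict = dict(x=[1, 2, 1, 2, 1], y=["a", "a", "b", "b", "b"], z=[5, 10, 15, 20, 25])
pd_df = pd.DataFrame(data_dict).set_index(["x", "y"])

# pandas -> xarray -> pandas
back = xr.Dataset(pd_df).to_dataframe()

# Index levels of `pd_df` that reappear as columns after the round trip
extra = [c for c in back.columns if c in pd_df.index.names]
print(extra)

# Keeping only the original columns recovers the original frame
print(back[list(pd_df.columns)].equals(pd_df))  # True
```

If the extra columns were not emitted, `back.equals(pd_df)` would be the natural check for a lossless round trip.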

## Thoughts 
- If this behavior is not intended, the flagged line in `_to_dataframe()` should be changed so that column names are determined in a way that ignores `IndexVariable`s that form part of a multi-index.
- Filtering down to data variables alone is probably not the right fix, because coordinates that are _not_ going to be part of the pandas `MultiIndex` may still be wanted as columns, e.g.:
```python
# similar dataset with x and y as coordinates but not as a multi-index
dat_no_multiindex = xr.Dataset(
    data_dict_w_dims
).set_coords(["x", "y"])

# potentially intended behavior?
print(dat_no_multiindex.to_dataframe())
#        x  y   z
# my_dim
# 0       1  a   5
# 1       2  a  10
# 2       1  b  15
# 3       2  b  20
# 4       1  b  25
```
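A minimal sketch of the filtering suggested in the first bullet, written from the outside rather than as a patch to `_to_dataframe()`: a variable is skipped when its name is a dimension (as today) or when `Dataset.indexes` maps it to a pandas `MultiIndex`, i.e. when it is a level of a multi-index. Every other coordinate, such as the unindexed `x` and `y` in `dat_no_multiindex`, stays a column. `columns_without_index_levels` is a hypothetical name, and using "indexed by a `pd.MultiIndex`" as the test for a level is an assumption about how such a fix could be written.

```python
import pandas as pd
import xarray as xr


def columns_without_index_levels(ds: xr.Dataset) -> list:
    """Hypothetical helper: column names that skip multi-index levels."""
    return [
        k
        for k in ds.variables
        # names of dimensions are excluded, as in the current filter
        if k not in ds.dims
        # levels of a pandas MultiIndex already end up in the dataframe index
        and not (k in ds.indexes and isinstance(ds.indexes[k], pd.MultiIndex))
    ]


data_dict = dict(x=[1, 2, 1, 2, 1], y=["a", "a", "b", "b", "b"], z=[5, 10, 15, 20, 25])
data_dict_w_dims = {k: ("my_dim", v) for k, v in data_dict.items()}

xr_dat = xr.Dataset(data_dict_w_dims).set_coords(["x", "y"]).set_xindex(["x", "y"])
dat_no_multiindex = xr.Dataset(data_dict_w_dims).set_coords(["x", "y"])

print(columns_without_index_levels(xr_dat))  # ['z']
print(columns_without_index_levels(dat_no_multiindex))  # ['x', 'y', 'z']

# With the current behavior, selecting these columns drops the duplicated levels
print(xr_dat.to_dataframe()[columns_without_index_levels(xr_dat)])
```

With the current behavior, the last line doubles as a workaround: it drops the duplicated level columns and leaves the `MultiIndex` intact.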

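For reference, these are the permalinked lines of `_to_dataframe()` in `xarray/core/dataset.py` (L7092-L7095):

```python
def _to_dataframe(self, ordered_dims: Mapping[Any, int]):
    from xarray.core.extension_array import PandasExtensionArray

    columns_in_order = [k for k in self.variables if k not in self.dims]
```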
