Note: The architectural overview has been consolidated into the main design document. See the IO > Input > Conversion section in
sdd.mdfor the current high-level design. This document remains as a detailed implementation specification.
Design for refactoring flopy4.mf6.converter.structure.structure_array() to support multiple sparse/dense array formats from flopy 3.x while returning xarray DataArrays with proper metadata.
Location: flopy4/mf6/converter/structure.py:13
Current Capabilities:
- Handles dict-based sparse arrays with stress period keys:
{0: {cellid: value}, 1: {...}} - Returns numpy arrays or sparse COO arrays based on size threshold
- Resolves dimensions from model context
Current Limitations:
- Does NOT return
xr.DataArray(returns raw numpy/sparse) - Only supports dictionary input
- Limited to very specific dict format
- No support for list-based formats
- No duck array pass-through
- No grid reshaping (structured ↔ unstructured)
# Sparse stress period specification - each fills forward
{0: data1, 5: data2, 10: data3}
# SP 0-4 use data1, SP 5-9 use data2, SP 10+ use data3
# Single entry fills to all periods
{0: data1} # All stress periods use data1# Simple layered values
{0: 100.0, 1: 90.0, 2: 85.0}
# With metadata
{0: {'data': array1, 'factor': 1.0, 'iprn': 1},
1: {'data': array2, 'factor': 1.0, 'iprn': 1}}# Values can be different types in same dict
{
0: xr.DataArray(..., dims=['nlay', 'nrow', 'ncol']), # xarray
5: np.array([[100.0], [90.0]]), # numpy
10: [[95.0], [85.0]], # list
15: 0.004 # scalar
}# Simple list (layers/periods)
[100.0, 90.0, 85.0]
# Nested lists (structured)
[[100.0], [90.0, 90.0, 90.0, ...]]
# Fully nested (layer → row → col)
[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]# xarray with named dimensions
xr.DataArray(data, dims=['nper', 'nodes'])
# numpy array (validated by shape only)
np.array(shape=(10, 1000))
# sparse array
sparse.COO(coords, data, shape=shape)# Constant value (broadcast to full shape)
0.004{'filename': 'strt.txt', 'factor': 1.0, 'data': [...], 'binary': True}# Structured grid format (from package.stress_period_data)
pd.DataFrame({
'kper': [0, 0, 1, 1],
'layer': [0, 1, 0, 1],
'row': [0, 9, 0, 9],
'col': [0, 9, 0, 9],
'head': [10.0, 5.0, 11.0, 6.0]
})
# Unstructured grid format
pd.DataFrame({
'kper': [0, 0, 1],
'node': [0, 99, 0],
'head': [20.0, 15.0, 21.0]
})
# Enables round-trip: package → stress_period_data → new packageSparse dictionary keys fill forward to next key or end of simulation:
{0: data1, 5: data2}→ SP 0-4 usedata1, SP 5+ usedata2- Mimics MF6 behavior: specifying only SP 0 applies to all periods
Most component fields use flat nodes dimension for grid-agnostic design:
- User provides:
(nlay, nrow, ncol)structured - Component expects:
(nodes,)flat wherenodes = nlay * nrow * ncol - Function must reshape:
data.reshape(-1)ordata.ravel()
Supported conversions:
(nlay, nrow, ncol)→(nodes,)(nper, nlay, nrow, ncol)→(nper, nodes)
Arrays without nper dimension apply to all stress periods:
Case A: Has nper dimension
xr.DataArray(shape=(10, 1000), dims=['nper', 'nodes'])
# All 10 stress periods explicitly specifiedCase B: No nper dimension
xr.DataArray(shape=(1000,), dims=['nodes'])
# Broadcast to all stress periods: (nper, 1000)Correct dimension names (from Tdis/Dis components):
- Time:
nper(nottime) - Structured:
nlay,nrow,ncol(notlayer,row,col) - Unstructured:
nodes(flat)
Validation rules:
- xarray: Check dimension names match expected
- numpy: Check shape matches expected (no named dims)
- Handle structured→flat reshaping automatically
- Raise clear errors on mismatches
Primary: xr.DataArray with:
- Proper dimensions from field spec (
nper,nodes, etc.) - Coordinates resolved from model context (Tdis, Dis)
- Attributes for metadata (factor, iprn, etc.)
- Underlying storage: sparse COO or dense numpy
Fallback: Raw arrays via return_xarray=False flag
def structure_array(
value: dict | list | xr.DataArray | np.ndarray | float | int,
self_,
field,
*,
return_xarray: bool = True,
sparse_threshold: int | None = None
) -> xr.DataArray | np.ndarray | sparse.COO:
"""
Convert various array representations to structured xarray DataArrays.
Parameters
----------
value : dict | list | xr.DataArray | np.ndarray | float | int
Input data in any supported format
self_ : object
Parent object containing dimension context
field : object
Field specification with dims, dtype, default
return_xarray : bool, default True
If True, return xr.DataArray; otherwise return raw array
sparse_threshold : int | None
Override default sparse threshold for COO vs dense
Returns
-------
xr.DataArray | np.ndarray | sparse.COO
Structured array with proper shape and metadata
"""def _resolve_dimensions(self_, field) -> tuple[list[str], list[int], dict]:
"""
Get expected dims, shape, and resolved dimension values.
Returns
-------
dims : list[str]
Dimension names (e.g., ['nper', 'nodes'])
shape : list[int]
Resolved shape (e.g., [10, 1000])
dim_dict : dict
Dimension values (e.g., {'nper': 10, 'nodes': 1000})
"""def _detect_grid_reshape(
value_shape: tuple,
expected_dims: list[str],
dim_dict: dict
) -> tuple[bool, tuple | None]:
"""
Check if structured↔flat conversion needed.
Returns
-------
needs_reshape : bool
True if reshape required
target_shape : tuple | None
Target shape for reshape, or None
"""def _reshape_grid(
data: np.ndarray | xr.DataArray,
target_shape: tuple,
source_dims: list[str] | None = None,
target_dims: list[str] | None = None
) -> np.ndarray | xr.DataArray:
"""
Perform structured↔flat grid conversion.
Handles full 3D grids (nodes dimension):
- (nlay, nrow, ncol) → (nodes,)
- (nper, nlay, nrow, ncol) → (nper, nodes)
Handles per-layer 2D arrays (ncpl dimension):
- (nrow, ncol) → (ncpl,)
- (nper, nrow, ncol) → (nper, ncpl)
Preserves xarray metadata if input is xarray
"""def _validate_duck_array(
value: xr.DataArray | np.ndarray,
expected_dims: list[str],
expected_shape: tuple,
dim_dict: dict
) -> xr.DataArray | np.ndarray:
"""
Validate and optionally reshape duck arrays.
- xarray: check dimension names, apply grid reshape if needed
- numpy: check shape, apply reshape if needed
- Raise clear errors on incompatibilities
"""def _fill_forward_time(
data: np.ndarray | xr.DataArray,
dims: list[str],
nper: int
) -> np.ndarray | xr.DataArray:
"""
Add nper dimension if missing (broadcast to all periods).
If 'nper' in dims but not in data:
Broadcast data to (nper, *data.shape)
Else:
Return as-is
"""def _parse_dataframe(
df: pd.DataFrame,
field_name: str,
dim_dict: dict
) -> dict[int, dict]:
"""
Parse pandas DataFrame to dict format compatible with stress period data.
Handles both structured (layer/row/col) and unstructured (node) coordinates.
Enables round-trip: package → stress_period_data → new package
Returns
-------
parsed : dict[int, dict]
Dict with stress period keys mapping to cellid dicts
Example: {0: {(0, 0, 0): 10.0, (1, 9, 9): 5.0}, ...}
"""def _parse_dict_format(
value: dict,
expected_dims: list[str],
expected_shape: tuple,
dim_dict: dict,
field
) -> dict[int, np.ndarray]:
"""
Parse dict format with fill-forward logic.
Returns
-------
parsed : dict[int, np.ndarray]
Dict with integer keys (period/layer) and normalized arrays
Handles:
- Mixed value types (xarray, numpy, list, scalar)
- Metadata dicts ({'data': ..., 'factor': ...})
- External file dicts ({'filename': ...})
- Each value independently validated/reshaped
"""def _parse_list_format(
value: list,
expected_dims: list[str],
expected_shape: tuple,
field
) -> np.ndarray:
"""
Parse nested list formats to numpy array.
Handles:
- Simple lists: [100.0, 90.0]
- Nested lists: [[...], [...]]
- Validates shape matches expected
"""def _to_xarray(
data: np.ndarray | sparse.COO,
dims: list[str],
coords: dict,
attrs: dict
) -> xr.DataArray:
"""
Wrap array in xarray DataArray with metadata.
Parameters
----------
data : np.ndarray | sparse.COO
Underlying array data
dims : list[str]
Dimension names
coords : dict
Coordinate arrays for each dimension
attrs : dict
Metadata attributes (factor, iprn, etc.)
"""structure_array(value, self_, field)
│
├─→ _resolve_dimensions() → dims, shape, dim_dict
│
├─→ Detect input type:
│ ├─→ DataFrame → _parse_dataframe() → dict
│ │ └─→ Continue processing as dict below
│ │
│ ├─→ dict → _parse_dict_format()
│ │ ├─→ For each value:
│ │ │ ├─→ xarray/numpy → _validate_duck_array()
│ │ │ ├─→ list → _parse_list_format()
│ │ │ └─→ scalar → broadcast
│ │ └─→ Apply fill-forward logic
│ │
│ ├─→ list → _parse_list_format()
│ │
│ ├─→ xarray/numpy → _validate_duck_array()
│ │ └─→ _reshape_grid() if needed
│ │
│ └─→ scalar → broadcast to shape
│
├─→ _fill_forward_time() if nper in dims
│
├─→ Apply sparse/dense threshold logic
│ ├─→ Large: build sparse COO
│ └─→ Small: build dense numpy
│
└─→ _to_xarray() if return_xarray=True
└─→ Return xr.DataArray or raw array
-
Dict formats:
- Stress period dict with nested lists
- Stress period dict with tuples
- Stress period dict with scalars
- Fill-forward logic (sparse keys)
- Layered dict with metadata
- Mixed value types in dict
-
List formats:
- Simple list
- Nested lists (2D, 3D)
- Shape validation
-
Duck arrays:
- xarray with correct dims
- xarray needing reshape
- numpy arrays
- sparse arrays
-
Grid reshaping:
- Structured → flat (3D → 1D)
- Structured → flat with time (4D → 2D)
-
Time handling:
- Array with nper dimension
- Array without nper (fill-forward)
- Dict with sparse periods
-
External files:
- Metadata dict format
-
xarray output:
- Dimension names
- Coordinate values
- Attributes
- Sparse vs dense backing
-
Edge cases:
- Empty dicts/lists
- Single values
- Dimension resolution failures
-
DataFrame integration:
- Round-trip structured grid: dict → stress_period_data → DataFrame → dict
- Round-trip unstructured grid: dict → stress_period_data → DataFrame → dict
- DataFrame with star key handling
- Verify data preservation across round-trip
return_xarray=Falseas default (backward compatible)return_xarray=Truereturns xarray DataArrays (new behavior)- Existing dict format continues to work
- Update existing tests to validate xarray output
- Unified interface: Single function for all flopy 3.x formats
- Better metadata: xarray preserves dimensions, coordinates, attributes
- Grid agnostic: Automatic structured↔unstructured conversion
- Type safety: Clear return types for static analysis
- Interoperability: xarray works with pandas, dask, netCDF, zarr
- Round-trip support: DataFrame integration enables package → stress_period_data → new package
- Future-proof: Easy to extend with new formats
Issue: Dictionary values containing strings (e.g., save_head={0: "all"}) were not being stored in arrays. The code only checked for isinstance(val, (int, float)), causing strings to fall through.
Solution: Added str to type checks in both sparse and dense array building paths:
if isinstance(val, (int, float, str)): # Added strLocation: structure.py lines 526, 600
Issue: Custom objects (e.g., Oc.PrintSaveSetting) were not handled by any conditional branch and were not being stored in arrays, resulting in fill values (3e+30) instead of actual data.
Solution: Added else clause to handle any remaining types (including custom objects):
else:
# Other types (including custom objects) - store as scalar
if len(shape) == 1:
result[kper] = val
else:
result[kper] = valAlso skip fill value replacement for object dtypes:
if field.dtype != np.object_:
result[result == FILL_DNODATA] = field.default or FILL_DNODATALocation: structure.py lines 557-565, 645-655
Issue: The flopy4.mf6.utils.grid.StructuredGrid class had swapped dimension names for delr and delc properties:
delr(row spacing) incorrectly useddims=("nrow",)instead of("ncol",)delc(column spacing) incorrectly useddims=("ncol",)instead of("nrow",)
This bug was masked by the old converter which didn't validate dimensions. The new converter's strict validation exposed it.
Solution: Corrected dimension names and coordinates in both properties:
# delc: column spacing, varies with row
dims = ("nrow",)
coords = {coord_name: self._coords[coord_name], "y": self._coords["y"]}
# delr: row spacing, varies with column
dims = ("ncol",)
coords = {coord_name: self._coords[coord_name], "x": self._coords["x"]}Location: flopy4/mf6/utils/grid.py lines 261-284
Issue: Grid-based ("g") and array-based ("a") package variants (Chdg, Welg, Drng, Rcha) were missing converter=Converter(structure_array, ...) on their period block array fields. Without converters, xattree validates dimension counts before reshape logic runs, causing errors when passing structured arrays.
Solution: Added converters to all grid-based and array-based package fields:
Chdg.head: Added converter for automatic reshapingWelg.q: Added converter for automatic reshapingDrng.elevandDrng.cond: Added converters for automatic reshapingRcha.recharge: Added converter for automatic reshaping
Location: flopy4/mf6/gwf/chdg.py, welg.py, drng.py, rcha.py
Issue: Array-based packages use ncpl ("number of cells per layer") dimension for 2D per-layer data. The reshape detection only handled nodes (3D full grids) but not ncpl (2D per-layer), causing errors like:
ValueError: Shape mismatch: (3, 15, 15) vs (3, 225)
Solution: Extended _detect_grid_reshape to handle ncpl dimension:
# Handle 'ncpl' dimension (cells per layer, 2D per-layer arrays)
if "ncpl" in expected_dims and has_structured_2d:
# Case: (nrow, ncol) → (ncpl,)
# Case: (nper, nrow, ncol) → (nper, ncpl)This allows automatic reshaping for array-based packages like Rcha.
Location: flopy4/mf6/converter/structure.py lines 113-127
The stricter validation in the new converter caught the StructuredGrid bug that had existed undetected. While this temporarily broke tests, it exposed a real issue that would have caused problems downstream. This demonstrates the value of proper input validation.
Users can now:
- Pass structured arrays directly - automatic reshaping for all dimensions:
- 3D full grids:
(nlay, nrow, ncol)→(nodes,) - 2D per-layer:
(nrow, ncol)→(ncpl,) - Time-varying 3D:
(nper, nlay, nrow, ncol)→(nper, nodes) - Time-varying 2D:
(nper, nrow, ncol)→(nper, ncpl)
- 3D full grids:
- Use DataFrames from
package.stress_period_datato initialize new packages - Mix value types within dicts (scalars, arrays, xarrays, DataFrames)
- Rely on strict validation to catch dimension mismatches early
Examples - no manual reshaping needed:
Example 1: Standard packages (3D grid arrays)
# Before (manual reshape required)
icelltype = np.stack([np.full((nrow, ncol), val) for val in [1, 0, 0]])
npf = Npf(icelltype=icelltype.reshape((nodes,)), ...)
# After (automatic reshape)
icelltype = np.stack([np.full((nrow, ncol), val) for val in [1, 0, 0]])
npf = Npf(icelltype=icelltype, ...) # Automatically reshaped to (nodes,)Example 2: Grid-based packages (time-varying 3D arrays)
# Before (manual reshape required)
head = np.full((nper, nlay, nrow, ncol), FILL_DNODATA)
head[0, :2, :, 0] = 0.0
chdg = Chdg(head=head.reshape(nper, -1), ...)
# After (automatic reshape)
head = np.full((nper, nlay, nrow, ncol), FILL_DNODATA)
head[0, :2, :, 0] = 0.0
chdg = Chdg(head=head, ...) # Automatically reshaped to (nper, nodes)Example 3: Array-based packages (time-varying 2D per-layer arrays)
# Before (manual reshape required)
recharge = np.full((nper, nrow, ncol), FILL_DNODATA)
recharge[0, ...] = 3.0e-8
rcha = Rcha(recharge=recharge.reshape(nper, -1), ...)
# After (automatic reshape)
recharge = np.full((nper, nrow, ncol), FILL_DNODATA)
recharge[0, ...] = 3.0e-8
rcha = Rcha(recharge=recharge, ...) # Automatically reshaped to (nper, ncpl)