feat: `AnnData.unwriteable` based on `AnnData._reduce` + `iter_outer` + refactorings of other relevant functions by ilan-gold · Pull Request #2372 · scverse/anndata

ilan-gold · 2026-03-19T13:21:54Z

maybe TODOs:

fold? reduce?
Implement this inside write_{zarr,h5ad} to prevent partial stores from being written?

We introduce AnnData.fold and AnnData.can_write for "crawling" an AnnData and accumulating, implementing a can_write to tell if an AnnData can be written to a given format.

I'm also open to reduce as the name instead of fold.

For #2236, I think we could implement this more complexly by building on fold and accumulating a set of AdRef (or similar) objects pointing at what can't be written. Open to implement this! I reimplemented __sizeof__ to stress-test using reference-accessors and it went ok, but it's not amazingly smooth (like uns and trying to figure out if you are in a "top-level" elem like obs or varm)

If zarr-developers/zarr-python#3613 ever happens (I hope so) or we do #2238, this will be crucial to making things smoother with hdf5 for preventing incompatibilities.

Towards Simplify backed internal checking #2369 and moving towards something raised in feat: Add HTML representation #2236
Tests added

Release note not necessary because:

codecov · 2026-03-19T13:23:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.45%. Comparing base (40e6ab1) to head (1ad82f4).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2372      +/-   ##
==========================================
- Coverage   87.35%   85.45%   -1.91%     
==========================================
  Files          49       49              
  Lines        7694     7720      +26     
==========================================
- Hits         6721     6597     -124     
- Misses        973     1123     +150

Files with missing lines	Coverage Δ
src/anndata/_core/anndata.py	`86.27% <100.00%> (+0.84%)`	⬆️
src/anndata/_io/h5ad.py	`93.56% <100.00%> (-0.10%)`	⬇️
src/anndata/_io/specs/methods.py	`90.97% <100.00%> (-0.49%)`	⬇️
src/anndata/_types.py	`100.00% <100.00%> (ø)`
src/anndata/utils.py	`86.58% <100.00%> (-0.54%)`	⬇️

... and 6 files with indirect coverage changes

ilan-gold · 2026-03-19T14:41:49Z

Just looking for a first-pass vibe check on this!

flying-sheep

Very exciting! I think this mainly needs some serious documenting and examples, not much else.

For examples, maybe first something that just handles one kind of accessor/reference first to show how one can short-circuit by returning the accumulator early.

Regarding naming: the canonical name in the stdlib is reduce, it even used to be in builtins, but maybe the scientific python world deviates for no good reason again. It’s also fair if you have other objections to reduce!

flying-sheep · 2026-03-20T16:02:26Z

+                    (is_ad_ref := isinstance(ref_acc, AdRef))
+                    or (
+                        is_ref_acc := isinstance(
+                            ref_acc, LayerAcc | MultiAcc | GraphAcc
+                        )
+                    )


this is

isinstance(ref_acc, AdRef | RefAcc) and not isinstance(ref_acc, MetaAcc)

because you want to handle each homogeneous array.

I think there are some ways to make this more ergonomic, e.g.:

we can add more marker superclasses that handle this use case, or

we could change the implementation of fold/reduce so it descends into things and returning or raising some marker would prevent descent. that might help handling things as early as possible and therefore having less complicated logic.

katosh · 2026-03-20T20:06:04Z

Hi @ilan-gold, interesting direction! The HTML repr PR (#2236) intersects with fold in a couple of ways, so I wanted to share some observations from the implementation side.

HTML repr (#2236): traversal overlap

The _repr_html_ traverses the same element tree that fold formalizes. The core loop lives in html.py:_render_all_sections and iterates over a SECTION_ORDER tuple:

SECTION_ORDER = ("X", "obs", "var", "uns", "obsm", "varm", "layers", "obsp", "varp", "raw")

for section in SECTION_ORDER:
    parts.append(_render_section(adata, section, context))

This is structurally very similar to fold's inner loop. A few differences worth noting:

raw is missing from fold. The repr includes .raw as a collapsible section (X, var, varm). Would fold eventually cover raw?
Section-level grouping. The repr builds a tree (section headers, then entries, then nested content). Since entries within a section are visited contiguously, the callback could track state in the accumulator to detect section transitions: in DFS-pre mode the parent visit signals "entering," and the next section's parent visit implicitly signals "exiting" the previous one. This is workable, but it would be smoother if fold supported explicit enter/exit events for sections, so consumers don't need to roll their own state tracking for something that fold already knows internally.
Custom/unknown sections. The repr also detects mapping-like attributes not in the standard list (for ecosystem packages like TreeData that add .obst/.vart). fold currently hardcodes the same attribute list. Worth considering whether it should be extensible or whether that's out of scope.
Nested AnnData in uns. The repr handles AnnData objects found inside uns values, with depth tracking to prevent infinite recursion. fold doesn't recurse into nested AnnData within uns, so that use case wouldn't be covered.

On the positive side, the per-column iteration of obs/var via MetaAcc aligns well with how the repr already iterates df.columns to render each column individually.

The repr would naturally use DFS-pre order (render section header first, then its entries), so it's good that option exists. If fold stabilizes, I could likely use it as the backbone of the repr's traversal by accumulating a list[RenderedSection]. The SECTION_ORDER loop would become a fold call. The main prerequisite would be that fold supports distinguishing "entering a section" from "visiting a leaf within it."

Serialization checks: `can_write` overlap

I understand can_write has a different primary goal: preventing partial writes to zarr/h5 stores by checking upfront whether the whole object is writable. That's valuable on its own. But there's a nice overlap: the repr already performs the same registry lookups at element granularity (utils.py:is_serializable, formatters.py:_check_series_serializability) to show warning icons on entries that can't be written. These checks:

Query _REGISTRY.write to test if a type has a registered writer (same approach as can_write's predicate)
Additionally check Series backing arrays, key name validity (slashes, non-string keys), and nested container contents
Return (is_serializable, reason) tuples so the repr can show why something can't be written

If the underlying element-level check were factored into a shared utility, something like can_write_elem(obj, store_type=) -> tuple[bool, str], both use cases could build on it: can_write would fold with all(...), and the repr would call it per-entry to get the reason string for its warning icon. I'd be happy to contribute to making that work if it's a direction you'd consider.

Happy to discuss further, just wanted to flag the touchpoints while the API is still taking shape.

ilan-gold · 2026-03-23T10:25:26Z

raw is missing from fold. The repr includes .raw as a collapsible section (X, var, varm). Would fold eventually cover raw?

Great catch! I think we currently don't have a path accessor for it, so I think that's the same thing as uns

Nested AnnData in uns. The repr handles AnnData objects found inside uns values, with depth tracking to prevent infinite recursion. fold doesn't recurse into nested AnnData within uns, so that use case wouldn't be covered.

Phil brought this up in a call on Friday as well, not sure (again) what is to be done

The main prerequisite would be that fold supports distinguishing "entering a section" from "visiting a leaf within it."

I'm not sure I follow this line of thinking fully - why can't you distinguish between these two now? If you are doing pre order, you know you enter a section when you see the reference to obs or varp etc. No? Why do you need an exit signal as well?

If the underlying element-level check were factored into a shared utility, something like can_write_elem(obj, store_type=) -> tuple[bool, str], both use cases could build on it: can_write would fold with all(...), and the repr would call it per-entry to get the reason string for its warning icon. I'd be happy to contribute to making that work if it's a direction you'd consider.

That could make sense. Once you merge this into your PR we can refactor this if needed + work out more kinks

ilan-gold · 2026-03-23T12:54:29Z

-
-        sizes = {}
-        attrs = ["X", "_obs", "_var"]
-        attrs_multi = ["_uns", "_obsm", "_varm", "varp", "_obsp", "_layers"]


FWIW this never handled raw anyway

ilan-gold · 2026-04-10T10:46:23Z

So I've made iter_outer i.e., what was AnnData.iter a separate function that isn't exported and I also made reduce private. So we are only exporting now can_write. This should alleviate any concerns in this PR about APIs

flying-sheep

Some nice gains, great!

I wonder if a more granular can_write would be helpful for people that shows them which arrays/columns block the writeability.

flying-sheep · 2026-04-10T15:03:40Z

    AnnData object with n_obs × n_vars = 700 × 765
        obs: 'bulk_labels', 'n_genes', 'percent_mito', 'n_counts', 'S_score', 'G2M_score', 'phase', 'louvain'
        var: 'n_counts', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
-        uns: 'bulk_labels_colors', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'


why is this gone?

It got moved to the end, absorbed by obsp: ...

hmm, maybe replace obsp: ... with ... then to make that clear

ilan-gold · 2026-04-13T08:47:11Z

I wonder if a more granular can_write would be helpful for people that shows them which arrays/columns block the writeability.

That would be nice but feels downstream of "we might need better ref/acc objects to describe the whole anndata" which I think we have realized in the course of this PR is too big of a lift ATM.

I think keeping the reduce paradigm around is a good step towards this and other goals that might involve per-array iteration.

flying-sheep

Looks great!

I’m ill, so I won’t check for myself, but is can_write complete? Like will it actually handle all non-bug cases where an anndata object fails to write, like a nested anndata object in uns that contains something unwriteable? Or some unhandled dtype in a categorical column or so?

If yes, I just have one ask: I think we should add a note or so to can_write that the return type isn’t guaranteed to stay the same, but the coercion to bool is stable: bool(adata.can_write(...) / if adata.can_write(...):. That way we can make it return a structure later that has info about unwriteable bits or so.

ilan-gold · 2026-04-14T10:03:30Z

If yes, I just have one ask: I think we should add a note or so to can_write that the return type isn’t guaranteed to stay the same, but the coercion to bool is stable: bool(adata.can_write(...) / if adata.can_write(...):. That way we can make it return a structure later that has info about unwriteable bits or so.

I wonder if this is too strong. For example:

from anndata.acc import A

assert bool(A.obs)

We would then need special classes whose __bool__ is overriden?

flying-sheep · 2026-04-14T18:12:15Z

We would then need special classes whose __bool__ is overriden?

yeah, the only useful alternative could be renaming it to unwriteable and having it return a list of unwriteable things in the future (a empty/falsy list if all is writeable)

do you have a different idea?

ilan-gold · 2026-04-15T08:13:17Z

No and I that's even better given the current state of things. You would want to use this in a guard clause anyway so

if adata.unwriteable():

is nicer than

if not adata.can_write()

…g/fold_can_write

flying-sheep

Yay! Thank you for your patience!

Use anndata.utils.iter_outer from scverse#2372 as the canonical source for standard section iteration in the HTML repr. Drop the redundant SECTION_ORDER tuple; display order now follows iter_outer. - _render_all_sections iterates (name, elem) pairs from iter_outer and passes elem to downstream renderers, avoiding a second getattr (which would trigger another file open/close cycle on backed AnnData). - _render_dataframe_section, _render_mapping_section, _render_uns_section, _render_raw_section now take the elem directly. - _detect_unknown_sections and _get_custom_sections_by_position use a local STANDARD_SECTIONS frozenset for name-only membership checks so they don't pay iter_outer's per-yield I/O just to get names. - Drop unused adata param from _render_uns_entry. Display order changes to X, obs, var, obsm, varm, obsp, varp, layers, uns, raw (uns moves from position 4 to position 9).

feat: AnnData.can_write based on AnnData.fold

b907554

ilan-gold added this to the 0.13.0 milestone Mar 19, 2026

ilan-gold added the skip-gpu-ci label Mar 19, 2026

ilan-gold added 2 commits March 19, 2026 14:42

chore: docs

19daed5

refactor: use accessors

4125375

ilan-gold changed the title ~~feat: AnnData.can_write based on AnnData.fold~~ feat: AnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__ Mar 19, 2026

ilan-gold commented Mar 19, 2026

View reviewed changes

Comment thread src/anndata/_core/anndata.py Outdated

ilan-gold requested a review from flying-sheep March 19, 2026 14:41

ilan-gold commented Mar 19, 2026

View reviewed changes

Comment thread src/anndata/_core/anndata.py Outdated

fix: DFS order + fixes

8be5ba2

ilan-gold commented Mar 19, 2026

View reviewed changes

Comment thread src/anndata/_core/anndata.py Outdated

chore: add test for uns

0f4d1b0

flying-sheep reviewed Mar 20, 2026

View reviewed changes

Comment thread src/anndata/_core/anndata.py Outdated

flying-sheep reviewed Mar 20, 2026

View reviewed changes

Merge branch 'main' into ig/fold_can_write

9338baa

ilan-gold commented Mar 23, 2026

View reviewed changes

ilan-gold added 5 commits March 23, 2026 16:07

feat: raw + uns traversal

69daf90

fix: fold -> reduce

932d766

chore: docs

e0f3ee2

Merge branch 'main' into ig/fold_can_write

436fc68

fix: meth not func

ee04741

ilan-gold marked this pull request as ready for review March 23, 2026 15:46

ilan-gold added 4 commits March 23, 2026 16:46

fix: fold not reduce in relnote

6d6f454

fix: nested

1f77a4c

chore: more func clarification

91adffe

fix: link

928b72a

ilan-gold added 2 commits April 10, 2026 12:28

fix: writing none

9fa978a

fix: API changes

e7b201f

ilan-gold changed the title ~~feat: AnnData.can_write based on AnnData.{reduce,iter} + refactorings of other relevant functions~~ feat: AnnData.can_write based on AnnData._reduce + iter_outer + refactorings of other relevant functions Apr 10, 2026

flying-sheep reviewed Apr 10, 2026

View reviewed changes

fix use set

95136a2

ilan-gold added 2 commits April 13, 2026 10:48

fix: docs

4eba690

fix: remove unused docs / private type

fdd6b7c

ilan-gold requested a review from flying-sheep April 13, 2026 08:53

flying-sheep reviewed Apr 13, 2026

View reviewed changes

ilan-gold added 3 commits April 14, 2026 10:55

fix: nexting

5760cb2

fix: ok

7382b67

fix: handle bad categoricals

3c747e1

fix: handle index / awkward

371d535

ilan-gold requested a review from flying-sheep April 14, 2026 11:20

ilan-gold added 3 commits April 15, 2026 10:13

Merge branch 'main' into ig/fold_can_write

d8f66c3

refactor: can_write -> unwriteable

5a0fded

Merge branch 'ig/fold_can_write' of github.com:scverse/anndata into i…

a6374ce

…g/fold_can_write

flying-sheep approved these changes Apr 15, 2026

View reviewed changes

Merge branch 'main' into ig/fold_can_write

1ad82f4

ilan-gold changed the title ~~feat: AnnData.can_write based on AnnData._reduce + iter_outer + refactorings of other relevant functions~~ feat: AnnData.unwriteable based on AnnData._reduce + iter_outer + refactorings of other relevant functions Apr 15, 2026

scverse deleted a comment from azure-pipelines Bot Apr 15, 2026

ilan-gold enabled auto-merge (squash) April 15, 2026 16:38

ilan-gold merged commit 2f2edda into main Apr 15, 2026
26 checks passed

ilan-gold deleted the ig/fold_can_write branch April 15, 2026 16:52

Conversation

ilan-gold commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

ilan-gold commented Mar 19, 2026

Uh oh!

Uh oh!

Uh oh!

flying-sheep left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

flying-sheep Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

katosh commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

HTML repr (#2236): traversal overlap

Serialization checks: can_write overlap

Uh oh!

ilan-gold commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ilan-gold Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

ilan-gold commented Apr 10, 2026

Uh oh!

flying-sheep left a comment

Choose a reason for hiding this comment

Uh oh!

flying-sheep Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

ilan-gold Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

flying-sheep Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ilan-gold commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flying-sheep left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilan-gold commented Apr 14, 2026

Uh oh!

flying-sheep commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ilan-gold commented Apr 15, 2026

Uh oh!

flying-sheep left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ilan-gold commented Mar 19, 2026 •

edited

Loading

codecov Bot commented Mar 19, 2026 •

edited

Loading

katosh commented Mar 20, 2026 •

edited

Loading

Serialization checks: `can_write` overlap

ilan-gold commented Mar 23, 2026 •

edited

Loading

ilan-gold commented Apr 13, 2026 •

edited

Loading

flying-sheep left a comment •

edited

Loading

flying-sheep commented Apr 14, 2026 •

edited

Loading