feat: Adding support for Annbatch #3620
Codecov Report
❌ Patch coverage is

Coverage Diff (main vs. #3620):

| | main | #3620 | +/- |
|---|---:|---:|---:|
| Coverage | 88.23% | 88.08% | -0.15% |
| Files | 229 | 229 | |
| Lines | 22646 | 23373 | +727 |
| Hits | 19981 | 20589 | +608 |
| Misses | 2665 | 2784 | +119 |
Flags with carried forward coverage won't be shown.
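The coverage figures in this report are internally consistent, and the short sketch below recomputes them from the hit/miss counts as a sanity check. It assumes coverage = hits / (hits + misses), rounded *down* to two decimals (Codecov's default rounding). With round-half-up, the PR's 20589 / 23373 ≈ 88.0888% would show as 88.09% rather than the reported 88.08%.

```python
# Sketch: recompute the Codecov summary from the raw hit/miss counts.
# Assumption: coverage = hits / (hits + misses), rounded DOWN to 2 decimals
# (Codecov's default rounding mode).


def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    return hits * 10_000 // lines


def fmt_pct(bp: int, signed: bool = False) -> str:
    """Format basis points as a percentage string, e.g. 8823 -> '88.23%'."""
    sign = "-" if bp < 0 else ("+" if signed else "")
    whole, frac = divmod(abs(bp), 100)
    return f"{sign}{whole}.{frac:02d}%"


base = {"hits": 19981, "misses": 2665}  # main
head = {"hits": 20589, "misses": 2784}  # #3620

for report in (base, head):
    # Every tracked line is either a hit or a miss (the report lists no partials).
    report["lines"] = report["hits"] + report["misses"]
    report["bp"] = coverage_bp(report["hits"], report["lines"])

print(fmt_pct(base["bp"]), fmt_pct(head["bp"]), fmt_pct(head["bp"] - base["bp"], signed=True))
# 88.23% 88.08% -0.15%
print(head["lines"] - base["lines"], head["hits"] - base["hits"], head["misses"] - base["misses"])
# 727 608 119
```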
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ee2098c75
The Resolvi annbatch implementation still loads data into memory and can't truly be based on annbatch, despite what was committed here. There is no real gain from it right now; we need to think about a graph dataloader for it, as @canergen suggested. I will remove those commits from this PR and put them on a side branch for later use: Ori-annbatch-resolvi
Added sample_key parameter to BaseModelClass.setup_annbatch to enable sample-level metadata tracking in annbatch datamodules. This is required for models like MrVI that operate on sample-level information. Also fixed anndata import issues by importing CSCDataset, CSRDataset, and read_elem from anndata's experimental module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
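One reason sample-level tracking needs care in a streaming setup: a single minibatch may contain only a few samples, so sample codes must come from one global mapping rather than being derived per batch. The sketch below illustrates that idea only; `build_sample_mapping`, `encode_batch`, and the donor labels are hypothetical and do not reproduce what `setup_annbatch` actually does with `sample_key`.

```python
from typing import Dict, Iterable, List, Sequence


def build_sample_mapping(all_sample_labels: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct sample a stable integer code (sorted for determinism)."""
    return {label: code for code, label in enumerate(sorted(set(all_sample_labels)))}


def encode_batch(batch_labels: Sequence[str], mapping: Dict[str, int]) -> List[int]:
    """Encode one streamed minibatch's sample labels with the global mapping."""
    return [mapping[label] for label in batch_labels]


# Hypothetical obs column named by sample_key; it is small, so reading it fully is cheap.
obs_sample = ["donor_b", "donor_a", "donor_c", "donor_a", "donor_b"]
mapping = build_sample_mapping(obs_sample)
n_sample = len(mapping)  # what a sample-aware model such as MrVI would size its embeddings by

batch1 = encode_batch(["donor_c", "donor_c"], mapping)  # a batch that sees only one sample
batch2 = encode_batch(["donor_a", "donor_b"], mapping)
print(n_sample, batch1, batch2)  # 3 [2, 2] [0, 1]
```

Encoding per batch instead (e.g. re-running `build_sample_mapping` on each batch) would give `donor_c` code 0 in the first batch, silently mixing samples across batches.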
- Modified CellAssign.__init__ to accept registry= parameter
- Added setup_annbatch method to compute streaming statistics (col_means, basis_means)
- Modified train() to accept datamodule parameter
- Added test_annbatch_setup_cellassign

Note: the test fails because the annbatch datamodule lacks a size_factor field. This requires a custom datamodule similar to the ones used for ContrastiveVI/VELOVI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
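Computing statistics such as col_means in a streaming fashion means making one pass over row chunks and keeping only running sums, a row count, and the global min/max, never the full matrix. The sketch below assumes dense rows for simplicity and, as an illustrative assumption, treats basis_means as `n_basis` evenly spaced values between the global min and max of X; the function name and chunk source are hypothetical, not the PR's code.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def streaming_stats(
    chunks: Iterable[Sequence[Sequence[float]]], n_basis: int = 10
) -> Tuple[List[float], List[float]]:
    """One pass over row chunks: per-column means plus an evenly spaced basis over [min, max]."""
    if n_basis < 2:
        raise ValueError("n_basis must be at least 2")
    col_sums: Optional[List[float]] = None
    n_rows = 0
    x_min, x_max = float("inf"), float("-inf")
    for chunk in chunks:  # only one small chunk is held at a time
        for row in chunk:
            if col_sums is None:
                col_sums = [0.0] * len(row)
            for j, value in enumerate(row):
                col_sums[j] += value
                x_min = min(x_min, value)
                x_max = max(x_max, value)
            n_rows += 1
    if col_sums is None:
        raise ValueError("no rows seen")
    col_means = [s / n_rows for s in col_sums]
    # Assumed basis: n_basis evenly spaced points from the global min to the global max.
    step = (x_max - x_min) / (n_basis - 1)
    basis_means = [x_min + i * step for i in range(n_basis)]
    return col_means, basis_means


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
chunks = (rows[i:i + 2] for i in range(0, len(rows), 2))  # stand-in for a datamodule, 2 rows at a time
col_means, basis_means = streaming_stats(chunks, n_basis=6)
print(col_means)    # [3.0, 2.0]
print(basis_means)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

Means and extrema are both mergeable across chunks, which is what makes a single pass sufficient; statistics like medians would not decompose this simply.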
… into memory

ad.experimental.read_lazy fails on h5ad files with zero-shape datasets in /uns (e.g. the Decipher tutorial data with /uns/decipher/config/layers_z_to_x). Replace it with a direct h5py read of only the var group: read the index column via the _index attr, with a fallback to common names. This keeps data out of memory, per the annbatch design.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
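The index lookup described in this commit can be sketched as a small pure function over the group's attributes and keys. In anndata's on-disk dataframe format, the `_index` attribute names the column that holds the index. The fallback names below are hypothetical, since the commit only says "common names", and the function name is illustrative rather than the PR's.

```python
from typing import Collection, Mapping, Sequence

# Hypothetical fallback names: the commit only says "common names".
FALLBACK_INDEX_NAMES = ("_index", "index", "gene_ids")


def resolve_var_index_key(
    attrs: Mapping[str, object],
    keys: Collection[str],
    fallbacks: Sequence[str] = FALLBACK_INDEX_NAMES,
) -> str:
    """Return the name of the dataset holding var_names inside an on-disk /var group."""
    declared = attrs.get("_index")  # anndata records the index column's name here
    if isinstance(declared, bytes):  # h5py may return fixed-length string attrs as bytes
        declared = declared.decode("utf-8")
    if isinstance(declared, str) and declared in keys:
        return declared
    for name in fallbacks:  # files written without the attribute
        if name in keys:
            return name
    raise KeyError("could not locate the index column of the var group")


print(resolve_var_index_key({"_index": "gene_symbols"}, ["gene_symbols", "highly_variable"]))
# gene_symbols
print(resolve_var_index_key({}, ["index", "highly_variable"]))
# index
```

With h5py, the opened `/var` group's `attrs` and `keys()` can be passed in directly, and the names then read with `var[key].asstr()[:]`. Only the small var group is touched, so neither X nor the problematic /uns entries are opened.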
…ation is done with same collection split
- _save_load.py: always override serialized datamodule string with actual datamodule arg so SCANVI and other models with datamodule param don't receive a string object at load time
- _base_model.py: skip module recreation in setup_datamodule load path when no datamodule is provided (_initialize_model already builds correct module from registry); preserve library_log_means/vars buffers when recreating with datamodule
- _training_mixin.py: handle adata=None + datamodule=None load path in _set_indices_and_labels by using categorical_mapping (without unlabeled category) so n_labels matches training-time computation
- revert docs/tutorials/notebooks submodule to origin/main pointer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
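Two of the load-path fixes in this commit reduce to small rules: never hand a serialized datamodule string back to the model constructor, and count labels without the unlabeled category so that n_labels matches training. The sketch below illustrates both under stated assumptions. Both function names are hypothetical, and overriding only when a `datamodule` key is present is my assumption about which models are affected, not the actual _save_load.py or _training_mixin.py code.

```python
from typing import Any, Dict, Optional, Sequence


def override_datamodule(saved_init_params: Dict[str, Any], datamodule: Optional[Any]) -> Dict[str, Any]:
    """Swap the serialized datamodule entry for the object passed at load time."""
    params = dict(saved_init_params)  # copy so the loaded attributes stay untouched
    if "datamodule" in params:
        # Only a string representation survives serialization; never pass it on.
        params["datamodule"] = datamodule
    return params


def n_labels_from_mapping(categorical_mapping: Sequence[str], unlabeled_category: str) -> int:
    """Count label classes as training does: the unlabeled category is excluded."""
    return sum(1 for category in categorical_mapping if category != unlabeled_category)


saved = {"n_latent": 10, "datamodule": "<AnnbatchDataModule object>"}  # hypothetical saved params
print(override_datamodule(saved, datamodule=None))
# {'n_latent': 10, 'datamodule': None}
print(n_labels_from_mapping(["B cell", "T cell", "Unknown"], unlabeled_category="Unknown"))
# 2
```

If the unlabeled category were counted at load time (3 here instead of 2), the rebuilt classifier head would no longer match the saved weights' shape.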
…lidation performance with annbatch training