aggregate - out of core #395

Merged: Intron7 merged 20 commits into main from update-aggregate on Jul 11, 2025

Intron7 (Member) commented Jun 30, 2025

#385

This PR updates aggregate to work with Dask matrices. It also allows CSC-based aggregation to dense output and aggregation of F-contiguous (column-major) arrays.
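
As a rough illustration of what out-of-core aggregation involves, the sketch below accumulates per-group statistics one row block at a time with plain NumPy. It is not the PR's implementation: the function name `aggregate_chunked`, the `chunk_rows` parameter, and the choice of `ddof=1` for the variance are assumptions made for the example, and row slicing stands in for Dask chunks.

```python
import numpy as np


def aggregate_chunked(X, groups, n_groups, chunk_rows=2):
    """Hypothetical helper: per-group sum, mean, var, count_nonzero over row blocks."""
    n_features = X.shape[1]
    sums = np.zeros((n_groups, n_features), dtype=np.float64)
    sq_sums = np.zeros_like(sums)
    nonzero = np.zeros((n_groups, n_features), dtype=np.int64)
    counts = np.zeros(n_groups, dtype=np.int64)

    for start in range(0, X.shape[0], chunk_rows):
        # Only this block needs to be in memory at once.
        block = np.asarray(X[start:start + chunk_rows], dtype=np.float64)
        g = groups[start:start + chunk_rows]
        # np.add.at is an unbuffered scatter-add, so repeated group ids
        # within one block accumulate correctly.
        np.add.at(sums, g, block)
        np.add.at(sq_sums, g, block ** 2)
        np.add.at(nonzero, g, (block != 0).astype(np.int64))
        np.add.at(counts, g, 1)

    n = counts[:, None].astype(np.float64)
    mean = sums / n
    # Sample variance (ddof=1, an assumption) from the running sums.
    var = (sq_sums - n * mean ** 2) / (n - 1)
    return {"sum": sums, "mean": mean, "var": var, "count_nonzero": nonzero}
```

Because only running sums are kept, each block can be discarded after it is processed, which is the property that makes a chunked input workable. Groups with fewer than two rows would divide by zero in the variance and are not handled in this sketch.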

@Intron7 Intron7 marked this pull request as draft June 30, 2025 16:06
@Intron7 Intron7 linked an issue Jun 30, 2025 that may be closed by this pull request
Intron7 and others added 9 commits July 2, 2025 05:24
* test map_blocks

* add dense prototype

* update kernels

* make nicer

* fix 64_bit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@Intron7 Intron7 marked this pull request as ready for review July 10, 2025 14:41
@Intron7 Intron7 requested a review from ilan-gold July 10, 2025 14:41
@Intron7 Intron7 merged commit 14679ab into main Jul 11, 2025
10 of 16 checks passed
@Intron7 Intron7 deleted the update-aggregate branch July 11, 2025 13:49
Comment on lines +49 to +53
for i in range(4):
    c = ["sum", "mean", "var", "count_nonzero"][i]
    a = out_in_memory.layers[c]
    b = out_dask.layers[c]
    cp.testing.assert_allclose(a, b)
Contributor

Curious what happens if you run compute under a processes scheduler. For us, it actually produced inconsistent variance results for sparse-in-dask; to rectify the problem, I had to call compute on the mean within the variance calculation for sparse-in-dask.
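
The workaround described in this comment resembles a two-pass variance: the mean is reduced to a concrete value first, and only then are the squared deviations accumulated per chunk. A minimal NumPy sketch of that idea follows. The function name `chunked_variance_two_pass` is hypothetical, a list of arrays stands in for Dask chunks, and the sketch does not reproduce or explain the scheduler-dependent inconsistency itself.

```python
import numpy as np


def chunked_variance_two_pass(chunks, ddof=1):
    """Hypothetical helper: column-wise variance over a list of row chunks."""
    # Pass 1: reduce to a concrete mean, analogous to calling compute on
    # the mean before it is used in the variance.
    n = sum(c.shape[0] for c in chunks)
    mean = sum(c.sum(axis=0) for c in chunks) / n
    # Pass 2: every chunk is centered with the same materialized mean, so
    # the result does not depend on the order chunks are visited in.
    ss = sum(((c - mean) ** 2).sum(axis=0) for c in chunks)
    return ss / (n - ddof)
```

The cost of this approach is a second pass over the data, which for a lazy array means the chunks are read (or recomputed) twice.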

    adata_2 = adata_2.raw.to_adata()
    adata_1.X = cusparse.csr_matrix(adata_1.X.astype(np.float64))
    adata_2.X = as_sparse_cupy_dask_array(adata_2.X.astype(np.float64))
elif data_kind == "dense":
Contributor

No C- vs. F-contiguous distinction in dask?
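
For context on the layout question, the NumPy sketch below shows how the two memory orders can be constructed and checked in memory. It makes no claim about how Dask chunks carry layout through a graph, which is the open point here; the variable names are illustrative only.

```python
import numpy as np

# Same values in two memory layouts.
x_c = np.arange(12, dtype=np.float64).reshape(4, 3)  # row-major (C order)
x_f = np.asfortranarray(x_c)                          # column-major (F order)

# (C_CONTIGUOUS, F_CONTIGUOUS) flags for each array.
layouts = {
    "C": (x_c.flags["C_CONTIGUOUS"], x_c.flags["F_CONTIGUOUS"]),
    "F": (x_f.flags["C_CONTIGUOUS"], x_f.flags["F_CONTIGUOUS"]),
}

# A layout-aware kernel should give the same aggregate for both inputs.
col_sums_match = np.allclose(x_c.sum(axis=0), x_f.sum(axis=0))
```

A test parametrized over both layouts would compare the aggregate of each against the same reference, in the same way the in-memory and Dask outputs are compared above.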

Development

Successfully merging this pull request may close these issues.

[FEA] Dask Array support for Aggregation

2 participants